Duxbury.Press.Modern.Mathematical.Statistics.with.Applications.0534404731.pdf

Modern Mathematical Statistics with Applications

Jay L. Devore
California Polytechnic State University

Kenneth N. Berk
Illinois State University

Australia • Canada • Mexico • Singapore • Spain • United Kingdom • United States

Modern Mathematical Statistics with Applications
Jay L. Devore and Kenneth N. Berk

Acquisitions Editor: Carolyn Crockett
Editorial Assistant: Daniel Geller
Technology Project Manager: Fiona Chong
Senior Assistant Editor: Ann Day
Marketing Manager: Joseph Rogove
Marketing Assistant: Brian Smith
Marketing Communications Manager: Darlene Amidon-Brent
Manager, Editorial Production: Kelsey McGee
Creative Director: Rob Hugel
Art Director: Lee Friedman
Print Buyer: Rebecca Cross
Permissions Editor: Joohee Lee
Production Service and Composition: G&S Book Services
Text Designer: Carolyn Deacy
Copy Editor: Anita Wagner
Cover Designer: Eric Adigard
Cover Image: Carl Russo
Cover Printer: Phoenix Color Corp
Printer: RR Donnelley-Crawfordsville

© 2007 Duxbury, an imprint of Thomson Brooks/Cole, a part of The Thomson Corporation. Thomson, the Star logo, and Brooks/Cole are trademarks used herein under license.

Thomson Higher Education
10 Davis Drive
Belmont, CA 94002-3098
USA

ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, web distribution, information storage and retrieval systems, or in any other manner without the written permission of the publisher.

Printed in the United States of America
1 2 3 4 5 6 7 09 08 07 06 05

For more information about our products, contact us at: Thomson Learning Academic Resource Center 1-800-423-0563

For permission to use material from this text or product, submit a request online at .
Any additional questions about permissions can be submitted by e-mail to [email protected]

Library of Congress Control Number: 2005929405
ISBN 0-534-40473-1

To my wife Carol whose continuing support of my writing efforts over the years has made all the difference.

To my wife Laura who, as a successful author, is my mentor and role model.

About the Authors

Jay L. Devore
Jay Devore received a B.S. in Engineering Science from the University of California, Berkeley, and a Ph.D. in Statistics from Stanford University. He previously taught at the University of Florida and Oberlin College, and has had visiting positions at Stanford, Harvard, the University of Washington, and New York University. He has been at California Polytechnic State University, San Luis Obispo, since 1977, where he is currently a professor and chair of the Department of Statistics. Jay has previously authored five other books, including Probability and Statistics for Engineering and the Sciences, currently in its 6th edition. He is a Fellow of the American Statistical Association, an associate editor for the Journal of the American Statistical Association, and received the Distinguished Teaching Award from Cal Poly in 1991. His recreational interests include reading, playing tennis, traveling, and cooking and eating good food.

Kenneth N. Berk
Ken Berk has a B.S. in Physics from Carnegie Tech (now Carnegie Mellon) and a Ph.D. in Mathematics from the University of Minnesota. He is Professor Emeritus of Mathematics at Illinois State University and a Fellow of the American Statistical Association. He founded the Software Reviews section of The American Statistician and edited it for six years. He served as secretary/treasurer, program chair, and chair of the Statistical Computing Section of the American Statistical Association, and he twice co-chaired the Interface Symposium, the main annual meeting in statistical computing.
His published work includes papers on time series, statistical computing, regression analysis, and statistical graphics and the book Data Analysis with Microsoft Excel (with Patrick Carey).

Brief Contents

1 Overview and Descriptive Statistics
2 Probability
3 Discrete Random Variables and Probability Distributions
4 Continuous Random Variables and Probability Distributions
5 Joint Probability Distributions
6 Statistics and Sampling Distributions
7 Point Estimation
8 Statistical Intervals Based on a Single Sample
9 Tests of Hypotheses Based on a Single Sample
10 Inferences Based on Two Samples
11 The Analysis of Variance
12 Regression and Correlation
13 Goodness-of-Fit Tests and Categorical Data Analysis
14 Alternative Approaches to Inference
Appendix Tables
Answers to Odd-Numbered Exercises
Index

Contents

Preface

1 Overview and Descriptive Statistics
    Introduction
    1.1 Populations and Samples
    1.2 Pictorial and Tabular Methods in Descriptive Statistics
    1.3 Measures of Location
    1.4 Measures of Variability

2 Probability
    Introduction
    2.1 Sample Spaces and Events
    2.2 Axioms, Interpretations, and Properties of Probability
    2.3 Counting Techniques
    2.4 Conditional Probability
    2.5 Independence

3 Discrete Random Variables and Probability Distributions
    Introduction
    3.1 Random Variables
    3.2 Probability Distributions for Discrete Random Variables
    3.3 Expected Values of Discrete Random Variables
    3.4 Moments and Moment Generating Functions
    3.5 The Binomial Probability Distribution
    3.6 *Hypergeometric and Negative Binomial Distributions
    3.7 *The Poisson Probability Distribution

4 Continuous Random Variables and Probability Distributions
    Introduction
    4.1 Probability Density Functions and Cumulative Distribution Functions
    4.2 Expected Values and Moment Generating Functions
    4.3 The Normal Distribution
    4.4 *The Gamma Distribution and Its Relatives
    4.5 *Other Continuous Distributions
    4.6 *Probability Plots
    4.7 *Transformations of a Random Variable

5 Joint Probability Distributions
    Introduction
    5.1 Jointly Distributed Random Variables
    5.2 Expected Values, Covariance, and Correlation
    5.3 *Conditional Distributions
    5.4 *Transformations of Random Variables
    5.5 *Order Statistics

6 Statistics and Sampling Distributions
    Introduction
    6.1 Statistics and Their Distributions
    6.2 The Distribution of the Sample Mean
    6.3 The Distribution of a Linear Combination
    6.4 Distributions Based on a Normal Random Sample
    Appendix: Proof of the Central Limit Theorem

7 Point Estimation
    Introduction
    7.1 General Concepts and Criteria
    7.2 *Methods of Point Estimation
    7.3 *Sufficiency
    7.4 *Information and Efficiency

8 Statistical Intervals Based on a Single Sample
    Introduction
    8.1 Basic Properties of Confidence Intervals
    8.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
    8.3 Intervals Based on a Normal Population Distribution
    8.4 *Confidence Intervals for the Variance and Standard Deviation of a Normal Population
    8.5 *Bootstrap Confidence Intervals

9 Tests of Hypotheses Based on a Single Sample
    Introduction
    9.1 Hypotheses and Test Procedures
    9.2 Tests About a Population Mean
    9.3 Tests Concerning a Population Proportion
    9.4 P-Values
    9.5 *Some Comments on Selecting a Test Procedure

10 Inferences Based on Two Samples
    Introduction
    10.1 z Tests and Confidence Intervals for a Difference Between Two Population Means
    10.2 The Two-Sample t Test and Confidence Interval
    10.3 Analysis of Paired Data
    10.4 Inferences About Two Population Proportions
    10.5 *Inferences About Two Population Variances
    10.6 *Comparisons Using the Bootstrap and Permutation Methods

11 The Analysis of Variance
    Introduction
    11.1 Single-Factor ANOVA
    11.2 *Multiple Comparisons in ANOVA
    11.3 *More on Single-Factor ANOVA
    11.4 *Two-Factor ANOVA with Kij = 1
    11.5 *Two-Factor ANOVA with Kij > 1

12 Regression and Correlation
    Introduction
    12.1 The Simple Linear and Logistic Regression Models
    12.2 Estimating Model Parameters
    12.3 Inferences About the Regression Coefficient β1
    12.4 Inferences Concerning μY·x* and the Prediction of Future Y Values
    12.5 Correlation
    12.6 *Aptness of the Model and Model Checking
    12.7 *Multiple Regression Analysis
    12.8 *Regression with Matrices

13 Goodness-of-Fit Tests and Categorical Data Analysis
    Introduction
    13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified
    13.2 *Goodness-of-Fit Tests for Composite Hypotheses
    13.3 Two-Way Contingency Tables

14 Alternative Approaches to Inference
    Introduction
    14.1 *The Wilcoxon Signed-Rank Test
    14.2 *The Wilcoxon Rank-Sum Test
    14.3 *Distribution-Free Confidence Intervals
    14.4 *Bayesian Methods
    14.5 *Sequential Methods

Appendix Tables
    A.1 Cumulative Binomial Probabilities
    A.2 Cumulative Poisson Probabilities
    A.3 Standard Normal Curve Areas
    A.4 The Incomplete Gamma Function
    A.5 Critical Values for t Distributions
    A.6 Tolerance Critical Values for Normal Population Distributions
    A.7 Critical Values for Chi-Squared Distributions
    A.8 t Curve Tail Areas
    A.9 Critical Values for F Distributions
    A.10 Critical Values for Studentized Range Distributions
    A.11 Chi-Squared Curve Tail Areas
    A.12 Critical Values for the Ryan–Joiner Test of Normality
    A.13 Critical Values for the Wilcoxon Signed-Rank Test
    A.14 Critical Values for the Wilcoxon Rank-Sum Test
    A.15 Critical Values for the Wilcoxon Signed-Rank Interval
    A.16 Critical Values for the Wilcoxon Rank-Sum Interval
    A.17 β Curves for t Tests

Answers to Odd-Numbered Exercises
Index

Preface

Purpose

Our objective is to provide a postcalculus introduction to the discipline of statistics that

• Has mathematical integrity and contains some underlying theory.
• Shows students a broad range of applications involving real data.
• Is very current in its selection of topics.
• Illustrates the importance of statistical software.
• Is accessible to a wide audience, including mathematics and statistics majors (yes, there are a few of the latter), prospective engineers and scientists, and those business and social science majors interested in the quantitative aspects of their disciplines.

A number of currently available mathematical statistics texts are heavily oriented toward a rigorous mathematical development of probability and statistics, with much emphasis on theorems, proofs, and derivations. The emphasis is more on mathematics than on statistical practice. Even when applied material is included, the scenarios are often contrived (many examples and exercises involving dice, coins, cards, widgets, or a comparison of treatment A to treatment B). So in our exposition we have tried to achieve a balance between mathematical foundations and statistical practice.

Some may feel discomfort on grounds that because a mathematical statistics course has traditionally been a feeder into graduate programs in statistics, students coming out of such a course must be well prepared for that path. But that view presumes that the mathematics will provide the hook to get students interested in our discipline. That may happen for a few mathematics majors. However, our experience is that the application of statistics to real-world problems is far more persuasive in getting quantitatively oriented students to pursue a career or take further coursework in statistics. Let's first draw them in with intriguing problem scenarios and applications. Opportunities for exposing them to mathematical foundations will follow in due course.
In our view it is more important for students coming out of this course to be able to carry out and interpret the results of a two-sample t test or simple regression analysis than to manipulate joint moment generating functions or discourse on various modes of convergence.

Content

The book certainly does include core material in probability (Chapter 2), random variables and their distributions (Chapters 3–5), and sampling theory (Chapter 6). But our desire to balance theory with application/data analysis is reflected in the way the book starts out, with a chapter on descriptive and exploratory statistical techniques rather than an immediate foray into the axioms of probability and their consequences. After the distributional infrastructure is in place, the remaining statistical chapters cover the basics of inference. In addition to introducing core ideas from estimation and hypothesis testing (Chapters 7–10), there is emphasis on checking assumptions and looking at the data prior to formal analysis. Modern topics such as bootstrapping, permutation tests, residual analysis, and logistic regression are included. Our treatment of regression, analysis of variance, and categorical data analysis (Chapters 11–13) is definitely more oriented to dealing with real data than with theoretical properties of models. We also show many examples of output from commonly used statistical software packages, something noticeably absent in most other books pitched at this audience and level. (Figures 10.1 and 11.14 have been reproduced here for illustrative purposes.) For example, the first section on multiple regression toward the end of Chapter 12 uses no matrix algebra but instead relies on output from software as a basis for making inferences.
[Figure 10.1: plot of Final for the Control and Exper groups. Figure 11.14: interaction plot (data means) for vibration, by Source (1–5) and Material (A, P, S).]

Mathematical Level

The challenge for students at this level should lie with mastery of statistical concepts as well as with mathematical wizardry. Consequently, the mathematical prerequisites and demands are reasonably modest. Mathematical sophistication and quantitative reasoning ability are, of course, crucial to the enterprise. Students with a solid grounding in univariate calculus and some exposure to multivariate calculus should feel comfortable with what we are asking of them. The several sections where matrix algebra appears (transformations in Chapter 5 and the matrix approach to regression in the last section of Chapter 12) can easily be deemphasized or skipped entirely.

Our goal is to redress the balance between mathematics and statistics by putting more emphasis on the latter. The concepts, arguments, and notation contained herein will certainly stretch the intellects of many students. And a solid mastery of the material will be required in order for them to solve many of the roughly 1300 exercises included in the book. Proofs and derivations are included where appropriate, but we think it likely that obtaining a conceptual understanding of the statistical enterprise will be the major challenge for readers.

Recommended Coverage

There should be more than enough material in our book for a year-long course. Those wanting to emphasize some of the more theoretical aspects of the subject (e.g., moment generating functions, conditional expectation, transformations, order statistics, sufficiency) should plan to spend correspondingly less time on inferential methodology in the latter part of the book. We have tried to help instructors by marking certain sections as optional (using an *).
“Optional” is not synonymous with “unimportant”; an * is just an indication that what comes afterward makes at most minimal use of what is contained in a section so marked. Other than that, we prefer to rely on the experience and tastes of individual instructors in deciding what should be presented. We would also like to think that students could be asked to read an occasional subsection or even section on their own and then work exercises to demonstrate understanding, so that not everything would need to be presented in class. Remember that there is never enough time in a course of any duration to teach students all that we'd like them to know!

Acknowledgments

We gratefully acknowledge the plentiful feedback provided by the following reviewers: Bhaskar Bhattacharya, Southern Illinois University; Ann Gironella, Idaho State University; Tiefeng Jiang, University of Minnesota; Iwan Praton, Franklin & Marshall College; and Bruce Trumbo, California State University, East Bay. A special salute goes to Bruce Trumbo for going way beyond his mandate in providing us an incredibly thoughtful review of 40+ pages containing many wonderful ideas and pertinent criticisms. Matt Carlton, a Cal Poly colleague of one of the authors, has provided stellar service as an accuracy checker, and has also prepared a solutions manual. Our emphasis on real data would not have come to fruition without help from the many individuals who provided us with data in published sources or in personal communications; we greatly appreciate all their contributions.

We very much appreciate the production services provided by the folks at G&S Book Services. Our production editor, Gretchen Otto, did a first-rate job of moving the book through the production process, and was always prompt and considerate in her communications with us. Thanks to our copy editor, Anita Wagner, for employing a light touch and not taking us too much to task for our occasional grammatical and technical lapses.
The staff at Brooks/Cole–Duxbury has been as supportive on this project as on ones with which we have previously been involved. Special kudos go to Carolyn Crockett, Ann Day, Dan Geller, and Kelsey McGee, and apologies to any whose names were inadvertently omitted from this list.

A Final Thought

It is our hope that students completing a course taught from this book will feel as passionately about the subject of statistics as we still do after so many years in the profession. Only teachers can really appreciate how gratifying it is to hear from a student after he or she has completed a course that the experience had a positive impact and maybe even affected a career choice.

Jay Devore
Ken Berk

CHAPTER ONE
Overview and Descriptive Statistics

Introduction

Statistical concepts and methods are not only useful but indeed often indispensable in understanding the world around us. They provide ways of gaining new insights into the behavior of many phenomena that you will encounter in your chosen field of specialization.

The discipline of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation. Without uncertainty or variation, there would be little need for statistical methods or statisticians. If the yield of a crop were the same in every field, if all individuals reacted the same way to a drug, if everyone gave the same response to an opinion survey, and so on, then a single observation would reveal all desired information.

An interesting example of variation arises in the course of performing emissions testing on motor vehicles. The expense and time requirements of the Federal Test Procedure (FTP) preclude its widespread use in vehicle inspection programs. As a result, many agencies have developed less costly and quicker tests, which it is hoped replicate FTP results. According to the journal article “Motor Vehicle Emissions Variability” (J. Air Waste Manag.
Assoc., 1996: 667–675), the acceptance of the FTP as a gold standard has led to the widespread belief that repeated measurements on the same vehicle would yield identical (or nearly identical) results. The authors of the article applied the FTP to seven vehicles characterized as “high emitters.” Here are the results of four hydrocarbon and carbon monoxide tests on one such vehicle:

HC (g/mile)    CO (g/mile)
13.8           118
18.3           149
32.2           232
32.5           236

The substantial variation in both the HC and CO measurements casts considerable doubt on conventional wisdom and makes it much more difficult to make precise assessments about emissions levels. Ho...
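
The variation in the four measurements listed above can be put into numbers with a few summary statistics. The following is a minimal Python sketch and is not part of the original text: the variable names, the helper `summarize`, and the choice of mean, sample standard deviation, and range as summaries are assumptions made purely for illustration.

```python
from statistics import mean, stdev

# Four FTP measurements on one "high emitter" vehicle, as listed above (g/mile).
hc = [13.8, 18.3, 32.2, 32.5]
co = [118, 149, 232, 236]


def summarize(values):
    """Return (mean, sample standard deviation, range) for a list of numbers.

    Hypothetical helper for this illustration only.
    """
    return mean(values), stdev(values), max(values) - min(values)


for name, values in (("HC", hc), ("CO", co)):
    m, s, r = summarize(values)
    print(f"{name}: mean = {m:.2f}, sample SD = {s:.2f}, range = {r:.1f}")
```

For both pollutants the largest reading is roughly double the smallest, which is the sense in which the repeated tests on a single vehicle are far from identical.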