Modern Mathematical Statistics with Applications.pdf -...


Springer Texts in Statistics

Series Editors: G. Casella, S. Fienberg, I. Olkin

For further volumes:

Modern Mathematical Statistics with Applications
Second Edition

Jay L. Devore
California Polytechnic State University

Kenneth N. Berk
Illinois State University

Jay L. Devore, California Polytechnic State University, Statistics Department, San Luis Obispo, California, USA

Kenneth N. Berk, Illinois State University, Department of Mathematics, Normal, Illinois, USA

Additional material to this book can be downloaded from

ISBN 978-1-4614-0390-6
e-ISBN 978-1-4614-0391-3
DOI 10.1007/978-1-4614-0391-3

Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2011936004

© Springer Science+Business Media, LLC 2012

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media

To my wife Carol whose continuing support of my writing efforts over the years has made all the difference.

To my wife Laura who, as a successful author, is my mentor and role model.

About the Authors

Jay L. Devore

Jay Devore received a B.S. in Engineering Science from the University of California, Berkeley, and a Ph.D. in Statistics from Stanford University.
He previously taught at the University of Florida and Oberlin College, and has had visiting positions at Stanford, Harvard, the University of Washington, New York University, and Columbia. He has been at California Polytechnic State University, San Luis Obispo, since 1977, where he was chair of the Department of Statistics for 7 years and recently achieved the exalted status of Professor Emeritus. Jay has previously authored or coauthored five other books, including Probability and Statistics for Engineering and the Sciences, which won a McGuffey Longevity Award from the Text and Academic Authors Association for demonstrated excellence over time. He is a Fellow of the American Statistical Association, has been an associate editor for both the Journal of the American Statistical Association and The American Statistician, and received the Distinguished Teaching Award from Cal Poly in 1991. His recreational interests include reading, playing tennis, traveling, and cooking and eating good food.

Kenneth N. Berk

Ken Berk has a B.S. in Physics from Carnegie Tech (now Carnegie Mellon) and a Ph.D. in Mathematics from the University of Minnesota. He is Professor Emeritus of Mathematics at Illinois State University and a Fellow of the American Statistical Association. He founded the Software Reviews section of The American Statistician and edited it for 6 years. He served as secretary/treasurer, program chair, and chair of the Statistical Computing Section of the American Statistical Association, and he twice co-chaired the Interface Symposium, the main annual meeting in statistical computing. His published work includes papers on time series, statistical computing, regression analysis, and statistical graphics, as well as the book Data Analysis with Microsoft Excel (with Patrick Carey).
Contents

Preface

1 Overview and Descriptive Statistics
    Introduction
    1.1 Populations and Samples
    1.2 Pictorial and Tabular Methods in Descriptive Statistics
    1.3 Measures of Location
    1.4 Measures of Variability

2 Probability
    Introduction
    2.1 Sample Spaces and Events
    2.2 Axioms, Interpretations, and Properties of Probability
    2.3 Counting Techniques
    2.4 Conditional Probability
    2.5 Independence

3 Discrete Random Variables and Probability Distributions
    Introduction
    3.1 Random Variables
    3.2 Probability Distributions for Discrete Random Variables
    3.3 Expected Values of Discrete Random Variables
    3.4 Moments and Moment Generating Functions
    3.5 The Binomial Probability Distribution
    3.6 Hypergeometric and Negative Binomial Distributions
    3.7 The Poisson Probability Distribution

4 Continuous Random Variables and Probability Distributions
    Introduction
    4.1 Probability Density Functions and Cumulative Distribution Functions
    4.2 Expected Values and Moment Generating Functions
    4.3 The Normal Distribution
    4.4 The Gamma Distribution and Its Relatives
    4.5 Other Continuous Distributions
    4.6 Probability Plots
    4.7 Transformations of a Random Variable

5 Joint Probability Distributions
    Introduction
    5.1 Jointly Distributed Random Variables
    5.2 Expected Values, Covariance, and Correlation
    5.3 Conditional Distributions
    5.4 Transformations of Random Variables
    5.5 Order Statistics

6 Statistics and Sampling Distributions
    Introduction
    6.1 Statistics and Their Distributions
    6.2 The Distribution of the Sample Mean
    6.3 The Mean, Variance, and MGF for Several Variables
    6.4 Distributions Based on a Normal Random Sample
    Appendix: Proof of the Central Limit Theorem

7 Point Estimation
    Introduction
    7.1 General Concepts and Criteria
    7.2 Methods of Point Estimation
    7.3 Sufficiency
    7.4 Information and Efficiency

8 Statistical Intervals Based on a Single Sample
    Introduction
    8.1 Basic Properties of Confidence Intervals
    8.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
    8.3 Intervals Based on a Normal Population Distribution
    8.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population
    8.5 Bootstrap Confidence Intervals

9 Tests of Hypotheses Based on a Single Sample
    Introduction
    9.1 Hypotheses and Test Procedures
    9.2 Tests About a Population Mean
    9.3 Tests Concerning a Population Proportion
    9.4 P-Values
    9.5 Some Comments on Selecting a Test Procedure

10 Inferences Based on Two Samples
    Introduction
    10.1 z Tests and Confidence Intervals for a Difference Between Two Population Means
    10.2 The Two-Sample t Test and Confidence Interval
    10.3 Analysis of Paired Data
    10.4 Inferences About Two Population Proportions
    10.5 Inferences About Two Population Variances
    10.6 Comparisons Using the Bootstrap and Permutation Methods

11 The Analysis of Variance
    Introduction
    11.1 Single-Factor ANOVA
    11.2 Multiple Comparisons in ANOVA
    11.3 More on Single-Factor ANOVA
    11.4 Two-Factor ANOVA with K_ij = 1
    11.5 Two-Factor ANOVA with K_ij > 1

12 Regression and Correlation
    Introduction
    12.1 The Simple Linear and Logistic Regression Models
    12.2 Estimating Model Parameters
    12.3 Inferences About the Regression Coefficient β1
    12.4 Inferences Concerning μ_{Y·x*} and the Prediction of Future Y Values
    12.5 Correlation
    12.6 Assessing Model Adequacy
    12.7 Multiple Regression Analysis
    12.8 Regression with Matrices

13 Goodness-of-Fit Tests and Categorical Data Analysis
    Introduction
    13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified
    13.2 Goodness-of-Fit Tests for Composite Hypotheses
    13.3 Two-Way Contingency Tables

14 Alternative Approaches to Inference
    Introduction
    14.1 The Wilcoxon Signed-Rank Test
    14.2 The Wilcoxon Rank-Sum Test
    14.3 Distribution-Free Confidence Intervals
    14.4 Bayesian Methods

Appendix Tables
    A.1 Cumulative Binomial Probabilities
    A.2 Cumulative Poisson Probabilities
    A.3 Standard Normal Curve Areas
    A.4 The Incomplete Gamma Function
    A.5 Critical Values for t Distributions
    A.6 Critical Values for Chi-Squared Distributions
    A.7 t Curve Tail Areas
    A.8 Critical Values for F Distributions
    A.9 Critical Values for Studentized Range Distributions
    A.10 Chi-Squared Curve Tail Areas
    A.11 Critical Values for the Ryan–Joiner Test of Normality
    A.12 Critical Values for the Wilcoxon Signed-Rank Test
    A.13 Critical Values for the Wilcoxon Rank-Sum Test
    A.14 Critical Values for the Wilcoxon Signed-Rank Interval
    A.15 Critical Values for the Wilcoxon Rank-Sum Interval
    A.16 β Curves for t Tests

Answers to Odd-Numbered Exercises

Index

Preface

Purpose

Our objective is to provide a postcalculus introduction to the discipline of statistics that

• Has mathematical integrity and contains some underlying theory.
• Shows students a broad range of applications involving real data.
• Is very current in its selection of topics.
• Illustrates the importance of statistical software.
• Is accessible to a wide audience, including mathematics and statistics majors (yes, there are a few of the latter), prospective engineers and scientists, and those business and social science majors interested in the quantitative aspects of their disciplines.

A number of currently available mathematical statistics texts are heavily oriented toward a rigorous mathematical development of probability and statistics, with much emphasis on theorems, proofs, and derivations. The focus is more on mathematics than on statistical practice.
Even when applied material is included, the scenarios are often contrived (many examples and exercises involving dice, coins, cards, widgets, or a comparison of treatment A to treatment B). So in our exposition we have tried to achieve a balance between mathematical foundations and statistical practice. Some may feel discomfort on grounds that because a mathematical statistics course has traditionally been a feeder into graduate programs in statistics, students coming out of such a course must be well prepared for that path. But that view presumes that the mathematics will provide the hook to get students interested in our discipline. This may happen for a few mathematics majors. However, our experience is that the application of statistics to real-world problems is far more persuasive in getting quantitatively oriented students to pursue a career or take further coursework in statistics. Let’s first draw them in with intriguing problem scenarios and applications. Opportunities for exposing them to mathematical foundations will follow in due course. We believe it is more important for students coming out of this course to be able to carry out and interpret the results of a two-sample t test or simple regression analysis than to manipulate joint moment generating functions or discourse on various modes of convergence.

Content

The book certainly does include core material in probability (Chapter 2), random variables and their distributions (Chapters 3–5), and sampling theory (Chapter 6). But our desire to balance theory with application/data analysis is reflected in the way the book starts out, with a chapter on descriptive and exploratory statistical techniques rather than an immediate foray into the axioms of probability and their consequences. After the distributional infrastructure is in place, the remaining statistical chapters cover the basics of inference.
In addition to introducing core ideas from estimation and hypothesis testing (Chapters 7–10), there is emphasis on checking assumptions and examining the data prior to formal analysis. Modern topics such as bootstrapping, permutation tests, residual analysis, and logistic regression are included. Our treatment of regression, analysis of variance, and categorical data analysis (Chapters 11–13) is definitely more oriented to dealing with real data than with theoretical properties of models. We also show many examples of output from commonly used statistical software packages, something noticeably absent in most other books pitched at this audience and level.

Mathematical Level

The challenge for students at this level should lie with mastery of statistical concepts as well as with mathematical wizardry. Consequently, the mathematical prerequisites and demands are reasonably modest. Mathematical sophistication and quantitative reasoning ability are, of course, crucial to the enterprise. Students with a solid grounding in univariate calculus and some exposure to multivariate calculus should feel comfortable with what we are asking of them. The several sections where matrix algebra appears (transformations in Chapter 5 and the matrix approach to regression in the last section of Chapter 12) can easily be deemphasized or skipped entirely. Our goal is to redress the balance between mathematics and statistics by putting more emphasis on the latter. The concepts, arguments, and notation contained herein will certainly stretch the intellects of many students. And a solid mastery of the material will be required in order for them to solve many of the roughly 1,300 exercises included in the book. Proofs and derivations are included where appropriate, but we think it likely that obtaining a conceptual understanding of the statistical enterprise will be the major challenge for readers.
Recommended Coverage

There should be more than enough material in our book for a year-long course. Those wanting to emphasize some of the more theoretical aspects of the subject (e.g., moment generating functions, conditional expectation, transformations, order statistics, sufficiency) should plan to spend correspondingly less time on inferential methodology in the latter part of the book. We have opted not to mark certain sections as optional, preferring instead to rely on the experience and tastes of individual instructors in deciding what should be presented. We would also like to think that students could be asked to read an occasional subsection or even section on their own and then work exercises to demonstrate understanding, so that not everything would need to be presented in class. Remember that there is never enough time in a course of any duration to teach students all that we’d like them to know!

Acknowledgments

We gratefully acknowledge the plentiful feedback provided by reviewers and colleagues. A special salute goes to Bruce Trumbo for going way beyond his mandate in providing us an incredibly thoughtful review of 40+ pages containing many wonderful ideas and pertinent criticisms. Our emphasis on real data would not have come to fruition without help from the many individuals who provided us with data in published sources or in personal communications. We very much appreciate the editorial and production services provided by the folks at Springer, in particular Marc Strauss, Kathryn Schell, and Felix Portnoy.

A Final Thought

It is our hope that students completing a course taught from this book will feel as passionately about the subject of statistics as we still do after so many years in the profession. Only teachers can really appreciate how gratifying it is to hear from a student after he or she has completed a course that the experience had a positive impact and maybe even affected a career choice.

Jay L. Devore
Kenneth N. Berk

CHAPTER ONE

Overview and Descriptive Statistics

Introduction

Statistical concepts and methods are not only useful but indeed often indispensable in understanding the world around us. They provide ways of gaining new insights into the behavior of many phenomena that you will encounter in your chosen field of specialization. The discipline of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation. Without uncertainty or variation, there would be little need for statistical methods or statisticians. If the yield of a crop were the same in every field, if all individuals reacted the same way to a drug, if everyone gave the same response to an opinion survey, and so on, then a single observation would reveal all desired information.

An interesting example of variation arises in the course of performing emissions testing on motor vehicles. The expense and time requirements of the Federal Test Procedure (FTP) preclude its widespread use in vehicle inspection programs. As a result, many agencies have developed less costly and quicker tests, which it is hoped replicate FTP results. According to the journal article “Motor Vehicle Emissions Variability” (J. Air Waste Manage. Assoc., 1996: 667–675), the acceptance of the FTP as a gold standard has led to the widespread belief that repeated measurements on the same vehicle would yield identical (or nearly identical) results. The authors of the article applied the FTP to seven vehicles characterized as “high emitters.” Here are the results of four hydrocarbon and carbon dioxide tests on one such vehicle:

HC (g/mile)    13.8    18.3    32.2    32.5
CO (g/mile)    118     149     232     236

J.L. Devore and K.N. Berk, Modern Mathematical Statistics with Applications, Springer Texts in Statistics, DOI 10.1007/978-1-4614-0391-3_1, © Springer Science+Business Media, LLC 2012

The substantial variation in both the HC and CO measurements casts considerable doubt on conventional wisdom and makes it much more difficult to make precise assessments about emissions levels.

How can statistical techniques be used to gather information and draw conclusions? Suppose, for example, that a biochemist has developed a medication for relieving headaches. If this medication is given to different individuals, variation in conditions and in the people themselves will result in more substantial relief for some individuals than for others. Methods of statistical analysis could be used on data from such an experiment to determine on the average how much relief to expect.

Alternatively, suppose the biochemist has developed a headache medication in the belief that it will be superior to the currently best medication. A comparative experiment could be carried out to investigate this issue by giving the current medication to some headache sufferers and the new medication to others. This must be done with care lest the wrong conclusion emerge. For example, perhaps really the two medications are equally effective. However, the new medication may be applied to people who have less severe headaches and have less stressful lives. The investigator would then likely observe a difference between the two medications attributable not to the medications themselves, but to a poor choice of test groups. Statistics offers not only methods for analyzing the results of experiments once they have been carried out but also suggestions for how experiments can be performed in an efficient manner to lessen the effects of variation and have a better chance of producing correct conclusions.
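As a small illustration of what "substantial variation" means for the four FTP measurements quoted in the Introduction, the following Python sketch computes the mean, sample standard deviation, and range of the HC and CO readings. It is only a sketch and not something taken from the book: the function name summarize and the choice of these three summaries are assumptions made for illustration, and the data values are the ones printed in the text.

```python
from statistics import mean, stdev

# FTP measurements (g/mile) on one "high emitter" vehicle, as quoted in the text.
hc = [13.8, 18.3, 32.2, 32.5]
co = [118, 149, 232, 236]


def summarize(values):
    """Return (mean, sample standard deviation, range) of the readings.

    Hypothetical helper for this illustration; stdev uses the n - 1 divisor.
    """
    return mean(values), stdev(values), max(values) - min(values)


for label, values in (("HC", hc), ("CO", co)):
    m, s, r = summarize(values)
    print(f"{label}: mean = {m:.2f}, sd = {s:.2f}, range = {r:.2f} g/mile")
```

With only four readings these summaries are rough, but comparing the standard deviation with the mean gives a quick sense of how far repeated tests on the same vehicle are from being identical.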
1.1 Populations and Samples

We are constantly exposed to collections of facts, or data, both in our professional capacities and in everyday activities. The discipline of statistics provides methods for organizing and summarizing data and for drawing conclusions based on information contained in the data. An investigation will typically focus on a well-defined collection of objects constituting a population of interest. In one study, the population might consist of all gelatin capsules of a particular type produced during a specified period. Another investigation might involve the population consisting of all individuals who received a B.S. in mathematics during the most recent academic year. When desired information is available for all objects in the population, we have what is called a census. Constraints on time, money, and other scarce resources usually make a census impractical or infeasible. Instead, a subset of the population...