Springer Texts
in Statistics
Series Editors:
G. Casella
S. Fienberg
I. Olkin

For further volumes:
Modern Mathematical Statistics with Applications
Second Edition

Jay L. Devore
California Polytechnic State University

Kenneth N. Berk
Illinois State University

Jay L. Devore
California Polytechnic State University
Statistics Department
San Luis Obispo California
USA

Kenneth N. Berk
Illinois State University
Department of Mathematics
Normal Illinois
USA

Additional material to this book can be downloaded from

ISBN 978-1-4614-0390-6
e-ISBN 978-1-4614-0391-3
DOI 10.1007/978-1-4614-0391-3
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2011936004
© Springer Science+Business Media, LLC 2012
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher
(Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in
connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval,
electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is
forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such,
is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media

To my wife Carol
whose continuing support of my writing efforts
over the years has made all the difference.

To my wife Laura
who, as a successful author, is my mentor and role model.

About the Authors

Jay L. Devore
Jay Devore received a B.S. in Engineering Science from the University of
California, Berkeley, and a Ph.D. in Statistics from Stanford University. He previously taught at the University of Florida and Oberlin College, and has had visiting
positions at Stanford, Harvard, the University of Washington, New York University, and Columbia. He has been at California Polytechnic State University,
San Luis Obispo, since 1977, where he was chair of the Department of Statistics
for 7 years and recently achieved the exalted status of Professor Emeritus.
Jay has previously authored or coauthored five other books, including Probability and Statistics for Engineering and the Sciences, which won a McGuffey
Longevity Award from the Text and Academic Authors Association for demonstrated excellence over time. He is a Fellow of the American Statistical Association, has been an associate editor for both the Journal of the American Statistical
Association and The American Statistician, and received the Distinguished Teaching Award from Cal Poly in 1991. His recreational interests include reading,
playing tennis, traveling, and cooking and eating good food.

Kenneth N. Berk
Ken Berk has a B.S. in Physics from Carnegie Tech (now Carnegie Mellon) and a
Ph.D. in Mathematics from the University of Minnesota. He is Professor Emeritus
of Mathematics at Illinois State University and a Fellow of the American Statistical
Association. He founded the Software Reviews section of The American Statistician and edited it for 6 years. He served as secretary/treasurer, program chair, and
chair of the Statistical Computing Section of the American Statistical Association,
and he twice co-chaired the Interface Symposium, the main annual meeting in
statistical computing. His published work includes papers on time series, statistical
computing, regression analysis, and statistical graphics, as well as the book Data
Analysis with Microsoft Excel (with Patrick Carey).

Contents

Preface

1 Overview and Descriptive Statistics
    Introduction
    1.1 Populations and Samples
    1.2 Pictorial and Tabular Methods in Descriptive Statistics
    1.3 Measures of Location
    1.4 Measures of Variability

2 Probability
    Introduction
    2.1 Sample Spaces and Events
    2.2 Axioms, Interpretations, and Properties of Probability
    2.3 Counting Techniques
    2.4 Conditional Probability
    2.5 Independence

3 Discrete Random Variables and Probability Distributions
    Introduction
    3.1 Random Variables
    3.2 Probability Distributions for Discrete Random Variables
    3.3 Expected Values of Discrete Random Variables
    3.4 Moments and Moment Generating Functions
    3.5 The Binomial Probability Distribution
    3.6 Hypergeometric and Negative Binomial Distributions
    3.7 The Poisson Probability Distribution

4 Continuous Random Variables and Probability Distributions
    Introduction
    4.1 Probability Density Functions and Cumulative Distribution Functions
    4.2 Expected Values and Moment Generating Functions
    4.3 The Normal Distribution
    4.4 The Gamma Distribution and Its Relatives
    4.5 Other Continuous Distributions
    4.6 Probability Plots
    4.7 Transformations of a Random Variable

5 Joint Probability Distributions
    Introduction
    5.1 Jointly Distributed Random Variables
    5.2 Expected Values, Covariance, and Correlation
    5.3 Conditional Distributions
    5.4 Transformations of Random Variables
    5.5 Order Statistics

6 Statistics and Sampling Distributions
    Introduction
    6.1 Statistics and Their Distributions
    6.2 The Distribution of the Sample Mean
    6.3 The Mean, Variance, and MGF for Several Variables
    6.4 Distributions Based on a Normal Random Sample
    Appendix: Proof of the Central Limit Theorem

7 Point Estimation
    Introduction
    7.1 General Concepts and Criteria
    7.2 Methods of Point Estimation
    7.3 Sufficiency
    7.4 Information and Efficiency

8 Statistical Intervals Based on a Single Sample
    Introduction
    8.1 Basic Properties of Confidence Intervals
    8.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
    8.3 Intervals Based on a Normal Population Distribution
    8.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population
    8.5 Bootstrap Confidence Intervals

9 Tests of Hypotheses Based on a Single Sample
    Introduction
    9.1 Hypotheses and Test Procedures
    9.2 Tests About a Population Mean
    9.3 Tests Concerning a Population Proportion
    9.4 P-Values
    9.5 Some Comments on Selecting a Test Procedure

10 Inferences Based on Two Samples
    Introduction
    10.1 z Tests and Confidence Intervals for a Difference Between Two Population Means
    10.2 The Two-Sample t Test and Confidence Interval
    10.3 Analysis of Paired Data
    10.4 Inferences About Two Population Proportions
    10.5 Inferences About Two Population Variances
    10.6 Comparisons Using the Bootstrap and Permutation Methods

11 The Analysis of Variance
    Introduction
    11.1 Single-Factor ANOVA
    11.2 Multiple Comparisons in ANOVA
    11.3 More on Single-Factor ANOVA
    11.4 Two-Factor ANOVA with Kij = 1
    11.5 Two-Factor ANOVA with Kij > 1

12 Regression and Correlation
    Introduction
    12.1 The Simple Linear and Logistic Regression Models
    12.2 Estimating Model Parameters
    12.3 Inferences About the Regression Coefficient β1
    12.4 Inferences Concerning μY·x and the Prediction of Future Y Values
    12.5 Correlation
    12.6 Assessing Model Adequacy
    12.7 Multiple Regression Analysis
    12.8 Regression with Matrices

13 Goodness-of-Fit Tests and Categorical Data Analysis
    Introduction
    13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified
    13.2 Goodness-of-Fit Tests for Composite Hypotheses
    13.3 Two-Way Contingency Tables

14 Alternative Approaches to Inference
    Introduction
    14.1 The Wilcoxon Signed-Rank Test
    14.2 The Wilcoxon Rank-Sum Test
    14.3 Distribution-Free Confidence Intervals
    14.4 Bayesian Methods

Appendix Tables
    A.1 Cumulative Binomial Probabilities
    A.2 Cumulative Poisson Probabilities
    A.3 Standard Normal Curve Areas
    A.4 The Incomplete Gamma Function
    A.5 Critical Values for t Distributions
    A.6 Critical Values for Chi-Squared Distributions
    A.7 t Curve Tail Areas
    A.8 Critical Values for F Distributions
    A.9 Critical Values for Studentized Range Distributions
    A.10 Chi-Squared Curve Tail Areas
    A.11 Critical Values for the Ryan–Joiner Test of Normality
    A.12 Critical Values for the Wilcoxon Signed-Rank Test
    A.13 Critical Values for the Wilcoxon Rank-Sum Test
    A.14 Critical Values for the Wilcoxon Signed-Rank Interval
    A.15 Critical Values for the Wilcoxon Rank-Sum Interval
    A.16 β Curves for t Tests

Answers to Odd-Numbered Exercises

Index

Preface
Purpose
Our objective is to provide a postcalculus introduction to the discipline of statistics
that
• Has mathematical integrity and contains some underlying theory.
• Shows students a broad range of applications involving real data.
• Is very current in its selection of topics.
• Illustrates the importance of statistical software.
• Is accessible to a wide audience, including mathematics and statistics majors
(yes, there are a few of the latter), prospective engineers and scientists, and those
business and social science majors interested in the quantitative aspects of their
disciplines.

A number of currently available mathematical statistics texts are heavily
oriented toward a rigorous mathematical development of probability and statistics,
with much emphasis on theorems, proofs, and derivations. The focus is more on
mathematics than on statistical practice. Even when applied material is included,
the scenarios are often contrived (many examples and exercises involving dice,
coins, cards, widgets, or a comparison of treatment A to treatment B).
So in our exposition we have tried to achieve a balance between mathematical foundations and statistical practice. Some may feel discomfort on grounds that
because a mathematical statistics course has traditionally been a feeder into graduate programs in statistics, students coming out of such a course must be well
prepared for that path. But that view presumes that the mathematics will provide
the hook to get students interested in our discipline. This may happen for a few
mathematics majors. However, our experience is that the application of statistics to
real-world problems is far more persuasive in getting quantitatively oriented
students to pursue a career or take further coursework in statistics. Let’s first
draw them in with intriguing problem scenarios and applications. Opportunities
for exposing them to mathematical foundations will follow in due course. We
believe it is more important for students coming out of this course to be able to
carry out and interpret the results of a two-sample t test or simple regression
analysis than to manipulate joint moment generating functions or discourse on
various modes of convergence.

Content
The book certainly does include core material in probability (Chapter 2), random
variables and their distributions (Chapters 3–5), and sampling theory (Chapter 6).
But our desire to balance theory with application/data analysis is reflected in the
way the book starts out, with a chapter on descriptive and exploratory statistical
techniques rather than an immediate foray into the axioms of probability and their
consequences. After the distributional infrastructure is in place, the remaining
statistical chapters cover the basics of inference. In addition to introducing core
ideas from estimation and hypothesis testing (Chapters 7–10), there is emphasis on
checking assumptions and examining the data prior to formal analysis. Modern
topics such as bootstrapping, permutation tests, residual analysis, and logistic
regression are included. Our treatment of regression, analysis of variance, and
categorical data analysis (Chapters 11–13) is definitely more oriented to dealing
with real data than with theoretical properties of models. We also show many
examples of output from commonly used statistical software packages, something
noticeably absent in most other books pitched at this audience and level.

Mathematical Level
The challenge for students at this level should lie with mastery of statistical
concepts as well as with mathematical wizardry. Consequently, the mathematical
prerequisites and demands are reasonably modest. Mathematical sophistication and
quantitative reasoning ability are, of course, crucial to the enterprise. Students with
a solid grounding in univariate calculus and some exposure to multivariate calculus
should feel comfortable with what we are asking of them. The several sections
where matrix algebra appears (transformations in Chapter 5 and the matrix approach
to regression in the last section of Chapter 12) can easily be deemphasized or
skipped entirely.
Our goal is to redress the balance between mathematics and statistics by
putting more emphasis on the latter. The concepts, arguments, and notation
contained herein will certainly stretch the intellects of many students. And a solid
mastery of the material will be required in order for them to solve many of the
roughly 1,300 exercises included in the book. Proofs and derivations are included
where appropriate, but we think it likely that obtaining a conceptual understanding
of the statistical enterprise will be the major challenge for readers.

Recommended Coverage
There should be more than enough material in our book for a year-long course.
Those wanting to emphasize some of the more theoretical aspects of the subject
(e.g., moment generating functions, conditional expectation, transformations, order
statistics, sufficiency) should plan to spend correspondingly less time on inferential
methodology in the latter part of the book. We have opted not to mark certain
sections as optional, preferring instead to rely on the experience and tastes of
individual instructors in deciding what should be presented. We would also like
to think that students could be asked to read an occasional subsection or even
section on their own and then work exercises to demonstrate understanding, so that
not everything would need to be presented in class. Remember that there is never
enough time in a course of any duration to teach students all that we’d like them to
know!

Acknowledgments
We gratefully acknowledge the plentiful feedback provided by reviewers and
colleagues. A special salute goes to Bruce Trumbo for going way beyond his
mandate in providing us an incredibly thoughtful review of 40+ pages containing many wonderful ideas and pertinent criticisms. Our emphasis on real data would
not have come to fruition without help from the many individuals who provided us
with data in published sources or in personal communications. We very much
appreciate the editorial and production services provided by the folks at Springer, in
particular Marc Strauss, Kathryn Schell, and Felix Portnoy.

A Final Thought
It is our hope that students completing a course taught from this book will feel as
passionately about the subject of statistics as we still do after so many years in the
profession. Only teachers can really appreciate how gratifying it is to hear from a
student after he or she has completed a course that the experience had a positive
impact and maybe even affected a career choice.
Jay L. Devore
Kenneth N. Berk

CHAPTER ONE
Overview and Descriptive Statistics

Introduction
Statistical concepts and methods are not only useful but indeed often indispensable in understanding the world around us. They provide ways of gaining
new insights into the behavior of many phenomena that you will encounter in your
chosen field of specialization.
The discipline of statistics teaches us how to make intelligent judgments
and informed decisions in the presence of uncertainty and variation. Without
uncertainty or variation, there would be little need for statistical methods or statisticians. If the yield of a crop were the same in every field, if all individuals reacted
the same way to a drug, if everyone gave the same response to an opinion survey,
and so on, then a single observation would reveal all desired information.
An interesting example of variation arises in the course of performing
emissions testing on motor vehicles. The expense and time requirements of the
Federal Test Procedure (FTP) preclude its widespread use in vehicle inspection
programs. As a result, many agencies have developed less costly and quicker tests,
which it is hoped replicate FTP results. According to the journal article “Motor
Vehicle Emissions Variability” (J. Air Waste Manage. Assoc., 1996: 667–675), the
acceptance of the FTP as a gold standard has led to the widespread belief that
repeated measurements on the same vehicle would yield identical (or nearly
identical) results. The authors of the article applied the FTP to seven vehicles
characterized as “high emitters.” Here are the results of four hydrocarbon and
carbon monoxide tests on one such vehicle:
HC (g/mile)    13.8    18.3    32.2    32.5
CO (g/mile)    118     149     232     236

The substantial variation in both the HC and CO measurements casts considerable
doubt on conventional wisdom and makes it much more difficult to make precise
assessments about emissions levels.
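
To put rough numbers on that variation, the four measurements can be summarized with a few descriptive statistics. The sketch below is an illustration only and is not taken from the article or the book: the helper name `summarize` is hypothetical, and the choice of summaries (mean, sample standard deviation, range, and coefficient of variation) is an assumption about what is useful to look at here.

```python
from statistics import mean, stdev

# Four repeated FTP measurements on one "high emitter" vehicle (values from the text).
hc = [13.8, 18.3, 32.2, 32.5]   # HC, g/mile
co = [118, 149, 232, 236]       # CO, g/mile


def summarize(values):
    """Return the mean, sample standard deviation, range, and coefficient of variation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divisor n - 1)
    return {
        "mean": m,
        "sd": s,
        "range": max(values) - min(values),
        "cv": s / m,  # standard deviation relative to the mean
    }


hc_summary = summarize(hc)
co_summary = summarize(co)

for name, summary in (("HC", hc_summary), ("CO", co_summary)):
    print(
        f"{name}: mean={summary['mean']:.2f}, sd={summary['sd']:.2f}, "
        f"range={summary['range']:.1f}, cv={summary['cv']:.0%}"
    )
```

For both pollutants the sample standard deviation works out to roughly 30 to 40 percent of the mean, which is far from the "identical (or nearly identical)" results that the conventional view would predict.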
How can statistical techniques be used to gather information and draw
conclusions? Suppose, for example, that a biochemist has developed a medication
for relieving headaches. If this medication is given to different individuals, variation in conditions and in the people themselves will result in more substantial
relief for some individuals than for others. Methods of statistical analysis could
be used on data from such an experiment to determine on the average how much
relief to expect.
Alternatively, suppose the biochemist has developed a headache medication
in the belief that it will be superior to the currently best medication. A comparative
experiment could be carried out to investigate this issue by giving the current
medication to some headache sufferers and the new medication to others. This
must be done with care lest the wrong conclusion emerge. For example, perhaps
really the two medications are equally effective. However, the new medication may
be applied to people who have less severe headaches and have less stressful lives.
The investigator would then likely observe a difference between the two medications attributable not to the medications themselves, but to a poor choice of test
groups. Statistics offers not only methods for analyzing the results of experiments
once they have been carried out but also suggestions for how experiments can
be performed in an efficient manner to lessen the effects of variation and have a
better chance of producing correct conclusions.

1.1 Populations and Samples
We are constantly exposed to collections of facts, or data, both in our professional
capacities and in everyday activities. The discipline of statistics provides methods
for organizing and summarizing data and for drawing conclusions based on information contained in the data.
An investigation will typically focus on a well-defined collection of
objects constituting a population of interest. In one study, the population might
consist of all gelatin capsules of a particular type produced during a specified
period. Another investigation might involve the population consisting of all individuals who received a B.S. in mathematics during the most recent academic year.
When desired information is available for all objects in the population, we have
what is called a census. Constraints on time, money, and other scarce resources
usually make a census impractical or infeasible. Instead, a subset of the population...
