
Introduction to the Practice of
Statistics
NINTH EDITION
David S. Moore
George P. McCabe
Bruce A. Craig
Purdue University
Vice President, STEM: Ben Roberts
Publisher: Terri Ward
Senior Acquisitions Editor: Karen Carson
Marketing Manager: Tom DeMarco
Marketing Assistant: Cate McCaffery
Development Editor: Jorge Amaral
Senior Media Editor: Catriona Kaplan
Assistant Media Editor: Emily Tenenbaum
Director of Digital Production: Keri deManigold
Senior Media Producer: Alison Lorber
Associate Editor: Victoria Garvey
Editorial Assistant: Katharine Munz
Photo Editor: Cecilia Varas
Photo Researcher: Candice Cheesman
Director of Design, Content Management: Diana Blume
Text and Cover Designer: Blake Logan
Project Editor: Edward Dionne, MPS North America LLC
Illustrations: MPS North America LLC
Production Manager: Susan Wein
Composition: MPS North America LLC
Printing and Binding: LSC Communications
Cover Illustration: Drawing Water: Spring 2011 detail
(Midwest) by David Wicks
“Look Back” Arrow: NewCorner/Shutterstock
Library of Congress Control Number: 2016946039
Student Edition Hardcover:
ISBN-13: 978-1-319-01338-7
ISBN-10: 1-319-01338-4
Student Edition Loose-leaf:
ISBN-13: 978-1-319-01362-2
ISBN-10: 1-319-01362-7
Instructor Complimentary Copy:
ISBN-13: 978-1-319-01428-5
ISBN-10: 1-319-01428-3
© 2017, 2014, 2012, 2009 by W. H. Freeman and Company
All rights reserved
Printed in the United States of America
First printing
W. H. Freeman and Company
One New York Plaza
Suite 4500
New York, NY 10004-1562
Brief Contents
To Teachers: About This Book
To Students: What Is Statistics?
About the Authors
Data Table Index
Beyond the Basics Index
PART I Looking at Data
CHAPTER 1 Looking at Data—Distributions
CHAPTER 2 Looking at Data—Relationships
CHAPTER 3 Producing Data
PART II Probability and Inference
CHAPTER 4 Probability: The Study of Randomness
CHAPTER 5 Sampling Distributions
CHAPTER 6 Introduction to Inference
CHAPTER 7 Inference for Means
CHAPTER 8 Inference for Proportions
PART III Topics in Inference
CHAPTER 9 Inference for Categorical Data
CHAPTER 10 Inference for Regression
CHAPTER 11 Multiple Regression
CHAPTER 12 One-Way Analysis of Variance
CHAPTER 13 Two-Way Analysis of Variance
Tables
Answers to Odd-Numbered Exercises
Notes and Data Sources
Index
Contents
To Teachers: About This Book
To Students: What Is Statistics?
About the Authors
Data Table Index
Beyond the Basics Index
PART I Looking at Data
CHAPTER 1 Looking at Data—Distributions
Introduction
1.1 Data
Key characteristics of a data set
Section 1.1 Summary
Section 1.1 Exercises
1.2 Displaying Distributions with Graphs
Categorical variables: Bar graphs and pie charts
Quantitative variables: Stemplots and histograms
Histograms
Data analysis in action: Don’t hang up on me
Examining distributions
Dealing with outliers
Time plots
Section 1.2 Summary
Section 1.2 Exercises
1.3 Describing Distributions with Numbers
Measuring center: The mean
Measuring center: The median
Mean versus median
Measuring spread: The quartiles
The five-number summary and boxplots
The 1.5 × IQR rule for suspected outliers
Measuring spread: The standard deviation
Properties of the standard deviation
Choosing measures of center and spread
Changing the unit of measurement
Section 1.3 Summary
Section 1.3 Exercises
1.4 Density Curves and Normal Distributions
Density curves
Measuring center and spread for density curves
Normal distributions
The 68–95–99.7 rule
Standardizing observations
Normal distribution calculations
Using the standard Normal table
Inverse Normal calculations
Normal quantile plots
Beyond the Basics: Density estimation
Section 1.4 Summary
Section 1.4 Exercises
Chapter 1 Exercises
CHAPTER 2 Looking at Data—Relationships
Introduction
2.1 Relationships
Examining relationships
Section 2.1 Summary
Section 2.1 Exercises
2.2 Scatterplots
Interpreting scatterplots
The log transformation
Adding categorical variables to scatterplots
Scatterplot smoothers
Categorical explanatory variables
Section 2.2 Summary
Section 2.2 Exercises
2.3 Correlation
The correlation r
Properties of correlation
Section 2.3 Summary
Section 2.3 Exercises
2.4 Least-Squares Regression
Fitting a line to data
Prediction
Least-squares regression
Interpreting the regression line
Facts about least-squares regression
Correlation and regression
Another view of r²
Section 2.4 Summary
Section 2.4 Exercises
2.5 Cautions about Correlation and Regression
Residuals
Outliers and influential observations
Beware of the lurking variable
Beware of correlations based on averaged data
Beware of restricted ranges
Beyond the Basics: Data mining
Section 2.5 Summary
Section 2.5 Exercises
2.6 Data Analysis for Two-Way Tables
The two-way table
Joint distribution
Marginal distributions
Describing relations in two-way tables
Conditional distributions
Simpson’s paradox
Section 2.6 Summary
Section 2.6 Exercises
2.7 The Question of Causation
Explaining association
Establishing causation
Section 2.7 Summary
Section 2.7 Exercises
Chapter 2 Exercises
CHAPTER 3 Producing Data
Introduction
3.1 Sources of Data
Anecdotal data
Available data
Sample surveys and experiments
Section 3.1 Summary
Section 3.1 Exercises
3.2 Design of Experiments
Comparative experiments
Randomization
Randomized comparative experiments
How to randomize
Randomization using software
Randomization using random digits
Cautions about experimentation
Matched pairs designs
Block designs
Section 3.2 Summary
Section 3.2 Exercises
3.3 Sampling Design
Simple random samples
How to select a simple random sample
Stratified random samples
Multistage random samples
Cautions about sample surveys
Beyond the Basics: Capture-recapture sampling
Section 3.3 Summary
Section 3.3 Exercises
3.4 Ethics
Institutional review boards
Informed consent
Confidentiality
Clinical trials
Behavioral and social science experiments
Section 3.4 Summary
Section 3.4 Exercises
Chapter 3 Exercises
PART II Probability and Inference
CHAPTER 4 Probability: The Study of Randomness
Introduction
4.1 Randomness
The language of probability
Thinking about randomness
The uses of probability
Section 4.1 Summary
Section 4.1 Exercises
4.2 Probability Models
Sample spaces
Probability rules
Assigning probabilities: Finite number of outcomes
Assigning probabilities: Equally likely outcomes
Independence and the multiplication rule
Applying the probability rules
Section 4.2 Summary
Section 4.2 Exercises
4.3 Random Variables
Discrete random variables
Continuous random variables
Normal distributions as probability distributions
Section 4.3 Summary
Section 4.3 Exercises
4.4 Means and Variances of Random Variables
The mean of a random variable
Statistical estimation and the law of large numbers
Thinking about the law of large numbers
Beyond the Basics: More laws of large numbers
Rules for means
The variance of a random variable
Rules for variances and standard deviations
Section 4.4 Summary
Section 4.4 Exercises
4.5 General Probability Rules
General addition rules
Conditional probability
General multiplication rules
Tree diagrams
Bayes’s rule
Independence again
Section 4.5 Summary
Section 4.5 Exercises
Chapter 4 Exercises
CHAPTER 5 Sampling Distributions
Introduction
5.1 Toward Statistical Inference
Sampling variability
Sampling distributions
Bias and variability
Sampling from large populations
Why randomize?
Section 5.1 Summary
Section 5.1 Exercises
5.2 The Sampling Distribution of a Sample Mean
The mean and standard deviation of x̅
The central limit theorem
A few more facts
Beyond the Basics: Weibull distributions
Section 5.2 Summary
Section 5.2 Exercises
5.3 Sampling Distributions for Counts and Proportions
The binomial distributions for sample counts
Binomial distributions in statistical sampling
Finding binomial probabilities
Binomial mean and standard deviation
Sample proportions
Normal approximation for counts and proportions
The continuity correction
Binomial formula
The Poisson distributions
Section 5.3 Summary
Section 5.3 Exercises
Chapter 5 Exercises
CHAPTER 6 Introduction to Inference
Introduction
Overview of inference
6.1 Estimating with Confidence
Statistical confidence
Confidence intervals
Confidence interval for a population mean
How confidence intervals behave
Choosing the sample size
Some cautions
Section 6.1 Summary
Section 6.1 Exercises
6.2 Tests of Significance
The reasoning of significance tests
Stating hypotheses
Test statistics
P-values
Statistical significance
Tests for a population mean
Two-sided significance tests and confidence intervals
The P-value versus a statement of significance
Section 6.2 Summary
Section 6.2 Exercises
6.3 Use and Abuse of Tests
Choosing a level of significance
What statistical significance does not mean
Don’t ignore lack of significance
Statistical inference is not valid for all sets of data
Beware of searching for significance
Section 6.3 Summary
Section 6.3 Exercises
6.4 Power and Inference as a Decision
Power
Increasing the power
Inference as decision
Two types of error
Error probabilities
The common practice of testing hypotheses
Section 6.4 Summary
Section 6.4 Exercises
Chapter 6 Exercises
CHAPTER 7 Inference for Means
Introduction
7.1 Inference for the Mean of a Population
The t distributions
The one-sample t confidence interval
The one-sample t test
Matched pairs t procedures
Robustness of the t procedures
Beyond the Basics: The bootstrap
Section 7.1 Summary
Section 7.1 Exercises
7.2 Comparing Two Means
The two-sample z statistic
The two-sample t procedures
The two-sample t confidence interval
The two-sample t significance test
Robustness of the two-sample procedures
Inference for small samples
Software approximation for the degrees of freedom
The pooled two-sample t procedures
Section 7.2 Summary
Section 7.2 Exercises
7.3 Additional Topics on Inference
Choosing the sample size
Inference for non-Normal populations
Section 7.3 Summary
Section 7.3 Exercises
Chapter 7 Exercises
CHAPTER 8 Inference for Proportions
Introduction
8.1 Inference for a Single Proportion
Large-sample confidence interval for a single proportion
Beyond the Basics: The plus four confidence interval for a single
proportion
Significance test for a single proportion
Choosing a sample size for a confidence interval
Choosing a sample size for a significance test
Section 8.1 Summary
Section 8.1 Exercises
8.2 Comparing Two Proportions
Large-sample confidence interval for a difference in proportions
Beyond the Basics: The plus four confidence interval for a difference
in proportions
Significance test for a difference in proportions
Choosing a sample size for two sample proportions
Beyond the Basics: Relative risk
Section 8.2 Summary
Section 8.2 Exercises
Chapter 8 Exercises
PART III Topics in Inference
CHAPTER 9 Inference for Categorical Data
Introduction
9.1 Inference for Two-Way Tables
The hypothesis: No association
Expected cell counts
The chi-square test
Computations
Computing conditional distributions
The chi-square test and the z test
Beyond the Basics: Meta-analysis
Section 9.1 Summary
Section 9.1 Exercises
9.2 Goodness of Fit
Section 9.2 Summary
Section 9.2 Exercises
Chapter 9 Exercises
CHAPTER 10 Inference for Regression
Introduction
10.1 Simple Linear Regression
Statistical model for linear regression
Preliminary data analysis and inference considerations
Estimating the regression parameters
Checking model assumptions
Confidence intervals and significance tests
Confidence intervals for mean response
Prediction intervals
Transforming variables
Beyond the Basics: Nonlinear regression
Section 10.1 Summary
Section 10.1 Exercises
10.2 More Detail about Simple Linear Regression
Analysis of variance for regression
The ANOVA F test
Calculations for regression inference
Inference for correlation
Section 10.2 Summary
Section 10.2 Exercises
Chapter 10 Exercises
CHAPTER 11 Multiple Regression
Introduction
11.1 Inference for Multiple Regression
Population multiple regression equation
Data for multiple regression
Multiple linear regression model
Estimation of the multiple regression parameters
Confidence intervals and significance tests for regression
coefficients
ANOVA table for multiple regression
Squared multiple correlation R²
Section 11.1 Summary
Section 11.1 Exercises
11.2 A Case Study
Preliminary analysis
Relationships between pairs of variables
Regression on high school grades
Interpretation of results
Examining the residuals
Refining the model
Regression on SAT scores
Regression using all variables
Test for a collection of regression coefficients
Beyond the Basics: Multiple logistic regression
Section 11.2 Summary
Section 11.2 Exercises
Chapter 11 Exercises
CHAPTER 12 One-Way Analysis of Variance
Introduction
12.1 Inference for One-Way Analysis of Variance
Data for one-way ANOVA
Comparing means
The two-sample t statistic
An overview of ANOVA
The ANOVA model
Estimates of population parameters
Testing hypotheses in one-way ANOVA
The ANOVA table
The F test
Software
Beyond the Basics: Testing the equality of spread
Section 12.1 Summary
Section 12.1 Exercises
12.2 Comparing the Means
Contrasts
Multiple comparisons
Power
Section 12.2 Summary
Section 12.2 Exercises
Chapter 12 Exercises
CHAPTER 13 Two-Way Analysis of Variance
Introduction
13.1 The Two-Way ANOVA Model
Advantages of two-way ANOVA
The two-way ANOVA model
Main effects and interactions
13.2 Inference for Two-Way ANOVA
The ANOVA table for two-way ANOVA
Chapter 13 Summary
Chapter 13 Exercises
Tables
Answers to Odd-Numbered Exercises
Notes and Data Sources
Index
To Teachers: About This Book
Statistics is the science of data. Introduction to the Practice of Statistics (IPS) is
an introductory text based on this principle. We present methods of basic statistics
in a way that emphasizes working with data and mastering statistical reasoning. IPS
is elementary in mathematical level but conceptually rich in statistical ideas. After
completing a course based on our text, we would like students to be able to think
objectively about conclusions drawn from data and use statistical methods in their
own work.
In IPS, we combine attention to basic statistical concepts with a comprehensive
presentation of the elementary statistical methods that students will find useful in
their work. IPS has been successful for several reasons:
1. IPS examines the nature of modern statistical practice at a level suitable for
beginners. We focus on the production and analysis of data as well as the
traditional topics of probability and inference.
2. IPS has a logical overall progression, so data production and data analysis are
a major focus, while inference is treated as a tool that helps us draw
conclusions from data in an appropriate way.
3. IPS presents data analysis as more than a collection of techniques for
exploring data. We emphasize systematic ways of thinking about data. Simple
principles guide the analysis: always plot your data; look for overall patterns
and deviations from them; when looking at the overall pattern of a distribution
for one variable, consider shape, center, and spread; for relations between
two variables, consider form, direction, and strength; always ask whether a
relationship between variables is influenced by other variables lurking in the
background. We warn students about pitfalls in clear cautionary discussions.
4. IPS uses real examples to drive the exposition. Students learn the technique of
least-squares regression and how to interpret the regression slope. But they
also learn the conceptual ties between regression and correlation and the
importance of looking for influential observations.
5. IPS is aware of current developments both in statistical science and in
teaching statistics. Brief, optional Beyond the Basics sections give quick
overviews of topics such as density estimation, scatterplot smoothers, data
mining, nonlinear regression, and meta-analysis. Chapter 16 gives an
elementary introduction to the bootstrap and other computer-intensive
statistical methods.
The title of the book expresses our intent to introduce readers to statistics as it
is used in practice. Statistics in practice is concerned with drawing conclusions
from data. We focus on problem solving rather than on methods that may be useful
in specific settings.
GAISE
The College Report of the Guidelines for Assessment and Instruction in
Statistics Education (GAISE) Project was
funded by the American Statistical Association to make recommendations for how
introductory statistics courses should be taught. This report and its update contain
many interesting teaching suggestions, and we strongly recommend that you read them.
The philosophy and approach of IPS closely reflect the GAISE recommendations.
Let’s examine each of the latest recommendations in the context of IPS.
1. Teach statistical thinking. Through our experiences as applied statisticians,
we are very familiar with the components that are needed for the appropriate
use of statistical methods. We focus on formulating questions, collecting and
finding data, evaluating the quality of data, exploring the relationships among
variables, performing statistical analyses, and drawing conclusions. In
examples and exercises throughout the text, we emphasize putting the analysis
in the proper context and translating numerical and graphical summaries into
conclusions.
2. Focus on conceptual understanding. With the software available today, it is
very easy for almost anyone to apply a wide variety of statistical procedures,
both simple and complex, to a set of data. Without a firm grasp of the
concepts, such applications are frequently meaningless. By using the methods
that we present on real sets of data, we believe that students will gain an
excellent understanding of these concepts. Our emphasis is on the input
(questions of interest, collecting or finding data, examining data) and the
output (conclusions) for a statistical analysis. Formulas are given only where
they will provide some insight into concepts.
3. Integrate real data with a context and a purpose. Many of the examples and
exercises in IPS include data that we have obtained from collaborators or
consulting clients. Other data sets have come from research related to these
activities. We have also used the Internet as a data source, particularly for data
related to social media and other topics of interest to undergraduates. Our
emphasis on real data, rather than artificial data chosen to illustrate a
calculation, serves to motivate students and help them see the usefulness of
statistics in everyday life. We also frequently encounter interesting statistical
issues that we explore. These include outliers and nonlinear relationships. All
data sets are available from the text website.
4. Foster active learning in the classroom. As we mentioned earlier, we
believe that statistics is exciting as something to do rather than something to
talk about. Throughout the text, we provide exercises in Use Your Knowledge
sections that ask the students to perform some relatively simple tasks that
reinforce the material just presented. Other exercises are particularly suited to
being worked on and discussed within a classroom setting.
5. Use technology for developing concepts and analyzing data. Technology has
altered statistical practice in a fundamental way. In the past, some of the
calculations that we performed were particularly difficult and tedious. In other
words, they were not fun. Today, freed from the burden of computation by
software, we can concentrate our efforts on the big picture: what questions are
we trying to address with a study and what can we conclude from our
analysis?
6. Use assessments to improve and evaluate student learning. Our goal for
students who complete a course based on IPS is that they are able to design
and carry out a statistical study for a project in their capstone course or other
setting. Our exercises are oriented toward this goal. Many ask about the design
of a statistical study and the collection of data. Others ask...