
Introductory Statistics Explained
Edition 1.10
©2015 Jeremy Balka. All rights reserved.
Balka ISE 1.10

Contents

1 Introduction
  1.1 Introduction
  1.2 Descriptive Statistics
  1.3 Inferential Statistics

2 Gathering Data
  2.1 Introduction
  2.2 Populations and Samples, Parameters and Statistics
  2.3 Types of Sampling
    2.3.1 Simple Random Sampling
    2.3.2 Other Types of Random Sampling
  2.4 Experiments and Observational Studies
  2.5 Chapter Summary

3 Descriptive Statistics
  3.1 Introduction
  3.2 Plots for Categorical and Quantitative Variables
    3.2.1 Plots for Categorical Variables
    3.2.2 Graphs for Quantitative Variables
  3.3 Numerical Measures
    3.3.1 Summation Notation
    3.3.2 Measures of Central Tendency
    3.3.3 Measures of Variability
    3.3.4 Measures of Relative Standing
  3.4 Boxplots
  3.5 Linear Transformations
  3.6 Chapter Summary

4 Probability
  4.1 Introduction
  4.2 Basics of Probability
    4.2.1 Interpreting Probability
    4.2.2 Sample Spaces and Sample Points
    4.2.3 Events
  4.3 Rules of Probability
    4.3.1 The Intersection of Events
    4.3.2 Mutually Exclusive Events
    4.3.3 The Union of Events and the Addition Rule
    4.3.4 Complementary Events
    4.3.5 An Example
    4.3.6 Conditional Probability
    4.3.7 Independent Events
    4.3.8 The Multiplication Rule
  4.4 Examples
  4.5 Bayes' Theorem
    4.5.1 Introduction
    4.5.2 The Law of Total Probability and Bayes' Theorem
  4.6 Counting Rules: Permutations and Combinations
    4.6.1 Permutations
    4.6.2 Combinations
  4.7 Probability and the Long Run
  4.8 Chapter Summary

5 Discrete Random Variables and Discrete Probability Distributions
  5.1 Introduction
  5.2 Discrete and Continuous Random Variables
  5.3 Discrete Probability Distributions
    5.3.1 The Expectation and Variance of Discrete Random Variables
  5.4 The Bernoulli Distribution
  5.5 The Binomial Distribution
    5.5.1 Binomial or Not?
    5.5.2 A Binomial Example with Probability Calculations
  5.6 The Hypergeometric Distribution
  5.7 The Poisson Distribution
    5.7.1 Introduction
    5.7.2 The Relationship Between the Poisson and Binomial Distributions
    5.7.3 Poisson or Not? More Discussion on When a Random Variable Has a Poisson Distribution
  5.8 The Geometric Distribution
  5.9 The Negative Binomial Distribution
  5.10 The Multinomial Distribution
  5.11 Chapter Summary

6 Continuous Random Variables and Continuous Probability Distributions
  6.1 Introduction
  6.2 Properties of Continuous Probability Distributions
    6.2.1 An Example Using Integration
  6.3 The Continuous Uniform Distribution
  6.4 The Normal Distribution
    6.4.1 Finding Areas Under the Standard Normal Curve
    6.4.2 Standardizing Normally Distributed Random Variables
  6.5 Normal Quantile-Quantile Plots
    6.5.1 Examples of Normal QQ Plots
  6.6 Other Important Continuous Probability Distributions
    6.6.1 The χ² Distribution
    6.6.2 The t Distribution
    6.6.3 The F Distribution
  6.7 Chapter Summary

7 Sampling Distributions
  7.1 Introduction
  7.2 The Sampling Distribution of the Sample Mean
  7.3 The Central Limit Theorem
    7.3.1 Illustration of the Central Limit Theorem
  7.4 Some Terminology Regarding Sampling Distributions
    7.4.1 Standard Errors
    7.4.2 Unbiased Estimators
  7.5 Chapter Summary

8 Confidence Intervals
  8.1 Introduction
  8.2 Interval Estimation of µ When σ is Known
    8.2.1 Interpretation of the Interval
    8.2.2 What Factors Affect the Margin of Error?
    8.2.3 Examples
  8.3 Confidence Intervals for µ When σ is Unknown
    8.3.1 Introduction
    8.3.2 Examples
    8.3.3 Assumptions of the One-Sample t Procedures
  8.4 Determining the Minimum Sample Size n
  8.5 Chapter Summary

9 Hypothesis Tests (Tests of Significance)
  9.1 Introduction
  9.2 The Logic of Hypothesis Testing
  9.3 Hypothesis Tests for µ When σ is Known
    9.3.1 Constructing Appropriate Hypotheses
    9.3.2 The Test Statistic
    9.3.3 The Rejection Region Approach to Hypothesis Testing
    9.3.4 P-values
  9.4 Examples
  9.5 Interpreting the p-value
    9.5.1 The Distribution of the p-value When H0 is True
    9.5.2 The Distribution of the p-value When H0 is False
  9.6 Type I Errors, Type II Errors, and the Power of a Test
    9.6.1 Calculating Power and the Probability of a Type II Error
    9.6.2 What Factors Affect the Power of the Test?
  9.7 One-sided Test or Two-sided Test?
    9.7.1 Choosing Between a One-sided Alternative and a Two-sided Alternative
    9.7.2 Reaching a Directional Conclusion from a Two-sided Alternative
  9.8 Statistical Significance and Practical Significance
  9.9 The Relationship Between Hypothesis Tests and Confidence Intervals
  9.10 Hypothesis Tests for µ When σ is Unknown
    9.10.1 Examples of Hypothesis Tests Using the t Statistic
  9.11 More on Assumptions
  9.12 Criticisms of Hypothesis Testing
  9.13 Chapter Summary

10 Inference for Two Means
  10.1 Introduction
  10.2 The Sampling Distribution of the Difference in Sample Means
  10.3 Inference for µ1 − µ2 When σ1 and σ2 are Known
  10.4 Inference for µ1 − µ2 When σ1 and σ2 are Unknown
    10.4.1 Pooled-Variance Two-Sample t Procedures
    10.4.2 The Welch Approximate t Procedure
    10.4.3 Guidelines for Choosing the Appropriate Two-Sample t Procedure
    10.4.4 More Examples of Inferences for the Difference in Means
  10.5 Paired-Difference Procedures
    10.5.1 The Paired-Difference t Procedure
  10.6 Pooled-Variance t Procedures: Investigating the Normality Assumption
  10.7 Chapter Summary

11 Inference for Proportions
  11.1 Introduction
  11.2 The Sampling Distribution of the Sample Proportion
    11.2.1 The Mean and Variance of the Sampling Distribution of p̂
    11.2.2 The Normal Approximation
  11.3 Confidence Intervals and Hypothesis Tests for the Population Proportion p
    11.3.1 Examples
  11.4 Determining the Minimum Sample Size n
  11.5 Inference Procedures for Two Population Proportions
    11.5.1 The Sampling Distribution of p̂1 − p̂2
    11.5.2 Confidence Intervals and Hypothesis Tests for p1 − p2
  11.6 More on Assumptions
  11.7 Chapter Summary

12 Inference for Variances
  12.1 Introduction
  12.2 The Sampling Distribution of the Sample Variance
    12.2.1 The Sampling Distribution of the Sample Variance When Sampling from a Normal Population
    12.2.2 The Sampling Distribution of the Sample Variance When Sampling from Non-Normal Populations
  12.3 Inference Procedures for a Single Variance
  12.4 Comparing Two Variances
    12.4.1 The Sampling Distribution of the Ratio of Sample Variances
    12.4.2 Inference Procedures for the Ratio of Population Variances
  12.5 Investigating the Effect of Violations of the Normality Assumption
    12.5.1 Inference Procedures for One Variance: How Robust are these Procedures?
    12.5.2 Inference Procedures for the Ratio of Variances: How Robust are these Procedures?
  12.6 Chapter Summary

13 Tests for Count Data
  13.1 Introduction
  13.2 χ² Tests for One-Way Tables
    13.2.1 The χ² Test Statistic
    13.2.2 Testing Goodness-of-Fit for Specific Parametric Distributions
  13.3 χ² Tests for Two-Way Tables
    13.3.1 The χ² Test Statistic for Two-Way Tables
    13.3.2 Examples
  13.4 A Few More Points
    13.4.1 Relationship Between the Z Test and χ² Test for 2×2 Tables
    13.4.2 Assumptions
  13.5 Chapter Summary

14 One-Way Analysis of Variance (ANOVA)
  14.1 Introduction
  14.2 One-Way ANOVA
  14.3 Carrying Out the One-Way Analysis of Variance
    14.3.1 The Formulas
    14.3.2 An Example with Full Calculations
  14.4 What Should be Done After One-Way ANOVA?
    14.4.1 Introduction
    14.4.2 Fisher's LSD Method
    14.4.3 The Bonferroni Correction
    14.4.4 Tukey's Honest Significant Difference Method
  14.5 Examples
  14.6 A Few More Points
    14.6.1 Different Types of Experimental Design
    14.6.2 One-Way ANOVA and the Pooled-Variance t Test
    14.6.3 ANOVA Assumptions
  14.7 Chapter Summary

15 Introduction to Simple Linear Regression
  15.1 Introduction
  15.2 The Linear Regression Model
  15.3 The Least Squares Regression Line
  15.4 Statistical Inference in Simple Linear Regression
    15.4.1 Model Assumptions
    15.4.2 Statistical Inference for the Parameter β1
  15.5 Checking Model Assumptions with Residual Plots
  15.6 Measures of the Strength of the Linear Relationship
    15.6.1 The Pearson Correlation Coefficient
    15.6.2 The Coefficient of Determination
  15.7 Estimation and Prediction Using the Fitted Line
  15.8 Transformations
  15.9 A Complete Example
  15.10 Outliers, Leverage, and Influential Points
  15.11 Some Cautions about Regression and Correlation
    15.11.1 Always Plot Your Data
    15.11.2 Avoid Extrapolating
    15.11.3 Correlation Does Not Imply Causation
  15.12 A Brief Multiple Regression Example
  15.13 Chapter Summary
Chapter 1
Introduction

"I have been impressed with the urgency of doing. Knowing is not enough; we must apply. Being willing is not enough; we must do."
-Leonardo da Vinci

1.1 Introduction

When first encountering the study of statistics, students often have a preconceived—and incorrect—notion of what the field of statistics is all about. Some people think that statisticians are able to quote all sorts of unusual statistics, such as 32% of undergraduate students report patterns of harmful drinking behaviour, or that 55% of undergraduates do not understand what the field of statistics is all about. But the field of statistics has little to do with quoting obscure percentages or other numerical summaries. In statistics, we often use data to answer questions like:

• Is a newly developed drug more effective than one currently in use?
• Is there still a sex effect in salaries? After accounting for other relevant variables, is there a difference in salaries between men and women? Can we estimate the size of this effect for different occupations?
• Can post-menopausal women lower their heart attack risk by undergoing hormone replacement therapy?

To answer these types of questions, we will first need to find or collect appropriate data. We must be careful in the planning and data collection process, as unfortunately sometimes the data a researcher collects is not appropriate for answering the questions of interest.
Once appropriate data has been collected, we summarize and illustrate it with plots and numerical summaries. Then—ideally—we use the data in the most effective way possible to answer our question or questions of interest.

Before we move on to answering these types of questions using statistical inference techniques, we will first explore the basics of descriptive statistics.

1.2 Descriptive Statistics

In descriptive statistics, plots and numerical summaries are used to describe a data set.

Example 1.1 Consider a data set representing final grades in a large introductory statistics course. We may wish to illustrate the grades using a histogram or a boxplot,¹ as in Figure 1.1.

¹ You may not have encountered boxplots before. Boxplots will be discussed in detail in Section 3.4. In their simplest form, they are plots of the five-number summary: the minimum, 25th percentile, median, 75th percentile, and maximum. Extreme values are plotted individually. Boxplots are most useful for comparing two or more distributions.

[Figure 1.1: Histogram and boxplot of final grades in an introductory statistics course. (a) Histogram of final grades. (b) Boxplot of final grades.]

Plots like these can give an effective visual summary of the data. But we are also interested in numerical summary statistics, such as the mean, median, and variance. We will investigate descriptive statistics in greater detail in Chapter 3. But the main purpose of this text is to introduce statistical inference concepts and techniques.

1.3 Inferential Statistics

The most interesting statistical techniques involve investigating the relationship between variables. Let's look at a few examples of the types of problems we will be investigating.

Example 1.2 Do traffic police officers in Cairo have higher levels of lead in their blood than other police officers?
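To make the five-number summary and the other numerical summaries concrete, here is a short Python sketch using only the standard library. The grades below are made up for illustration; they are not the data behind Figure 1.1.

```python
import statistics

# Hypothetical final grades for illustration only (not the course data from Figure 1.1).
grades = [45, 52, 58, 61, 64, 67, 70, 72, 75, 78, 81, 85, 90, 94]

# Five-number summary: minimum, 25th percentile, median, 75th percentile, maximum.
# method="inclusive" interpolates between order statistics, matching the common
# definition of sample quantiles.
q1, med, q3 = statistics.quantiles(grades, n=4, method="inclusive")
five_number_summary = (min(grades), q1, med, q3, max(grades))

# Other numerical summaries discussed in the text.
mean = statistics.mean(grades)
sample_variance = statistics.variance(grades)  # divides by n - 1

print(five_number_summary)  # (45, 61.75, 71.0, 80.25, 94)
print(round(mean, 2))       # 70.86
```

A boxplot is simply a picture of the five numbers printed above, with extreme values drawn as individual points.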
A study² investigated this question by drawing random samples of 126 Cairo traffic officers and 50 officers from the suburbs. Lead levels in the blood (µg/dL) were measured. The boxplots in Figure 1.2 illustrate the data. Bo...