M316 Chapter 15 Dr. Berg

Tests of Significance: The Basics

A confidence interval is used to estimate a population parameter. The second type of common statistical inference is called a test of significance and is used to assess the evidence provided by data about some claim concerning a population.

Example (15.1) Free Throw Shooting

Someone claims to make 80% of basketball free throws attempted. As a test you ask them to make 20 free throw attempts, of which 8 are good and 12 miss. Is this good evidence that the person is not an 80% free throw shooter? The number of successes is low, but is it low enough to be significant?

The reasoning is based on asking what would happen if the person really were an 80% free throw shooter. How unlikely would it be for them to make as few as 8 out of 20? It happens that the probability of making so few free throws is 0.0001. Thus, if the person really were an 80% free throw shooter, this outcome would occur once in 10,000 tests. The small probability convinces you that the claim is false.

The basic idea here is simple: an outcome that would rarely happen if a claim were true is good evidence that the claim is not true.

The Reasoning of Tests of Significance

The reasoning of statistical tests, like that of confidence intervals, is based on asking what would happen if we repeated the sample or experiment many times. We will act as if the "simple conditions" are true: we have a perfect SRS from an exactly Normal population with known standard deviation $\sigma$.

Example (15.2) Sweetening Colas

Diet colas use artificial sweeteners to avoid sugar. These sweeteners gradually lose their sweetness over time. Manufacturers therefore test new colas for loss of sweetness before marketing them. Trained tasters sip the cola along with drinks of standard sweetness and score the cola on a "sweetness score" of 1 to 10.
The cola is then stored for a month at high temperature to imitate the effect of four months' storage at room temperature. Each taster scores the cola again after storage. This is a matched pairs experiment. Our data are the differences (score before storage minus score after storage) in the tasters' scores. The bigger these differences, the bigger the loss of sweetness.

Suppose we know that for any cola, the sweetness loss scores vary from taster to taster according to a Normal distribution with standard deviation $\sigma = 1$. The mean $\mu$ for all tasters measures loss of sweetness and is different for different colas.

Here are the sweetness losses for a new cola, as measured by 10 trained tasters:

2.0   0.4   0.7   2.0   -0.4   2.2   -1.3   1.2   1.1   2.3

Most are positive. That is, most tasters found a loss of sweetness. But the losses are small, and two tasters (the negative scores) thought the cola gained sweetness. The average sweetness loss is given by the sample mean,

$$\bar{x} = \frac{2.0 + 0.4 + \cdots + 2.3}{10} = 1.02$$

Are these data good evidence that the cola lost sweetness in storage? The reasoning is the same as in Example 15.1. We make a claim and ask if the data give evidence against it. We seek evidence that there is a sweetness loss, so the claim we test is that there is not a loss. In that case, the mean loss for the population of all trained tasters would be $\mu = 0$.

1. If the claim that $\mu = 0$ is true, the sampling distribution of $\bar{x}$ from 10 tasters is Normal with mean $\mu = 0$ and standard deviation

$$\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{10}} = 0.316.$$

Here is a graphic showing the distribution for the case where there is no loss of sweetness.

2. Were the mean loss of the ten tasters $\bar{x} = 0.3$, the hypothesis that $\mu = 0$ would be plausible, since a mean that large could easily occur just by chance.

3. In fact, the mean loss for these 10 tasters was $\bar{x} = 1.02$, which is so far out on the Normal curve that an observed value so large would rarely occur just by chance if the true mean were $\mu = 0$.
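
The reasoning above can be checked numerically. What follows is a minimal Python sketch, assuming the simple conditions of Example 15.2 ($\sigma = 1$, $n = 10$) and a true mean of $\mu = 0$. The names `null_dist` and `upper_tail` are illustrative choices, not part of these notes. The sketch computes the standard deviation of $\bar{x}$ and the chance of seeing a sample mean at least as large as 0.3 or 1.02.

```python
from math import sqrt
from statistics import NormalDist

sigma = 1.0   # known standard deviation of individual sweetness losses
n = 10        # number of tasters

# Sampling distribution of the sample mean if there is no loss (mu = 0).
# `null_dist` is an illustrative name.
null_dist = NormalDist(mu=0.0, sigma=sigma / sqrt(n))


def upper_tail(xbar):
    """Probability of a sample mean at least as large as xbar when mu = 0."""
    return 1 - null_dist.cdf(xbar)


print("sd of x-bar:", round(null_dist.stdev, 3))
print("P(x-bar >= 0.30):", round(upper_tail(0.30), 4))
print("P(x-bar >= 1.02):", round(upper_tail(1.02), 4))
```

Under these assumptions the first probability is roughly 0.17 and the second is well under 0.001, which is why a mean of 0.3 is unremarkable while a mean of 1.02 is not.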
This observed value is good evidence that the true value of $\mu$ is greater than zero. This means that the cola lost sweetness and the manufacturer must reformulate the cola.

Exercise (15.1) Anemia

Hemoglobin is a protein in red blood cells that carries oxygen from the lungs to body tissues. People with less than 12 grams of hemoglobin per deciliter of blood (g/dl) are anemic. A public health official in Jordan suspects that the mean $\mu$ for all children in Jordan is less than 12. He measures a sample of 50 children. Suppose that the "simple conditions" hold: the 50 children are an SRS from all Jordanian children and the hemoglobin level in this population follows a Normal distribution with standard deviation $\sigma = 1.6$ g/dl.

a) We seek evidence against the claim that $\mu = 12$. What is the sampling distribution of $\bar{x}$ in many samples of size 50 if in fact $\mu = 12$? Sketch the Normal curve for this distribution.

b) The sample mean was $\bar{x} = 11.3$. Mark this outcome on the sampling distribution. Also mark the outcome $\bar{x} = 11.8$ g/dl of a different study of 50 children. Explain why one of these outcomes is good evidence that $\mu < 12$ and the other is not.

Stating Hypotheses

A statistical test starts with a careful statement of the claims we want to compare. Because the reasoning of tests looks for evidence against a claim, we start with the claim we seek evidence against, such as "no loss of sweetness."

Null and Alternative Hypotheses

The statement being tested in a statistical test is called the null hypothesis. The test is designed to assess the strength of evidence against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference."

The claim about the population that we are trying to find evidence for is the alternative hypothesis. The alternative hypothesis is one-sided if it states that a parameter is larger than or smaller than the null hypothesis value. It is two-sided if it states that the parameter is different from the null value.
We abbreviate the null hypothesis as $H_0$ and the alternative hypothesis as $H_a$. Hypotheses always refer to a population, not to a particular outcome. Be sure to state $H_0$ and $H_a$ in terms of population parameters.

In Example 15.2, we are seeking evidence for loss in sweetness. The null hypothesis says "no loss" on the average in a large population of tasters. The alternative hypothesis says "there is a loss." So the hypotheses are

$H_0: \mu = 0$ and $H_a: \mu > 0$.

The alternative hypothesis is one-sided because we are interested only in whether the cola lost sweetness.

Example (15.3) Studying Job Satisfaction

Does the job satisfaction of assembly workers differ when their work is machine-paced rather than self-paced? Assign workers either to an assembly line moving at a fixed pace or to a self-paced setting. All subjects work in both settings, in random order. This is a matched pairs design. After two weeks in each work setting, the workers take a test of job satisfaction. The response variable is the difference in satisfaction scores, self-paced minus machine-paced.

The parameter of interest is the mean $\mu$ of the differences in scores in the population of all assembly workers. The null hypothesis says that there is no difference, that is, $H_0: \mu = 0$. The authors of the study wanted to know if there was any difference in levels of job satisfaction. The alternative hypothesis is therefore two-sided: $H_a: \mu \neq 0$.

The hypotheses should express the hopes or suspicions we have before we see the data. It is cheating to look at the data and then frame hypotheses to fit what the data show. If you do not have a specific direction firmly in mind in advance, use a two-sided alternative.

Exercise (15.3) Anemia

State the null and alternative hypotheses for the anemia study in Exercise 15.1.

Test Statistics

A significance test uses data in the form of a test statistic.
Here are some principles that apply to most tests:

1. The test is based on a statistic that compares the value of the parameter stated by the null hypothesis with an estimate of the parameter from sample data. The estimate is usually the same one used in a confidence interval for the parameter.
2. Large values of the test statistic indicate that the estimate is far from the parameter value specified by $H_0$. These values give evidence against $H_0$. The alternative hypothesis determines which directions count against $H_0$.

Example (15.4) Sweetening Colas

In Example 15.2, the null hypothesis is $H_0: \mu = 0$ and the estimate of $\mu$ is $\bar{x} = 1.02$. The test statistic for hypotheses about the mean $\mu$ of a Normal distribution is the standardized version of $\bar{x}$:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

where $\mu_0$ is the value of $\mu$ stated by the null hypothesis. The statistic $z$ says how far $\bar{x}$ is from the value of $\mu$ given by the null hypothesis. For Example 15.2,

$$z = \frac{1.02 - 0}{1/\sqrt{10}} = 3.23.$$

Because the sample result is more than 3 standard deviations above the hypothesized mean 0, it gives good evidence that the mean sweetness loss is not 0, but positive.

Exercise (15.9) Anemia

What are the values of the test statistic $z$ for the two outcomes in the anemia study of Exercise 15.1?

P-values

The null hypothesis $H_0$ states the claim we are seeking evidence against. The test statistic measures how far the sample data diverge from the null hypothesis. If the test statistic is large and is in the direction suggested by the alternative hypothesis $H_a$, we have data that would be unlikely if $H_0$ were true. We make "unlikely" precise by calculating a probability.

P-Value

The probability, computed assuming that $H_0$ is true, that the test statistic would take a value as extreme or more extreme than that actually observed is called the P-value of the test. The smaller the P-value, the stronger the evidence against $H_0$ provided by the data.

Example (15.5) Sweetening Colas

The study of sweetness loss in Example 15.2 tests the hypotheses $H_0: \mu = 0$ versus $H_a: \mu > 0$.
Because the alternative hypothesis says that $\mu > 0$, values of $\bar{x}$ greater than 0 favor $H_a$ over $H_0$. The 10 tasters found sweetness loss $\bar{x} = 1.02$. The P-value is the probability of getting an $\bar{x}$ at least as large as 1.02 when the null hypothesis is true.

The test statistic is

$$z = \frac{1.02 - 0}{1/\sqrt{10}} = 3.23.$$

Because $\bar{x}$ has a Normal distribution, $z$ has the standard Normal distribution when $H_0$ is true. Thus, the P-value is also the probability of $z$ being at least as large as 3.23. Using Table A we get

$$P\text{-value} = P(Z > 3.23) = 1 - 0.9994 = 0.0006.$$

We would rarely observe a mean sweetness loss of 1.02 or larger if $H_0$ were true. The small P-value provides strong evidence against $H_0$ and in favor of the alternative hypothesis $H_a: \mu > 0$.

The alternative hypothesis sets the direction that counts as evidence against $H_0$. In Example 15.5, only large values count because the alternative is one-sided on the high side. If the alternative is two-sided, both directions count.

Example (15.6) Job Satisfaction

Suppose we know that differences in job satisfaction scores in Example 15.3 follow a Normal distribution with standard deviation $\sigma = 60$. If there is no difference in job satisfaction between the two work environments, the mean is $\mu = 0$. This is $H_0$. The alternative hypothesis says simply "there is a difference," $H_a: \mu \neq 0$.

Data from 18 workers gave $\bar{x} = 17$, meaning that these workers preferred the self-paced environment on the average. The test statistic is

$$z = \frac{\bar{x} - 0}{\sigma/\sqrt{n}} = \frac{17}{60/\sqrt{18}} = 1.20.$$

Because the alternative is two-sided, the P-value is the probability of getting a $z$ at least as far from 0 in either direction as the observed $z = 1.20$. As always, calculate the P-value taking $H_0$ to be true. It is

$$P\text{-value} = P(Z < -1.20 \text{ or } Z > 1.20) = 2P(Z < -1.20) = (2)(0.1151) = 0.2302.$$

Values as far from 0 as $\bar{x} = 17$ would happen 23% of the time when the true population mean is $\mu = 0$. This is not good evidence against $H_0$.

The conclusion of Example 15.6 is not that $H_0$ is true.
The study looked for evidence against $H_0: \mu = 0$ and failed to find strong evidence. That is all we can say.

Exercise (15.13) Anemia

What are the P-values for the two outcomes in the anemia studies in Exercise 15.1? Explain why this shows that one is strong evidence against the null hypothesis and the other is not.

Statistical Significance

We sometimes take one final step to assess the evidence against $H_0$. We can compare the P-value with a fixed value that we regard as decisive. This amounts to announcing in advance how much evidence against $H_0$ we will insist on. The decisive value of P is called the significance level. We write it as $\alpha$. If we choose $\alpha = 0.05$, we are requiring that the data give evidence against $H_0$ so strong that it would happen no more than 5% of the time when $H_0$ is true.

Statistical Significance

If the P-value is as small as or smaller than $\alpha$, we say that the data are statistically significant at level $\alpha$.

Significant in this sense does not mean important. It means simply not likely to happen by chance.

Exercise (15.15) Anemia

In Exercises 15.9 and 15.13, we found the $z$ test statistic and the P-value for the outcome $\bar{x} = 11.8$ in the anemia study of Exercise 15.1. Is this outcome statistically significant at the $\alpha = 0.05$ level? At the $\alpha = 0.01$ level?

Tests for a Population Mean

The steps in carrying out a significance test mirror the overall four-step process for organizing realistic statistical problems.

Tests of Significance: The Four-Step Process

State: What is the practical question that requires a statistical test?
Formulate: Identify the parameter and state null and alternative hypotheses.
Solve: Carry out the test in three phases:
a) Check the conditions of the test you plan to use.
b) Calculate the test statistic.
c) Find the P-value.
Conclude: Return to the practical question to describe your results in this setting.

Here is the rule we have been using for finding the test statistic and P-value in our examples.

Z Test for a Population Mean

Draw an SRS of size $n$ from a Normal population that has unknown mean $\mu$ and known standard deviation $\sigma$. To test the null hypothesis that $\mu$ has a specified value, $H_0: \mu = \mu_0$, calculate the one-sample z statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$

In terms of a variable $Z$ having the standard Normal distribution, the P-value for a test of $H_0$ against

$H_a: \mu > \mu_0$ is $P(Z \ge z)$
$H_a: \mu < \mu_0$ is $P(Z \le z)$
$H_a: \mu \neq \mu_0$ is $2P(Z \ge |z|)$

Example (15.7) Executives' Blood Pressure

STATE: The National Center for Health Statistics reports that the systolic blood pressure for males 35 to 44 years of age has mean 128 and standard deviation 15. The medical director of a large company looks at the medical records of 72 executives in this age group and finds that the mean systolic blood pressure in this sample is $\bar{x} = 126.07$. Is this evidence that the company's executives have a different mean blood pressure from the general population?

FORMULATE: The null hypothesis is "no difference" from the national mean $\mu_0 = 128$. The alternative hypothesis is two-sided because the medical director did not have a particular direction in mind before examining the data. So the hypotheses about the unknown mean $\mu$ of the executive population are

$H_0: \mu = 128$
$H_a: \mu \neq 128$.

SOLVE: As part of the "simple conditions," suppose we know that executives' blood pressures follow a Normal distribution with standard deviation $\sigma = 15$. The one-sample z statistic is

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{126.07 - 128}{15/\sqrt{72}} = -1.09.$$

The P-value is

$$P = 2P(Z \ge 1.09) = 2(1 - 0.8621) = 0.2758.$$

The appropriate illustration shades the area under the standard Normal curve in both tails, below $-1.09$ and above $1.09$.

CONCLUDE: More than 27% of the time, an SRS of size 72 from the general male population would have a mean blood pressure at least as far from 128 as that of the executive sample. This is not strong enough evidence to reject the null hypothesis.

Using Tables of Critical Values

Table C can be used as a quick way to find P-values, but we will use Table A.

Tests From Confidence Intervals

There is a strong tie between confidence intervals and two-sided tests of significance.

Confidence Intervals and Two-Sided Tests

A level $\alpha$ two-sided significance test rejects a hypothesis $H_0: \mu = \mu_0$ exactly when the value $\mu_0$ falls outside a level $1 - \alpha$ confidence interval for $\mu$.

Example (15.11) Tests From Confidence Intervals

In Example 15.7, a medical director found mean blood pressure $\bar{x} = 126.07$ for an SRS of 72 executives. Is this value significantly different from the national mean $\mu_0 = 128$ at the 10% significance level?

The 90% confidence interval is

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 126.07 \pm 1.645 \cdot \frac{15}{\sqrt{72}} = 126.07 \pm 2.91,$$

which is the interval from 123.16 to 128.98. The hypothesized value $\mu_0 = 128$ falls inside this confidence interval, so we cannot reject the null hypothesis at the 10% significance level.
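
The z test rule and its link to confidence intervals can be sketched in a few lines of Python. This is a minimal illustration under the "simple conditions" (SRS, Normal population, known standard deviation), not a prescribed method from these notes. The function names `z_test` and `z_interval` and the labels "greater", "less", and "two-sided" are assumptions made for the example. The standard library's `statistics.NormalDist` stands in for Table A, so results can differ slightly from table lookups, which round $z$ to two decimals.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution (mean 0, sd 1)


def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for the one-sample z test of H0: mu = mu0.

    The `alternative` labels are illustrative names for the three forms of Ha.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":        # Ha: mu > mu0, P(Z >= z)
        p = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":         # Ha: mu < mu0, P(Z <= z)
        p = STD_NORMAL.cdf(z)
    elif alternative == "two-sided":    # Ha: mu != mu0, 2 P(Z >= |z|)
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p


def z_interval(xbar, sigma, n, confidence):
    """Return the (low, high) z confidence interval for the mean."""
    z_star = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Example 15.7: executives' blood pressure, two-sided test of H0: mu = 128
z, p = z_test(126.07, 128, 15, 72, "two-sided")
print("z =", round(z, 2), " P-value =", round(p, 4))

# Example 15.11: the matching 90% confidence interval
low, high = z_interval(126.07, 15, 72, 0.90)
print("90% CI:", round(low, 2), "to", round(high, 2))

# A level 0.10 two-sided test rejects exactly when mu0 lies outside the interval
print("reject H0 at the 10% level:", not (low <= 128 <= high))
```

The P-value and the interval check agree: the P-value is larger than 0.10 and 128 lies inside the 90% interval, so both routes lead to the same decision not to reject $H_0$.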
This note was uploaded on 09/14/2009 for the course CH 310 N taught by Professor Blocknack during the Fall '08 term at University of Texas.