EXST7005 Fall2010 08a Hypothesis Testing

Statistical Methods I (EXST 7005)

Tests of hypothesis

Hypothesis – a contention, based on preliminary evidence of what appears to be fact (an educated guess), which may or may not be true.
• Formulating a hypothesis is the second step in the scientific method.
• A statement of the hypothesis is the first step in experimentation.

Test of hypothesis – a comparison of the contention with a set of newly gathered data.

Hypothesis testing procedure – we will consider 7 steps.

I. Set up a meaningful hypothesis, such as "The population mean is equal to some value" (call it μ0).
   H0: μ = μ0 or H0: μ − μ0 = 0
This is called the null hypothesis. It is a hypothesis of equality, or of no difference (even if you believe there is a difference). Note that hypotheses are always stated in terms of the population parameters, not the sample statistics we actually measure, because we are drawing inference about the population.

II. Set up an alternative hypothesis. Alternative hypotheses are denoted H1 or Ha. This hypothesis states what is correct if the null hypothesis is not correct, and it is usually the case of actual interest. Examples:
   a) H1: μ ≠ μ0 or H1: μ − μ0 ≠ 0 (also called the non-directional alternative)
   b) H1: μ < μ0 or H1: μ − μ0 < 0
   c) H1: μ > μ0 or H1: μ − μ0 > 0

III. Consider the assumptions.
   1) We will be using the Z distribution, so the distribution we are testing must be normal.
   2) The observations should be independent. The best guarantee of independent observations is random sampling.
   3) Strictly speaking, the variance should be known in order to use the Z distribution; however, it is often used for very large samples. Later we will discuss the t-distribution, which is used when the variance is not known and must be estimated from the sample.
There will be a few other assumptions for other test statistics. However, the tests of hypothesis we will be using are also "robust".
Statistically speaking, robustness indicates that the test performs quite well even if the assumptions are not perfectly met.

IV. Select a probability of rejecting the null hypothesis (H0) when it is true. This is called the alpha (α) value, and the value chosen is somewhat arbitrary. By convention the value usually chosen is α = 0.05, or sometimes α = 0.01.
   For α = 0.05, if H0 is true we will reject it 5% of the time, or in one of 20 samples.
   For α = 0.01, if H0 is true we will reject it 1% of the time, or in one of 100 samples.

James P. Geaghan Copyright 2010

This value is sometimes called the significance level. From this value, and the alternative hypothesis, we can determine the critical limits: those values of the test statistic that would cause us to reject the null hypothesis. Determine a critical region, what is too large or too small, by using the chosen probability or significance level.

Critical region – the area in the distribution which would lead to rejection of the null hypothesis (H0). When we reject, we know that it is possible that the null hypothesis is true, but if it is, we would reject only α∗100% of the time. So this type of error can be controlled.

Region of "acceptance" – the area under the distribution which would lead to "acceptance" of the null hypothesis (H0).

[Figure: the Z distribution with a lower critical region (area α/2), a central region of "acceptance" (area 1−α), and an upper critical region (area α/2).]

Notice that I have placed the word "acceptance" in quotes. We cannot really state that we "accept" the null hypothesis, because it is also possible that we would be wrong in doing so. Unfortunately, in practice the probability of this type of error is unknown, and therefore one cannot "accept" with a known probability of error (more later under Type II error and Power).

V.
Draw a sample from the population of interest (as defined by the investigator), and:
   a) Compute an estimate of the parameter in the hypothesis; in our example the hypothesis was about μ, so the statistic will be Ȳ. Recall that E(Ȳ) = μ.
   b) The value of Ȳ from the sample now becomes one of many possible observations from the derived population of all sample means.
   c) Recall that the derived population has
      μȲ = μ,   σ²Ȳ = σ²/n,   σȲ = σ/√n
   d) Recall that the distribution of sample means (Ȳk) approaches a normal distribution as the value of n increases (according to the Central Limit Theorem). This helps meet our assumption of normality.
   e) Recall the Z transformation, Zi = (Ȳi − μȲ)/σȲ. Our null hypothesis contends that the true value of μȲ is our hypothesized value, μ0, so we will calculate a Z score using μ0. This will follow a Z distribution if the null hypothesis is correct. If the null hypothesis is not correct, we don't care what the distribution is; we just hope to reject the null hypothesis.
      Zi = (Ȳi − μ0)/σȲ
As a result, where μ0 is the hypothesized value of the mean, if the null hypothesis is true and μ = μ0, we would expect Z to be approximately zero (within reasonable limits, defined later). On the other hand, if μ ≠ μ0, we would expect Z to differ from zero by some amount. If Z is much greater than zero (i.e. Z ≫ 0), that suggests that μȲ is too large, while if Z is much less than zero, then μȲ appears to be too small.

VI. Compare the test statistic from step V to the critical region determined in step IV.

VII. Draw conclusions and interpretations from the results of the test. The test statistic is not an end in itself.

Logic behind the test

A key aspect of a test of hypothesis is that we must have a test statistic with a known distribution. We could sample from any one of numerous populations with many different distributions.
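The computations in steps V and VI can be sketched in code. This is an illustrative sketch, not part of the course materials: the function name and the sample numbers are hypothetical, and it assumes a known population standard deviation σ (Python 3.8+ for `statistics.NormalDist`).

```python
# Sketch of steps V and VI for a two-tailed Z test of H0: mu = mu0.
# Hypothetical example; assumes the population variance is known.
import math
from statistics import NormalDist, mean

def z_test_two_tailed(sample, mu0, sigma, alpha=0.05):
    """Return (Z, critical value, reject H0?)."""
    ybar = mean(sample)                           # step V(a): estimate of mu
    sigma_ybar = sigma / math.sqrt(len(sample))   # standard error of the mean
    z = (ybar - mu0) / sigma_ybar                 # step V(e): Z transformation
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    return z, z_crit, abs(z) >= z_crit            # step VI: compare to critical region

# Hypothetical data: test H0: mu = 50 with known sigma = 10
z, z_crit, reject = z_test_two_tailed([52, 61, 48, 57, 55], 50, 10)
print(round(z, 3), round(z_crit, 2), reject)  # 1.029 1.96 False
```

Here |Z| falls inside the region of "acceptance", so we would fail to reject H0 for this hypothetical sample.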
The characteristics of these distributions are unknown, but if we can transform the sampled distribution to a known distribution, we can then make some probability statements. Beyond this, we simply want to determine what is likely under the null hypothesis. If we hypothesize a mean of μ0 and take a sample whose mean is actually close to μ0, then the null hypothesis is probably true. If, on the other hand, the calculated sample mean is not close to μ0, and the difference is big enough that it is not likely to have occurred due to sampling variation, then the alternative hypothesis is the more likely choice.

Reasonable limits – recall that we needed to define a set of limits for the critical region, determined by the significance level (α) and by the alternative hypothesis (e.g. was it two-tailed or one-tailed, and if one-tailed, to which side). The value of α specifies what we feel would be unlikely under the null hypothesis.

SUMMARY OF THE 7 STEPS OF HYPOTHESIS TESTING
I. Establish a null hypothesis, H0.
II. Determine an appropriate alternative hypothesis to the null, H1.
III. Consider the assumptions.
IV. Determine a value for α and find the critical limits and a critical region for the chosen statistic.
V. Obtain a sample of new data to test the hypothesis, and compute the appropriate test statistic from the sample.
VI. Using the critical region and the test statistic (e.g. Z), compare the values and make a decision to reject H0 or to fail to reject H0.
VII. Draw your conclusions from the test of hypothesis.

Example of a Test of Hypothesis

Extensive measurements done in eastern Tennessee have shown that the average 20-year-old White Oak produces an average of 12 Kg of acorns with a variance of 4 Kg². Five White Oaks in Georgia produced a mean of 14 Kg. Assuming that the variance is the same, test the hypothesis that the production is the same in Tennessee and Georgia.
1) H0: μ = μ0 or H0: μ − μ0 = 0 (where μ0 = 12 Kg, the known value for Tennessee).
2) H1: μ ≠ μ0 or H1: μ − μ0 ≠ 0. We might be tempted to test the hypothesis H1: μ > μ0, since the Georgia oaks had a mean of 14 Kg. However, remember that this is supposed to be a new data set to test the hypothesis, and we would not have known this in advance.
3) Assume the sample of Oaks is random (independent) and normally distributed. We also have a known variance from Tennessee of 4 Kg².
4) Determine a value of α and obtain the critical limits for a critical region for the test statistic using our knowledge of H1 and α. We will somewhat arbitrarily choose a value of α = 0.05; this is a commonly used and accepted value. The H1 indicates that we are doing a 2-tailed test. To keep α at 0.05, place half the value of α in each tail (0.0250 per tail). This corresponds to critical Z values of ±1.96.

[Figure: the critical region – red areas in the tails, below −1.96 and above +1.96, are areas of rejection; between them is the region of "acceptance" of area 1−α.]

5) Obtain a new data set to test the hypothesis, and compute the appropriate test statistic from the sample (Ȳ for testing differences in the means). The results for our sample were Ȳ = 14 and n = 5.
6) Calculate
   Z = (Ȳ − μ0)/σȲ = (14 − 12)/(2/√5) = 2√5/2 = √5 = 2.236
This value of the test statistic is greater than the limit for the upper critical region (+1.96), so it falls in the region of rejection. This would be interpreted to indicate that it is unlikely that a value this large would arise by random chance alone if the null hypothesis were true.
7) Conclusion: Reject the null hypothesis and conclude that the two areas differ in terms of acorn production. We can also go one step further.
Since the production levels are different, we can also conclude that production is greater in Georgia, since it had the greater value for the mean production.

One-tailed tests

Suppose our problem had been a little different, and that we had believed from the beginning that Georgia had a higher rate of production – something we believed BEFORE we started the study (a priori). We might then want to test for only this alternative, i.e. H1: μ > μ0 or H1: μ − μ0 > 0, where μ0 = 12 Kg (the Tennessee value). Now the test is altered, because we will have a different critical value. We still want α = 0.05, but we would put all 5% chance of error in the upper tail! Note that this makes it "easier" to show significance, because we only need to meet the 1.645 criterion instead of the 1.96 criterion. However, it implies that we have some additional knowledge and have no interest in the lower tail. What if the calculated value was well into the lower tail? Presumably this would be a spurious occurrence and not of interest, because we "know" it can't happen.

[Figure: the one-tailed test – a region of "acceptance" of area 1−α and a single upper critical region of area α above +1.645.]

In fact, if our critical value was 1.645 in the upper tail and we found the Georgia value to be less than Tennessee's, no additional calculations would be needed, because the calculated Z value would be negative and could not be in the upper tail. In other words, if our hypothesized value (μ0) is greater than our observed value (Ȳ), then the calculation Z = (Ȳi − μ0)/σȲ would be negative and could not be in the upper tail that was hypothesized.

Additional notes and terminology on hypothesis testing

Recall that a key aspect of a test of hypothesis is that we must have a test statistic with a known distribution. For our present discussion we are using the Z distribution.
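The 1.645 and 1.96 cutoffs used above come straight from the standard normal inverse CDF; a minimal sketch using Python's standard library (3.8+):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-tailed: all of alpha goes in the upper tail
z_one = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split evenly between the two tails
z_two = std_normal.inv_cdf(1 - alpha / 2)

print(round(z_one, 3), round(z_two, 2))  # 1.645 1.96
```

Any other α works the same way, e.g. `alpha = 0.01` gives the "highly significant" cutoffs of about 2.326 (one-tailed) and 2.576 (two-tailed).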
Given H0: μ = μ0 and H1: μ ≠ μ0 and α = 0.05:
1) If H0 is true, then (1−α)100% of the samples will yield a Z test statistic that falls in the region of "acceptance". That is, for α = 0.05, (1−α)100% = 95%. This is sometimes referred to as the confidence level.
2) For a two-tailed test, half of the samples that cause rejection will have a Z test statistic in the upper critical region [(α/2)100%], and half will have a score in the lower critical region [(α/2)100%]. For α = 0.05, (α/2)100% = 2.5%.
3) Since H1: μ ≠ μ0 (implying we do not know "a priori" whether the hypothesized value might be too large or too small), the probability statement becomes:
   P(|Z| ≥ Z0) = α = 0.05 (the absolute value sign indicates Z may be positive or negative)
   2P(Z ≥ Z0) = α = 0.05
   P(Z ≥ Z0) = α/2 = 0.025, so Z0 = 1.96 from the Z tables
4) If the calculated Z test statistic is between −1.96 and +1.96, we cannot reject the null hypothesis (H0: μ = μ0). This means that the observed statistic is consistent with the hypothesized value, BUT we can never actually PROVE that H0 is true. It is relatively easy to prove that things are different, but almost impossible to prove that two things are identical. So we resort to jargon; we say that...
   • there is no "statistically significant difference"
   • there is no "significant difference"
   • "the data is consistent with the null hypothesis"
   • we "fail to reject the null hypothesis".
These statements are better (more correct) than stating that we ACCEPT the null hypothesis or that the null hypothesis is TRUE.
5) For a two-tailed test, if the calculated |Z| is greater than, or equal to, the critical value of the test statistic (e.g. Z0 = 1.96), then reject H0 and conclude that the alternative hypothesis is the better supported. For one-tailed tests at α = 0.05 the critical value is 1.645: if Z ≥ 1.645, reject in favor of H1: μ > μ0, and if Z ≤ −1.645, reject in favor of H1: μ < μ0 (whichever alternative was stated in advance).
6) The size of the critical region is determined by α, the level of significance.
Note that when we reject the null hypothesis there is a chance that we are in error, but we know the probability of making that error: it is α. This is because we can set the level of fallibility in our conclusions for this type of error.

7) When we have a one-tailed alternative, say H1: μ > μ0, versus the null hypothesis H0: μ = μ0, what happens to the cases that may be much less than the hypothesized value? Since we have a one-tailed test, we must know that such cases are impossible, or are simply not of interest no matter how small they might be. In this case some investigators prefer notation where the other extreme is included in the null hypothesis:
   H0: μ ≤ μ0 versus H1: μ > μ0
   H0: μ ≥ μ0 versus H1: μ < μ0
This is acceptable, but the statistical development of a test of hypothesis actually considers only the equality in the null hypothesis and doesn't really consider these cases.

Final notes on the one- and two-tailed alternatives

1) The two-tailed test is called the "non-directional alternative".
   H0: μ = μ0 or H0: μ − μ0 = 0
   H1: μ ≠ μ0 or H1: μ − μ0 ≠ 0
[Figure: the two-tailed case – critical regions of area α/2 in each tail and a central region of area 1−α.]
This means that we will accept either H1: μ < μ0 or H1: μ > μ0 as fulfillment of the alternate hypothesis. Since either case is to be accepted, we state our probability with an absolute value, P(|Z| ≥ Z0) = α = 0.05, and for a 5% chance of error we divide the 5% into equal parts (usually) and put half in each tail.

2) The one-tailed test is called the "directional alternative".
   H0: μ = μ0 or H0: μ − μ0 = 0, and either
   H1: μ < μ0 or H1: μ − μ0 < 0, or
   H1: μ > μ0 or H1: μ − μ0 > 0
[Figure: the one-tailed case – a single critical region of area α in one tail and a region of "acceptance" of area 1−α.]
This indicates that we will accept only one of the two options, H1: μ < μ0 or H1: μ > μ0, as fulfillment of the alternate hypothesis.
Since only one case is to be accepted, we state our probability as either P(Z ≥ Z0) = α = 0.05 or P(Z ≤ Z0) = α = 0.05, and for a 5% chance of error we put all 5% into the tail of interest.

Why α = 0.05, and not 0.04 or 0.09? No particular reason. The value is not special, but it has become something of a convention or traditional standard. This value represents one chance in 20 of error. It is generally accepted as a reasonable chance of error, and is usually acceptable to referees, editors and journals. However, if you want to use another value, and have some good reason for doing so, this should be possible. The value of 0.05 has traditionally been termed the level at which we have "statistically significant" results. A value of 0.01 is then considered a "highly significant" result.

P values in tests of hypothesis

Probability values, or P values, like those we have discussed previously, just represent some area under a curve. However, in the context of hypothesis testing they indicate the area under the curve that represents a value equal to or larger than some observed value of a test statistic.

Recent literature has tended toward giving just the actual "P value" and letting the reader decide if it is "significant". The P-value is just the area in the tail beyond the calculated Z value. For example, with our Oak tree example the calculated Z value was 2.236. This was larger than our critical value of 1.96, so the "tail" would be smaller than 0.025. So, how unusual is a value of 2.236? Actually, the probability of a randomly chosen value exceeding this value is 0.0127 in one tail. For a two-tailed test we would express this probability as 2(0.0127) = 0.0254, since we would reject for either −2.236 or +2.236.

[Figure: the standard normal curve with the area above the observed value (Z = 2.236) shaded.]

The P-value is then P = 0.0254. For most tests that we do, SAS will give this value.
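The tail areas quoted for the oak-tree example (0.0127 one-tailed, about 0.0254 two-tailed) can be reproduced from the standard normal CDF; a sketch in Python (3.8+). The last digit of the doubled value differs slightly from the text because the text rounds the one-tail area before doubling.

```python
from statistics import NormalDist

z_calc = 2.236                              # calculated Z from the oak-tree example
upper_tail = 1 - NormalDist().cdf(z_calc)   # area above the observed value
p_two_tailed = 2 * upper_tail               # reject for either -2.236 or +2.236

print(round(upper_tail, 4))    # 0.0127
print(round(p_two_tailed, 3))  # 0.025
```

Since this P-value is below α = 0.05, it leads to the same rejection decision as comparing Z = 2.236 to the critical value 1.96.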
If the P-value is smaller than the desired α, the calculated test statistic value is in the tail and the null hypothesis is rejected. If it is larger than the desired α, the test statistic value is not in the tail and the null hypothesis is not rejected. Most tests in SAS are two-tailed, though a few are one-tailed.

Another Example

The mean for high school seniors on a nationally standardized reading test is 170 points, with a variance of 400. The principal of a small rural high school hypothesizes that the 9 seniors in his school will score better than the national average. Test his hypothesis (data given later).

I. H0: μ = μ0 or H0: μ − μ0 = 0
II. H1: μ > μ0 or H1: μ − μ0 > 0
III. Assume that the scores are (1) normally and (2) independently distributed with a (3) known variance of σ² = 400 (i.e. the distribution is NID(170, 400)).
IV. Let the probability of Type I error equal 5% (i.e. α = 0.05).
V. Find the critical limits, given that we want a one-tailed test against the upper tail with α = 0.05. The Z value which will leave 5% in the upper tail is 1.645.
VI. Gather new data to test the hypothesis. The test results for the 9 students were: 164, 175, 186, 173, 191, 187, 189, 176 and 179. The summary statistics for this group are Ȳ = 180 and S² = 634. However, we know the true national variance (σ² = 400) for the test and can use this in a Z test; this condition of "known variance" is really important to using a Z test. The test calculations are
   Z = (Ȳ − μ0)/√(σ²/n) = (180 − 170)/√(400/9) = 10/6.6667 = 1.5
VII. This value does not reach the critical value of 1.645, so we cannot conclude that these 9 seniors scored significantly higher than the national average. Apparently it is not that unusual, at the 5% level, for a subgroup of 9 individuals to score 10 points above the …
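The reading-test example can be checked numerically. This is an illustrative sketch (Python 3.8+) using the data and the known variance from step VI, not the SAS output the notes refer to.

```python
import math
from statistics import NormalDist, mean

scores = [164, 175, 186, 173, 191, 187, 189, 176, 179]
mu0 = 170     # national mean
sigma2 = 400  # known national variance

ybar = mean(scores)                                 # 180
z = (ybar - mu0) / math.sqrt(sigma2 / len(scores))  # 10 / 6.6667 = 1.5
z_crit = NormalDist().inv_cdf(0.95)                 # one-tailed, alpha = 0.05

# Z = 1.5 does not reach 1.645, so we fail to reject H0
print(round(z, 2), round(z_crit, 3), z >= z_crit)  # 1.5 1.645 False
```

Note that the sample variance S² = 634 is never used: with σ² known, the Z test uses the national variance, which is exactly why the known-variance assumption in step III matters.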