EXST7005 Fall2010 09a Power & the t distribution



Statistical Methods I (EXST 7005) Page 53

Recent literature has tended to give just the actual "P-value" and let the reader decide whether it is "significant". The P-value is simply the area in the tail beyond the calculated Z value. For example, in our Oak tree example the calculated Z value was 2.236. This was larger than our critical value of 1.96, so the "tail" is smaller than 0.025. So, how unusual is a value of 2.236? The probability of a randomly chosen value exceeding it is 0.0127 in one tail. For a two-tailed test we express this probability as 2(0.0127) = 0.0254, since we would reject for either −2.236 or +2.236.

[Figure: standard normal curve with the area above the observed value (2.236) shaded.]

The P-value is then P = 0.0254. For most tests that we do, SAS will give this value. If it is smaller than the desired α, the calculated test statistic falls in the tail and H0 is rejected; if it is larger, the test statistic does not fall in the tail and H0 is not rejected. Most tests in SAS are two-tailed, though a few are one-tailed.

Another Example
The mean for high school seniors on a nationally standardized reading test is 170 points with a variance of 400. The principal of a small rural high school hypothesizes that the 9 seniors in his school will score better than the national average. Test his hypothesis (data given later).
I. H0: μ = μ0, or H0: μ − μ0 = 0
II. H1: μ > μ0, or H1: μ − μ0 > 0
III. Assume that the scores are (1) normally and (2) independently distributed with (3) a known variance of σ² = 400 (i.e., the distribution is NID(170, 400)).
IV. Let the probability of Type I error equal 5% (i.e., α = 0.05).
V. Find the critical limit, given that we want a one-tailed test against the upper tail with α = 0.05. The Z value that leaves 5% in the upper tail is 1.645.
VI. Gather new data to test the hypothesis. The test results for the 9 students were: 164, 175, 186, 173, 191, 187, 189, 176 and 179.
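The two tail-area calculations in this section can be checked with a short script. This is a sketch, not part of the original notes; the norm_cdf helper (built from Python's math.erf) and all variable names are ours. It reproduces the Oak-example P-value and previews the Z test for the 9 reading scores just listed:

```python
from math import erf, sqrt

def norm_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Oak tree example: tail areas for the calculated Z of 2.236
one_tail = 1.0 - norm_cdf(2.236)   # area above +2.236, about 0.0127
two_tail = 2.0 * one_tail          # reject for -2.236 or +2.236

# Reading-test example: one-tailed Z test with KNOWN variance
scores = [164, 175, 186, 173, 191, 187, 189, 176, 179]
n, mu0, sigma2 = len(scores), 170, 400
ybar = sum(scores) / n                   # sample mean, 180
z = (ybar - mu0) / sqrt(sigma2 / n)      # (180 - 170) / 6.6667 = 1.5
p_one_tail = 1.0 - norm_cdf(z)           # about 0.0668, short of alpha = 0.05
```

Note the small rounding difference in the first example: 2 × 0.012674 ≈ 0.0253, which the notes report as 2(0.0127) = 0.0254.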
The summary statistics for this group are Ȳ = 180 with a sum of squared deviations of 634 (so S² = 634/8 = 79.25). However, we know the true national variance (σ² = 400) for this test and can use it in a Z test. The condition of "known variance" is essential for using a Z test, and should be added as a third assumption. The test calculation is

Z = (Ȳ − μ0) / √(σ²/n) = (180 − 170) / √(400/9) = 10 / 6.6667 = 1.5

VII. This value does not reach the critical value of 1.645, so we cannot conclude that these 9 seniors scored significantly higher than the national average. Apparently it is not that unusual, at the 5% level, for a subgroup of 9 individuals to score 10 points above the national mean. However, the P-value for the observed Z score is P = 0.0668, so it is not very common either. Are we convinced that these 9 students are not above average? That would be our conclusion if the P-value had reached 0.05, but it reached only 0.0668. Close! The principal may well claim that this was significant. As scientists we may decide it is just too close to call, and "reserve judgment" pending more data.

Summary
Logic: We need a known probability distribution, and we need to determine what is likely for that known distribution under the null hypothesis. Any conditions needed for this to work out are specified in the assumptions. Both one- and two-tailed alternative hypotheses are possible.

Review the 7 steps of hypothesis testing
I. Determine the H0.
II. Determine the H1.
III. Consider the assumptions.
IV. Determine a value for α and obtain a critical region for a test statistic (e.g., Z) from your knowledge of alpha (α) and the H1.
V. Obtain a sample of new data to test the hypothesis. Compute the appropriate statistic from the sample (e.g., Ȳ) and calculate the value of the TEST STATISTIC (Z).
VI. Compare the calculated value of the test statistic to the CRITICAL VALUES. Make your decision to either reject the H0 or to FAIL to reject the H0.
VII. Draw your conclusions from the test of hypothesis and interpret your results.

The 5 steps of hypothesis testing according to Freund & Wilson
1) Establish H0, H1, and a value for α.
2) Determine the test statistic and a region for rejection.
3) Draw a sample and calculate the test statistic.
4) Compare the test statistic to the critical limits and make a decision to reject or fail to reject.
5) Interpret the results.

Hypothesis testing concepts
The logic of a test of hypothesis is based on the chosen probability of error, α (the significance level), for the test statistic (Z), which determines the range of what would be expected due to chance alone, assuming H0 is true.

Significance level notation, commonly used levels and terminology:
"Statistically significant": α = 0.05
"Highly significant": α = 0.01

Errors!
When we do a test of hypothesis, is it possible that we are wrong? Yes, unfortunately, it is always possible that we are wrong. Furthermore, there are two types of error that we could make!

Types of error
                            Data indicate H0 is true            Data indicate H0 is false
True result: H0 is true     NO ERROR                            Type I error: reject a TRUE H0
True result: H0 is false    Type II error: fail to reject       NO ERROR
                            a FALSE H0

Type I error: the rejection of a true null hypothesis. This type of error is also called α (alpha) error. Since α is the value we choose as the "level of significance", we actually set the probability of making this type of error. The probability of a Type I error = α.

Type II error: the failure to reject a null hypothesis that is false. This type of error is also called β (beta) error. We do not set this value; we call the probability of a Type II error β. Furthermore, in practice we will never know this value.
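Because α is the error probability we set ourselves, it can be checked empirically. The following Monte Carlo sketch (not from the notes; the population parameters are arbitrary choices of ours) samples repeatedly from a world where H0 is true and confirms that a two-tailed Z test at α = 0.05 rejects a true H0 about 5% of the time:

```python
import random
from math import sqrt

random.seed(1)                      # fixed seed so the run is reproducible
mu0, sigma, n = 170, 20, 9          # arbitrary example parameters; H0 is TRUE here
reps = 20_000
rejections = 0
for _ in range(reps):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    ybar = sum(sample) / n
    z = (ybar - mu0) / (sigma / sqrt(n))
    if abs(z) >= 1.96:              # two-tailed test at alpha = 0.05
        rejections += 1

type1_rate = rejections / reps      # should land near the chosen alpha, 0.05
```

Every rejection counted here is a Type I error, since the null hypothesis is true by construction.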
This is another reason we cannot "accept" the null hypothesis: it is possible that we are wrong, and we cannot state the probability of this type of error. The good news is that it is only possible to make one error at a time. If you reject H0, you may have made a Type I error, but you cannot have made a Type II error. If you fail to reject H0, you may have made a Type II error, but you cannot have made a Type I error.

The probability of Type II error
This is a probability that we will not know; it is called β. However, we can do several things to make the error smaller, so that will be our objective. First, let's look at how these errors occur. Examine the hypothesized distribution (below), which we believe to have a mean of 18.

[Figure: hypothesized normal distribution, Y scale from 10 to 26 (Z scale −4 to +4), centered at 18.]

We are going to do a two-tailed test with an α value of 0.05, so our critical limits will be ±1.96: we will reject for any test statistic over 1.96 (or under −1.96). But suppose the null hypothesis is false!!! Suppose the alternate hypothesis is true. Then the hypothesized distribution is not real; there is another "real" distribution that we are sampling from. What might it look like?

Here is the real distribution. It actually has a mean of 22, but we don't know that. If we did, we would not have hypothesized a mean of 18!

[Figure: the "real" distribution centered at 22, with the critical value from the hypothesized distribution marked.]

So where on the real distribution is our critical limit? This is the key question. Note that with the Z transformation, each change of 1 unit of Z corresponds to a change of 2 units on the original Y scale. This means that on the original scale σ = 2.
Now we draw our sample from the real distribution. If our result says reject the H0, we make no error. But if our result causes us to fail to reject, we err. In this case it appears we have pretty close to a 50-50 chance of going either way.

So we take our sample and do our test. Will we err? Maybe we will, and maybe we won't. Our sample could come from anywhere in this "real" distribution. If our sample happens to fall in the lower red area (below about 22), we will not reject H0, and we will err. But if our sample happens to fall in the upper yellow area (above about 22), we will reject H0; in that case there is no error, and we draw the correct conclusion.

[Figure: the real distribution split at the critical value, with the Type II error region (red, below about 22) and the correct-rejection region (yellow, above about 22) shaded.]

The probability of Type II error, or β error
For α = 0.05 our critical limit, in terms of Z, is 1.96. This translates to values on the original scale of Yi = μ + Ziσ = 18 ± 1.96(2) = 18 ± 3.92. The lower bound is 14.08 and the upper bound is 21.92. The lower bound is so far down on the real distribution that the probability of a sample falling there is near zero. The upper bound is the one that falls in the middle of the "real" distribution.

In this fictitious case we know that the true mean is 22; normally we would not know it. Since we do know it here, we can calculate the probability of drawing a sample above or below the critical limit (21.92 on the Y scale, which is Z = (21.92 − 22)/2 = −0.04 on the Z scale of the real distribution). The probability of falling below this value, and so of making a Type II error, is 0.484, or about 48.4%. This is the probability we call beta (β).
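The β calculation just described can be verified directly. This is a sketch assuming the same setup (hypothesized mean 18, true mean 22, σ = 2 on the Y scale); the norm_cdf helper built from math.erf is ours:

```python
from math import erf, sqrt

def norm_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu_hyp, mu_real, sigma = 18.0, 22.0, 2.0    # hypothesized mean, true mean, sigma
upper_limit = mu_hyp + 1.96 * sigma         # 21.92 on the Y scale
z_real = (upper_limit - mu_real) / sigma    # -0.04 on the real distribution's Z scale

beta = norm_cdf(z_real)    # P(sample falls below the limit) = P(Type II error), about 0.484
power = 1.0 - beta         # P(correctly rejecting the false H0), about 0.516
```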
The probability of falling above this value, and of NOT making a Type II error, is 0.516, or 51.6%. So in this case we can calculate β, the probability of a Type II error. In practice we usually cannot know these probabilities, because we never know the real value of the mean.

We define a new term, POWER: the probability of NOT making a Type II error (1 − β). This was 0.516 in our example.

Power and Type II error
Since we don't actually know the value of the true mean (or we wouldn't be hypothesizing something else), we cannot know the Type II error rate (β) in practice. However, it is affected by a number of things, and we can know about these.
1) Power is affected by the distance between the hypothesized mean (μ0) and the true mean (μ).

[Figure: the power curve, rising from α at no difference toward 1 as the difference between the true and hypothesized means increases.]

2) Power is affected by the value chosen for Type I error (α).
3) Power is affected by the variability, or spread, of the distribution.

Influencing the power of a test of hypothesis
The capability of a test to reject H0 when it is false is called power (1 − β). Anything done to enhance this value will improve your ability to test for differences among populations. Which of the three factors influencing power can you control? For testing means you may be able to control the sample size (n); this reduces the variability and increases power. You probably cannot influence the difference between μ and μ0. You can choose any value of α; however, it cannot be too small, or Type II error becomes more likely, nor too large, or Type I error becomes likely.
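Factor 1, the power curve, can be traced numerically. This sketch (not from the notes; power_upper and norm_cdf are our own helpers) computes power against the upper critical limit as the true mean moves away from the hypothesized one, using the σ = 2 example above:

```python
from math import erf, sqrt

def norm_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_upper(diff: float, sigma: float = 2.0, z_crit: float = 1.96) -> float:
    """Power against the upper critical limit when the true mean is diff units above mu0."""
    return 1.0 - norm_cdf(z_crit - diff / sigma)

# power rises monotonically from about 0.025 (no difference) toward 1
curve = [round(power_upper(d), 3) for d in (0, 2, 4, 6, 8)]
```

At diff = 4 (hypothesized 18, true 22) this reproduces the power of about 0.516 computed earlier.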
Methods of increasing the power of a test
How would we use our knowledge of the factors affecting power to increase the power of our tests of hypothesis?

Increase the significance level (e.g., from α = 0.01 to α = 0.05). If H0 is true, we increase α, the probability of a Type I error. If H0 is false, we decrease β, the probability of a Type II error, and by decreasing β we increase the POWER of the test.

For a given α, the power can be increased by increasing n, so that σȲ = σ/√n decreases and the amount of overlap between the real and hypothesized distributions decreases. For example, suppose we are conducting a test of the hypothesis H0: μ = μ0 against the alternative H1: μ ≠ μ0. We believe μ0 = 50 and we set α = 0.05. We also know that σ² = 100 and that n = 25. From this information we can calculate σȲ = σ/√n = 10/5 = 2. The critical region in terms of Z is P(|Z| ≥ Z0) = 0.05, so Z0 = 1.96, and the critical value on the original Y scale is Yi = μ0 + Z0σȲ = 50 + 1.96(2) = 53.92.

If the REAL population mean is 54, calculate P(Ȳ ≥ 53.92). Given that the TRUE mean is 54, the Z value is Z = (53.92 − 54)/2 = −0.08/2 = −0.04. The probability of a TYPE II error (β) is the probability of drawing a sample that falls below this value, and so of not rejecting the false null hypothesis: β = P(Z ≤ −0.04) = 0.4840. So for an experiment with n = 25, the power is 1 − β = 1 − 0.4840 = 0.5160.

But suppose we had a larger sample, say n = 100. Now σȲ = σ/√n = 10/10 = 1. The critical region stays at Z0 = 1.96, but on the original scale this is now Yi = μ0 + Z0σȲ = 50 + 1.96(1) = 51.96. For a true mean of 54 we now get Z = (51.96 − 54)/1 = −2.04/1 = −2.04. The value of β is P(Z ≤ −2.04) = 0.0207, and the power for this test is 1 − β = 0.9793.

The bottom line: with n = 25 the power is 0.5160; with n = 100 the power is 0.9793.
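The n = 25 versus n = 100 comparison can be wrapped in a small function (a sketch, not from the notes; power_for_n and norm_cdf are our own names):

```python
from math import erf, sqrt

def norm_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_for_n(n: int, mu0: float = 50.0, mu_true: float = 54.0,
                sigma: float = 10.0, z_crit: float = 1.96) -> float:
    """Power against the upper critical limit when the true mean is mu_true."""
    se = sigma / sqrt(n)                        # standard error of the mean
    y_crit = mu0 + z_crit * se                  # critical value on the original Y scale
    beta = norm_cdf((y_crit - mu_true) / se)    # P(Type II error)
    return 1.0 - beta

# n = 25 gives power of about 0.516; n = 100 gives about 0.979
```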
This is why statisticians recommend larger sample sizes so strongly. We may never really know what the power is, but we know how to increase it and so reduce the probability of Type II error.

Summary
Hypothesis testing is prone to two types of errors: one we control (α) and one we do not (β).
Type I error is the REJECTION of a true null hypothesis.
Type II error is the FAILURE TO REJECT a null hypothesis that is false.
The "power" of a test is 1 − β.
Not only do we not control Type II error, we probably do not even know its value. However, we can hopefully reduce this error, and increase power, by:
- controlling the distance between μ and μ0 (not really likely);
- selecting a value of α that is not too small (0.05 and 0.01 are the usual values);
- getting a larger sample size (n); this is the factor usually under the most control of the investigator.

The t-test of hypotheses
The t distribution is used in the same way as the Z distribution, except that it is used where sigma (σ) is unknown (so that S, calculated using Ȳ instead of μ, is used to measure deviations). The t distribution is a bell-shaped curve, like the Z distribution, but not the same. The Z distribution is normal because it has a normal distribution in the numerator (Yi) and all other terms in the transformation are constants. The t distribution has a normal distribution in the numerator, but the sample variance in its denominator is another statistic, one with a chi-square distribution.

ti = (Yi − Ȳ) / S ; the t distribution applied to individual observations
t = (Ȳ − μ0) / SȲ = (Ȳ − μ0) / (S/√n) ; the t distribution used for hypothesis testing

where S = the sample standard deviation (calculated using Ȳ instead of μ), and SȲ = the sample standard error.

The variance of the t distribution is greater than that of the Z distribution (except as n → ∞), since S² estimates σ² but is never as reliable:

                 Mean    Variance
Z distribution    0         1
t distribution    0        ≥ 1

Characteristics of the t distribution
E(t) = 0: the expected value of the t distribution is zero. It is symmetrically distributed about a mean of 0, with t values ranging between ±∞ (i.e., −∞ ≤ t ≤ +∞). There is a different t distribution for each value of the degrees of freedom (df), since the distribution changes as the degrees of freedom change. It has a broader spread for smaller df and narrows (approaching the Z distribution) as the df increase. As the df (γ, gamma) approach infinity (∞), the t distribution converges to the Z distribution. For example:
Z (no df associated): the middle 95% lies between ±1.96
t with 1 df: the middle 95% lies between ±12.706
t with 10 df: the middle 95% lies between ±2.228
t with 30 df: the middle 95% lies between ±2.042
t with ∞ df: the middle 95% lies between ±1.96

How does the test for the t distribution differ from the Z distribution?
• For the Z distribution, since Yi is normally distributed, subtracting a constant (μ0) and dividing by a constant (σ) does not affect the distribution, so Z is normal.
• For the t distribution we also have a normally distributed Yi and we subtract a constant (μ0), but we divide by a statistic (S), not a constant (σ).
• This alters the distribution so that it is not quite normal. The extra uncertainty causes the t distribution to be "broader" than the Z distribution.
• However, as the sample size increases, the value of S approaches σ and the t distribution converges on the Z distribution.
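The notes do not carry the reading-test example through as a t test, but it makes a natural hypothetical illustration. This sketch uses the sample variance computed from the 9 scores (634/8 = 79.25) instead of the assumed national variance of 400, with the tabled one-tailed critical value t(0.05, 8 df) = 1.860; because the sample variance is much smaller than 400, the conclusion differs from the earlier Z test:

```python
from math import sqrt

scores = [164, 175, 186, 173, 191, 187, 189, 176, 179]
n = len(scores)
ybar = sum(scores) / n                                 # 180
s2 = sum((y - ybar) ** 2 for y in scores) / (n - 1)    # sample variance, 634/8 = 79.25
mu0 = 170

t_stat = (ybar - mu0) / sqrt(s2 / n)   # about 3.37
t_crit = 1.860                         # tabled t, alpha = 0.05, one tail, 8 df
reject = t_stat > t_crit               # True: reject H0 in this hypothetical t test
```

The reversal is an artifact of the example: these students' scores vary far less than the national population's, so their standard error is much smaller than the one assumed in the Z test.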
Probability distribution tables in general
The tables we will use will ALL give the area in the tail (α). However, if you examine tables from other sources, you will find that this is not always true. Even when it is true, some tables give the value of α as if it were in two tails, and some as if it were in one tail. For example, suppose we want to conduct a two-tailed Z test at the α = 0.05 level, and we happen to know that Z = 1.96. If we look up this value in our Z tables, we expect to see a value of 0.025, or α/2. But many tables would show the probability for 1.96 as 0.975, and some as 0.05. Why the difference? It just depends on how the tables are presented. Some of the alternatives are described below.

Some tables give the cumulative distribution starting at −∞. There you want to find the probability corresponding to 1 − α/2; the value that leaves 0.025 in the upper tail would be 0.975. Some tables start at zero (0.0) and give the cumulative area from that point for the upper half of the distribution; this is less common. There, the value that leaves 0.025 in the upper tail would be 0.475. Among tables like ours, which give the area in the tail, some are called one-tailed tables and some two-tailed tables.

[Figure: a one-tailed table shades α = 0.025 in the upper tail (table value 0.025); a two-tailed table shades α/2 in each tail (table value 0.050).]

Why the extra confusion at this point? All our tables will give the area in the tail. The Z tables we used gave the area in one tail; for a two-tailed test you needed to double the probability. For the F and chi-square tables covered later, this area will be a single tail, as with the Z tables, because those distributions are not symmetric. Traditionally, many t-tables have given the area in TWO TAILS instead of one tail, and many textbooks have this type of table.
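The four table conventions just described can be reconciled in a few lines (a sketch; the norm_cdf helper built from math.erf is ours). Each variable below is the number a table of that style would print for Z = 1.96:

```python
from math import erf, sqrt

def norm_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

z = 1.96
cumulative = norm_cdf(z)         # table starting at -infinity: 0.975
from_zero = cumulative - 0.5     # table starting at zero: 0.475
one_tail = 1.0 - cumulative      # one-tailed tail-area table: 0.025
two_tail = 2.0 * one_tail        # two-tailed tail-area table: 0.050
```

Whatever the convention, the same Z = 1.96 is recovered; only the printed probability changes.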
SAS will also usually give two-tailed values for t-tests.

James P. Geaghan, Copyright 2010

This note was uploaded on 12/29/2011 for the course EXST 7005, taught by Professor J. Geaghan during the Fall '08 term at LSU.
