EXST7005 Fall2010 16a t-test (example) etc

Statistical Methods I (EXST 7005)

Numerical example

Compare the ovarian weight of 14 fish, 7 randomly assigned to receive injections of gonadotropin (treatment group) and 7 assigned to receive a saline solution injection (control group). Both groups are treated identically except for the gonadotropin treatment. Ovarian weights are to be compared for equality one week after treatment. During the experiment two fish were lost due to causes not related to the treatment, so the experiment became unbalanced.

Raw data

   Obs   Treatment   Control
    1       134         70
    2       146         85
    3       104         94
    4       119         83
    5       124         97
    6        *          77
    7        *          80

Summary statistics

   Statistic    Treatment    Control
   n                 5           7
   ΣYi             627         586
   ΣYi²         79,625      49,588
   Ȳ             125.4        83.7
   SS              999         531
   γ (d.f.)          4           6
   S²            249.8        88.6

Research question: Does the gonadotropin treatment affect the ovarian weight? (Note: this implies a non-directional alternative.)

First, which of the various situations for two-sample t-tests do we have? Obviously n₁ ≠ n₂. Now check the variances.

1) H0: σ₁² = σ₂²
2) H1: σ₁² ≠ σ₂²
3) Assume Yi ~ NID r.v., representing the usual assumptions of normality and independence.
4) α = 0.05, and the critical value for 4, 6 d.f. is Fα/2,4,6 = 6.23.
5) We have the samples, and know that the variances are 249.8 and 88.6, with 4 and 6 d.f. respectively. Given the non-directional alternative, we arbitrarily place the larger variance in the numerator: F = 249.8/88.6 = 2.82 with 4, 6 d.f.
6) The critical value is larger than the calculated value. We therefore fail to reject the null hypothesis.
7) We conclude that the two samples have sufficiently similar variances for pooling.

Pooling the variances

Recall that Sp² = (γ₁S₁² + γ₂S₂²)/(γ₁ + γ₂) = (SS₁ + SS₂)/(γ₁ + γ₂), so

   Sp² = [4(249.8) + 6(88.6)]/(4 + 6) = (999 + 531)/10 = 1530/10 = 153, with 10 d.f.

Now calculate the standard error for the test, Sd̄, using the pooled variance. For this case

   Sd̄ = S(Ȳ₁−Ȳ₂) = √[Sp²(1/n₁ + 1/n₂)] = √[153(1/5 + 1/7)] = √[153(0.343)] = √52.457 = 7.24, with 10 d.f.

Completing the two-sample t-test

1) H0: μ₁ − μ₂ = δ. In this case we could state the null as H0: μ₁ = μ₂, since δ = 0.
2) H1: μ₁ − μ₂ ≠ δ, or H1: μ₁ ≠ μ₂.
3) Assume di ~ NID r.v. (δ, σδ²). NOTE: we have pooled the variances, so obviously we have assumed that all variance is homogeneous and equal to σδ².
4) α = 0.05, and the critical value is 2.228 (a non-directional alternative for α = 0.05 and 10 d.f.).
5) We have the samples and know that the means are 125.4 and 83.7. The calculated t value is

   t = [(Ȳ₁ − Ȳ₂) − (μ₁ − μ₂)] / √[Sp²(1/n₁ + 1/n₂)] = (Ȳ₁ − Ȳ₂)/Sd̄ = (125.4 − 83.7)/7.24 = 41.7/7.24 = 5.76, with 10 d.f.

6) The calculated value (5.76) clearly exceeds the critical value (2.228), so we reject the null hypothesis.
7) Conclude that the gonadotropin treatment does affect the ovarian weight of the fish. We can further state that the treatment increases the weight.

How about a confidence interval? Could we use a confidence interval here? You betcha!
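The arithmetic above is easy to check in a SAS data step. Here is a minimal sketch (the data set and variable names are my own, not from the course programs); it recomputes the F test, the pooled variance, the standard error, and the t statistic from the summary table. The unrounded sums of squares (999.2 and 531.4) are used so the variances match the table.

   data fishcheck;
      n1 = 5;  ybar1 = 125.4;  ss1 = 999.2;   * treatment group;
      n2 = 7;  ybar2 =  83.7;  ss2 = 531.4;   * control group;
      s1sq = ss1/(n1-1);                      * 249.8, treatment variance;
      s2sq = ss2/(n2-1);                      *  88.6, control variance;
      F    = s1sq/s2sq;                       * 2.82 on 4, 6 d.f., vs. the critical 6.23;
      sp2  = (ss1 + ss2)/((n1-1) + (n2-1));   * pooled variance, about 153, with 10 d.f.;
      sd   = sqrt(sp2*(1/n1 + 1/n2));         * standard error of the difference, 7.24;
      t    = (ybar1 - ybar2)/sd;              * 5.76 with 10 d.f.;
      p    = 2*(1 - probt(abs(t), n1+n2-2));  * two-tailed P value;
      put F= sp2= sd= t= p=;
   run;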
Confidence interval for the difference between means

The general formula for a two-tailed confidence interval for normally distributed parameters is

   some parameter estimate ± tα/2 × (standard error of the estimate)

The difference between the means, δ = μ₁ − μ₂, is another parameter for which we may wish to calculate a confidence interval.

For the estimate of the difference between μ₁ and μ₂ we have already determined that for α = 0.05 we have tα/2 = 2.228 with 10 d.f. We also found that the estimate of the difference, d̄ = Ȳ₁ − Ȳ₂, is 41.7, and the standard error of the difference, Sd̄ = S(Ȳ₁−Ȳ₂), is 7.24.

The confidence interval is then d̄ ± tα/2 Sd̄, or 41.7 ± 2.228(7.24), giving 41.7 ± 16.13. The probability statement is

   P(d̄ − tα/2 Sd̄ ≤ μ₁ − μ₂ ≤ d̄ + tα/2 Sd̄) = 1 − α
   P(25.57 ≤ μ₁ − μ₂ ≤ 57.83) = 0.95

Note that the interval does not contain zero. This observation is equivalent to doing a test of hypothesis against zero. Some statistical software calculates intervals instead of doing hypothesis tests. This works for hypothesis tests against zero, and it is advantageous if the hypothesized value of δ is something other than zero; when software automatically tests for differences, it almost always tests for differences from zero.

Summary

Testing for differences between two means can be done with the two-sample t-test, or with a two-sample Z test if the variances are known.

For two independently sampled populations the variance of the difference will be S₁²/n₁ + S₂²/n₂, the variance of a linear combination of the means. The problem is that the d.f. for this expression are not known. Degrees of freedom are known if the variances can be pooled, so we start our two-sample t-test with an F test. Variances are pooled, if not significantly different, by calculating a weighted mean:

   Sp² = (γ₁S₁² + γ₂S₂²)/(γ₁ + γ₂) = (SS₁ + SS₂)/(γ₁ + γ₂) = (SS₁ + SS₂)/[(n₁ − 1) + (n₂ − 1)]

The error variance of the difference is then given by Sp²(1/n₁ + 1/n₂), and the standard error is √[Sp²(1/n₁ + 1/n₂)].

If the variances cannot be pooled, the two-sample t-test can still be done, and the degrees of freedom are approximated with Satterthwaite's approximation. Once the standard error is calculated, the test proceeds as any other t-test. Confidence intervals can also be calculated in lieu of doing the t-test.

SAS example 4 – PROC TTEST

We would normally do two-sample t-tests with the SAS procedure called PROC TTEST. This procedure has the structure

   proc ttest data = datasetname;
      class groupvariable;
      var variableofinterest;
   run;

The PROC statement functions like any other PROC statement. The VARIABLE or VAR statement works the same as in other procedures we have seen. The CLASS statement is new: it specifies the variable that will allow SAS to distinguish between observations from the two groups to be tested.

PROC TTEST Example 4a

Example from Steele & Torrie (1980), Table 5.2. Corn silage was fed to sheep and steers. The objective was to determine if the percent digestibility differed for the two types of animals.

Raw data

   Obs   Sheep   Steers
    1    57.8     64.2
    2    56.2     58.7
    3    61.9     63.1
    4    54.4     62.5
    5    53.6     59.8
    6    56.4     59.2
    7    53.2      *

Unfortunately this data is not structured properly for PROC TTEST. It has two variables (sheep and steers) giving the percent digestibility for sheep and steers separately. We need one variable with percent digestibility for both and a second variable specifying the type of animal. This can be fixed in the data step, as in the sketch below.

In the program, note the following:
- the change of the data structure from multivariate to univariate style;
- the PROC TTEST statement;
- the intermediate statistics, especially the confidence intervals for both means and standard deviations.
The tests of hypotheses for both means and variances are discussed below.
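Here is a sketch of what that program might look like (the data set and variable names silage, digest, animal, and percent are my own choices; the course program may differ). The second data step outputs one observation per animal type, producing the class variable PROC TTEST needs.

   data silage;                    * multivariate style: two response columns;
      input sheep steers;
      datalines;
   57.8 64.2
   56.2 58.7
   61.9 63.1
   54.4 62.5
   53.6 59.8
   56.4 59.2
   53.2 .
   ;
   data digest;                    * univariate style: one response, one class variable;
      set silage;
      length animal $6;
      animal = 'Sheep';  percent = sheep;   if percent ne . then output;
      animal = 'Steers'; percent = steers;  if percent ne . then output;
      keep animal percent;
   run;
   proc ttest data = digest;
      class animal;
      var percent;
   run;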
Interpreting the SAS Output

First examine the last lines:

   Equality of Variances
   Variable   Method     Num DF   Den DF   F Value   Pr > F
   percent    Folded F        6        5      1.70   0.5764

SAS is testing the equality of variances (H0: σ₁² = σ₂²). Notice that SAS provides a "folded F". Most SAS F tests are one-tailed, but this is one of the few places where SAS does a two-tailed F test (a "folded F"). SAS gives the d.f. and the probability of a greater F by random chance. We would usually set α = 0.05, reject for any P-value less than this, and fail to reject for any value greater. In this case we fail to reject.

Exactly what did SAS do with the "folded F"? Recall that the two-tailed F allows you to place the larger variance in the numerator, but you must then use α/2 for the critical value. This is what SAS has done: the P value of 0.5764 is a two-tailed P value. So we conclude that the variances do not differ. If doing the test by hand, we would now pool the variances to calculate the standard error.

NOW, look at the PROC TTEST output, above the F test.

t-tests

Here SAS provides results for both types of test, one calculated using equal variances and another done with unequal variances, and the user chooses which is appropriate for their case. Since we had equal variances according to the F test we just examined, we would use the first line.

   Variable   Method          Variances   DF     t Value   Pr > |t|
   percent    Pooled          Equal       11       -3.34     0.0065
   percent    Satterthwaite   Unequal     10.9     -3.42     0.0058

From the first line we see that the calculated t value was -3.34 with 11 d.f. The probability of getting a greater absolute value by random chance (i.e., under H0) is 0.0065, not very likely. We would conclude that there are statistically significant differences between the two animals in terms of silage digestibility.

What about the second line, for unequal variances? That line would be used if we had rejected the F test of equal variances. In this particular case the conclusion would be the same, since we would also reject H0. Notice that the d.f. for the unequal-variance calculation are not integer. This is because Satterthwaite's approximation was used to estimate the degrees of freedom. Since the variances were actually "equal", the estimate is close to (n₁ − 1) + (n₂ − 1) = 11.

From the SAS STATISTICS output we can conclude that the digestibility is higher for the steers, by about 5 percent.
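The non-integer 10.9 can be reproduced by hand. Satterthwaite's approximate degrees of freedom are

   df ≈ (S₁²/n₁ + S₂²/n₂)² / [ (S₁²/n₁)²/(n₁ − 1) + (S₂²/n₂)²/(n₂ − 1) ]

A minimal sketch, using the sample variances I computed from the sheep and steers data above (about 9.015 and 5.299; these are my own calculations, not values printed in the notes):

   data satter;
      s1sq = 9.015;  n1 = 7;     * sheep;
      s2sq = 5.299;  n2 = 6;     * steers;
      w1 = s1sq/n1;  w2 = s2sq/n2;
      df = (w1 + w2)**2 / ( w1**2/(n1-1) + w2**2/(n2-1) );
      put df=;                   * prints df = 10.9, matching the SAS output above;
   run;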
Example 4b: Steele & Torrie (1980) Table 5.6

Determine if there is a difference in the percent fine gravel found in surface soils. The data are from a study comparing characteristics of soils categorized as "good" or "poor".

Raw data (percent fine sand in good and poor soils)

   Obs   Good   Poor
    1     5.9    7.6
    2     3.8    0.4
    3     6.5    1.1
    4    18.3    3.2
    5    18.2    6.5
    6    16.1    4.1
    7     7.6    4.7

This data is also in the form of two separate variables and must be adjusted to accommodate the data structure needed by PROC TTEST, just as in Example 4a.

In the program, note the following:
- the change of the data structure from multivariate to univariate style;
- the PROC TTEST statement;
- the intermediate statistics, especially the confidence intervals for both means and standard deviations.
The tests of hypotheses for both means and variances are discussed below.

In this case the variances are not quite significantly different, though it is a close call and there is a pretty good chance of a Type II error. Fortunately, the result is the same with either test. If we go strictly by the "α = 0.05" decision rule that we usually use, we would fail to reject the hypothesis of equal variances. We would then examine the line for equal variances and conclude that there was indeed a difference between the good and poor quality soil in terms of the fine sand present. The intermediate statistics show that the good soil had about 7 percent more fine sand.

   Statistics
                                Lower CL             Upper CL   Lower CL
   Variable   soilqual    N     Mean        Mean     Mean       Std Dev    Std Dev
   percent    good        7     5.0559      10.914   16.773     4.0819     6.3344
   percent    poor        7     1.5048      3.9429   6.3809     1.6987     2.6362

Example 4c: Steele & Torrie (1980) Exercise 5.5.6

The weights in grams of 10 male and 10 female juvenile ring-necked pheasants trapped in January in Wisconsin are given. Test the H0 that males were 350 grams heavier than females. In this case the data is already in the form needed: one variable for weight and one for sex.

Raw data

   Sex      Weight      Sex    Weight
   Female     1061      Male     1293
   Female     1065      Male     1380
   Female     1092      Male     1614
   Female     1017      Male     1497
   Female     1021      Male     1340
   Female     1138      Male     1643
   Female     1143      Male     1466
   Female     1094      Male     1627
   Female     1270      Male     1383
   Female     1028      Male     1711

There was, however, one little problem with this analysis. The hypothesis requested was not simply H0: μmale = μfemale; it was H0: μmale = μfemale + 350, or H0: μmale − μfemale = 350. SAS does not have provisions to specify an alternative other than zero, but if we subtract 350 from the males, we can then test for equality. We know from our discussion of transformations that the variances will be unaffected. So we create a new variable called adjwt, for "adjusted weight". See the calculations in the SAS program (a sketch follows at the end of this example).

   Obs   Sex      Weight   AdjWT
   ...
    8    Female     1094    1094
    9    Female     1270    1270
   10    Female     1028    1028
   11    Male       1293     943
   12    Male       1380    1030
   13    Male       1614    1264
   ...

See SAS OUTPUT Appendix 4c. Note the intermediate statistics, and note the tests of hypotheses for both means and variances. Note also that in the PROC TTEST statistics there is another calculation, the "Diff (1-2)" row, which also gets its calculated value and confidence interval. This difference is not a paired difference.

   Statistics
                                     Lower CL            Upper CL
   Variable   sex          N         Mean        Mean    Mean
   AdjWT      Female       10        1038.1      1092.9  1147.7
   AdjWT      Male         10        1041        1145.4  1249.8
   AdjWT      Diff (1-2)             -162        -52.5   56.989

Interpretation of the SAS output: First, we fail to reject H0: σ₁² = σ₂² again (barely). The weights do not differ either way (examining Pr > |t|), so we fail to reject H0: μ₁ = μ₂. But remember, we subtracted 350 from the males, so actually we conclude that the males are heavier by an amount not different from 350 grams.
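A sketch of that data step (the input data set name pheasant and the variable names are assumed; only the adjustment logic matters here):

   data adjusted;
      set pheasant;                               * variables sex and weight;
      adjwt = weight;
      if sex = 'Male' then adjwt = weight - 350;  * shifts males so the hypothesized difference becomes 0;
   run;
   proc ttest data = adjusted;
      class sex;
      var adjwt;
   run;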
A special case – the paired t-test

One last case. In some circumstances the observations are not separate and distinct in the two samples; sometimes they can be paired. This can be good, adding power to the design. For example:

- We want to test toothpaste. We may pair on the basis of twins, or siblings, in assigning the toothpaste treatments.
- We want to compare deodorants or hand lotions. We assign one arm or hand to one brand and the other to another brand.
- In many drug and pharmaceutical studies done on rats or rabbits, the treatments are paired on litter mates.

So, how does this pairing affect our analysis? The analysis is done by subtracting one category of the pair from the other category of the pair. In this way the pair values become difference values. As a result, the "two-sample t-test" of pairs becomes a one-sample t-test. So, in many ways the paired t-test is easier.

Example: We already did an example of this type of analysis. Recall the Lucerne plants, where we compared seed set for flowers at the top and bottom of the plant. This was paired and we took differences. The mean difference was 1, with a standard error of 0.5055.

SAS example 2c, examined previously:

   Tests for Location: Mu0=0
   Test          Statistic       p Value
   Student's t   t   1.978141    Pr > |t|    0.0793
   Sign          M   2           Pr >= |M|   0.3438
   Signed Rank   S   19.5        Pr >= |S|   0.0469

So the paired t-test is an alternative analysis for certain data structures. It is better because it eliminates the "between pair" variation and compares the treatments "within pairs". This reduces variance. However, note that the degrees of freedom are also cut in half. If the basis for pairing is not good, the variance is not reduced, but degrees of freedom are still lost.

Summary

SAS PROC TTEST provides all of the tests needed for two-sample t-tests. It provides the test of variances we need to start with, and it provides two alternative calculations, one for equal variances and one for unequal variances; we choose the appropriate case. We also saw that several previous calculations, such as confidence intervals and sample size, are also feasible for the two-sample t-test case. The paired t-test, where there is a good, strong basis for pairing observations, can gain power by reducing between-pair variation. However, if the basis for pairing is not good, we lose degrees of freedom and power.

Calculating a needed sample size

The Z-test and t-test use a similar formula:

   Z = (Ȳ − μ0)/√(σ²/n) = (Ȳ − μ0)/(σ/√n)

Let's suppose we know everything in the formula except n. Do we really? Maybe not, but we can get some pretty good estimates. Call the numerator (Ȳ − μ0) a difference, d̄. It is some mean difference we want to be able to detect, so d̄ = Ȳ − μ0.

The value σ² is a variance, the variance of the data that we will be sampling. We need this variance, or an estimate, S². So we alter the formula to read

   Z = d̄/√(σ²/n) = d̄/(σ/√n)

What other values do we know? Do we know Z? No, but we know what Z we need to obtain significance. If we are doing a two-tailed test and we set α = 0.05, then Z will be 1.96. Any calculated value larger will be "more significant"; any value smaller will not be significant. So, if we want to detect significance at the 5% level, we can state that we will get a significant difference if

   Z = d̄/√(σ²/n) ≥ Zα/2

We square both sides and solve for n. Then we should get a significant difference if

   n ≥ Zα/2² σ² / d̄²

Then, if we know the values of d̄, σ², and Z, we can solve the formula for n. If we are going to use a Z distribution we should have a known value of the variance (σ²); if the variance is calculated from the sample, use the t distribution. This would give us the sample size needed to obtain "significance", in accordance with whatever Z value is chosen.

Generic example

Try an example where d̄ = 2, σ = 5 (σ² = 25), and Z = 1.96. What value of n would detect this difference with this variance and produce a value of Z equal to 1.96 (or greater)?

   n ≥ Zα/2² σ² / d̄² = (1.96² × 25)/2² = 3.8416(25)/4 = 24.01

Since n ≥ 24.01, round up to 25. Answer: n ≥ 25 would produce significant results.
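The same arithmetic in a short SAS sketch (the names are my own):

   data needn;
      dbar   = 2;                              * difference to be detected;
      sigma2 = 25;                             * known variance;
      z      = probit(1 - 0.05/2);             * Z(alpha/2) = 1.96 for a two-tailed test;
      n      = ceil(z**2 * sigma2 / dbar**2);  * 24.01 rounds up to 25;
      put n=;
   run;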
Guaranteed? Wouldn't this always produce significant results? Theoretically, within the limits of statistical probability of error, yes, but only if the difference was really 2. If the null hypothesis (no difference, μ = μ0) was really true and we took larger samples, then we would get a better estimate of 0, and might never show significance.

Considering Type II error

The formula we have seen contains only Zα/2 or tα/2, depending on whether we have σ² or S². However, a fuller version can contain consideration of the probability of Type II error (β). We can often use Z when working with very large samples. Remember that to work with Type II (β) error we need to know the mean of the real distribution. However, in calculating sample size we have a difference, d̄ = Ȳ − μ0, so we can include consideration of Type II error and power in calculating the sample size. The consideration of β error is done by adding another Z or t for that error rate. Notice that below I switch to t distributions and use

   n ≥ (tα/2 + tβ)² S² / d̄²

Other examples

We have done a number of tests, some yielding significant results and others not. If a test yields significant results (showing a significant difference between the observed and hypothesized values), then we don't need to examine sample size, because the sample was big enough. However, some utility may be made of this information if we FAIL to reject the null hypothesis.

Note: Some textbooks give only the formula originally given for Z, without the β error consideration. What is the power if you use the formula omitting tβ from n ≥ (tα/2 + tβ)² S² / d̄²? If you set tβ equal to zero, the power is 0.50, and there is a 50% chance of making a Type II error.

An example with t values and β error included

Recall the Rhesus monkey experiment. We hypothesized no effect of a drug, and with a sample size of 10 were unable to reject the null hypothesis. However, we did observe a difference of +0.8 change in blood pressure after administering the drug. What if this change was real? What if we made a Type II error? How large a sample would we need to test for a difference of 0.8 if we also wanted 90% power?

So we want to know how large a sample we would need to get significance at the α = 0.05 level if power was 0.90. In this case β = 0.10. To do this calculation we need a two-tailed α and a one-tailed β (we know that the observed change is +0.8). We will estimate the variance from the sample, so we will use the t distribution. However, since we don't know the sample size, we don't know the degrees of freedom!

Since we do not know the d.f., we will start off with some "reasonable" values for tα/2 and tβ. After we solve the equation we will have an estimate of the d.f.; we can then solve again with better values of tα/2 and tβ and refine our estimate. After our second calculation we have even better estimates of the d.f., so we get new values for tα/2 and tβ and redo the calculations, etc., until the estimate stabilizes.

So we will approximate to start with. Given the information: α = 0.05, so tα/2 will be approximately 2; β = 0.10, so tβ will be roughly 1.3; d̄ = Ȳ − μ0 = 0.8 from our previous results; and S² = 9.0667 from our previous results. We do the calculations:

   n ≥ (tα/2 + tβ)² S² / d̄² = (2 + 1.3)²(9.0667)/(0.8)² = (3.3)²(9.0667)/0.64 = 154.27

And now we have an estimate of n and the degrees of freedom: n = 155 and d.f. = 154. We can refine our values for tα/2 and tβ: for d.f. = 154, tα/2 ≈ 1.97 and tβ ≈ 1.287. So we redo the calculations with the improved estimates:

   n ≥ (1.97 + 1.287)²(9.0667)/(0.8)² = (3.257)²(9.0667)/0.64 = 150.28

A little improvement! If we saw much change in the estimate of n, we could recalculate as often as necessary. Usually 3 or 4 recalculations are enough.
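The iteration is easy to automate. A sketch (my own loop, not one of the course programs) that repeats the calculation until n stabilizes:

   data iterate;
      dbar = 0.8;  s2 = 9.0667;          * difference to detect and sample variance;
      alpha = 0.05;  beta = 0.10;
      n = 155;                           * starting value from the first approximation;
      do iter = 1 to 20;
         df = n - 1;
         ta = tinv(1 - alpha/2, df);     * two-tailed alpha;
         tb = tinv(1 - beta, df);        * one-tailed beta;
         n_next = ceil((ta + tb)**2 * s2 / dbar**2);
         if n_next = n then leave;       * estimate has stabilized;
         n = n_next;
      end;
      put n=;                            * settles near n = 151, consistent with 150.28 above;
   run;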
Summary

We developed a formula for calculating sample size that can be adapted for either the t or Z distribution:

   n ≥ (tα/2 + tβ)² S² / d̄²

We learned that we need input values of α, β, S² (or σ² for Z tests), and a value for the size of the difference to be detected (d̄). For the t-test, the first calculation was only approximate, since we didn't know the degrees of freedom. However, after the initial calculation the estimate could be improved by iterative recalculation of the estimate of n until it was stable.

James P. Geaghan, Copyright 2010