EXST 7015, Fall 2011, Lecture 19 - Statistical Techniques II
So, we would test each pair of means using the two-sample t-test as

$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$.

For ANOVA, using the MSE as our variance estimate, we have

$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{MSE\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$.

If the design is balanced this simplifies to

$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{2\,MSE}{n}}}$.

Notice that if the calculated value of t is greater than the tabular value of t, we would reject the null hypothesis. To the contrary, if the calculated value of t is less than the tabular value, we would fail to reject. Call the tabular value $t^*$ and write the case for rejection of $H_0$ as

$\dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\frac{2\,MSE}{n}}} \ge t^*$,  or equivalently  $(\bar{Y}_1 - \bar{Y}_2) \ge t^*\sqrt{\dfrac{2\,MSE}{n}}$.

So, for any difference $(\bar{Y}_1 - \bar{Y}_2)$ that is greater than $t^*\sqrt{2\,MSE/n}$ we find the difference between the means to be statistically significant (reject $H_0$), and for any difference less than this value we find the difference to be consistent with the null hypothesis. Right?

This value, $t^*\sqrt{2\,MSE/n}$, is what R. A. Fisher called the "Least Significant Difference", commonly called the LSD (not to be confused with the Latin Square Design = LSD). In general,

$LSD = t_{critical}\sqrt{MSE\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$   or   $LSD = t_{critical}\,S_{\bar{Y}_1-\bar{Y}_2}$.

This value is the exact width of an interval $\bar{Y}_1 - \bar{Y}_2$ which would give a t-test equal to $t_{critical}$; any larger value would be "significant" and any smaller value would not. We calculate this value for each pair of means, and if the observed difference is less, the treatments are "not significantly different"; if it is greater, they are "significantly different".

One last detail. I have used the simpler version of the variance, assuming that $n_1 = n_2$. If the experiment is unbalanced (i.e. there are unequal numbers of observations in the treatment levels) then the standard error is $\sqrt{MSE\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$. The property of balance is nice because all of the pairwise tests have the same sample sizes and the same standard error. However, balance is not necessary. For an unbalanced design we must calculate the standard error $\sqrt{MSE\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$ separately for each pairwise test because they will differ.

This is the first of our post-ANOVA tests; it is called the "LSD". But hey, wait a minute! Didn't Fisher invent ANOVA in the first place to avoid doing a bunch of separate t-tests? So now we are doing a bunch of separate t-tests. What is wrong with this picture?

This is Fisher's solution. When we do a bunch of separate t-tests, we don't know whether there are any real differences at the α level. After we do the ANOVA test, we know that there are some differences. So we only do the LSD if the ANOVA says that there actually are differences; otherwise, we don't do the LSD. This is called "Fisher's Protected LSD": we use the LSD ONLY if the ANOVA shows differences, otherwise we are NOT justified in using the LSD. Makes sense. But there were still a lot of nervous statisticians looking for something a little better. As a result there are MANY alternative calculations. We will discuss the "classic" solutions.

The least significant difference calculation can be used either to do pairwise tests on observed differences or to place a confidence interval on observed differences. The LSD can be done in SAS in one of two ways. The MEANS statement produces a range test (LINES option) or confidence intervals (CLDIFF option), while the LSMEANS statement gives pairwise comparisons.
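In SAS these two presentations might be requested as in the following sketch. This is only a minimal illustration: the data set name (mydata), the response (y), and the treatment variable (trt) are placeholder names, not the names used in the course appendix.

    * Fisher's (protected) LSD after a one-way ANOVA;
    proc glm data=mydata;
      class trt;
      model y = trt;            * check the overall F-test first (the "protection");
      means trt / lsd lines;    * range-test presentation: shared letters;
      means trt / lsd cldiff;   * confidence intervals on pairwise differences;
      lsmeans trt / pdiff;      * pairwise comparisons from the LSMEANS statement;
    run;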
Other Post-ANOVA tests

Basically, we calculate the LSD with our chosen value of α and then do our mean comparisons. Each test has a pairwise error rate of α. We have already seen one alternative, the Bonferroni adjustment. If we do 5 tests, or 10 tests, our error rate is no more than 5(α/2) or 10(α/2). Generally, for "g" tests our error rate is no more than g(α/2). To keep an experimentwise error rate of α, we simply do each comparison using a t value for an error rate equal to α/(2g). For two-tailed tests (which are the most common) we do each test at α/2, so the Bonferroni test would use a t value for an error rate of $\frac{\alpha}{2g}$. One-tailed tests are possible, but are usually only done with Dunnett's test, discussed below.

The Bonferroni adjustment is fine if we are only doing a few tests. However, it is an upper bound on the error, the highest that the error can be. The real probability of error is actually less, perhaps much less. So if we are doing very many tests, Bonferroni gets very conservative, giving us an actual error rate much lower than the α we really want. So we seek alternatives. The major alternatives are Tukey's and Scheffé's. We will also consider Dunnett's and Duncan's since they are fairly common. Each of these tests is discussed below.

The LSD has an α probability of error on each and every test, that is, for each comparison. This is referred to as a comparisonwise error rate. The whole idea of ANOVA is to give a probability of error that is α for the whole experiment, so much work in statistics has been dedicated to this problem. Some of the most common and popular alternatives are discussed below. Most of these are also discussed in your textbook.

The LSD is the LEAST conservative of those discussed, meaning it is the one most likely to detect a difference and also the one most likely to make a Type I error when it finds a difference. However, since it is unlikely to miss a difference that is real, it is also the most powerful. The probability distribution used to produce the LSD is the t distribution.

Bonferroni's adjustment. Bonferroni pointed out that in doing k tests, each at a probability of Type I error equal to α, the overall experimentwise probability of Type I error will be NO MORE than kα, where k is the number of tests. Therefore, if we do 7 tests, each at α = 0.05, the overall rate of error will be NO MORE than 7 × 0.05 = 0.35, or 35%. So, if we want to do 7 tests and keep an error rate of 5% overall, we can do each individual test at a rate of α/k = 0.05/7 = 0.007143. For the 7 tests we then have an overall rate of no more than 7 × 0.007143 = 0.05. The probability distribution used for the Bonferroni adjustment is also the t distribution.

Duncan's multiple range test. This test is intended to give groupings of means that are not significantly different among themselves. The error rate is α for each group, and has sometimes been called a familywise error rate. This is done in a manner similar to Bonferroni, except that the error rate for comparing two means that are r steps apart (adjacent means have r = 2) is calculated as $1-(1-\alpha)^{r-1}$ instead of as a sum of α's. Two means separated by 3 other means would have r = 5, and the error rate would be $1-(1-\alpha)^{r-1} = 1-(1-0.05)^{4} = 0.1855$. The value of α needed to keep an error rate of 0.05 is found by reversing this calculation: $1-(1-0.05)^{1/4} = 0.0127$.

The Student-Newman-Keuls test is similar to Duncan's, controlling the familywise error rate; the value of α is calculated as $1-(1-\alpha)^{k/2}$.
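The arithmetic for these adjustments is simple enough to verify directly. The following SAS data step is a small sketch of the calculations above; the variable names are mine, not from the notes.

    * Arithmetic behind the Bonferroni and Duncan adjustments;
    data adjustments;
      alpha = 0.05;
      k = 7;                                        * number of tests planned;
      bonferroni_alpha = alpha / k;                 * per-test rate, about 0.007143;
      r = 5;                                        * means that are 5 steps apart;
      duncan_rate  = 1 - (1 - alpha)**(r - 1);      * about 0.1855;
      duncan_alpha = 1 - (1 - alpha)**(1/(r - 1));  * about 0.0127;
      put bonferroni_alpha= duncan_rate= duncan_alpha=;
    run;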
Tukey's adjustment. This test seems to be the most appropriate in most cases, since it keeps an error rate of α for all possible pairwise tests for the whole experiment, which is often exactly what an investigator wants to do. The test allows for all pairwise comparisons while keeping an experimentwise error rate of α. Tukey developed his own tables for this adjustment (see Appendix table A.7 in your book, "Percentage points of the studentized range"). For "t" treatments and a given error degrees of freedom, the table provides critical values for 5% and 1% experimentwise rates of Type I error. Note that SAS puts "HSD" by Tukey's test; this stands for "Honest Significant Difference".

Scheffé's adjustment. This test is the most conservative. It allows the investigator to do all possible tests and still maintain an experimentwise error rate of α. "All possible" tests includes not only all pairwise tests, but also comparisons of any combination of treatments with any other combination of treatments (e.g. $H_0: \frac{\mu_1+\mu_2}{2} = \frac{\mu_3+\mu_4+\mu_5}{3}$; CONTRASTS will be covered later). The calculation is based on a square root of the F distribution, and can be used for range-type tests or for confidence intervals. The test is more general than the others mentioned, which address the special case of pairwise comparisons. The critical value for Scheffé's test is based on the F distribution; the statistic is $\sqrt{(t-1)\,F_{t-1,\;t(n-1)}}$ for a balanced design with t treatments and n observations per treatment. This test is appropriate for "data dredging".

Placing the post hoc tests above in order from the one most likely to detect a difference (and the one most likely to be wrong) to the one least likely to detect a difference (and the one least likely to be wrong): LSD is first, followed by Duncan's test, then Tukey's, and finally Scheffé's. Dunnett's is a special test that is similar to Tukey's but serves a specific purpose, so it does not fit well in the ranking.

The Bonferroni approach produces an upper bound on the error rate, so it is conservative for a given number of tests. It is a useful approach if you want to do only a few tests, fewer than allowed by one of the others (e.g. you may want to do just a few, and not all possible, pairwise comparisons). In this case the Bonferroni may be better. Note that if you want to do a couple of pairwise tests, you can calculate the Bonferroni critical value and compare it to Tukey's. Tukey's is designed for all pairwise tests and would be conservative for fewer than all pairwise tests, while Bonferroni may be overly conservative because it is a bound. For other sets of tests, including some that are not pairwise, compare Bonferroni to Scheffé.

Post-ANOVA test comparison.
Comparisonwise error rate: LSD.
Experimentwise error rate: Tukey (all pairwise tests), Bonferroni (selected tests), Scheffé (all possible contrasts).

When doing pairwise tests, the LSD is the test most likely to find differences, and the one most likely to be wrong when it finds a difference. However, power is the ability to find differences, so although the LSD is error prone in the Type I error sense, it is the most powerful of the tests. Scheffé is the test least likely to find a difference, and the least likely to be wrong with respect to Type I error.

Other tests are used in particular circumstances. We will mention only Dunnett's, which is used to compare one treatment (usually a control) to all of the other treatments. This is the only post hoc test in SAS that has one-tailed versions (e.g. DUNNETTL and DUNNETTU).
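In SAS these adjustments can be requested from the same analysis. The sketch below is illustrative only: mydata, y, and trt are placeholder names, and 'control' stands for whatever the value of the control level actually is. The second step simply evaluates the Scheffé critical value for an assumed design with t = 5 treatments and n = 4 observations per treatment.

    * Requesting the other adjustments;
    proc glm data=mydata;
      class trt;
      model y = trt;
      means trt / tukey lines;            * Tukey's HSD with letter groupings;
      means trt / scheffe cldiff;         * Scheffe intervals on the differences;
      means trt / dunnett('control');     * two-sided Dunnett against the control level;
      lsmeans trt / pdiff adjust=tukey;   * Tukey-adjusted pairwise p-values;
    run;

    * Scheffe critical value, sqrt((t-1)*F), for assumed t = 5 and n = 4;
    data scheffe;
      t = 5;
      n = 4;
      crit = sqrt((t - 1)*finv(0.95, t - 1, t*(n - 1)));
      put crit=;
    run;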
Applying Post-ANOVA test comparisons

All of these tests can be expressed in one of two ways. If the analysis is BALANCED, there is a popular presentation of pairwise tests that starts with ranked means. Suppose we calculate a value of the LSD equal to 8, and we have sorted the means of the treatment levels: 38, 29, 23, 17, 14, and 5. If the critical value of the LSD is 8, then means that differ by less than 8 do not differ statistically. This is represented by giving such means a common letter, so that means which do not differ share a letter.

For an LSD critical value of 8:

  Treatment Level    3     1     6     5     2     4
  Mean              38    29    23    17    14     5
  Groups             A     B    BC    CD     D     E

The same means compared with a Tukey-adjusted critical value of 10:

  Treatment Level    3     1     6     5     2     4
  Mean              38    29    23    17    14     5
  Groups             A    AB    BC     C    CD     D

The same means compared with a Scheffé-adjusted critical value of 15:

  Treatment Level    3     1     6     5     2     4
  Mean              38    29    23    17    14     5
  Groups             A    AB    AB    BC    BC     C

SAS Example (Appendix 12). Note the test of homogeneity of variance (RANDOM or REPEATED statement). Test the effects of TREATMENTS. Post hoc tests can be done from MIXED using the LSMEANS statement; in GLM either the MEANS or LSMEANS statement can be used. The SAS statements and results to compare (post-ANOVA or post hoc tests) include results with the LSD, with Tukey's, with Scheffé's, and with Dunnett's. NOTE that normally only one post-ANOVA examination would be done; we have done several here in order to compare them. Note also the use of a macro to get sorted and labeled means that indicate significant differences.

Comparison of ranked means works very well if the analysis is balanced. If the analysis is not balanced there can be a problem: it is possible that means that are close together are significantly different, while means that have a greater difference are not significantly different. For example, with MSE = 25 and the variance of a difference equal to $MSE\left(\frac{1}{n_1}+\frac{1}{n_2}\right)$:

  tmt   mean     n
   1     18      5
   2     13    100
   3     12      5

  test    diff      se     t value   d.f.   P value
  1 v 2     5     2.2913    2.1822    103   0.02398
  2 v 3     1     2.2913    0.4364    103   0.66343
  1 v 3     6     3.1623    1.8974      8   0.09435

For unbalanced tests the best way to check for a difference is to calculate a confidence interval for each pairwise difference and see whether it includes zero. By default, SAS will use this approach (confidence intervals rather than lines) for unbalanced means.
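Returning to the numbers above, the following SAS data step is a small sketch that reproduces the three pairwise tests. The data set and variable names are mine; the degrees of freedom follow the table above (n1 + n2 - 2 for each pair).

    * Pairwise t-tests for the unbalanced example (MSE = 25; means 18, 13, 12; n = 5, 100, 5);
    data pairwise;
      input pair $ diff n1 n2;
      mse = 25;
      se  = sqrt(mse*(1/n1 + 1/n2));    * standard error of each difference;
      t   = diff / se;
      df  = n1 + n2 - 2;                * d.f. as used in the table above;
      p   = 2*(1 - probt(abs(t), df));  * two-tailed p-value;
      datalines;
    1v2 5 5 100
    2v3 1 100 5
    1v3 6 5 5
    ;
    run;
    proc print data=pairwise; run;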
Post-ANOVA tests. Having rejected the null hypothesis in the Analysis of Variance, we would usually wish to determine how the treatment levels differ from each other. This is the "post-ANOVA" part of the analysis. These tests fall into two general categories. We have already discussed the post hoc tests (LSD, Tukey, Scheffé, Duncan's, Dunnett's, etc.). These tests are often (usually?) done with no a priori hypotheses in mind. That means we do not have any particular comparisons in mind before doing the experiment; we want to examine many, or all, levels of the treatments for differences from one another, and each test is done with a probability of error. The use of an experimentwise error rate is intended to permit these a posteriori comparisons without inflating the error rate for the analysis.

We will now discuss a priori tests, or pre-planned comparisons (contrasts). These a priori tests are better in many ways because the researcher plans on doing particular tests before the data are gathered. If we dedicate 1 d.f. to each one, we generally feel comfortable doing each test at some specified level of alpha, usually 0.05. However, since multiple tests do entail a risk of a higher experimentwise error rate, it would not be unreasonable to apply some technique, like Bonferroni's adjustment, to ensure an experimentwise error rate at the desired level of alpha (α). When we want some smaller number of comparisons, and they are determined a priori (without looking at the data), then we can use a less stringent criterion. We generally feel comfortable with one test per degree of freedom at some specified level of alpha (α), just as we did in regression (looking at each regression coefficient with an α level of error).
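As an illustration of a few pre-planned comparisons with a Bonferroni guard on the experimentwise rate, a minimal SAS sketch might look like the following. The data set and variable names are placeholders, and the contrast coefficients assume a treatment with three levels; contrasts themselves are covered later.

    * A few pre-planned comparisons, Bonferroni-adjusted (placeholder names);
    proc glm data=mydata;
      class trt;
      model y = trt;
      lsmeans trt / pdiff adjust=bon;      * Bonferroni-adjusted pairwise p-values;
      contrast 'trt1 vs trt2' trt 1 -1 0;  * an example 1-d.f. planned contrast;
    run;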