Lecture L: One-way Analysis of Variance (ANOVA)
Stat 350 (Knapp), Spring 2009
Text Sections 9.1, 9.2, and 9.3

Testing for a difference in the means among several (> 2) groups.

Example A (Cereal)
Acme Food Company wished to test five different package designs for a new breakfast cereal. Fifty stores, with approximately equal sales volumes, were selected as the experimental units. Each store was randomly assigned one of the package designs, with each package design assigned to ten stores. Sales, in numbers of cases, were observed for the study period.

            Observations (cases sold)             x̄       s          s²
Design A    73 83 78 86 78 78 79 76 72 72         77.5     4.57651    20.94444
Design B    58 82 70 76 81 104 67 79 67 69        75.3    12.5614    157.7889
Design C    82 67 94 83 80 71 104 87 73 77        81.8    11.1036    123.2889
Design D    94 85 93 79 69 81 103 114 77 87       88.2    13.2648    175.9556
Design E    104 81 88 94 101 81 80 91 90 86       89.6     8.26236    68.26667
Combined                                           82.48   11.5339    133.0302

[Figure: comparative plot of cases sold for Designs A through E]

Test:
H0: μ1 = μ2 = . . . = μk
vs.
Ha: not all the means are equal

Explanatory variable (a.k.a. independent variable, predictor variable): a categorical variable. In ANOVA, it is also called the treatment or factor. Each treatment/factor has several levels (treatment levels).

Response variable (a.k.a. dependent variable): a quantitative variable.

Test statistic = (between-group variation) / (within-group variation)

[Figure: two example plots of grouped data]

One-way Analysis of Variance (ANOVA), a.k.a. Single-Factor ANOVA
Design: completely randomized design

Notation
Assume we are testing for differences among k treatment levels.
• xij = the jth observation in the ith treatment level
• n1, n2, . . . , nk are the sample sizes in groups 1, 2, . . . , k respectively
• n = n1 + n2 + . . . + nk is the total sample size
• x̄1, x̄2, . . . , x̄k are the sample means in groups 1, 2, . . . , k respectively
• s1², s2², . . . , sk² are the sample variances in groups 1, 2, . . . , k respectively
• x̄ is the overall average (the average of all n observations)

Assumptions
(1) The variances are the same across the k populations: σ1² = σ2² = . . . = σk²
(2) Each of the k populations follows a normal distribution

Treatment Sum of Squares: SSTr
SSTr = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + . . . + nk(x̄k − x̄)²

Error Sum of Squares: SSE
SSE = Σ_{j=1..n1} (x1j − x̄1)² + Σ_{j=1..n2} (x2j − x̄2)² + . . . + Σ_{j=1..nk} (xkj − x̄k)²
    = (n1 − 1)s1² + (n2 − 1)s2² + . . . + (nk − 1)sk²

Total Sum of Squares: SST
SST = Σ_{i=1..k} Σ_{j=1..ni} (xij − x̄)² = (n − 1)s²
SST = SSTr + SSE

Degrees of Freedom
• Total degrees of freedom = n − 1 (associated with SST)
• Treatment degrees of freedom = k − 1 (associated with SSTr)
• Error degrees of freedom = n − k (associated with SSE)

Mean square for treatments (between-groups): MSTr
MSTr = SSTr / (k − 1)

Mean square error (within-groups): MSE
MSE = SSE / (n − k)

Test Statistic: F
F = MSTr / MSE = [SSTr / (k − 1)] / [SSE / (n − k)]

Under the null hypothesis, this statistic has an F-distribution with k − 1 numerator degrees of freedom and n − k denominator degrees of freedom.

Example Data Set B: a simple data set for illustration

Group      Observations         Mean        Variance
A          73 83 78             78          25
B          58 82 70 76 81       73.4        96.8
C          61 105 59 103        82          646.6667
Overall    (all 12 values)      77.41667    231.1742

SSTr =
SSE =
SST =
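As a check on these formulas, the sums of squares for Data Set B can be computed directly from the group sizes, means, and variances listed above (a worked sketch; values rounded):

SSTr = 3(78 − 77.41667)² + 5(73.4 − 77.41667)² + 4(82 − 77.41667)² ≈ 1.02 + 80.67 + 84.03 ≈ 165.72
SSE = (3 − 1)(25) + (5 − 1)(96.8) + (4 − 1)(646.6667) = 50 + 387.2 + 1940 = 2377.2
SST = SSTr + SSE ≈ 2542.92, which agrees with (n − 1)s² = (12 − 1)(231.1742) ≈ 2542.92

With k = 3 groups and n = 12 observations, MSTr = 165.72/2 ≈ 82.86, MSE = 2377.2/9 ≈ 264.1, and F = MSTr/MSE ≈ 0.31 with (2, 9) degrees of freedom.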
Example B (continued)

Group      Observations         Mean        Variance
A          73 83 78             78          25
B          58 82 70 76 81       73.4        96.8
C          61 105 59 103        82          646.6667
Overall    (all 12 values)      77.41667    231.1742

MSTr =
MSE =
Test statistic F =

ANOVA Table – summarizes the results of an ANOVA

Source of variation    df       SS      MS      F
Treatments             k − 1    SSTr    MSTr    MSTr/MSE
Error                  n − k    SSE     MSE
Total                  n − 1    SST

SAS code for Example A (Cereal)

data cereal;
  input design $ cases;
  cards;
A 73
A 83
A 78
A 86
A 78
A 78
A 79
A 76
A 72
A 72
B 58
B 82
B 70
B 76
B 81
B 104
B 67
B 79
B 67
B 69
C 82
C 67
C 94
C 83
C 80
C 71
C 104
C 87
C 73
C 77
D 94
D 85
D 93
D 79
D 69
D 81
D 103
D 114
D 77
D 87
E 104
E 81
E 88
E 94
E 101
E 81
E 80
E 91
E 90
E 86
;
proc anova data=cereal;
  class design;
  model cases = design;
  means design;
run;

SAS Output for Example A (Cereal)

The ANOVA Procedure

Class Level Information
Class      Levels    Values
design     5         A B C D E

Number of Observations Read    50
Number of Observations Used    50

Dependent Variable: cases

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       1602.280000     400.570000       3.67    0.0114
Error              45       4916.200000     109.248889
Corrected Total    49       6518.480000

R-Square    Coeff Var    Root MSE    cases Mean
0.245806    12.67243     10.45222    82.48000

Source    DF    Anova SS       Mean Square    F Value    Pr > F
design     4    1602.280000    400.570000        3.67    0.0114

Level of design    N     Mean (cases)    Std Dev (cases)
A                  10    77.5000000       4.5765101
B                  10    75.3000000      12.5614047
C                  10    81.8000000      11.1035530
D                  10    88.2000000      13.2648240
E                  10    89.6000000       8.2623645

Main Effects Plots – a plot of the mean for each treatment group
[Figure: Example A data set, main effects plot of the design means]

Interval Plots – a plot of the mean for each treatment group, with bars representing a confidence interval for the mean
[Figure: Example A data set, interval plot with 95% CI for each design mean]

Multiple Comparisons Procedures
(1) Tukey Test
(2) Dunnett's Method – multiple comparisons to a control
** Only do these if the ANOVA test was significant.

Multiple Comparison Procedure: Tukey Test
Family-wise error rate – for a significance level α, the Tukey procedure assures that the probability of making a Type I error is at most α for the entire collection of pairwise hypothesis tests.

How the Tukey test works
• The test determines how far apart a pair of treatment means would have to be to be "significantly different", based on the significance level α, the MSE from the ANOVA test, and the sample size.
• Let T = the minimum difference between treatment means that we will call "significant". This is the "threshold value". If a pair of treatment means differ by more than T, we will conclude that those groups have significantly different means.
• T = qα √(MSE / ni)
  o qα can be found in Appendix Table IX, based on k (the number of treatment groups) and the error df = n − k for a one-way ANOVA
  o ni is the number of observations in each group
• For each pair of treatments, see whether |x̄i − x̄j| > T. If so, declare that Treatment i and Treatment j have significantly different means.

Conveying results of the Tukey Test
A common approach is to line up the sample means in increasing (or decreasing) order, then draw bars across groups whose means are not significantly different. Another option is to use letters – treatments shown with the same letter are not significantly different.
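Before filling in the example below, here is a rough sketch of the threshold calculation for Example A, using MSE = 109.2489 from the ANOVA output above and the studentized-range critical value q0.05 ≈ 4.02 (SAS reports 4.01842 in the Tukey output below):

T ≈ 4.02 × √(109.2489 / 10) ≈ 13.3

This matches the "Minimum Significant Difference" that SAS reports.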
Example A – Tukey Test

Finding qα:
T =

Compare all pairs of means.

x̄:    Design B    Design A    Design C    Design D    Design E
       75.3        77.5        81.8        88.2        89.6

Difference in means (row minus column):
         B       A       C       D
A       2.2
C       6.5     4.3
D      12.9    10.7     6.4
E      14.3    12.1     7.8     1.4

This is how the results should be displayed:

Design B    Design A    Design C    Design D    Design E
75.3        77.5        81.8        88.2        89.6

SAS Code for Tukey Test on Example A

proc anova data=cereal;
  class design;
  model cases = design;
  means design/tukey alpha=0.05;
run;

Note: if your design is unbalanced, you have to use "means design/lines tukey;" to get equivalent output.

SAS Output for Tukey Test on Example A

The ANOVA Procedure
Tukey's Studentized Range (HSD) Test for cases
NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than REGWQ.

Alpha                                  0.05
Error Degrees of Freedom               45
Error Mean Square                      109.2489
Critical Value of Studentized Range    4.01842
Minimum Significant Difference         13.282

Means with the same letter are not significantly different.

Tukey Grouping     Mean      N    design
       A          89.600    10    E
    B  A          88.200    10    D
    B  A          81.800    10    C
    B  A          77.500    10    A
    B             75.300    10    B

Example A data set – bar chart displaying the means of each group
[Figure: bar chart of the group means; letters represent the results of the Tukey test; error bars are the standard errors of the means]

Multiple Comparison Procedure: Dunnett's Method
Many scientific studies include a "control". If you don't care whether the various treatment groups differ from each other, only whether they differ from the control, Dunnett's test is appropriate. Note that for k treatment groups there are k(k − 1)/2 pairs of groups to test, but if we only care about comparing each group to the control there are only k − 1 tests.

How Dunnett's Method works
As with the Tukey test, we calculate a threshold (T). Groups whose means are more than T away from the control group mean will be declared significantly different from the control group.

T = tα,(k−1, n−k) √( MSE (1/ni + 1/nc) )

MSE is the mean squared error from the ANOVA
nc = the number of observations in the control group
ni = the number of observations in the ith group
tα,(k−1, n−k) is Dunnett's t, the critical value from Appendix Table X, based on k − 1 and n − k degrees of freedom

Example A – Dunnett's Method
For the sake of illustration, pretend that Design A was the control group.

tα,(k−1, n−k) =
T =

x̄:    Design A    Design B    Design C    Design D    Design E
       77.5        75.3        81.8        88.2        89.6

SAS Code for Dunnett's Method on Example A

proc anova data=cereal;
  class design;
  model cases = design;
  means design/dunnett('A') alpha=0.05;
run;

SAS Output for Dunnett's Method on Example A

Dunnett's t Tests for cases
NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control.

Alpha                             0.05
Error Degrees of Freedom          45
Error Mean Square                 109.2489
Critical Value of Dunnett's t     2.53129
Minimum Significant Difference    11.832

Comparisons significant at the 0.05 level are indicated by ***.

design        Difference        Simultaneous 95%
Comparison    Between Means     Confidence Limits
E - A             12.100          0.268    23.932    ***
D - A             10.700         -1.132    22.532
C - A              4.300         -7.532    16.132
B - A             -2.200        -14.032     9.632
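As a check on the threshold formula, the "Minimum Significant Difference" in this output can be reproduced from the quantities above (with ni = nc = 10):

T = 2.53129 × √( 109.2489 × (1/10 + 1/10) ) ≈ 2.53129 × 4.674 ≈ 11.83

Only Design E is more than T above the control mean (12.100 > 11.83), so only the E − A comparison is flagged with ***.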
You may wish to conduct one-tailed tests. For example, you may wish to test whether the other designs have significantly higher sales than the current design (not just significantly different sales).

SAS code to test whether the other treatments are significantly greater than the control

proc anova data=cereal;
  class design;
  model cases = design;
  means design/dunnettu('A') alpha=0.05;
run;

Dunnett's One-tailed t Tests for cases
NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control.

Alpha                             0.05
Error Degrees of Freedom          45
Error Mean Square                 109.2489
Critical Value of Dunnett's t     2.22241
Minimum Significant Difference    10.388

Comparisons significant at the 0.05 level are indicated by ***.

design        Difference        Simultaneous 95%
Comparison    Between Means     Confidence Limits
E - A             12.100          1.712    Infinity    ***
D - A             10.700          0.312    Infinity    ***
C - A              4.300         -6.088    Infinity
B - A             -2.200        -12.588    Infinity

SAS code to test whether the other treatments are significantly less than the control

proc anova data=cereal;
  class design;
  model cases = design;
  means design/dunnettl('A');
run;

Dunnett's One-tailed t Tests for cases
NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control.

Alpha                             0.05
Error Degrees of Freedom          45
Error Mean Square                 109.2489
Critical Value of Dunnett's t     2.22241
Minimum Significant Difference    10.388

Comparisons significant at the 0.05 level are indicated by ***.

design        Difference        Simultaneous 95%
Comparison    Between Means     Confidence Limits
E - A             12.100       -Infinity    22.488
D - A             10.700       -Infinity    21.088
C - A              4.300       -Infinity    14.688
B - A             -2.200       -Infinity     8.188

Fixed Effects versus Random Effects

Fixed factor / fixed effects models – the k treatment levels we are using are the only ones of interest in the experiment, and conclusions will not be drawn beyond those k treatments.

Random factor / random effects models – the k treatment levels are only a sample from the population of possible levels of interest. We are not interested in the difference in means between the particular levels we happened to sample; instead we want to estimate how much of the variability in the sample is due to the treatment levels versus experimental error.

σ² = στ² + σε²

Hypotheses:  H0: στ² = 0   vs.   Ha: στ² > 0

σ̂ε² = MSE
σ̂τ² = (MSTr − MSE) / ni

Calculation of the F statistic is the same for random and fixed factors in a one-way ANOVA.

Example A – assume for the sake of illustration that the design is a random factor.
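For Example A, the variance-component estimates can be read off the ANOVA output shown earlier, MSTr = 400.57 and MSE = 109.25 with ni = 10 (a sketch only; treating design as a random factor here is purely illustrative):

σ̂ε² = MSE ≈ 109.25
σ̂τ² = (MSTr − MSE) / ni = (400.57 − 109.25) / 10 ≈ 29.13

If you want SAS to estimate variance components directly, procedures such as PROC VARCOMP or PROC MIXED can do so; that goes beyond the PROC ANOVA code used in this lecture.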
One-Way ANOVA vs. Pooled t-test

Recall that ANOVA assumes all groups have the same standard deviation (estimated by √MSE, the Root MSE). A one-way ANOVA with only 2 groups is equivalent to a pooled t-test.

Example: Cereal, comparing only Designs A and B.

data cereal2;
  input design $ cases;
  cards;
A 73
A 83
A 78
A 86
A 78
A 78
A 79
A 76
A 72
A 72
B 58
B 82
B 70
B 76
B 81
B 104
B 67
B 79
B 67
B 69
;
proc anova data=cereal2;
  class design;
  model cases = design;
run;
proc ttest data=cereal2;
  class design;
  var cases;
run;

SAS output: ANOVA

The ANOVA Procedure

Class Level Information
Class      Levels    Values
design     2         A B

Number of Observations Read    20
Number of Observations Used    20

Dependent Variable: cases

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        24.200000      24.200000       0.27     0.6091
Error              18      1608.600000      89.366667
Corrected Total    19      1632.800000

R-Square    Coeff Var    Root MSE    cases Mean
0.014821    12.37355     9.453394    76.40000

Source    DF    Anova SS       Mean Square    F Value    Pr > F
design     1    24.20000000    24.20000000       0.27    0.6091

SAS output: t-test

The TTEST Procedure

Statistics
Variable  design      N    Lower CL Mean    Mean    Upper CL Mean    Lower CL Std Dev    Std Dev    Upper CL Std Dev    Std Err
cases     A          10    74.226           77.5    80.774            3.1479              4.5765     8.3549             1.4472
cases     B          10    66.314           75.3    84.286            8.6402             12.561     22.932              3.9723
cases     Diff (1-2)       -6.682            2.2    11.082            7.1431              9.4534    13.98               4.2277

T-Tests
Variable    Method           Variances    DF      t Value    Pr > |t|
cases       Pooled           Equal        18      0.52       0.6091
cases       Satterthwaite    Unequal      11.3    0.52       0.6128

Equality of Variances
Variable    Method      Num DF    Den DF    F Value    Pr > F
cases       Folded F    9         9         7.53       0.0060
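Notice how the two sets of output line up (all values quoted from the output above):

• The two-group ANOVA F statistic is the square of the pooled t statistic: 0.52² ≈ 0.27, and the p-values match (0.6091).
• The Root MSE from the ANOVA (9.4534) equals the pooled standard deviation shown on the Diff (1-2) line of the t-test output.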