Two- and Three-way ANOVA Models
Stat 640

Two-way analysis of variance refers to a model with a continuous response and two categorical predictor variables. Of course, one way to approach this problem is to create a lot of dummy variables and do a regression, so that the design matrix is full of zeros and ones. If the first categorical variable x_1 has b levels, and x_2 has k levels, then we have b + k - 1 columns in our design matrix (including the one-vector column): one for the one-vector, b - 1 for x_1, and k - 1 for x_2. The dummy variable approach is appropriate for observational data, where the design is not complete, that is, when there is not the same number of observations of the response y for each of the pairs of levels of the variables.

However, when the data are from a designed experiment and complete, the approach looks more like an ANOVA than a regression. The difference is in how the model is written and in the organization of the ANOVA output, but the projection of y onto the (b + k - 1)-dimensional subspace is the same.

The one-way ANOVA compares the means of several populations. The data could come from an experiment or from an observational study. When the data are from an experiment, the ANOVA is said to have a completely randomized design. The levels of the predictor variable are called treatments, and sometimes the model sum of squares is labeled SSTR instead of SSM.

For example, suppose a farmer wants to test three different fertilizers used in wheat production. There are twelve fields, so that four fields are randomly assigned to each fertilizer. Further, the farmer grows four different varieties of wheat, and wants to test the fertilizer on all of the varieties. He randomly assigns a different variety to each field, for each fertilizer, so that there is one variety/fertilizer combination for each observation.
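The column count for the dummy-variable design matrix can be checked with a small sketch. It builds the matrix for a complete layout with one observation per pair of levels, using b = 3 and k = 4 to match the fertilizer example. The function name and the choice of the first level of each variable as the baseline are assumptions made for illustration; any full-rank coding of the additive model gives the same number of columns.

```python
from itertools import product


def additive_design_matrix(b, k):
    """Build a 0/1 design matrix for two categorical predictors.

    Hypothetical helper: assumes one observation per (x1, x2) pair and
    treats level 0 of each variable as the baseline (an assumption).
    Columns: one-vector, b - 1 dummies for x1, k - 1 dummies for x2.
    """
    rows = []
    for i, j in product(range(b), range(k)):
        x1_dummies = [1 if i == level else 0 for level in range(1, b)]
        x2_dummies = [1 if j == level else 0 for level in range(1, k)]
        rows.append([1] + x1_dummies + x2_dummies)
    return rows


X = additive_design_matrix(b=3, k=4)
n_rows, n_cols = len(X), len(X[0])
print(n_rows, n_cols)  # 12 observations, 3 + 4 - 1 = 6 columns
```

Each row has a leading 1 followed by at most one 1 in the x_1 block and at most one 1 in the x_2 block, which is what "full of zeros and ones" looks like in practice.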
Yield by variety (rows) and fertilizer (columns):

                Fertilizer
Variety      1       2       3
1          20.0    22.3    21.1
2          24.5    25.6    24.1
3          25.4    27.0    26.1
4          24.5    25.1    24.3
mean       23.4    25.0    23.9
std dev    2.37    1.97    2.07

We can plot the data:

[Figure: yield (vertical axis, 20 to 28) plotted against fertilizer type (Fert 1, Fert 2, Fert 3).]

By looking at the plot, can we tell whether we are going to reject H_0: the means of the yields are the same for all fertilizers? It looks as though Fertilizer 2 has the highest yields, but if we move the top observation from Fertilizer 2 to underneath the other observations, the means no longer look different. Because the apparent difference depends on only one point, we guess that the differences between fertilizers are not statistically significant.

The results for the completely randomized design analysis are as follows. Note that the label for treatment is the variable name, Fertilizer.

Source        SS      df    MS      F-stat   p-value
Fertilizer    5.36    2     2.68    0.583    0.58
Error         41.4    9     4.6
Total         46.8

Because the p-value is large, we decide not to reject H_0, and we conclude that the data show no significant difference among the fertilizers. However, we know that there is variation in yield that is due to variety; some varieties have higher average yields than others. Even though every fertilizer-variety ...
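The completely randomized design table above can be reproduced, up to rounding, from the printed means and standard deviations alone. The sketch below is an illustration, not the course's own code, and all variable names are hypothetical. It works from the summary rows rather than the raw yields, because the Fertilizer 1 column as printed averages 23.6 rather than the listed 23.4, so one raw value may have been mis-transcribed. The p-value line uses the closed form of the F upper tail that holds only when the numerator degrees of freedom equal 2.

```python
# Summary statistics as printed in the data table (4 fields per fertilizer).
means = [23.4, 25.0, 23.9]
std_devs = [2.37, 1.97, 2.07]
n = 4  # observations per fertilizer

t = len(means)                # number of treatments
grand_mean = sum(means) / t   # valid because group sizes are equal

# Between-treatment and within-treatment sums of squares.
ss_treatment = n * sum((m - grand_mean) ** 2 for m in means)
ss_error = (n - 1) * sum(s ** 2 for s in std_devs)
ss_total = ss_treatment + ss_error

df_treatment = t - 1          # 2
df_error = t * (n - 1)        # 9

ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_stat = ms_treatment / ms_error

# Upper-tail probability of F(2, df_error); this closed form is a
# special case that applies only when df_treatment == 2.
p_value = (1 + df_treatment * f_stat / df_error) ** (-df_error / 2)

print(f"SSTR = {ss_treatment:.2f}, SSE = {ss_error:.1f}, SST = {ss_total:.1f}")
print(f"F = {f_stat:.3f}, p-value = {p_value:.2f}")
```

For other numerators of degrees of freedom, a library routine for the F distribution would be needed in place of the closed form.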
 Spring '08
 MaryMeyer
 Variance
