Statistical Techniques II
Expected Mean Squares (EMS)
Rules for creating EMS
Expected Mean Squares (EMS) for the Completely Random Design
1) Initially treat all effects as if they were random. Represent each source with a $\sigma^2$ and an
appropriate subscript. In the simplest case (a treatment and an error term):

Source       Levels   Var component
Error        n        $\sigma^2_\epsilon$ or $\sigma^2$
Treatment    t        $\sigma^2_\tau$

2) The lowest level source is generally represented as $\sigma^2$. Working up from the lowest level, each
source contains the representation for itself and all lower sources.
3) The coefficient for each component is the product (if balanced, otherwise see formulas) of the
number of observations within the cell for each source level. This will be the product of all the
subscripts which occur at lower levels.

SOURCE       d.f.      EMS (random effects)
Treatments   t–1       $\sigma^2 + n\sigma^2_\tau$
Error        t(n–1)    $\sigma^2$

4) FIXED effects: any component which is fixed is represented as a sum of squared effects, and is
deleted from all expectations except its own.

Example ANOVA table

SOURCE       d.f.      EMS (fixed effects)
Treatments   t–1       $\sigma^2 + n\sum\tau_i^2/(t-1)$
Error        t(n–1)    $\sigma^2$

NOTE that the F test is the same, in this case, for fixed and random effects.
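The NOTE above can be checked by writing the ratio of the two expected mean squares in each case. This is only a sketch: $\sigma^2$ denotes the error variance, $\sigma^2_\tau$ the treatment variance component, and $\tau_i$ the fixed treatment effects. The `cases` environment assumes the amsmath package.

```latex
\[
F = \frac{MS_{\text{Treatments}}}{MS_{\text{Error}}}, \qquad
\frac{E(MS_{\text{Treatments}})}{E(MS_{\text{Error}})} =
\begin{cases}
\dfrac{\sigma^2 + n\sigma^2_\tau}{\sigma^2} & \text{(random treatments)} \\[2ex]
\dfrac{\sigma^2 + n\sum_i \tau_i^2/(t-1)}{\sigma^2} & \text{(fixed treatments)}
\end{cases}
\]
```

In both cases the ratio reduces to 1 when the null hypothesis holds ($\sigma^2_\tau = 0$, or all $\tau_i = 0$), so the same error term is appropriate for either kind of treatment effect.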
For a design with more levels
e.g. Ponds, Sites(Ponds), Samples(Sites Ponds), Determinations(Samples Sites Ponds).
MODEL: $Y_{ijkl} = \mu + \tau_i + \gamma_{ij} + \delta_{ijk} + \epsilon_{ijkl}$
let Ponds = t = 5; Sites = s = 7; Samples = b = 3; Det = d = 4
This is the typical situation for a hierarchical design (fully nested)
1) Initially treat all effects as if random. Represent each source with a $\sigma^2$ and an appropriate
subscript. In this case:

Source               Levels   Var comp
Tmt                  t        $\sigma^2_\tau$
Experimental Error   s        $\sigma^2_\gamma$
Sampling Error       b        $\sigma^2_\delta$
SubSampling Error    d        $\sigma^2$

2) Working up from the lowest level, set up each source to contain itself and all lower sources.
3) This design is balanced, so the coefficient for each component is the product of the number
of observations within the cell (the product of all the subscripts which occur at lower levels).
Expected mean squares

SOURCE       d.f.        EMS coefficient
Ponds        t–1         $\sigma^2 + d\sigma^2_\delta + db\sigma^2_\gamma + dbs\sigma^2_\tau$
Sites(P)     t(s–1)      $\sigma^2 + d\sigma^2_\delta + db\sigma^2_\gamma$
Samp(P S)    ts(b–1)     $\sigma^2 + d\sigma^2_\delta$
Error        tsb(d–1)    $\sigma^2$
Total        tsbd – 1

Expected mean squares with numerical coefficients

SOURCE       d.f.        EMS coefficient
Ponds        t–1         $\sigma^2 + 4\sigma^2_\delta + 12\sigma^2_\gamma + 84\sigma^2_\tau$
Sites(P)     t(s–1)      $\sigma^2 + 4\sigma^2_\delta + 12\sigma^2_\gamma$
Samp(P S)    ts(b–1)     $\sigma^2 + 4\sigma^2_\delta$
Error        tsb(d–1)    $\sigma^2$
Total        tsbd – 1

For FIXED effects, any effect which is fixed is deleted from all expectations except its own. In
our CRD example it is unlikely that any source except treatments would be fixed, since the
others are sources of random error. Therefore, if the treatments were fixed we would only
change $dbs\sigma^2_\tau$ to $dbs\sum\tau_i^2/(t-1)$, or numerically $84\sigma^2_\tau$ becomes $84\sum\tau_i^2/4$ (that is, $21\sum\tau_i^2$), often represented as Q.
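The numerical coefficients follow directly from the values given for this example (t = 5, s = 7, b = 3, d = 4). The arithmetic is sketched below, assuming the subscripts $\tau$, $\gamma$ and $\delta$ for the Ponds, Sites and Samples components. That subscript notation is an assumption made for this sketch, and the `align*` environment assumes the amsmath package.

```latex
\begin{align*}
d &= 4, \qquad db = 4 \times 3 = 12, \qquad dbs = 4 \times 3 \times 7 = 84 \\
E(MS_{\text{Ponds}})     &= \sigma^2 + 4\sigma^2_\delta + 12\sigma^2_\gamma + 84\sigma^2_\tau
                          && \text{d.f.} = t-1 = 4 \\
E(MS_{\text{Sites(P)}})  &= \sigma^2 + 4\sigma^2_\delta + 12\sigma^2_\gamma
                          && \text{d.f.} = t(s-1) = 30 \\
E(MS_{\text{Samp(P S)}}) &= \sigma^2 + 4\sigma^2_\delta
                          && \text{d.f.} = ts(b-1) = 70 \\
E(MS_{\text{Error}})     &= \sigma^2
                          && \text{d.f.} = tsb(d-1) = 315
\end{align*}
```

Each line differs from the line below it by exactly one component, so in this balanced case each source is tested against the mean square of the source immediately beneath it (for example, Ponds against Sites(P)). The degrees of freedom sum to 4 + 30 + 70 + 315 = 419 = tsbd – 1.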
Block designs proceed the same way, except that the blocks are not nested and have their own
source line just like treatments. The interaction effect has (b–1)(t–1) d.f.
We will let SAS get most of our EMS.
However, you should be able to get EMS for simple designs and you should be able to examine
SAS produced EMS and determine which sources are tested with which error terms.
There is one other rule that SAS uses that will be relevant to treatment arrangements when we get
EMS for interactions of random effects are random.
EMS for interactions of fixed effects are fixed.
Interactions with mixed effects are random. If there is any random source in an interaction, that
interaction EMS is a random effect.
Statistics quote: Torture the data long enough and they will confess to anything. Anonymous
EMS, what good are they?
Once upon a time you had to be able to determine the EMS to figure out what terms got tested
with what error terms.
Then life got easy when SAS began providing the EMS in PROC GLM, but you still had to
specify which terms got tested with which other terms.
Then along comes PROC MIXED.
PROC MIXED no longer estimates the EMS; it estimates the individual variance
components of the EMS, but not the EMS themselves.
It will also determine which terms should be used to test other terms.
And life gets even easier.
But, you will notice in some more complicated Experimental Designs that the different sources
of variation do not all have the same degrees of freedom for the error.
We therefore look at the EMS in order to understand why.
PROC MIXED can also produce the EMS, but in order to get them you must force PROC
MIXED to do things the “old-fashioned way”.
So you will get the EMS, but will not get most of the other features of the more modern
analysis. This is done by specifying the option “METHOD=TYPE3” on the PROC MIXED statement.
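As an illustration, a fully nested design like the ponds example might be run as sketched below. The data set name `ponds` and the variable names `pond`, `site`, `sample` and `y` are hypothetical, chosen only for this sketch. The determinations are left out of the RANDOM statement so that they form the residual.

```sas
/* Sketch only: data set PONDS and variables pond, site, sample, y
   are hypothetical names, not taken from the course appendices.  */
proc mixed data=ponds method=type3;
   class pond site sample;
   model y = ;                        /* intercept only: no fixed effects  */
   random pond site(pond) sample(pond site);
                                      /* determinations are the residual   */
run;
```

With METHOD=TYPE3 the output includes an analysis of variance table that lists the expected mean square and the error term for each source.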
Hierarchical design example (Appendix 17a): from Bliss. (1967). Statistics in Biology. New York:
McGraw-Hill, 351.
Examine the dried egg experiment. A can of dried eggs was well mixed and samples taken from
this single source. Two samples labeled as two different types were sent to six commercial
laboratories and analyzed for fat content. Each laboratory was asked to assign two different
technicians so that both technicians analyzed two replicates of both "types". The null hypothesis,
that the mean fat content of each sample is equal, should be true since all samples came from the
same can of dried eggs.
The sources of variation are then lab, tech(lab), type(lab, tech) and rep(lab, tech, type).
This is a CRD with a nested error term (3 levels).
$Y_{ijkl} = \mu + \tau_i + \gamma_{ij} + \delta_{ijk} + \epsilon_{ijkl}$
A fully nested design is one place where the TYPE I SS can be used in Designed experiments.
TYPE III are used for everything else.
To do this example in PROC MIXED you only need to specify the treatment (either fixed in the
MODEL statement or random in the RANDOM statement). The additional random hierarchical
levels would go in the RANDOM statement (omitting the last as a residual error term).
There are two ways to do this in PROC GLM.
The TEST statement - where you specify the H=effect to be tested and the E= effect to use as
error term. You can also specify a Type SS (HTYPE and ETYPE). A test statement was used
for this problem.
The RANDOM statement, with the test option, will determine the error term to be used and
correct for unbalanced data. The test option was not requested for this example.
Note that SAS GLM always tests by default with the residual error term. This is not correct in
this example. The correct test was requested with the test statement, and Type I SS used.
tech(laboratory), type(laboratory, tech) and replicate(laboratory, tech, type)
TEST H=LAB E=TECH(LAB) / HTYPE=1 ETYPE=1;
TEST H=TECH(LAB) E=TYPE(TECH*LAB) / HTYPE=1 ETYPE=1;
Note that SAS GLM will provide you with the EMS with numeric coefficients for the
RANDOM statement (with or without the test option).
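Put together, a PROC GLM step for the dried egg example might look like the sketch below. The data set name `eggs` and the variable names `lab`, `tech`, `type` and `fat` are assumptions for illustration; the actual program is in Appendix 17a and may differ.

```sas
/* Sketch only: data set EGGS and variable names are hypothetical. */
proc glm data=eggs;
   class lab tech type;
   /* replicates are not listed, so they form the residual error */
   model fat = lab tech(lab) type(tech*lab) / ss1;
   /* RANDOM prints the EMS with numeric coefficients */
   random lab tech(lab) type(tech*lab);
   /* each source is tested against the source nested beneath it */
   test h=lab       e=tech(lab)      / htype=1 etype=1;
   test h=tech(lab) e=type(tech*lab) / htype=1 etype=1;
run;
quit;
```

The default F tests printed by the MODEL statement still use the residual, so the results to read here are those from the TEST statements.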
PROC MIXED is a newer alternative to solving this type of problem. This procedure works
differently from the usual least squares procedures.
It is not a least squares solution; it is an iterative solution (maximum likelihood).
It estimates the random variance components ($\sigma^2$) instead of the EMS.
Then it tests any fixed effect components in the model.
PROC MIXED works a little differently from PROC GLM. In PROC MIXED the fixed
components ONLY go in the model and the random effects go in the RANDOM statement.
There is not a “test” option in PROC MIXED.
Are labs fixed or random? They could be either, depending on how or why they were chosen.
Confidence limits for the random variance components were requested.
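A PROC MIXED step along these lines is sketched below, treating laboratories as random. The data set and variable names (`eggs`, `lab`, `tech`, `type`, `fat`) are hypothetical, and the program in the appendix may differ.

```sas
/* Sketch only: data set EGGS and variable names are hypothetical. */
proc mixed data=eggs cl covtest;      /* CL: confidence limits for the
                                         variance components           */
   class lab tech type;
   model fat = ;                      /* labs random: no fixed effects  */
   random lab tech(lab) type(tech*lab);
run;

/* If labs were instead considered fixed, lab would move to the MODEL
   statement:
      model fat = lab;
      random tech(lab) type(tech*lab);                                  */
```

Only the fixed effects appear in the MODEL statement, and the replicates are left out of the RANDOM statement so that they form the residual error term.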
Hierarchical design example (Appendix 17b): Snedecor & Cochran, 1980 (pg 293)
This is an example of a CRD with different numbers of observations at different levels. Wheat
yields were available for 6 districts in the midwest. There were UP TO 10 farms per district and
UP TO 3 fields per farm.
The model for this design is $Y_{ijk} = \mu + \tau_i + \gamma_{ij} + \epsilon_{ijk}$
In this example the TEST statement was requested for GLM, but it actually gives the wrong
answers because the unbalanced design has no clear error term estimated (see EMS coefficients).
The TEST option on the RANDOM statement will make a simple algebraic adjustment for
unbalanced designs. Occasionally negative F tests can result.
The TEST statement produced a test of DISTRICTS with a P-value of 0.3056.
The TEST option on the RANDOM statement causes the tests to be calculated (with the
appropriate error terms).
This test also adjusts the tests to account for the unequal coefficients on the EMS. The P-value
The MIXED model again estimates the variance components with confidence intervals. Note that
the confidence intervals do not include zero (a negative value should not be possible with variance
components), but they are very wide and overlap with the other variance components.
Note that algebraic calculations of the Variance components from GLM give different results.
RBD without “replicates” example (Appendix 17c): Snedecor & Cochran, 1980 (pg 256)
The experiment tests the failure of soybean seeds to germinate after treatment with one of 4
fungicides and a control.
Randomized block design without replication within blocks. Five blocks and five treatment levels
(four fungicides plus the control) in each block.
James P. Geaghan - Copyright 2011 ...
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.