Statistical Techniques II Page 76
Analysis of Variance and Experimental Design
The simplest model or analysis for Analysis of Variance (ANOVA) is the CRD, the Completely
Randomized Design. This model is also called “One-way” Analysis of Variance.
Unlike regression, which fits slopes for regression lines and calculates a measure of random variation
about those lines, ANOVA fits means and variation about those means.
The hypotheses tested are hypotheses about the equality of means: H0: μ1 = μ2 = μ3 = μ4 = ... = μt
the μi represent means of the levels of some categorical variable
"t" is the number of levels in the categorical variable.
H1: some μi is different
We will generically refer to the categorical variable as the “treatment” even though it may not actually
be an experimenter manipulated effect.
The number of treatments will be designated “t”.
The number of observations within treatments will be designated n for a balanced design (the
same number of observations in each treatment), or ni for an unbalanced design (for i = 1 to t).
The assumptions for basic ANOVA are very similar to those of regression.
The residuals, or deviations of observations within groups, should be normally distributed.
The treatments are independently sampled.
The variance of each treatment is the same (homogeneous variance).
ANOVA review
I am borrowing some material from my EXST7005 notes on the t-test and ANOVA. See those notes for a more complete review of the introduction to Analysis of Variance (ANOVA).
Start with the logic behind ANOVA.
Prior to R. A. Fisher's development of ANOVA, investigators were likely to have used a series of t
tests to test among t treatment levels.
What is wrong with that? Recall the Bonferroni adjustment. Each time we do a test we increase
the chance of error. To test among 3 treatments we need to do 3 tests; among 4 treatments, 6 tests;
among 5 treatments, 10 tests; etc.
What is needed is ONE test for a difference among all treatments, with one overall value of α specified
by the investigator (usually 0.05).
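The counts above (3 tests for 3 treatments, 6 for 4, 10 for 5) are just "t choose 2", and the error inflation is easy to compute. A minimal Python sketch (the helper name `familywise_error` is hypothetical, and the tests are assumed independent):

```python
from math import comb

def familywise_error(t, alpha=0.05):
    """Number of pairwise tests among t treatments, and the probability
    of at least one Type I error if each test is run at level alpha
    (assuming the tests are independent)."""
    k = comb(t, 2)               # t choose 2 pairwise comparisons
    fwer = 1 - (1 - alpha) ** k  # P(at least one false rejection)
    return k, fwer

for t in (3, 4, 5):
    k, fwer = familywise_error(t)
    print(t, "treatments:", k, "tests, overall error rate", round(fwer, 3))
```

Even at only 5 treatments the overall chance of a false rejection is roughly 0.40, not the nominal 0.05, which is exactly why one overall test is needed.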
Fisher's solution was simple, but elegant.
Suppose we have a treatment with 5 categories or levels. We can calculate a mean and
variance for each treatment level. In order to get one really good estimate of variance we can
pool the individual variances of the 5 categories (assuming homogeneity of variance).
This pooled variance can be calculated as a weighted mean of the variances (weighted by the
degrees of freedom). James P. Geaghan - Copyright 2011
[Figure: observations and group means plotted for groups A–E]
And since SS1 = S1²(n1 − 1), then (n1 − 1)S1² = SS1, so the weighted mean is simply the sum of the
SS divided by the sum of the d.f.
Sp² = [(n1−1)S1² + (n2−1)S2² + (n3−1)S3² + (n4−1)S4² + (n5−1)S5²] / [(n1−1) + (n2−1) + (n3−1) + (n4−1) + (n5−1)]
Sp² = (SS1 + SS2 + SS3 + SS4 + SS5) / [(n1−1) + (n2−1) + (n3−1) + (n4−1) + (n5−1)]
So we have one very good estimate of the random variation, or sampling error, Sp².
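The two forms of Sp² above are algebraically identical, which a small Python sketch can confirm (the group data below are made-up numbers, with deliberately unequal group sizes):

```python
# Pooled variance two ways: a d.f.-weighted mean of the group variances,
# and the sum of SS divided by the sum of d.f. Data values are made up.
groups = [
    [12.1, 11.8, 12.5, 12.0],
    [13.0, 12.7, 13.4],
    [11.5, 11.9, 11.2, 11.6, 11.8],
]

def sample_var(xs):
    """Sample variance, SS / (n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

den = sum(len(g) - 1 for g in groups)          # sum of d.f.

# Form 1: weighted mean of variances, weights = d.f.
sp2_weighted = sum((len(g) - 1) * sample_var(g) for g in groups) / den

# Form 2: sum of within-group SS over sum of d.f.
ss = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
sp2_ss = ss / den

print(sp2_weighted, sp2_ss)  # the two forms agree
```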
Then what? Now consider the treatments. Why don't they all fall on the overall mean? Actually, under the
null hypothesis, they should, except for some random variation. So if we estimate that random
variation, it should equal the same error we already estimated within groups.
Recall the variance of means is estimated as S2/n, the variance of the sample divided by the
sample size. The standard error is the square root of this.
If we actually use means to estimate a variance, we are also estimating the variance of means,
S²/n. If we multiply this by "n" it should actually be equal to σ², which we estimated with Sp²,
the pooled variance.
So if the null hypothesis is true, the mean square of the deviations within groups should be
equal to the mean square of the deviations of the means multiplied by n.
[Figure: group means plotted about the overall mean for groups A–E]
Now, if the null hypothesis is not true, and some μi is different, then what?
Then, when we calculate a mean square of deviations of the means from the overall mean, it
should be larger than the previously estimated Sp².
So we have two estimates of variance, Sp² and the variance from the treatment means. If the null
hypothesis is true, they should not be significantly different.
[Figure: group means with one divergent mean for groups A–E]
If the null hypothesis is FALSE, the treatment mean square should be larger. It will therefore
be a ONE TAILED TEST!
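This claim (under H0, n times the variance of the group means estimates the same σ² as the pooled within-group variance) is easy to check by simulation. A quick sketch using made-up parameters, with all groups drawn from the same population so H0 is true by construction:

```python
# Simulation sketch (hypothetical parameters): under H0 all t groups share
# one mean, so n * variance-of-group-means and the pooled within-group
# variance both estimate the same sigma^2.
import random
random.seed(1)

t, n, mu, sigma = 5, 200, 10.0, 2.0
data = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(t)]

means = [sum(g) / n for g in data]
grand = sum(means) / t

# "Between" estimate: n times the variance of the t group means
between = n * sum((m - grand) ** 2 for m in means) / (t - 1)

# "Within" estimate: pooled variance of deviations inside each group
within = sum(sum((x - m) ** 2 for x in g)
             for g, m in zip(data, means)) / (t * (n - 1))

print(between, within)  # both wander around sigma^2 = 4 when H0 is true
```

With only t − 1 = 4 d.f., the "between" estimate is much noisier than the "within" estimate, but neither is systematically larger when H0 holds.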
We usually present this in an “Analysis of Variance” table.
Source      d.f.     Sum of Squares   Mean Square
Treatment   t−1      SSTreatment      MSTreatment
Error       t(n−1)   SSError          MSError
Total       tn−1     SSTotal

Degrees of freedom
There are tn observations total (Σni if unbalanced).
After the correction factor, there are tn–1 d.f. for the corrected total.
There are t–1 degrees of freedom for the t treatment levels.
Each group contributes n–1 d.f. to the pooled error term. There are t groups, so the
pooled error (MSE) has t(n–1) d.f.
The SSTreatments is the SS deviations of the treatment means from the overall mean.
Each deviation is denoted τi, and is called a treatment "effect".
SSTreatments = n Σ(i=1 to t) (Ȳi − Ȳ)² = n Σ(i=1 to t) τi²  (for a balanced design)
The model for regression is Yi = b0 + b1Xi + ei
The effects model for a CRD is Yij = μ + τi + εij
where the treatments are i = 1, 2, ..., t and
the observations are j = 1, 2, ..., n, or ni for unbalanced data
An alternative expression for the CRD, called the means model, is Yij = μi + εij
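The effects model can be used to generate data directly, which makes the notation concrete. A short sketch with hypothetical values of μ, the τi, and σ (under the means model, each group mean estimates μi = μ + τi):

```python
# Sketch of the CRD effects model Y_ij = mu + tau_i + eps_ij,
# with eps_ij drawn as N(0, sigma^2). All parameter values are made up.
import random
random.seed(42)

mu = 50.0
tau = [-2.0, 0.0, 3.0, -1.0]   # treatment effects, i = 1..t
sigma = 1.5
n = 6                          # observations per treatment (balanced)

Y = [[mu + tau_i + random.gauss(0, sigma) for _ in range(n)]
     for tau_i in tau]

# Under the means model, mu_i = mu + tau_i; each group mean estimates it
for tau_i, group in zip(tau, Y):
    print("mu_i =", mu + tau_i, " group mean =", round(sum(group) / n, 2))
```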
Statistics quote: Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital. -Aaron Levenstein
The calculations. The SSTotal is exactly the same as regression: the sum of all observations,
each squared first, Σ(i=1 to t) Σ(j=1 to n) Yij².
The correction factor is exactly the same too: all observations are summed, the sum is
squared and divided by the number of observations, (Σi Σj Yij)² / (tn).
[Table: observations laid out by group (Group 1 to Group 4), with group means and totals as marginals]
The uncorrected SSTreatments is calculated from the treatment totals:
UncorrectedSSTreatments = Σ(i=1 to t) (Σ(j=1 to n) Yij)² / n
Calculations are the same as regression for the corrected sum of squares total.
The corrected SS treatments is the uncorrected treatments (calculated from the
marginals) less the same correction factor used for the total.
Error is usually calculated as the SSTotal minus the SSTreatments.
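The correction-factor arithmetic can be sketched on a tiny made-up data set (t = 3, n = 4; all values hypothetical):

```python
# Hand-calculation sketch of the CRD sums of squares using the
# correction-factor formulas. Data values are made up; t = 3, n = 4.
Y = [
    [23.0, 25.0, 24.0, 26.0],   # treatment 1
    [30.0, 29.0, 31.0, 28.0],   # treatment 2
    [24.0, 23.0, 25.0, 22.0],   # treatment 3
]
t, n = len(Y), len(Y[0])

total = sum(sum(g) for g in Y)
CF = total ** 2 / (t * n)                 # correction factor

uncorrected_total = sum(x * x for g in Y for x in g)
SSTotal = uncorrected_total - CF          # corrected total SS

uncorrected_trt = sum(sum(g) ** 2 / n for g in Y)  # from treatment totals
SSTreatment = uncorrected_trt - CF        # corrected treatment SS

SSError = SSTotal - SSTreatment           # error by subtraction
print(SSTotal, SSTreatment, SSError)
```

As a cross-check, summing the squared deviations within each group directly gives the same SSError as the subtraction.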
We use an F test to test the equality of two variance estimates. An Analysis of Variance usually
proceeds with an F test of the MSTreatment over the MSError. The test has t−1 and
t(n−1) degrees of freedom.
This F test will be ONE TAILED since we expect the treatment variance to be too large
if the null hypothesis is not true.
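A minimal sketch of carrying out that one-tailed F test against a tabular critical value (the mean squares below are made-up example values for t = 3, n = 4):

```python
# Sketch of the one-tailed F test for a CRD with t = 3 treatments and
# n = 4 observations each; the mean squares are made-up example values.
t, n = 3, 4
MSTreatment = 41.333
MSError = 1.667

F = MSTreatment / MSError        # F statistic, MSTreatment over MSError
df1, df2 = t - 1, t * (n - 1)    # treatment d.f. and error d.f.
F_crit = 4.26                    # tabular F(0.05; 2, 9), upper tail only

print("F =", round(F, 1), "on", df1, "and", df2, "d.f.; reject H0:", F > F_crit)
```

Only the upper tail is used: a small F simply means the treatment mean square is consistent with the error mean square.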
The MSError estimates a variance we designate σ² or σε².
If the null hypothesis is true, the MSTreatments estimates the SAME VARIANCE, σ².
However, if the null hypothesis is false the MSTreatment variance is the same σ² plus
some amount due to the differences between treatments. This is designated σ² + nστ².
Since the treatment variance can be designated σ² + nστ², we can see that the null hypothesis
can be stated as either the usual H0: μ1 = μ2 = ... = μt (e.g. Στi² = 0), or as H0: στ² = 0.
Which is best depends on the nature of the treatment. If the treatment levels are randomly
chosen from a large number of treatment levels, then they estimate the variance of that
treatment population and would be random. This would be στ².
However, if the treatments are not chosen from a large number of treatments, if they are either
all of the levels of interest or all of the levels that exist, then they are said to be FIXED. Fixed
treatment levels represent a group of means that are of interest to the investigator, so
H0: μ1 = μ2 = ... = μt is a better representation of the null hypothesis than H0: στ² = 0.
For fixed treatments we still calculate a sum of squared treatment effects and divide by the d.f.
This is designated nΣτi²/(t − 1), and the F test is the same. These simply do not represent a variance.
The two values estimated by the MSTreatment (σ² + nστ², or σ² + nΣτi²/(t − 1) for fixed effects)
and the MSError (σ²) are called expected mean squares.
The unwieldy term nΣτi²/(t − 1) is often represented as simply Qτ.
One final note on the F test. Given that MSTreatments and MSError estimate these EMS (expected
mean squares), we can rewrite the F test as F = (σ² + nστ²) / σ². From this we can see that it must
be a one-tailed test, because nστ² cannot be negative, so the ratio's expected value is always ≥ 1.
We can also see that increasing n increases power. SAS Example (Appendix 12).
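Both points, the one-tailed nature of the ratio and the gain in power from replication, fall out of the EMS ratio directly. A trivial sketch with hypothetical variance components:

```python
# Sketch of the expected-mean-square ratio F = (sigma^2 + n*sigma_tau^2) / sigma^2.
# The variance components below are hypothetical; the point is that the
# expected ratio can never fall below 1 and grows with n, so more
# replication pushes a real treatment effect further into the upper tail.
sigma2 = 4.0        # error variance, sigma^2
sigma_tau2 = 1.0    # treatment variance component, sigma_tau^2

for n in (2, 5, 10, 20):
    expected_F = (sigma2 + n * sigma_tau2) / sigma2
    print("n =", n, " expected F ratio =", expected_F)
```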
Summary?
[Figure: group means and deviations for groups A–E]
Overview of ANOVA
Recall that we are testing for differences among indicator variables.
The treatments may be fixed or random.
H0: μ1 = μ2 = ... = μt for fixed effects. H0: στ² = 0 for random effects.
Assume εij ~ NID(0, σ²). Remember that this covers 3 separate assumptions.
Statistics quote: If your result needs a statistician then you should design a better experiment. -- Baron Ernest Rutherford
Every analysis can be expressed as a "linear" model with appropriate notation and subscripting.
Regression: Yi = β0 + β1Xi + εi
CRD: Yij = μ + τi + εij
Factorial: Yijk = μ + τ1i + τ2j + (τ1τ2)ij + εijk
RBD: Yij = μ + τi + βj + εij, or Yijk = μ + τi + βj + (τβ)ij + εijk
LSD: Yijk = μ + τi + ρj + γk + εijk
Split-plot: Yijk = μ + τ1i + δij + τ2k + (τ1τ2)ik + εijk
Treatment levels may be fixed or random. Determining the correct and appropriate tests depends
on recognizing each correctly.
With random effects we are probably not interested in individual treatment levels. We are
likely to be interested in the variability among the treatment levels and the distribution of the
treatment effects.
With fixed effects we will probably want to compare individual levels.
Usual Analysis of Variance procedure
1) H0: μ1 = μ2 = μ3 = μ4 = ... = μt
2) H1: some μi is different
3) a) Assume that the observations are normally distributed about each mean,
or that the residuals (i.e. deviations) are normally distributed.
b) Assume that the observations are independent
c) Assume that the variances are homogeneous
4) Set the level of type I error. Usually α = 0.05.
5) Determine the critical value. The test in ANOVA is a one-tailed F test.
6) Obtain data and evaluate the results.
7) Draw your conclusions based on the results.

Analysis of Variance source table
PROC GLM DATA=Cuckoo; CLASSES HostSpecies;
TITLE2 'Analysis of Variance with PROC GLM';
MODEL EggLt = HostSpecies; RUN;

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5           42.940        8.588      10.39   <0.0001
Error             114           94.248        0.8267
Corrected Total   119          137.188

A more modern version of analysis of variance (mixed model analysis) will be discussed below. It
has many options not available in the old least squares approach above. Unfortunately, many
researchers trained in the last century are still unenlightened and use this older version of ANOVA.
This analysis indicates that the size of cuckoo eggs laid in at least one host species' nests
differs from the size of those laid in the other host species' nests. What did we assume?
Descriptions of post-hoc tests
Post-hoc or Post-ANOVA tests! Once you have found out some treatment(s) are “different”,
how do you determine which one(s) are different? For the moment we will be concerned only with examining for differences among the treatment
levels. We will assume that we have already detected a significant difference among
treatments levels with ANOVA. So, having rejected the Null hypothesis we wish to determine
how the treatment levels interrelate. This is the “post-ANOVA” part of the analysis.
These tests fall into two general categories. Post hoc tests (LSD, Tukey, Scheffé, Duncan's, Dunnett's, etc.)
A priori tests or pre-planned comparisons (contrasts) A priori tests are better. These are tests that the researcher plans on doing before they gather
data, and if we dedicate 1 d.f. to each one we generally feel comfortable doing each at some
specified level of alpha. However, since multiple tests do entail risks of higher experiment wide error rates, it would not
be unreasonable to apply some technique, like Bonferroni's adjustment, to insure an
experimentwise error rate of the desired level of alpha ().
So how might we do these “post hoc” tests?
The simplest approach would be to do pairwise tests of the treatments using something like the
two-sample t-test. If you are interested in testing between treatment level means, then you
probably have “fixed” effects. If the levels were randomly selected from a large number of
possible choices we would probably not be interested in the individual levels chosen.
This test examines the null hypothesis H0: μ1 = μ2, or H0: μ1 − μ2 = 0, against the alternative
H1: μ1 − μ2 ≠ 0 (or the one-sided alternatives H1: μ1 − μ2 < 0 or H1: μ1 − μ2 > 0).
Recall two things about the two-sample t-test.
First, in a t-test we had to determine if the variance was equal for the two populations tested.
We tested H0: σ1² = σ2² with an F test to determine if this was the case.
Second, the variance of the test (the variance of the difference between Ȳ1 and Ȳ2) was equal to
σ1²/n1 + σ2²/n2. This is the variance for the linear combination from our null hypothesis; that is,
the variance of Ȳ1 − Ȳ2 is (1)²σ1²/n1 + (−1)²σ2²/n2, if the variables are independent.
If the variances are equal (as they are often assumed to be for ANOVA) then the variance is
σ²(1/n1 + 1/n2). We estimate σ² with the mean square error (MSE).
So, we would test each pair of means using the two-sample t-test as
t = (Ȳ1 − Ȳ2) / √[Sp²(1/n1 + 1/n2)]. For ANOVA, using the MSE as our variance estimate, we have
t = (Ȳ1 − Ȳ2) / √[MSE(1/n1 + 1/n2)]. If the design is balanced this simplifies to
t = (Ȳ1 − Ȳ2) / √(2MSE/n).
Notice that if the calculated value of t is greater than the tabular value of t, we would reject the
null hypothesis. To the contrary, if the calculated value of t is less than the tabular value we
would fail to reject.
Call the tabular value t*, and write the case for rejection of H0 as t* ≤ (Ȳ1 − Ȳ2) / √(2MSE/n).
So we would reject H0 if t*√(2MSE/n) ≤ (Ȳ1 − Ȳ2).
So, for any difference (Ȳ1 − Ȳ2) that is greater than t*√(2MSE/n) we find the difference between
the means to be statistically significant (reject H0), and for any value less than this value we
find the difference to be consistent with the null hypothesis. Right?
This value, t*√(2MSE/n), is what R. A. Fisher called the "Least Significant Difference",
commonly called the LSD (not to be confused with the Latin Square Design = LSD).
LSD = tcritical √[MSE(1/n1 + 1/n2)], or LSD = tcritical S(Ȳ1−Ȳ2)
This value is the exact width of an interval in Ȳ1 − Ȳ2 which would give a t-test equal to
tcritical. Any larger values would be "significant" and any smaller values would not. This is why
LSD = tcritical S(Ȳ1−Ȳ2) is called the "Least Significant Difference".
We calculate this value for each pair of differences, and if the observed difference is less, the
treatments are "not significantly different". If greater, they are "significantly different".
One last detail. I have used the simpler version of the variance assuming that n1 = n2. If the
experiment is unbalanced (i.e. there are unequal numbers of observations in the treatment
levels) then the value is √[MSE(1/n1 + 1/n2)].
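The LSD calculation can be sketched end to end for a balanced case (all numbers below are made up: MSE and the group means are illustrative, and 2.262 is the tabular two-tailed t for 9 error d.f. at α = 0.05):

```python
# Sketch of Fisher's LSD for a balanced CRD. MSE, n, and the treatment
# means are made-up example values; t_crit is the tabular t(0.025, 9 d.f.).
from math import sqrt

MSE = 1.667          # pooled error mean square from the ANOVA
n = 4                # observations per treatment (balanced)
t_crit = 2.262       # two-tailed tabular t, alpha = 0.05, 9 error d.f.

LSD = t_crit * sqrt(2 * MSE / n)   # least significant difference

means = {"A": 24.5, "B": 29.5, "C": 23.5}
for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
    diff = abs(means[a] - means[b])
    verdict = "different" if diff > LSD else "not different"
    print(a, "vs", b, ": |diff| =", round(diff, 2), "->", verdict)
```

Any pair of means separated by more than the single LSD value is declared significantly different, which is what makes this the simplest of the post-hoc procedures.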
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.