Statistical Techniques II
Experimental Design
The design aspect of Analysis of Variance refers to the fate of the error term.
We will discuss three designs:
  CRD (Completely Randomized Design): $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
  RBD (Randomized Block Design): $Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$ or $Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$
  LSD (Latin Square Design): $Y_{ij} = \mu + \tau_i + \beta_j + \gamma_k + \varepsilon_{ij}$
We will also discuss hierarchical designs, or nested error terms. Any of the three designs above
can have nested error terms.

Completely Randomized Design (CRD)
The most basic design is the CRD (Completely Randomized Design), which has a single error
term. We will talk about the nature and arrangement of treatments later. First we talk about the
error term. Differences among designs come primarily from the error terms.
Schematic of a CRD
  A B A C B
  C B B A C
  B C C A A
The source table for this CRD, with sources, degrees of freedom and EMS is given below. The
treatments can be fixed or random, but the error term must be a random effect.
Source   d.f.     SS        MS      EMS
Tmt      t–1      SSTmt     MSTmt   $\sigma^2 + n\sigma^2_\tau$
Error    t(n–1)   SSE       MSE     $\sigma^2$
Total    tn–1     SSTotal
James P. Geaghan - Copyright 2011
CRD with fixed effect treatments.
Source   d.f.     SS        MS      EMS
Tmt      t–1      SSTmt     MSTmt   $\sigma^2 + n\sum_{i=1}^{t}\tau_i^2/(t-1)$
Error    t(n–1)   SSE       MSE     $\sigma^2$
Total    tn–1     SSTotal
Fixed effects include all levels of interest for the treatment, or all possible levels of the
treatment.
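The EMS translate directly into the F tests of treatments. In terms of expected mean squares:
Random effects: $F = \dfrac{\sigma^2 + n\sigma^2_\tau}{\sigma^2}$
Fixed effects: $F = \dfrac{\sigma^2 + n\sum_{i=1}^{t}\tau_i^2/(t-1)}{\sigma^2}$
In more advanced designs, which treatments are random and which are fixed becomes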
extremely important. However, for the moment we will only be putting one treatment in our
models, and it makes little difference if that treatment is fixed or random. We will continue to
determine if each treatment is fixed or random because of its eventual importance.
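As a rough illustration of how the CRD source table turns into an F test, the sketch below computes the sums of squares, degrees of freedom, mean squares and F = MSTmt/MSE for a balanced CRD. The function name `crd_anova` and the yields are hypothetical, made up for illustration only; they are not data from these notes.

```python
from statistics import mean

def crd_anova(groups):
    """One-way ANOVA for a balanced CRD; groups maps treatment -> list of n observations."""
    t = len(groups)
    n = len(next(iter(groups.values())))          # balanced design assumed
    grand = mean(y for ys in groups.values() for y in ys)
    # Treatment SS: variation of treatment means around the grand mean
    ss_tmt = n * sum((mean(ys) - grand) ** 2 for ys in groups.values())
    # Error SS: variation of observations around their own treatment mean
    ss_err = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_tmt, df_err = t - 1, t * (n - 1)
    ms_tmt, ms_err = ss_tmt / df_tmt, ss_err / df_err
    return {"df": (df_tmt, df_err, t * n - 1),
            "SS": (ss_tmt, ss_err, ss_tmt + ss_err),
            "MS": (ms_tmt, ms_err),
            "F": ms_tmt / ms_err}

# Hypothetical responses for t = 3 treatments with n = 5 experimental units each
data = {"A": [20, 22, 19, 21, 23],
        "B": [25, 27, 24, 26, 28],
        "C": [30, 29, 31, 32, 28]}
table = crd_anova(data)
print(table["df"], round(table["F"], 2))
```

The same arithmetic applies whether the treatment is fixed or random; only the interpretation of the treatment EMS changes.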
Sampling Units
Depending on how the experiment was done, we may find that the error term is not a simple
randomized, replicated observation within a treatment.
There may be several other sources of variation within the error term, which if ignored will alter
the error term and reduce the effectiveness of the experiment.
In a CRD the treatments are assigned, completely at random, to some unit. That unit may be a
plot in a field, a test tube, a plant in a pot, a car, a laboratory rat, a person, just about anything.
The unit that we assign the treatment to is called the experimental unit. The error term derived
from the variance of that unit is called the experimental error.
The actual measurement of the dependent variable, $Y_{ij}$, is done on the experimental unit or on some
smaller unit within the experimental unit.
This smaller unit is called the sampling unit or measurement unit.
For example, the treatment is applied to a plot in a field. This is the experimental unit. We may
measure the height of individual plants out of many plants in that plot; these would be the
sampling units.
In practice, if we only have one measurement per experimental unit we consider this to
represent the experimental unit, even if we measure some smaller unit. In this case the
experimental unit and sampling unit are the same unit.
If we measure only one plant in the plot, that plant represents the whole plot. If we take only
one blood sample from a rabbit, it represents the whole rabbit.
The importance of sampling units only comes into play when we have several measurements of
sampling units per experimental unit. These are nested error terms.
For example, suppose we have 4 fertilizer treatments (t=4) in a field. Each treatment level
occurs in 5 randomly selected plots (p=5), so the field has 20 plots. Plots are the experimental
units.
Statistics quote: I asked a statistician for her phone number... and she gave me an estimate. (unknown)
We want to measure plant-available phosphorus. The design is a simple CRD if there is only one
measurement per experimental unit. But suppose we proceed to take 3 soil samples (s=3) in each
plot, and back in the laboratory we make 2 measurements on each soil sample (n=2).
  Treatment : fertilizer (t=4)
  Experimental unit : a plot (p=5)
  Sampling unit : a soil sample (s=3)
  Sub-sampling unit : a soil test (n=2)
This is hierarchical because we took the soil samples as sampling units of the plots, and then made
several measurements that are sub-sampling units of each soil sample. Notice that the subscripts
on the terms show the hierarchical nature of the sampling.
$Y_{ijkl} = \mu + \tau_i + \gamma_{ij} + \delta_{ijk} + \varepsilon_{ijkl}$
Here $\tau_i$ is the treatment, $\gamma_{ij}$ the plot within treatment, $\delta_{ijk}$ the soil sample within plot, and $\varepsilon_{ijkl}$ the measurement within soil sample.
Expected mean square structure.
Source                       d.f.       EMS
Tmt                          t–1        $\sigma^2 + n\sigma^2_\delta + ns\sigma^2_\gamma + nsp\sigma^2_\tau$
Plot(Tmt)                    t(p–1)     $\sigma^2 + n\sigma^2_\delta + ns\sigma^2_\gamma$
Sample(Plot*Tmt)             tp(s–1)    $\sigma^2 + n\sigma^2_\delta$
Measure(Sample*Plot*Tmt)     tps(n–1)   $\sigma^2$
Total                        tpsn–1
In this table $\sigma^2_\tau$, $\sigma^2_\gamma$, $\sigma^2_\delta$ and $\sigma^2$ are the variance components for treatments, plots, soil samples and measurements.
For fixed treatments, replace $\sigma^2_\tau$ with $\sum_{i=1}^{t}\tau_i^2/(t-1)$.
This model has a hierarchical or nested expected mean square structure.
The Plot(Tmt) is the experimental error.
The Sample(Plot*Tmt) is the sampling error.
The Measure(Sample*Plot*Tmt) is the sub-sampling error.
The last error is also called the residual error. In SAS it will contain any terms left off the
model. Note that the appropriate error term for testing treatments is the Plot(Tmt) error term.
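The degrees of freedom in the nested source table can be checked by direct arithmetic for the fertilizer example (t=4, p=5, s=3, n=2). The helper name `nested_crd_df` is hypothetical; this is only a minimal sketch of the bookkeeping.

```python
def nested_crd_df(t, p, s, n):
    """Degrees of freedom for each source in a balanced nested CRD."""
    df = {
        "Tmt": t - 1,
        "Plot(Tmt)": t * (p - 1),                       # experimental error
        "Sample(Plot*Tmt)": t * p * (s - 1),            # sampling error
        "Measure(Sample*Plot*Tmt)": t * p * s * (n - 1) # sub-sampling error
    }
    df["Total"] = t * p * s * n - 1
    return df

df = nested_crd_df(t=4, p=5, s=3, n=2)
for source, value in df.items():
    print(f"{source:28s} {value}")
```

The four sources partition the total: 3 + 16 + 40 + 60 = 119 = 4*5*3*2 - 1. Note that the treatment test uses only the 16 degrees of freedom of Plot(Tmt), not the 60 of the residual.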
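We expect, under the null hypothesis, that the numerator and denominator of an F test are the
same.
If we want to test $H_0: \sigma^2_\tau = 0$ (no treatment variance) in a term consisting of $\sigma^2 + n\sigma^2_\delta + ns\sigma^2_\gamma + nsp\sigma^2_\tau$, we need an
error term containing everything except the term to be tested (e.g. $\sigma^2 + n\sigma^2_\delta + ns\sigma^2_\gamma$), so if the
null hypothesis is true the expected F value is 1.
$F = \dfrac{\sigma^2 + n\sigma^2_\delta + ns\sigma^2_\gamma + nsp\sigma^2_\tau}{\sigma^2 + n\sigma^2_\delta + ns\sigma^2_\gamma}$
Also note that power is gained because the coefficient of $\sigma^2_\tau$ is “nsp” instead of just the “n” that it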
would be if we had no sampling or subsampling units. So, even though we do not use the
sampling error, with its greater number of degrees of freedom, we still gain power because of the
larger coefficient on the term we want to test.
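To make the choice of error term concrete, the following sketch simulates one data set with the layout of the fertilizer example and computes the four mean squares, then forms the treatment F ratio with Plot(Tmt) in the denominator. Everything numeric here (the seed, the treatment effects, the standard deviations) is an assumption chosen for illustration, and `nested_ms` is a hypothetical helper name.

```python
import random
from statistics import mean

def nested_ms(y, t, p, s, n):
    """Mean squares for a balanced nested CRD; y[i][j][k][l] is one measurement."""
    samp = [[[mean(y[i][j][k]) for k in range(s)] for j in range(p)] for i in range(t)]
    plot = [[mean(samp[i][j]) for j in range(p)] for i in range(t)]
    tmt = [mean(plot[i]) for i in range(t)]
    grand = mean(tmt)
    ss_tmt = n * s * p * sum((m - grand) ** 2 for m in tmt)
    ss_plot = n * s * sum((plot[i][j] - tmt[i]) ** 2
                          for i in range(t) for j in range(p))
    ss_samp = n * sum((samp[i][j][k] - plot[i][j]) ** 2
                      for i in range(t) for j in range(p) for k in range(s))
    ss_meas = sum((v - samp[i][j][k]) ** 2
                  for i in range(t) for j in range(p) for k in range(s)
                  for v in y[i][j][k])
    return {"Tmt": ss_tmt / (t - 1),
            "Plot(Tmt)": ss_plot / (t * (p - 1)),
            "Sample(Plot*Tmt)": ss_samp / (t * p * (s - 1)),
            "Measure(Sample*Plot*Tmt)": ss_meas / (t * p * s * (n - 1))}

# Simulated data: all effects and standard deviations below are hypothetical
rng = random.Random(1)
t, p, s, n = 4, 5, 3, 2
tau = [0.0, 1.0, 2.0, 3.0]
y = []
for i in range(t):
    plots = []
    for j in range(p):
        plot_effect = rng.gauss(0, 1.0)
        samples = []
        for k in range(s):
            sample_effect = rng.gauss(0, 0.5)
            samples.append([10 + tau[i] + plot_effect + sample_effect + rng.gauss(0, 0.25)
                            for _ in range(n)])
        plots.append(samples)
    y.append(plots)

ms = nested_ms(y, t, p, s, n)
f_tmt = ms["Tmt"] / ms["Plot(Tmt)"]   # treatments are tested against experimental error
print(round(f_tmt, 2))
```

Dividing MSTmt by the residual mean square instead would use a denominator that is missing the plot and sample components, so the ratio would not be expected to equal 1 under the null hypothesis.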
Randomized Block Design
Blocks: In some experiments the whole experiment cannot be conducted in a single place or at a
single time.
We may find that we have to use 3 incubators for our cultures because they don't fit in one
incubator.
We may have to use several different fields to conduct the experiment if a single large field is
not available.
We may have to repeat our experiment several times if we cannot do it all at once.
These incubators or fields or times are not the same. They differ in some way. If we ignore this
variation, the extra variation will inflate our error term.
How do we get this extra variation out of the error? We put it in the model. We will call this a
block.
A block is NOT a source of variation that we are interested in interpreting. We simply recognize
that it exists and include it in the model to remove it from the error.
It will be put in the model and appear just like a treatment.
The model would be $Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$
Schematic of an RBD
  Block 1   Block 2   Block 3   Block 4
     A         B         C         B
     C         A         B         A
     B         C         A         C
The source table for this RBD, with sources, degrees of freedom and EMS is given below.
Source   d.f.         SS        MS      EMS
Tmt      t–1          SSTmt     MSTmt   $\sigma^2 + b\sigma^2_\tau$
Block    b–1          SSBlk     MSBlk   $\sigma^2 + t\sigma^2_\beta$
Error    (t–1)(b–1)   SSE       MSE     $\sigma^2$
Total    tb–1         SSTotal
The block design depicted has only one cell for each treatment per block. This is a valid and
legitimate design, since the “interaction” of blocks and treatments serves as an error term.
If the behavior of the treatments between blocks is consistent, this error term is a measure of
the variability of experimental units between blocks.
Why does it serve as an error term?
The interaction serves as an error term because it is a random effect. It is a random effect
because the block is random (almost invariably), and interactions of random effects are
random.
Treatments can be fixed or random, but as long as one term in an interaction (the block) is
random, the interaction is random.
Note that the degrees of freedom for an interaction between treatments (t) and blocks (b) are
(t–1)(b–1). This is typical of interactions in general.
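A small sketch of the RBD with one observation per cell may help show where the (t–1)(b–1) error degrees of freedom come from: the error sum of squares is what remains after treatments and blocks are removed from the total. The function name `rbd_anova` and the data are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from statistics import mean

def rbd_anova(y):
    """ANOVA for an RBD with one observation per cell; y[i][j] is treatment i in block j."""
    t, b = len(y), len(y[0])
    grand = mean(v for row in y for v in row)
    ss_tmt = b * sum((mean(row) - grand) ** 2 for row in y)
    ss_blk = t * sum((mean(y[i][j] for i in range(t)) - grand) ** 2 for j in range(b))
    ss_tot = sum((v - grand) ** 2 for row in y for v in row)
    ss_err = ss_tot - ss_tmt - ss_blk      # what is left is the Tmt*Block "interaction"
    df = {"Tmt": t - 1, "Block": b - 1, "Error": (t - 1) * (b - 1), "Total": t * b - 1}
    ms_err = ss_err / df["Error"]
    return {"df": df,
            "SS": {"Tmt": ss_tmt, "Block": ss_blk, "Error": ss_err, "Total": ss_tot},
            "F_tmt": (ss_tmt / df["Tmt"]) / ms_err}

# Hypothetical responses: 3 treatments (rows) in 4 blocks (columns)
y = [[10, 12, 14, 16],
     [12, 15, 16, 17],
     [15, 16, 19, 22]]
result = rbd_anova(y)
print(result["df"], round(result["F_tmt"], 2))
```

With these numbers the block sum of squares (60) is larger than the treatment sum of squares. Had blocks been ignored, that variation would have gone into the error term.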
How would the design differ if we had replicated experimental units in the blocks?
These would be nested.
Schematic of an RBD with replication within the cells
  A B B C C A
  B A B C A C
For this design we still have treatments, blocks and an interaction. However, we now also have
a measure of variability of treatments within blocks as well as between blocks.
The model would be $Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$
There are actually two ways this model can arise, and they are different in terms of what can be tested.
In the case just discussed, the plots were the experimental units. The “interaction” between
treatments and blocks represents variability among experimental units.
The replicated plots in each block also represent variability among experimental units.
Replicated cells in blocks
  A B B C C A
  B A B C A C
If we have two measures of variability among experimental units, they should be equal and we can
test to see if they are equal.
Replicated samples in cells
[Schematic: Block 1 and Block 2, with replicated samples of treatments A, B and C taken within plots]
Another possibility is that the replicated measurements come from sampling units within the plots.
In this case we do not expect the sampling units from within plots to estimate the same variability
as the between experimental unit variation.
The plots estimate experimental error.
The replicate measurements estimate sampling error.
The model, though, may look the same.
$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}$
Source    d.f.         SS        EMS
Tmt       t–1          SSTmt     $\sigma^2 + n\sigma^2_{\tau\beta} + nb\sigma^2_\tau$
Block     b–1          SSBlk     $\sigma^2 + n\sigma^2_{\tau\beta} + nt\sigma^2_\beta$
Tmt*Blk   (t–1)(b–1)   SSB*T     $\sigma^2 + n\sigma^2_{\tau\beta}$
Error     tb(n–1)      SSE       $\sigma^2$
Total     tbn–1        SSTotal
Note that in the previous case the two error terms were both supposed to represent experimental
error, where the second error was replicated plots. Now the first represents experimental error
and the second sampling error, where the replicated measurements were taken from within plots.
In both cases there is a test of hypothesis here, but with different interpretations. If the two terms
both represent experimental error, then we consider the possibility that for some reason the
treatments do not behave the same way in the different blocks. In this case the “$\tau\beta$” interaction
represents a true interaction. Hopefully this does not exist, but since we have two estimates of
experimental error we can test this ($H_0: \sigma^2_{\tau\beta} = 0$).
In the other case the two terms represent an experimental error and a sampling error. We expect
that the error from within the more homogeneous plots would be smaller than between the plots.
In this case the sampling error is $\sigma^2$, and the experimental error is the $\sigma^2 + n\sigma^2_{\tau\beta}$ term. We can test to
see if these are the same, but we do not really expect them to be the same.
In this latter case we cannot test to see if there is a true interaction between the blocks and the
treatments. We can only ASSUME that there is no block by treatment interaction, and in fact
this is a new assumption for block designs.
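The two F ratios implied by the replicated RBD source table can be sketched as follows: the Tmt*Blk mean square is tested against the within-cell error, and treatments are tested against the Tmt*Blk mean square (assuming random blocks, as in the text). The function name `rbd_replicated` and the data are hypothetical. The arithmetic is identical whether the within-cell replicates are replicated plots or sampling units; only the interpretation of the first ratio differs.

```python
from statistics import mean

def rbd_replicated(y):
    """Mean squares and F ratios for y[i][j][k]: treatment i, block j, replicate k."""
    t, b, n = len(y), len(y[0]), len(y[0][0])
    cell = [[mean(y[i][j]) for j in range(b)] for i in range(t)]
    tmt = [mean(cell[i]) for i in range(t)]
    blk = [mean(cell[i][j] for i in range(t)) for j in range(b)]
    grand = mean(tmt)
    ms_tmt = n * b * sum((m - grand) ** 2 for m in tmt) / (t - 1)
    ms_blk = n * t * sum((m - grand) ** 2 for m in blk) / (b - 1)
    ms_tb = n * sum((cell[i][j] - tmt[i] - blk[j] + grand) ** 2
                    for i in range(t) for j in range(b)) / ((t - 1) * (b - 1))
    ms_err = sum((v - cell[i][j]) ** 2
                 for i in range(t) for j in range(b) for v in y[i][j]) / (t * b * (n - 1))
    return {"MS": {"Tmt": ms_tmt, "Block": ms_blk, "Tmt*Blk": ms_tb, "Error": ms_err},
            "F_interaction": ms_tb / ms_err,   # Tmt*Blk against within-cell error
            "F_tmt": ms_tmt / ms_tb}           # treatments against Tmt*Blk

# Hypothetical data: 2 treatments, 2 blocks, 2 replicates per cell
y = [[[10, 12], [14, 16]],
     [[13, 15], [20, 22]]]
out = rbd_replicated(y)
print(out["F_interaction"], out["F_tmt"])
```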
Pooling error terms
In the first case, where the two error terms both represented experimental error, we may consider
the possibility that the two error terms be combined into a single error term.
Is this wise? More d.f., more power. But are the error terms really the same? And which one is
better if they are not the same?
If the two are not the same, then the difference is caused by an “interaction” between the block
and treatment. This means that for some reason the treatments did not give a consistent
performance in the various blocks.
Let’s suppose we had an experiment to find the better of 5 rice varieties (A, B, C, D and E). We
did the experiment each year for 4 years, and will block on years.
If there is an interaction between blocks and treatments, it implies that some rice varieties did
relatively better in some years and other varieties did better in other years!
Suppose variety A did better in 1994 because it was a dry year, and variety B did better in 1995
(a wet year). Which one are you going to recommend to a farmer in 2001?
You don't know, because you don't know if 2001 will be wet or dry, or in between.
So if you are going to conclude that one variety is “better”, it should be a difference that is
consistent across years, or a difference that considers the annual variation and interaction. This
would be the $\sigma^2 + n\sigma^2_{\tau\beta}$ term, the treatment by block interaction mean square.
This will be a larger term, and harder to show a difference with, but the difference will be more
On the other hand, if the errors are the same, why not pool the errors into a single error?
We need a mechanism to determine if we should pool or not.
Obviously if the two error terms are significantly different, we use the interaction term and do not
pool.
But if the two terms are not significantly different, are they the same?
We cannot show statistically that two things are the same because we do not know the
probability of Type II error.
So how similar do they have to be before we would pool?
See pooling criteria by Bancroft and Chien-Pai Han (JASA, 1983, 78(384):981-983).
Values are P(>F). Pool if equal to or larger than the values in the table.
           n2 = 4     8      12     16     20
n1 = 4      0.35    0.43   0.45   0.48   0.48
     8      0.29    0.37   0.40   0.43   0.43
     12     0.26    0.34   0.37   0.40   0.41
     16     0.25    0.32   0.36   0.38   0.39
     20     0.24    0.31   0.34   0.38   0.38
So if we have two estimates of experimental error (between plots), we may wish to pool.
If we have one estimate of experimental error and a sampling error (within plots), we are less
likely to want to pool.
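The pooling rule can be sketched in a few lines: pool only when P(>F) for the test comparing the two error terms is at least the tabled criterion, and if pooling, add the sums of squares and the degrees of freedom. The function names, the sums of squares and the P value below are hypothetical. The criterion 0.37 is the tabled value for n1 = 8 and n2 = 8, and treating n1 and n2 as the degrees of freedom of the two error terms is an assumption of this sketch.

```python
def should_pool(p_value, criterion):
    """Pool only if P(>F) comparing the two error terms is at least the criterion."""
    return p_value >= criterion

def pool_errors(ss1, df1, ss2, df2):
    """Combine two error terms into one pooled mean square and its d.f."""
    return (ss1 + ss2) / (df1 + df2), df1 + df2

# Hypothetical values: Tmt*Blk with SS = 12 on 8 d.f., within-cell error with SS = 16 on 8 d.f.
ss_tb, df_tb = 12.0, 8
ss_err, df_err = 16.0, 8
p_value = 0.45          # hypothetical P(>F) from the test of Tmt*Blk against error
criterion = 0.37        # table entry for n1 = 8, n2 = 8

if should_pool(p_value, criterion):
    ms_pooled, df_pooled = pool_errors(ss_tb, df_tb, ss_err, df_err)
else:
    ms_pooled, df_pooled = ss_tb / df_tb, df_tb   # keep the interaction as the error term
print(ms_pooled, df_pooled)
```

The criteria are much larger than the usual 0.05, which reflects the point made above: failing to reject at 0.05 is not evidence that two error terms are the same.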
Later we will see that when we have several treatment terms, each with a block interaction, we
will usually pool all block interactions into a single error term.
Could we have more than one block?
Sure, if we have several fields, each sampled in several years, we could block on both years and
fields.
Note that for t treatments in y years and f fields, we should have t*y*f experimental units. For
example, with 5 treatments in 4 fields over 3 years we should have 5*4*3 = 60 experimental
units. This is a minimum. Replicated experimental units would add more degrees of freedom.

Latin Square Designs (LSD)
Suppose we had 5 diet treatments that we wanted to examine for their influence on milk
production. Our department provided us with 5 cows to do our experiment.
Five treatments, 5 cows, no reps?
We could block on time, and do the experiment over several weeks with weekly estimates of
total milk production. That might work.
However, there is a little problem. The cows are different. They have different milk
production rates. They always have had and always will have. We could try to look at pre-post
results, the change in milk production from before the diet to after the diet. That might work.
But there is another way; the Latin Square Design (LSD).
In the Latin Square Design each cow will get each diet, so cow differences average out, and
won't affect the results.
Obviously, with 5 diets and 5 cows we have to do the experiment for 5 weeks to give each cow
each diet.
A Latin Square has a special setup so that each diet occurs in each week (weeks may differ) and
with each cow (to average out cow differences).
The Latin Square below has not been randomized. To randomize, the rows of diets would be
placed in random order and then the columns of diets would be placed in random order.
        Week 1   Week 2   Week 3   Week 4   Week 5
Cow1      A        B        C        D        E
Cow2      B        C        D        E        A
Cow3      C        D        E        A        B
Cow4      D        E        A        B        C
Cow5      E        A        B        C        D
Note that we have 5 diets, 5 cows and 5 weeks. The usual block design would require
5*5*5=125 experimental units.
However, we only have 25. This will only work (well) if we have a Latin Square arrangement
with each diet occurring once with each cow and once in each week.
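The construction and randomization just described can be sketched as follows: build the cyclic (unrandomized) square, shuffle whole rows, then shuffle whole columns, and confirm that every diet still occurs once per cow and once per week. The function names and the seed are hypothetical choices for this illustration.

```python
import random

def cyclic_latin_square(treatments):
    """Unrandomized square: each row is the previous row shifted left by one."""
    r = len(treatments)
    return [[treatments[(i + j) % r] for j in range(r)] for i in range(r)]

def randomize(square, rng):
    """Put whole rows in random order, then whole columns in random order."""
    rows = square[:]
    rng.shuffle(rows)
    order = list(range(len(square)))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in rows]

def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    r = len(square)
    symbols = set(square[0])
    return (len(symbols) == r
            and all(set(row) == symbols for row in square)
            and all({square[i][j] for i in range(r)} == symbols for j in range(r)))

base = cyclic_latin_square(list("ABCDE"))        # rows = cows, columns = weeks
design = randomize(base, random.Random(2011))    # seed is arbitrary
for cow, row in enumerate(design, start=1):
    print(f"Cow{cow}", " ".join(row))
```

Permuting entire rows and entire columns never breaks the Latin property, which is why this simple two-step shuffle is enough. It does not sample uniformly from all possible 5x5 Latin squares, only from those reachable from the cyclic square.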
The Latin Square source table. Note that r = c = t for any Latin Square.
Source   d.f.         SS        EMS
Row      r–1          SSRow     $\sigma^2 + r\sigma^2_{Row}$
Col      c–1          SSCol     $\sigma^2 + r\sigma^2_{Col}$
Tmt      t–1          SSTmt     $\sigma^2 + r\sigma^2_{Tmt}$
Error    (r–1)(r–2)   SSE       $\sigma^2$
Total    r²–1         SSTotal
The Latin square is a bit messy. We cannot examine any interactions at all because there are not
enough degrees of freedom. Essentially the remains of any interactions are pooled into an error
term. This is consistent with other design practices since ROW and COL are blocks and we
usually pool block interactions into a single error term.
The error term for the Latin Square is usually best calculated as the remaining d.f. after
subtracting the main effects from the total.
The model is: $Y_{ij} = \mu + \tau_i + \beta_j + \gamma_k + \varepsilon_{ijk}$
Note the odd subscripting. Each observation can be identified by just 2 subscripts.
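The remark that the Latin Square error is best found by subtraction can be illustrated with a small sketch: compute the row, column and treatment sums of squares, and take the error SS and d.f. as what remains of the total. The function name `latin_square_anova`, the 3x3 layout and the responses are hypothetical.

```python
from statistics import mean

def latin_square_anova(y, tmt):
    """y[i][j] is the response in row i, column j; tmt[i][j] is the treatment applied there."""
    r = len(y)
    cells = [(i, j) for i in range(r) for j in range(r)]
    grand = mean(y[i][j] for i, j in cells)
    row_means = [mean(row) for row in y]
    col_means = [mean(y[i][j] for i in range(r)) for j in range(r)]
    tmt_means = [mean(y[i][j] for i, j in cells if tmt[i][j] == a)
                 for a in sorted(set(tmt[0]))]
    ss = {"Row": r * sum((m - grand) ** 2 for m in row_means),
          "Col": r * sum((m - grand) ** 2 for m in col_means),
          "Tmt": r * sum((m - grand) ** 2 for m in tmt_means),
          "Total": sum((y[i][j] - grand) ** 2 for i, j in cells)}
    df = {"Row": r - 1, "Col": r - 1, "Tmt": r - 1, "Total": r * r - 1}
    # Error is whatever remains after the three main effects are removed
    ss["Error"] = ss["Total"] - ss["Row"] - ss["Col"] - ss["Tmt"]
    df["Error"] = df["Total"] - df["Row"] - df["Col"] - df["Tmt"]   # (r-1)(r-2)
    f_tmt = (ss["Tmt"] / df["Tmt"]) / (ss["Error"] / df["Error"])
    return ss, df, f_tmt

# Hypothetical 3x3 square and responses
tmt = [list("ABC"), list("BCA"), list("CAB")]
y = [[11, 14, 18],
     [15, 19, 13],
     [20, 15, 19]]
ss, df, f_tmt = latin_square_anova(y, tmt)
print(df, round(f_tmt, 2))
```

Each observation is located by its row and column alone; the treatment label is looked up from the layout, which mirrors the two-subscript identification noted above.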
Terminology
Main effects are the lone treatments and blocks. The remaining terms are interaction terms or
Interactions are between two main effects that are cross classified.
Cross-classified effects are distinct, meaningful sources of variation and must be kept in the
appropriate categories.
         Block 1   Block 2   Block 3   Block 4   Block 5
Tmt 1      Y11       Y21       Y31       Y41       Y51
Tmt 2      Y12       Y22       Y32       Y42       Y52
Tmt 3      Y13       Y23       Y33       Y43       Y53
Tmt 4      Y14       Y24       Y34       Y44       Y54
Nested effects are randomly applied (such as observations) and could be reordered without
affecting the experiment.
         Obs 1     Obs 2     Obs 3     Obs 4     Obs 5
Tmt 1      Y11       Y21       Y31       Y41       Y51
Tmt 2      Y12       Y22       Y32       Y42       Y52
Tmt 3      Y13       Y23       Y33       Y43       Y53
Tmt 4      Y14       Y24       Y34       Y44       Y54

Series of Latin Squares
Latin squares are rather limited as to how they are done. Additional experimental units are not
However, it is possible to add a second or third square, and reproduce the whole experiment
elsewhere or at a different time.
Suppose we are examining the effect of various treatments in removing oil from marsh areas
that have been fouled. We have 3 treatments we are comparing. The experimental area will be
sprayed with oil and one of the following treatments applied.
1) Control (no treatment), 2) detergent spray and 3) biological agent
The objective is to compare the effects of the treatments after two months. The variable to be
measured is live Spartina biomass in the treatment plots.
The marsh area to be used in the experiment has several gradients. There are salinity gradients
and elevation gradients that will affect the experiment. The investigators decided to “block” in
both a North-South and East-West direction. This is a Latin Square.
Layout of the experiment.
  C D B
  D B C
  B C D
[Figure label: Elevation gradient]
There is nothing wrong with this experiment. Blocking on “rows” and “columns” should account
for the salinity and elevation gradients. However, if the investigators decide they need additional
replication, they could do another square elsewhere, perhaps across the stream.
Layout of the expanded experiment.
  C D B
  D B C
  B C D

  D B C
  C D B
  B C D
Now we need a source table. We still have the basic Latin square, but there are two squares. Call
this variable “Square” with two levels, east and west.
We are not interested in Square as a source of variation. It is simply a mechanism to increase
replication, as blocking often is.
However, it is a source of variation and must be included in the model.
Square definitely has meaning, east and west are two distinct squares. The treatments still have
the same meaning in each square, so these are cross classified.
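Do rows and columns mean the same in the two squares? Does row 1 in the east have the same
salinity as row 1 in the west? Does col 2 have the same salinity and elevation in both squares?
Probably not. If rows 1, 2 and 3 in the east have a different meaning from rows 1, 2 and 3 in the
west, these should be nested.
The same for columns.
The model is $Y_{ijkl} = \mu + S_i + R_{ij} + C_{ik} + \tau_l + (S\tau)_{il} + \varepsilon_{ijkl}$,
where $S_i$ is the square, $R_{ij}$ the row within square, $C_{ik}$ the column within square, $\tau_l$ the treatment, and $(S\tau)_{il}$ the treatment by square interaction.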
The source table (EMS later)
Source        d.f.          d.f. num   SS
Square        s–1           1          SSSquares
Row(Square)   s(r–1)        4          SSRow
Col(Square)   s(c–1)        4          SSCol
Tmt           t–1           2          SSTmt
Tmt*Square    (t–1)(s–1)    2          SSSquare*Tmt
Error         s(r–1)(r–2)   4          SSE
Total         sr²–1         17         SSTotal
Statistics quote: You got to be careful how you interpret statistics. If you aren't, you might make the mistake of the man
who read in the paper that "most auto accidents happen within eight miles of your home." So he moved.
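The numeric degrees of freedom in the source table for the two marsh squares (s=2, r=c=t=3) can be reproduced directly from the formulas. The helper name `latin_series_df` is hypothetical; this is a minimal sketch of the arithmetic only.

```python
def latin_series_df(s, r):
    """Degrees of freedom for a series of s Latin squares of size r (r = c = t)."""
    df = {"Square": s - 1,
          "Row(Square)": s * (r - 1),       # rows nested within squares
          "Col(Square)": s * (r - 1),       # columns nested within squares
          "Tmt": r - 1,                     # treatments are crossed with squares
          "Tmt*Square": (r - 1) * (s - 1),
          "Error": s * (r - 1) * (r - 2)}
    df["Total"] = s * r * r - 1
    return df

df_series = latin_series_df(s=2, r=3)
for source, value in df_series.items():
    print(f"{source:12s} {value}")
```

The six sources add to 17, the total for 18 plots. Because rows and columns are nested in squares, each contributes s(r–1) = 4 degrees of freedom rather than the r–1 = 2 it would have if row 1 meant the same thing in both squares.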
This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.