Statistical Techniques II Page 128

What would happen if the effects were random?
Test results

Source    EMS
Lysine    σ² + n σ²LSP + np σ²LS + ns σ²LP + nps σ²L
Protein   σ² + n σ²LSP + ns σ²LP + nl σ²SP + nls σ²P
Sex       σ² + n σ²LSP + np σ²LS + nl σ²SP + nlps σ²S
L*S       σ² + n σ²LSP + np σ²LS
L*P       σ² + n σ²LSP + ns σ²LP
P*S       σ² + n σ²LSP + nl σ²SP
L*P*S     σ² + n σ²LSP
Error     σ²

The residual error is used to test the third order interaction.
The third order interaction is used to test the second order interactions.
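With all effects random, the lack of a single proper denominator for the main effects can be checked symbolically from the EMS table above. The sketch below (in Python, with hypothetical replication and level counts n, l, p, s) transcribes the EMS rows as dictionaries of coefficients and shows that the synthetic Satterthwaite-type denominator MS(LS) + MS(LP) - MS(LPS) has exactly the expected mean square of Lysine minus the component being tested:

```python
n, l, p, s = 2, 3, 2, 2  # hypothetical: n reps, l lysine, p protein, s sex levels

# Expected mean squares from the table, as {variance component: coefficient};
# "err" stands for sigma^2, the residual variance.
ems_L   = {"err": 1, "LSP": n, "LS": n * p, "LP": n * s, "L": n * p * s}
ems_LS  = {"err": 1, "LSP": n, "LS": n * p}
ems_LP  = {"err": 1, "LSP": n, "LP": n * s}
ems_LPS = {"err": 1, "LSP": n}

# EMS of the synthetic denominator MS(LS) + MS(LP) - MS(LPS)
keys = set(ems_LS) | set(ems_LP) | set(ems_LPS)
denom = {k: ems_LS.get(k, 0) + ems_LP.get(k, 0) - ems_LPS.get(k, 0) for k in keys}

# It matches EMS(Lysine) except for the nps*sigma^2(L) term being tested
target = {k: v for k, v in ems_L.items() if k != "L"}
```

This is why the default F tests against the residual mean square are wrong here, and why a constructed error term (or PROC MIXED) is needed.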
Using SAS PROC GLM there is no proper error term for testing the main effects, though one can be calculated with the "Random / test" statement output. PROC MIXED gives a correct result.

Splitplot and Repeated Measure Designs
The Splitplot and Repeated Measures “Designs” combine elements of design (error structure) and
treatment arrangement concepts. These are designs with two levels, a “Main Plot”, with its own
treatment and error, and a “Subplot”, with its own treatment and error.
It is possible to have more than a single-factor treatment arrangement in both levels.
The (minimum of) two treatments (from the main and sub plots) are usually cross classified.
Either Main or Subplot may have nested error structure.
The simplest split plot would have the following model (CRD).
Yijk = μ + τ1i + γij + τ2k + (τ1τ2)ik + εijk

where τ1 is the main plot treatment, γij is the main plot error (error a), τ2 is the subplot treatment, and εijk is the subplot error (error b).
Example with CRD main plot.

[Field layout: fifteen plots assigned the main plot treatments at random - A B A C B C B B A C B C A C A.]

James P. Geaghan - Copyright 2011, Statistical Techniques II, Page 129

Each plot SPLIT for a new treatment.

[Layout: each main plot is split, and the subplot treatments F and G are randomly assigned within each plot.]
Splitplot design source table. The d.f. for error(b) is the usual t1*t2*(n–1) less the d.f. for error(a), t1*(n–1), giving t1*(t2–1)(n–1).

Source        d.f.               value
Treatment 1   t1–1               2
Error(a)      t1*(n–1)           12
Treatment 2   t2–1               1
Tmt1*Tmt2     (t1–1)(t2–1)       2
Error(b)      t1*(t2–1)(n–1)     12
Total         t1*t2*n–1          29

Splitplot design - examples of splits

We may split a plot to do a new treatment, e.g. an agricultural experiment with fertilizer treatments in plots may have a herbicide applied to half of each plot and not to the other half.
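The d.f. bookkeeping in the source table above can be captured in a small function. A sketch (the function name `splitplot_df` is mine, invented for illustration):

```python
def splitplot_df(t1, t2, n):
    """Degrees of freedom for a split plot with a CRD main plot:
    t1 main plot treatments, t2 subplot treatments, n replicates each."""
    df = {
        "Treatment 1": t1 - 1,
        "Error(a)": t1 * (n - 1),
        "Treatment 2": t2 - 1,
        "Tmt1*Tmt2": (t1 - 1) * (t2 - 1),
        "Error(b)": t1 * (t2 - 1) * (n - 1),
    }
    df["Total"] = t1 * t2 * n - 1
    return df

table = splitplot_df(t1=3, t2=2, n=5)  # the numbers used in the table above
```

The component d.f. must add up to the total, which is a useful check on any source table.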
A soil study of contaminants may measure levels of the chemical of interest at various levels in a soil core (0–5 cm, 6–10 cm, 11–15 cm, etc.), so the core is split.
A study of the growth of plants, e.g. Spartina in a marsh, may split the plant into above ground,
root and rhizome biomass.
Anytime a treatment occurs within an experimental unit, we have a splitplot. Examples: if we are studying diets of fish and put a male and a female fish in each aquarium, or weight gain of hogs with large and small hogs in each pen, etc.
More complex designs are possible. The main plot may be an RBD, or the main plot and/or sub
plot treatments may be factorial or nested.
It is possible to have plots that are split twice, or split and measured repeatedly.
These designs are complicated, difficult to analyze and difficult to interpret.
So why do you do them?
Split plot design with an RBD main plot.
[Field layout: two blocks; each block contains the main plot treatments A, B and C, and each main plot is split into four subplots receiving treatments d, e, f and g in random order.]

This design has two blocks, three levels in the main plot treatment and four levels in the subplot treatment.
For the main plot the analysis is the same as any RBD. This one will have treatments, blocks, treatment*block interaction and replicated experimental units in blocks.

Yijkl = μ + βi + τ1j + (βτ1)ij + γijk + τ2l + (τ1τ2)jl + (βτ2)il + (βτ1τ2)ijl + εijkl

Source table, RBD main plot in splitplot.

Source                               d.f. calculation                       numeric d.f.
Block                                b–1                                    1
Treatment 1                          t1–1                                   2
Blk*Tmt1                             (b–1)(t1–1)                            2
Error(a)                             b*t1*(n–1)                             6
Treatment 2                          t2–1                                   3
Tmt1*Tmt2                            (t1–1)(t2–1)                           6
Blk*Tmt2 + Blk*Tmt1*Tmt2 (pooled)    (b–1)(t2–1) + (b–1)(t1–1)(t2–1)        3 + 6 = 9
Error(b)                             b*t1*(t2–1)(n–1)                       18
Total                                b*t1*t2*n–1                            47

Are there advantages to a split plot design? Obviously, if there are covariances, they should be taken into account.
Also, the subplot error is expected to be smaller and have more degrees of freedom. As a
result, subplot tests should be more powerful. This is an advantage if the tests of interest
(treatment and interactions) can be placed in the subplot.
Repeated measures The repeated measures design is similar to a splitplot. We have a “main plot”, which can be any
of the designs we have discussed previously (CRD, RBD, LSD).
We then take repeated measurements over time within the plots. If these “repeated measures”
are independent, then this "time" factor is just cross-classified with the treatment.
If, however, the measurements are NOT independent, we have a repeated measures design.
Independence? Again? Yep.
What do I mean by independent? For example, if you are sampling sugar content of an ear of
corn from a plot, or the height of Spartina in a plot, you ask, “are they independent or not?”
If you measure a different ear of corn from a different plant each time, or measure a different
Spartina plant, they are probably independent.
However, if you measure a kernel from the same ear of corn, or the same Spartina plant each
time (repeatedly), they are likely NOT independent.
Some examples of split plot and repeated measures variables.

Pre/post tests on people; in fact, most any experiment where several levels of a treatment are measured on the same subject (= a person).
Soil samples or water samples at different depths (in the same site).
Epiphytes on Spartina counted below, at and above the tide line (on the same plant).

Studies on plants like sugar cane where we measure production in year 1, year 2 and year 3 on
the same biological material.
Ditto for asparagus, artichokes, most tree species, etc.
In general, any time your experimental unit has another treatment applied within each experimental unit, this is a split plot. If the experimental unit (or sampling unit) is measured over time it is repeated measures.
Why is this independence important? What can we do about it?
Let's BRIEFLY revisit the X and X'X matrices.
The X matrix for designs consists of columns of 0 values and 1 values, arranged to distinguish
between categories.
For a simple CRD with 4 treatment levels (two observations each) the X matrix may look like the following:

X =
1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1

The X'X matrix may look like the following:

X'X =
n1   0    0    0
0    n2   0    0
0    0    n3   0
0    0    0    n4

and the (X'X)^-1 matrix would look like the following:

(X'X)^-1 =
1/n1   0      0      0
0      1/n2   0      0
0      0      1/n3   0
0      0      0      1/n4

To get the variances and covariances we multiply by the MSE, as you know. This gives MSE/ni on the main diagonal (= s²Ȳ, the variance of a treatment mean), and zeros on the off diagonal.
All those zeros on the off diagonal mean that THERE IS NO COVARIANCE BETWEEN THE
TREATMENTS. This is well and good, we do not expect covariances between the independently
sampled treatments.
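The diagonal structure above is easy to verify directly. A sketch in plain Python, with two observations per treatment as in the X matrix shown earlier (the MSE value is made up for illustration):

```python
t, n_per = 4, 2

# X: one 0/1 indicator column per treatment, two rows (observations) per treatment
X = [[1 if col == trt else 0 for col in range(t)]
     for trt in range(t) for _ in range(n_per)]

# X'X: element (i, j) is the sum over observations of column i times column j
XtX = [[sum(row[i] * row[j] for row in X) for j in range(t)] for i in range(t)]

# Multiplying (X'X)^-1 by the MSE gives MSE/ni on the diagonal, zero covariances
mse = 1.5  # hypothetical error mean square
var_of_means = [[mse / XtX[i][i] if i == j else 0.0 for j in range(t)]
                for i in range(t)]
```

The off-diagonal zeros of X'X are what make the treatment means uncorrelated in an ordinary CRD.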
But for the split plot and repeated measures, we do actually expect some covariances!!
Maybe the covariance is simple, perhaps it is a constant. This would be the assumption for
splitplot designs, and we can use GLM for our tests (but not for subplot standard errors).
But much recent scientific investigation has found that often the structure is not simple.
Splitplot SAS example

The data come from a classic experiment to measure the effect of manure on the yield of barley. Six blocks of three whole plots were used, together with three varieties of barley. Each whole plot was divided into four subplots to cater for the four levels of manure: 0, 0.01, 0.02 and 0.04 tons per acre. The whole plots form a randomized block design.
There is no significant manure level by variety treatment interaction, so the lines below do not
significantly depart from parallel lines. Also, the varieties alone are not significant so the data
could be represented by a single line.
[Figure: joined means with standard error bars (Yield, roughly 70 to 140, versus manure treatment) to examine the interaction.]

The manure level is quantitative and can be tested for linear, quadratic and cubic effects, but the levels are not equally spaced. Orthogonal polynomial tests show a linear and a quadratic effect.

[Figure: Yield versus manure treatment with the fitted trend.]

Statistics quote: The Ten Commandments of Statistical Inference (1/14/05,
http://www.bsos.umd.edu/socy/alan/10command.html)
1. Thou shalt not hunt statistical inference with a shotgun.
2. Thou shalt not enter the valley of the methods of inference without an experimental design.
3. Thou shalt not make statistical inference in the absence of a model.
4. Thou shalt honor the assumptions of thy model.
5. Thy shalt not adulterate thy model to obtain significant results.
6. Thy shalt not covet thy colleagues' data.
7. Thy shalt not bear false witness against thy control group.
8. Thou shalt not worship the 0.05 significance level.
9. Thy shalt not apply large sample approximation in vain.
10. Thou shalt not infer causal relationships from statistical significance.
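Returning to the manure example: tabled orthogonal polynomial coefficients assume equally spaced levels, so for the rates 0, 0.01, 0.02, 0.04 the contrasts must be computed. A sketch (equal cell sizes assumed) using Gram-Schmidt on the powers of the level values:

```python
levels = [0.0, 0.01, 0.02, 0.04]  # the four unequally spaced manure rates


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_contrasts(x):
    """Gram-Schmidt on 1, x, x^2, x^3 -> linear, quadratic, cubic contrasts."""
    out = []
    for d in range(len(x)):
        v = [xi ** d for xi in x]
        for u in out:
            c = dot(v, u) / dot(u, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        out.append(v)
    return out[1:]  # drop the constant column


lin, quad, cub = orthogonal_contrasts(levels)
```

Each contrast sums to zero and is orthogonal to the others, so the polynomial sums of squares partition the treatment sum of squares cleanly.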
See the covariance structure table below. Two structures of particular interest are the variance component structure (split plot) and a favorite repeated measures structure, AR(1).
Some of the covariance structures available in SAS proc mixed. From SAS Institute Inc., SAS/STAT
software changes and enhancements through release 6.11. Cary, NC, 1996.
Type                                  Option     ij-th element
Simple ("no structure"; no repeated   (none)     σ² for i = j, 0 otherwise; the matrix is σ² times the identity.
  statement, or split plot)
Variance components                   VC         σ²i for i = j, 0 otherwise; a separate variance for each group on the diagonal.
Compound symmetry                     CS         σ² + σ1² for i = j, σ1² otherwise; a common covariance between every pair of levels.
Unstructured                          UN         σij, symmetric (σij = σji); every variance and covariance estimated separately.
First-order autoregressive            AR(1)      σ² for i = j, σ²ρ^|i-j| otherwise; covariance declines geometrically with separation in time.
Toeplitz                              TOEP       σ² for i = j, σ|i-j| otherwise; a separate covariance for each off-diagonal band.
Toeplitz with two bands (may          TOEP(2)    σ|i-j| within the given number of bands, 0 elsewhere.
  specify other number of bands)

There are other structures, including some where the structure follows some type of regression
line. This is frequent in “spatial statistics” where covariance is modeled as a function of distance
between plots.
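The two workhorse structures, compound symmetry and AR(1), can be built explicitly to see their shapes. A sketch with made-up parameter values:

```python
def compound_symmetry(dim, sigma2, sigma1sq):
    """CS: sigma^2 + sigma1^2 on the diagonal, sigma1^2 everywhere else."""
    return [[sigma2 + sigma1sq if i == j else sigma1sq for j in range(dim)]
            for i in range(dim)]


def ar1(dim, sigma2, rho):
    """AR(1): sigma^2 * rho**|i-j|; covariance decays with time separation."""
    return [[sigma2 * rho ** abs(i - j) for j in range(dim)] for i in range(dim)]


cs = compound_symmetry(4, sigma2=2.0, sigma1sq=0.5)
ar = ar1(4, sigma2=2.0, rho=0.6)
```

In CS every pair of measurements has the same covariance, no matter how far apart in time; in AR(1) the covariance shrinks as the measurements get farther apart, which is why AR(1) is so often preferred for repeated measures.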
Notes on covariance structure
For a simple CRD there is only a single homogeneous variance (“no structure” in the table
above). This is the SAS default in the proc mixed repeated statement.
SAS has a “VC” option for “variance components” which allows for heterogeneous variance.
This option is available in proc mixed and proc glimmix but not in proc glm.
The usual and simplest assumption for variance components for a subplot treatment is "Compound Symmetry", SAS option CS. This is the only covariance structure available in PROC GLM other than "no structure".
One of the most popular structures for repeated measures designs is the “first order
autoregressive”, SAS option AR(1).
Other subplot treatment structures are possible, such as the "Toeplitz" structure (SAS option TOEP), for which the number of bands can be varied.
Another subplot treatment structure is “unstructured”, SAS option UN. This takes up the most
degrees of freedom to estimate.
Several different structures can be fitted and evaluated using the fitting statistics.
Barley and manure split plot design: fit statistics and tests for differences between full and reduced models.

Covariance Structure                                  d.f. for the Covariance   -2LL    AIC     AICC    BIC
Unspecified (default = CS)                            3                         529     535     535.5   534.4
CS (Compound symmetry)                                3                         529     535     535.5   534.4
CSH (Compound symmetry, heterogeneous variance)       6                         527.1   539.1   540.7   537.9
UN (Unstructured)                                     11                        522.8   544.8   550.3   542.5

Test         Chi Sq   d.f.   P > Chi Sq
CS vs CSH    1.9      3      0.5934
CS vs UN     6.2      8      0.6248
CSH vs UN    4.3      5      0.5071

PROC MIXED versus PROC GLM

Certain analyses like the split plot and repeated measures require addressing covariance structure. The old PROC GLM has limited ability to accomplish this, and will not correctly calculate all subplot errors and tests. PROC MIXED handles these issues well.
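The AIC values and the chi-square test statistics in the barley table can be reproduced from the -2 log likelihood column and the covariance parameter counts. A sketch (the `fits` dictionary simply transcribes the table):

```python
# (-2 log likelihood, number of covariance parameters) from the table above
fits = {"CS": (529.0, 3), "CSH": (527.1, 6), "UN": (522.8, 11)}


def aic(neg2ll, k):
    return neg2ll + 2 * k  # AIC = -2 ln(Likelihood) + 2k


def lrt(reduced, full):
    """Likelihood ratio chi-square and d.f. for nested covariance structures."""
    (ll_r, k_r), (ll_f, k_f) = fits[reduced], fits[full]
    return round(ll_r - ll_f, 1), k_f - k_r
```

The chi-square statistic is the drop in -2LL, and its d.f. is the number of extra covariance parameters in the fuller structure.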
Covariance structure An additional note on different structures. There is an area of statistics called “spatial statistics”
where the covariance structure is a function of distance.
These functions can be linear, exponential, or various other options.
For unequal timing of repeated measures, where AR(1) may not be appropriate, these functions
can also be used.
James P. Geaghan  Copyright 2011 Statistical Techniques II Page 135 In order to fit these covariance structures and get correct subplot standard error estimates, use
PROC MIXED. These options are not available in PROC GLM.
When fit with PROC GLM we must assume compound symmetry. This is the default in PROC
MIXED.
There are some “adjustments” that can be made by some GLM options, but we will not cover
these since all problems are resolved in PROC MIXED.
Repeated measures SAS example

The data come from a study of the effect of a drug on asthma patients, with the response (fiv1) measured repeatedly over time on the same patients.
[Figure: Study of drug effect on asthma patients: plot of mean fiv1 with standard error bars versus time in hours (1 to 8).]

Fit statistics for various covariance structures
Study of drug effect on asthma patients: analysis of residuals from PROC MIXED.

Description                   AR1     Toep    CS      UN
-2 Res Log Likelihood         276.1   229     348.2   150.4
AIC (smaller is better)       280.1   245     352.2   222.4
AICC (smaller is better)      280.1   245.3   352.2   227.6
BIC (smaller is better)       284.7   263.2   356.7   304.4

AIC = 2k - 2 ln(Likelihood)
AICC = AIC + 2k(k+1)/(n - k - 1)
BIC = k ln(n) - 2 ln(Likelihood)

Likelihood ratio test of the covariance structures

Description   -2 Res Log Likelihood   covariance parms   diff with UN   d.f. difference   P > ChiSq
AR1           276.1                   2                  125.7          34                1.91189E-12
Toep          229                     8                  78.6           28                1.07687E-06
CS            348.2                   2                  197.8          34                5.32668E-25
UN            150.4                   36

LSMeans
There is something else about the SAS LSMeans statement you should know.
There are actually several “unusual” or unexpected behaviors of this statement. One we will
discuss in connection with Analysis of Covariance.
However, there is another general behavior that we should see first.
What is the overall mean?

Tmt        1   2   3   4   5   6   7
Tmt Mean   4   4   6   4   5   6   4

(Seven treatments with unequal numbers of replicates; overall Sum = 100, total n = 20.)

For the table above the arithmetic mean is 100/20 = 5 and the LSMean is 33/7 = 4.71.
LSMeans calculates means as the mean of means, not the raw mean of all observations. This is
particularly important in unbalanced factorial designs.
For one unbalanced 4 by 5 factorial the means and lsmeans are given below.
[Raw data table for the unbalanced 4 by 5 factorial omitted; the cell, marginal and overall means are summarized below.]

Comparison of Means and LSMeans (cell means, Tmt1 by Tmt2).

Tmt1      Tmt2=1   2      3      4      5      LSMean   Raw Mean
1         3        4      2      3      3      3.00     3.00
2         7        6      5      8      3      5.80     5.50
3         6        7      5      5      8      6.20     6.00
4         8        8      6      6      6.5    6.90     7.00
LSMean    6.00     6.25   4.50   5.50   5.13   5.48
Mean      6.08     6.20   4.30   5.25   4.73            5.35
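The LSMeans in the comparison table are just means of the cell means. A sketch that recomputes them (cell means transcribed from the table; the raw marginal means would additionally need the unequal cell counts, so only the LSMeans side is checked here):

```python
# Cell means for the unbalanced 4 by 5 factorial (rows Tmt1, columns Tmt2)
cells = [
    [3, 4, 2, 3, 3],
    [7, 6, 5, 8, 3],
    [6, 7, 5, 5, 8],
    [8, 8, 6, 6, 6.5],
]


def mean(v):
    return sum(v) / len(v)


row_lsmeans = [mean(r) for r in cells]        # Tmt1 LSMeans
col_lsmeans = [mean(c) for c in zip(*cells)]  # Tmt2 LSMeans
grand_lsmean = mean(row_lsmeans)              # mean of the means
```

Note the table rounds 5.125 to 5.13 and 5.475 to 5.48; the rows and columns both average to the same grand LSMean, which is what makes the decomposition consistent.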
This depends on the situation. Suppose we caught fish in the summer and in the winter, and
wanted to express the average temperature at which fish were caught.
The winter mean is 15°C and the summer mean is 25°C. What is the mean?
We do the calculations on the individual catches and find the mean is equal to 24.
How can that be?
Well, we did 180 samples in the summer and only 20 samples in the winter. So the summer
temperatures dominate our samples.
Perhaps the average temperature would be better expressed as 20, the mean of the means.
That is LSMeans
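The fish-catch arithmetic, sketched:

```python
winter_n, winter_mean = 20, 15.0    # winter: 20 catches at a mean of 15 degrees
summer_n, summer_mean = 180, 25.0   # summer: 180 catches at a mean of 25 degrees

# Raw mean over all 200 catches: dominated by the many summer samples
raw_mean = (winter_n * winter_mean + summer_n * summer_mean) / (winter_n + summer_n)

# LSMeans-style mean of the two seasonal means: each season weighted equally
mean_of_means = (winter_mean + summer_mean) / 2
```

The raw mean lands at 24 because 90% of the observations are summer catches; the mean of means treats each season as one unit and lands at 20.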
I generally use LSMeans.
When testing hypotheses such as H0: μ1 = μ2 = μ3 it is best that the overall mean not be dominated by some cell that has an unusually high number of observations.
On the other hand, cells with more observations are better estimates of the mean than cells with fewer observations.
If the null hypothesis is true, why lose power by treating the cells equally?
Traditional ANOVA will use RAW means in its calculations.
The choice is yours, except that PROC MIXED has only the LSMeans.
Testing for differences between models PROC MIXED provides several tools for comparing models. The intent is to compare between
full and reduced models. The statistics used differ from those used in regression.
Reduced models may be models with some terms omitted, or
Reduced models may be models with a simpler variance or covariance structure
The test is called a likelihood ratio test
It produces a Chi square statistic.
The degrees of freedom are the d.f. difference between the two models.

Homogeneous variance is tested automatically with some simple models
Recall our Typhoid strain example: we requested separate variances for each group with the statement REPEATED / GROUP=STRAIN;
The resulting output was:

Null Model Likelihood Ratio Test
  DF    ChiSquare    Pr > ChiSq
   2        14.56        0.0007

Note that fitting 3 variances requires 3 d.f., while fitting a homogeneous variance model requires only 1 d.f. The 2 d.f. difference is the reason the test above is a 2 d.f. test.
This test is very similar to Bartlett's test of homogeneity of variance.
Suppose that for the baseball example you were told that the salaries of some positions were highly variable, while others were more stable. Perhaps we should have tested for nonhomogeneous variance in this example. So we add the statement: REPEATED / GROUP=POSITION;
SAS fits the different variances for the positions, but does not always provide a test. When this
test is not provided automatically we can calculate our own test.
For the original fit we got the results:

Covariance Parameter Estimates
Cov Parm    Estimate   Standard Error   Z Value   Pr Z     Alpha   Lower     Upper
team        3466.41    30458            0.11      0.4547   0.05    513.45    3.81E125
Residual    1924296    145057           13.27     <.0001   0.05    1668871   2243534

When separate variances are requested we get the following results:
Covariance Parameter Estimates
Cov Parm    Group          Estimate   Standard Error   Z Value   Pr Z     Alpha   Lower     Upper
team                       25008      35506            0.70      0.2406   0.05    4960.25   26828515
Residual    Position 1b    3126672    0                .         .        .       .         .
Residual    Position 2b    2276275    902599           2.52      0.0058   0.05    1189304   5985011
Residual    Position 3b    1512066    600277           2.52      0.0059   0.05    789517    3981295
Residual    Position c     759251     201637           3.77      <.0001   0.05    479387    1382686
Residual    Position if    626561     240028           2.61      0.0045   0.05    333467    1582294
Residual    Position of    2558744    407215           6.28      <.0001   0.05    1916409   3590143
Residual    Position p     1875902    208345           9.00      <.0001   0.05    1526216   2361923
Residual    Position ss    1384956    364052           3.80      <.0001   0.05    878092    2504484

SAS reports the number of parameters fitted in the "Dimensions" section. The first model estimated 2 parameters, while this model fits 9, a difference of 7.
In order to do this 7 d.f. test we take the difference in the “–2 Res Log Likelihood” reported in
the “Fit Statistics”.
This value was 6346.8 for the reduced model and 6323.1 for the full model.
The difference is 23.7, a chi square value with 7 d.f.
The probability of a greater chi square value is 0.001286226, a significant result.

As with regression, when there is a difference in two models the larger model is better, since it
presumably provides some information that the smaller model does not.
If there is no significant difference we decide in favor of the simpler model.
We just tested homogeneity of variance.
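The p-value above can be reproduced from the chi-square upper tail. Python's standard library has no chi-square function, so the sketch below builds one from the standard recurrence for the regularized upper incomplete gamma function:

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df d.f. > x), via Q(s+1,t) = Q(s,t) + t**s e^-t / Gamma(s+1),
    starting from Q(1,t) = exp(-t) (even df) or Q(1/2,t) = erfc(sqrt(t)) (odd df)."""
    t = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-t)
    else:
        s, q = 0.5, math.erfc(math.sqrt(t))
    while s < df / 2.0 - 1e-9:
        q += t ** s * math.exp(-t) / math.gamma(s + 1.0)
        s += 1.0
    return q


chi_sq = 6346.8 - 6323.1            # difference in -2 Res Log Likelihood = 23.7
p = chi2_sf(chi_sq, df=9 - 2)       # 9 covariance parameters versus 2
```

The same function reproduces the earlier barley comparison: chi2_sf(6.2, 8) is about 0.6248, matching the CS versus UN test in that table.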
Other between model comparisons

SAS also provides some other statistics to compare between models. Also under the "Fit statistics" you will find (for the full model):

AIC (smaller is better)     6341.1
AICC (smaller is better)    6341.6
BIC (smaller is better)     6346.8

And for the smaller model:

AIC (smaller is better)     6350.8
AICC (smaller is better)    6350.8
BIC (smaller is better)     6352.1

These are all penalized index values called "Information Criteria". As the note says, smaller is better for all 3.

AIC is the Akaike Information Criterion,
AICC is the corrected AIC,
BIC is the Bayesian Information Criterion,
and there are others. These all work in a similar fashion. They provide an adjusted measure of goodness of fit.
These are similar in concept to the “adjusted R2”, so they do not necessarily get smaller when the
model gets larger.
These results also indicate that the full model is better, but they do not provide a test with a
probability value.

Statistics quotes: USA Today has come out with a new survey: apparently, three out of every four people make up 75% of the population. David Letterman (1947- )
Statistics quotes: "Statistics are no substitute for judgment."  Henry Clay
Statistics quotes: "Like dreams, statistics are a form of wish fulfillment." - Jean Baudrillard

Analysis of Covariance
Our previous encounter with Analysis of Covariance was from a “Multisource Regression” point
of view.
In multisource regression we were particularly interested in the regression aspects, particularly
the slopes that would estimate some rates of change in Y relative to X.
The indicator variable estimates intercept differences.
The key concept here is that with multisource regression we are interested in the regression.
We want the slopes, we want to interpret the slopes, and we want to know if slopes from two or
more indicator variables are the same or not.
However, the name “Analysis of Covariance” actually comes from a design perspective.
In this case we are doing some designed experiment, with treatments, error, etc.
And for whatever reason we feel the need to include a “regression type” X variable, this is the
“covariable”.
Why would we include a covariable? It is probably not by choice. It is often not a source of
variation that we are interested in interpreting.
If after starting a designed experiment we recognize that there is some source of variation that
will inflate our error term, and if we find that we can account for that variation with a
“covariable”, we may choose to do “Analysis of Covariance”.
For example, we may be doing an agricultural experiment on fertilizer rates and realize that the
plots in our experiment differ in terms of moisture level, and this is influencing our results.
So we could measure soil moisture and include it as a covariable.
Or we may be doing an experiment involving the influence of diet on blood sugar levels in
diabetes patients when we realize that the patients' initial weight is influencing our results. We could include the patient's weight as a covariable.
Studies of “weight gain” often include initial weight as a covariable.
One researcher in crawfish aquaculture realized that water leakage from his pond was
obscuring the results of the rice forage density that he intended to study. The effect of leakage
was mitigated by including a covariable that measured leakage (the amount of water he added
to keep some ponds from drying up completely).
So what are we doing here? We have a source of variation that, if unaccounted for, would
inflate our error term. We remove that variation from the error term by including a variable in
the model.
Sound familiar?
Conceptually we are including the covariable for the same reasons that we include blocks. It is not
a source of variation of interest; it is simply a way of removing variation from the error and
increasing power by reducing the size of the error term.
So, while in multisource regression we are fitting slopes that are of interest, and we have an interest in testing to see if the slope interactions are significant, in Analysis of Covariance we are removing a source of nuisance variation from the error term. In this case we are not only uninterested in interpreting the slopes, we absolutely do not want the slope interactions to be significant.
Why?

Because in design we are interpreting differences in means.

[Figure: three flat lines for Treatments 1, 2 and 3 on a plot of Y versus X.]

With a covariable added (no interaction) we are interpreting differences in regression "levels".

[Figure: three parallel regression lines, one per treatment.]

If there are slope interactions then the level differences are not constant.

[Figure: non-parallel regression lines crossing within the range of interest.]

This can be a complete disaster, or a relatively minor problem.

[Figures: lines crossing in the middle of the range of interest (a disaster) versus lines crossing near the edge of it (a minor problem).]

Our philosophy towards the slope interaction will be one of two approaches.

Ignore the problem; don't even test for an interaction. After all, we are talking about a
“block” interaction.
Address the issue by testing the interaction, just as we would with most design interactions,
and recognize that significant treatment interactions cannot be ignored. Ignoring the problem is tempting. It is easier.
But in other cases where we ignore the block interaction, we feel that all block interactions
represent the same experimental error.
Is this true for slope interactions? Do they represent "error"?
Maybe, a new analysis involving “random regressions” actually uses the slope
interaction as an error term.
But addressing the issue by testing the interactions is probably a better approach.
First, we could put on our regression hats and actually try to interpret the different slopes as
meaningful values.
Or we could go ahead and test for levels even if we have significant slope effects.
Statistics quotes: Statistics are like ladies of the night. Once you get them down, you can do anything with
them. Mark Twain (Samuel Clemens) (1835-1910)
Will our results be meaningful? That depends.
If the overlap in the lines is not too bad, we only need to determine where to compare the lines.
[Figure: crossing regression lines with arrows asking where the levels should be compared: here? or here?]

Enter LSMEANS. The LSMEANS statement has one other behavior that we have not seen.
This behavior occurs when a covariable is present.
[Figure: the three regression lines with the comparison point still unspecified.]

With a covariable present, LSMEANS compares levels at a value of Xi equal to the mean of Xi.

[Figure: the three regression lines compared at the overall mean of X: "Here!!!"]

This has several advantages.
Where is the most “meaningful” place to compare levels? In the middle of the range of
observed data.
Where is the confidence interval of a regression line narrowest? At the mean of the Xi
values (note that the various treatment groups may not have exactly the same mean, so an
overall mean is used).
So this default behavior by LSMeans is both reasonable and relatively powerful.
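The underlying computation can be sketched with invented data (a common slope is assumed, i.e. no slope interaction): the covariable-adjusted, LSMeans-style treatment means are the group means slid along the pooled slope to the overall mean of X.

```python
# Hypothetical data: two treatments whose groups sit at different X values
groups = {
    "Tmt1": ([1.0, 2.0, 3.0], [5.0, 7.0, 9.0]),     # (X values, Y values)
    "Tmt2": ([3.0, 4.0, 5.0], [10.0, 12.0, 14.0]),
}


def mean(v):
    return sum(v) / len(v)


# Pooled within-group slope (the common slope of the parallel lines)
sxy = sxx = 0.0
for x, y in groups.values():
    mx, my = mean(x), mean(y)
    sxy += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx += sum((xi - mx) ** 2 for xi in x)
slope = sxy / sxx

grand_x = mean([xi for x, _ in groups.values() for xi in x])

# Adjusted means: each group's mean Y moved along the common slope to grand_x
adjusted = {g: mean(y) - slope * (mean(x) - grand_x)
            for g, (x, y) in groups.items()}
```

In this made-up example the unadjusted means differ by 5, but adjusted to a common X they differ by only 1; the adjusted difference is the comparison LSMeans reports.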
The SAS LSMeans output will look the same, a table of pairwise comparison probabilities
(with adjustments if requested).
So we may include a covariable in a design for the same reasons that we include blocks,
increased power.
If there are no slope interactions we have a constant difference between the parallel lines,
and there is little problem with comparisons.
LSMeans is probably still best because the confidence interval is narrowest at the mean of
Xi.
In many cases, if the overlap is not too bad, we can still get pretty good interpretations of levels
by using LSMeans.
In the worst cases, consider the possibility of interpreting the slope differences (by placing
confidence intervals on them and seeing if they overlap). This may provide meaningful results,
and may be good in other cases as well (not the worst).
The End

James P. Geaghan - Copyright 2011

This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,J during the Fall '08 term at LSU.