EXST 7015 Fall 2011 Lecture 26 - Statistical Techniques II

What would happen if the effects were random? Test results (l, p, s = numbers of levels of Lysine, Protein and Sex; n = replicates per cell):

Source    EMS
Lysine    σ² + n·σ²LSP + np·σ²LS + ns·σ²LP + nps·σ²L
Protein   σ² + n·σ²LSP + ns·σ²LP + nl·σ²SP + nls·σ²P
Sex       σ² + n·σ²LSP + np·σ²LS + nl·σ²SP + nlp·σ²S
L*S       σ² + n·σ²LSP + np·σ²LS
L*P       σ² + n·σ²LSP + ns·σ²LP
P*S       σ² + n·σ²LSP + nl·σ²SP
L*P*S     σ² + n·σ²LSP
Error     σ²

The residual error is used to test the third order interaction. The third order interaction is used to test the second order interactions. Using SAS PROC GLM there is no proper error term for testing the main effects, though one can be calculated with the "RANDOM / TEST" statement output. PROC MIXED gives a correct result.

Split-plot and Repeated Measures Designs
The split-plot and repeated measures "designs" combine elements of design (error structure) and treatment arrangement concepts. These are designs with two levels: a "main plot", with its own treatment and error, and a "sub-plot", with its own treatment and error. It is possible to have more than a single-factor treatment arrangement at either level. The (minimum of) two treatments (from the main and sub plots) are usually cross classified. Either the main plot or the subplot may have a nested error structure.

The simplest split plot would have the following model (CRD main plot):

  Yijk = μ + τ1i + γij + τ2k + (τ1τ2)ik + εijk

where γij is the main-plot error (error a) and εijk is the subplot error (error b).

Example with a CRD main plot (15 plots, three treatments assigned at random):

  A B A C B
  C B B A C
  B C A C A

Each plot is then SPLIT for a new treatment.
[Layout diagram: each of the 15 main plots above divided into two subplots, one receiving treatment F and the other treatment G.]

Split-plot design source table. The d.f. for error(b) is the usual t1·t2·(n–1) less the d.f. for error(a), t1·(n–1), giving t1·(t2–1)(n–1). Here t1 = 3, t2 = 2 and n = 5.

Source        d.f.
Treatment 1   t1–1 = 2
Error(a)      t1(n–1) = 12
Treatment 2   t2–1 = 1
Tmt1*Tmt2     (t1–1)(t2–1) = 2
Error(b)      t1(t2–1)(n–1) = 12
Total         t1·t2·n–1 = 29

Split-plot design - examples of splits
We may split a plot to apply a new treatment, e.g. an agricultural experiment with fertilizer treatments in plots may have a herbicide applied to half of each plot and not to the other half. A soil study of contaminants may measure levels of the chemical of interest at several depths in a soil core (0-5 cm, 6-10 cm, 11-15 cm, etc.), so the core is split. A study of the growth of plants, e.g. Spartina in a marsh, may split the plant into above ground, root and rhizome biomass.

Anytime a treatment occurs within an experimental unit, we have a split-plot: if we are studying diets of fish and put a male and a female fish in each aquarium, or study weight gain of hogs with a large and a small hog in each pen, etc.

More complex designs are possible. The main plot may be an RBD, or the main plot and/or sub plot treatments may be factorial or nested. It is possible to have plots that are split twice, or split and measured repeatedly. These designs are complicated, difficult to analyze and difficult to interpret. So why do you do them?

Split plot design with an RBD main plot.
[Layout diagram: two blocks; main-plot treatments A, B, C, each appearing twice per block; each main plot split into four subplots d, e, f, g.]

This design has two blocks, three levels in the main plot treatment and four levels in the subplot treatment. For the main plot the analysis is the same as any RBD. This one will have treatments, blocks, treatment*block interaction and replicated experimental units in blocks.

  Yijkl = μ + βi + τ1j + (βτ1)ij + γijk + τ2l + (τ1τ2)jl + (βτ2)il + (βτ1τ2)ijl + εijkl
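In PROC MIXED the two error strata of this model can be declared with the RANDOM statement. A minimal sketch, assuming a hypothetical data set (rbd_split) and variable names, and leaving the block-by-subplot interactions in the residual rather than listing them separately as in the source table below:

  proc mixed data=rbd_split;
     class block tmt1 plot tmt2;
     model y = tmt1 tmt2 tmt1*tmt2;   /* fixed treatment effects */
     random block block*tmt1          /* block and block-by-main-plot terms */
            plot(block*tmt1);         /* error(a): the replicated whole plots */
  run;                                /* error(b) is the residual */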
Source table RBD main plot in split-plot.

Source                               d.f. calculation                  numeric d.f.
Block                                b–1                               1
Treatment 1                          t1–1                              2
Blk*Tmt1                             (b–1)(t1–1)                       2
Error(a)                             b·t1(n–1)                         6
Treatment 2                          t2–1                              3
Tmt1*Tmt2                            (t1–1)(t2–1)                      6
Blk*Tmt2 + Blk*Tmt1*Tmt2 (pooled)    (b–1)(t2–1) + (b–1)(t1–1)(t2–1)   3 + 6 = 9
Error(b)                             b·t1(t2–1)(n–1)                   18
Total                                b·t1·t2·n–1                       47

Are there advantages to a split plot design? Obviously, if there are covariances, they should be taken into account. Also, the subplot error is expected to be smaller and to have more degrees of freedom. As a result, subplot tests should be more powerful. This is an advantage if the tests of interest (treatment and interactions) can be placed in the subplot.

Repeated measures
The repeated measures design is similar to a split-plot. We have a "main plot", which can be any of the designs we have discussed previously (CRD, RBD, LSD). We then take repeated measurements over time within the plots. If these "repeated measures" are independent, then this "time" factor is just cross-classified with the treatment. If, however, the measurements are NOT independent, we have a repeated measures design.

Independence? Again? Yep. What do I mean by independent? For example, if you are sampling sugar content of an ear of corn from a plot, or the height of Spartina in a plot, you ask, "are they independent or not?" If you measure a different ear of corn from a different plant each time, or measure a different Spartina plant, they are probably independent. However, if you measure a kernel from the same ear of corn, or the same Spartina plant each time (repeatedly), they are likely NOT independent.

Some examples of split plot and repeated measures variables: pre-post tests on people, and in fact most any experiment where several levels of a treatment are measured on the same subject (= a person); soil samples or water samples at different depths (in the same site); epiphytes on Spartina counted below, at and above the tide line (on the same plant); studies on plants like sugar cane where we measure production in year 1, year 2 and year 3 on the same biological material (ditto for asparagus, artichokes, most tree species, etc.).

In general, any time another treatment is applied within each experimental unit, this is a split plot. If the experimental unit (or sampling unit) is measured over time it is repeated measures.

Why is this independence important? What can we do about it? Let's BRIEFLY revisit the X and X'X matrices. The X matrix for designs consists of columns of 0 values and 1 values, arranged to distinguish between categories. For a simple CRD with 4 treatment levels and 2 observations per treatment:

      | 1 0 0 0 |
      | 1 0 0 0 |
      | 0 1 0 0 |
  X = | 0 1 0 0 |
      | 0 0 1 0 |
      | 0 0 1 0 |
      | 0 0 0 1 |
      | 0 0 0 1 |

For a simple CRD with 4 treatment levels the X'X matrix may look like the following.

        | n1  0   0   0  |
  X'X = | 0   n2  0   0  |
        | 0   0   n3  0  |
        | 0   0   0   n4 |

For a simple CRD with 4 treatment levels the (X'X)^-1 matrix would look like the following.

             | 1/n1  0     0     0    |
  (X'X)^-1 = | 0     1/n2  0     0    |
             | 0     0     1/n3  0    |
             | 0     0     0     1/n4 |

To get the variances and covariances we multiply by the MSE, as you know. This gives MSE/ni on the main diagonal (= σ²Ȳ, the variance of a treatment mean) and zeros on the off diagonal. All those zeros on the off diagonal mean that THERE IS NO COVARIANCE BETWEEN THE TREATMENTS. This is well and good; we do not expect covariances between the independently sampled treatments. But for the split plot and repeated measures, we do actually expect some covariances!!

Maybe the covariance is simple, perhaps it is a constant. This would be the assumption for split-plot designs, and we can use GLM for our tests (but not for subplot standard errors).
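In PROC MIXED this constant-covariance assumption is the compound symmetry structure. A minimal sketch, assuming hypothetical data set and variable names (one record per subplot, sorted by main-plot unit):

  proc mixed data=splitplot;
     class rep tmt1 tmt2;
     model y = tmt1 tmt2 tmt1*tmt2;
     repeated / type=cs subject=rep(tmt1);  /* a single constant covariance
                                               among the subplots of a main plot */
  run;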
But much recent scientific investigation has found that often the structure is not simple.

Split-plot SAS example
The data come from a classic experiment to measure the effect of manure on the yield of barley. Six blocks of three whole plots were used, together with three varieties of barley. Each whole plot was divided into four subplots to cater for the four levels of manure: 0, 0.01, 0.02 and 0.04 tons per acre. With six blocks, the main plots form a randomized block design.

There is no significant manure level by variety interaction, so the lines below do not significantly depart from parallel. Also, the varieties alone are not significant, so the data could be represented by a single line.

[Figure: joined means with standard error bars to examine the interaction; Yield (70-140) versus manure treatment (levels 0-4), one line per variety.]

The manure level is quantitative and can be tested for linear, quadratic and cubic effects, but it is not equally spaced. Orthogonal polynomial tests show a linear and a quadratic effect.

[Figure: Yield (70-130) versus manure treatment (levels 0-4), showing the fitted trend.]

Statistics quote: The Ten Commandments of Statistical Inference (1/14/05, http://www.bsos.umd.edu/socy/alan/10command.html)
1. Thou shalt not hunt statistical inference with a shotgun.
2. Thou shalt not enter the valley of the methods of inference without an experimental design.
3. Thou shalt not make statistical inference in the absence of a model.
4. Thou shalt honor the assumptions of thy model.
5. Thy shalt not adulterate thy model to obtain significant results.
6. Thy shalt not covet thy colleagues' data.
7. Thy shalt not bear false witness against thy control group.
8. Thou shalt not worship the 0.05 significance level.
9. Thy shalt not apply large sample approximation in vain.
10. Thou shalt not infer causal relationships from statistical significance.

See the covariance structure table below. A couple of structures of particular interest are the variance component structure (split plot) and a favorite repeated measures structure, AR(1). Some of the covariance structures available in SAS PROC MIXED (from SAS Institute Inc., SAS/STAT software changes and enhancements through release 6.11. Cary, NC, 1996):

Type                               SAS option               ij-th element
Simple structure (no structure)    no repeated statement    σ² for i=j, 0 otherwise
                                    or split plot
Variance components                VC                       σ²i for i=j, 0 otherwise
Compound symmetry                  CS                       σ² + σ1² for i=j, σ1² otherwise
Unstructured                       UN                       σij, symmetric (σij = σji)
First-order autoregressive         AR(1)                    σ² for i=j, σ²ρ^|i-j| otherwise
Toeplitz                           TOEP                     σ² for i=j, σ|i-j| otherwise
Toeplitz with two bands            TOEP(2)                  σ² for i=j, σ1 for |i-j|=1,
 (may specify other numbers                                  0 elsewhere
 of bands)

For example, for four repeated measures the compound symmetry and AR(1) structures are

       | σ²+σ1²  σ1²     σ1²     σ1²    |             | 1   ρ   ρ²  ρ³ |
  CS = | σ1²     σ²+σ1²  σ1²     σ1²    |   AR(1) = σ²| ρ   1   ρ   ρ² |
       | σ1²     σ1²     σ²+σ1²  σ1²    |             | ρ²  ρ   1   ρ  |
       | σ1²     σ1²     σ1²     σ²+σ1² |             | ρ³  ρ²  ρ   1  |
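Each of these structures is requested with the TYPE= option of the REPEATED statement in PROC MIXED. A minimal sketch with hypothetical data set and variable names:

  proc mixed data=mydata;
     class tmt subject time;
     model y = tmt time tmt*time;
     repeated time / type=ar(1) subject=subject(tmt);  /* or type=cs, type=vc,
                                                          type=toep, type=un, ... */
  run;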
There are other structures, including some where the structure follows some type of regression line. This is frequent in "spatial statistics", where covariance is modeled as a function of the distance between plots.

Notes on covariance structure
For a simple CRD there is only a single homogeneous variance ("no structure" in the table above). This is the SAS default in the PROC MIXED repeated statement. SAS has a "VC" option for "variance components" which allows for heterogeneous variance. This option is available in PROC MIXED and PROC GLIMMIX but not in PROC GLM.

The usual and simplest assumption for a sub-plot treatment is "compound symmetry", SAS option CS. This is the only covariance structure available in PROC GLM other than "no structure". One of the most popular structures for repeated measures designs is the "first order autoregressive", SAS option AR(1). Other sub-plot treatment structures are possible: the "Toeplitz" structure (SAS option TOEP), where the number of bands can be varied, and "unstructured", SAS option UN, which requires the most degrees of freedom to estimate. Several different structures can be fitted and evaluated using the fit statistics.

Barley and manure split plot design: tests for differences in full & reduced models.

Fit statistics:
Covariance structure                              d.f. for the covariance   -2LL    AIC     AICC    BIC
Unspecified (default = CS)                        3                         529.0   535.0   535.5   534.4
CS (compound symmetry)                            3                         529.0   535.0   535.5   534.4
CSH (compound symmetry, heterogeneous variance)   6                         527.1   539.1   540.7   537.9
UN (unstructured)                                 11                        522.8   544.8   550.3   542.5

Likelihood ratio tests:
Test         Chi Sq   d.f.   P > Chi Sq
CS vs CSH    1.9      3      0.5934
CS vs UN     6.2      8      0.6248
CSH vs UN    4.3      5      0.5071

PROC MIXED versus PROC GLM
Certain analyses like the split plot and repeated measures require addressing the covariance structure. The old PROC GLM has limited ability to accomplish this, and will not correctly calculate all subplot errors and tests. PROC MIXED handles these issues well.

Covariance structure
An additional note on different structures. There is an area of statistics called "spatial statistics" where the covariance structure is a function of distance. These functions can be linear, exponential, or take various other forms. For unequal timing of repeated measures, where AR(1) may not be appropriate, these functions can also be used.

In order to fit these covariance structures and get correct subplot standard error estimates, use PROC MIXED. These options are not available in PROC GLM; when fitting with PROC GLM we must assume compound symmetry. This is the default in PROC MIXED. There are some "adjustments" that can be made by some GLM options, but we will not cover these since all the problems are resolved in PROC MIXED.

Repeated measures SAS example
The data come from a study of a drug effect on asthma patients, with lung function (fiv1) measured on the same patients hourly over eight hours.
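A minimal sketch of this kind of repeated-measures fit, assuming hypothetical data set and variable names, with AR(1) as one candidate structure:

  proc mixed data=asthma;
     class drug patient hour;
     model fiv1 = drug hour drug*hour;
     repeated hour / type=ar(1) subject=patient(drug);  /* hours correlated
                                                           within each patient */
  run;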
[Figure: Study of drug effect on asthma patients; plot of means with standard errors. Mean fiv1 (2.50-4.00) versus time in hours (1-8).]

Fit statistics for various covariance structures (study of drug effect on asthma patients, from PROC MIXED):

Description                 AR(1)   Toep    CS      UN
-2 Res Log Likelihood       276.1   229.0   348.2   150.4
AIC (smaller is better)     280.1   245.0   352.2   222.4
AICC (smaller is better)    280.1   245.3   352.2   227.6
BIC (smaller is better)     284.7   263.2   356.7   304.4

For k estimated parameters and n observations:

  AIC  = 2k - 2ln(Likelihood)
  AICC = AIC + 2k(k+1)/(n-k-1)
  BIC  = k·ln(n) - 2ln(Likelihood)

Likelihood ratio test of the covariance structures:

Description   -2 Res Log Likelihood   covariance parms   diff with UN   d.f. difference   P > ChiSq
AR(1)         276.1                   2                  125.7          34                1.91189E-12
Toep          229.0                   8                  78.6           28                1.07687E-06
CS            348.2                   2                  197.8          34                5.32668E-25
UN            150.4                   36

LSMeans
There is something else about the SAS LSMeans statement you should know. There are actually several "unusual" or unexpected behaviors of this statement. One we will discuss in connection with Analysis of Covariance. However, there is another general behavior that we should see first.

What is the overall mean? Consider seven treatments with unequal numbers of reps (20 observations in all, summing to 100):

Tmt        1   2   3   4   5   6   7
Tmt Mean   4   4   6   4   5   6   4

For this table the arithmetic mean is 100/20 = 5 and the LSMean is (4+4+6+4+5+6+4)/7 = 33/7 = 4.71. LSMeans calculates means as the mean of the treatment means, not the raw mean of all observations. This is particularly important in unbalanced factorial designs. For one unbalanced 4 by 5 factorial the means and LSMeans are given below.

[Raw data table: an unbalanced 4 by 5 factorial with unequal numbers of observations per cell; the cell means appear in the comparison below.]

Comparison of Means & LSMeans (cell means; Tmt1 in columns, Tmt2 in rows):

                  Tmt1
Tmt2       1      2      3      4      LSMean   Mean
1          3      7      6      8      6.00     6.08
2          4      6      7      8      6.25     6.20
3          2      5      5      6      4.50     4.30
4          3      8      5      6      5.50     5.25
5          3      3      8      6.5    5.13     4.73
LSMean     3.00   5.80   6.20   6.90   5.48
Raw Mean   3.00   5.50   6.00   7.00   5.35

Which is better, arithmetic means or LSMeans? This depends on the situation. Suppose we caught fish in the summer and in the winter, and wanted to express the average temperature at which fish were caught. The winter mean is 15°C and the summer mean is 25°C. What is the overall mean? We do the calculations on the individual catches and find the mean is equal to 24. How can that be? Well, we did 180 samples in the summer and only 20 samples in the winter, so the summer temperatures dominate our samples. Perhaps the average temperature would be better expressed as 20, the mean of the means. That is the LSMean.

I generally use LSMeans. When testing hypotheses such as H0: μ1 = μ2 = μ3 it is best that the overall mean not be dominated by some cell that has an unusually high number of observations. On the other hand, cells with more observations give better estimates of the mean than cells with fewer observations. If the null hypothesis is true, why lose power by treating the cells equally? Traditional ANOVA will use RAW means in its calculation. The choice is yours, except that PROC MIXED provides only the LSMeans.
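A minimal sketch contrasting the two kinds of means in SAS, assuming hypothetical data set and variable names:

  proc mixed data=factorial;
     class tmt1 tmt2;
     model y = tmt1 tmt2 tmt1*tmt2;
     lsmeans tmt1 tmt2;             /* LSMeans: means of the cell means */
  run;
  proc means data=factorial mean;   /* raw, observation-weighted means */
     class tmt1;
     var y;
  run;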
Testing for differences between models
PROC MIXED provides several tools for comparing models. The intent is to compare full and reduced models. The statistics used differ from those used in regression. Reduced models may be models with some terms omitted, or models with a simpler variance or covariance structure. The test is called a likelihood ratio test; it produces a Chi square statistic, and the degrees of freedom are the d.f. difference between the two models.

Homogeneous variance is tested automatically with some simple models. Recall our typhoid strain example, where we requested separate variances for each group with the statement:

  REPEATED / GROUP=STRAIN;

The resulting output was

  Null Model Likelihood Ratio Test
  DF   Chi-Square   Pr > ChiSq
  2    14.56        0.0007

Note that fitting 3 variances requires 3 d.f., while fitting a homogeneous variance model requires only 1 d.f. The 2 d.f. difference is the reason the test above is a 2 d.f. test. This test is very similar to Bartlett's test of homogeneity of variance.

Suppose that for the baseball example you were told that the salaries of some positions were highly variable, while others were more stable. Perhaps we should have tested for nonhomogeneous variance in this example. So we add the statement:

  REPEATED / GROUP=POSITION;

SAS fits the different variances for the positions, but does not always provide a test. When this test is not provided automatically we can calculate our own test. For the original fit we got the results

  Covariance Parameter Estimates
                          Standard   Z
  Cov Parm   Estimate     Error      Value   Pr Z     Alpha   Lower     Upper
  team       3466.41      30458      0.11    0.4547   0.05    513.45    3.81E125
  Residual   1924296      145057     13.27   <.0001   0.05    1668871   2243534

When separate variances are requested we get the following results:

  Covariance Parameter Estimates
                                       Standard   Z
  Cov Parm   Group         Estimate    Error      Value   Pr Z     Alpha   Lower     Upper
  team                     25008       35506      0.70    0.2406   0.05    4960.25   26828515
  Residual   Position 1b   3126672     0          .       .        .       .         .
  Residual   Position 2b   2276275     902599     2.52    0.0058   0.05    1189304   5985011
  Residual   Position 3b   1512066     600277     2.52    0.0059   0.05    789517    3981295
  Residual   Position c    759251      201637     3.77    <.0001   0.05    479387    1382686
  Residual   Position if   626561      240028     2.61    0.0045   0.05    333467    1582294
  Residual   Position of   2558744     407215     6.28    <.0001   0.05    1916409   3590143
  Residual   Position p    1875902     208345     9.00    <.0001   0.05    1526216   2361923
  Residual   Position ss   1384956     364052     3.80    <.0001   0.05    878092    2504484

SAS reports the number of parameters fitted in the "Dimensions" section. The first model estimated 2 parameters, while this model fits 9, a difference of 7. In order to do this 7 d.f. test we take the difference in the "-2 Res Log Likelihood" reported in the "Fit Statistics". This value was 6346.8 for the reduced model and 6323.1 for the full model. The difference is 23.7, a chi square value with 7 d.f. The probability of a greater chi square value is 0.001286226, a significant result.

As with regression, when there is a difference between two models the larger model is better, since it presumably provides some information that the smaller model does not. If there is no significant difference we decide in favor of the simpler model. We just tested homogeneity of variance.
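The hand calculation above fits in a short SAS data step; a minimal sketch using the values just reported:

  data lrt;
     chisq = 6346.8 - 6323.1;         /* reduced minus full "-2 Res Log Likelihood" */
     df    = 9 - 2;                   /* fitted parameters: full minus reduced */
     p     = 1 - probchi(chisq, df);  /* probability of a greater chi square */
  run;
  proc print data=lrt; run;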
Other between model comparisons
SAS also provides some other statistics to compare models. Under the "Fit statistics" you will find, for the full model,

  AIC (smaller is better)    6341.1
  AICC (smaller is better)   6341.6
  BIC (smaller is better)    6346.8

and for the smaller model

  AIC (smaller is better)    6350.8
  AICC (smaller is better)   6350.8
  BIC (smaller is better)    6352.1

These are all penalized index values called "information criteria". As the note says, smaller is better for all 3. AIC is the Akaike Information Criterion, AICC is the "corrected AIC", BIC is the Bayesian Information Criterion, and there are others. These all work in a similar fashion: they provide an adjusted measure of goodness of fit. They are similar in concept to the "adjusted R²", so they do not necessarily get smaller when the model gets larger. These results also indicate that the full model is better, but they do not provide a test with a probability value.

Statistics quotes: USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. David Letterman (1947 - )
Statistics quotes: "Statistics are no substitute for judgment." -- Henry Clay
Statistics quotes: "Like dreams, statistics are a form of wish fulfillment." -- Jean Baudrillard

Analysis of Covariance
Our previous encounter with Analysis of Covariance was from a "multisource regression" point of view. In multisource regression we were particularly interested in the regression aspects, particularly the slopes that estimate rates of change in Y relative to X. The indicator variables estimate intercept differences. The key concept here is that with multisource regression we are interested in the regression: we want the slopes, we want to interpret the slopes, and we want to know if the slopes from two or more indicator variables are the same or not.

However, the name "Analysis of Covariance" actually comes from a design perspective. In this case we are doing some designed experiment, with treatments, error, etc., and for whatever reason we feel the need to include a "regression type" X variable; this is the "covariable".

Why would we include a covariable? It is probably not by choice. It is often not a source of variation that we are interested in interpreting. If after starting a designed experiment we recognize that there is some source of variation that will inflate our error term, and if we find that we can account for that variation with a "covariable", we may choose to do Analysis of Covariance.

For example, we may be doing an agricultural experiment on fertilizer rates and realize that the plots in our experiment differ in terms of moisture level, and this is influencing our results. So we could measure soil moisture and include it as a covariable. Or we may be doing an experiment involving the influence of diet on blood sugar levels in diabetes patients when we realize that the patients' initial weight is influencing our results. We could include the patients' weight as a covariable. Studies of "weight gain" often include initial weight as a covariable. One researcher in crawfish aquaculture realized that water leakage from his ponds was obscuring the results of the rice forage density that he intended to study. The effect of leakage was mitigated by including a covariable that measured leakage (the amount of water he added to keep some ponds from drying up completely).
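A minimal sketch of such a model, assuming hypothetical names (a fertilizer trial with soil moisture as the covariable):

  proc mixed data=trial;
     class tmt;
     model yield = tmt moisture;   /* the covariable enters as a regression term */
     lsmeans tmt / diff;           /* treatments compared at the mean moisture */
  run;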
So what are we doing here? We have a source of variation that, if unaccounted for, would inflate our error term. We remove that variation from the error term by including a variable in the model. Sound familiar? Conceptually we are including the covariable for the same reasons that we include blocks. It is not a source of variation of interest; it is simply a way of removing variation from the error and increasing power by reducing the size of the error term.

So, while in multisource regression we are fitting slopes that are of interest, and we have an interest in testing to see if the slope interactions are significant, in Analysis of Covariance we are removing a source of nuisance variation from the error term. In this case we are not only not particularly interested in interpreting the slopes, we absolutely do not want the slope interactions to be significant.

Why? Because in design we are interpreting differences in means.

[Diagram: Y versus X; three treatments shown as three distinct means.]

With a covariable added (no interaction) we are interpreting differences in regression "levels".

[Diagram: Y versus X; three parallel treatment lines at different levels.]

If there are slope interactions then the level differences are not constant.

[Diagram: Y versus X; three treatment lines with different slopes crossing within the range of interest.]

This can be a complete disaster, or a relatively minor problem.

[Diagrams: Y versus X; in one case the treatment lines cross badly within the range of interest, in the other they diverge only slightly.]

Our philosophy towards the slope interaction will be one of two approaches.
- Ignore the problem; don't even test for an interaction. After all, we are talking about a "block" interaction.
- Address the issue by testing the interaction, just as we would with most design interactions, and recognize that significant treatment interactions cannot be ignored.

Ignoring the problem is tempting; it is easier. But in other cases where we ignore the block interaction, we feel that all block interactions represent the same experimental error. Is this true for slope interactions? Do they represent "error"? Maybe; a newer analysis involving "random regressions" actually uses the slope interaction as an error term. But addressing the issue by testing the interactions is probably the better approach. First, we could put on our regression hats and actually try to interpret the different slopes as meaningful values. Or we could go ahead and test for levels even if we have significant slope effects.

Statistics quotes: Statistics are like ladies of the night. Once you get them down, you can do anything with them. Mark Twain (Samuel Clemens) (1835-1910)

Will our results be meaningful? That depends. If the overlap in the lines is not too bad, we only need to determine where to compare the lines.

[Diagram: Y versus X; three treatment lines with arrows asking "Here? Or here?" for where to compare them.]

Enter LSMEANS. The LSMEANS statement has one other behavior that we have not seen. This behavior occurs when a covariable is present. With a covariable present, LSMEANS compares levels at a value of Xi equal to the mean of Xi.

[Diagram: Y versus X; three treatment lines compared at the mean of X ("Here!!!").]

This has several advantages. Where is the most "meaningful" place to compare levels? In the middle of the range of observed data. Where is the confidence interval of a regression line narrowest? At the mean of the Xi values (note that the various treatment groups may not have exactly the same mean, so an overall mean is used). So this default behavior of LSMeans is both reasonable and relatively powerful.
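A minimal sketch, with hypothetical names, showing the default comparison point and the AT option that moves it:

  proc mixed data=trial;
     class tmt;
     model y = tmt x;
     lsmeans tmt / diff;           /* levels compared at x = mean of x (default) */
     lsmeans tmt / diff at x=10;   /* levels compared at x = 10 instead */
  run;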
The SAS LSMeans output will look the same: a table of pairwise comparison probabilities (with adjustments if requested).

So we may include a covariable in a design for the same reason that we include blocks: increased power. If there are no slope interactions we have a constant difference between the parallel lines, and there is little problem with comparisons. LSMeans is probably still best because the confidence interval is narrowest at the mean of Xi. In many cases, if the overlap is not too bad, we can still get pretty good interpretations of levels by using LSMeans. In the worst cases, consider the possibility of interpreting the slope differences (by placing confidence intervals on them and seeing if they overlap). This may provide meaningful results, and may be good in other cases as well (not just the worst).

The End

James P. Geaghan - Copyright 2011