Psychological Bulletin
1988, Vol. 104, No. 3, 396-404
Copyright 1988 by the American Psychological Association, Inc. 0033-2909/88/$00.75

Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations

Anthony S. Bryk
Department of Education, University of Chicago

Stephen W. Raudenbush
Graduate School of Education, Michigan State University

The presence of heterogeneity of variance across groups indicates that the standard statistical model for treatment effects no longer applies. Specifically, the assumption that treatments add a constant to each subject's development fails. An alternative model is required to represent how treatment effects are distributed across individuals. We develop in this article a simple statistical model to demonstrate the link between heterogeneity of variance and random treatment effects. Next, we illustrate with results from two previously published studies how a failure to recognize the substantive importance of heterogeneity of variance obscured significant results present in these data. The article concludes with a review and synthesis of techniques for modeling variances. Although these methods have been well established in the statistical literature, they are not widely known by social and behavioral scientists.

Psychological researchers have tended historically to view heterogeneity of variance as a methodological nuisance, an unwelcome obstacle in the pursuit of inferences about the effects of treatments on means. In their discussion of variance heterogeneity, standard texts concentrate on identifying conditions under which such heterogeneity can safely be ignored so that standard analyses of means may proceed. It is usually argued that heterogeneity can be ignored when statistical tests for means are robust to violation of the homogeneity assumption (Glass & Hopkins, 1984, pp. 238-240; Hays, 1981, p. 287; Winer, 1971, pp. 37-39).
When such violations cannot be ignored, analysts tend to assume heterogeneity must be eliminated. The primary strategy for eliminating heterogeneity is to find a transformation of the dependent variable that stabilizes treatment group variances, enabling retention of the homogeneity hypothesis (Kirk, 1982, pp. 79-84; Winer, 1971, pp. 397-402).

There has been little discussion in the literature of the causes of heterogeneity in experimental studies. Light and Smith (1971) noted that heterogeneity is likely to occur in program evaluation studies. Although providing good exploratory data analysis advice for examining heterogeneity of variance across groups, their focus remained fastened on making appropriate inferences about mean differences between programs.

In our view, to restrict considerations of variance heterogeneity to its effect on inferences about means is fundamentally misguided. We show in this article that the presence of heterogeneity of variance across groups in experimental studies indicates that treatments have differential effects across individuals. Rather than being a nuisance factor to be adjusted away, the presence of heterogeneity of variance is important empirical evidence of an interaction of treatments with some unspecified subject characteristics. To ignore variance heterogeneity, then, is tantamount to interpreting main effects while concealing significant interaction effects.

We wish to acknowledge support for this project from the Spencer Foundation and the Benton Center for Curriculum and Instruction at the University of Chicago. We also wish to thank the reviewers and Lincoln Moses for helpful comments on an earlier version of this manuscript. Correspondence concerning this article should be addressed to Anthony S. Bryk, Department of Education, University of Chicago, 5835 South Kimbark Avenue, Chicago, Illinois 60637.
Although it is generally understood that inferences about main effects are often misleading in the presence of interaction effects, ironically, we commit exactly the same error when we ignore heterogeneity of variance in experimental studies. Further, in many cases, the nature of these differential effects is substantively interesting and can be crucial to evaluating the efficacy of treatments (e.g., see Bloom, 1984; Bryk, 1978).

A common example of this phenomenon occurs when an experimental treatment has an effect on some subjects but not on others. This can result from technical problems in applying treatments, or it can result from differential responsiveness of subjects to the treatment. In either case, the treatment both increases the variance and affects the mean.

We present in this article a simple mathematical model that demonstrates how individual differences in treatment effects produce heterogeneity of variance across groups. On the basis of this model, we illustrate, with data from two previously published studies, a general strategy for examining the results of experiments when heterogeneity is present. The article concludes with a brief review of the necessary statistical theory for estimating treatment effects on variance and for testing hypotheses about these variance effects.

Model of Treatment Effects

The simplest experimental design consists of random assignment of individuals to either a treatment group or a control group. The traditional statistical model corresponding to this design is

$$Y_i^C = \mu + e_i^C, \qquad Y_i^E = \mu + \alpha + e_i^E. \tag{1}$$

Here, $Y_i^C$ and $Y_i^E$ represent outcome scores for individuals in the control and experimental groups, respectively, and $\mu$ represents the mean outcome across all individuals (in the population from which both groups were sampled). The error terms, $e_i^C$ and $e_i^E$, reflect all variation across individuals that is related to the outcome other than the treatment effect. They include both the effects of personal characteristics, which we denote by $p_i$, and truly random features, including measurement error, which we denote by $r_i$, so that for both control and experimental groups,

$$e_i = p_i + r_i, \tag{2}$$

where $p_i$ and $r_i$ are independent by definition and have means of zero. The treatment effect, $\alpha$, is conceived as a constant increment to each individual's outcome score and is the expected value of the mean difference between the treatment and control groups, that is, $E[Y_i^E] - E[Y_i^C] = \alpha$.

Since the same effect, $\alpha$, is added to the outcome of each individual in the experimental group, the mean difference between the groups, $\bar{Y}^E - \bar{Y}^C$, constitutes the relevant estimate of the treatment effect. We also note that under random assignment,

$$\mathrm{Var}[p_i^E] = \mathrm{Var}[p_i^C] = \sigma_p^2,$$

where $\sigma_p^2$ is the outcome variance attributable to person-specific characteristics. Similarly, assuming no interaction of treatment with the measurement process, it also follows that

$$\mathrm{Var}[r_i^E] = \mathrm{Var}[r_i^C] = \sigma_r^2,$$

where $\sigma_r^2$ is true random variation. It follows then that

$$\mathrm{Var}[Y_i^E] = \mathrm{Var}[Y_i^C] = \sigma_p^2 + \sigma_r^2.$$

Although ideal for purposes of statistical analysis, this model for treatment effects (Equation 1) is not very realistic in many situations. Why should every individual receive an identical boost? Surely some individuals must gain more than others from experiencing the treatment. This objection suggests that we generalize the model to represent individualized treatment effects. The model for the control group remains as before:

$$Y_i^C = \mu + p_i^C + r_i^C. \tag{3}$$

In the treatment group, we now have

$$Y_i^E = \mu + \alpha_i + p_i^E + r_i^E, \tag{4}$$

where $\alpha_i$ represents the effect of the treatment on individual $i$. The treatment effect is now a random variable that has a marginal distribution with some mean, $\mu_\alpha$, and dispersion, $\sigma_\alpha^2$. It is reasonable to assume that $\alpha_i$ will often depend upon the person-specific features represented by $p_i$. As a consequence, $\alpha_i$ and $p_i$ covary, that is,

$$\sigma_{\alpha p} \neq 0. \tag{5}$$

Now, the expected value of the mean difference between groups is

$$E[Y_i^E] - E[Y_i^C] = \mu_\alpha.$$

As a result, the observed mean difference between the groups provides an unbiased estimate of the mean effect, $\mu_\alpha$. As for the variances,

$$\mathrm{Var}[Y_i^C] = \sigma_p^2 + \sigma_r^2, \tag{6a}$$

as before, but now,

$$\mathrm{Var}[Y_i^E] = \sigma_\alpha^2 + 2\sigma_{\alpha p} + \sigma_p^2 + \sigma_r^2. \tag{6b}$$

Thus, as soon as we allow treatment effects to have a distribution, heterogeneity of variance across groups will occur. Included in this heterogeneity is the linkage between person characteristics, $p_i$, and the treatment effect, $\alpha_i$. Thus, in randomized experiments, heterogeneity of variance between groups can be viewed as an indicator that interaction effects of treatment with subject-specific characteristics are likely in the data.¹

¹ We note that heterogeneity of variance can also result from differential measurement error across groups. The presence of floor or ceiling effects on a test, for example, could cause this to occur. Even in this case, however, identifying heterogeneity of variance is important, because it indicates a model misspecification. Failure to take this into account would result in a biased estimate of treatment effects.

Simple Case

In order to clarify the empirical consequences of individualized treatment effects, we consider the simplest case, in which both $p_i$ and $\alpha_i$ depend on just one measurable characteristic, $X_i$. In research on cognitive development, for example, $X_i$ might be a measure of cognitive ability. Thus, in this simple case, we model the person and treatment-specific effects as

$$p_i = \beta_1 X_i \tag{7a}$$

and

$$\alpha_i = \mu_\alpha + \beta_2 X_i^E, \tag{7b}$$

where without loss of generality we will assume that the $X_i$ have means of zero. The $\beta_1$ parameter in Equation 7a captures the structural relationship between aptitude and achievement that exists in the absence of the experimental intervention. The allocation of treatment effects across individuals is represented through $\beta_2 X_i^E$. When $\beta_2$ is positive, the larger treatment effects are allocated to the higher ability students. Conversely, when $\beta_2$ is negative, the lower ability students receive the bigger effects.

Substituting Equation 7 into Equations 3 and 4, we now have

$$Y_i^C = \mu + \beta_1 X_i^C + r_i^C \tag{8a}$$

and

$$Y_i^E = \mu + \mu_\alpha + (\beta_1 + \beta_2) X_i^E + r_i^E. \tag{8b}$$

Equation 8 demonstrates that the individualized treatment effects, modeled in Equation 7, imply a Treatment × Ability interaction effect.

If the data were analyzed, however, in accordance with the conventional analysis of variance (ANOVA) model (Equation 1), we would find that

$$\mathrm{Var}[Y_i^C] = \beta_1^2 \, \mathrm{Var}(X_i^C) + \sigma_r^2 \tag{9a}$$

and

$$\mathrm{Var}[Y_i^E] = (\beta_1 + \beta_2)^2 \, \mathrm{Var}(X_i^E) + \sigma_r^2. \tag{9b}$$

Because we are assuming experimental conditions in which individuals are randomly assigned to groups,

$$\mathrm{Var}(X_i^C) = \mathrm{Var}(X_i^E) = \mathrm{Var}(X). \tag{10}$$

Nevertheless, heterogeneity of variance still results, because

$$\mathrm{Var}[Y_i^E] - \mathrm{Var}[Y_i^C] = \beta_2 (2\beta_1 + \beta_2) \, \mathrm{Var}(X). \tag{11}$$

We assume, without loss of generality, that $X$ is scaled so that $\beta_1$ is positive. Then when $\beta_2$ is positive, the variance in the treatment group is larger than that in the control group. We call this a disequalizing situation, in that the bigger effects are allocated to the higher-ability students. Conversely, when $-2\beta_1 < \beta_2 < 0$, the variance in the treatment group is smaller than that in the control group. We call this the equalizing case, in which the bigger treatment effects are allocated to the lower-ability students. Finally, there is an anomalous case, when $\beta_2 < -2\beta_1$, in which again the treatment effect is disequalizing. Although mathematically possible, this condition seems unlikely to arise. Thus, a larger variance in the treatment group will most often indicate a disequalizing allocation of individual treatment effects. A smaller variance in the treatment group always indicates an equalizing allocation.

Suppose an analysis of covariance (ANCOVA) were performed instead of the standard ANOVA for the one-factor experimental design. If the sample sizes in the treatment and control groups are equal, then

$$\beta_{\mathrm{ANCOVA}} = (\beta_C + \beta_E)/2 = (2\beta_1 + \beta_2)/2. \tag{12}$$

As a result, the true regression coefficients within each group deviate from $\beta_{\mathrm{ANCOVA}}$ by the same amount:

$$|\beta_C - \beta_{\mathrm{ANCOVA}}| = |\beta_E - \beta_{\mathrm{ANCOVA}}| = |\beta_2|/2. \tag{13}$$

It follows that the residual variances computed on the basis of the ANCOVA model would be identical. Specifically,

$$\mathrm{Var}[Y_i^C] = \mathrm{Var}[Y_i^E] = (\beta_2/2)^2 \, \mathrm{Var}(X) + \sigma_r^2. \tag{14}$$

Here is a case in which the model is misspecified (heterogeneity of regression is ignored), and yet the heterogeneity of variance has disappeared. This result actually constitutes a special case that occurs when there are two groups of equal sample size. If the sample sizes vary or if there are more than two groups, heterogeneity of variance will generally accompany heterogeneity of regression (assuming that the varying slopes are not specified in the model).

General Case

We now extend the modeling of person- and treatment-specific effects to the multivariate case. (Hereinafter, we suppress the E and C sub- and superscripts in the interest of simplicity.) Let

$$p_i = \sum_{j=1}^{J} \beta_{1j} X_{ij} \tag{15}$$

and

$$\alpha_i = \mu_\alpha + \sum_{j=1}^{J} \beta_{2j} X_{ij} \tag{16}$$

for $J$ variables, $X_{ij}$, $j = 1, \ldots, J$. Equation 15 models the underlying structural processes that form each individual's status in the absence of a treatment intervention. Equation 16 indicates how individual treatment effects are allocated with regard to the person-specific factors captured in the $X_{ij}$ variables. This model is really quite general in that any $\beta$ coefficient or subset of coefficients may be set equal to zero. If a $\beta_{2j}$ coefficient is zero, this means that the treatment effects are being distributed without regard to that factor. If an element in $\beta_{1j}$ is zero, but $\beta_{2j}$ is nonzero, this means that the treatment has introduced a factor into the allocation process upon which natural development does not depend. For example, suppose we are comparing the effectiveness of an aural method of foreign language instruction with that of a more traditional approach. Although auditory acuity may play a negligible role in traditional instruction, it could be a very important factor for aural instruction. As a result, treatment effects are allocated as a function of what was previously an extraneous factor.

Without loss of generality, we assume that all of the $X_{ij}$s are scaled such that in the control group the correlation between each $X_{ij}$ and $Y$ is positive. As a result, positive elements in the $\beta_{2j}$s indicate that the treatment is disequalizing with regard to those factors, that is, that the treatment is amplifying preexisting differences among individuals. Conversely, negative elements in the $\beta_{2j}$s indicate equalizing effects (again discounting reverse distribution as unlikely).

Assuming that we analyze these data in accordance with Equation 1, we would find that

$$\begin{aligned} \mathrm{Var}[Y_i^E] - \mathrm{Var}[Y_i^C] &= \sigma_\alpha^2 + 2\sigma_{\alpha p} \\ &= \sum_{j=1}^{J} \beta_{2j}^2 \, \mathrm{Var}(X_j) + 2 \sum_{j=1}^{J} \left[ \beta_{1j}\beta_{2j} \, \mathrm{Var}(X_j) \right] \\ &\quad + 2 \sum_{j=1}^{J-1} \sum_{j'=j+1}^{J} \left( \beta_{1j}\beta_{2j'} + \beta_{1j'}\beta_{2j} + \beta_{2j}\beta_{2j'} \right) \mathrm{Cov}(X_j, X_{j'}) \end{aligned} \tag{17}$$

for all $j \neq j'$. If all of the $\beta_{2j}$s are positive, Equation 17 must also be positive. In the presence of such disequalizing effects, the outcome variance will be greater in the treatment than in the control group. Conversely, in the presence of pure equalizing effects, that is,

$$-2\beta_{1j} < \beta_{2j} < 0 \quad \text{for all } j, \tag{18a}$$

with a companion condition (Equation 18b, whose expression is not legible in this copy), the outcome variance will be smaller in the treatment group than in the control group. Clearly, there are also many cases between these two extremes with some $\beta_{2j}$s positive and others negative. The overall net effect can be deduced from a comparison of group variances.
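As an informal check on the simple case, the variance gap predicted by Equation 11, $\beta_2(2\beta_1 + \beta_2)\,\mathrm{Var}(X)$, can be compared with simulated data. The sketch below is illustrative only: the function names, the parameter values, the normal distribution chosen for $X$, and the normal errors are all assumptions made for the example and are not part of the original analysis.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def simulate_variance_gap(beta1, beta2, mu=50.0, mu_alpha=5.0,
                          sd_x=1.0, sigma_r=1.0, n=100_000, seed=0):
    """Simulate Equations 8a and 8b and compare with Equation 11.

    Assumed for illustration: X ~ Normal(0, sd_x) in both groups (random
    assignment) and r ~ Normal(0, sigma_r).
    Returns (observed variance gap, gap predicted by Equation 11).
    """
    rng = random.Random(seed)
    # Control group, Equation 8a: Y = mu + beta1 * X + r
    control = [mu + beta1 * rng.gauss(0.0, sd_x) + rng.gauss(0.0, sigma_r)
               for _ in range(n)]
    # Treatment group, Equation 8b: Y = mu + mu_alpha + (beta1 + beta2) * X + r
    treated = [mu + mu_alpha + (beta1 + beta2) * rng.gauss(0.0, sd_x)
               + rng.gauss(0.0, sigma_r)
               for _ in range(n)]
    observed = sample_variance(treated) - sample_variance(control)
    predicted = beta2 * (2.0 * beta1 + beta2) * sd_x ** 2
    return observed, predicted


if __name__ == "__main__":
    for label, b2 in [("disequalizing", 0.5), ("equalizing", -0.5),
                      ("anomalous", -2.5)]:
        obs, pred = simulate_variance_gap(beta1=1.0, beta2=b2)
        print(f"{label:>13}: observed gap {obs:+.3f}, Equation 11 gives {pred:+.3f}")
```

With $\beta_1 = 1$, a positive $\beta_2$ should produce a larger treatment-group variance, a value between $-2$ and $0$ a smaller one, and a value below $-2$ a larger one again, which mirrors the disequalizing, equalizing, and anomalous cases described above.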
When a treatment group's variance is smaller, this means that the net result of the process that allocates treatment effects is to reduce existing differences among individuals. In contrast, when the treatment group's variance is elevated, the allocation process is amplifying these differences. Included in the latter case is the possibility that the treatment is activating new factors previously unrelated to the outcome of interest.

Two Illustrations from the Literature

Aptitude × Treatment Study of Verbal and Visual Methods of Instruction

Our first example is an Aptitude × Treatment interaction (ATI) study (Gagne & Gropper, 1965) that was subsequently reanalyzed by Cronbach and Snow (1977). The primary purpose of the investigation was to test the hypothesis that the addition of visual illustrations to text would reduce the effects of general ability on the learning of verbal lessons. The central instruction was the same for all subjects, consisting of seven self-paced programs/lessons on mechanical advantage. Subjects were randomly assigned to three groups: visual, verbal, and control. The two experimental groups were given special introductions to each of the seven lessons. The visual group saw film demonstrations of basic concepts. For the verbal group, the same demonstrations were described only in words. The final outcome variable was achievement retention (retention) on a test 1 month after the end of the lesson. Interim outcomes included time to work through the lesson (rate) and the achievement score immediately following the lesson (achievement). General ability (ability) and verbal-spatial ability (V/S) were assessed prior to the commencement of instruction. Each group undertook two preliminary programmed lessons that provided necessary background information on mechanical advantage. This phase was identical for all three groups. These lessons provided additional aptitude measures of prerate and preachievement.

Gagne and Gropper (1965), using simple correlations and blocked ANOVA, found no significant ATI effects. Their report ended on a rather pessimistic note. They commented that there was "no reason why ATI effects could not have been revealed [in this study], if they truly existed" (p. 19) and further concluded that the ATI approach to studying learning was not a particularly promising one. The Cronbach and Snow (1977) reanalysis, employing more powerful regression techniques, detected some evidence of ATI effects on retention, although they cautioned about the suggestive character of those results.

Table 1 displays key statistics reported in the Cronbach and Snow reanalysis of the Gagne and Gropper data. Although Cronbach and Snow (1977) noted the heterogeneity of variance among the experimental groups, neither pair of investigators recognized the substantive implications of this empirical result. As was demonstrated in the previous section, the presence of heterogeneity is indicative of the fact that significant interaction effects occurred in this experiment. Further, we see from Table 2 that both treatments are variance-reducing in comparison with self-paced instruction. This indicates that the effects of self-paced instruction were disequalizing in relation to the other two methods considered in this study. Pure self-paced learning amplified differences among students that the supplemental direct instruction attenuated.

Table 1
Basic Statistics and Key Results From Gagne and Gropper Study

                   Retention                   Correlation of retention with
Group      n       m       s²        d         Ability    V/S
Verbal     42      33.5    96.04     4.570     .22        .28
Visual     46      37.8    75.69     4.348     .16        -.12
Control    45      32.2    187.69    5.156     .33        .07

Note. V/S = verbal-spatial ability. The table also reports correlations of retention with prerate and preachievement; the extracted values are .02, .19, .27, .09, .36, and .23, and their assignment to groups cannot be recovered from this copy. Heterogeneity of variance test (retention): $(1/2)\sum_j \nu_j (d_j - \bar{d})^2 = 9.807$, $\chi^2(2)$, $p < .01$. This test is based on the log transformation of the sample variance, $d_j = \ln(s_j^2) + 1/\nu_j$. (See Equations 19 through 21.)

Cronbach and Snow's (1977) reanalysis fitted separate models for regressing retention on ability, V/S, achievement, and preachievement for each group. (The residual variances from this conditional model are displayed in Table 2.) After controlling for these four variables, the variance difference between the verbal treatment and the self-paced instruction was no longer significant. The residual variance in the visual group, however, remains significantly smaller. This implies that there are still unspecified variables, the effects of which operate differently under visual instruction than under other methods. Whatever these specific variables are, it is clear that visual methods reduce the effect of unmeasured individual differences, differences that are amplified under both the verbal and the self-paced forms of presentation.

Table 2
Residual Variances in Retention (After Controlling for Ability, V/S, Prerate, and Preachievement)

Group      s²(y·x)    d
Verbal     65.74      4.030
Visual     23.47      3.180
Control    78.83      4.392

Note. V/S = verbal-spatial ability. One degree of freedom contrasts: $H_0$: $\ln \sigma^2_{\mathrm{verbal}} = \ln \sigma^2_{\mathrm{control}}$; $z = (4.030 - 4.392)/[2(1/37 + 1/40)]^{1/2} = -1.12$, n.s. $H_0$: $\ln \sigma^2_{\mathrm{visual}} = (1/2)(\ln \sigma^2_{\mathrm{verbal}} + \ln \sigma^2_{\mathrm{control}})$; $z = (3.180 - 4.211)/[2/41 + (1/4)(2/37 + 2/40)]^{1/2} = -2.47$, $p < .01$. These tests are based on the procedure for examining linear contrasts given in Equation 22.

Thus, by a careful consideration of variance differences across the experimental groups, we come to a considerably different conclusion than that reached by the original authors. Quite contrary to Gagne and Gropper's (1965) rather pessimistic ending, the evidence assembled in their study suggests that the visual form of instruction is a very promising method. The level of retention was higher and the effects distributed in a relatively equalizing fashion.

Teacher Expectancy Experiment

Our second illustration draws on results from an experiment conducted by Kester (1969), and later published by Kester and Letchworth (1972). The study assessed the effects of experimentally induced teacher expectancies on pupil IQ. Within each of 24 classrooms, several students were assigned at random to either a high-expectancy or a control condition. In all, 75 students were assigned to each group, though data for one control student were lost. The authors found no effect of teacher expectancy on pupil IQ.

In this study, too, the treatment had an effect on the variance that went unrecognized in the original investigation. As is indicated in Tables 3 and 4, a reanalysis of the Kester data using preexperiment IQ as a covariate reveals substantial differences in the residual variance in the experimental and control groups.² Unlike our first example, the treatment apparently exerts a disequalizing effect here. Specifically, the residual variance in the experimental group (56.52) was significantly larger than the residual variance in the control group (32.59), F(73, 74) = 1.73, p < .02.

Two different explanations are possible for this result. First, it might be that teachers differed in their response to the expectancy-inducing information. If some teachers acted on the basis of the inflated expectancies while others simply ignored them, a larger treatment group variance would result. Alternatively, the effect of the treatment might depend on student characteristics (e.g., varying individual responsiveness or needs for praise and reinforcement). This, too, could produce heterogeneity of variance.

Further examination of the data sheds considerable light on the tenability of these alternative explanations. If, indeed, teachers responded differentially to the expectancy-inducing information, we would expect to find evidence that the magnitude of the treatment effect varied across classrooms. This hypothesis can be examined by introducing classrooms as an additional factor in the ANCOVA, with attention focusing on the Treatment × Classroom interaction effect. The 2 × 24 (Treatment × Classroom) ANCOVA, however, revealed no evidence of a significant interaction, F(1, 23) = .83.³ This result indicates that the source of the heterogeneity is within classrooms, a result consistent with our second explanation. Specifically, the second explanation, that treatment effects depend on student characteristics, implies that the within-classroom variance would be larger for the treatment group than for the control group. In fact, this is exactly what occurs (see Table 3).⁴ The within-classroom variance was 54.69 in the experimental group and only 25.24 in the control group, yielding an experiment-to-control variance ratio of 2.17, F(51, 50) = 2.17, p < .01. Hence, the effect of the treatment did indeed depend on unmeasured student characteristics.⁵

Table 4
Treatment × Classroom ANCOVA (With Pretest IQ as a Covariate)

Source                  df     SS          MS          F       p
Covariate               1      1,147.91    1,147.91
Treatment               1      83.28       83.28       2.54    .07
Teachers                22     1,863.24    84.69
Treatment × Teachers    22     721.91      32.81       .83     n.s.
Residual                102    4,021.97    39.43

Here, too, our framework for studying variance heterogeneity enabled us to take a study originally dismissed as producing null findings and to discover a potentially important result. The effects of teacher expectancies, rather than being negligible, were variable. Moreover, we were able to dismiss teacher differences as the source of this variability and to conclude instead that the treatment effects depended on unidentified student characteristics.
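The variance comparisons reported for the two illustrations can be retraced from the published summary statistics. The sketch below is a hedged check, not the authors' computation: it assumes the bias-corrected log transform $d_j = \ln(s_j^2) + 1/\nu_j$ given in the note to Table 1, a mean of the $d_j$ weighted by degrees of freedom, and $\nu_j = n_j - 1$ for the raw retention variances. The function names are hypothetical, and the closed-form p value holds only for a chi-square with 2 degrees of freedom.

```python
import math


def log_variance(s2, nu):
    """Bias-corrected log variance: d = ln(s^2) + 1/nu."""
    return math.log(s2) + 1.0 / nu


def omnibus_statistic(variances, dfs):
    """(1/2) * sum of nu_j * (d_j - d_bar)^2, with d_bar weighted by nu_j (assumed)."""
    d = [log_variance(s2, nu) for s2, nu in zip(variances, dfs)]
    d_bar = sum(nu * dj for nu, dj in zip(dfs, d)) / sum(dfs)
    return 0.5 * sum(nu * (dj - d_bar) ** 2 for nu, dj in zip(dfs, d))


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square with exactly 2 df: exp(-x/2)."""
    return math.exp(-x / 2.0)


def contrast_z(d_values, dfs, weights):
    """z test for a linear contrast among the d_j; Var(d_j) is taken as 2/nu_j."""
    estimate = sum(c * dj for c, dj in zip(weights, d_values))
    variance = sum(c ** 2 * 2.0 / nu for c, nu in zip(weights, dfs))
    return estimate / math.sqrt(variance)


def variance_ratio(s2_numerator, s2_denominator):
    """F ratio of two independent sample variances."""
    return s2_numerator / s2_denominator


# Table 1 retention variances (verbal, visual, control); nu = n - 1 is an assumption.
TABLE1_VARIANCES = [96.04, 75.69, 187.69]
TABLE1_DFS = [41, 45, 44]

# Table 2 d values and the degrees of freedom that appear in its note.
TABLE2_D = [4.030, 3.180, 4.392]
TABLE2_DFS = [37, 41, 40]

if __name__ == "__main__":
    h = omnibus_statistic(TABLE1_VARIANCES, TABLE1_DFS)
    print(f"omnibus statistic: {h:.3f}, p = {chi2_sf_2df(h):.4f}")
    print(f"verbal vs. control z: {contrast_z(TABLE2_D, TABLE2_DFS, [1, 0, -1]):.2f}")
    print(f"Kester residual variance ratio: {variance_ratio(56.52, 32.59):.2f}")
    print(f"Kester within-classroom ratio: {variance_ratio(54.69, 25.24):.2f}")
```

Under these assumptions the omnibus statistic comes out close to the reported 9.807, and the verbal-versus-control contrast and the two Kester variance ratios agree with the reported values to two decimals. Obtaining p values for the F ratios would require an F distribution routine, which is outside the standard library and is left out here.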
Thus, the results of this experiment simply suggest that a more sophisticated subsequent study be undertaken to identify the precise student characteristics involved and thereby to contribute to a better understanding of the mechanism by which teachers' expectations differentially affect their students.

Table 3
Basic Statistics and Key Results From Kester (1969) Study

Statistic                                                     Experimental    Control
Pretest IQ, m(x)                                              101.11          100.07
Posttest IQ, m(y)                                             104.62          102.47
Pretest s²(x)                                                 30.36           28.20
Posttest s²(y)                                                71.40           47.47
Residual variance, s²(y·x)                                    56.52           32.59
Pooled within-classes residual variance, s²(y·x | classes)    54.69           25.24
Sample sizes, n                                               74              75

² This first analysis was based on a simple treatment-control group design (ignoring classrooms), using pretreatment IQ as a covariate. Specifically, the model was $Y_{ij} = \mu + \alpha_j + \beta(X_{ij} - \bar{X}_{\cdot j}) + e_{ij}$, where $\mu$ is the grand mean, $\alpha_j$ is the group effect (treatment versus control), $\beta$ is the regression coefficient pooled within the treatment and control groups, and $X_{ij}$ is the pretreatment IQ. The use of the pooled within-group coefficient was justified in that the separate regression coefficients in the treatment and control groups were not statistically different. The parameter estimates from this ANCOVA model were used to generate predicted outcomes, $\hat{Y}_{ij}$, for each subject, and a residual variance, $\sum_i (Y_{ij} - \hat{Y}_{ij})^2/(n_j - 1)$, computed separately for the treatment and control groups.

³ The model for this second analysis was $Y_{ijk} = \mu + \alpha_j + \delta_k + \gamma_{jk} + \beta(X_{ijk} - \bar{X}_{\cdot jk}) + e_{ijk}$, where $\alpha_j$ denotes the treatment effect (experimental vs. control); $\delta_k$ is the effect of classroom $k$ ($k = 1, \ldots, 24$); $\gamma_{jk}$ represents the Treatment × Classroom interaction effect; and $\beta$ is the regression coefficient pooled within the 48 Treatment × Classroom cells.

⁴ As in the first analysis, parameter estimates based on this model were used to generate predicted outcomes, $\hat{Y}_{ijk}$, and a residual variance was computed separately for experimental and control groups using the formula $\sum_k \sum_i (Y_{ijk} - \hat{Y}_{ijk})^2 / (\sum_k n_{jk} - 24)$, where $n_{jk}$ is the number of children in treatment $j$, classroom $k$.

⁵ This analysis treated both classrooms and treatments as fixed factors. As a reviewer pointed out, the optimal analysis of these data is based on a mixed model with treatments fixed and classrooms random. However, the F test for the Treatment × Classroom interaction is the same using both models if the design is balanced (Kirk, 1982, p. 391). Given that the Kester data were nearly balanced and that the F test was trivially small, no further analysis was needed.

Techniques for Modeling Variances

Normal Theory Methods

Clearly, a careful examination of variance should be a routine component in the analysis of data from psychological experiments. To facilitate future efforts of this sort, we review in this section basic statistical techniques for analyzing variances and testing hypotheses about them. Although this theory is well established in the statistical literature (see, e.g., Miller, 1986), it is not widely known by practicing researchers.

The method for comparing the variances of two independent groups is well-known. Assuming the data are normally distributed, the ratio of two sample variances is distributed as an F statistic with $\nu_1$ and $\nu_2$ df, respectively, under the homogeneity of variance hypothesis. This is the standard parametric test for comparing variances in two groups.

Extension to more than two groups is not direct, however, because there is no simple statistical theory for linear modeling of sample variances. The most common alternative is to transform the sample variances, $s_j^2$, for each of the $J$ groups such that the transformed statistics are approximately normally distributed. Estimation- and hypothesis-testing techniques from normal distribution theory are then applied. Several normalizing transformations of $s_j^2$ have been suggested in the literature, including the log transform and the square root and cube root of $s_j^2$ (see Kendall & Stuart, 1969; or for a more recent review, Raudenbush & Bryk, 1987). Although the cube root transform converges very quickly to normality and thus offers distinct advantages with very small sample sizes, the log transformation is generally preferable for several reasons.

First, linear contrasts among the $\ln(s_j^2)$ are invariant to changes of scale in the raw data (Box & Tiao, 1973). For instance, standardizing the data around the grand mean for several groups has no effect on estimated contrasts among the log-transformed variances. This invariance property does not hold for $s^2$, $s$, or the cube root transform. Second, when the raw data are normal, the $\ln(s_j^2)$ are approximately normally distributed with stable variance. The approximate sampling variance for $\ln(s_j^2)$ is $2/\nu_j$, which does not depend on the population variance, $\sigma^2$. Third, a bias correction factor for $\ln(s_j^2)$ can be introduced that improves the accuracy of the asymptotic approximation. Raudenbush and Bryk (1987) demonstrate that this bias-adjusted log transform is an excellent approximation with sample sizes as small as 10 per group, which covers most of the cases likely to be encountered in social and behavioral research. Specifically, we define the transformation

$$d_j = \ln(s_j^2) + 1/\nu_j, \tag{19}$$

where $\nu_j$ is the degrees of freedom in group $j$ and $1/\nu_j$ is the bias correction factor. These transformed variances can then be used in any standard linear model technique. For example, as first suggested by Bartlett and Kendall (1946), we can estimate the residual variance separately for each cell in an ANOVA design and perform an ANOVA on the transformed variance statistics.

A simple application is the omnibus test for heterogeneity of variance, as was previously used in Table 1. For an ANOVA design, if we define

$$\bar{d} = \sum_j \nu_j d_j \Big/ \sum_j \nu_j, \tag{20}$$

where the summation is taken across all of the cells of the design, $j = 1, \ldots, J$, then the test statistic for the omnibus homogeneity of variance hypothesis is

$$\frac{1}{2} \sum_j \nu_j (d_j - \bar{d})^2, \tag{21}$$

which has an approximate chi-squared distribution with $J - 1$ degrees of freedom.

Because the $d_j$ are approximately normally distributed, we can also perform both simple and complex contrasts among specific cell values. In general, for any linear contrast, $\sum_j c_j d_j$ with $\sum_j c_j = 0$, among the $J$ transformed variances, the test statistic

$$z = \sum_j c_j d_j \Big/ \Big[ \sum_j c_j^2 (2/\nu_j) \Big]^{1/2} \tag{22}$$

follows a z distribution under the null hypothesis. This test was also used in Table 2, in which we separately compared the verbal and visual methods of instruction with the control condition. In general, any standard multiple comparison procedure can be applied to the $d_j$ statistics. For a further discussion and illustration of these techniques, see Games (1978a, 1978b). His development is identical to ours except that he does not take into account the bias correction factor in Equation 19.

In fact, modeling of variances can be approached as a general linear model problem. Specifically, Raudenbush and Bryk (1987) have formulated a mixed model for sample variances utilizing the log-transformed $d_j$ introduced above. These $d_j$ are represented as a linear function of a set of fixed effects including possible design variables and of random effects that might result from sampling units such as schools, classrooms, or persons.

Sensitivity to Distributional Assumptions

Parametric test statistics for variances are not robust to violation of the normality assumption for the raw data. In particular, the distributions of the test statistics are sensitive to kurtosis in the parent distribution. If the raw data have "fat tails" (a leptokurtic distribution), the true $\alpha$ levels will be underestimated and the probability of a Type I error increased. With platykurtic data, the reverse is true (for a review, see O'Brien, 1979). Table 5 displays the kurtosis values, $\gamma_2$, of several common distributions. For each distribution, we present a correction factor that can be used in two different ways to guard against invalid inferences.

First, if the underlying distribution is known or can be approximated, standard errors can be adjusted for kurtosis. Suppose, for example, that the outcome variable is a behavioral count, such as the frequency of student-initiated interaction during a class, and that students initiate an average of one interaction each. This outcome should be well represented as a Poisson variate with mean $\mu = 1$. For this case, the correction factor is $C = 1.225$. The value of the z statistic is simply divided by $C$ to obtain a test statistic that is corrected for the kurtosis in the parent distribution. Without the correction, the z test would be too liberal, resulting in elevated Type I errors. Note that if there were an average of three interactions per student (i.e., $\mu = 3$), then the correction factor, $C$, would be 1.080. Alternatively, suppose the outcome is a Likert scale with an approximately uniform distribution of responses. Now the appropriate correction factor would be .59.

[...] than unity). Thus, the sensitivity analysis supports the normal-based inference.

When nonnormal data occur, nonparametric tests for variances represent another option (for a review, see Miller, 1986, chap. 7). One fairly flexible technique, discussed by O'Brien (1979), is a generalization of the Scheffé test (see Glass & Hopkins, 1984, p. 356) that involves use of jackknife-type estimators. This approach is more complex computationally, and as O'Brien notes, the variance of these estimators depends upon their means, and this dependence can be problematic.

Table 5
Correction Factors for Several Distributions

Distribution                    Kurtosis, $\gamma_2$    Correction factor, $C = (1 + \gamma_2/2)^{1/2}$
t, df = 5                       6.00                    2.00
t, df = 6                       3.00                    1.58
t, df = 10                      1.00                    1.22
t, df = 20                      .37                     1.09
t, df = 30                      .23                     1.06
t, df = 60                      .11                     1.03
Poisson, μ = 1                  1.00                    1.22
Poisson, μ = 2                  .50                     1.12
Poisson, μ = 3                  .33                     1.08
Normal                          0.0                     1.00
Unimodal, symmetric Likert      -.50                    .87
Beta (p = 2, q = 2)             -.86                    .76
Uniform (or uniform Likert)     -1.30                   .59

Note. For the t distribution, $\gamma_2 = 6/(df - 4)$. For the Poisson distribution, $\gamma_2 = \mu^{-1}$. The Poisson is a sensible model for the probability of $n$ events in a unit interval of time, where $E(n) = \mu$. Examples might include student-initiated interactions or days absent. The unimodal symmetric Likert has five categories with probabilities of .1, .2, .4, .2, and .1, respectively. The beta distribution with p = 2, q = 2, is unimodal, symmetric, and truncated so that the variable takes on values between 0 and 1. The kurtosis depends on the fourth moment of the data distribution and indicates the density of observations in the tails of the distribution. See Johnson and Kotz (1970).

Discussion

Both of the examples presented in this article were chosen to illustrate an important point. Substantively significant empirical results have been ignored because research methodology has tended to focus exclusively on mean differences. Standard methodological training has left researchers largely unaware of the theoretical significance of variance heterogeneity, partly because basic texts tend to view such heterogeneity as a methodological nuisance rather than a source of important information.

The practice of routinely searching for data transformations that will eliminate heterogeneity is misguided. Although such transformations may be warranted, their legitimacy derives from purely substantive considerations; that is, a transformation is justified only if the transformed metric is more meaningful than the original metric. Variance-stabilizing transformations are necessarily nonlinear transformations. Hence, the
Without this correction, the z test would be too conservative. The second way of using these correction factors is as a form of sensitivity analysis. One simply calculates the value of the correction factor needed to reverse an inference. The question becomes, Is it possible that the data were actually generated from such a distribution? For example, consider the analysis presented in Table 3 for the Kester study. Assuming normality, the z test of the difference between experimental- and control-group (pooled within classrooms), is log (54.69) + 1/50 - log (25.24) - 1/51 = 2 ?J V2/50 + 2/51 p < .005. Because the critical value of z at a = .05 is 1.96, the correction factor needed to overturn the inference is 2.75/ 1.96 = 1.40. As Table 5 shows, this correction factor is associated with a rather fat-tailed distribution, a t distribution with approximately 8 degrees of freedom. However, examination of a normal probability plot actually showed a distribution with "thin tails" (so that the correction factor is likely to be smaller variances original and transformed metrics cannot both be interval measures of the same construct. Because linear model analyses require the outcome to be measured on an interval scale (at least approximately so), a variance-stabilizing transformation is justified only if the transformed metric approximates an interval scale measure of the construct better than does the original metric. Most analyses of experimental data assume that the treatment is a fixed entity that can be formally defined and uniformly administered to each individual. In such a case, the only source of heterogeneity of variance is individual differences in responsiveness to the fixed treatment. Such differences constitute interaction effects between person characteristics and treatment group membership. In many research contexts, however, the treatment that is actually implemented may vary across individuals. 
We hypothesized such effects, for example, in considering the heterogeneity in Kester's (1969) teacher expectancy study. More generally, research on instruction often utilizes several classrooms, therapy groups, or other groupings in order to obtain a sufficient subject sample. It is reasonable to assume, however, that the different teachers in these classrooms will vary in their use of the instructional intervention. The resultant variations in treatment implementation, as well as differences in individual subjects' characteristics, can produce variance in the treatment effects.

One obvious response to our objections to the constant treatment effect model (Equation 1) is that although this model is not literally correct, it is still useful in that it provides an adequate summary measure of a treatment's overall effect. Knowledge that a treatment works well on the average is considered to

[Figure 1. Decision tree in analyzing ex... Nodes: test homogeneity of variance (retain H0 or reject H0); estimate constant treatment effect; search for possible interaction effects; estimate and report treatment effects across a range of values on the interacting factors; conduct post hoc investigation of sources of heterogeneity of variance.]
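
The bias-corrected log transform (Equation 19), the contrast z test (Equation 22), and the kurtosis correction factor of Table 5 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' own procedure: the function names are hypothetical, and the omnibus statistic is assumed to take the usual precision-weighted form implied by the sampling variance $2/v_j$. The example reuses the Kester figures quoted in the text (variances 54.69 and 25.24 with 50 and 51 degrees of freedom).

```python
import math


def log_variance(s2, df):
    """Bias-corrected log variance d_j = ln(s_j^2) + 1/v_j (Equation 19)."""
    return math.log(s2) + 1.0 / df


def omnibus_h(variances, dfs):
    """Omnibus homogeneity statistic, assumed precision-weighted form.

    Approximately chi-squared with J - 1 degrees of freedom under the null.
    """
    d = [log_variance(s2, v) for s2, v in zip(variances, dfs)]
    d_bar = sum(v * dj for v, dj in zip(dfs, d)) / sum(dfs)
    return sum(v * (dj - d_bar) ** 2 for v, dj in zip(dfs, d)) / 2.0


def contrast_z(variances, dfs, coefs):
    """z statistic for a linear contrast among the d_j (Equation 22)."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * log_variance(s2, v)
                   for c, s2, v in zip(coefs, variances, dfs))
    # Each d_j has approximate sampling variance 2 / v_j.
    std_error = math.sqrt(sum(c * c * 2.0 / v for c, v in zip(coefs, dfs)))
    return estimate / std_error


def correction_factor(kurtosis):
    """Kurtosis correction C = (1 + gamma_2 / 2) ** 0.5 from Table 5."""
    return math.sqrt(1.0 + kurtosis / 2.0)


# Kester example: experimental vs. control variances from the text.
variances, dfs = [54.69, 25.24], [50, 51]
z = contrast_z(variances, dfs, [1, -1])
needed_c = z / 1.96  # correction factor that would overturn the inference

print(f"z = {z:.2f}")                      # z = 2.75
print(f"C needed = {needed_c:.2f}")        # C needed = 1.40
print(f"C, Poisson mu=1 = {correction_factor(1.0):.3f}")   # 1.225
print(f"C, uniform Likert = {correction_factor(-1.30):.2f}")  # 0.59
```

With only two groups, the omnibus statistic equals the square of the contrast z, which gives a convenient check on the weights. A kurtosis-corrected test statistic is obtained by dividing z by the relevant value of `correction_factor`.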
