101A Statistics
Effect Size
Professor Esfandiari

What does effect size mean conceptually?

The concept of effect size appears in everyday language. For example, a weight loss program may boast that it leads to an average weight loss of 30 pounds. In this case, 30 pounds is an indicator of the claimed effect size. Another example is that a tutoring program may claim that it raises school performance by one letter grade. This grade increase is the claimed effect size of the program.

These are both examples of "absolute effect sizes," meaning that they convey the average difference between two groups without any discussion of the variability within the groups. For example, if the weight loss program results in an average loss of 30 pounds, we do not know if every participant loses exactly 30 pounds, or if half the participants lose 60 pounds and half the participants lose no weight at all.

In inferential statistics, an effect size helps to determine whether a statistically significant difference is a difference of practical concern. Given a sufficiently large sample size, a statistical comparison will always show a significant difference unless the difference in the population from which the data are sampled is exactly zero. The effect size conveys whether an observed difference is substantively important. This is in contrast to a statistical significance test, which assesses whether a relationship could be due to chance, regardless of the strength of the apparent relationship in the data.

In meta-analysis, effect sizes are used as a common measure that can be calculated for different studies and then combined into an overall summary. Reporting effect sizes is considered good practice when presenting empirical research findings in many fields [1][2]. Effect sizes are particularly prominent in social and medical research. Relative and absolute measures of effect size convey different information, and can be used complementarily.
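The claim that a large enough sample makes almost any nonzero difference statistically significant can be illustrated with a short calculation. The sketch below is only an illustration under simplifying assumptions: two groups of equal size, a known common standard deviation (so a z-test applies), and a hypothetical standardized difference of 0.05. The function name is made up for this example.

```python
import math


def two_sample_z_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between two
    equal-sized groups, treating the standard deviation as known (z-test)."""
    # Standard error of the difference in means is sigma * sqrt(2 / n),
    # so the z statistic for a standardized difference d is d * sqrt(n / 2).
    z = abs(d) * math.sqrt(n_per_group / 2.0)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    d = 0.05  # hypothetical, very small standardized difference
    for n in (100, 1_000, 10_000, 100_000):
        p = two_sample_z_p_value(d, n)
        print(f"n per group = {n:>7}: p = {p:.4g}")
```

The effect size stays at 0.05 in every row, while the p-value falls from well above 0.05 to far below 0.001 as the sample grows. The p-value therefore reflects the sample size as much as the size of the effect.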
Reported measures of effect size

Pearson r correlation

Pearson's correlation, often denoted r and introduced by Karl Pearson, is widely used as an effect size when paired quantitative data are available, for instance if one were studying the relationship between birth weight and longevity. The correlation coefficient can also be used when the data are binary. Pearson's r can vary from -1 to 1, with -1 indicating a perfect negative linear relation, 1 indicating a perfect positive linear relation, and 0 indicating no linear relation between two variables. Cohen gives the following guidelines for the social sciences: small effect size, r = 0.10 to 0.23; medium, r = 0.24 to 0.36; large, r = 0.37 or larger.

A related effect size is the coefficient of determination (the square of r, referred to as "r-squared"). In the case of paired data, this is a measure of the proportion of variance shared by the two variables, and it varies from 0 to 1. An $r^2$ of 0.21 means that 21% of the variance of either variable is shared with the other variable. Because $r^2$ is never negative, it does not convey the direction of the relationship between the two variables.

Difference of the means as a way of calculating effect size

Cohen's d is defined as the difference between two means divided by a standard deviation for the data:

$d = (\bar{X}_1 - \bar{X}_2) / s$

Jacob Cohen did not originally make explicit what precisely the standard deviation s is, because he defined it (using the symbol "σ") as "the standard deviation of either population (since they are assumed equal)". Other authors make the computation of the standard deviation more explicit with a definition for a pooled standard deviation. If you assume that the standard deviations in the two populations are equal, then you can compute a pooled standard deviation from the two sample standard deviations $s_1$ and $s_2$ and the two sample sizes $n_1$ and $n_2$:

$s = \sqrt{\dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$

Glass's Δ

In 1976 Gene V.
Glass proposed an estimator of the effect size that uses only the standard deviation of the second group [6, p. 78]:

$\Delta = (\bar{X}_1 - \bar{X}_2) / s_2$

where $s_2$ is the standard deviation of the control group. The second group may be regarded as a control group, and Glass argued that if several treatments were compared to the control group it would be better to use just the standard deviation computed from the control group, so that effect sizes would not differ under equal means and different variances.

Cohen's f2

Cohen's $f^2$ is an appropriate effect size measure to use in the context of an F-test for ANOVA or multiple regression. The $f^2$ effect size measure for multiple regression is defined as:

$f^2 = R^2 / (1 - R^2)$

where $R^2$ is the squared multiple correlation. By convention, $f^2$ effect sizes of 0.02, 0.15, and 0.35 are termed small, medium, and large, respectively.

The point of effect size is that sometimes a factor can have an effect which is statistically significant, but so small that you have to wonder about its overall importance in determining behavior. This is a theoretical issue, however. A small effect may have important consequences if it distinguishes two models, while a large effect may not matter if it was already expected under any theory.

Reporting effect size in ANOVA

The GLM approach gives you partial eta squared, which is a measure of effect size. See the printout below:

Tests of Between-Subjects Effects
Dependent Variable: gain in the knowledge of the US Constitution

Source            Type III Sum of Squares   df     Mean Square   F         Sig.   Partial Eta Squared
Corrected Model   9312.698a                 3      3104.233      11.270    .000   .026
Intercept         172868.086                1      172868.086    627.577   .000   .332
group             9122.302                  1      9122.302      33.117    .000   .026
gender            176.549                   1      176.549       .641      .424   .001
group * gender    109.184                   1      109.184       .396      .529   .000
Error             348172.990                1264   275.453
Total             562164.000                1268
Corrected Total   357485.688                1267

a.
R Squared = .026 (Adjusted R Squared = .024)

Partial eta squared for group is 0.026, meaning that 2.6% of the variance in the gain in knowledge of the Constitution is explained by the group (law-related education plus social studies book vs. social studies book only). An effect this small has no practical significance. The reason for the statistical significance of the group effect (P = 0.000) is the large sample size. Neither gender nor the interaction effect is significant (P levels are 0.424 and 0.529 respectively), so their partial eta squared values are close to zero; that is, they do not explain any meaningful portion of the R squared. Notice that the R squared for the model is approximately equal to the partial eta squared for group, because group accounts for nearly all of the explained variance.

Eta-squared, the "correlation ratio", is one such measure, which for small effects is about equal to Cohen's effect size measure $f^2$. However, it estimates the effect for the sample and therefore has a positive bias; omega-squared is more complex but estimates the effect for the population and is unbiased. It seems to be the preferred measure. However, SPSS gives instead the "partial eta squared" measure, which is calculated a little differently from eta squared so that it does not depend on how many factors there are. It gives the contribution of each factor or interaction, taken as if it were the only variable, so that it is not masked by any more powerful variable. As a result, it comes out the highest of these three measures, and the values for the various factors can sum to more than 100%.

Effect size measures

Several standardized measures of effect are used within the context of ANOVA to describe the degree of relationship between a predictor or set of predictors and the dependent variable. Effect size estimates are reported to allow researchers to compare findings in studies and across disciplines. Common effect size estimates reported in bivariate (e.g.
ANOVA) and multivariate (MANOVA, CANOVA, Multiple Discriminant Analysis) statistical analysis include: eta-squared, partial eta-squared, omega and intercorrelation (Strang, 2009).

η² (eta-squared): Eta-squared describes the ratio of variance explained in the dependent variable by a predictor while controlling for other predictors. Eta squared is a biased estimator of the variance explained by the model in the population (it only estimates effect size in the sample). On average it overestimates the variance explained in the population. As the sample size gets larger, the amount of bias gets smaller. It is, however, an easily calculated estimator of the proportion of the variance in a population explained by the treatment. Note that earlier versions of statistical software (such as SPSS) incorrectly report partial eta squared under the misleading title "Eta squared".

$\eta^2 = SS_{\text{treatment}} / SS_{\text{total}}$

Partial η² (partial eta-squared): Partial eta-squared describes the "proportion of total variation attributable to the factor, partialling out (excluding) other factors from the total nonerror variation". Partial eta squared is normally higher than eta squared (except in simple one-factor models, where the two are equal).

$\text{partial } \eta^2 = SS_{\text{treatment}} / (SS_{\text{treatment}} + SS_{\text{error}})$

The generally accepted benchmarks for effect size come from Cohen (1988; 1992): 0.20 is a small effect (though it can still matter in social science research); 0.50 is a medium effect; anything equal to or greater than 0.80 is a large effect (Keppel & Wickens, 2004; Cohen, 1992). These values were proposed for standardized mean differences such as Cohen's d, so they should not be applied directly to eta-squared. This interpretation has been repeated from Cohen (1988) over the years without any re-examination of its validity for contemporary experimental research. It is therefore questionable outside of psychological and behavioural studies, and questionable even within them unless the limitations stated by Cohen are fully understood.
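The eta-squared and partial eta-squared formulas can be checked against the sums of squares in the SPSS printout shown earlier. The sketch below is a minimal illustration, not SPSS's own procedure. The function names are made up for this example, and it assumes that the "Corrected Total" sum of squares (which excludes the intercept) is the appropriate total for eta-squared.

```python
def eta_squared(ss_effect, ss_total):
    """Eta squared: share of the total sum of squares due to one effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: effect sum of squares relative to effect plus error."""
    return ss_effect / (ss_effect + ss_error)


# Sums of squares copied from the printout (Type III).
SS_ERROR = 348172.990
SS_CORRECTED_TOTAL = 357485.688  # assumption: used as SS total for eta squared
EFFECTS = {
    "group": 9122.302,
    "gender": 176.549,
    "group * gender": 109.184,
}

if __name__ == "__main__":
    for name, ss in EFFECTS.items():
        eta2 = eta_squared(ss, SS_CORRECTED_TOTAL)
        partial = partial_eta_squared(ss, SS_ERROR)
        print(f"{name:<15} eta^2 = {eta2:.4f}   partial eta^2 = {partial:.4f}")
```

Rounded to three decimals, the partial eta squared values reproduce the printout's column (.026, .001, .000). In this data set eta squared and partial eta squared are almost identical, because the other effects remove very little from the total variation.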
Note: Specific partial eta-squared values for large, medium, or small effects should not be used as a "rule of thumb". Nevertheless, alternative rules of thumb have emerged in certain disciplines: small = 0.01; medium = 0.06; large = 0.14 (Kittler, Menard & Phillips, 2007).

Omega Squared

Omega squared provides a relatively unbiased estimate of the variance explained in the population by a predictor variable. It takes random error into account more than eta squared does, and eta squared is biased upward. The calculations for omega squared differ depending on the experimental design. For a fixed experimental design (in which the categories are explicitly set), omega squared is calculated as follows:

$\hat{\omega}^2 = \dfrac{SS_{\text{treatment}} - df_{\text{treatment}} \cdot MS_{\text{error}}}{SS_{\text{total}} + MS_{\text{error}}}$
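As a rough check, omega squared for the group effect can be computed from the SPSS printout shown earlier. This is a sketch under stated assumptions: it uses the treatment degrees of freedom in the numerator (the standard form of this estimator for fixed designs), it takes the "Corrected Total" sum of squares as SS total, and the function name is made up for this example.

```python
def omega_squared(ss_effect, df_effect, ms_error, ss_total):
    """Omega squared for a fixed-effects design:
    (SS_effect - df_effect * MS_error) / (SS_total + MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


# Values for the "group" effect, copied from the printout.
SS_GROUP = 9122.302
DF_GROUP = 1
MS_ERROR = 275.453
SS_CORRECTED_TOTAL = 357485.688  # assumption: used as SS total

if __name__ == "__main__":
    omega2 = omega_squared(SS_GROUP, DF_GROUP, MS_ERROR, SS_CORRECTED_TOTAL)
    eta2 = SS_GROUP / SS_CORRECTED_TOTAL
    print(f"omega^2 for group = {omega2:.4f}")
    print(f"eta^2 for group   = {eta2:.4f}")
```

Omega squared for group comes out slightly below eta squared (roughly 0.025 versus 0.026), which is the expected direction, since omega squared corrects for the upward bias of eta squared. With a sample this large the correction is very small.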
