this test a big effect or a small one? Few people would be so intimately familiar with the items and scoring of this particular math achievement test that they could interpret statistical effects directly into practical terms. Interpretation of statistical effects on outcome measures with values that are not inherently meaningful requires comparison with some external frame of reference that provides a practical context for those effects. With achievement tests, for instance, the average scores for students in different grades in the school might be available. Suppose that the mean score for sixth graders in the school was 47 and the mean for seventh graders was 55. This 8-point increase (an effect size of .53, assuming a standard deviation of 15) thus represents the average increase in mathematics achievement scores associated with a full year of schooling. The evaluator and key stakeholders might agree that an effect of the math tutoring program that represents a 20% improvement over average grade level performance would be about the least they would expect from the program given the effort and cost it requires. The corresponding MDES for the impact evaluation thus would be .106 (20% of .53).

Some outcome measures may have a preestablished threshold value that can be used as a referent for interpreting the practical significance of statistical effects, or it may be possible to define a reasonable success threshold if one is not already defined. With such a threshold, statistical effects can be assessed in terms of the proportion of individuals above and below that threshold. For example, an impact evaluation of a mental health program that treats depression might plan to use the Beck Depression Inventory as an outcome measure. On this instrument, scores above 20 are generally recognized as indicating moderate to severe depression. One way to identify a minimal program effect that would have practical significance, therefore, is to ask the most relevant stakeholders to specify the smallest proportion of depressed patients moved below this threshold they would consider a worthwhile program effect. Suppose in this example that intake data could be used to establish that 60% of the patients scored above the threshold for moderate to severe depression at baseline, and key stakeholders agreed that the least they would find acceptable is sufficient improvement in one fourth of those patients to move them below the threshold (.25 × .60 = .15). This implies that the minimum acceptable change would increase the percentage below the threshold from 40% to 55%. These are referred to as a 40–60 and a 55–45 split in the under-over ratio of patients, respectively. Assuming a normal distribution of scores, a table of areas under the normal curve shows that a 40–60 split in the distribution occurs at a z score of –.25, and a 55–45 split occurs at a z score of .13. Z scores are in standard deviation units, so their difference of .38 provides the corresponding MDES value.
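The two MDES calculations described here, one anchored to a grade-level gain and one to a success threshold, can be reproduced with a short sketch. It uses only Python's standard library; the function and variable names are illustrative and do not come from the text, and the second function relies on the same normal-distribution assumption the passage makes.

```python
from statistics import NormalDist


def mdes_from_reference_gain(mean_low, mean_high, sd, fraction):
    """MDES defined as a fraction of a standardized reference gain.

    The reference gain is the difference between two group means
    (here, adjacent grade levels) divided by the standard deviation.
    """
    reference_effect = (mean_high - mean_low) / sd
    return fraction * reference_effect


def mdes_from_threshold_shift(p_below_before, p_below_after):
    """MDES defined as the z-score shift implied by a change in the
    proportion of cases below a threshold, assuming normal scores."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_before = std_normal.inv_cdf(p_below_before)
    z_after = std_normal.inv_cdf(p_below_after)
    return z_after - z_before


# Achievement test example: grade means of 47 and 55, SD of 15,
# and a minimum worthwhile effect of 20% of one year's gain.
grade_mdes = mdes_from_reference_gain(47, 55, 15, 0.20)

# Depression example: 60% above the threshold at baseline, and
# one fourth of those patients must move below it.
baseline_below = 1 - 0.60
target_below = baseline_below + 0.25 * 0.60
threshold_mdes = mdes_from_threshold_shift(baseline_below, target_below)

print(round(grade_mdes, 3))      # 0.107
print(round(threshold_mdes, 2))  # 0.38
```

The first result prints as 0.107 rather than .106 because the sketch carries the unrounded effect size (8/15) through the multiplication, whereas the text takes 20% of the already rounded .53. The second result matches the .38 obtained from the normal curve table.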
