also provide some protection against the possibility that any one of those measures does not tap into the actual outcome of interest.

Empirical demonstrations of the validity of a measure depend on some comparison that shows that the measure yields the results expected if it were, indeed, valid. For instance, when the measure is applied along with alternative measures of the same outcome, such as those used by other evaluators, the results should agree to a reasonable order of approximation. Similarly, when the measure is applied to situations recognized to differ on the outcome at issue, the results should differ. Thus, a measure of environmental attitudes should sharply differentiate members of the local Sierra Club from members of an off-road dirt bike association.

Validity is also demonstrated by showing that results on the measure relate to or predict other characteristics expected to be related to the outcome. For example, an examination of concurrent predictive validity could assess the extent to which an assessment of the planning skills exhibited in the portfolios of work submitted by teacher candidates correlates with their supervisor’s ratings of their planning skills. Another type of predictive validity is especially salient when measuring a program’s short-term outcomes. Predictive validity of the short-term outcome measures is demonstrated when these measures predict or are highly correlated with longer-term outcomes.

Sensitivity

A primary function of outcome measures is to detect changes or differences in outcomes that represent program effects. To accomplish this well, outcome measures must be sensitive to such effects. The sensitivity of a measure is the extent to which the values on the measure change when there is a change or difference in the thing being measured. Suppose, for instance, that we are measuring body weight as an outcome for a weight-loss program.
A finely calibrated scale of the sort used in physicians’ offices might measure weight to within a few ounces and, correspondingly, be able to detect weight loss in that range. In contrast, the weigh-in-motion scales for trucks on interstate highways are also valid and reliable measures of weight, but they are not sensitive to differences smaller than a few hundred pounds. A scale that was not sensitive to meaningful fluctuations in the weight of the dieters in the weight-loss program would be a poor choice to measure that outcome.

There are two main ways in which the kinds of outcome measures frequently used in program evaluation can be insensitive to changes or differences of the magnitude the program might produce. First, the measure may include elements that relate to something other than what the program could reasonably be expected to change. These dilute the concentration of elements that are responsive and mute the overall response of the measure. Consider, for example, a math tutoring program for elementary school children that has focused on fractions and long division problems for most of the school year. The evaluator might choose
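
The dilution effect described above can be illustrated with a little arithmetic. The following is a minimal sketch, not a method from the text: it assumes a hypothetical measure scored as a mean item score (proportion correct), on which the program raises performance only on the items it targets and leaves every other item unchanged. The function name `expected_score_change` and all of the numbers are assumptions chosen for illustration.

```python
def expected_score_change(n_responsive, n_unrelated, item_gain):
    """Expected change in the mean item score of a measure.

    n_responsive: number of items the program could plausibly affect
    n_unrelated:  number of items unrelated to what the program targets
    item_gain:    assumed average gain per responsive item (0.0 to 1.0)
    """
    n_items = n_responsive + n_unrelated
    if n_items <= 0:
        raise ValueError("the measure must contain at least one item")
    # Only responsive items move; unrelated items contribute zero change,
    # so the overall change is the item gain scaled by the responsive share.
    return n_responsive * item_gain / n_items


# Hypothetical numbers: the same 20 responsive items and the same
# item-level gain, measured with and without 80 unrelated items.
focused = expected_score_change(n_responsive=20, n_unrelated=0, item_gain=0.30)
broad = expected_score_change(n_responsive=20, n_unrelated=80, item_gain=0.30)

print(f"focused measure: {focused:.2f}")
print(f"broad measure:   {broad:.2f}")
```

Under these assumed numbers, the same item-level gain of 0.30 shows up as a 0.30 change on the focused measure but only a 0.06 change on the broad one, which is the sense in which unrelated elements mute a measure's overall response.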
