also provide some protection against the possibility that any one of those measures does not tap into the actual outcome of interest.

Empirical demonstrations of the validity of a measure depend on some comparison that shows that the measure yields the results expected if it were, indeed, valid. For instance, when the measure is applied along with alternative measures of the same outcome, such as those used by other evaluators, the results should agree to a reasonable order of approximation. Similarly, when the measure is applied to situations recognized to differ on the outcome at issue, the results should differ. Thus, a measure of environmental attitudes should sharply differentiate members of the local Sierra Club from members of an off-road dirt bike association. Validity is also demonstrated by showing that results on the measure relate to or predict other characteristics expected to be related to the outcome. For example, an examination of concurrent predictive validity could assess the extent to which an assessment of the planning skills exhibited in the portfolios of work submitted by teacher candidates correlates with their supervisor’s ratings of their planning skills. Another type of predictive validity is especially salient when measuring a program’s short-term outcomes. Predictive validity of the short-term outcome measures occurs when these measures predict or are highly correlated with longer-term outcomes.

Sensitivity

A primary function of outcome measures is to detect changes or differences in outcomes that represent program effects. To accomplish this well, outcome measures must be sensitive to such effects. The sensitivity of a measure is the extent to which the values on the measure change when there is a change or difference in the thing being measured. Suppose, for instance, that we are measuring body weight as an outcome for a weight-loss program.
A finely calibrated scale of the sort used in physicians’ offices might measure weight to within a few ounces and, correspondingly, be able to detect weight loss in that range. In contrast, the weigh-in-motion scales for trucks on interstate highways are also valid and reliable measures of weight, but they are not sensitive to differences smaller than a few hundred pounds. A scale that was not sensitive to meaningful fluctuations in the weight of the dieters in the weight-loss program would be a poor choice to measure that outcome.

There are two main ways in which the kinds of outcome measures frequently used in program evaluation can be insensitive to changes or differences of the magnitude the program might produce. First, the measure may include elements that relate to something other than what the program could reasonably be expected to change. These dilute the concentration of elements that are responsive and mute the overall response of the measure. Consider, for example, a math tutoring program for elementary school children that has focused on fractions and long division problems for most of the school year. The evaluator might choose