Lab 8 -- Reliability on SPSS
John D. Morris - Florida Atlantic University

A Bit of Reliability Theory
The Example
SPSS Steps
The Output
Interpretation

Two of the most prominent criteria of quality for measurement instruments used in the behavioral sciences are validity and reliability. Validity refers to whether the test is "on-target": is the test measuring what you intend it to measure? The reliability of an instrument refers to its accuracy. If we envision a True score on a construct of interest, and if we envision the sum of all possible items measuring that construct (usually considered a population of items of infinite number) as yielding the True score, then our obtained score, using a sample of those items, is an approximation of the True score, with its accuracy depending on the number of items sampled and the average intercorrelation among those items.

It is the internal consistency of the items that leads us to posit that a reliable test would be one that correlates over repeated testings (Test-Retest Reliability) or across forms (Alternate Forms Reliability). However, both of these measures of reliability require two testings. Each also has some theoretical problems, although both have their place.

Cronbach showed that an internal consistency measure of reliability (usually referred to as $\alpha$, either Cronbach's $\alpha$ or Coefficient $\alpha$) could be developed using just the number of items ($k$) in an instrument and the average intercorrelation among items ($\bar{r}_{ij}$) as:

$$\alpha = \frac{k\,\bar{r}_{ij}}{1 + (k - 1)\,\bar{r}_{ij}}$$

This reliability represents our estimate of the squared correlation between our observed score and the true score, and thus is our estimate of the proportion of the true score variance that is accounted for by our observed score.

It is this reliability estimate that we will have SPSS calculate for a fictitious scale in this exercise.
SPSS will also calculate some other useful indices that are diagnostic of individual item performance.

Note that this notion of reliability assumes that we have multiple items on a scale and that the observed score for any individual on that scale will be obtained by summing the individual item scores. In an achievement test, this usually (except in the case of partial credit) implies a score of "0" or "1" on each item for an incorrect or correct answer, respectively. For attitudinal measures using a scaling technique such as a Likert-type scale, this implies that the relative positiveness of the attitude expressed on the item (e.g., 1, 2, 3, 4, or 5 for a 5-point scale) will be summed across all items on the scale or subscale. An additional assumption is that the items are all measuring the same construct; that decision, and any analyses that may lead to it, are assumed to have occurred prior to the reliability analysis.
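As a rough illustration of the formula and the summed-score assumption described above, the following Python sketch computes the observed scale scores, the average inter-item correlation, and the resulting α for a small data set. The data and the function names (`pearson_r`, `mean_interitem_r`, `alpha_from_mean_r`) are invented for illustration and are not part of the lab or of SPSS. Because this version of the formula works from correlations, it corresponds to α based on standardized items; an α computed from raw item variances may differ somewhat.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def mean_interitem_r(items):
    """Average correlation over all distinct pairs of items.

    `items` holds one list of respondent scores per item.
    """
    pairs = list(combinations(items, 2))
    return sum(pearson_r(x, y) for x, y in pairs) / len(pairs)


def alpha_from_mean_r(k, r_bar):
    """Coefficient alpha from k items and mean inter-item correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical data: 4 Likert-type items (1-5) answered by 6 respondents.
items = [
    [4, 2, 5, 3, 1, 4],
    [5, 2, 4, 3, 2, 4],
    [4, 1, 5, 2, 2, 5],
    [3, 2, 4, 3, 1, 4],
]

# Observed scale score for each respondent: the sum of that person's item scores.
totals = [sum(person) for person in zip(*items)]

r_bar = mean_interitem_r(items)
alpha = alpha_from_mean_r(len(items), r_bar)
print(f"scale scores: {totals}")
print(f"mean inter-item r = {r_bar:.3f}, alpha = {alpha:.3f}")

# Holding the average intercorrelation fixed, more items give a higher alpha.
print(round(alpha_from_mean_r(10, 0.3), 3))  # 0.811
print(round(alpha_from_mean_r(20, 0.3), 3))  # 0.896
```

The last two lines show the dependence on the number of items noted earlier: with an average intercorrelation of 0.3, doubling the scale from 10 to 20 items raises α from about 0.81 to about 0.90.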