Reliability

Today
- True score theory
- Kinds of reliability

What Is Reliability?
- The "dependability" of a measure
- The "consistency" of a measure

Reliability
- The consistency or repeatability of a measure
- The degree to which a measure would give you the same result over and over, assuming the phenomenon being measured is not changing
- Cannot be calculated, only estimated
- Based on the true score theory of measurement

What's the big deal about reliability?
- Post-positivism/critical realism
- Measurement!!! Key to everything else: without it you have nothing!

A basic assumption in understanding reliability is that no single measure is entirely free of error or bias. Estimating reliability helps us understand how much error or bias is inherent in our measures.

True Score Theory
[Figure: a completed rating sheet (items rated 1 to 5) = True ability + Random error]

...and the whole model is represented in equation form as...

X = T + e
Observed score = True ability + Random error

It follows that the variability of the observed score, X, is the sum of the variability of the true score and the variability of the error (assuming the error is uncorrelated with the true score):

var(X) = var(T) + var(e)

Random and Systematic Errors in Measurement

X = T + e
X = T + e_r + e_s
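A small simulation can make the two equations above concrete. This is only a sketch under stated assumptions: the sample size, means, and standard deviations are hypothetical, T is the true score, e_r is random error drawn with mean zero, and systematic error e_s is modeled in the simplest possible way, as the same constant bias for every person.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: average squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def covariance(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000                # hypothetical sample size

# Hypothetical parameters, chosen only for illustration.
T = [rng.gauss(3.0, 1.0) for _ in range(n)]    # true scores
e_r = [rng.gauss(0.0, 0.5) for _ in range(n)]  # random error, mean zero
e_s = 0.4                                      # systematic error: constant bias (assumption)

X_random = [t + r for t, r in zip(T, e_r)]     # X = T + e_r
X_both = [x + e_s for x in X_random]           # X = T + e_r + e_s

# var(X) is close to var(T) + var(e_r); the small gap is 2 * cov(T, e_r).
print("var(X)           :", round(variance(X_random), 3))
print("var(T) + var(e_r):", round(variance(T) + variance(e_r), 3))

# Random error barely moves the group average; the constant bias shifts it.
print("mean(T)          :", round(mean(T), 3))
print("mean(T + e_r)    :", round(mean(X_random), 3))
print("mean(T+e_r+e_s)  :", round(mean(X_both), 3))

# Share of observed variance that is true-score variance.
print("var(T) / var(X)  :", round(variance(T) / variance(X_random), 3))
```

In this setup the random error widens the spread of the observed scores, while the constant bias moves every score by the same amount and leaves the spread unchanged.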
X = observed score
T = true ability
e = error (both random and systematic)
e_r = random error
e_s = systematic error

Random Error
- Random error is caused by any factors that randomly affect measurement of a variable across the sample.
- It adds variability to the data but does not affect the group average.

[Figure slides: Random Error; Systematic Error]

Reducing Measurement Error
- Pilot test measures
- Train interviewers or observers
- Double check data thoroughly
- Statistically adjust for error
- Use multiple measures of the same construct

Examples:
- How do you know if you are reliably measuring the functioning of the cardiovascular system?
- How do you know if you are reliably measuring attention deficit hyperactivity disorder (ADHD) in children?

Assessing ADHD
To what degree does a child:
1. Fidget
2. Become easily distracted
3. Have difficulty remaining seated
4. Shift from one uncompleted task to another

- Rate each question on a 5-point scale

In practice...
Take a group of 100 children:
- Ask parents to fill out the ADHD survey
- Children differ in their scores (probably normally distributed)
- Why are the kids different (VAR X)?
- Because they really differ in ADHD (VAR T) AND because of error (VAR E)

Reliability is...
- The proportion of observed variance that is due to the "true score": var(T) / var(X)
- Since you can't ever directly measure the "true score", you have to infer it by looking at the relationships among different measurements of the "true score"

Types of Reliability

Reliability is the consistency of what?
1. Observers or raters
2. Tests over time
3. Different versions of the same test
4. A test at one point in time

Inter-Rater or Inter-Observer Reliability

[Diagram: ADHD as rated by Observer 1 = ADHD as rated by Observer 2?]
- Have two raters rate the same kids
  - Parent and teacher
  - Tester 1 and Tester 2
- Are different observers consistent?
- Can look at percent of agreement (especially with category ratings).
- Can use correlation (with continuous ratings).

In our example...
- 5 categories: What percentage of the time do both testers give the same kid the same rating?
- Continuous: All kids get a score of 1 to 5 for each item from each tester

Example
KidID  T1Fidget  T2Fidget
1      3         4
2      2         2
3      1         1
4      5         4

Test-Retest Reliability

Stability over time

[Diagram: ADHD at Time 1 = ADHD at Time 2]
- Measure ADHD at two times for the same group of people
- Compute the correlation between the two measures across time
- Correlation ranges from -1 to 1:
  0 = no relationship
  1 = perfect positive relationship
  -1 = perfect negative relationship

Parallel-Forms Reliability

Stability across forms

[Diagram: Form A at Time 1 = Form B at Time 2]
- Administer both forms to the same people.
- Get the correlation between the two forms.
- Example: If you take the SAT twice, you won't get the same test.
- In our ADHD example: Come up with 5 new questions and see if those relate to the old ones.
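Inter-rater, test-retest, and parallel-forms reliability all come down to comparing two sets of scores for the same children. As a minimal sketch, the code below computes percent agreement and a Pearson correlation for the two testers' Fidget ratings from the example table. The function names are illustrative, not from the slides. The same `pearson` function could be applied to Time 1 vs. Time 2 scores or to Form A vs. Form B scores; the slides give no data for those, so none is invented here.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)

def percent_agreement(a, b):
    """Percentage of cases where both raters gave exactly the same rating."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)

# Fidget ratings for kids 1-4 from the example table.
t1_fidget = [3, 2, 1, 5]  # Tester 1
t2_fidget = [4, 2, 1, 4]  # Tester 2

print(percent_agreement(t1_fidget, t2_fidget))  # 50.0 (kids 2 and 3 match)
print(round(pearson(t1_fidget, t2_fidget), 2))  # 0.88
```

With only four children these numbers are illustrative; in practice the comparison would use the full sample.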
Internal Consistency Reliability
- A single test is administered to a sample on one occasion
- Assesses the consistency of the results across different items for the same construct within the measure
- Estimated with, for example, the average inter-item correlation or Cronbach's alpha
Average inter-item correlation

[Diagram: four items (Fidget, Distract, Seat, Shift) all measuring ADHD]

          Fidget  Distract  Seat  Shift
Fidget    1.00
Distract   .89    1.00
Seat       .91     .92      1.00
Shift      .88     .93       .95  1.00

Average inter-item correlation = .91

KidID  Fidget  Distract  Seat  Shift
1      3       4         4     3
2      2       2         3     2
3      1       1         2     3
4      5       4         5     5
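The average inter-item correlation is simply the mean of the six off-diagonal values in the correlation matrix. A short sketch, using the correlations exactly as given on the slide (the variable names are illustrative):

```python
from itertools import combinations

items = ["Fidget", "Distract", "Seat", "Shift"]

# Off-diagonal correlations from the slide's matrix.
r = {
    ("Fidget", "Distract"): 0.89,
    ("Fidget", "Seat"): 0.91,
    ("Fidget", "Shift"): 0.88,
    ("Distract", "Seat"): 0.92,
    ("Distract", "Shift"): 0.93,
    ("Seat", "Shift"): 0.95,
}

# Every unordered pair of distinct items: 4 items give 6 pairs.
pairs = list(combinations(items, 2))
average_r = sum(r[pair] for pair in pairs) / len(pairs)

print(round(average_r, 2))  # 0.91
```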
Average item-total correlation

[Diagram: four items (Fidget, Distract, Seat, Shift) all measuring ADHD]

Item      Item-total correlation  Total
Fidget    .84                     Distract + Seat + Shift
Distract  .88                     Fidget + Seat + Shift
Seat      .86                     Fidget + Distract + Shift
Shift     .87                     Fidget + Distract + Seat

Average item-total correlation = .86
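As a sketch of the item-total calculation, the code below correlates each item with the sum of the other three items, using the four-child score table from the slides. The function names are illustrative. Four rows are far too few for a stable estimate, so the printed values are not expected to match the slide's .84 to .88, which cannot be reproduced from these rows.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)

# Item scores for kids 1-4 from the slide's table.
scores = {
    "Fidget":   [3, 2, 1, 5],
    "Distract": [4, 2, 1, 4],
    "Seat":     [4, 3, 2, 5],
    "Shift":    [3, 2, 3, 5],
}

def item_total_correlations(scores):
    """Correlate each item with the total of all the OTHER items."""
    result = {}
    for item, values in scores.items():
        others = [v for name, v in scores.items() if name != item]
        rest_total = [sum(row) for row in zip(*others)]  # per-child sum
        result[item] = pearson(values, rest_total)
    return result

correlations = item_total_correlations(scores)
average = sum(correlations.values()) / len(correlations)

for item, value in correlations.items():
    print(f"{item}: {value:.2f}")
print(f"Average item-total correlation: {average:.2f}")
```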
Split-half correlations

[Diagram: four items (Fidget, Distract, Seat, Shift) all measuring ADHD, split into two halves]

There are several possible split halves:

Split  First half       Second half      Split-half correlation
1      Item 1 + Item 3  Item 2 + Item 4  .87
2      Item 1 + Item 2  Item 3 + Item 4  .85
3      Item 1 + Item 4  Item 2 + Item 3  .91
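The three splits above can be enumerated mechanically. The sketch below assumes Items 1 to 4 are Fidget, Distract, Seat, and Shift in that order (the slides do not say so explicitly) and uses the four-child score table from the slides. It covers the four-item case only, and with so few rows the results are not expected to match the slide's .87, .85, and .91.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)

# Item scores for kids 1-4 from the slide's table.
scores = {
    "Fidget":   [3, 2, 1, 5],
    "Distract": [4, 2, 1, 4],
    "Seat":     [4, 3, 2, 5],
    "Shift":    [3, 2, 3, 5],
}

def half_total(names):
    """Per-child sum of the named items."""
    return [sum(row) for row in zip(*(scores[name] for name in names))]

# With 4 items there are 3 distinct 2-vs-2 splits: pair the first item
# with each of the others in turn; the remaining two form the other half.
items = list(scores)
first = items[0]
results = []
for partner in items[1:]:
    half_a = [first, partner]
    half_b = [name for name in items if name not in half_a]
    r = pearson(half_total(half_a), half_total(half_b))
    results.append((half_a, half_b, r))

for half_a, half_b, r in results:
    print(f"{' + '.join(half_a)} vs {' + '.join(half_b)}: {r:.2f}")
```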
Cronbach's alpha (α)

Split half  Correlation
SH1         .87
SH2         .85
SH3         .91
Average     .88

- Alpha = the average of all possible split halves
- More important when you have a lot of items

Internal Consistency Reliability - Summary
- Average inter-item correlation
- Average item-total correlation
- Split-half reliability
- Cronbach's alpha (α): used most often!!

When is something reliable?
- Usually use Cronbach's alpha
  - Above .85: excellent
  - Between .75 and .85: OK, but not great
  - Between .70 and .75: marginal
  - Below .70: not acceptable
...
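The slides describe alpha as the average of all possible split halves. In practice alpha is usually computed from item variances, with a formula the slides do not show: α = k/(k-1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below applies that formula to the four-child score table and labels the result with the rules of thumb above. How the exact cut-off values (.85, .75, .70) are classified is an assumption, since the slide leaves the boundaries ambiguous.

```python
def sample_variance(values):
    """Sample variance (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(scores):
    """Variance-based Cronbach's alpha for a dict of item -> scores."""
    items = list(scores.values())
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # per-child total score
    item_variance_sum = sum(sample_variance(v) for v in items)
    return k / (k - 1) * (1 - item_variance_sum / sample_variance(totals))

def interpret_alpha(alpha):
    """Rule-of-thumb labels from the slide; boundary handling is assumed."""
    if alpha > 0.85:
        return "excellent"
    if alpha >= 0.75:
        return "ok, but not great"
    if alpha >= 0.70:
        return "marginal"
    return "not acceptable"

# Item scores for kids 1-4 from the slide's table.
scores = {
    "Fidget":   [3, 2, 1, 5],
    "Distract": [4, 2, 1, 4],
    "Seat":     [4, 3, 2, 5],
    "Shift":    [3, 2, 3, 5],
}

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f} ({interpret_alpha(alpha)})")
```

With only four children the estimate is illustrative; a real analysis would use the full sample, such as the 100 children in the running example.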
This note was uploaded on 08/26/2009 for the course BB H 310W taught by Professor Saltsman,brian during the Spring '07 term at Penn State.