All About Assessment
Unraveling Reliability
W. James Popham

If you were to ask an educator to identify the two most important attributes of an education test, the response almost certainly would be "validity and reliability." These two tightly wedded concepts have become icons in the field of education assessment.

As far as validity is concerned, the term doesn't refer to the accuracy of a test (see "A Misunderstood Grail" in the September 2008 Educational Leadership). Rather, it refers to the accuracy of score-based inferences about test takers. Once educators grasp the idea that these inferences are made by people who can, of course, make mistakes, they're apt to be more cautious about how to use test results. In the case of reliability, however, it's the test itself that is or isn't reliable. That's a whopping difference. You'd think, therefore, that most educators would have a better handle on the meaning of reliability. Unfortunately, that's not the case.

Defining Reliability

The term reliability connotes positive things. Who would want anything, or anyone, to be unreliable? Moreover, with respect to education assessment, reliability equals consistency. And who among us would prefer inconsistency to consistency? Clearly, reliable tests are good, whereas unreliable tests are bad. It's that simple.

But here confusion can careen onto the scene, because measurement experts have identified three decisively different kinds of assessment consistency. Stability reliability refers to the consistency of students' scores when a test is administered to the same students on two different occasions. Alternate-form reliability describes the consistency of students' performances on two different (hopefully equivalent) versions of the same test. And internal consistency reliability describes the consistency with which all the separate items on a test measure whatever they're measuring, such as students' reading comprehension or mathematical ability.

Because these three incarnations of reliability constitute meaningfully different ways of thinking about a test's consistency, teachers need to recognize that the three approaches to reliability are not interchangeable. For example, suppose a teacher wants to know how consistent a particular test would be if it were administered to certain students at different times of the school year to track their varying rates of progress. What the teacher needs to look at in this instance is the test's stability reliability. Similarly, if a teacher has developed two different forms of the same test, possibly for purposes of test security, then the sort of reliability evidence needed to determine the two forms' consistency with each other would be alternate-form reliability. Finally, teachers might be interested in a test's internal consistency whenever they want to know how similarly a test's items function; that is, whether the test's items are homogeneous.

Educators frequently run into reports of a test's reliability when they're using standardized tests, either national or state-developed exams. Typically, these reliability estimates are reported as correlation coefficients, and these "reliability coefficients" usually range from zero to 1.0, with higher coefficients, such as .80 or .90, being sought.
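To make the distinction concrete, here is a minimal Python sketch (assuming NumPy is available) of how two of these coefficients are commonly computed. The article does not name specific formulas; stability reliability is typically estimated as the Pearson correlation between two administrations, and internal consistency is often reported as Cronbach's alpha. The function names and sample data below are illustrative, not from the article.

```python
import numpy as np

def stability_reliability(scores_time1, scores_time2):
    """Test-retest (stability) reliability: the Pearson correlation
    between the same students' scores on two administrations."""
    return np.corrcoef(scores_time1, scores_time2)[0, 1]

def cronbach_alpha(item_scores):
    """Internal consistency estimated as Cronbach's alpha.
    item_scores: 2-D array, rows = students, columns = test items."""
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: five students take the same test in fall and spring.
fall = [41, 55, 62, 48, 70]
spring = [44, 57, 60, 50, 73]
print(round(stability_reliability(fall, spring), 2))  # near 1.0: scores are stable
```

Either function returns a coefficient on the same zero-to-1.0 scale the article describes, which is why technical manuals can report them side by side even though they answer different questions about consistency.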
If standardized tests are distributed by commercial assessment companies, such tests are invariably accompanied by technical manuals containing some sort of reliability evidence. Usually, this evidence will be presented as an internal consistency coefficient, because this kind of reliability can be computed on the basis of a single test administration, as opposed to the alternate-form reliability and stability reliability coefficients, which both require multiple test administrations. Because collecting internal consistency evidence involves the least hassle, it's the most often reported kind of reliability evidence.

The One Thing to Know

Almost all classroom teachers are far too busy to collect any kind of reliability evidence for their own teacher-made tests. So why should teachers know anything at all about reliability? Well, there's one situation in which teachers actually do need to know what's going on regarding reliability. This arises when teachers are trying to determine the consistency represented by a student's performance on a nationally standardized test or, perhaps, on a state-built standardized test.

To get a fix on how consistent an individual student's test score is, the teacher can look at the test's standard error of measurement (SEM). Standard errors of measurement, which differ from test to test, are similar to the plus-or-minus margins of error accompanying most opinion polls. They tell a teacher how likely it is that a student's score would fall within a specific score range if the student were (theoretically) to take the same test 100 times. A standard error of measurement of 1 or 2 means the test is quite reliable. Because all major published tests are accompanied by information regarding this measure, teachers need to check out a given test's SEM so they'll know how much confidence to place in their students' scores on that test.

This is briefly how it works. Suppose, for example, a student earned a score of 53 points on a 70-point test that had a standard error of measurement of 1. We can make two assumptions: First, about 68 percent of the time, that student would score within plus or minus one point of the original score (between 52 and 54). Second, about 95 percent of the time, the student's score would fall within plus or minus two points of the original score (between 51 and 55). If this same test had a standard error of measurement of 2, about 68 percent of the time that student would score within plus or minus two points of the original score (between 51 and 55); about 95 percent of the time, the student would score within plus or minus four points of the original score (between 49 and 57). As the standard error of measurement increases, so does the range in possible scores.
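The arithmetic behind these bands is simple enough to script. Here is a minimal sketch in plain Python (the function name is mine, not the article's) that reproduces the ranges above; the band widths rest on the usual assumption of normally distributed measurement error:

```python
def score_band(observed_score, sem, num_sems=1):
    """Return the (low, high) range a student's retake scores should fall in.
    Roughly 68% of retakes land within 1 SEM of the observed score,
    and about 95% land within 2 SEMs."""
    half_width = num_sems * sem
    return observed_score - half_width, observed_score + half_width

print(score_band(53, sem=1, num_sems=1))  # (52, 54): the ~68% band
print(score_band(53, sem=1, num_sems=2))  # (51, 55): the ~95% band
print(score_band(53, sem=2, num_sems=2))  # (49, 57): doubling the SEM doubles the band
```

The same function previews what happens when the SEM grows larger still, as the final case below shows.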
So if the test had a standard error of measurement of 10, 68 percent of the time the student's score would be within plus or minus 10 points of the original score (between 43 and 63), and 95 percent of the time it would be within plus or minus 20 points of the original score (between 33 and 73), which is not very reliable at all. Clearly, teachers should place more confidence in tests sporting smaller standard errors of measurement.

Most educators think that the validity-reliability pairing is a marriage made in measurement heaven, with each partner pulling equal weight. In truth, reliability is the minor member of that merger. And we've just told you all you basically need to know about the concept. You can rely on it.

W. James Popham is Emeritus Professor in the UCLA Graduate School of Education and Information Studies; [email protected].

Copyright of Educational Leadership is the property of the Association for Supervision and Curriculum Development, and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.