EPPP Testing Flashcards

Terms Definitions
Def: reliability
Repeatable and consistent; free from error; reflects the 'true score'
Def: validity
Measures what it says it does
Def: power test
Assesses the level of difficulty a person can attain; no time limit; graduated difficulty; includes questions that everyone can do and questions that no one can do; eg WAIS Information subtest
Def: ipsative measures
Scores reported in terms of relative strength within the individual; preference is expressed for one item over another
Def: mastery test
Cutoff for predetermined level of performance
Def: normative measures
Absolute strength measured; all items answered; comparison among people possible
Range and interpretation of a reliability coefficient
0 (unreliable) to 1 (perfectly reliable); .9 means 90% of the variance in scores is accounted for by true score differences; you do NOT square a reliability coefficient
Factors affecting reliability coefficient
Anything reducing the range of obtained scores (eg a homogeneous population); anything increasing measurement error; short (vs long) tests; presence of floor or ceiling effects; high probability of guessing a correct answer
Factors affecting test-retest reliability
Maturation; difference in conditions; practice effects
Measures of internal consistency
Split-half: divide the test in two and correlate scores on the halves; sensitive to how the items are split. Coefficient alpha: used with items that have multiple scored response options (eg rating scales). Kuder-Richardson Formula 20 (KR-20): used for questions with dichotomous answers. Reliability increases with item homogeneity
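As a rough illustration of KR-20 for dichotomous items, the sketch below uses the commonly cited form KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores). The formula itself is not given on the card, the function name `kr20` and the data are hypothetical, and the total-score variance is computed as a population variance, which is an assumption (some texts use the sample variance).

```python
def kr20(item_scores):
    """item_scores: one row per examinee, each row a list of 0/1 item scores."""
    n_people = len(item_scores)
    n_items = len(item_scores[0])
    # Proportion passing each item (p); q = 1 - p is the proportion failing.
    p = [sum(row[i] for row in item_scores) / n_people for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores (assumption, see lead-in).
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 examinees, 3 right/wrong items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(scores)
print(round(reliability, 2))
```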
Utility of internal consistency measures
Measurement of unstable traits; not good for speed tests; sensitive to item content / sampling
Appropriate measure of speed test reliability
Test-retest; alternate forms
Measure of inter-rater reliability
Kappa coefficient
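A minimal sketch of how a kappa coefficient could be computed for two raters, assuming Cohen's form: kappa = (observed agreement - chance agreement) / (1 - chance agreement). The card names only the coefficient, so the formula, the function name `cohens_kappa`, and the ratings are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels."""
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 observations as "y" (behavior present) or "n".
rater_a = list("yyyyynnnnn")
rater_b = list("yyyynnnnny")
kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

Here the raters agree on 8 of 10 cases, but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.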
Factors improving inter-rater reliability
Well trained raters; explicit observation of the raters; mutually exclusive and exhaustive scoring categories
Def: interval recording
The observation period is divided into specified intervals of time, and the rater records whether the behavior occurs within each interval
Def: standard error of measurement
How much error is expected from an individual test score
Formula: standard error of measurement *
SE = SD * square root of (1 - r), where r = the reliability coefficient, which ranges from 0 to 1
Use: standard error of measurement
Construction of a confidence interval
Probability of scores falling within a specified confidence interval
68%: +/- 1 SE; 95%: +/- 1.96 SE; 99%: +/- 2.58 SE
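The formula and multipliers above can be sketched in a few lines. The function names and the example values (an observed score of 110 on a scale with SD 15 and reliability .91) are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SE = SD * sqrt(1 - r), with r between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed, sd, reliability, multiplier=1.96):
    """Interval of observed score +/- multiplier * SE (1.96 gives 95%)."""
    se = standard_error_of_measurement(sd, reliability)
    return observed - multiplier * se, observed + multiplier * se


# Hypothetical example values.
se = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, 15, 0.91)
print(round(se, 2), round(low, 2), round(high, 2))
```

Passing `multiplier=1` or `multiplier=2.58` gives the 68% and 99% intervals.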
Use: eta *
Correlation between continuous variables whose relationship is non-linear
Def: types of criterion related validity
Concurrent validity: predictor and criterion scores collected at the same time; useful for diagnostic tests. Predictive validity: predictor scores collected first and criterion scores later; useful for eg job selection tests
Factors affecting criterion related validity
Restricted range of scores; unreliability of predictor or criterion; regression; criterion contamination
Def: criterion contamination
Occurs when the person assessing the criterion knows an individual's predictor score
Def: convergent/divergent analysis
Convergent validity is high correlation between different measures of the same construct; divergent validity is low correlation between measures of different constructs
Relationship between reliability and validity
The criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient; the reliability coefficient sets a ceiling on the validity coefficient
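A short sketch of the ceiling described above; the function name and the reliability value of .81 are hypothetical.

```python
import math


def max_validity(predictor_reliability):
    """Upper bound on the validity coefficient: sqrt of predictor reliability."""
    if not 0.0 <= predictor_reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(predictor_reliability)


# Hypothetical predictor with a reliability coefficient of .81.
ceiling = max_validity(0.81)
print(round(ceiling, 2))
```

So a predictor with reliability .81 cannot show a validity coefficient above about .9, however good the criterion.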
Def: face validity
Appearance of validity to test takers, administrators and other untrained people
Def: criterion related validity coefficient
Pearson r correlation between predictor and criterion; acceptable range is +/- .3 to .6
Differences between standard error of measurement and standard error of estimate
Standard error of measurement: related to the reliability coefficient; used to estimate the true score on a given test. Standard error of estimate: determines where a criterion score will fall given a predictor score
Def: shrinkage
Reduction in validity coefficient on cross-validation (revalidation with a second sample); a result of noise in the original sample
Factors affecting shrinkage
Small original validation sample; large original item pool; relative number of items retained is small; items not sensibly chosen
Def: construct validity
Extent to which a test successfully measures an unobservable, abstract concept such as IQ
Techniques for assessing construct validity
Convergent validity techniques: high correlation on a trait even with different methods. Divergent / discriminant validity techniques: low correlation on different traits even with the same method. Factor analysis
Def: factor loading
Correlation between a given test and a factor derived from a factor analysis; can be squared to give the % of variance that the test accounts for in the factor
Def: communality (factor analysis)
The proportion of variance of a test accounted for by the factors; sum of the squared factor loadings; interpreted directly, ie .4 = 40%; only valid when factors are orthogonal
Def: unique variance (factor analysis)
Variance not accounted for by the factors; u2 = 1 - h2, where h2 is the communality
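The communality and unique variance definitions can be worked through with a small sketch. The loadings (.6 and .2 on two orthogonal factors) and the function names are hypothetical.

```python
def communality(loadings):
    """h2: sum of a test's squared loadings on orthogonal factors."""
    return sum(loading ** 2 for loading in loadings)


def unique_variance(loadings):
    """u2 = 1 - h2: variance not accounted for by the factors."""
    return 1.0 - communality(loadings)


# Hypothetical test loading .6 on factor I and .2 on factor II.
h2 = communality([0.6, 0.2])
u2 = unique_variance([0.6, 0.2])
print(round(h2, 2), round(u2, 2))
```

Here h2 is .40, read directly as 40% of the test's variance explained by the two factors, leaving 60% unique.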
Def: eigenvalue
Explained variance = sum of the squares of the loadings; sum of the eigenvalues <= number of tests; applied to unrotated factors only
Formula to convert eigenvalue to %
= eigenvalue * 100 / number of tests
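A minimal sketch of the conversion, assuming an eigenvalue obtained by summing the squared loadings of every test on one unrotated factor. The loadings and function names are hypothetical.

```python
def eigenvalue(loadings_on_factor):
    """Sum of the squared loadings of all tests on a single factor."""
    return sum(loading ** 2 for loading in loadings_on_factor)


def percent_of_variance(eigen, n_tests):
    """Percent of total variance explained: eigenvalue * 100 / number of tests."""
    return eigen * 100 / n_tests


# Hypothetical loadings of 4 tests on one unrotated factor.
loadings = [0.8, 0.6, 0.5, 0.5]
eigen = eigenvalue(loadings)
percent = percent_of_variance(eigen, len(loadings))
print(round(eigen, 2), round(percent, 1))
```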
Types of rotation (factor analysis) *
Orthogonal - uncorrelated; oblique - correlated; choice depends on what you believe the relationship is among the factors
Differences between principal components analysis and factor analysis
In principal components analysis: factors are always uncorrelated; variance = explained + error. In factor analysis: variance = common + specific + error
Use: cluster analysis
Categorize or taxonomize a set of objects
Differences between cluster analysis and factor analysis
Cluster analysis: all types of data; clusters interpreted as categories. Factor analysis: interval or ratio data only; factors interpreted as underlying constructs
Def: correction for attenuation
Estimate of how much more valid a predictor would be if it and the criterion were perfectly reliable
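For illustration, the sketch below uses the standard correction formula, corrected r = r_xy / sqrt(r_xx * r_yy), where r_xy is the observed validity coefficient and r_xx and r_yy are the reliabilities of the predictor and criterion. The formula is not stated on the card, and the function name and values are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated validity if predictor and criterion were perfectly reliable."""
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be greater than 0")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed validity .40, reliabilities .80 and .50.
corrected = correct_for_attenuation(0.40, 0.80, 0.50)
print(round(corrected, 2))
```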
Def: content validity
Adequate sampling of the relevant content domain
To reduce the number of false positives...
Raise the predictor cutoff and / or lower the criterion cutoff
Def: false negative
Predicted not to meet a criterion but in reality does
Def: item difficulty or difficulty index *
% of examinees answering correctly; an ordinal value, because an item with an index of .2 is not necessarily twice as difficult as an item with an index of .4
Def: item discriminability
Degree to which an item differentiates between low and high scorers; D = % of high scorers answering correctly minus % of low scorers answering correctly; ranges from 100 to -100; moderate difficulty is optimal
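The difficulty index and D can be sketched as follows. The function names and response data are hypothetical, and how examinees are split into high and low scoring groups is left to the caller.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct)."""
    return sum(responses) / len(responses)


def discrimination_index(high_group, low_group):
    """D: % correct among high scorers minus % correct among low scorers."""
    return 100 * (item_difficulty(high_group) - item_difficulty(low_group))


# Hypothetical responses to one item from high and low total scorers.
high = [1, 1, 1, 0]
low = [1, 0, 0, 0]
p = item_difficulty(high + low)
d = discrimination_index(high, low)
print(p, d)
```

A negative D would mean low scorers pass the item more often than high scorers, which usually signals a flawed item.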
Target values for item difficulty by objective
.5 for most tests; .25 for a high cutoff (matching the selection %); .8 or .9 for mastery; halfway between chance and 1 where guessing is possible, eg t/f exams would be .75
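The "halfway between chance and 1" rule can be sketched as below. The function name is hypothetical, and chance is assumed to be 1 divided by the number of answer options.

```python
def target_difficulty(n_options):
    """Optimal difficulty: midpoint between the chance success rate and 1."""
    chance = 1 / n_options
    return (1 + chance) / 2


true_false = target_difficulty(2)   # chance = .5
four_choice = target_difficulty(4)  # chance = .25
print(true_false, four_choice)
```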
Relationship between item difficulty and discriminability
Difficulty creates a ceiling for discriminability; difficulty of .5 creates maximum discriminability; the greater the mean discriminability, the greater the reliability
What can you determine from an item response (aka item characteristic) curve?
Difficulty: point where p(correct response) = .5. Discriminability: slope of the curve; steeper is more discriminating. Probability of a correct guess: intersection with the y axis
Def: computer adaptive assessment
Computerized selection of test items based on periodic estimates of ability
What are the advantages of a test item of moderate difficulty (p = .5)
Increases variability, which increases reliability and validity; maximally differentiates between low and high scorers
Techniques for assessing an item's discriminability
Correlation with the total score; correlation with an external criterion
What are the mean and std deviation for the following standard scores: z, t, stanine and deviation IQ?
Score          Mean   SD
z                 0    1
t                50   10
stanine           5   ~2
deviation IQ    100   15
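Because each standard score is a linear rescaling of z, conversions follow directly from the means and standard deviations for z, t and deviation IQ. The sketch below assumes a hypothetical raw score of 65 on a test with mean 50 and SD 10; the function names are illustrative.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd


def t_score(z):
    """t scores have mean 50 and SD 10."""
    return 50 + 10 * z


def deviation_iq(z):
    """Deviation IQs have mean 100 and SD 15."""
    return 100 + 15 * z


# Hypothetical raw score.
z = z_score(65, 50, 10)
print(z, t_score(z), deviation_iq(z))
```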
The difference between norm-referenced and criterion referenced scores
Norm-referenced: comparison to others in a sample. Criterion-referenced: measured against an external criterion
Characteristics of alternate forms reliability coefficient
Best, because to be high it must be consistent across time and content; likely to have a lower magnitude than other coefficients
Def: moderator variable
Variables affecting the validity of a test; a moderator variable confers differential validity on the test
Def: 'testing the limits' in dynamic assessment
Following a standardized test, using hints to elicit correct performance. The more hints necessary, the more severe the learning disability
Contents of the Mental Measurements Yearbook
Author; publisher; target population; administration time; critical reviews
Effect on the floor of adding easy questions to a test *
Will lower the floor, so the test can discriminate among examinees at lower levels of ability
Def: dynamic assessment
Variety of procedures following standardized testing to get further information, usually used with learning disability or retardation