| Terms |
Definitions |
|
Effect on the floor of adding easy questions to a test *
|
Will raise the floor
|
|
Def: criterion-related validity coefficient
|
Pearson r correlation between predictor and criterion; acceptable range is +/- .3 to .6
|
|
test theory
|
ttest theory
|
|
Def: computer adaptive assessment
|
Computerized selection of test items based on periodic estimates of ability
|
|
Def: dynamic assessment
|
A variety of procedures that follow standardized testing to gather further information; usually used with learning disability or retardation
|
|
Target values for item difficulty by objective
|
.5 for most tests; .25 for a high cutoff (matching the selection %); .8 or .9 for mastery; halfway between chance and 1, e.g. true/false exams would be .75
|
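The "halfway between chance and 1" rule above can be sketched in a few lines. This is a minimal illustration, not a standard library routine; the function name `target_difficulty` is made up for this example.

```python
def target_difficulty(chance: float) -> float:
    """Midpoint between the chance success rate and a perfect score of 1."""
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must be at least 0 and below 1")
    return (chance + 1.0) / 2.0


print(target_difficulty(0.5))   # true/false item: 0.75
print(target_difficulty(0.25))  # four-option multiple choice: 0.625
```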
|
Def: communality (factor analysis)
|
The proportion of a test's variance accounted for by the factors; equals the sum of the squared factor loadings; interpreted directly, i.e. .4 = 40%; only valid when factors are orthogonal
|
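As a small worked example of the communality definition, the sketch below sums squared loadings for one test. The loadings are invented for illustration and are assumed to come from orthogonal factors.

```python
def communality(loadings):
    """Sum of squared loadings of one test across orthogonal factors."""
    return sum(loading ** 2 for loading in loadings)


# Hypothetical loadings of a single test on two orthogonal factors.
loadings = [0.6, 0.2]
h2 = communality(loadings)
print(round(h2, 2))  # 0.4, read directly as 40% of the test's variance
```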
|
Relationship between item difficulty and discriminability
|
Difficulty creates a ceiling for discriminability; a difficulty of .5 allows maximum discriminability; the greater the mean discriminability, the greater the reliability
|
|
Differences between principal components analysis and factor analysis
|
In principal components analysis: factors are always uncorrelated; variance = explained + error. In factor analysis: variance = common + specific + error
|
|
Def: correction for attenuation
|
Estimate of how much more valid a predictor would be if it and the criterion were perfectly reliable
|
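The card above does not give the formula. A common form of the correction divides the observed validity coefficient by the square root of the product of the two reliabilities. The sketch below assumes that form; the function name and input values are illustrative only.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated validity if predictor and criterion were perfectly reliable.

    r_xy: observed predictor-criterion correlation
    r_xx: reliability of the predictor
    r_yy: reliability of the criterion
    """
    if r_xx <= 0 or r_yy <= 0:
        raise ValueError("reliabilities must be positive")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed validity .40, reliabilities .80 and .50.
print(round(correct_for_attenuation(0.40, 0.80, 0.50), 3))  # 0.632
```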
|
Def: interval recording
|
All behavior within a specified period of time
|
|
Factors affecting reliability coefficient
|
Anything reducing the range of obtained scores (e.g. a homogeneous population); anything increasing measurement error; short (vs. long) tests; presence of floor or ceiling effects; high probability of guessing a correct answer
|
|
Def: eigenvalue
|
Explained variance = sum of the squares of the loadings; sum of the eigenvalues <= number of tests; applied to unrotated factors only
|
|
Measure of inter-rater reliability
|
Kappa coefficient
|
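For two raters assigning categorical codes, Cohen's kappa compares observed agreement with the agreement expected by chance. The following is a minimal sketch under that assumption; the ratings are invented and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


a = ["y", "y", "n", "n", "y", "n", "y", "n", "y", "y"]
b = ["y", "n", "n", "n", "y", "n", "y", "y", "y", "y"]
print(round(cohens_kappa(a, b), 3))  # 0.583
```

Here the raters agree on 8 of 10 items (.80), chance agreement is .52, and kappa is (.80 - .52) / (1 - .52).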
|
Def: power test
|
Assesses the attainable level of difficulty; no time limit; graduated difficulty; includes questions that everyone can do and questions that no one can do; e.g. WAIS Information subtest
|
|
Measures of internal consistency
|
Split-half: divide the test in two and correlate scores on the two halves; sensitive to how the items are split. Coefficient alpha: used with multiple choice questions. Kuder-Richardson Formula 20 (KR-20): used for questions with dichotomous answers. Reliability increases with item homogeneity
|
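To make KR-20 concrete, here is a minimal sketch using the usual textbook form, (k / (k - 1)) * (1 - sum(p*q) / variance of total scores), with the population variance of the totals. The response matrix is invented for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if n_items < 2 or total_variance == 0:
        raise ValueError("need at least two items and some score variance")
    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: five examinees, four dichotomous items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2))  # 0.8
```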
|
Def: face validity
|
Appearance of validity to test takers, administrators and other untrained people
|
|
Use: standard error of measurement
|
Construction of a confidence interval
|
|
What can you determine from an item response (aka item characteristic) curve?
|
Difficulty: the point where p(correct response) = .5; discriminability: the slope of the curve (the steeper the slope, the more discriminating the item); probability of a correct guess: the intersection with the y axis
|
|
Def: ipsative measures
|
Scores reported in terms of relative strength within the individual; preference is expressed for one item over another
|
|
Relationship between reliability and validity
|
The criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient; the reliability coefficient therefore sets a ceiling on the validity coefficient
|
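The ceiling described above is simple arithmetic. A brief sketch follows, with an invented reliability value and an illustrative function name.

```python
import math


def max_validity(reliability):
    """Upper bound on the validity coefficient for a given predictor reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return math.sqrt(reliability)


# A predictor with reliability .81 cannot have a validity coefficient above .9.
print(round(max_validity(0.81), 2))  # 0.9
```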
|
Factors improving inter-rater reliability
|
Well-trained raters; explicit observation of the raters; mutually exclusive and exhaustive scoring categories
|
|
Def: mastery test
|
Cutoff for predetermined level of performance
|
|
Def: content validity
|
Adequate sampling of the relevant content domain
|
|
Def: validity
|
Measures what it says it does
|
|
Factors affecting test-retest reliability
|
Maturation; difference in conditions; practice effects
|
|
Techniques for assessing an item's discriminability
|
Correlation with the total score or with an external criterion
|
|
Def: construct validity
|
Extent to which a test successfully measures an unobservable, abstract concept such as IQ
|
|
Def: item difficulty or difficulty index *
|
% of examinees answering correctly; an ordinal value, because an item with an index of .2 is not necessarily twice as difficult as an item with an index of .4
|
|
Def: criterion contamination
|
Occurs when person assessing criterion knows predictor for an individual
|
|
Def: factor loading
|
Correlation between a given test and a factor derived from a factor analysis; can be squared to give the % of variance shared by the test and the factor
|
|
Def: normative measures
|
Absolute strength measured; all items answered; comparison among people possible
|
|
What are the mean and standard deviation for the following standard scores: z, T, stanine and deviation IQ?
|
z: mean 0, SD 1; T: mean 50, SD 10; stanine: mean 5, SD ~2; deviation IQ: mean 100, SD 15
|
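Because each standard score is a linear rescaling of z, conversion is a one-line calculation. The sketch below covers T scores and deviation IQ only; the function names are illustrative.

```python
def z_to_t(z):
    """T score: mean 50, SD 10."""
    return 50 + 10 * z


def z_to_deviation_iq(z):
    """Deviation IQ: mean 100, SD 15."""
    return 100 + 15 * z


# A score 1.5 standard deviations above the mean.
print(z_to_t(1.5), z_to_deviation_iq(1.5))  # 65.0 122.5
```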
|
Range and interpretation of a reliability coefficient
|
0 (unreliable) to 1 (perfectly reliable); .9 means 90% of the observed score variance is true score variance; you do NOT square a reliability coefficient
|
|
Factors affecting shrinkage
|
Small original validation sample; large original item pool; relative number of items retained is small; items not sensibly chosen
|
|
Def: false negative
|
Predicted not to meet a criterion but in reality does
|
|
Types of rotation (factor analysis) *
|
Orthogonal: factors are uncorrelated; oblique: factors are correlated; the choice depends on what you believe the relationship among the factors to be
|
|
What are the advantages of a test item of moderate difficulty (p = .5)?
|
Increases variability, which increases reliability and validity; maximally differentiates between low and high scorers
|
|
Factors affecting criterion-related validity
|
Restricted range of scores; unreliability of predictor or criterion; regression; criterion contamination
|
|
Use: cluster analysis
|
Categorize or taxonomize a set of objects
|
|
Techniques for assessing construct validity
|
Convergent validity techniques: high correlation on a trait even with different methods; divergent / discriminant validity techniques: low correlation on different traits even with the same method; factor analysis
|
|
Probability of scores falling within a specified confidence interval
|
68%: +/- 1 SE; 95%: +/- 1.96 SE; 99%: +/- 2.58 SE
|
|
Formula to convert eigenvalue to %
|
= eigenvalue * 100 / number of tests
|
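A short sketch of the conversion above, together with the eigenvalue as a sum of squared loadings on one factor. The loadings are invented and the function names are illustrative.

```python
def eigenvalue(loadings_on_factor):
    """Sum of squared loadings of all tests on a single factor."""
    return sum(loading ** 2 for loading in loadings_on_factor)


def percent_variance(eigenvalue_, n_tests):
    """Share of total variance explained by a factor, as a percentage."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return eigenvalue_ * 100 / n_tests


# Hypothetical loadings of four tests on one unrotated factor.
ev = eigenvalue([0.8, 0.7, 0.6, 0.5])
print(round(ev, 2), round(percent_variance(ev, 4), 1))
```

With these loadings the eigenvalue is about 1.74, so the factor accounts for roughly 43.5% of the variance in the four tests.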
|
Contents of the Mental Measurements Yearbook
|
Author; publisher; target population; administration time; critical reviews
|
|
Def: unique variance (factor analysis)
|
Variance not accounted for by the factors; u^2 = 1 - h^2, where h^2 is the communality
|
|
Def: moderator variable
|
Variables affecting the validity of a test; a moderator variable confers differential validity on the test
|
|
To reduce the number of false positives...
|
Raise the predictor cutoff and/or lower the criterion cutoff
|
|
Def: shrinkage
|
Reduction in the validity coefficient on cross-validation (revalidation with a second sample); a result of noise in the original sample
|
|
Def: reliability
|
Repeatable and consistent; free from error; reflects 'true score'
|
|
Def: types of criterion-related validity
|
Concurrent validity: predictor and criterion scores collected at the same time; useful for diagnostic tests. Predictive validity: predictor scores collected first and criterion scores later; useful for e.g. job selection tests
|
|
Def: 'testing the limits' in dynamic assessment
|
Following a standardized test, using hints to elicit correct performance. The more hints necessary, the more severe the learning disability
|
|
Differences between cluster analysis and factor analysis
|
Cluster analysis: all types of data; clusters interpreted as categories. Factor analysis: interval or ratio data only; factors interpreted as underlying constructs
|
|
Characteristics of alternate forms reliability coefficient
|
Best, because to be high it must be consistent across time and content; likely to have a lower magnitude than other coefficients
|
|
Use: eta *
|
Correlation between continuous variables that have a nonlinear relationship
|
|
Def: convergent/divergent analysis
|
Convergent validity is a high correlation between different measures of the same construct; divergent validity is a low correlation between measures of different constructs
|
|
The difference between norm-referenced and criterion-referenced scores
|
Norm-referenced scores compare a person to others in a sample; criterion-referenced scores measure performance against an external criterion
|
|
Utility of internal consistency measures
|
Measurement of unstable traits; not good for speed tests; sensitive to item content / sampling
|
|
Def: standard error of measurement
|
How much error is expected from an individual test score
|
|
Appropriate measure of speed test reliability
|
Test-retest; alternate forms
|
|
Def: item discriminability
|
Degree to which an item differentiates between low and high scorers; D = difference between the % of high scorers and the % of low scorers answering correctly; ranges from 100 to -100; moderate difficulty is optimal
|
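The D index above can be sketched as follows, assuming examinees have already been split into high-scoring and low-scoring groups on total score. The group data are invented and the function name is illustrative.

```python
def discrimination_index(high_group, low_group):
    """D = % correct among high scorers minus % correct among low scorers.

    Each argument is a list of 0/1 scores on one item for that group.
    """
    if not high_group or not low_group:
        raise ValueError("both groups must be non-empty")
    pct_high = 100 * sum(high_group) / len(high_group)
    pct_low = 100 * sum(low_group) / len(low_group)
    return pct_high - pct_low


# Hypothetical item: 3 of 4 high scorers and 1 of 4 low scorers answer correctly.
print(discrimination_index([1, 1, 1, 0], [1, 0, 0, 0]))  # 50.0
```

A negative D would mean low scorers answered the item correctly more often than high scorers, which usually flags a flawed item.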
|
Differences between standard error of measurement and standard error of estimate
|
Standard error of measurement: related to the reliability coefficient; used to estimate the true score on a given test. Standard error of estimate: determines where a criterion will fall given a predictor
|
|
Formula: standard error of measurement *
|
SE = SD * square root of (1 - r), where r = the reliability coefficient, which ranges from 0 to 1
|
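The formula above, combined with the confidence interval multipliers listed earlier in this set, can be sketched as follows. The test values (SD 15, reliability .91, observed score 110) are invented for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SE = SD * sqrt(1 - r), with r the reliability coefficient."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def confidence_interval(score, se, multiplier=1.96):
    """Interval around an observed score; 1.96 gives roughly 95%."""
    return (score - multiplier * se, score + multiplier * se)


se = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, se)
print(round(se, 2), round(low, 2), round(high, 2))
```

With these values the SE is about 4.5, so the 95% interval around an observed score of 110 runs from roughly 101.2 to 118.8.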