Terms and Definitions

Def: reliability 
- Repeatable and consistent
- Free from error
- Reflects the 'true score'

Def: validity 
Measures what it says it does

Def: power test 
- Assesses the attainable level of difficulty
- No time limit
- Graduated difficulty
- Qs that everyone can do
- Qs that no one can do
- E.g., the WAIS Information subtest

Def: ipsative measures 
- Scores reported in terms of relative strength within the individual
- Preference is expressed for one item over another

Def: mastery test 
Cutoff for predetermined level of performance

Def: normative measures 
- Absolute strength measured
- All items answered
- Comparison among people possible

Range and interpretation of a reliability coefficient 
- 0 (unreliable) to 1 (perfectly reliable)
- .9 means 90% of the variance is accounted for
- You do NOT square a reliability coefficient

Factors affecting reliability coefficient 
- Anything reducing the range of obtained scores (e.g., a homogeneous population)
- Anything increasing measurement error
- Short (vs long) tests
- Presence of floor or ceiling effects
- High probability of guessing a correct answer

Factors affecting test-retest reliability 
- Maturation
- Difference in conditions
- Practice effects

Measures of internal consistency 
- Split-half: divide the test in two and correlate scores on the subtests; sensitive to the selection strategy
- Coefficient alpha: used with multiple-choice questions
- Kuder-Richardson Formula 20 (KR-20): used for questions with dichotomous answers
- Reliability increases with item homogeneity
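
A minimal sketch of the three measures above, computed on a fabricated 0/1 response matrix (the data and variable names are invented for illustration):

```python
import numpy as np

# Fabricated response matrix: rows = examinees, columns = dichotomous items
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])
k = X.shape[1]                 # number of items
totals = X.sum(axis=1)         # total scores
total_var = totals.var()       # population variance, per the classical formulas

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
alpha = k / (k - 1) * (1 - X.var(axis=0).sum() / total_var)

# KR-20: the special case of alpha for dichotomous items (item variance = p*q)
p = X.mean(axis=0)
kr20 = k / (k - 1) * (1 - (p * (1 - p)).sum() / total_var)

# Split-half: correlate odd-item and even-item half scores, then apply the
# Spearman-Brown correction to estimate full-length reliability
odd, even = X[:, 0::2].sum(axis=1), X[:, 1::2].sum(axis=1)
r = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r / (1 + r)

print(f"alpha={alpha:.3f}  KR-20={kr20:.3f}  split-half={split_half:.3f}")
```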

Utility of internal consistency measures 
- Measurement of unstable traits
- Not good for speed tests
- Sensitive to item content / sampling

Appropriate measure of speed test reliability 
- Test-retest
- Alternate forms

Measure of interrater reliability 
Kappa coefficient
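
A minimal sketch of Cohen's kappa for two raters; the ratings are fabricated for illustration:

```python
from collections import Counter

rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "no",  "no", "no", "yes", "yes", "yes", "no"]
n = len(rater_a)

# Observed agreement: proportion of cases where the raters match
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement, from each rater's marginal proportions
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
p_e = sum(freq_a[c] / n * freq_b[c] / n for c in set(rater_a) | set(rater_b))

kappa = (p_o - p_e) / (1 - p_e)  # 0 = chance-level agreement, 1 = perfect
print(f"kappa = {kappa:.3f}")
```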

Factors improving interrater reliability 
- Well-trained raters
- Explicit observation of the raters
- Mutually exclusive and exhaustive scoring categories

Def: interval recording 
Dividing the observation period into intervals and recording whether the behavior occurs within each specified interval of time

Def: standard error of measurement 
How much error is expected from an individual test score

Formula: standard error of measurement * 
SEm = SD * sqrt(1 - r), where r = the reliability coefficient, which ranges from 0 to 1
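
A worked example of the formula, with made-up numbers (SD = 15 and reliability r = .91, roughly a deviation-IQ scale):

```python
# SEm = SD * sqrt(1 - r); values below are hypothetical
sd, r = 15, 0.91
sem = sd * (1 - r) ** 0.5
print(f"SEm = {sem:.2f}")  # 15 * sqrt(.09) = 4.50 points
```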

Use: standard error of measurement 
Construction of a confidence interval

Probability of scores falling within a specified confidence interval 
- 68%: +/- 1 SEm
- 95%: +/- 1.96 SEm
- 99%: +/- 2.58 SEm
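
A short sketch that builds these intervals around a hypothetical observed score of 110 on a scale with SEm = 4.5:

```python
# Confidence intervals around an observed score (all numbers made up)
observed, sem = 110, 4.5
for level, z in [(68, 1.0), (95, 1.96), (99, 2.58)]:
    lo, hi = observed - z * sem, observed + z * sem
    print(f"{level}% CI: {lo:.1f} to {hi:.1f}")
```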

Use: eta * 
Correlation between continuous variables that are nonlinearly related

Def: types of criterion related validity 
Concurrent validity
- Predictor and criterion scores collected at the same time
- Useful for diagnostic tests
Predictive validity
- Predictor scores collected first, criterion scores later
- Useful for, e.g., job selection tests

Factors affecting criterion related validity 
- Restricted range of scores
- Unreliability of the predictor or criterion
- Regression
- Criterion contamination

Def: criterion contamination 
Occurs when the person assessing the criterion knows an individual's predictor score

Def: convergent/divergent analysis 
- Convergent validity: high correlation between different measures of the same construct
- Divergent validity: low correlation between measures of different constructs

Relationship between reliability and validity 
- The criterion-related validity coefficient cannot exceed the square root of the predictor's reliability coefficient
- The reliability coefficient sets a ceiling on the validity coefficient
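
A worked example of the ceiling, using a made-up predictor reliability of .64:

```python
# Validity ceiling: validity cannot exceed sqrt(predictor reliability)
r_xx = 0.64                      # hypothetical reliability
max_validity = r_xx ** 0.5
print(f"maximum possible validity = {max_validity:.2f}")  # 0.80
```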

Def: face validity 
Appearance of validity to test takers, administrators and other untrained people

Def: criterion related validity coefficient 
- Pearson r correlation between the predictor and the criterion
- Acceptable range is roughly +/- .3 to .6

Differences between standard error of measurement and standard error of estimate 
Standard error of measurement
- Related to the reliability coefficient
- Used to estimate the true score on a given test
Standard error of estimate
- Determines where a criterion will fall given a predictor score

Def: shrinkage 
- Reduction in the validity coefficient on cross-validation (revalidation with a second sample)
- A result of noise in the original sample

Factors affecting shrinkage 
- Small original validation sample
- Large original item pool
- Relatively small number of items retained
- Items not sensibly chosen

Def: construct validity 
Extent to which a test successfully measures an unobservable, abstract concept such as IQ

Techniques for assessing construct validity 
- Convergent validity techniques: high correlation on a trait even with different methods
- Divergent / discriminant validity techniques: low correlation on different traits even with the same method
- Factor analysis

Def: factor loading 
- Correlation between a given test and a factor derived from a factor analysis
- Can be squared to give the % of the test's variance accounted for by the factor

Def: communality (factor analysis) 
- The proportion of a test's variance accounted for by the factors
- Sum of the squared factor loadings
- Interpreted directly, i.e., .4 = 40%
- Only valid when the factors are orthogonal

Def: unique variance (factor analysis) 
- Variance not accounted for by the factors
- u2 = 1 - h2, where h2 is the communality
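
A small sketch computing h2 and u2 from one hypothetical row of orthogonal factor loadings:

```python
# One test's loadings on three orthogonal factors (values made up)
loadings = [0.6, 0.5, 0.3]

h2 = sum(a ** 2 for a in loadings)  # communality = sum of squared loadings
u2 = 1 - h2                         # unique variance
print(f"h2 = {h2:.2f} ({h2:.0%} of the test's variance), u2 = {u2:.2f}")
```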

Def: eigenvalue 
- Explained variance = sum of the squared loadings of all tests on the factor
- Sum of the eigenvalues <= number of tests
- Applies to unrotated factors only

Formula to convert eigenvalue to % 
= eigenvalue * 100 / number of tests
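
A worked example with invented numbers: an unrotated factor with eigenvalue 2.4 from an analysis of 8 tests:

```python
# % of total variance = eigenvalue * 100 / number of tests (values made up)
eigenvalue, n_tests = 2.4, 8
pct = eigenvalue * 100 / n_tests
print(f"{pct:.0f}% of total variance")  # 30%
```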

Types of rotation (factor analysis) * 
- Orthogonal: uncorrelated factors
- Oblique: correlated factors
- Choice depends on what you believe the relationship is among the factors

Differences between principal components analysis and factor analysis 
In principal components analysis:
- Components are always uncorrelated
- Variance = explained + error
In factor analysis:
- Variance = common + specific + error

Use: cluster analysis 
Categorize or taxonomize a set of objects

Differences between cluster analysis and factor analysis 
Cluster analysis
- All types of data
- Clusters interpreted as categories
Factor analysis
- Interval or ratio data only
- Factors interpreted as underlying constructs

Def: correction for attenuation 
Estimate of how much more valid a predictor would be if it and the criterion were perfectly reliable
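
The card does not give the formula; the standard Spearman correction is r_xy / sqrt(r_xx * r_yy). A worked example with made-up values:

```python
# Correction for attenuation (Spearman); all values hypothetical
r_xy = 0.40               # observed predictor-criterion validity
r_xx, r_yy = 0.80, 0.50   # reliabilities of predictor and criterion
r_corrected = r_xy / (r_xx * r_yy) ** 0.5
print(f"corrected validity = {r_corrected:.2f}")  # 0.63
```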

Def: content validity 
Adequate sampling of the relevant content domain

To reduce the number of false positives... 
- Raise the predictor cutoff, and/or
- Lower the criterion cutoff

Def: false negative 
Predicted not to meet a criterion but in reality does

Def: item difficulty or difficulty index * 
- % of examinees answering the item correctly
- An ordinal value: an item with an index of .2 is harder than one with an index of .4, but not necessarily twice as hard

Def: item discriminability 
- Degree to which an item differentiates between low and high scorers
- D = % of high scorers answering correctly minus % of low scorers answering correctly
- Ranges from -100 to +100
- Moderate difficulty is optimal
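
A minimal sketch computing p and D from fabricated item and total-score data (a median split is used here; an upper/lower 27% split is also common):

```python
import numpy as np

item = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0])             # 0/1 answers to one item
total = np.array([98, 90, 85, 80, 72, 65, 60, 55, 40, 35])  # total test scores

p = item.mean()  # difficulty index: proportion answering correctly

# D: % correct in the top-scoring group minus % correct in the bottom group
order = np.argsort(total)[::-1]      # examinees sorted high to low
top, bottom = order[:5], order[5:]
D = item[top].mean() * 100 - item[bottom].mean() * 100
print(f"p = {p:.2f}, D = {D:.0f}")   # p = 0.50, D = 60
```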

Target values for item difficulty by objective 
- .5 for most tests
- .25 for a high cutoff (matching the selection %)
- .8 or .9 for mastery tests
- Halfway between chance and 1 where guessing is possible, e.g., .75 for true/false exams

Relationship between item difficulty and discriminability 
- Difficulty creates a ceiling for discriminability
- Difficulty of .5 allows maximum discriminability
- The greater the mean discriminability, the greater the reliability

What can you determine from an item response (aka item characteristic) curve? 
- Difficulty: the point where p(correct response) = .5
- Discriminability: the slope of the curve; the steeper the slope, the more discriminating the item
- Probability of a correct guess: the intersection with the y-axis
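
A sketch of one common model for such a curve, the three-parameter logistic; the parameters are invented: b is difficulty, a is discrimination (slope), and c is the guessing floor:

```python
import math

def icc(theta, a=1.5, b=0.0, c=0.25):
    """P(correct | ability theta) under a 3PL model (parameters made up)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

for theta in (-3, -1, 0, 1, 3):
    print(f"theta={theta:+d}: P(correct)={icc(theta):.2f}")

# Steeper slope (larger a) = more discriminating. As theta falls, P(correct)
# approaches c, the y-intercept region: the probability of a correct guess.
# Note: with c > 0, P = .5 is reached below b; the "p = .5" rule on the card
# holds exactly when c = 0.
```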

Def: computer adaptive assessment 
Computerized selection of test items based on periodic estimates of ability
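
A toy sketch of the idea, not any specific operational CAT algorithm; the item bank, step size, and update rule below are all invented:

```python
# Hypothetical item bank, indexed by difficulty
item_difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

def next_item(ability, unused):
    # Pick the unused item whose difficulty is closest to the ability estimate
    return min(unused, key=lambda b: abs(b - ability))

ability, unused = 0.0, set(item_difficulties)
for _ in range(5):
    b = next_item(ability, unused)
    unused.remove(b)
    correct = True  # in practice: the examinee's actual response
    ability += 0.5 if correct else -0.5  # crude update; real CATs use ML/Bayes estimates
    print(f"administered item b={b:+.1f}, new ability estimate={ability:+.1f}")
```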

What are the advantages of a test item of moderate difficulty (p = .5) 
- Increases variability, which increases reliability and validity
- Maximally differentiates between low and high scorers

Techniques for assessing an item's discriminability 
- Correlation with the total test score
- Correlation with an external criterion

What are the mean and standard deviation for the following standard scores: z, t, stanine and deviation IQ? 
- z: mean 0, SD 1
- t: mean 50, SD 10
- stanine: mean 5, SD ~2
- deviation IQ: mean 100, SD 15
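
A small sketch converting one raw score to each of these scales, given made-up norm-group statistics (raw mean 40, raw SD 8):

```python
raw, mean, sd = 46, 40, 8  # hypothetical raw score and norm statistics

z = (raw - mean) / sd                       # mean 0, SD 1
t = 50 + 10 * z                             # mean 50, SD 10
stanine = max(1, min(9, round(5 + 2 * z)))  # mean 5, SD ~2, truncated to 1-9
deviation_iq = 100 + 15 * z                 # mean 100, SD 15
print(f"z={z:.2f}  T={t:.1f}  stanine={stanine}  IQ={deviation_iq:.1f}")
```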

The difference between normreferenced and criterion referenced scores 
- Norm-referenced: comparison to others in a sample
- Criterion-referenced: measured against an external criterion

Characteristics of alternate forms reliability coefficient 
- Best, because to be high it must be consistent across both time and content
- Likely to have a lower magnitude than other coefficients

Def: moderator variable 
- Variables affecting the validity of a test
- A moderator variable confers differential validity on the test (the test predicts differently for different subgroups)

Def: 'testing the limits' in dynamic assessment 
Following a standardized test, using hints to elicit correct performance. The more hints necessary, the more severe the learning disability

Contents of the Mental Measurements Yearbook 
- Author
- Publisher
- Target population
- Administration time
- Critical reviews

Effect on the floor of adding easy questions to a test * 
Will lower the floor, allowing the test to discriminate among lower-ability examinees

Def: dynamic assessment 
A variety of procedures following standardized testing to obtain further information, usually used with learning disability or retardation

