Early Approaches to Intelligence Testing
In the early 20th century, the French government hired psychologist Alfred Binet to identify students who might need extra assistance in school. Binet worked with fellow French psychologist Théodore Simon to explore ways to assess a student's learning ability. Instead of writing an achievement test, which measures the skills and knowledge acquired during school, they wanted to test innate attention, memory, and problem-solving skills.
Binet and Simon created the first intelligence test, called the Binet-Simon Scale. A heavily revised version remains a popular assessment tool today. An intelligence test is intended to measure the ability to think and reason rather than accumulated knowledge. Their test provided information about a person's mental age, an expression of cognitive ability in terms of the age at which a typical person reaches that level of mental capacity. For example, a six-year-old who performs as well as the average nine-year-old would have a mental age of nine. A 30-year-old with developmental disabilities could also have a mental age of nine.
In 1916 American psychologist Lewis Terman of Stanford University standardized Binet's original test. Standardization ensures that testing and scoring conditions are consistent across test takers, which makes it possible to compare their performance without other variables affecting the results. The resulting test was named the Stanford-Binet Intelligence Scale. It produced an intelligence quotient (IQ) score representing a person's reasoning ability. In the early days of intelligence testing, IQ was calculated by dividing a person's mental age by their chronological age and multiplying by 100.
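The early ratio formula described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of any published test; the function name `ratio_iq` is a hypothetical label chosen for this example.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Early ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# The six-year-old from the text who performs like an average nine-year-old.
child_iq = ratio_iq(mental_age=9, chronological_age=6)
print(child_iq)  # 150.0
```

Under this formula, a child whose mental age equals their chronological age scores exactly 100.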
Items from the Original Binet-Simon Intelligence Test
The Wechsler Intelligence Scales
American psychologist David Wechsler felt the Stanford-Binet Intelligence Scale had limitations. Even before Gardner and Sternberg introduced their theories of multiple intelligences, he believed intelligence was made up of many different mental abilities instead of just one general intelligence factor (g). The Stanford-Binet Intelligence Scale had also been designed specifically for schoolchildren, which made it invalid for use with adults. Building on the Stanford-Binet Intelligence Scale, Wechsler designed the Wechsler Adult Intelligence Scale (WAIS) in 1955. He then developed tests for younger age groups as well: the Wechsler Intelligence Scale for Children (WISC) and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI).
The Wechsler Adult Intelligence Scale (WAIS) was designed to measure intelligence across various mental abilities. It has gone through four major revisions over the years. The most recent version, the WAIS-IV, was published in 2008. The WAIS-IV has 10 core subtests used to create broad index scores in four major areas of intelligence: verbal comprehension, perceptual reasoning, working memory, and processing speed. It also gives two overall intelligence scores: the Full-Scale IQ, which combines all four index scores, and the General Ability Index, which combines scores from verbal comprehension and perceptual reasoning only.
WAIS-IV Index Scores and Core Subtests
|Index Score|Core Subtest|
|Perceptual Reasoning|Block Design|
|Working Memory|Digit Span|
|Processing Speed|Symbol Search|
Modern Intelligence Testing
Many students are familiar with school admission tests such as the SAT, ACT, GRE, MCAT, and LSAT. These are aptitude tests, designed to measure ability in a particular skill or field of knowledge. Colleges and universities use them to select applicants who are likely to perform well at their institutions. Scores on these tests correlate with intelligence test scores, but they do not assess the full scope of a person's intellectual ability.
For aptitude and intelligence tests to have value, they must be reliable and valid. A reliable test gives consistent results, and a valid test measures what it claims to measure. Psychometrics is the science of measuring mental capacities, abilities, and processes. To be fair and useful, intelligence tests must be standardized so scores can be compared across all test takers. For example, intelligence tests have strict rules about how to deliver instructions and rules against offering hints. Otherwise, a test taker with an especially helpful test giver could get an unfairly high IQ score.
Useful intelligence tests must have content validity, meaning the test measures the behavior or skill it is intended to measure. They also must have predictive validity, meaning that a score on one measure can predict the score on a related measure. For example, if a significant percentage of students who scored very high on the LSAT failed out of law school, that test would have poor predictive validity.
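One common way to put a number on predictive validity is to correlate test scores with the later outcome they are meant to predict. The sketch below computes Pearson's correlation coefficient by hand. The helper name `pearson_r` and both lists of numbers are hypothetical, made up purely for illustration, and do not come from any real admissions data.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration only (not real data):
# admission test scores and later first-year grade averages for six students.
admission_scores = [150, 155, 160, 165, 170, 175]
first_year_grades = [2.6, 2.9, 3.0, 3.3, 3.4, 3.8]

# r ranges from -1 to 1. Values near 1 mean higher test scores go with
# higher later grades, which would suggest stronger predictive validity.
r = pearson_r(admission_scores, first_year_grades)
print(round(r, 2))
```

A value of r near zero would correspond to the poor predictive validity in the law school example: high test scores would say little about who later succeeds.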
Modern intelligence and aptitude tests are also normed, meaning that they have been given to a large, representative sample. This allows test developers to know what level of performance reflects average intellectual ability.
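As a rough illustration of how a norm sample gives meaning to a raw score, the sketch below places a raw score on a scale whose average is fixed at 100. The scale values (mean 100, standard deviation 15) are an assumption not stated in the text above, although they match a convention widely used for modern IQ scores. The function name `standard_score` and the tiny norm sample are hypothetical; real norm samples include thousands of people.

```python
from statistics import mean, stdev


def standard_score(raw, norm_sample, scale_mean=100, scale_sd=15):
    """Express a raw score relative to the norm sample's mean and spread."""
    # z is how many standard deviations the raw score lies from the norm mean.
    z = (raw - mean(norm_sample)) / stdev(norm_sample)
    return scale_mean + scale_sd * z


# Made-up raw scores standing in for a norm sample (illustration only).
norm_sample = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]

# A raw score equal to the norm sample's average maps to the scale mean.
print(standard_score(50, norm_sample))  # 100.0
```

Raw scores above the norm sample's average land above 100 and raw scores below it land below 100, which is how developers know what level of performance counts as average.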
Intelligence test scores show relatively small differences across gender, racial, and ethnic groups. On average, men tend to score slightly higher on spatial reasoning and women on verbal skills. In the United States, white and Asian groups tend to score slightly higher than African American and Latino groups. However, across all groups, intellectual abilities are more alike than different. Variability within a group far exceeds variability between groups. Differences between groups are also strongly linked to environmental differences rather than innate biological differences.
Aspects of some intelligence tests depend on cultural knowledge, educational experiences, and knowledge of specific vocabulary. Test bias, which occurs when a test is comparatively more difficult for one group of people than it is for others, can influence an individual's IQ score. For example, a test asking questions related to snow may disadvantage test takers from states or nations that rarely have cold weather. Questions using common sayings from one culture, like “comparing apples to oranges,” may disadvantage people who do not have American English as a first language.
A culture-fair intelligence test is designed so that it does not favor any cultural background over another. Tests such as Raven's Progressive Matrices, which focuses on nonverbal abstract reasoning, may be less influenced by culture and life experiences. Similarly, tasks focusing on processing speed and mental rotation may lead to fewer biases. However, because culture influences attitudes toward testing and experience with tests, no test is truly equivalent across cultures.
Mental Rotation Task
Stereotypes about a group’s intellectual abilities can also bias how a test administrator gives or scores a test. When administrators have to make a judgment call about whether an answer is “good enough” to count as correct, they may unconsciously give the benefit of the doubt to people they expect to perform well. They may also unconsciously err on the side of giving too few points to people they do not expect to perform well. Test administration manuals help address this potential issue by setting standardized rules for how to give directions, when to repeat or elaborate on directions, and how to score answers.