
ATTITUDES AND SOCIAL COGNITION

Stereotype Threat and the Intellectual Test Performance of African Americans

Claude M. Steele, Stanford University
Joshua Aronson, University of Texas, Austin

Stereotype threat is being at risk of confirming, as self-characteristic, a negative stereotype about one’s group. Studies 1 and 2 varied the stereotype vulnerability of Black participants taking a difficult verbal test by varying whether or not their performance was ostensibly diagnostic of ability, and thus whether or not they were at risk of fulfilling the racial stereotype about their intellectual ability. Reflecting the pressure of this vulnerability, Blacks underperformed in relation to Whites in the ability-diagnostic condition but not in the nondiagnostic condition (with Scholastic Aptitude Test scores controlled). Study 3 validated that ability diagnosticity cognitively activated the racial stereotype in these participants and motivated them not to conform to it, or to be judged by it. Study 4 showed that mere salience of the stereotype could impair Blacks’ performance even when the test was not ability diagnostic. The role of stereotype vulnerability in the standardized test performance of ability-stigmatized groups is discussed.

Not long ago, in explaining his career-long preoccupation with the American Jewish experience, the novelist Philip Roth said that it was not Jewish culture or religion per se that fascinated him; it was what he called the Jewish “predicament.” This is an apt term for the perspective taken in the present research. It focuses on a social-psychological predicament that can arise from widely known negative stereotypes about one’s group. It is this: the existence of such a stereotype means that anything one does, or any of one’s features, that conforms to it makes the stereotype more plausible as a self-characterization in the eyes of others, and perhaps even in one’s own eyes.
We call this predicament stereotype threat and argue that it is experienced, essentially, as a self-evaluative threat. In form, it is a predicament that can beset the members of any group about whom negative stereotypes exist. Consider the stereotypes elicited by the terms yuppie, feminist, liberal, or White male. Their prevalence in society raises the possibility for potential targets that the stereotype is true of them and, also, that other people will see them that way. When the allegations of the stereotype are importantly negative, this predicament may be self-threatening enough to have disruptive effects of its own.

The present research examined the role these processes play in the intellectual test performance of African Americans. Our reasoning is this: whenever African American students perform an explicitly scholastic or intellectual task, they face the threat of confirming or being judged by a negative societal stereotype (a suspicion) about their group’s intellectual ability and competence. This threat is not borne by people not stereotyped in this way. And the self-threat it causes, through a variety of mechanisms, may interfere with the intellectual functioning of these students, particularly during standardized tests. This is the principal hypothesis examined in the present research. But as this threat persists over time, it may have the further effect of pressuring these students to protectively disidentify with achievement in school and related intellectual domains. That is, it may pressure the person to define or redefine the self-concept such that school achievement is neither a basis of self-evaluation nor a personal identity. This protects the person against the self-evaluative threat posed by the stereotypes but may have the byproduct of diminishing interest, motivation, and, ultimately, achievement in the domain (Steele, 1992).

The anxiety of knowing that one is a potential target of prejudice and stereotypes has been much discussed: in classic social science (e.g., Allport, 1954; Goffman, 1963), in popular books (e.g., Carter, 1991), and in essays, as, for example, S. Steele’s (1990) treatment of what he called racial vulnerability. In this last analysis, S. Steele made a connection between this experience and the school life of African Americans that has similarities to our own. He argued that after a lifetime of exposure to society’s negative images of their ability, these students are likely to internalize an “inferiority anxiety,” a state that can be aroused by a variety of race-related cues in the environment. This anxiety, in turn, can lead them to blame others for their troubles (for example, White racism), to underutilize available opportunities, and generally to form a victim’s identity. These adaptations, in turn, the argument goes, translate into poor life success.

Claude M. Steele, Department of Psychology, Stanford University; Joshua Aronson, School of Education, University of Texas, Austin.

This research was supported by National Institutes of Health Grant MH51977, Russell Sage Foundation Grant 879.304, and by Spencer Foundation and James S. McDonnell Foundation postdoctoral fellowships, and its completion was aided by the Center for Advanced Study in the Behavioral Sciences. We thank John Butner, Emmeline Chen, and Matthew McGlone for assistance and helpful comments on this research.

Correspondence concerning this article should be addressed to Claude M. Steele, Department of Psychology, Stanford University, Stanford, California 94305, or Joshua Aronson, School of Education, University of Texas, Austin, Texas 78712.

Journal of Personality and Social Psychology, 1995, Vol. 69, No. 5, 797–811. Copyright 1995 by the American Psychological Association, Inc. 0022-3514/95/$3.00
The present theory and research do not focus on the internalization of inferiority images or their consequences. Instead, they focus on the immediate situational threat that derives from the broad dissemination of negative stereotypes about one’s group: the threat of possibly being judged and treated stereotypically, or of possibly self-fulfilling such a stereotype. This threat can befall anyone with a group identity about which some negative stereotype exists, and for the person to be threatened in this way, he need not even believe the stereotype. He need only know that it stands as a hypothesis about him in situations where the stereotype is relevant. We focused on the stereotype threat of African Americans in intellectual and scholastic domains to provide a compelling test of the theory and because the theory, should it be supported in this context for this group, would have relevance to an important set of outcomes.

Gaps in school achievement and retention rates between White and Black Americans at all levels of schooling have been strikingly persistent in American society (e.g., Steele, 1992). These gaps are well publicized at the kindergarten through 12th-grade level, and recent statistics show that they persist even at the college level, where, for example, the national dropout rate for Black college students (the percentage who do not complete college within a 6-year window) is 70%, compared with 42% for White Americans (American Council on Education, 1990). Even among those who graduate, Black students’ grades average two thirds of a letter grade lower than those of graduating Whites (e.g., Nettles, 1988).

It has been most common to understand such problems as stemming largely from the socioeconomic disadvantage, segregation, and discrimination that African Americans have endured and continue to endure in this society, a set of conditions that, among other things, could produce racial gaps in achievement by undermining preparation for school.
Some evidence, however, questions the sufficiency of these explanations. It comes from the sizable literature examining racial bias in standardized testing. This work, involving hundreds of studies over several decades, generally shows that standardized tests predict subsequent school achievement as well for Black students as for White students (e.g., Cleary, Humphreys, Kendrick, & Wesman, 1975; Linn, 1973; Stanley, 1971). The slope of the lines regressing subsequent school achievement on entry-level standardized test scores is essentially the same for both groups. But embedded in this literature is another fact: At every level of preparation as measured by a standardized test (for example, the Scholastic Aptitude Test [SAT]), Black students with a given score have poorer subsequent achievement (GPA, retention rates, time to graduation, and so on) than White students with that score (Jensen, 1980). This is variously known as the overprediction or underachievement phenomenon, because it indicates that, relative to Whites with the same score, standardized tests actually overpredict the achievement that Blacks will realize. Most important for our purposes, this evidence suggests that Black–White achievement gaps are not due solely to group differences in preparation. Blacks achieve less well than Whites even when they have the same preparation, and even when that preparation is at a very high level.

Could this underachievement, in some part, reflect the stereotype threat that is a chronic feature of these students’ schooling environments? Research from the early 1960s, largely that of Irwin Katz and his colleagues (e.g., Katz, 1964) on how desegregation affected the intellectual performance of Black students, shows the sizable influence on Black intellectual performance of factors that can be interpreted as manipulations of stereotype threat.
Katz, Roberts, and Robinson (1965), for example, found that Black participants performed better on an IQ subtest when it was presented as a test of eye-hand coordination (a nonevaluative and thus threat-negating test representation) than when it was said to be a test of intelligence. Katz, Epps, and Axelson (1964) found that Black students performed better on an IQ test when they believed their performance would be compared with that of other Blacks as opposed to Whites. But as evidence that bears on our hypothesis, this literature has several limitations. Much of the research was conducted in an era when American race relations differed in important ways from those of today. Thus, without replication, it is not clear to what extent these findings reflect enduring processes of stereotype threat as opposed to the racial dynamics of a specific historical era. Also, this research seldom used White control groups. Thus it is difficult to know the extent to which some of the critical effects were mediated by the stereotype threat of Black students as opposed to processes experienced by any students.

Other research supports the present hypothesis by showing that factors akin to stereotype threat (that is, other factors that add self-evaluative threat to test taking or intellectual performance) are capable of disrupting that performance. The presence of observers or coactors, for example, can interfere with performance on mental tasks (e.g., Geen, 1985; Seta, 1982). Being a “token” member of a group (the sole representative of a social category) can inhibit one’s memory for what is said during a group discussion (Lord & Saenz, 1985; Lord, Saenz, & Godfrey, 1987). Conditions that increase the importance of performing well (prizes, competition, and audience approval) have all been shown to impair performance of even motor skills (e.g., Baumeister, 1984).
The stereotype threat hypothesis shares with these approaches the assumption that performance suffers when the situation redirects attention needed to perform a task onto some other concern; in the case of stereotype threat, that concern is the significance of one’s performance in light of a devaluing stereotype.

For African American students, the act of taking a test purported to measure intellectual ability may be enough to induce this threat. But we assume that this is most likely to happen when the test is also frustrating. It is frustration that makes the stereotype, as an allegation of inability, relevant to their performance and thus raises the possibility that they have an inability linked to their race. This is not to argue that the stereotype is necessarily believed, only that, in the face of frustration with the test, it becomes more plausible as a self-characterization and thereby more threatening to the self. Thus for Black students who care about the skills being tested (that is, those who are identified with these skills in the sense of their self-regard being somewhat tied to having them), the stereotype loads the testing situation with an extra degree of self-threat, a degree not borne by people not stereotyped in this way. This additional threat, in turn, may interfere with their performance in a variety of ways: by causing an arousal that reduces the range of cues participants are able to use (e.g., Easterbrook, 1959), by diverting attention onto task-irrelevant worries (e.g., Sarason, 1972; Wine, 1971), by causing an interfering self-consciousness (e.g., Baumeister, 1984), or by causing overcautiousness (Geen, 1985). Or, through the ability-indicting interpretation it poses for test frustration, it could foster low performance expectations that would cause participants to withdraw effort (e.g., Bandura, 1977, 1986). Depending on the situation,
several of these processes may be involved simultaneously or in alternation. Through these mechanisms, then, stereotype threat might be expected to undermine the standardized test performance of Black participants relative to White participants, who in this situation do not suffer this added threat.

Study 1

Accordingly, Black and White college students in this experiment were given a 30-min test composed of items from the verbal Graduate Record Examination (GRE) that were difficult enough to be at the limits of most participants’ skills. In the stereotype-threat condition, the test was described as diagnostic of intellectual ability, thus making the racial stereotype about intellectual ability relevant to Black participants’ performance and establishing for them the threat of fulfilling it. In the non-stereotype-threat condition, the same test was described simply as a laboratory problem-solving task that was nondiagnostic of ability. Presumably, this would make the racial stereotype about ability irrelevant to Black participants’ performance and thus preempt any threat of fulfilling it. Finally, a second nondiagnostic condition was included, which exhorted participants to view the difficult test as a challenge. For practical reasons, we were interested in whether stressing the challenge inherent in a difficult test might further increase participants’ motivation and performance over what would occur in the nondiagnostic condition. The primary dependent measure in this experiment was participants’ performance on the test, adjusted for the influence of individual differences in skill level (operationalized as participants’ verbal SAT scores).

We predicted that Black participants would underperform relative to Whites in the diagnostic condition, where there was stereotype threat, but not in the two nondiagnostic conditions (the non-diagnostic-only condition and the non-diagnostic-plus-challenge condition), where this threat was presumably reduced.
In the non-diagnostic-challenge condition, we also expected the additional motivation to boost the performance of both Black and White participants above that observed in the non-diagnostic-only condition. Several additional measures were included to assess the effectiveness of the manipulation and possible mediating states.

Method

Design and Participants

This experiment took the form of a 2 × 3 factorial design. The factors were race of the participant (Black or White) and a test description factor in which the test was presented as either diagnostic of intellectual ability (the diagnostic condition), as a laboratory tool for studying problem solving (the non-diagnostic-only condition), or as both a problem-solving tool and a challenge (the non-diagnostic-challenge condition). Test performance was the primary dependent measure. We recruited 117 male and female, Black and White Stanford undergraduates through campus advertisements that offered $10.00 for 1 hr of participation. The data from 3 participants were excluded from the analysis because they failed to provide their verbal SAT scores. This left a total of 114 participants, randomly assigned to the three experimental conditions with the exception that we ensured an equal number of participants per condition.

Procedure

Participants who signed up for the experiment were contacted by telephone prior to their experimental participation and asked to provide their verbal and quantitative SAT scores, to rate their enjoyment of verbally oriented classes, and to provide background information (e.g., year in school, major, etc.). When participants arrived at the laboratory, the experimenter (a White man) explained that for the next 30 min they would work on a set of verbal problems in a format identical to the SAT exam and would end by answering some questions about their experience. The participant was then given a page that stated the purpose of the study, described the procedure for answering questions,
stressed the importance of indicating guessed answers (by a check), described the test as very difficult and said that they should expect not to get many of the questions correct, and told them that they would be given feedback on their performance at the end of the session. We included the information about test difficulty to equate, as much as possible, participants’ performance expectations across the conditions. And, by acknowledging the difficulty of the test, we wanted to reduce the possibility that participants would see the test as a miscalculation of their skills and perhaps reduce their effort. This description was the same for all conditions with the exception of several key phrases that comprised the experimental manipulation.

Participants in the diagnostic condition were told that the study was concerned with “various personal factors involved in performance on problems requiring reading and verbal reasoning abilities.” They were further informed that after the test, feedback would be provided which “may be helpful to you by familiarizing you with some of your strengths and weaknesses” in verbal problem solving. As noted, participants in all conditions were told that they should not expect to get many items correct, and in the diagnostic condition, this test difficulty was justified as a means of providing a “genuine test of your verbal abilities and limitations so that we might better understand the factors involved in both.” Participants were asked to give a strong effort in order to “help us in our analysis of your verbal ability.”

In the non-diagnostic-only and non-diagnostic-challenge conditions, the description of the study made no reference to verbal ability. Instead, participants were told that the purpose of the research was to better understand the “psychological factors involved in solving verbal problems. . . .” These participants too were told that they would receive performance feedback,
but it was justified as a means of familiarizing them “with the kinds of problems that appear on tests [they] may encounter in the future.” In the non-diagnostic-only condition, the difficulty of the test was justified in terms of a research focus on difficult verbal problems, and in the non-diagnostic-challenge condition it was justified as an attempt to provide “even highly verbal people with a mental challenge. . . .” Last, participants in both conditions were asked to give a genuine effort in order to “help us in our analysis of the problem solving process.” As the experimenter left them to work on the test, to further differentiate the conditions, participants in the non-diagnostic-only condition were asked to try hard “even though we’re not going to evaluate your ability.” Participants in the non-diagnostic-challenge condition were asked to “please take this challenge seriously even though we will not be evaluating your ability.”

Dependent Measures

The primary dependent measure was participants’ performance on 30 verbal items, 27 of which were difficult items taken from GRE study guides (only 30% of earlier samples had gotten these items correct) and 3 of which were difficult anagram problems. Both the total number correct and an accuracy index (the number correct over the number attempted) were analyzed.

Participants next completed an 18-item self-report measure of their current thoughts relating to academic competence and personal worth (e.g., “I feel confident about my abilities,” “I feel self-conscious,” “I feel as smart as others,” etc.). These were measured on 5-point scales anchored by the phrases not at all (1) and extremely (5).
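The two performance scores named under Dependent Measures can be made concrete with a short sketch. The accuracy index is simply the number correct divided by the number attempted. The guessing correction that the authors describe in their first footnote (and chose not to report) is assumed here to take the common formula-score form: each wrong answer on an item with k response options costs 1/(k − 1) of a point, and the result is divided by the number of test items. The function names are hypothetical, and the example numbers are invented for illustration. They are not data from the study.

```python
def accuracy_index(n_correct: int, n_attempted: int) -> float:
    """Proportion correct out of the items a participant attempted."""
    if n_attempted == 0:
        return 0.0  # nothing attempted: define accuracy as zero rather than divide by zero
    return n_correct / n_attempted


def formula_score(n_correct: int, wrong_item_options: list[int], n_items: int) -> float:
    """Hypothetical guessing correction, assumed to be the usual formula score.

    Each wrong answer on an item with k response options subtracts 1/(k - 1),
    and the corrected total is expressed per test item.
    """
    penalty = sum(1 / (k - 1) for k in wrong_item_options)
    return (n_correct - penalty) / n_items


# Invented example: 12 correct out of 20 attempted on a 30-item test,
# with all 8 wrong answers on 5-option items.
acc = accuracy_index(12, 20)                 # 0.6
corrected = formula_score(12, [5] * 8, 30)   # (12 - 8 * 0.25) / 30 = 10 / 30
```

Because 27 of the 30 items share five options, under this assumed formula each wrong answer almost always costs the same 0.25 point. That fits the footnote's remark that the correction changes little.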
Participants also completed a 12-item measure of cognitive interference frequently used in test anxiety research (Sarason, 1980), on which they indicated the frequency of several distracting thoughts during the exam (e.g., “I wondered what the experimenter would think of me,” “I thought about how poorly I was doing,” “I thought about the difficulty of the problems,” etc.) by putting a number from 1 (never) to 5 (very often) next to each statement. Participants then rated how difficult and biased they considered the test on 15-point scales anchored by the labels not at all (1) and extremely (15). Next, participants evaluated their own performance by estimating the number of problems they correctly solved and by comparing their own performance with that of the average Stanford student on a 15-point scale with the end points much worse (1) and much better (15).

Finally, as a check on the manipulation, participants responded to the question: The purpose of this experiment was to: (a) provide a genuine test of my abilities in order to examine personal factors involved in verbal ability; (b) provide a challenging test in order to examine factors involved in solving verbal problems; (c) present you with unfamiliar verbal problems to measure verbal learning. Participants were asked to circle the appropriate response.

Results

Because there were no main or interactive effects of gender on verbal test performance or the self-report measures, we collapsed over this factor in all analyses.

Manipulation Check

Chi-square analyses performed on participants’ responses to the postexperimental question about the purpose of the study revealed only an effect of condition, χ²(2) = 43.18, p < .001. Participants were more likely to believe the purpose of the experiment was to evaluate their abilities in the diagnostic condition (65%) than in the nondiagnostic condition (3%) or the challenge condition (11%).
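The manipulation check is a chi-square test of independence between condition and the answer chosen. The following is a minimal sketch of the Pearson statistic. It assumes the answers were collapsed into a 3 × 2 table (chose option a versus any other option), which would give the 2 degrees of freedom reported. The article does not say exactly how the table was formed. For 2 degrees of freedom, the upper-tail p value has the closed form exp(−χ²/2), so no statistics library is needed. The counts and helper names below are hypothetical.

```python
import math


def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat: float) -> float:
    """Upper-tail p value for a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Invented counts: rows are conditions (diagnostic, nondiagnostic, challenge);
# columns are (chose option a, chose another option).
counts = [[10, 10], [0, 20], [5, 15]]
stat, df = chi_square_independence(counts)  # stat = 40/3, df = 2
p = p_value_df2(stat)
```

Plugging the reported χ²(2) = 43.18 into `p_value_df2` gives a p value far below .001, which is consistent with the text.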
Test Performance

The ANCOVA on the number of items participants got correct, using their self-reported SAT scores as the covariate (Black mean = 592, White mean = 632), revealed a significant condition main effect, F(2, 107) = 4.74, p < .02. Participants in the non-diagnostic-challenge condition performed higher than participants in the non-diagnostic-only and diagnostic conditions, respectively. The ANCOVA also revealed a significant race main effect, F(1, 107) = 5.22, p < .03, with White participants performing higher than Black participants.¹ The race-by-condition interaction did not reach conventional significance (p < .19). The adjusted condition means are presented in Figure 1.

[Figure 1. Mean test performance, Study 1. Legend: Black subjects, White subjects; conditions: diagnostic, nondiagnostic, challenge.]

If making the test diagnostic of ability depresses the performance of Black students through stereotype threat, then their performance should be lower in the diagnostic condition than in either the non-diagnostic-only or the non-diagnostic-challenge condition, both of which presumably lessened stereotype threat. Their performance should also be lower than that of Whites in the diagnostic condition. Bonferroni contrasts² with SATs as a covariate supported this reasoning. Black participants in the diagnostic condition performed significantly worse than Black participants in either the nondiagnostic condition, t(107) = 2.88, p < .01, or the challenge condition, t(107) = 2.63, p < .01. They also performed significantly worse than White participants in the diagnostic condition, t(107) = 2.64, p < .01. But, as noted, the interaction testing the differential effect of test diagnosticity on Black and White participants did not reach significance. This may have happened, however, because an incidental pattern of means (Whites slightly outperforming Blacks in the non-diagnostic-challenge condition) undermined the overall interaction effect.
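The adjusted means reported by the ANCOVA (and plotted in Figure 1) remove the part of each group's mean that is predicted by its covariate mean. The following is a minimal sketch that assumes the standard one-covariate ANCOVA adjustment. Each group mean is shifted by the pooled within-group slope, multiplied by the distance between that group's covariate mean and the grand covariate mean. The sketch does not compute the F tests themselves. The data are invented, and `adjusted_means` is a hypothetical helper.

```python
from statistics import mean


def adjusted_means(groups: dict[str, list[tuple[float, float]]]) -> tuple[dict[str, float], float]:
    """ANCOVA-style adjusted group means for one covariate.

    groups maps a group name to (covariate, score) pairs, e.g. (SAT, items correct).
    Returns the adjusted means and the pooled within-group slope.
    """
    sxx = sxy = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
    b_within = sxy / sxx  # common slope estimated within groups, not across them
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {
        name: mean(y for _, y in pairs) - b_within * (mean(x for x, _ in pairs) - grand_x)
        for name, pairs in groups.items()
    }
    return adjusted, b_within


# Invented data: group B has higher covariate scores and higher raw scores.
data = {
    "A": [(580, 9), (600, 10), (620, 11)],
    "B": [(620, 13), (640, 14), (660, 15)],
}
adj, slope = adjusted_means(data)  # slope 0.05; A: 10 -> 11, B: 14 -> 13
```

In this toy example the raw means differ by 4 points. Once both groups are evaluated at the same covariate value, the gap shrinks to 2. This is the sense in which scores are "adjusted for" SAT.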
To pursue a more sensitive test, we constructed a weighted contrast that compared the size of the race effect in the diagnostic condition with that in the nondiagnostic condition and assigned weights of zero to the White and Black non-diagnostic-challenge conditions. This analysis (including the use of SATs as a covariate) reached marginal significance, F(1, 107) = 3.27, p < .08. In sum, then, the hypothesis was supported by the pattern of contrasts but, when tested over the whole design, reached only marginal significance.

¹ Because we did not warn participants to avoid guessing in these experiments, we do not report the performance results in terms of the index used by Educational Testing Service, which includes a correction for guessing. This correction involves subtracting from the number correct the number wrong, adjusted for the number of response options for each wrong item, and dividing this by the number of items on the test. Because 27 of our 30 items had the same number of response options (5), this correction amounts to adjusting the number correct almost invariably by the same number. All analyses are the same regardless of the index used.

² All comparisons of adjusted means reported hereafter used the Bonferroni procedure.

Accuracy

An ANCOVA on accuracy (the proportion correct of the number attempted), with SATs as the covariate, found that neither the condition main effect nor the interaction reached significance. There was, however, a marginally significant tendency for Black participants to show less accuracy, p < .10. This tendency was primarily due to Black participants in the diagnostic condition, who had the lowest adjusted mean accuracy of any group in the experiment, .420.
The adjusted means for the White diagnostic, White non-diagnostic-only, White non-diagnostic-challenge, Black non-diagnostic-only, and Black non-diagnostic-challenge conditions were .519, .518, .561, .546, and .490, respectively. Bonferroni tests revealed that Black participants in the diagnostic condition were reliably less accurate than Black participants in the non-diagnostic-only condition and White participants in the diagnostic condition, t(107) = 2.64, p < .01, and t(107) = 2.13, p < .05, respectively.

No condition or interaction effects reached significance for the number of items completed or the number of guesses participants recorded on the test (all Fs < 1). The overall means for these two measures were 22.9 and 4.1, respectively.

Self-Report Measures

There were no significant condition effects on the self-report measure of academic competence and personal worth or on the self-report measure of disruptive thoughts and feelings during the test. Analysis of participants’ responses to the question about test bias yielded a main effect of race, F(1, 107) = 10.47, p < .001. Black participants in all conditions thought the test was more biased than White participants did.

Perceived Performance

Participants’ estimates of how many problems they solved correctly and of how they compared with other participants both showed significant condition main effects, F(2, 106) = 7.91, p < .001, and F(2, 107) = 3.17, p < .05, respectively. Performance estimates were higher in the non-diagnostic-only condition (M = 11.81) than in either the diagnostic (M = 9.20) or the non-diagnostic-challenge condition (M = 8.15).
Bonferroni tests showed that Black participants in the diagnostic condition (M = 4.89) saw their relative performance as poorer than did Black participants in the non-diagnostic-only condition (M = 6.54), t(107) = 2.81, p < .01, or Black participants in the non-diagnostic-challenge condition (M = 6.30), t(107) = 2.40, p < .02. Test description had no effect on the ratings of White participants. The overall mean was 5.86.

Discussion

With SAT differences statistically controlled, Black participants performed worse than White participants when the test was presented as a measure of their ability. They improved dramatically, matching the performance of Whites, when the test was presented as less reflective of ability. Nonetheless, the race-by-diagnosticity interaction testing this relationship reached only marginal significance, and then only when participants from the non-diagnostic-challenge condition were excluded from the analysis. Thus there remained some question as to the reliability of this interaction. We had also reasoned that stereotype threat might undermine performance by increasing interfering thoughts during the test. But the conditions affected neither self-evaluative thoughts nor thoughts about the self in the immediate situation (Sarason, 1980). Thus, to further test the reliability of the predicted interaction and to explore the mediation of the stereotype threat effect, we conducted a second experiment.

Study 2

We argued that the effect of stereotype threat on performance is mediated by an apprehension over possibly conforming to the negative group stereotype. Could this apprehension be detected as a higher level of general anxiety among stereotype-threatened participants? To test this possibility, participants in all conditions completed a version of the Spielberger State Anxiety Inventory (STAI) immediately after the test.
This scale has been successfully used in other research to detect anxiety induced by evaluation apprehension (e.g., Geen, 1985). We also measured the amount of time participants spent on each test item, to learn whether greater anxiety was associated with more time spent answering items.

Method

Participants

Twenty Black and 20 White female Stanford undergraduates were randomly assigned (with the exception of attaining equal cell sizes) to either the diagnostic or the nondiagnostic condition as described in Study 1, yielding 10 participants per condition. Female participants were used in this experiment because, due to other research going on, we had considerably easier access to Black female undergraduates than to Black male undergraduates. This decision was justified by the finding of no gender differences in the first study or, as it turned out, in any of the subsequent studies reported in this article, all of which used both men and women.

Procedure

This experiment used the same test used in Study 1, with several exceptions: the final three anagram problems were deleted, and the test period was reduced from 30 to 25 min. Also, the test was presented on a Macintosh computer (LC II). Participants used the mouse to control how long each item or item component stayed on the screen and could, at their own pace, access whatever item material they wanted to see. In addition to recording participants’ answers, the computer recorded how long the items or item components were on the screen, as well as the number of referrals between item components (as in the reading comprehension items).

Following the exam, participants completed the STAI and the cognitive interference measure described for Study 1. Also, on 11-point scales (with end points not at all and extremely), participants indicated the extent to which they guessed when having difficulty, expended effort on the test, persisted on problems, limited their time on problems, read problems more than once,
became frustrated and gave up, and felt that the test was biased.

Results and Discussion

The ANCOVA performed on the number of items correctly solved yielded a significant main effect of race, F(1, 35) = 10.04, p < .01, qualified by a significant Race × Test Description interaction, F(1, 35) = 8.07, p < .01. The mean SAT score was 603 for Black participants and 655 for White participants. The adjusted means are presented in Figure 2. Planned contrasts on the adjusted scores revealed that, as predicted, Blacks in the diagnostic condition performed significantly worse than Blacks in the nondiagnostic condition, t(35) = 2.38, p < .02, than Whites in the diagnostic condition, t(35) = 3.75, p < .001, and than Whites in the nondiagnostic condition, t(35) = 2.34, p < .025.

For accuracy (the number correct over the number attempted), a similar pattern emerged: Blacks in the diagnostic condition had lower accuracy (M = .392) than Blacks in the nondiagnostic condition (M = .490) or than Whites in either the diagnostic condition (M = .485) or the nondiagnostic condition (M = .435). The diagnosticity-by-race interaction testing this pattern reached significance, F(1, 35) = 4.18, p < .05. But the planned contrasts of the Black diagnostic condition against the other conditions did not reach conventional significance, although its contrasts with the Black nondiagnostic and White diagnostic conditions were marginally significant, with ps of .06 and .09, respectively.

Blacks completed fewer items than Whites, F(1, 35) = 9.35, p < .01, and participants in the diagnostic conditions tended to complete fewer items than those in the nondiagnostic conditions, F(1, 35) = 3.69, p < .07. The overall interaction did not reach significance. But planned contrasts revealed that Black participants in the diagnostic condition finished fewer items (M = 12.38) than Blacks in the nondiagnostic condition (M = 18.53),
t(35) = 2.50, p < .02; than Whites in the diagnostic condition (M = 20.93), t(35) = 3.39, p < .01; and than Whites in the nondiagnostic condition (M = 21.45), t(35) = 3.60, p < .01.

These results establish the reliability of the diagnosticity-by-race interaction for test performance, which was only marginally significant in Study 1. They also reveal another dimension of the effect of stereotype threat. Black participants in the diagnostic condition completed fewer test items than participants in the other conditions: test diagnosticity impaired the rate, as well as the accuracy, of their work. This is precisely the impairment caused by evaluative pressures such as evaluation apprehension, test anxiety, and competitive pressure (e.g., Baumeister, 1984). But one might ask why this did not happen in the nearly identical Study 1. Several factors may be relevant. First, the most involved test items (reading comprehension items that took several steps to answer) came first in the test. Second, the test lasted 25 min in the present experiment, whereas it lasted 30 min in the first experiment. Assuming, then, that stereotype threat slowed the pace of Black participants in the diagnostic conditions of both experiments, this 5-min difference in test period may have made it harder for these participants in the present experiment to get past the early, involved items and onto the more quickly answered items at the end of the test. This possibility may also explain the generally lower scores in this experiment.

[Figure 2. Mean test performance, Study 2: mean items solved (adjusted by SAT) for Black and White participants in the diagnostic and nondiagnostic conditions.]

This view is reinforced by the ANCOVA (with SATs as a covariate) on the average time spent on each of the first five test items, the minimum number of items that all participants in all conditions answered.
A marginal effect of test presentation emerged, F(1, 35) = 3.52, p < .07, but planned comparisons showed that Black participants in the diagnostic condition tended to be slower than participants in the other conditions. On average they spent 94 s answering each of these items. This compares with 71 s for Black participants in the nondiagnostic condition, t(35) = 2.39, p < .05; 73 s for Whites in the diagnostic condition, t(35) = 2.12, p < .05; and 71 s for Whites in the nondiagnostic condition, t(35) = 2.37, p < .05. Like other forms of evaluative pressure, stereotype threat impairs both the accuracy and the speed of performance.

No differences were found on any of the remaining measures, including self-reported effort, cognitive interference, and anxiety. These measures may have been insensitive, or administered too long after the critical experience. Nonetheless, we lack an important kind of evidence. We have not shown that test diagnosticity causes in Black participants a specific apprehension about fulfilling the negative group stereotype about their ability, the apprehension that we argue disrupts their test performance. To examine this issue we conducted a third experiment.

Study 3

We have assumed that taking an intellectually diagnostic test and experiencing some frustration with it is enough to cause stereotype threat for Black participants. In testing this reasoning, the present experiment examines several specific propositions. First, if taking or expecting to take a difficult, intellectually diagnostic test makes Black participants feel threatened by a specifically racial stereotype, then it might be expected to activate that stereotype in their thinking and information processing.
That is, the racial stereotype, and perhaps also the self-doubts associated with it, should be more cognitively activated for these participants than for Black participants in the nondiagnostic condition or for White participants in either condition (e.g., Dovidio, Evans, & Tyler, 1986; Devine, 1989; Higgins, 1989). Accordingly, to test whether test diagnosticity arouses this state, the present experiment measured the effect of conditions on the activation of this stereotype and of related self-doubts about ability.

Second, if test diagnosticity makes Black participants apprehensive about fulfilling and being judged by the racial stereotype, then these participants, more than participants in the other conditions, might be motivated to disassociate themselves from the stereotype. Brent Staples, an African American editorialist for the New York Times, offers an example of this in his recent autobiography, Parallel Time. He describes beginning graduate school at the University of Chicago and finding that, as he walked the streets of Hyde Park, he made people uncomfortable. They grouped more closely when he walked by, and some even crossed the street to avoid him. He eventually realized that in that urban context, dressed as a student, he was being perceived through the lens of a race-class stereotype as a potentially menacing Black man. To deflect this perception he learned a trick: he would whistle Vivaldi. It worked. Upon hearing him do this, people around him visibly relaxed, and he no longer felt under suspicion. If it is apprehension about being judged in light of the racial stereotype that interferes with the performance of Black participants in the diagnostic condition, then these participants, like Staples, might be motivated to deflect such a perception by showing that the broader racial stereotype does not apply to them.
To test this possibility, the present experiment measured the effect of conditions on participants’ stated preferences for such things as activities and styles of music, some of which were stereotypic of African Americans.

Third, the diagnostic condition adds to the normal evaluative risks of test performance the further risk of self-validating the racial stereotype. It should therefore also make Black participants more apprehensive about their test performance. The present experiment measured this apprehension as the degree to which participants self-handicapped their expected performance, that is, endorsed excuses for poor performance before the test.

The experiment took the form of a 2 × 3 design in which the race of participants (African American or White) was crossed with diagnostic, nondiagnostic, and control conditions. The diagnostic and nondiagnostic conditions were the same as those described for Study 2, while in the control condition participants completed the critical dependent measures without expecting to take a test of any sort. In the experimental conditions, the dependent measures were administered immediately after the diagnosticity instructions and just before the test was ostensibly to be taken. These included measures of stereotype activation, of stereotype avoidance, and, as a measure of general performance apprehension, of participants’ willingness to self-handicap. Participants in this experiment never took the test. We felt that the measures of stereotype activation and stereotype avoidance could themselves activate the racial stereotype and stereotype threat among Black participants in both the diagnostic and nondiagnostic conditions, which would make performance results difficult to interpret.
If test diagnosticity threatens Black participants with a specifically racial stereotype, then Black participants in the diagnostic condition, more than participants in the other conditions, should show greater cognitive activation of the stereotype and of ability-related self-doubts. They should also show greater motivation to disassociate themselves from the stereotype, and greater performance apprehension as indicated by the endorsement of self-handicapping excuses.

Method

Participants

Thirty-five Black (9 male, 26 female) and 33 White (20 male, 13 female) Stanford undergraduates were randomly assigned to a diagnostic, nondiagnostic, or control condition, yielding 10 to 12 participants per experimental group.

Procedure

As participants arrived, a White male experimenter gave them a booklet explaining that the study was examining the relationship between two types of cognitive processes: lexical access processing (LAP) and higher verbal reasoning (HVR). They were told that they would be asked to complete two tasks. One measured LAP, “the visual and recognition processing of words,” and the other measured HVR, “abstract reasoning about the meaning of words.” Test diagnosticity was manipulated as in Study 1, with the following written instructions to further differentiate the conditions:

Diagnostic: Because we want an accurate measure of your ability in these domains, we want to ask you to try as hard as you can to perform well on these tasks. At the end of the study, we can give you feedback which may be helpful by pointing out your strengths and weaknesses.

Nondiagnostic: Even though we are not evaluating your ability on these tasks, we want to ask you to try as hard as you can to perform well on these tasks. If you want to know more about your LAP and HVR performance, we can give you feedback at the end of the study.
Finally, participants were shown one sample item from the LAP (an item of the same sort as used in the fragment completion task) and three sample items from the HVR (difficult verbal GRE problems). The HVR sample items were meant to alert participants to the difficulty of the test and the possibility of poor performance, thus making the racial stereotype relevant in the diagnostic condition.

Participants in the control condition arrived at the laboratory to find a note on the door from the experimenter apologizing for not being present. The note instructed them to complete a set of measures lying on the desk in an envelope with the participant’s name on it. The envelope contained the LAP word-fragment measure and the stereotype avoidance measure (described below), with detailed instructions. No mention of verbal ability evaluation was made.

Measures

Stereotype activation. Participants first performed a word-fragment completion task, introduced as the “LAP task.” Versions of this task have been shown to measure the cognitive activation of constructs that are either recently primed or self-generated (Gilbert & Hixon, 1991; Tulving, Schacter, & Stark, 1982). The task was made up of 80 word fragments with missing letters specified as blank spaces (e.g., _ _ C E). Twelve of these fragments had as one possible solution a word reflecting either a race-related construct or an image associated with African Americans. To build the list, a group of 40 undergraduates (White students from the introductory psychology pool) generated a set of words that reflected the image of African Americans. From these lists, the research team identified the 12 most common constructs (e.g., lower class, minority) and selected single words to represent those constructs on the task. For example, the word “race” was used to represent the construct “concerned with race.”
Then, for each of the words placed on the task, at least two letter spaces were omitted. The word was then checked again to determine whether other, non-stereotype-related completions of the fragment were possible. Leaving at least two letter spaces blank in each word fragment greatly increases the number of possible completions for each fragment, compared with leaving only one letter space blank. This reduces the chance of ceiling effects, in which virtually all participants would think of the race-related fragment completion. The complete list was as follows: _ _ C E (RACE); L A _ _ (LAZY); _ _ A C K (BLACK); _ _ O R (POOR); C L _ S _ (CLASS); B R _ _ _ _ _ (BROTHER); _ _ _ T E (WHITE); M I _ _ _ _ _ _ (MINORITY); W E L _ _ _ _ (WELFARE); C O _ _ _ (COLOR); T O _ _ _ (TOKEN). We included a fairly high number (12) of target fragments so that ceiling or floor effects on some fragments would be less likely to damage the sensitivity of the overall measure. To reduce the chance that participants would become aware of the racial nature of the target fragments, the targets were spaced with at least three filler items between them. There were also only two target fragments per page in the task booklet. Participants were instructed to work quickly, spending no more than 15 s on each item.

Self-doubt activation. Seven word fragments reflecting self-doubts about competence and ability were included in the 80-item LAP task: L O _ _ _ (LOSER); D U _ _ (DUMB); S H A _ _ (SHAME); _ _ _ E R I O R (INFERIOR); F L _ _ _ (FLUNK); _ A R D (HARD); W _ _ K (WEAK). These were generated by the research team, and again included at least two blank letter spaces in each fragment. As with the racial fragments, these were separated from one another (and from the racial fragments) by at least three filler items.

Stereotype avoidance.
This measure asked participants to rate their preferences for a variety of activities and to rate the self-descriptiveness of various personality traits, some of which were associated with images of African Americans and African American life. Participants in the diagnostic and nondiagnostic conditions were told that these ratings were taken to give us a better understanding of the underpinnings of LAP and HVR processes. Control participants were told that these measures were being taken to assess the typical interests and personality traits of Stanford undergraduates. The measure contained 57 items. Participants rated the extent to which they enjoyed a number of activities (e.g., pleasure reading, socializing, shopping, traveling), types of music (e.g., jazz, rap music, classical music), and sports (e.g., baseball, basketball, boxing). Finally, they rated where they saw themselves on various personality dimensions (e.g., extroverted, organized, humorous). All ratings were made on 7-point Likert scales, with 1 indicating the lowest preference or degree of trait descriptiveness. Some of these activities and traits were stereotypic of African Americans. For an item to be selected as stereotypic, 65% of our pretest sample of 40 White participants had to have generated the item when asked to list activities and traits they believed to be stereotypic of African Americans. In the activities category, the stereotype-relevant items were “How much do you enjoy sports?” and “How much do you enjoy being a lazy ‘couch potato’?” The stereotype-relevant music preference item was rap music; the stereotype-relevant sports preference item was basketball; and the stereotype-relevant trait ratings were lazy and aggressive/belligerent.

Participants also completed a brief demographic questionnaire (asking their age, gender, major, etc.) just before they expected to begin the test.
As another measure of participants’ motivation to distance themselves from the stereotype, the second item of this questionnaire gave them the option of recording their race. We reasoned that participants who wanted to avoid having their performance viewed through the lens of a racial stereotype would be less willing to indicate their race.

Self-handicapping measure. This measure came just before the demographic questionnaire. The directions stated, “As you know, student life is sometimes stressful, and we may not always get enough sleep, etc. Such things can affect cognitive functioning, so it will be necessary to ask how prepared you feel.” Participants then indicated the number of hours they had slept the night before. They also responded to the following questions on 7-point scales (with 7 being the higher rating on these dimensions): “How able to focus do you feel?”; “How much stress have you been under lately?”; “How tricky/unfair do you typically find standardized tests?”

Results

Stereotype Activation

A 2 (race) × 3 (condition: diagnostic, nondiagnostic, or control) ANCOVA was performed on the number of target word fragments filled in with stereotypic completions, with verbal SAT as the covariate (Black mean = 581, White mean = 650). This analysis yielded significant main effects of both race, F(1, 61) = 13.77, p < .001, and experimental condition, F(2, 61) = 5.90, p < .005. These main effects, however, were qualified by a significant Race × Condition interaction, F(2, 61) = 3.30, p < .05. Figure 3 shows that, as expected, the diagnostic condition significantly increased the number of race-related completions of Black participants but not of White participants. Black participants in the diagnostic condition produced more race-related completions (M = 3.70) than Black participants in the nondiagnostic condition (M = 2.10), t(61) = 3.53, p < .001. They also produced more than participants in any of the other conditions, all ps < .05.
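The stereotype-activation score above is a count of target fragments completed with their stereotype-relevant word. The Python sketch below illustrates that scoring rule under several assumptions. Fragments are written without spaces, with an underscore for each missing letter. A completion counts only when it is the designated target word, so a legal but unrelated completion such as PACE for _ _ C E does not count. The `TARGETS` table holds only a subset of the target fragments listed in the Method. The names `TARGETS`, `fits`, and `count_target_completions`, and the sample responses, are hypothetical; they are not the authors' materials or scoring procedure.

```python
import re

# Subset of the race-related target fragments listed in the Method, written
# without spaces ("_" = one missing letter) and mapped to the target word.
# The table name is illustrative only.
TARGETS = {
    "__CE": "RACE",
    "LA__": "LAZY",
    "__ACK": "BLACK",
    "__OR": "POOR",
    "___TE": "WHITE",
}


def fits(fragment: str, word: str) -> bool:
    """True if `word` is a legal completion of `fragment` (same length, given letters in place)."""
    pattern = fragment.replace("_", "[A-Z]")  # each blank stands for exactly one letter
    return re.fullmatch(pattern, word.upper()) is not None


def count_target_completions(responses: dict[str, str]) -> int:
    """Count fragments that a participant completed with the stereotype-relevant target word."""
    return sum(
        1
        for fragment, word in responses.items()
        if fragment in TARGETS and word.upper() == TARGETS[fragment]
    )


# Hypothetical responses from one participant: three targets plus two other legal completions.
responses = {"__CE": "race", "LA__": "lamp", "__ACK": "black", "__OR": "door", "___TE": "white"}
assert all(fits(f, w) for f, w in responses.items())  # every response fits its fragment
print(count_target_completions(responses))  # prints 3
print(fits("__CE", "pace"), fits("_ACE", "rice"))  # prints True False
```

The last line also illustrates the Method's point about blanks: with two leading blanks, "RICE" would be a legal completion of _ _ C E, but it cannot complete a fragment that supplies the A.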
Self-Doubt Activation

The diagnostic condition had the same effect on Black participants’ self-doubts. The numbers of self-doubt-related completions of the self-doubt target fragments were submitted to an ANCOVA (as described above). This yielded a main effect of experimental condition, F(2, 61) = 4.33, p < .02, and a Race × Condition interaction, F(2, 61) = 3.34, p < .05. As Figure 3 shows, Black participants in the diagnostic condition, as predicted, generated the most self-doubt-related completions. They generated significantly more than Black participants in the nondiagnostic condition, t(61) = 3.52, p < .001, and more than participants in any of the other conditions as well, all ps < .05.

Stereotype Avoidance

The six preference and stereotype items described above were summed to form an index of stereotype avoidance (Cronbach’s alpha = .65). The index ranged from 6 to 42, with 6 indicating high avoidance and 42 indicating low avoidance. When these scores were submitted to the ANCOVA, they yielded a significant effect of condition, F(2, 61) = 4.73, p < .02, and a significant Race × Condition interaction, F(2, 61) = 4.14, p < .03. As can be seen in Figure 3, Black participants in the diagnostic condition were the most avoidant of conforming to stereotypic images of African Americans (M = 20.80). They were more avoidant than Black participants in the nondiagnostic condition (M = 29.80), t(61) = 3.61, p < .001, and than White participants in either condition, all ps < .05.

Indicating Race

Did the ability diagnosticity of the test affect participants’ tendency to indicate their race on the demographic questionnaire? Among Black participants in the diagnostic condition, only 25% would indicate their race on the questionnaire, whereas 100% of the participants in each of the other conditions would do so.
The response frequencies were converted to 0/1 scores (0 = refusal to indicate race, 1 = indication of race). The standard ANCOVA on this measure revealed a marginally significant effect of race, F(1, 61) = 3.86, p < .06, a significant effect of condition, F(2, 61) = 3.40, p < .04, and a significant Race × Condition interaction, F(2, 61) = 6.60, p < .01. All of these effects were due, of course, to the unique unwillingness of Black participants in the diagnostic condition to indicate their race.

[Figure 3. Indicators of stereotype threat, Study 3. Panels: Stereotype Activation Measure (racial word completions; self-doubt word completions) and Stereotype Avoidance Measure (stereotypic self-characterization), for Black and White participants in the diagnostic, nondiagnostic, and control conditions.]

Self-Handicapping

Four measures assessed participants’ desire to claim impediments to performance. Because participants in the control condition did not complete this measure, these responses were submitted to separate 2 (race) × 2 (diagnosticity) ANCOVAs. Cell means are presented in Table 1. Framing the verbal tasks as diagnostic of ability had significant effects on three of the four measures. For the number of hours of sleep, the ANCOVA yielded a significant effect of race, F(1, 39) = 8.22, p < .01, and a significant effect of condition, F(1, 39) = 6.53, p < .02. These effects were qualified by a significant Race × Condition interaction, F(1, 39) = 4.1, p < .01. For participants’ ratings of their ability to focus, a similar result emerged: main effects of race, F(1, 39) = 7.26, p < .02, and condition, F(1, 39) = 10.67, p < .01, and a significant qualifying interaction, F(1, 39) = 5.73, p < .03. Finally, the same pattern of effects emerged for participants’ ratings of how tricky or unfair they generally find standardized tests to be: a race main effect,
F(1, 39) = 13.24, p < .001, a condition main effect, F(1, 39) = 13.42, p < .001, and a marginally significant qualifying interaction, F(1, 39) = 3.58, p < .07. No significant effects emerged on participants’ ratings of their current stress.

Discussion

We had assumed that presenting an intellectual test as diagnostic of ability would arouse a sense of stereotype threat in Black participants. The present results dramatically support this assumption. Black participants expecting to take a difficult, ability-diagnostic test were compared with participants in the other conditions (that is, Blacks in the nondiagnostic condition and Whites in either condition). They showed significantly greater cognitive activation of stereotypes about Blacks and greater cognitive activation of concerns about their ability. They also showed a greater tendency to avoid racially stereotypic preferences and a greater tendency to make advance excuses for their performance. Finally, they showed a greater reluctance to have their racial identity linked to their performance, even in the pedestrian way of recording it on their questionnaires. Clearly the diagnostic instructions caused these participants to experience a strong apprehension, a distinct sense of stereotype threat.

Table 1
Self-Handicapping Responses in Study 3

                     Diagnostic             Nondiagnostic
                     Blacks     Whites      Blacks     Whites
Measure              (n = 12)   (n = 11)    (n = 11)   (n = 10)
Hours of sleep       5.10a      7.48b       7.05b      7.70b
Ability to focus     4.03a      5.88b       5.85b      6.16b
Current stress       5.51a      5.24a       5.00a      5.02a
Tests unfair         5.46a      2.78b       3.14b      2.04b

Note. Means not sharing a common subscript differ at the .01 level according to the Bonferroni procedure. Means sharing a common subscript do not differ.

So far, then, we have shown that representing a difficult test as diagnostic of ability can undermine the performance of Black participants. We have also shown that it can cause in them a distinct sense of being under threat of judgment by a racial stereotype.
This manipulation of stereotype threat, in terms of test diagnosticity, is important because it establishes the generality of the effect to a broad range of real-life situations. But two questions remain. The first is whether stereotype threat itself, in the absence of the test being explicitly diagnostic of ability, is sufficient to disrupt the performance of these participants on a difficult test. That is, we do not know whether mere activation of the stereotype in the test situation, without the test being explicitly diagnostic of ability, would be enough to cause such effects. The second question is whether the disruptive effect of the diagnosticity manipulation was in fact mediated by the stereotype threat it caused. We showed first that test diagnosticity disrupts Black participants’ performance and then, separately, that it causes these participants to feel threatened by the stereotype. This does not prove that the effect of test diagnosticity on performance was mediated by the stereotype threat it caused. The performance effect could have been mediated by some other effect of the diagnosticity manipulation. We conducted a fourth experiment to address these questions and, thereby, to test the replicability of the stereotype threat effect under different conditions.

Study 4

This experiment again crossed a manipulation of stereotype threat with the race of participants in a 2 × 2 design, with test performance as the chief dependent measure. We addressed the first question above by representing the test in this experiment as nondiagnostic of ability. If stereotype threat then depressed Black participants’ performance, we would know that stereotype threat is sufficient to cause this effect even when the test is not represented as diagnostic of ability. To address the second question, we took from Study 3 a dependent measure of stereotype threat that had been significantly affected by the diagnosticity manipulation. We then manipulated that variable as an independent variable in the present experiment. If this manipulation then affected Black participants’ performance, we would know that at least one aspect of the stereotype threat caused by the diagnosticity manipulation was able to impair performance. This would mean that the effect of that manipulation on performance was, or could have been, mediated by the stereotype threat it caused.

The variable that we manipulated in the present study was whether or not participants were required to list their race before taking the test. Recall that in Study 3, 75% of the Black participants in the diagnostic condition refused to record their race on the questionnaire when given the option, whereas all of the participants in the other conditions recorded it. We assumed this was a sign of their stereotype avoidance. On that basis, we reasoned that having participants record their race just before the test should prime the racial stereotype about ability for Black participants, and thus make them stereotype threatened. If this threat alone is sufficient to impair their performance, then, with SATs covaried, these participants should perform worse than White participants in this condition. In the non-stereotype-threat conditions, the demographic questionnaire simply omitted the item requesting participants’ race and otherwise followed the nondiagnostic procedures of Studies 1 and 2. Without raising the specter of ability-relevant or race-relevant evaluation, we expected Black participants in this condition to experience no stereotype threat and to perform (adjusted for SATs) on par with White participants.
Method

Design and Participants

This experiment took the form of a 2 × 2 design in which participants’ race was crossed with whether or not they recorded their ethnicity on a preliminary questionnaire. Twenty-four Black (6 male, 18 female) and 23 White (11 male, 12 female) Stanford undergraduates were randomly assigned to either the race-prime condition or the no-race-prime condition. Data from two Black participants were discarded because they arrived with suspicions about the racial nature of the study. One White student failed to provide her SAT score, and her data were discarded from the analyses. These participants were replaced to bring the number of participants in each of the four conditions to 11.

Procedure

The procedure closely paralleled that of the nondiagnostic conditions in Studies 1 and 2. After explaining the purpose and format of the test, the experimenter (a White man) drew a brief questionnaire (labeled “personal information”) from a shuffled stack. This randomly assigned the participant to the race-prime or the no-race-prime condition. The questionnaire constituted the experimental manipulation. It was identical for all participants except for its final item. It asked every participant for their age, year in school, major, number of siblings, and parents’ education, and in the race-prime condition the final item also asked participants to indicate their race. Because this questionnaire was given to the participant immediately before the test, the experimenter remained blind to the participant’s condition throughout the pretest interaction. After ensuring that the participant had completed the questionnaire, the experimenter started the test and left the room. Twenty-five minutes later he returned, collected the test, and gave the participant a dependent measure questionnaire.

Dependent Measures

This experiment used the same 25-min test as Study 2, but here it was administered on paper.
During the test, participants marked their guesses. After the test, they used 11-point scales (with endpoints not at all and extremely) to indicate how far they had guessed when having difficulty, expended effort on the test, persisted on problems, limited their time on problems, read problems more than once, became frustrated and gave up, and felt that the test was biased. Participants also completed a questionnaire aimed at measuring their stereotype threat. They expressed their agreement on 7-point scales (with endpoints strongly disagree and strongly agree) with each of eight statements. Examples include “Some people feel I have less verbal ability because of my race,” “The test may have been easier for people of my race,” “The experimenter expected me to do poorly because of my race,” “In English classes people of my race often face biased evaluations,” and “My race does not affect people’s perception of my verbal ability.” As a measure of academic identification, nine further items explored the effect of conditions on participants’ perceptions of the importance of verbal and math skills to their education and intended career (e.g., “Verbal skills will be important to my career,” “I am a verbally oriented person,” “I feel that math is important to me”). Participants responded to these items on 11-point scales with endpoints labeled not at all and extremely.

Results

Test Performance

A 2 (race) × 2 (race prime vs. no race prime) ANCOVA was performed on test performance, with self-reported SATs as a covariate (Black mean = 591, White mean = 643). It revealed a strong Race × Race Prime interaction in the predicted direction, F(1, 39) = 7.82, p < .01. As Figure 4 shows, Blacks in the race-prime condition performed worse than virtually all of the other groups, yet in the no-race-prime condition their performance equaled that of Whites. Planned contrasts on these adjusted scores revealed that, as predicted,
Blacks in the race-prime condition performed significantly worse than Blacks in the no-race-prime condition, t(39) = 2.43, p < .02, and significantly worse than Whites in the race-prime condition, t(39) = 2.87, p < .01. Black participants in the race-prime condition also performed worse than Whites in the no-race-prime condition, but not significantly so. Nonetheless, the comparison pitting the Black race-prime condition against the three remaining conditions was highly significant, F(1, 39) = 8.15, p < .01.

Accuracy

An ANCOVA was performed on this index (the percentage correct of the items attempted by each participant), with participants’ SATs as the covariate. It revealed a significant tendency for participants in the race-prime condition to have poorer accuracy, F(1, 39) = 4.07, p = .05. In the race-prime condition, the adjusted means for the Black and White participants were .402 and .438, respectively. In the no-race-prime condition, the corresponding adjusted means were .541 and .520. Condition contrasts did not reach significance, although the difference between the Black participants in the race-prime and no-race-prime conditions was marginally significant, p < .08. Again, these data suggest that lessened accuracy is part of the process through which stereotype threat impairs performance.

[Figure 4. Mean test performance, Study 4: mean items solved (adjusted by SAT) for Black and White participants in the no-race-prime and race-prime conditions.]

Number of Items Completed

An ANCOVA (again with SATs as the covariate) revealed only a significant Race × Race Prime interaction for the number of test items participants completed, F(1, 39) = 12.13, p < .01. In the race-prime condition, Blacks completed fewer items than Whites, t(39) = 3.83, p < .001; the adjusted means were 11.58 and 20.15, respectively. In the no-race-prime condition, however, Blacks and Whites answered roughly the same number of problems; the adjusted means were 15.32 and 13.03,
respectively. Performance-Relevant Measures Although participants’ postexam ratings revealed no differ- ences in the degree to which they thought they guessed on the test (F < l ), the ANCOVA performed on the actual number of guesses participants indicated on their test sheet revealed a Race X Race Prime interaction, F( 1. 39) = 5.56, p < .03. Black par- ticipants made fewer guesses when race was primed (M = 1.99) than when it was not (M = 2.74). whereas White participants tended to guess more when race was primed (M = 4.23) than when it was not (M = 1.58). No significant condition effects emerged for participants’ self—reported effort where. on an 1 1- point scale with l 1 indicating extremely hard work, the overall mean was 8.84. Participants’ estimates of how well they had performed, taken after the test, showed no condition effects (the overall mean was 7.4 items). Neither were there condition effects on participants’ ratings (made during the postexperimental debriefing) of how much having to indicate their ethnicity bothered them during the test (or would have bothered them in the case of participants in the no-race-prime condition). The overall mean was 3.31 on an 1 l-point scale for which 1 1 indicated the most distraction. Participants often stated in postexperimental interviews that they found recording their race unnoteworthy because they had to do it so often in everyday life. Of the items bearing on partic- ipants’ experience taking the test. only one effect emerged: Black participants reported reading test items more than once to a greater degree than did White participants, F ( l, 39) = 8.62, p < .01. Stereotype Threat and Academic Identification Measures A MANOVA of the stereotype threat scale revealed that Black participants felt more stereotype threat than White participants, F (9, 31) = 8.80, p < .01. No other effects reached significance. Analyses of participants’ responses to questions regarding the per- sonal importance of math. 
verbal skills, and athletics revealed that Black participants reported valuing sports less than Whites, F ( 1, 39) = 4.1 l. p < .05. As in Study 3. this result may reflect Black participants distancing themselves from the stereotype of the aca- demically untalented Black athlete. Correlations between partici- pants’ numerical performance estimates and their ratings of the importance of sports, showed that for Blacks. the worse they be- lieved they performed, the more they devalued sports—in the no— race-prime condition (r = .56). and particularly in the race-prime condition (r = .70). 808 CLAUDE M. STEELE AND JOSHUA ARONSON Discussion Priming racial identity depressed Black participants’ perfor- mance on a difficult verbal test even when the test was not pre- sented as diagnostic of intellectual ability. It did this, we as- sume, by directly making the stereotype mentally available and thus creating the self-threatening predicament that their perfor- mance could prove the stereotype self-characteristic. In Studies 1, 2 and 3, the stereotype was evoked indirectly by describing the test as diagnostic of an ability to which it was relevant. What this experiment shows is that mere cognitive availability of the racial stereotype is enough to depress Black participants’ intel- lectual performance, and that this is so even when the test is presented as not diagnostic of intelligence. Also—because we know from Study 3 that the diagnosticity manipulation strongly affects participants’ willingness to record their race—this find- ing shows that the performance-depressing effect of the diagnos- ticity manipulation in the earlier experiments was, or could have been, mediated by the effect of that manipulation on ste- reotype threat—as opposed to some other aspect of the manipulation. 
Still, we had expected Black participants in the race-prime condition to show more stereotype threat (as measured by the stereotype threat and stereotype avoidance measures) than Black participants in the no-race-prime condition, reflecting the effect of the manipulation. Instead, although Blacks showed more stereotype threat than Whites, Blacks in the race-prime condition showed no more stereotype threat than Blacks in the no-race-prime condition. Nor did these groups differ on the identification measures. This may have happened for several reasons. These measures came after the test in this experiment, not before it as in Study 3. Thus, after experiencing the difficult, frustrating exam, all Black participants may have been somewhat stereotype threatened and stereotype avoidant (more so than the White participants) regardless of their condition. Also, the lack of a condition difference between Black participants on the stereotype threat and identification items may have occurred because these items asked participants to respond in reference to settings (e.g., English classes) and attitudes (e.g., about how one’s race is generally regarded) that are beyond their immediate experience in the experiment.

Compared to participants in the other conditions, Black participants in the race-prime condition did not report expending less effort on the test; they were not more disturbed at having to list their race; and they did not guess more than other participants. Also, Black participants in both conditions reread the test items more than White participants. Such findings do not fit the idea that these participants underperformed because they withdrew effort from the experiment.

To establish the replicability of the race-prime effect and to explore the possible mediational role of anxiety, we conducted a two-condition experiment that randomly assigned only Black participants to either the race-prime or the no-race-prime condition described in Study 4. We also administered the test on computer, to enable a measure of the time participants spent on the items, and gave participants an anxiety measure at the end of the experiment. Replicating Study 4, race-prime participants got significantly fewer items correct (M = 4.4) than no-race-prime participants (M = 7.7), t(18) = 2.34, p < .04; they were marginally less accurate (M = .334) than no-race-prime participants (M = .395), p = .10; and they answered fewer items (M = 13.2) than no-race-prime participants (M = 20.1), t(18) = 2.89, p < .01. Race-prime participants spent more time on the first five test items (the number that all participants completed; M = 79 s) than no-race-prime participants (M = 61 s), t(18) = 2.27, p < .04, and they were significantly more anxious than no-race-prime participants, t(18) = 2.34, p < .04. The means on the STAI were 48.5 and 40.5, respectively, on a scale that ranged from 20 (indicating low anxiety) to 80 (extreme anxiety). These results show that a race prime reliably depresses Black participants’ performance on this difficult exam, and that it causes reactions that could be a response to stereotype threat, namely, an anxiety-based perseveration on the early test items in particular, items that, as reading comprehension items, required multiple steps.

General Discussion

The existence of a negative stereotype about a group to which one belongs, we have argued, means that in situations where the stereotype is applicable, one is at risk of confirming it as a self-characterization, both to one’s self and to others who know the stereotype. This is what is meant by stereotype threat. And when the stereotype involved demeans something as important as intellectual ability, this threat can be disruptive enough, we hypothesize, to impair intellectual performance.
In support of this reasoning, the present experiments show that making African American participants vulnerable to judgment by negative stereotypes about their group’s intellectual ability depressed their standardized test performance relative to White participants, whereas conditions designed to alleviate this threat improved their performance, equating the two groups once their differences in SATs were controlled. Studies 1 and 2 produced this pattern by varying whether or not the test was represented as diagnostic of intellectual ability, a procedure that varied stereotype threat by varying the relevance of the stereotype about Blacks’ ability to their performance. Study 3 provided direct evidence that this manipulation aroused stereotype threat in Black participants by showing that it activated the racial stereotype and stereotype-related self-doubts in their thinking, and that it led them to distance themselves from African American stereotypes. Study 4 showed that merely recording their race (presumably by making the stereotype salient) was enough to impair Black participants’ performance even when the test was not diagnostic of ability. Taken together, these experiments show that stereotype threat, established by quite subtle instructional differences, can impair the intellectual test performance of Black students, and that lifting it can dramatically improve that performance.

Mediation: How Stereotype Threat Impairs Performance

Study 3 offers clear evidence of what being stereotype threatened is like, as well as demonstrating that the mere prospect of a difficult, ability-diagnostic test was enough to produce this state in our sample of African American participants. But how precisely did this state of self-threat impair performance? Through what mechanism or set of mechanisms did the impairment occur?
There are a number of possibilities: distraction, narrowed attention, anxiety, self-consciousness, withdrawal of effort, overeffort, and so on (e.g., Baumeister, 1984). In fact, several such mechanisms may be involved simultaneously, or different mechanisms may be involved under different conditions. For example, if the test were long enough to solidly engender low performance expectations, then withdrawal of effort might play a bigger mediational role than, say, anxiety, which might be more important with a shorter test. Such complexities notwithstanding, our findings offer some insight into how the present effects were mediated.

Our best assessment is that stereotype threat caused an inefficiency of processing much like that caused by other evaluative pressures. Stereotype-threatened participants spent more time doing fewer items more inaccurately, probably as a result of alternating their attention between trying to answer the items and trying to assess the self-significance of their frustration. This form of debilitation (reduced speed and accuracy) has been shown as a reaction to evaluation apprehension (e.g., Geen, 1985), test anxiety (e.g., Wine, 1971; Sarason, 1972), the presence of an audience (e.g., Bond, 1982), and competition (Baumeister, 1984). Several findings point in this direction by suggesting that stereotype-threatened participants were both motivated and inefficient. They reported expending as much effort as other participants. In those studies that included the requisite measures (Study 2 and the replication study reported with Study 4), they actually spent more time per item. They did not guess more than non-stereotype-threatened participants, and, as Black participants did generally, they reported rereading the items more. Also, as noted, these participants were strong students and almost certainly identified with the material on the test. They may even have been more anxious.
Stereotype threat increased Black participants’ anxiety in the replication study, although not significantly in Study 2. Together, then, these findings suggest that stereotype threat led participants to try hard but with impaired efficiency.

Still, we note that lower expectations may also have been involved, especially in real-life occurrences of stereotype threat. As performance falters under stereotype threat, and as the stereotype frames that faltering as a sign of a group-based inferiority, the individual’s expectations about his or her ability and performance may drop, presumably faster than they would if the stereotype were not there to credit the inability interpretation. And lower expectations, as the literature has long emphasized (e.g., Bandura, 1977, 1986; Carver, Blaney, & Scheier, 1979; Pyszczynski & Greenberg, 1983), can further undermine performance by undermining motivation and effort. It is precisely such a process, in which stereotype threat fosters low expectations in a domain, that we suggest leads eventually to disidentification with the domain. We assume that this process did not get very far in the present research because the tests were short, and because our participants, as highly identified students, were unlikely to give up on these tests, as their self-reports tell us. But we do assume that lower expectations can play a role in mediating stereotype threat effects.

There is, however, strong evidence against one kind of expectancy mediation: the idea that lowered performance or self-efficacy expectations alone mediated the effects of stereotype threat. Conceivably, the stereotype threat treatments got Black participants to expect that they would perform poorly on the test, presumably by getting them to accept the image of themselves inherent in the racial stereotype. The stereotype threat condition did activate participants’ self-doubts.
This lower expectation, then, independent of any experience these participants may have had with the test itself, and independent of any apprehension they may have had about self-confirming the stereotype, may have directly weakened their motivation and performance. Of course, it would be important to show that stereotype threat effects are mediated in African American students by expectations implicit in the stereotype, expectations powerful enough to more or less automatically cause their underperformance.

But there are several reasons to doubt this view. For one thing, it isn’t clear that our stereotype threat manipulations led Black participants to accept lower expectations and then to follow them unrevisedly to lower performance. For example, they resisted the self-applicability of the stereotype. But most important, as noted, it is almost certain that any expectation formed prior to the test would be superseded by the participants’ actual experience with the test items, rising with success and falling with frustration. In fact, another experiment in our lab offered direct evidence of this by showing that expectations manipulated before the test had no effect on performance. Its procedure followed, in all conditions, that of the standard diagnostic condition used in Studies 1 and 2, with the exception that it directly manipulated efficacy and performance expectations before participants took the test. After participants were told that the test was ability diagnostic, and just before they took it, the experimenter (an Asian woman) asked them what their SAT scores were. After hearing the score, in the positive expectation condition, she commented that the participant should have little trouble with the test. In the negative expectation condition, her comment indicated that the participant would have trouble with the test, and nothing was said in a no-expectation condition.
Both White and Black participants were run in all three expectation conditions. Although the experiment replicated the standard effect of Whites outperforming Blacks under these stereotype threat conditions (participants’ SATs were again used as a covariate), F(1, 32) = 5.12, p < .03, this personalized expectation manipulation had no effect on the performance of either group. For Blacks, the means were 4.32, 6.38, and 6.55 for the positive, negative, and no-expectation conditions, respectively; for Whites, in the same conditions, they were 8.24, 9.25, and 11.23, respectively. Thus, in an experiment that was sensitive enough to replicate the standard stereotype threat effect, expectations explicitly manipulated before the test had no effect on performance. They are unlikely, then, to have been the medium through which stereotype threat affected performance in this research.

Finally, participants in all conditions of these experiments were given low performance expectations by being told that they should expect to get few items correct due to the difficulty of the test. Importantly, this instruction did not depress the performance of participants in the non-stereotype-threat conditions. Thus it is not likely that a low performance expectation implied by the stereotype would have been powerful enough, by itself, to lower performance among these participants when a direct manipulation of the expectation could not.

The Emerging Picture of Stereotype Threat

In the social psychological literature there are other constructs that address the experience of potential victims of stereotypes. For clarity’s sake, we briefly compare the construct of stereotype threat to these.
“Token” Status and Cognitive Functioning

Lord and Saenz (1985) have shown that token status in a group (that is, being the token minority in a group that is otherwise homogeneous) can cause deficits in cognitive functioning and memory, presumably as an outgrowth of the self-consciousness it causes. Although probably in the same family of effects as stereotype threat, token status would be expected to disrupt cognitive functioning even when the token individual is not targeted by a performance-relevant stereotype, as with, for example, a White man in a group of women solving math problems. Nor do stereotype threat effects require token status, as was shown in the present experiments. In real life, of course, these two processes may often co-occur, as for the Black student in an otherwise non-Black classroom. They are, nonetheless, distinct processes.

Attributional Ambiguity

Another important theory, and now an extensive program of research, by Crocker and Major (e.g., Crocker & Major, 1989; Crocker, Voelkl, Testa, & Major, 1991) examines how people contend with the self-evaluative implications of having a stigmatized identity. Both their theory and ours focus on the psychology of contending with social devaluation, and they differ most clearly in which aspect of this psychology they attend to. The work of Crocker and Major has focused on the implications of this psychology for self-esteem maintenance (for example, the strategies available for protecting self-esteem against stigmatized status), whereas we have focused on its implications for intellectual performance. There is also a conceptual difference. Attributional ambiguity refers to the confusion a potential target of prejudice might have over whether or not he is being treated prejudicially. Stereotype threat, of course, refers to his apprehension over confirming the stereotype, or eliciting the judgment that it is self-characteristic.
Again, the two processes can co-occur (as for the woman who gets cut from the math team, for example) but are distinct.

The Earlier Research of the Katz Group

We also note that stereotype threat may explain the earlier findings of Katz and his colleagues. They found in the 1960s that the intellectual performance of Black participants rose and fell with conditions that seemed to vary in stereotype threat, for example, whether the test was represented as a test of intelligence or as one of psychomotor skill. A stereotype threat interpretation of these findings was foiled, however, by the lack of White participant control groups. Thus, the present finding that manipulations very similar to Katz’s depressed Black participants’ performance while not depressing White participants’ performance makes stereotype threat a parsimonious account of all these findings.

Test Difficulty and Racial Differences in Standardized Test Performance

The test used in these experiments is quite difficult, as the low performance scores indicate. As we argued, it may have to be at least somewhat demanding for stereotype threat to be occasioned. But acknowledging this parameter raises a question: Does stereotype threat significantly undermine the performance of Black students on the SAT? And if it does, is it appropriate to use the SAT as the standard for equating Black and White participants on skill level within our experiments?

The answer to the first question has to be that it depends on how much frustration is experienced on the SAT. If the student perceives that a significant portion of the test is within his or her competence, this perception may preempt or override stereotype threat by proving the stereotype inapplicable. When the student cannot gain this perception, however, the group stereotype becomes relevant as an explanation and may undermine performance.
Thus we surmise that over the entire range of Black student test takers, stereotype threat causes a significant depression of scores. And, of course, this point holds more generally. An important implication of this research is that stereotype threat is an underappreciated source of classic deficits in standardized test performance (e.g., IQ) suffered by Blacks and other stereotype-threatened groups, such as those of lower socioeconomic status and women in mathematics (Herrnstein, 1973; Jensen, 1969, 1980; Spencer & Steele, 1994). Whatever environmental or genetic endowments a person brings to the testing situation, this research shows that the situation itself is not group-neutral, quite possibly not even when the tester and test content have been accommodated to the test-taker’s background. The problem is that stereotypes afoot in the larger society establish a predicament in the testing situation, aside from test content, that still has the power to undermine standardized test performance and, we suspect, to contribute powerfully to the pattern of group differences that have characterized these tests since their inception.

But, for several reasons, we doubt that this possibility compromises the interpretation of the present findings. First, it is unlikely that stereotype threat had much differential effect on the SATs of our Black and White participants, since both groups, as highly selected students, are not likely to have experienced very great frustration on these tests. Second, even if our Black participants’ SATs were more depressed in this way, using such depressed scores as a covariate in the present analyses would only adjust Black performance further in the direction of reducing the Black-White difference in the stereotype threat conditions.
Thus, while a self-threateningly difficult test is probably a necessary condition for stereotype threat, and while stereotype threat may commonly depress the standardized test performance of Black test takers, these facts are not likely to have compromised the present results.

In conclusion, our focus in this research has been on how social context and group identity come together to mediate an important behavior. This approach is Lewinian; it is also hopeful. Compared with viewing the problem of Black underachievement as rooted in something about the group or its societal conditions, this analysis uncovers a social psychological predicament of race, rife in the standardized testing situation, that is amenable to change, as we hope our manipulations have illustrated.

References

Allport, G. (1954). The nature of prejudice. New York: Addison-Wesley.
American Council on Education. (1990). Minorities in higher education. Washington, DC: Office of Minority Concerns.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191–215.
Bandura, A. (1986). Fearful expectations and avoidant actions as coeffects of perceived self-inefficacy. American Psychologist, 41, 1389–1391.
Baumeister, R. F. (1984). Choking under pressure: Self-consciousness and paradoxical effects of incentives on skillful performance. Journal of Personality and Social Psychology, 46, 610–620.
Bond, C. F. (1982). Social facilitation: A self-presentational view. Journal of Personality and Social Psychology, 42, 1042–1050.
Carter, S. L. (1991). Reflections of an affirmative action baby. New York: Basic Books.
Carver, C. S., Blaney, P. H., & Scheier, M. F. (1979). Reassertion and giving up: The interactive role of self-directed attention and outcome expectancy. Journal of Personality and Social Psychology, 37, 1859–1870.
Cleary, T. A., Humphreys, L. G., Kendrick, S. A., & Wesman, A. (1975). Educational uses of tests with disadvantaged students. American Psychologist, 30, 15–41.
Crocker, J., & Major, B. (1989). Social stigma and self-esteem: The self-protective properties of stigma. Psychological Review, 96, 608–630.
Crocker, J., Voelkl, K., Testa, M., & Major, B. (1991). Social stigma: The affective consequences of attributional ambiguity. Journal of Personality and Social Psychology, 60, 218–228.
Devine, P. G. (1989). Stereotypes and prejudice: Their automatic and controlled components. Journal of Personality and Social Psychology, 56, 5–18.
Dovidio, J. F., Evans, N., & Tyler, R. B. (1986). Racial stereotypes: The contents of their cognitive representations. Journal of Experimental Social Psychology, 22, 22–37.
Easterbrook, J. A. (1959). The effect of emotion on cue utilization and the organization of behavior. Psychological Review, 66, 183–201.
Geen, R. G. (1985). Evaluation apprehension and response withholding in solution of anagrams. Personality and Individual Differences, 6, 293–298.
Geen, R. G. (1991). Social motivation. Annual Review of Psychology, 42, 377–399.
Gilbert, D. T., & Hixon, J. G. (1991). The trouble of thinking: Activation and application of stereotypic beliefs. Journal of Personality and Social Psychology, 60, 509–517.
Goffman, E. (1963). Stigma. New York: Simon & Schuster.
Herrnstein, R. (1973). IQ in the meritocracy. Boston: Little, Brown.
Higgins, E. T. (1989). Knowledge accessibility and activation: Subjectivity and suffering from unconscious sources. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 75–123). New York: Guilford.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Katz, I. (1964). Review of evidence relating to effects of desegregation on the intellectual performance of Negroes. American Psychologist, 19, 381–399.
Katz, I., Epps, E. G., & Axelson, L. J. (1964). Effect upon Negro digit symbol performance of comparison with Whites and with other Negroes. Journal of Abnormal and Social Psychology, 69, 963–970.
Katz, I., Roberts, S. O., & Robinson, J. M. (1965). Effects of task difficulty, race of administrator, and instructions on digit-symbol performance of Negroes. Journal of Personality and Social Psychology, 2, 53–59.
Linn, R. L. (1973). Fair test use in selection. Review of Educational Research, 43, 139–161.
Lord, C. G., & Saenz, D. S. (1985). Memory deficits and memory surfeits: Differential cognitive consequences of tokenism for tokens and observers. Journal of Personality and Social Psychology, 49, 918–926.
Lord, C. G., Saenz, D. S., & Godfrey, D. K. (1987). Effects of perceived scrutiny on participant memory for social interactions. Journal of Experimental Social Psychology, 23, 498–517.
Nettles, M. T. (1988). Toward undergraduate student equality in American higher education. New York: Greenwood.
Pyszczynski, T., & Greenberg, J. (1983). Determinants of reduction in effort as a strategy for coping with anticipated failure. Journal of Research in Personality, 17, 412–422.
Sarason, I. G. (1972). Experimental approaches to test anxiety: Attention and the uses of information. In C. D. Spielberger (Ed.), Anxiety: Current trends in theory and research (Vol. 2). New York: Academic Press.
Seta, J. J. (1982). The impact of coactors’ comparison processes on task performance. Journal of Personality and Social Psychology, 42, 281–291.
Spencer, S. J., & Steele, C. M. (1994). Under suspicion of inability: Stereotype vulnerability and women’s math performance. Unpublished manuscript, State University of New York at Buffalo and Stanford University.
Stanley, J. C. (1971). Predicting college success of the educationally disadvantaged. Science, 171, 640–647.
Steele, C. M. (1992, April). Race and the schooling of black Americans. The Atlantic Monthly.
Steele, S. (1990). The content of our character. New York: St. Martin’s Press.
Tulving, E., Schacter, D. L., & Stark, H. A. (1982). Priming effects in word-fragment completion are independent of recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 336–342.
Wine, J. (1971). Test anxiety and direction of attention. Psychological Bulletin, 76, 92–104.

Received August 9, 1994
Revision received May 9, 1995
Accepted May 18, 1995