Among social psychologists, stereotype threat is a popular explanation for performance differences between groups. The most widely studied phenomenon is the gender difference in performance on challenging math tests.
So far, there have been few registered replication studies of classic stereotype threat studies. Nevertheless, doubts have emerged about the robustness of stereotype threat effects. For example, Michael Inzlicht, who made some influential contributions to the literature (and is cited in Gilovich, Keltner, Chen, & Nisbett’s textbook), is no longer sure that stereotype threat effects can be replicated.

In addition, statistical analyses show that published stereotype threat effects were obtained with questionable research practices that inflate type-I error rates (Schimmack, 2015), and some large-scale studies that failed to produce the effect have been neglected (Schimmack, 2017). These studies are also not mentioned in the textbook, despite the authors’ promise to be “scrupulous about noting when the evidence about a given point is mixed” (p. 57).
Presentation of Stereotype Threat Research In Gilovich et al. (2019)
Students are told in detail about Spencer et al.’s (1999) seminal article about stereotype threat effects on women’s math performance, but they are not told that the study used only women who were highly identified with the field and that Steve Spencer himself argues that the effect would be weaker or absent if math is not important to participants. Thus, the textbook authors overgeneralize the results from a specific subgroup of women to all women.
They then proclaim that “it seems that nobody is safe from stereotype threat” (p. 398), and that group differences between White and Asian students can be attributed to stereotype threat experienced by White students.
They then point out that stereotype threat effects depend on which identity is activated. This claim is based on Shih, Pittinsky, & Ambady’s (1999) study, in which Asian female students’ performance seemed to change depending on whether they thought about themselves as women (poorer performance) or as Asians (superior performance).
To explain stereotype threat effects, they mention (a) arousal as a mediator (Ben-Zeev, Fein, & Inzlicht, 2005), (b) distraction as a mediator (Cheryan & Bodenhausen, 2000), (c) negative thought as a mediator (Cadinu, Maass, Rosabianca, & Kiesner, 2005), and (d) focusing on avoiding failure as a mediator (Seibt & Forster, 2004).
Finally, they suggest that stereotype threat may lead to gender differences in the participation in STEM (science, technology, engineering, and mathematics) fields and that social psychologists have developed “highly effective” interventions that reduce the effect of stereotype threat.
Replication Studies
The Curate Science database contains two replication studies of Shih et al.’s (1999) study. The article by Gibson, Losee, and Vitiello (2014) contains all of the relevant information in Table 1.

Shih et al. (1999) had a very small sample with n < 20 in each cell. The results show the expected pattern for accuracy, but not for absolute number of correct responses. Gibson et al. have a much larger sample and the results show the same pattern for accuracy, but not for correct responses. Moon and Roeder failed to find the expected pattern for accuracy or correct responses.
These results are not a total replication failure, but they also do not provide empirical support for the strong claims that are made in the textbook.
Aside from actual replication studies, it is also possible to examine the replicability of published results using a statistical approach. I have used one such approach, z-curve (Brunner & Schimmack, 2018), to conduct a replicability audit of stereotype threat studies (Schimmack, 2018).

The graph shows that the published literature consists of studies that were selected to show evidence for stereotype threat. A much larger number of studies with non-significant results remained unpublished, although it is also possible that stereotype threat researchers used other questionable research practices to publish mostly supportive evidence even though they had only a 23% probability of finding an effect.
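To make the arithmetic behind "a much larger number" concrete, here is a minimal sketch in Python. It is not the z-curve method itself. It assumes, purely for illustration, that the 23% figure can be treated as a uniform per-study probability of obtaining a significant result and that only significant results are published. The function name and the count of 100 published studies are hypothetical and are not taken from the audit.

```python
def expected_unpublished(n_published: int, discovery_rate: float) -> float:
    """Expected number of non-significant (unpublished) studies.

    Illustrative assumption: every study has the same probability
    `discovery_rate` of yielding a significant result, and only
    significant results are published.
    """
    if not 0 < discovery_rate <= 1:
        raise ValueError("discovery_rate must be in (0, 1]")
    # Total studies that must have been run to yield n_published hits.
    total_run = n_published / discovery_rate
    # Everything beyond the published hits is a non-significant result.
    return total_run - n_published


# Hypothetical example: 100 published significant results at a 23% rate.
missing = expected_unpublished(100, 0.23)
print(round(missing))  # → 335
```

Under these simplifying assumptions, each published significant result would be accompanied by roughly three non-significant ones in the file drawer (0.77 / 0.23 ≈ 3.3).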
The z-curve method also estimates how many of the published studies could be false positives. The estimate is 70%. Thus, it is unclear which of the published studies would show evidence for stereotype threat without the help of statistical tricks.
Conclusion
Social psychologists prefer situational explanations for group differences in performance. However, the evidence for these situational effects is thin. This does not mean that group differences are due to essential, genetic differences between groups. Environmental factors can also produce group differences. A whole literature, called behavioral genetics, aims to separate genetic and environmental influences. This literature is ignored in social psychology textbooks, which focus on experimental manipulations of situations. As a result, social psychology textbooks fail to inform students about the actual causes of group differences in stable dispositions.