An outdated, idealistic conception of science holds that scientists try hard to test their theories in empirical studies and revise false theories when studies do not confirm their predictions. In reality, scientists are human and act in accordance with social psychologists’ description of human information processing.
“Instead of a naïve scientist entering the environment in search of the truth, we find the rather unflattering picture of a charlatan trying to make the data come out in a manner most advantageous to his or her already-held theories” (Fiske & Taylor, 1984).
So, a logical conclusion is that IAT researchers are charlatans because they are humans and humans are charlatans. More direct evidence for their untrustworthiness can be found in their publications (Schimmack, 2021). IAT researchers continue to conflate performance on an Implicit Association Test with measurement of implicit biases, although the wider community has rejected this view (Sherman & Klein, 2020). Even Greenwald and Banaji (2017) have walked back the original claim that the IAT probes implicit attitudes.
IAT researchers also continue to ignore valid criticism of their work. I feel compelled to write this blog post to highlight how blatantly scientific criticism is disregarded in order to promote a questionable computer task as an important tool to fight racism. A key claim is that changing scores on the race IAT is an important goal because these changes reflect changes in people’s attitudes that influence their behavior. This claim legitimizes simple and quick online studies that can be run with large samples but may have few practical consequences for the understanding of race relations and intergroup behavior.
The latest propaganda piece by IAT researchers is Kurdi, Sanchez, Dasgupta, and Banaji’s (2023) article “(When) Do Counterattitudinal Exemplars Shift Implicit Racial Evaluations? Replications and Extensions of Dasgupta and Greenwald (2001).”
I was actually a reviewer of a manuscript of this paper and made several critical comments that the authors blissfully ignored. Instead, they provide a misleading description of the history of studies that aim to change scores on the race IAT and omit relevant articles that do not fit their uplifting message that IAT research is thriving and making theoretical progress in the understanding of racism.
The Bullshit Story
The authors start with a highly cited article by Dasgupta and Greenwald (2001) that suggested that showing some counter-attitudinal exemplars can change IAT scores and that these changes even last for a full day.
They claim that “the article by Dasgupta and Greenwald (2001) set into motion what was soon to become a fundamental shift in our understanding of the nature of racial attitudes and, specifically, implicit racial evaluations.”
They claim that over the past 20 years, “a firm theoretical understanding has emerged that implicit evaluations, including implicit racial evaluations, can exhibit sizable temporary shifts toward neutrality in response to a wide range of interventions”
They cite Lai et al. (2014), who demonstrated that “implicit racial evaluations [a.k.a., IAT scores] were found to shift in response to a broad range of experimental manipulations.”
The authors note that it hardly seems necessary to conduct a replication study of Dasgupta and Greenwald’s study if there is robust evidence that IAT scores can be moved around. Their argument for doing so is that the original article has been cited over 1,800 times in Google Scholar. However, a better reason is that Dasgupta and Greenwald’s study was the target of a large replication attempt well before large replication studies became fashionable in social psychology.
“Second, an independent replication today is timely given that a previous replication attempt published by Joy-Gaba and Nosek (2010) over a decade ago replicated the Dasgupta and Greenwald (2001) result with large samples but with considerably smaller effect sizes (Cohen’s ds = 0.17 and 0.14).” [The authors do not mention that this set of replication studies had over N = 3,000 participants, compared to the N < 50 in the original study.]
The authors then spend a lot of time suggesting that population effect sizes may have decreased over time due to societal changes. They leave out the well-known fact that large effect sizes in small samples are often followed by small effect sizes in large samples, because researchers with small samples need luck (or flair) to obtain significance with low statistical power, and the lucky significant results necessarily overestimate the true effect.
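To make this selection effect concrete, here is a minimal simulation sketch in Python. It assumes a modest true effect (d = 0.15) and compares a small-sample design (about 48 participants) with a large-sample design (about 3,000); the assumed effect size, group sizes, and number of simulations are illustrative choices of mine, not estimates taken from any of the studies discussed.

```python
import numpy as np

rng = np.random.default_rng(2001)

def simulate(true_d, n_per_group, n_sims=5000):
    """Simulate two-group experiments with a true standardized effect of `true_d`.

    Returns the mean observed Cohen's d across all simulated studies, the mean
    observed d among studies reaching p < .05 (two-sided), and statistical power.
    """
    all_d, sig_d = [], []
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(true_d, 1.0, n_per_group)
        pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
        d = (treated.mean() - control.mean()) / pooled_sd
        t = d * np.sqrt(n_per_group / 2)   # equal-n two-sample t statistic
        all_d.append(d)
        if abs(t) > 1.96:                  # roughly the .05 two-sided criterion
            sig_d.append(d)
    return np.mean(all_d), np.mean(sig_d), len(sig_d) / n_sims

# Small original-style study (N < 50) versus a large online replication (N = 3,000).
for n in (24, 1500):
    mean_d, mean_sig_d, power = simulate(true_d=0.15, n_per_group=n)
    print(f"n per group = {n:4d} | mean d = {mean_d:.2f} | "
          f"mean significant d = {mean_sig_d:.2f} | power = {power:.2f}")
```

With these made-up numbers, only a small fraction of the small-sample studies reach significance, and the ones that do report a d several times larger than the true 0.15, whereas the large-sample studies recover the true effect almost exactly. Nothing about such a pattern requires the population effect to have changed over time.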
Study 1 is a simple online study with N = 1,533 participants. Despite the large sample size, the study failed to replicate Dasgupta and Greenwald’s results and produced an effect size of d = 0.02. In other words, nada. Nothing to see here, a result very much consistent with Joy-Gaba and Nosek’s (2010) results in large online studies a decade earlier.
Study 2 was also a bust (d = -0.04) and provided further evidence that the effect sizes in Dasgupta and Greenwald’s underpowered study were inflated and that the true effect size is much smaller or zero (Joy-Gaba & Nosek, 2010).
Studies 3 and 4 once more confirmed that Dasgupta and Greenwald’s results could not be replicated, d = 0.15.
Although these replication failures are largely consistent with Joy-Gaba and Nosek’s results, the authors ignore this consistency.
“The failure to replicate the shifts in implicit racial evaluations observed by Dasgupta and Greenwald (2001) is puzzling for several reasons, the primary one being that other procedures, far less potent, have been shown to create malleability in implicit evaluations.”
Studies 5-7 use different procedures to shift IAT scores. Although these results are interesting in their own right, they do not explain why Dasgupta and Greenwald’s (2001) results could not be replicated.
In the General Discussion, the authors discuss the replication failures.
“In three high-powered (total N > 1,800) and close-to-exact replications, we failed to obtain the effect originally reported by Dasgupta and Greenwald (2001). That is, we found no reduction in pro-White/anti-Black implicit evaluations after exposure to positive Black and negative White exemplars (Experiments 1–3). Given the substantial amount of time that has elapsed since the original results were published, we can only make informed guesses about the reasons for the lack of replication.”
“At a first approximation, it is conceivable that the original result was a false positive, in which case one should expect replication attempts to yield null results. Contrary to this possibility, some of the experiments conducted as part of the only known previous independent replication attempt by Joy-Gaba and Nosek (2010) produced statistically significant results.”
This shift in focus to statistical significance is problematic because the effect sizes in Joy-Gaba and Nosek’s studies were small and much closer to the zero effect sizes in this study than to the large effect sizes in the original study. Consistent with this finding, Joy-Gaba and Nosek’s article was titled “The surprisingly limited malleability of implicit racial evaluations.” Kurdi et al.’s results only show that this malleability is even more limited than in Joy-Gaba and Nosek’s studies, but the fact remains that Dasgupta and Greenwald’s results do not provide a solid empirical foundation for interventions that can reduce racial biases.
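A back-of-the-envelope look at sampling uncertainty makes the same point. The sketch below uses the standard large-sample approximation for the variance of Cohen's d; the inputs are illustrative assumptions of mine (a hypothetical original estimate of d = 0.70 from 24 participants per group, versus d = 0.17 from 1,500 per group, standing in for “N < 50” and “N > 3,000”), not values quoted from the articles.

```python
import math

def d_ci(d, n1, n2, z=1.96):
    """Approximate 95% confidence interval for Cohen's d
    using the common large-sample variance formula."""
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d - z * math.sqrt(var), d + z * math.sqrt(var)

# Hypothetical small original-style study: d = 0.70 with 24 participants per group.
print(d_ci(0.70, 24, 24))      # roughly (0.12, 1.28)

# A Joy-Gaba & Nosek-sized replication: d = 0.17 with 1,500 participants per group.
print(d_ci(0.17, 1500, 1500))  # roughly (0.10, 0.24)
```

On these assumptions, the small study's interval is compatible with anything from a trivial effect to a very large one, whereas the large replication pins the effect down near 0.17, much closer to zero than to any large original estimate. The precise number in this story is the small one, not the large one.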
The authors then try to sell the hypothesis that Dasgupta and Greenwald’s study in a small sample miraculously produced a precise estimate of the population effect size and that population effect sizes have really decreased over time.
“As such, we believe that it is more likely that the effect originally obtained in the late 1990s decreased in size over time, both between 2001 and 2010 and between 2010 and 2023.”
If you believe this, you probably also believe in Santa Claus and the Immaculate Conception.
They then suggest that the difference between lab and online studies could contribute to the different findings. They may not have read Joy-Gaba and Nosek’s article, or they simply failed to mention that Joy-Gaba and Nosek tested this hypothesis.
“Experiment 3 was a direct replication of Experiments 2a and 2b across different settings (Internet or laboratory) and samples (undergraduate participant pool or heterogeneous volunteers). Students in the participant pool at the University of Virginia completed the study either online or in the laboratory… As shown in Figure 2, a 2 (Condition) × 3 (Sample) ANOVA revealed no main effect of Condition, F(1, 1178) = .76, p = .74, d = .05, and no interaction between Condition and Sample, F(2, 1177) = .04, p = .96, d = .01.” (Joy-Gaba & Nosek, 2010, Study 3).
To put a cherry on top, the authors totally ignore that one of Dasgupta and Greenwald’s amazing findings was that the effect of the manipulation seemed to last a full day.
“Results revealed that exposure to admired Black and disliked White exemplars significantly weakened automatic pro-White attitudes for 24 hr beyond the treatment” (Dasgupta & Greenwald, 2001, abstract).
The authors did not even attempt to replicate this important finding. They also fail to mention that Lai et al. (2016) found that none of the manipulations that had immediate effects on IAT scores produced changes several days later. Nor did they examine whether their successful manipulations in Studies 5-7 produced lasting effects.
In sum, this article makes no scientific contribution to the understanding of racism and ways to reduce it. Instead, it is a glaring piece of evidence for why you shouldn’t trust IAT researchers. Of course, whether you trust them or me is up to you. It is a free world, and there are no ethical guidelines that regulate publications. There is the illusion that peer review corrects mistakes, but authors can get away with bullshit if editors let them.
In conclusion, producing lasting changes on IAT scores is hard and there is no solid evidence that it is possible. It is also not clear why this would be important because scores on the race IAT are messy measures of consciously accessible racial biases that do not predict behavior (Schimmack, 2021). The notion of implicit bias is unscientific, lacks empirical support, and implicit bias training has been ineffective or even harmful. It is time to fund research that studies real behaviors of discrimination and to stop wasting time on reaction times in online studies (Baumeister, Vohs, & Funder, 2007).
Why don’t you tell us what you really think!
The IAT has always been political. I don’t know why you are taking their statistics so seriously. I am not a member of your profession, but I think I know a party line when I see one: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8688641/
If that link doesn’t work, the citation is Banaji, M. R., Fiske, S. T., & Massey, D. S. (2021). Systemic racism: individuals and interactions, institutions and society. Cognitive Research: Principles and Implications, 6(1), 82.
Is this Fiske the same one you quote above, calling psychologists “charlatans”? Presumably that Fiske was talking about someone else.
Also, apart from it being a waste of time, why do you think implicit bias training is harmful? There’s certainly a lot of it going around.
Well, at least Banaji in the Wall Street Journal admitted that the way DEI trainers are using her test is totally wrong. https://www.wsj.com/business/c-suite/dei-training-hr-business-acd23e8b