Who is who in social psychology? Project Implicit

Project Implicit is a nonprofit company founded in 1998 by three social psychologists:
Tony Greenwald (University of Washington)
Mahzarin Banaji (Harvard University)
Brian Nosek (University of Virginia)

Project Implicit is mainly known as the company that hosts a website where people receive (false) feedback about their implicit associations based on the Implicit Association Test (IAT). The website is hosted by Harvard University, which is prominently displayed in web searches, presumably because many Americans associate Harvard with excellent science.

However, ethical oversight of Project Implicit’s activities rests with the University of Virginia’s Institutional Review Board (IRB) for Social and Behavioral Sciences. The Harvard branding is real but largely a legacy of Banaji’s professorship there; the organization is legally independent of Harvard.

Project Implicit now also operates an independent site, titled “About the IAT – Project Implicit.” Thus, the connection with Harvard may come to an end, but the Harvard-hosted website is still operational.

People

Based on the ProPublica 990 data, the leaders of Project Implicit in the fiscal year 2025 were:

  • Amy Jin Johnson — Executive Director (the only compensated employee, at $111,038)
  • Dr. Brian Nosek — President (University of Virginia; co-founder)
  • Dr. Kate Ratliff — Treasurer (University of Florida)
  • Keith Maddox, PhD — Director
  • Jarvis Idowu — Director
  • Bayet Ross Smith — Director

The affiliation with University of Virginia and Brian Nosek’s role as president and co-founder make it clear that Brian Nosek is the main person responsible for the ethical integrity of Project Implicit’s scientific work and the administration of IATs to the general public.

Financials

The picture that emerges is of a very small operation that is burning through reserves. As a 501(c)(3), Project Implicit files Form 990s with the IRS, which are publicly accessible. The ProPublica Nonprofit Explorer has their filings going back to 2011.

For fiscal year ending September 2025: revenue of $104,552, expenses of $296,971, a net loss of $192,419, and total net assets of $365,382. The dominant revenue source was program services (82% of revenue, at $86,100), with investment income making up most of the rest. Public donations were negligible at $675 (0.6%).

The prior year (FY2024) showed revenue of $273,966 against expenses of $489,223 — another large deficit — and the year before that (FY2023) showed revenue of $436,655 against expenses of $522,546.

So revenues have dropped sharply over three years (~$437K → ~$274K → ~$105K) while expenses remain high relative to income. They are drawing down net assets at a significant rate.
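The arithmetic behind this summary can be checked with a short script. This is only a sketch that uses the revenue, expense, and net asset figures quoted above; the helper names and the "runway" calculation are my own illustration, and the runway assumes (hypothetically) that the FY2025 deficit simply repeats each year.

```python
# Figures as quoted in the text above (US dollars).
filings = {
    "FY2023": {"revenue": 436_655, "expenses": 522_546},
    "FY2024": {"revenue": 273_966, "expenses": 489_223},
    "FY2025": {"revenue": 104_552, "expenses": 296_971},
}
net_assets_fy2025 = 365_382


def deficit(year):
    """Expenses minus revenue for one fiscal year (positive means a loss)."""
    figures = filings[year]
    return figures["expenses"] - figures["revenue"]


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


for year in filings:
    print(f"{year} deficit: ${deficit(year):,}")

rev_change = pct_change(filings["FY2023"]["revenue"], filings["FY2025"]["revenue"])
print(f"Revenue change FY2023 to FY2025: {rev_change:.1f}%")

# Hypothetical assumption: the FY2025 deficit repeats unchanged in later years.
runway_years = net_assets_fy2025 / deficit("FY2025")
print(f"Years of reserves left at the FY2025 deficit: {runway_years:.1f}")
```

Under that simple assumption, the remaining net assets would cover a little under two more years of operations.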

The main revenue source of Project Implicit is fees for program services:

  • Corporate/organizational DEI training and consulting — companies, government agencies, universities, and HR departments pay Project Implicit to run implicit bias workshops, license the IAT for their own use, or deliver training programs. This has been a significant revenue stream for them, especially during the DEI boom years of 2020–2022.
  • Licensing or access fees — organizations that want to use the IAT infrastructure for research or applied purposes may pay for that.
  • Speaking and educational programs — paid engagements where Project Implicit personnel deliver training.

The trajectory tells an interesting story. Program service revenue went from ~$308K (FY2023) to ~$240K (FY2024) to ~$86K (FY2025), a collapse of roughly 72% in two years. That almost certainly tracks the broader pullback in corporate DEI spending that accelerated after 2023 and especially into 2024–2025. Thus, while the website hosts hundreds of different IATs, the race IAT is the bread-and-butter IAT that funds the organization. The collapse in revenues can be explained by the changing political climate under the “Make Racism Great Again” policies of the MAGA government. There is no evidence that sustained criticism of the validity of IATs in general and the race IAT specifically over the past decades has contributed to this sharp drop in revenues.

Mission Statement

Old mission statement, https://app-prod-03.implicit.harvard.edu/implicit/aboutus.html (retrieved 26-06-01)

Project Implicit’s mission statement has changed considerably over time, against the backdrop of accumulating scientific criticism of the IAT and the organization’s broader institutional repositioning. The changes are visible not only in the language itself, but also in where the organization now presents itself to the public.

An older version still visible on the Harvard-hosted site describes an organization that “provides consulting, education, and training services on implicit bias, diversity and inclusion, leadership, applying science to practice, and innovation” (app-prod-03.implicit.harvard.edu, retrieved June 1, 2026). The earliest version cached by the Wayback Machine, from 2013, contains the same language. The current projectimplicit.net site describes its educational work in considerably more cautious terms, as providing “research-based educational programs that translate findings from cognitive science into clear, accessible understanding of judgment and decision-making, without prescribing behavior change or organizational intervention.”

The phrase “without prescribing behavior change or organizational intervention” marks a significant retreat. The earlier language presented Project Implicit as an organization that translated implicit-bias science into diversity, inclusion, leadership, and applied organizational practice. The current language distances the organization from prescriptive behavior change and organizational intervention. This does not mean that Project Implicit has abandoned all consulting or educational services. Rather, it means that the organization has narrowed the public rationale for those services. It no longer presents itself as directly prescribing organizational change, but as providing research-based education about judgment and decision-making.

That retreat is important, but it is incomplete. Even the current mission statement continues to claim the authority of “research-based” education and the translation of “findings from cognitive science.” Those phrases preserve the impression that Project Implicit is communicating settled scientific knowledge. But the central scientific problem remains unresolved. The issue is not whether racial disparities, prejudice, or discrimination exist. They plainly do. The issue is whether IAT scores validly measure implicit prejudice at the individual level, and whether individualized feedback about hidden racial bias is scientifically justified.

The evidence does not support that stronger interpretation. IAT scores have limited validity, weak relations with behavior, and substantial ambiguity in what they measure (Schimmack, 2021). They are influenced by task-specific processes, cultural associations, and systematic sources of measurement error. In the case of the race IAT, the color-valence confound raises the additional possibility that scores partly reflect general associations with black and white rather than racial attitudes themselves. These limitations are not minor qualifications. They go to the construct validity of the measure and to the ethical defensibility of giving people individualized feedback about hidden racial bias.

Ethics

The administration of psychological tests to assess individuals with clinical relevance is regulated by professional bodies such as the American Psychological Association. However, these strict ethical rules do not apply to tests that are administered for other purposes. Anybody can host a website and give people scores on some test.

Millions of people have taken tests like astrological birth chart generators or the “What kind of pizza are you?” test (Pizza Test). However, as academics, Brian Nosek and Project Implicit are required to have ethical approval for the administration of IATs, especially because they are using the data for research purposes. Currently, the IRB of the University of Virginia is responsible for the ethical oversight of Project Implicit’s activities.

The IRB protocol obtained from Brian Nosek — the only document he could find, dated 2006 — confirms that the ethical oversight of Project Implicit has not kept pace with the scientific evidence.

The 2006 protocol acknowledges that participants may be “surprised” and “concerned” by their results, and promises debriefing that contextualizes scores as having “no direct implications for individual scores.” But it makes no mention of the limited reliability of IAT scores, the color-valence confound, the absence of construct validity evidence, or the specific risks to African American participants of being told they harbor hidden pro-White bias.

A protocol written in 2006, before the major validity critiques were published, and apparently never formally updated, cannot provide adequate ethical oversight for a research enterprise that has since accumulated overwhelming evidence of the instrument’s limitations. The fact that Nosek’s response to a direct request for the current IRB protocol was to send a 20-year-old document is itself an answer.

UVA seems to treat this project like any other research project, but Project Implicit research is different because it gives people feedback about potential hidden biases. The key claim is that they measure processes that are not directly accessible to introspection. This is also used to explain why people may receive feedback that is inconsistent with their self-perceptions — the supposed reason being that the test revealed something true about them that is not accessible to conscious awareness, much like a psychoanalyst claiming to recover a forgotten or repressed memory. These claims are controversial because they are difficult to verify, and the epistemic structure is problematic: participants cannot dispute the feedback on the basis of their own experience because the whole point is that the bias is hidden from them. The danger is that discrepancies between IAT scores and self-perceptions are more likely to reflect measurement error in the IAT than truly hidden biases — a conclusion supported by published psychometric research (Schimmack, 2021). As a result, a substantial proportion of participants will receive false feedback about racial attitudes they do not hold and people are not given proper debriefing that the most likely reason for surprising results is measurement error.

Implicit Biases of Project Implicit

Given the seriousness of providing people with feedback about hidden biases on topics like prejudice, depression, and suicide, one might expect that Project Implicit has carefully evaluated the psychometric properties of IATs — that is, assessed the accuracy of IAT scores. However, this is not the case. None of the three founding members has training in psychometrics or demonstrated understanding of modern test theory, as evidenced by their failure to apply basic psychometric concepts such as discriminant validity, convergent validity with other implicit measures, or the fundamental constraint that validity cannot exceed reliability (Schimmack, 2021).

Most of the discussion of measurement error in the IAT literature has focused on random measurement error and situational influences on IAT scores. This limited focus ignores that IAT scores can also be influenced by systematic measurement error. Random error averages out across repeated administrations; systematic error does not. If IAT scores are systematically influenced by factors such as cognitive ability or task-switching rather than hidden bias, repeated testing will not produce valid feedback about hidden biases. Neglect of systematic measurement error is common in psychology, but the ethical stakes are considerably higher when such error invalidates personal feedback about sensitive topics like racial prejudice, depression, or suicidal ideation.
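A small simulation makes the difference between the two kinds of error concrete. It is only a sketch under made-up assumptions: one hypothetical person with no bias at all (true attitude of 0), a stable non-attitudinal contamination of 0.4 score units, and normally distributed random noise. None of these numbers come from IAT data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate_scores(true_attitude, systematic_error, n_admin, noise_sd=1.0):
    """Observed score = true attitude + stable systematic error + fresh random noise."""
    return [
        true_attitude + systematic_error + random.gauss(0, noise_sd)
        for _ in range(n_admin)
    ]


true_attitude = 0.0     # hypothetical person with no bias
systematic_error = 0.4  # hypothetical stable contamination, e.g. task-switching ability

for n_admin in (1, 10, 1000):
    scores = simulate_scores(true_attitude, systematic_error, n_admin)
    print(n_admin, "administrations, mean score:", round(statistics.mean(scores), 2))
```

As the number of administrations grows, the mean settles near 0.4 rather than near 0. Averaging removes the random noise but leaves the systematic error in place, so the person would still be told they have a bias they do not hold.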

The finding that the average white, Asian, or non-white Hispanic American finds it easier to associate white with good and black with bad rather than the other way around does not mean that they are prejudiced against Black people. It also does not show that they are unbiased. In fact, self-reports show that a substantial number of people are aware of and willing to admit their prejudices.

Brian Nosek, the president of Project Implicit, has ignored scientific criticism of the interpretation of IAT scores made by numerous researchers using independent lines of argument. One is that IAT scores show low convergent validity with other implicit measures, meaning that a person classified as biased on the IAT may not be classified as biased on other implicit measures of the same construct. Yet visitors to the Project Implicit website are offered only the IAT, with no acknowledgment that other implicit measures exist or that they frequently disagree with IAT scores. While the name Project Implicit implies a focus on implicit constructs, the site is really just promoting the Implicit Association Test, even though it lacks validity to measure implicit biases.

Is the race IAT itself racist?

The scoring of the race IAT rests on a simple assumption. If reaction times in favor of white-good, black-bad are faster than black-good and white-bad, a person shows an implicit bias favoring whites. This scoring assumes that a value of zero corresponds to a psychological attitude that is neutral and unbiased. While this assumption is intuitively appealing, it requires scientific evidence. An alternative possibility is that scores on the race IAT are also influenced by factors that have nothing to do with prejudice.
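The role of the zero point can be illustrated with a simplified difference score. This is a sketch, not the scoring algorithm used on the website: the published algorithm (Greenwald et al., 2003) includes additional steps that are omitted here, and the latencies, function names, and the alternative neutral point are all hypothetical.

```python
import statistics


def iat_difference_score(compatible_ms, incompatible_ms):
    """Simplified score: mean latency difference divided by the SD of all trials.

    Positive values mean faster responses in the 'compatible' block.
    """
    diff = statistics.mean(incompatible_ms) - statistics.mean(compatible_ms)
    sd_all = statistics.stdev(compatible_ms + incompatible_ms)
    return diff / sd_all


def interpret(score, neutral_point=0.0):
    """Turn a score into feedback; the label depends on where 'neutral' is assumed to be."""
    if score > neutral_point:
        return "pro-White"
    if score < neutral_point:
        return "pro-Black"
    return "neutral"


# Hypothetical response latencies in milliseconds.
white_good_block = [650, 700, 720, 680, 710]
black_good_block = [760, 800, 790, 770, 820]

score = iat_difference_score(white_good_block, black_good_block)
print("score:", round(score, 2))
print("feedback if zero is neutral:", interpret(score))

# Hypothetical: if non-attitudinal factors alone pushed scores above this one,
# the same score would have to be read the other way.
print("feedback with a shifted neutral point:", interpret(score, neutral_point=score + 0.1))
```

The same reaction times lead to opposite feedback depending on the assumed neutral point, which is why the assumption that zero means "unbiased" needs empirical support.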

One way to validate the assumption is to see how scores on the IAT are related to actual behaviors. If zero reflects neutrality, people with scores above zero should show prejudice in their behaviors and people with scores below zero should show the opposite pattern, a preference for Black people. However, no compelling evidence has been provided that reaction time differences map directly on amount of bias in behavior.

A critical analysis of the literature failed to find evidence supporting the scoring of the race IAT that is used to give people feedback about their hidden biases (Blanton, Jaccard, Strauts, Mitchell, & Tetlock, 2015) [ironically, Mitchell is also affiliated with UVA, which oversees the ethics of Project Implicit]. Since then, Project Implicit has not responded to this criticism or produced research demonstrating that the scoring of the race IAT is valid. Nor have Brian Nosek or the other founders of Project Implicit responded to more recent criticisms (Schimmack, 2021).

Moreover, research has examined why the IAT may have a bias towards white-good/black-bad associations; that is, the test itself is biased. The first problem is that American culture is filled with racial stereotypes that associate Black people with negative attributes. Mere awareness of these stereotypes may influence IAT scores, even if people hold favorable attitudes towards specific Black people or even African Americans as a group (Olson & Fazio, 2004). Even African Americans are aware of these stereotypes and their responses may be influenced by these associations. In support of this argument, responses are more neutral on other tasks that rely on specific stimuli (faces of European and African Americans) rather than abstract associations.

More challenging for the race IAT is the finding that simple color associations explain a substantial portion of the variance in scores on the race IAT (Smith-McLallen, Johnson, Dovidio, & Pearson, 2006). This means the race IAT is not a pure measure of racial biases because it is contaminated by general associations related to the colors white and black. Although this problem was reported 20 years ago, it has been largely ignored by the research community and by Project Implicit. The implication is that African Americans who like white cars and white clothing may receive feedback that they have a hidden bias against African Americans.

Durgin, Diop, Lewis-Owona, and Eaton (in press) replicated and substantially extended Smith-McLallen et al.’s findings across six experiments. They showed that the correlation between color IAT scores and race IAT scores is of similar magnitude to the test-retest correlation of the race IAT itself, suggesting that the two instruments are measuring largely the same underlying construct. Critically, the shared variance between the color and race IATs was not explained by explicit racial bias but by metaphoric alignments of black and white — the deep cultural association of darkness with evil present across racial groups. Even Black participants showed similar metaphoric color alignments to White participants, and a blue-gray color IAT showed no correlation with the race IAT, confirming the effect is specific to black-white alignments rather than a general method artifact.

These results undermine the validity of race IAT scores, especially for African Americans. This matters because the validity of test scores must be assessed within populations, not just in aggregate. However, IAT validation studies have relied exclusively on White or mixed samples, meaning the test has never been properly validated for African Americans. Durgin et al.’s findings suggest that race IAT scores are even less valid for African Americans than for European Americans, as the metaphoric color bias and in-group effects pull in opposing directions, making individual scores particularly difficult to interpret.

Good Intentions and Bad Behavior

Racists often accuse social psychologists of a left-leaning, liberal bias. However, racial equality is enshrined in the 13th, 14th, and 15th Amendments to the Constitution of the United States, passed after the Northern States won the Civil War against the Confederate States that sought to maintain slavery. Working towards Martin Luther King’s dream of actual racial equality is therefore aligned with the moral and political ideals of the United States.

Project Implicit was founded on the idea that many Americans embrace Martin Luther King’s dream but often act in violation of egalitarian principles — sometimes due to limitations in their ability to control their behavior, and sometimes because they are not even aware that their actions are influenced by race. The founding vision of Project Implicit was that a five-minute reaction time task could help people become aware of their biases, and that this awareness would be a first step toward changing their behavior.

The problem is that early on, research findings suggested that the race IAT could not deliver on this promise. However, well-known motivated biases made it impossible for Nosek, Banaji, and Greenwald to acknowledge their own biases and temper their enthusiasm about IATs as “windows into people’s unconscious” (Banaji & Greenwald, 2013). Instead, they continued to promote the test, aggressively marketed the concept of implicit bias to a broad public audience, generated substantial revenues for Project Implicit, and ignored valid criticism of IATs as measures of implicit biases.

At this point, the dream of Martin Luther King and the dream of Nosek, Banaji, and Greenwald diverged. Project Implicit promoted a research program and a task that did not increase awareness of bias and did not reduce racism. In fact, the recent surge in open, old-fashioned racism may partly reflect a backlash against DEI programs and implicit bias training. Some people did not resent feedback that they were racist — they resented the implication that racism is bad and that they need to change. These people are now fighting back against DEI programs because they wish to maintain the racial hierarchy established during slavery and perpetuated through the Jim Crow laws of former Confederate states.

Project Implicit was built on a false understanding of racism in the United States, an invalid measure of racial bias, and a failure to connect laboratory findings to actual discriminatory behavior. These problems might have been recognized sooner had Project Implicit — which derived most of its revenues from the use of the race IAT in DEI training — consulted with African American communities or scholars. There is little public evidence that their work on racial issues involved meaningful engagement with the actual targets of racial discrimination.

Giving False Feedback to African Americans

It seems that Brian Nosek trusted the validity of the race IAT even when self-reports of African Americans suggested otherwise (Jost, Banaji, & Nosek, 2004). Millions of people have taken the race IAT on the Project Implicit website and also reported their consciously accessible preferences. Many of them were African Americans and research articles show their results at the aggregate level.

A robust finding based on hundreds of thousands of scores shows a striking dissociation in African Americans’ racial attitudes: on explicit self-report measures, African Americans show strong ingroup favoritism — clearly preferring their own group — yet on the race IAT they score close to zero, showing neither consistent preference for Black nor for White (Nosek et al., 2007; Jost et al., 2004).

This dissociation has two possible interpretations. Either African Americans hold two genuinely different attitudes — one conscious and pro-Black, one unconscious and neutral or pro-White — or they hold one attitude, the explicit measure captures it accurately, and the IAT is biased for this group in ways that suppress the ingroup preference that is clearly present in self-reports. The second interpretation is strongly supported by the documented color-valence confound in the race IAT, the near-zero mean being equally consistent with cultural contamination of the measure, and the fundamental psychometric principle that validity cannot exceed reliability.

Nevertheless, Nosek, Banaji, and Greenwald — three non-African American scholars with no documented engagement with African American communities or scholars — chose the most psychologically and politically loaded interpretation available: that many African Americans harbor a hidden pro-White bias rooted in system justification, a motivated tendency to endorse the existing social order even when that order places them at the bottom of the racial hierarchy.

This is a remarkable claim. Translated out of theoretical language, it asserts that the race IAT reveals that many African Americans are unconsciously motivated to maintain a social system that affords them fewer rights, lower status, and less economic opportunity than White Americans. The claim is made on the basis of a psychometrically compromised instrument, without consulting African American communities or scholars, and in direct contradiction of the most obvious behavioral evidence available. African Americans vote overwhelmingly Democratic — approximately 80% overall and 90% among women — consistently supporting the party associated with anti-racism policies and government intervention to address racial inequality. This is not the behavior of a group that unconsciously endorses the racial status quo. More broadly, African Americans have actively resisted racial hierarchy throughout their entire history in the United States, from the abolitionist movement and Reconstruction to the civil rights movement and beyond. System justification theory, as applied to African Americans through the race IAT, mistakes the cognitive fingerprints of living under racism for psychological endorsement of it.

Although this claim was made in the most highly cited article in the journal Political Psychology (1,277 citations in Web of Science), it has received little critical attention outside the academic literature. Black activists and scholars working on racism have largely ignored this work rather than directly challenging it — not because they accept it, but because Project Implicit’s research program is so disconnected from the empirical traditions and practical concerns that dominate Black psychology and anti-racism activism. This neglect further underscores that Project Implicit operates largely in isolation from broader anti-racism efforts in the United States. African American scholars from W.E.B. Du Bois onward have had good reasons to be skeptical of psychological instruments developed by White researchers to make claims about the inner lives of Black Americans — the history of IQ testing used to pathologize Black communities is instructive. Project Implicit repeated this pattern without appearing to recognize it. The fundamental problem is that the focus of Project Implicit is the measure, not the construct of racial bias. An organization genuinely committed to understanding and reducing racism would follow the evidence wherever it leads, including away from its flagship instrument. Project Implicit has done the opposite.

It is particularly troubling that this interpretation of African Americans’ scores was made by prominent members of Project Implicit, including Nosek himself. If the system justification interpretation is wrong — and the psychometric evidence strongly suggests it is — then African Americans who receive pro-White feedback on the race IAT are being told something false and potentially harmful about their own psychology. The ethical stakes are highest precisely for this group, yet the 2006 IRB protocol makes no mention of the specific risks to African American participants, provides no tailored debriefing to address the system justification interpretation, and offers no guidance on how to contextualize a pro-White result for a Black participant who strongly identifies with their own group. This is not a minor oversight. It is the most serious ethical failure in Project Implicit’s research program.

Conclusion: So, What is Project Implicit?

In my opinion, Project Implicit is a research project built around an experimental paradigm. Participants are asked to perform two complementary reaction time tasks, and the outcome is the difference in response times between them. This task is called the Implicit Association Test. Like many experimental paradigms, the IAT gives social psychologists something to do and write articles about. This academic research is inexpensive and not directly connected to real-world problems. It is basic research by academics in the ivory tower, for researchers in other ivory towers.

However, Project Implicit took this experimental paradigm and presented it to the public as a valid measure of hidden biases and unconscious processes, and as a tool capable of assessing those processes at the level of individual people. It provided individuals with feedback about their scores on a publicly accessible website, used the research to support seminars and public speaking engagements about implicit bias, and claimed that this work could address real social problems. This marketing was extremely effective, in part due to Banaji’s affiliation with Harvard, and Project Implicit generated substantial revenues over two decades while ignoring mounting evidence that the IAT is not a valid instrument for studying racism or reducing it.

Largely unrelated to this scientific evidence, the resurgence of open racism in American politics is draining Project Implicit of revenue, and the organization appears to be running out of money. This would be a serious loss if Project Implicit had made genuine progress in the fight against racism. But it did not. Instead, it deflected attention from real problems and drained resources — financial, institutional, and intellectual — from more effective anti-racism efforts. The projected demise of Project Implicit is therefore a blessing in disguise.

Unfortunately, the real problem of racism remains. Many Americans are unwilling to abandon their racial prejudices and to treat all people as equal under the law. Martin Luther King’s dream remains elusive — not because we lacked a reaction time task to measure hidden bias, but because we lacked the collective will to confront the bias that was never hidden at all.

References

Axt, J. R., Connor, P., Hoogeveen, S., Clark, C. J., Vianello, M., Lahey, J. N., Hahn, A., To, J., Petty, R. E., Costello, T. H., Mitchell, G., Tetlock, P. E., & Uhlmann, E. L. (in press). On the relationship between indirect measures of Black vs. White racial attitudes and discriminatory outcomes: An adversarial collaboration using a sample of White Americans. Journal of Personality and Social Psychology.

Banaji, M. R., & Greenwald, A. G. (2013). Blindspot: Hidden biases of good people. New York: Delacorte Press.

Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. American Psychologist, 61(1), 27–41.

Blanton, H., Jaccard, J., Strauts, E., Mitchell, G., & Tetlock, P. E. (2015). Toward a meaningful metric of implicit prejudice. Journal of Applied Psychology, 100(5), 1468–1481.

Durgin, F. H., Diop, S. M., Lewis-Owona, J., & Eaton, O. (in press). A downside of conceptual metaphor: Metaphoric alignments of black and white. Manuscript submitted for publication.

Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74(6), 1464–1480.

Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology, 85(2), 197–216.

Hahn, A., & Gawronski, B. (2019). Facing one’s implicit biases: From awareness to acknowledgment. Journal of Personality and Social Psychology, 116(5), 769–794.

Jost, J. T., Banaji, M. R., & Nosek, B. A. (2004). A decade of system justification theory: Accumulated evidence of conscious and unconscious bolstering of the status quo. Political Psychology, 25(6), 881–919.

Karpinski, A., & Hilton, J. L. (2001). Attitudes and the Implicit Association Test. Journal of Personality and Social Psychology, 81(5), 774–788.

Kurdi, B., Seitchik, A. E., Axt, J. R., Carroll, T. J., Karapetyan, A., Kaushik, N., … & Greenwald, A. G. (2019). Relationship between the Implicit Association Test and intergroup behavior: A meta-analysis. American Psychologist, 74(5), 569–586.

McFarland, S. G., & Crouch, Z. (2002). A cognitive skill confound on the Implicit Association Test. Social Cognition, 20(6), 483–510.

Meier, B. P., Robinson, M. D., & Clore, G. L. (2004). Why good guys wear white: Automatic inferences about stimulus valence based on brightness. Psychological Science, 15(2), 82–87.

Meier, B. P., Fetterman, A. K., & Robinson, M. D. (2015). The brightness of your smile: The solar hypothesis of the affect-brightness link. In Handbook of embodied cognition and sport psychology. MIT Press.

Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002). Harvesting implicit group attitudes and beliefs from a demonstration website. Group Dynamics: Theory, Research, and Practice, 6(1), 101–115.

Nosek, B. A., Smyth, F. L., Hansen, J. J., Devos, T., Lindner, N. M., Ranganath, K. A., … & Banaji, M. R. (2007). Pervasiveness and correlates of implicit attitudes and stereotypes. European Review of Social Psychology, 18(1), 36–88.

Olson, M. A., & Fazio, R. H. (2004). Reducing the influence of extrapersonal associations on the Implicit Association Test: Personalizing the IAT. Journal of Personality and Social Psychology, 86(5), 653–667.

Oswald, F. L., Mitchell, G., Blanton, H., Jaccard, J., & Tetlock, P. E. (2013). Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies. Journal of Personality and Social Psychology, 105(2), 171–192.

Oswald, F. L., Mitchell, G., Blanton, H., Jaccard, J., & Tetlock, P. E. (2015). Using the IAT to predict ethnic and racial discrimination: Small effect sizes of unknown societal significance. Journal of Personality and Social Psychology, 108(4), 562–571.

Pew Research Center. (2015, August 19). Exploring racial bias among biracial and single-race adults: The IAT. https://www.pewresearch.org/social-trends/2015/08/19/exploring-racial-bias-among-biracial-and-single-race-adults-the-iat/

ProPublica Nonprofit Explorer. Project Implicit Inc (EIN: 20-3939536). https://projects.propublica.org/nonprofits/organizations/203939536

Schimmack, U. (2021). The Implicit Association Test: A method in search of a construct. Perspectives on Psychological Science, 16(2), 396–414.

Smith-McLallen, A., Johnson, B. T., Dovidio, J. F., & Pearson, A. R. (2006). Black and White: The role of color bias in implicit race bias. Social Cognition, 24(1), 46–73.

Worden, R. E., Najdowski, C. J., McLean, S. J., Worden, K. M., Corsaro, N., Cochran, H., & Engel, R. S. (2024). Implicit bias training for police: Evaluating impacts on enforcement disparities. Law and Human Behavior, 48(5–6), 338–355.

Did Social Psychologists Really Develop a Love Test?

Until 2011, social psychologists were able to believe that they were actually doing science. They conducted studies, often rigorous experiments with random assignment, analyzed the data and reported results only when they achieved statistical significance, p < .05. This is how they were trained to do science and most of them believed that this is how science works.

However, in 2011 an article by a well-respected social psychologist changed all this. Daryl Bem published an article that showed time-reversed causal processes. Seemingly, people were able to feel the future (Bem, 2011). This article shook the foundations of social psychology because most social psychologists did not believe in paranormal phenomena. Yet, Bem presented evidence for his crazy claim in 8 out of 9 studies. The only study that did not work was with supraliminal stimuli. The other studies used subliminal stimuli, suggesting that only our unconscious self can feel the future.

Over the past decade it has become apparent that Bem and other social psychologists had misused significance testing. They only paid attention to significant results, p < .05, and ignored non-significant results, p > .05. Selective publishing of significant results means that statistical results no longer distinguished between true and false findings. Everything was significant, even time-reversed implicit priming.

Some areas of social psychology have been hit particularly hard by replication failures. Most prominently, implicit priming research has been called out as a poster child of doubt about social psychological results by Nobel Laureate Kahneman. The basic idea of implicit priming is that stimuli outside of participants’ awareness can influence their behavior. Many implicit priming studies have failed to replicate.

Ten years later, we can examine how social psychologists have responded to the growing evidence that many classic findings were obtained with questionable practices (not reporting the failures) and cannot be replicated. Unfortunately, the response is consistent with psychodynamic theories of ego-defense mechanisms and social psychologists’ own theories of motivated reasoning. For the most part, social psychologists have simply ignored the replication failures of the 2010s and continue to treat old articles as if they provide scientific insights into human behavior. For example, Bargh – a leading figure in the implicit priming world – wrote a whole book about implicit priming that does not mention replication failures and presents questionable findings as if they were well-established facts (Schimmack, 2017).

Given the questionable status of implicit priming research, it is not surprising that concerns are also growing about measures that were designed to reflect individual differences in implicit cognitions (Schimmack, 2019). The measures often have low reliability (when you test yourself you get different results each time) and show low convergent validity (one measure of your unconscious feelings towards your spouse doesn’t correlate with another measure of your unconscious feelings towards your spouse). It is therefore suspicious when researchers consistently find significant results with these measures, because measurement error should make it difficult to get significant results all the time.

Implicit Love

In an article from 2019 (i.e., when the replication crisis in social psychology had been well established), Hicks and McNulty make the following claims about implicit love; that is, feelings that are not reflected in self-reports of affection or marital satisfaction.

Their title is based on a classic article by Bargh and Chartrand.

Readers are not informed that the big claims made by Bargh twenty years ago have failed to be supported by empirical evidence. Especially the claim that stimuli often influence behavior without awareness lacks any credible evidence. It is therefore sad to say that social psychologists have moved on from self-deception (they thought they were doing science, but they did not) to other-deception (spreading false information knowing that credible doubts have been raised about this research). Just like it is time to reclaim humility and honesty in American political life, it is important to demand humility and honesty from American social psychologists, who are dominating social psychology.

The empirical question is whether research on implicit love has produced robust and credible results. One advantage for relationship researchers is that a lot of this research was published after Bem (2011). Thus, researchers could have improved their research practices. This could result in two outcomes. Either relationship researchers reported their results more honestly and did report non-significant results when they emerged, or they increased sample sizes to ensure that small effect sizes could produce statistically significant results.

Hicks and McNulty’s (2019) narrative review makes the following claims about implicit love.

1. The frequency of various sexual behaviors was prospectively associated with automatic partner evaluations assessed with an implicit measure but not with self-reported relationship satisfaction (Hicks, McNulty, Meltzer, & Olson, 2016).

2. Participants with less responsive partners who felt less connected to their partners during conflict-of-interest situations had more negative automatic partner attitudes at a subsequent assessment but not more negative subjective evaluations (Murray, Holmes, & Pinkus, 2010).

3. Pairing the partner with positive affect from other sources (i.e., positive words and pleasant images) can increase the positivity of automatic partner attitudes relative to a control group.

4. The frequency of orgasm during sex was associated with automatic partner attitudes, whereas sexual frequency was associated only with deliberate reports of relationship satisfaction for participants who believed frequent sex was important for relationship health.

5. More positive automatic partner attitudes have been linked to perceiving fewer problems over time (McNulty, Olson, Meltzer, & Shaffer, 2013).

6. More positive automatic partner attitudes have been linked to self-reporting fewer destructive behaviours (Murray et al., 2015).

7. More positive automatic partner attitudes have been linked to more cooperative relationship behaviors (LeBel & Campbell, 2013).

8. More positive automatic partner attitudes have been linked to displaying attitude-consistent nonverbal communication in conflict discussions (Faure et al., 2018).

9. More positive automatic partner attitudes were associated with a decreased likelihood of dissolution the following year, even after controlling for explicit relationship satisfaction (Lee, Rogge, & Reis, 2010).

10. Newlyweds’ implicit partner evaluations but not explicit satisfaction within the first few months of marriage were more predictive of their satisfaction 4 years later.

11. People with higher motivation to see their relationship in a positive light because of barriers to exiting their relationships (i.e., high levels of relationship investments and poor alternatives) demonstrated a weaker correspondence between their automatic attitudes and their relationship self-reports.

12. People with more negative automatic evaluations are less trusting of their partners when their working memory capacity is limited (Murray et al., 2011).

These claims are followed with the assurance that “these studies provide compelling evidence that automatic partner attitudes do have implications for relationship outcomes” (p. 256).

Should anybody who reads this article or similar claims in the popular media believe them? Have social psychologists improved their methods to produce more credible results over the past decade?

Fortunately, we can answer this question by examining the statistical evidence that was used to support these claims, using the z-curve method. First, all test statistics are converted into z-scores that represent the strength of evidence against the null-hypothesis (i.e., implicit love has no effect or does not exist) in each study. These z-scores are a function of the effect size and the amount of sampling error in a study (signal/noise ratio). Second, the z-scores are plotted as a histogram to show how many of the reported results provide weak or strong evidence against the null-hypothesis. The data are here for full transparency (Implicit.Love.xlsx).
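The first step can be sketched in a few lines of Python. This is only a minimal illustration, assuming each result is available as a two-sided p-value; the helper name `p_to_z` and the example p-values are hypothetical and are not taken from Implicit.Love.xlsx or from any z-curve software.

```python
from statistics import NormalDist


def p_to_z(p_two_sided: float) -> float:
    """Convert a two-sided p-value into an absolute z-score."""
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # A two-sided p-value splits its probability mass across both tails,
    # so the matching z-score is the (1 - p/2) quantile of the standard normal.
    return NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)


# Hypothetical p-values for illustration only (not data from the reviewed studies).
example_p_values = [0.05, 0.04, 0.01, 0.001]
example_z_scores = [p_to_z(p) for p in example_p_values]
print([round(z, 2) for z in example_z_scores])
```

A p-value of exactly .05 maps onto the threshold value of z = 1.96, and smaller p-values map onto larger z-scores, which is why a pile of z-scores just above 1.96 corresponds to a pile of p-values just below .05.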

The figure shows the z-curve for the 30 studies that reported usable test results. Most published z-scores are clustered just above the threshold value of 1.96 that corresponds to the .05 criterion to claim a discovery. This clustering indicates that significant results were selected from a much larger set of analyses, most of which produced non-significant results. The grey curve from z = 0 to 1.96 shows the predicted number of analyses that were not reported. The file drawer ratio implies that for every significant result there were 12 analyses with non-significant results.

Another way to look at the results is to compare the observed discovery rate with the expected discovery rate. The observed discovery rate is simply the percentage of studies that reported a significant result, which is 29 out of 30 or 97%. The expected discovery rate is the average power of studies to produce a significant result. It is only 8%. This shows that social psychologists still continue to select only successes and do not report or interpret the failures. Moreover, in this small sample of studies, there is considerable uncertainty around the point estimates. The 95% confidence interval for the replication success probability includes 5%, which is not higher than chance. The complementary finding is that the maximum rate of false positives is estimated to be 63%, but could be as high as 100%. In other words, the results make it impossible to conclude that even some of these studies produced a credible result.
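The summary numbers hang together arithmetically. The sketch below is not the z-curve estimation itself; it only assumes that the file drawer ratio is (1 - rate) / rate and that the maximum false positive rate follows Soric's bound at alpha = .05. Under those assumptions, a discovery rate of 1/13 (about 8%) corresponds to a 12:1 file drawer ratio and a maximum false positive rate of about 63%. The function names are illustrative.

```python
def file_drawer_ratio(discovery_rate: float) -> float:
    """Expected number of non-significant results per significant result."""
    return (1.0 - discovery_rate) / discovery_rate


def soric_max_fdr(discovery_rate: float, alpha: float = 0.05) -> float:
    """Soric's upper bound on the false discovery rate, capped at 1."""
    bound = (1.0 / discovery_rate - 1.0) * alpha / (1.0 - alpha)
    return min(1.0, bound)


# Assumption: 1/13 (about 8%) is the discovery rate implied by a 12:1 file drawer ratio.
discovery_rate = 1.0 / 13.0
print(round(file_drawer_ratio(discovery_rate), 1))
print(round(soric_max_fdr(discovery_rate), 2))
```

When the discovery rate drops to 5%, the bound reaches 100%, which is why a confidence interval that includes 5% cannot rule out that every significant result is a false positive.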

In short, the entire research on implicit love is bullshit. Ten years ago, social psychologists had the excuse that they did not know better and misused statistics because they were trained the wrong way. This excuse is wearing thin in 2020. They know better, but they continue to report misleading results and write unscientific articles. In psychology, this is called other-deception; in everyday life, it is called lying. Don’t trust social psychologists. Doing so is as stupid as believing Donald Trump when he claims that he won the election.

Invalid Claims about the Validity of Implicit Association Tests

Citation:
Schimmack, U. (2021). Invalid Claims About the Validity of Implicit Association Tests by Prisoners of the Implicit Social-Cognition Paradigm. Perspectives on Psychological Science, 16(2), 435–442. https://doi.org/10.1177/1745691621991860

This post has been revised on March 12, 2021 to make it consistent with the published version (https://doi.org/10.1177/1745691621991860) of my response to commentaries by Vianello and Bar-Anan and Kurdi, Ratliff, and Cunningham in response to my target article about the lack of construct validity of IATs (Schimmack, 2021).

Invalid Claims about the Validity of Implicit Association Tests by Prisoners of the Implicit Social-Cognition Paradigm

Abstract
In a prior publication, I used structural equation modeling of multimethod data to examine the construct validity of Implicit Association Tests. The results showed no evidence that IATs measure implicit constructs (e.g., implicit self-esteem, implicit racial bias). This critique of IATs elicited several responses by implicit social-cognition researchers, who tried to defend the validity and usefulness of IATs. I carefully examine these arguments and show that they lack validity. IAT proponents consistently ignore or misrepresent facts that challenge the validity of IATs as measures of individual differences in implicit cognitions. One response suggests that IATs can be useful even if they merely measure the same constructs as self-report measures, but I find no support for the claim that IATs have practically significant incremental predictive validity. In conclusion, IATs are widely used without psychometric evidence of construct or predictive validity.

Keywords
implicit attitudes, Implicit Association Test, validity, prejudice, suicide, mental health

Greenwald and colleagues (1998) introduced Implicit Association Tests (IATs) as a new method to measure individual differences in implicit cognitions. Twenty years later, IATs are widely used for this purpose, but their construct validity has not been established. Even its creator is no longer sure what IATs measure. Whereas Banaji and Greenwald (2013) confidently described IATs as “a method that gives the clearest window now available into a region of the mind that is inaccessible to question-asking methods” (p. xiii), they now claim that IATs merely measure “the strengths of associations among concepts” (Cvencek et al., 2020, p. 187). This is akin to saying that an old-fashioned thermometer measures the expansion of mercury: It is true, but it has little to do with thermometers’ purpose of measuring temperature.

Fortunately, we do not need Greenwald or Banaji to define the constructs that IATs are supposed to measure. Twenty years of research with IATs makes it clear what researchers believe they are measuring with IATs. A self-esteem IAT is supposed to measure implicit self-esteem (Greenwald & Farnham, 2000). A race IAT is supposed to measure implicit prejudice (Cunningham et al., 2001), and a suicide IAT is supposed to measure implicit suicidal tendencies that can predict suicidal behaviors above and beyond self-reports (Kurdi et al., 2021). The empirical question is whether IATs are any good at measuring these constructs. I concluded that most IATs are poor measures of their intended constructs (Schimmack, 2021). This conclusion elicited one implicit and two explicit responses.

Implicit Response

The implicit response is to simply ignore criticism and to make invalid claims about the construct validity of IATs (Greenwald & Lai, 2020). For example, a 2020 article coauthored by Nosek, Greenwald, and Banaji (among others) claimed that “available evidence for validity of IAT measures of self-esteem is limited (Bosson et al., 2000; Greenwald & Farnham, 2000), with some of the strongest evidence coming from empirical tests of the balance-congruity principle” (Cvencek et al., 2020, p. 191). This statement is as valid as Donald Trump’s claim that an honest count of votes would make him the winner of the 2020 election. Over the past 2 decades, several articles have concluded that self-esteem IATs lack validity (Buhrmester et al., 2011; Falk et al., 2015; Walker & Schimmack, 2008). It is unscientific to omit these references from a literature review.

The balance-congruity principle is also not a strong test of the claim that the self-esteem IAT is a valid measure of individual differences in implicit self-esteem. In contrast, the lack of convergent validity with informant ratings and even other implicit measures of self-esteem provides strong evidence that self-esteem IATs are invalid (Bosson et al., 2000; Falk et al., 2015). Finally, supporting evidence is surprisingly weak. For example, Greenwald and Farnham’s (2000) highly cited article tested predictive validity of the self-esteem IAT with responses to experimentally manipulated successes and failures (n = 94). They did not even report statistical results. Instead, they suggested that even nonsignificant results should be counted as evidence for the validity of the self-esteem IAT:

Although p values for these two effects straddled the p = .05 level that is often treated as a boundary between noteworthy and ignorable results, any inclination to dismiss these findings should be tempered by noting that these two effects agreed with prediction in both direction and shape. (Greenwald & Farnham, 2000, p. 1032)

Twenty years later, this finding has not been replicated, and psychologists have learned to distrust p values that are marginally significant (Benjamin et al., 2018; Schimmack, 2012, 2020). In conclusion, conflict of interest and motivated biases undermine the objectivity of Greenwald and colleagues in evaluations of IATs’ validity.

Explicit Response 1

Vianello and Bar-Anan (2021) criticized my structural equation models of their data. They also presented a new model that appeared to show incremental predictive validity for implicit racial bias and implicit political orientation. I thought it would be possible to resolve some of the disagreement in a direct and open communication with the authors because the disagreement is about modeling of the same data. I was surprised when the authors declined this offer, given that Bar-Anan coauthored an article that praised the virtues of open scientific communication (Nosek & Bar-Anan, 2012). Readers therefore have to reconcile conflicting viewpoints for themselves. To ensure full transparency, I published syntax, outputs, and a detailed discussion of the different modeling assumptions on OSF at https://osf.io/wsqfb/.

In brief, a comparison of the models shows that mine is more parsimonious and has better fit than their model. Because the model is more parsimonious, better fit cannot be attributed to overfitting of the data. Rather, the model is more consistent with the actual data, which in most sciences is considered a good reason to favor a model. Vianello and Bar-Anan’s model also produced unexplained, surprising results. For example, the race IAT has only a weak positive loading on the IAT method factor, and the political-orientation IAT even has a moderate negative loading. It is not clear how a method can have negative loadings on a method factor, and Vianello and Bar-Anan provided no explanation for this surprising finding.

The two models also produce different results regarding incremental predictive validity (Table 1). My model shows no incremental predictive validity for implicit factors. It is also surprising that Vianello and Bar-Anan found incremental predictive validity for voting behaviors, because the explicit and implicit factors correlated (r) at .9. This high correlation leaves little room for variance in implicit political orientation that is distinct from political orientation measured with self-ratings.

In conclusion, Vianello and Bar-Anan failed to challenge my conclusion that implicit and explicit measures measure mostly the same constructs and that low correlations between explicit and implicit measures reflect measurement error rather than some hidden implicit processes.

Explicit Response 2

The second response (Kurdi et al., 2021) is a confusing 7,000-word article that is short of facts, filled with false claims, and requires more fact-checking than a Trump interview.

False fact 1

The authors begin with the surprising statement that my findings are “not at all incompatible with the way that many social cognition researchers have thought about the construct of (implicit) evaluation” (p. 423). This statement is misleading. For 3 decades, social-cognition researchers have pursued the idea that many social-cognitive processes that guide behavior occur outside of awareness. For example, Nosek et al. (2011) claim “most human cognition occurs outside conscious awareness or conscious control” (p. 152) and go on to claim that IATs “measure something different from self-report” (p. 153). And just last year, Greenwald and Lai (2020) claimed that “in the last 20 years, research on implicit social cognition has established that social judgments and behavior are guided by attitudes and stereotypes of which the actor may lack awareness” (p. 419).

Social psychologists have also been successful in making the term implicit bias a common term in public discussions of social behavior. The second author, Kate Ratliff, is director of Project Implicit, which “has a mission to develop and deliver methods for investigating and applying phenomena of implicit social cognition, including especially phenomena of implicit bias based on age, race, gender or other factors” (Kurdi et al., 2021, p. 431). It is not clear what this statement means if we do not make a distinction between traditional research on prejudice with self-report measures and the agenda of Project Implicit to study implicit biases with IATs. In addition, all three authors have published recent articles that allude to IATs as measures of implicit cognitions.

In a highly cited American Psychologist article, Kurdi and coauthors (2019) claim “in addition to dozens of studies that have established construct validity . . . investigators have asked to what extent, and under what conditions, individual differences in implicit attitudes, stereotypes, and identity are associated with variation in behavior toward individuals as a function of their social group membership” (p. 570). The second author coauthored an article with the claim that “Black participants’ implicit attitudes reflected no ingroup/outgroup preference . . . Black participants’ explicit attitudes reflected an ingroup preference” (Jiang et al., 2019). In 2007, Cunningham wrote that the “distinction between automatic and controlled processes now lies at the heart of several of the most influential models of evaluative processing” (Cunningham & Zelazo, 2007, p. 97). And Cunningham coauthored a review article with the claim that “a variety of tasks have been used to reflect implicit psychopathology associations, with the IAT (Greenwald et al., 1998) used most widely” (Teachman et al., 2019). Finally, many users of IATs assume that they are measuring implicit constructs that are distinct from constructs that are measured with self-ratings. It is therefore a problem for the construct validity of IATs if they lack discriminant validity. At the least, Kurdi et al. fail to explain why anybody should use IATs if they merely measure the same constructs that can be measured with cheaper self-ratings. In short, the question whether IATs and explicit measures reflect the same constructs or different constructs has theoretical and empirical relevance, and lack of discriminant validity is a problem for many theories of implicit cognitions (but see Cunningham & Zelazo, 2007).

False fact 2

A more serious false claim is that I found “high correlations between relatively indirect (automatic) measures of mental content, as indexed by the IAT, and relatively direct (controlled) measures of mental content, as indexed by a variety of self-report scales” (p. 423). Table 2 shows some of the correlations among implicit and explicit measures in Vianello and Bar-Anan’s data. Only one of these correlations meets the standard criterion of a high correlation (i.e., r = .5; Cohen, 1988). The other correlations are small to moderate. These correlations show at best moderate convergent validity and no evidence of discriminant validity (i.e., higher implicit-implicit than implicit-explicit correlations). Similar results have been reported since the first IATs were created (Bosson et al., 2000). For 20 years, IAT researchers have ignored these low correlations and made grand claims about the validity of IATs. Kurdi et al. are doubling down on this misinformation by falsely describing these correlations as high.

False fact 3

The third false claim is that “plenty of evidence in favor of dissociations between direct and indirect measures exists” (p. 428). To support this claim, Kurdi et al. cite a meta-analysis of incremental predictive validity (Kurdi et al., 2019). There are several problems with this claim. First, the meta-analysis corrects only for random measurement error and not systematic measurement error. To the extent that systematic measurement error is present, incremental validity will shrink because explicit and implicit factors are very highly correlated when both sources of error are controlled (Schimmack, 2021). Second, Kurdi et al. fail to mention effect sizes. The meta-analysis suggests that a perfectly reliable IAT would explain about 2% unique variance. However, IATs have only modest reliability. Thus, manifest IAT scores would explain even less unique variance. Finally, even this estimate has to be interpreted with caution because the meta-analysis did not correct for publication bias and included some questionable studies. For example, Phelps et al. (2003) report, among 12 participants, a correlation of .58 between scores on the race IAT and differences in amygdala activation in response to Black and White faces. Assuming 20% valid variance in the IAT scores (Schimmack, 2021), the validation-corrected correlation would be 1.30. In other words, a correlation of .58 is impossible given the low validity of race-IAT scores. It is well known that correlations in functional MRI studies with small samples are not credible (Vul et al., 2009). Moreover, brain activity is not a social behavior. It is therefore unclear why studies like this were included in Kurdi et al.’s (2019) meta-analysis.
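The corrected correlation of 1.30 can be reproduced with a short calculation. This sketch assumes the usual correction for imperfect validity, dividing the observed correlation by the square root of the valid variance, and it treats the amygdala measure as error-free, which is a simplifying assumption; the function name is illustrative.

```python
import math


def validity_corrected_r(observed_r: float, valid_variance: float) -> float:
    """Correlation implied for the construct when only a share of the
    variance in the observed scores is valid (criterion assumed error-free)."""
    if not 0.0 < valid_variance <= 1.0:
        raise ValueError("valid_variance must be in (0, 1]")
    # The validity coefficient is the square root of the valid variance.
    return observed_r / math.sqrt(valid_variance)


corrected = validity_corrected_r(0.58, 0.20)
print(round(corrected, 2))
```

Because a correlation cannot exceed 1, a corrected value above 1 signals that the observed correlation of .58 is too large to be a credible estimate under the stated assumption of 20% valid variance.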

Kurdi et al. also used suicides as an important outcome that can be predicted with suicide and death IATs. They cited two articles to support this claim. Fact checking shows that one article reported a statistically significant result (p = .013; Barnes et al., 2017), whereas the other one did not (p > .50; Glenn et al., 2019). I conducted a meta-analysis of all studies that reported incremental predictive validity of suicide or death IATs. The criterion was suicide attempts in the next 3 to 6 months (Table 3). I found eight studies, but six of them came from a single lab (Matthew K. Nock). Nock was also the first one to report a significant result in an extremely underpowered study that included only two suicide attempts (Nock & Banaji, 2007). Five of the eight studies showed a statistically significant result (63%), but the average observed power to achieve significance was only 42%. This discrepancy suggests the presence of publication bias (Schimmack, 2012). Moreover, significant results are all clustered around .05, and none of the p values meets the stricter criterion of .005 that has been suggested by Nosek and others to claim a discovery (Benjamin et al., 2018). Thus, there is no conclusive evidence to suggest that suicide IATs have incremental predictive validity in the prediction of suicides. This is not surprising because most of the studies were underpowered and unlikely to detect small effects. Moreover, effect sizes are bound to be small because the convergent validity between suicide and death IATs is low (r = .21; Chiurliza et al., 2018), suggesting that most of the variance in these IATs is measurement error.
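The comparison of the success rate with observed power can be illustrated as follows. This is a minimal sketch, assuming observed power is computed by treating the observed z-score as the true effect in a two-sided z-test at alpha = .05; it does not use the Table 3 data, and the function name is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def observed_power(p_two_sided: float, alpha: float = 0.05) -> float:
    """Post-hoc power of a two-sided z-test, treating the observed z as the true effect."""
    z_obs = STD_NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that a replication z ~ N(z_obs, 1) lands beyond either critical value.
    upper_tail = 1.0 - STD_NORMAL.cdf(z_crit - z_obs)
    lower_tail = STD_NORMAL.cdf(-z_crit - z_obs)
    return upper_tail + lower_tail


# Success rate reported in the text: five of eight studies were significant.
success_rate = 5 / 8
print(round(success_rate, 3))

# p = .013 is the significant result cited above; p = .05 marks the significance threshold.
print(round(observed_power(0.013), 2))
print(round(observed_power(0.05), 2))
```

A result that just reaches p = .05 has an observed power of only about 50%, so a set of studies whose significant p-values cluster around .05 cannot plausibly produce significant results most of the time without selection.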

In conclusion, 20 years of research with IATs has produced no credible and replicable evidence that IATs have incremental predictive validity over explicit measures. Even if there is some statistically significant incremental predictive validity, the amount of explained variance may lack practical significance (Kurdi et al., 2019).

False fact 4

Kurdi et al. (2021) object (p. 424) to my claim that “most researchers regard the IAT as a valid measure of enduring attitudes that vary across individuals” (Schimmack, 2021, p. 397). They claim that “the overwhelming theoretical consensus in the community of attitude researchers . . . is that attitudes emerge from an interaction of persons and situations” (p. 425). It is instructive to compare this surprising claim with Cunningham and Zelazo’s (2007) definition of attitudes as “relatively stable ideas about whether something is good or bad” (p. 97). Kurdi and Banaji (2017) wrote that “differences in implicit attitudes . . . may arise because of multiple components, including relatively stable components [emphasis added]” (p. 286). Rae and Greenwald (2017) stated that it is a “widespread assumption . . . that implicit attitudes are characteristics of people, almost certainly more so than a property of situations” (p. 297). Greenwald and Lai (2020) stated that test–retest reliability “places an upper limit on correlational tests of construct validity” (p. 425). This statement makes sense only if we assume that the construct to be measured is stable over the retest interval. It is also not clear how it would be ethical to provide individuals with feedback about their IAT scores on the Project Implicit website, if IAT scores were merely a product of the specific situation at the moment they are taking the test. Finally, how can the suicide IAT be a useful predictor of suicide if it cannot measure some stable dispositions related to suicidal behaviors?

In conclusion, Kurdi et al.’s definition of attitudes is inconsistent with the common definition of attitudes as relatively enduring evaluations. That being said, the more important question is
whether IATs measure stable attitudes or momentary situational effects. Ironically, some of the best evidence comes from Cunningham. Cunningham et al. (2001) repeatedly measured prejudice four times over a 3-month period with multiple measures, including the race IAT. Cunningham et al. (2001) modeled the data with a single trait factor that explained all of the covariation among different measures of racial attitudes. Thus, Cunningham et al. (2001) provided first evidence that most of the valid variance in race IAT scores is perfectly stable over a 3-month period and that person-by-situation interactions had no effect on racial attitudes. There have been few longitudinal studies with IATs since Cunningham et al.’s (2001) seminal study. However, last year, an article examined stability over a 6-year interval (Onyeador et al., 2020). Racial attitudes of more than 3,000 medical students were measured in the first year of medical school, the fourth year of medical school, and the second year of medical residency.
Table 4 shows the correlations for the explicit feeling thermometer and the IAT scores. The first observation is that the Time-1-to-Time-3 correlation for the IAT scores is not smaller than the Time-1-to-Time-2 or the Time-2-to-Time-3 correlations. This pattern shows that a single trait factor can capture the shared variance among the repeated IAT measures. The second observation is that the bold correlations between explicit ratings and IAT scores on the same occasion are only slightly higher than the correlations for different measurement occasions. This finding shows that there is very little occasion-specific variance in racial attitudes. The third observation is that IAT correlations over time are higher than the corresponding FT-IAT correlations over time. This finding points to IAT-specific method variance that is revealed in studies with multiple implicit measures (Cunningham et al., 2001; Schimmack, 2021). These findings extend Cunningham et al.’s (2001) findings to a 6-year period and show that most of the valid variance in race IAT scores is stable over long periods of time.
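
The logic behind the first observation can be illustrated with a small simulation. The sketch below is not based on the Onyeador et al. (2020) data; it uses simulated scores, and the function names, the sample size, the noise level, and the autoregressive parameter phi are all assumptions chosen for illustration. It contrasts a pure trait model (a stable person factor plus occasion-specific noise) with a first-order autoregressive model in which the underlying attitude drifts from wave to wave. Under the trait model, the Time-1-to-Time-3 correlation is expected to equal the adjacent-wave correlations; under the autoregressive model, it is expected to be clearly smaller.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_trait(n, rng, noise_sd=1.0):
    """Three waves: the same stable trait plus independent noise at each wave."""
    waves = [[], [], []]
    for _ in range(n):
        trait = rng.gauss(0, 1)
        for wave in waves:
            wave.append(trait + rng.gauss(0, noise_sd))
    return waves


def simulate_autoregressive(n, rng, phi=0.7, noise_sd=1.0):
    """Three waves: the latent state drifts between waves (hypothetical AR(1), phi assumed)."""
    waves = [[], [], []]
    for _ in range(n):
        state = rng.gauss(0, 1)
        for i, wave in enumerate(waves):
            if i > 0:
                # Scale the innovation so the latent state keeps unit variance.
                state = phi * state + sqrt(1 - phi ** 2) * rng.gauss(0, 1)
            wave.append(state + rng.gauss(0, noise_sd))
    return waves


def lag_correlations(waves):
    """Correlations between adjacent waves (r12, r23) and across two lags (r13)."""
    t1, t2, t3 = waves
    return {"r12": pearson(t1, t2), "r23": pearson(t2, t3), "r13": pearson(t1, t3)}


rng = random.Random(2021)  # fixed seed so the illustration is reproducible
trait_r = lag_correlations(simulate_trait(20000, rng))
ar_r = lag_correlations(simulate_autoregressive(20000, rng))

print("trait model:         ", {k: round(v, 2) for k, v in trait_r.items()})
print("autoregressive model:", {k: round(v, 2) for k, v in ar_r.items()})
```

With these assumed parameters, all three correlations in the trait model have the same expected value (.50), whereas in the autoregressive model the expected Time-1-to-Time-3 correlation (about .25) falls below the adjacent-wave correlations (about .35). A Time-1-to-Time-3 correlation that is not smaller than the adjacent-wave correlations is therefore the pattern a trait model predicts.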

In conclusion, Kurdi et al.’s claims about person-by-situation effects are not supported by evidence.

Conclusion

Like presidential debates, the commentaries and my response present radically different views of reality. In one world, IATs are valid and useful tools that have led to countless new insights into human behavior. In the other world, IATs are noisy measures that add nothing to the information we already get from cheaper self-reports. Readers not well versed in the literature are likely to be confused rather than informed by these conflicting accounts. Although we may expect such vehement disagreement in politics, we should not expect it among scientists.
A common view of scientists is that they are able to resolve disagreement by carefully looking at data and drawing logical conclusions from empirical facts. However, this model of scientists is naive and wrong.

A major source of disagreement among psychologists is that psychology lacks an overarching paradigm; that is, a set of fundamentally shared assumptions and facts. Psychology does not have one paradigm, but many paradigms. The IAT was developed within the implicit social-cognition paradigm that gained influence in the 1990s (Bargh et al., 1996; Greenwald & Banaji, 1995; Nosek et al., 2011). Over the past decade, it has become apparent that the empirical foundations of this paradigm are shaky (Doyen et al., 2012; D. Kahneman quoted in Yong, 2012, Supplemental Material; Schimmack, 2020). It took a long time to see the problems because paradigms are like prisons that make it impossible to see the world from the outside. A key force that prevents researchers within a paradigm from noticing problems is publication bias. Publication bias ensures that studies that are consistent with a paradigm are published, cited, and highlighted in review articles to provide false evidence in support of a paradigm (Greenwald & Lai, 2020; Kurdi et al., 2021).

Over the past decade, it has become apparent how pervasive these biases have been, especially in social psychology (Schimmack, 2020). The responses to my critique of IATs merely confirm how powerful paradigms and conflicts of interest can be. It is therefore necessary to allocate more resources to validation projects by independent researchers. In addition, validation studies should be preregistered and properly powered, and results need to be published whether they show validity or not. Conducting validation studies of widely used measures could be an important role for the emerging field of meta-psychology that is not focused on new discoveries, but rather on evaluating paradigmatic research from an outsider, meta-perspective (Carlsson et al., 2017). Viewed from this perspective, many IATs that are in use lack credible evidence of construct validity.

References
*References marked with an asterisk report studies included in the suicide IAT meta-analysis.

Banaji, M. R., & Greenwald, A. G. (2013). Blindspot: Hidden
biases of good people. Delacorte Press.

Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity
of social behavior: Direct effects of trait construct and
stereotype activation on action. Journal of Personality
and Social Psychology, 71(2), 230–244.
https://doi.org/10.1037/0022-3514.71.2.230

*Barnes, S. M., Bahraini, N. H., Forster, J. E., Stearns-Yoder, K. A.,
Hostetter, T. A., Smith, G., Nagamoto, H. T., & Nock,
M. K. (2017). Moving beyond self-report: Implicit associations
about death/life prospectively predict suicidal
behavior among veterans. Suicide and Life-Threatening
Behavior, 47, 67–77. https://doi.org/10.1111/sltb.12265

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A.,
Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B.,
Brown, L., Camerer, C., Cesarini, D., Chambers, C. D.,
Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber,
A., Easwaran, K., Efferson, C., . . . Johnson, V. E. (2018).
Redefine statistical significance. Nature Human Behaviour,
2, 6–10.

Bosson, J. K., Swann, W. B., Jr., & Pennebaker, J. W. (2000).
Stalking the perfect measure of implicit self-esteem:
The blind men and the elephant revisited? Journal of
Personality and Social Psychology, 79, 631–643.
https://doi.org/10.1037/0022-3514.79.4.631

Buhrmester, M. D., Blanton, H., & Swann, W. B., Jr. (2011).
Implicit self-esteem: Nature, measurement, and a new way
forward. Journal of Personality and Social Psychology,
100(2), 365–385. https://doi.org/10.1037/a0021341

Carlsson, R., Danielsson, H., Heene, M., Innes-Ker, Å., Lakens,
D., Schimmack, U., Schönbrodt, F. D., van Assen, M., &
Weinstein, Y. (2017). Inaugural editorial of Meta-Psychology.
Meta-Psychology, 1. https://doi.org/10.15626/MP2017.1001

Chiurliza, B., Hagan, C. R., Rogers, M. L., Podlogar, M. C., Hom,
M. A., Stanley, I. H., & Joiner, T. E. (2018). Implicit measures
of suicide risk in a military sample. Assessment, 25(5),
667–676. https://doi.org/10.1177/1073191116676363

Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd ed.). Erlbaum.

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001).
Implicit attitude measures: Consistency, stability, and
convergent validity. Psychological Science, 12(2), 163–170.
https://doi.org/10.1111/1467-9280.00328

Cunningham, W. A., & Zelazo, P. D. (2007). Attitudes and
evaluations: A social cognitive neuroscience perspective.
Trends in Cognitive Sciences, 11, 97–104.
https://doi.org/10.1016/j.tics.2006.12.005

Cvencek, D., Meltzoff, A. N., Maddox, C. D., Nosek, B. A.,
Rudman, L. A., Devos, T., Dunham, Y., Baron, A. S.,
Steffens, M. C., Lane, K., Horcajo, J., Ashburn-Nardo, L.,
Quinby, A., Srivastava, S. B., Schmidt, K., Aidman, E.,
Tang, E., Farnham, S., Mellott, D. S., . . . Greenwald, A. G.
(2020). Meta-analytic use of balanced identity theory to
validate the Implicit Association Test. Personality and
Social Psychology Bulletin, 47(2), 185–200.
https://doi.org/10.1177/0146167220916631

Doyen, S., Klein, O., Pichon, C. L., & Cleeremans, A. (2012).
Behavioral priming: It’s all in the mind, but whose mind?
PLOS ONE, 7(1), Article e29081.
https://doi.org/10.1371/journal.pone.0029081

Falk, C. F., Heine, S. J., Takemura, K., Zhang, C. X., & Hsu,
C. (2015). Are implicit self-esteem measures valid for
assessing individual and cultural differences? Journal of
Personality, 83, 56–68. https://doi.org/10.1111/jopy.12082

*Glenn, C. R., Millner, A. J., Esposito, E. C., Porter, A. C.,
& Nock, M. K. (2019). Implicit identification with death
predicts suicidal thoughts and behaviors in adolescents.
Journal of Clinical Child & Adolescent Psychology, 48,
263–272. https://doi.org/10.1080/15374416.2018.1528548

Greenwald, A. G., & Banaji, M. R. (1995). Implicit social cognition:
Attitudes, self-esteem, and stereotypes. Psychological
Review, 102(1), 4–27.
https://doi.org/10.1037/0033-295X.102.1.4

Greenwald, A. G., & Farnham, S. D. (2000). Using the Implicit
Association Test to measure self-esteem and self-concept.
Journal of Personality and Social Psychology, 79, 1022–1038.
https://doi.org/10.1037/0022-3514.79.6.1022

Greenwald, A. G., & Lai, C. K. (2020). Implicit social cognition.
Annual Review of Psychology, 71, 419–445.
https://doi.org/10.1146/annurev-psych-010419-050837

Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998).
Measuring individual differences in implicit cognition:
The Implicit Association Test. Journal of Personality and
Social Psychology, 74, 1464–1480.

*Harrison, D. P., Stritzke, W. G. K., Fay, N., & Hudaib, A.-R.
(2018). Suicide risk assessment: Trust an implicit probe
or listen to the patient? Psychological Assessment, 30(10),
1317–1329. https://doi.org/10.1037/pas0000577

Jiang, C., Vitiello, C., Axt, J. R., Campbell, J. T., & Ratliff, K. A.
(2019). An examination of ingroup preferences among
people with multiple socially stigmatized identities. Self
and Identity. Advance online publication.
https://doi.org/10.1080/15298868.2019.1657937

Kurdi, B., & Banaji, M. R. (2017). Reports of the death of
the individual difference approach to implicit social cognition
may be greatly exaggerated: A commentary on Payne,
Vuletich, and Lundberg. Psychological Inquiry, 28,
281–287. https://doi.org/10.1080/1047840X.2017.1373555

Kurdi, B., Ratliff, K. A., & Cunningham, W. A. (2021). Can
the Implicit Association Test serve as a valid measure of
automatic cognition? A response to Schimmack (2021).
Perspectives on Psychological Science, 16(2), 422–434.
https://doi.org/10.1177/1745691620904080

Kurdi, B., Seitchik, A. E., Axt, J. R., Carroll, T. J., Karapetyan,
A., Kaushik, N., Tomezsko, D., Greenwald, A. G., &
Banaji, M. R. (2019). Relationship between the Implicit
Association Test and intergroup behavior: A meta-analysis.
American Psychologist, 74(5), 569–586.
https://doi.org/10.1037/amp0000364

*Millner, A. J., Augenstein, T. M., Visser, K. H., Gallagher, K.,
Vergara, G. A., D’Angelo, E. J., & Nock, M. K. (2019). Implicit
cognitions as a behavioral marker of suicide attempts in
adolescents. Archives of Suicide Research, 23(1), 47–63.
https://doi.org/10.1080/13811118.2017.1421488

*Nock, M. K., & Banaji, M. R. (2007). Prediction of suicide ideation
and attempts among adolescents using a brief performance-based
test. Journal of Consulting and Clinical Psychology, 75(5), 707–715.
https://doi.org/10.1037/0022-006X.75.5.707

*Nock, M. K., Park, J. M., Finn, C. T., Deliberto, T. L.,
Dour, H. J., & Banaji, M. R. (2010). Measuring the suicidal
mind: Implicit cognition predicts suicidal behavior.
Psychological Science, 21(4), 511–517.
https://doi.org/10.1177/0956797610364762

Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening
scientific communication. Psychological Inquiry, 23(3),
217–243. https://doi.org/10.1080/1047840X.2012.692215

Nosek, B. A., Hawkins, C. B., & Frazier, R. S. (2011). Implicit
social cognition: From measures to mechanisms. Trends
in Cognitive Sciences, 15(4), 152–159.
https://doi.org/10.1016/j.tics.2011.01.005

Onyeador, I. N., Wittlin, N. M., Burke, S. E., Dovidio, J. F.,
Perry, S. P., Hardeman, R. R., Dyrbye, L. N., Herrin, J.,
Phelan, S. M., & van Ryn, M. (2020). The value of interracial
contact for reducing anti-Black bias among non-Black
physicians: A Cognitive Habits and Growth Evaluation
(CHANGE) study report. Psychological Science, 31(1),
18–30. https://doi.org/10.1177/0956797619879139

Phelps, E. A., Cannistraci, C. J., & Cunningham, W. A. (2003).
Intact performance on an indirect measure of race bias
following amygdala damage. Neuropsychologia, 41(2),
203–208. https://doi.org/10.1016/s0028-3932(02)00150-1

Rae, J. R., & Greenwald, A. G. (2017). Persons or situations?
Individual differences explain variance in aggregated
implicit race attitudes. Psychological Inquiry, 28, 297–300.
https://doi.org/10.1080/1047840X.2017.1373548

*Randall, J. R., Rowe, B. H., Dong, K. A., Nock, M. K., &
Colman, I. (2013). Assessment of self-harm risk using
implicit thoughts. Psychological Assessment, 25(3), 714–721.
https://doi.org/10.1037/a0032391

Schimmack, U. (2012). The ironic effect of significant results
on the credibility of multiple-study articles. Psychological
Methods, 17(4), 551–566. https://doi.org/10.1037/a0029487

Schimmack, U. (2020). A meta-psychological perspective on
the decade of replication failures in social psychology.
Canadian Psychology/Psychologie canadienne, 61(4),
364–376. https://doi.org/10.1037/cap0000246

Schimmack, U. (2021). The Implicit Association Test: A method
in search of a construct. Perspectives on Psychological Science, 16(2), 396–414. https://doi.org/10.1177/1745691619863798

Teachman, B. A., Clerkin, E. M., Cunningham, W. A., Dreyer-
Oren, S., & Werntz, A. (2019). Implicit cognition and
psychopathology: Looking back and looking forward.
Annual Review of Clinical Psychology, 15, 123–148.
https://doi.org/10.1146/annurev-clinpsy-050718-095718

*Tello, N., Harika-Germaneau, G., Serra, W., Jaafari, N., &
Chatard, A. (2020). Forecasting a fatal decision: Direct
replication of the predictive validity of the Suicide–
Implicit Association Test. Psychological Science, 31(1),
65–74. https://doi.org/10.1177/0956797619893062

Vianello, M., & Bar-Anan, Y. (2021). Can the Implicit Association
Test measure automatic judgment? The validation continues.
Perspectives on Psychological Science, 16(2), 415–421.
https://doi.org/10.1177/1745691619897960

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009).
Puzzlingly high correlations in fMRI studies of emotion,
personality, and social cognition. Perspectives on
Psychological Science, 4(3), 274–290.
https://doi.org/10.1111/j.1745-6924.2009.01125.x

Walker, S. S., & Schimmack, U. (2008). Validity of a happiness
implicit association test as a measure of subjective wellbeing.
Journal of Research in Personality, 42, 490–497.
https://doi.org/10.1016/j.jrp.2007.07.005

Yong, E. (2012, October 12). Nobel laureate challenges
psychologists to clean up their act. Nature.
https://doi.org/10.1038/nature.2012.11535