Project Implicit’s IAT feedback provides users with categorical interpretations of their supposed implicit (outside of awareness) attitudes or mental-health-relevant traits (e.g., prejudice, depression), using labels such as “slight,” “moderate,” or “strong.” These labels create the impression of diagnostic, individualized insight. However, after more than 20 years of research, there is no evidence that the IAT can reveal attitudes, biases, or mental states that individuals cannot already access through introspection or self-reflection. In the post below, I explain why this feedback is misleading, why the accompanying disclaimers do not solve the problem, and why scientifically honest reporting of measurement error—such as providing confidence intervals—would make clear that the point estimates are too imprecise to support meaningful individual-level interpretation. Despite this, such transparency is not provided to the trusting users of the test.
I have repeatedly voiced concerns about the invalid feedback provided on the Project Implicit website, to no avail. Below my evaluation of Project Implicit’s practices, you can find a response from Project Implicit and a neutral evaluation of that response by ChatGPT.
The Implicit Association Test Does Not Know You Better Than You Know Yourself
1. Information users may receive from common AI searches
Individuals who search questions such as “Can I trust my IAT results?” or “Is my Project Implicit feedback accurate?” typically encounter reassuring and optimistic answers. These answers often state that the IAT provides “insight into automatic associations,” is “widely used in research,” and offers “useful information for self-reflection,” while acknowledging general limitations. The overall tone tends to frame the results as meaningfully informative, even if not perfect.
2. Why this information is misleadingly optimistic
Although such descriptions are factually correct in a narrow sense (the IAT is widely used and has inspired research), they give users an inflated impression of the test’s ability to reveal something meaningful about their personal implicit attitudes. The scientific literature shows that individual IAT scores have low reliability, substantial variability, and weak predictive validity for real behavior. Users are rarely made aware that their specific score may have little practical or psychological meaning.
3. The existence of disclaimers on the Project Implicit website
Project Implicit does include disclaimers noting that the IAT is not designed to diagnose individuals, that results should not be used in hiring or decision making, that scores fluctuate, and that interpretation is uncertain. These disclaimers, in principle, acknowledge the limitations of the feedback.
4. Why these disclaimers are ineffective for lay audiences
However, these warnings are not presented at the point where users receive their results. Instead, they appear in supplementary FAQ pages or linked sections that many users never see. Even when read, the language is technical and abstract, making it difficult for non-experts to understand the practical implications. Most users will focus on the categorical labels in the feedback (“slight preference,” “moderate preference”), which convey a much stronger sense of precision than the disclaimers imply.
5. A more transparent approach: report estimates with margins of error
A scientifically standard way to present uncertain measurements is to provide a point estimate together with an appropriately calculated margin of error, such as a confidence interval. This is routine in other domains, such as opinion polling. Presenting IAT scores with confidence intervals would give users a realistic sense of the precision—or imprecision—of their result and discourage overinterpretation.
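To make this concrete, here is a minimal sketch (in Python) of how a confidence interval for an observed IAT D-score could be computed from the classical-test-theory standard error of measurement, SEM = SD × sqrt(1 − reliability). The reliability of .50 and the between-person SD of .40 are illustrative values chosen for the example, not figures reported by Project Implicit, and the function is hypothetical rather than part of any existing scoring pipeline.

```python
import math

def iat_confidence_interval(d_score, reliability, sd, z=1.96):
    """Return an approximate 95% confidence interval for an observed IAT D-score,
    using the classical-test-theory standard error of measurement:
    SEM = SD * sqrt(1 - reliability)."""
    sem = sd * math.sqrt(1 - reliability)
    margin = z * sem
    return d_score - margin, d_score + margin

# Illustrative assumptions for this sketch (not Project Implicit's figures):
# test-retest reliability of .50 and a between-person SD of .40.
low, high = iat_confidence_interval(d_score=0.35, reliability=0.50, sd=0.40)
print(f"D = 0.35, 95% CI [{low:.2f}, {high:.2f}]")  # roughly [-0.20, 0.90]
```

Even under these assumptions, the interval around a point estimate that would be labeled a “moderate preference” extends well past zero, which is exactly the kind of information a margin of error is supposed to convey to users.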
6. Why Project Implicit likely avoids presenting uncertainty intervals
If Project Implicit were to display confidence intervals that incorporate both random measurement error and systematic sources of uncertainty, the resulting intervals would be large enough to span multiple interpretation categories and, in many cases, even include the opposite direction of effect. This would make it clear that the point estimate is too imprecise to support meaningful individual-level interpretation. Such transparency would be scientifically appropriate but would also undermine the intuitive, categorical feedback that many users currently believe provides insight.
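The sketch below illustrates this point by mapping the endpoints of the interval from the previous example onto feedback categories. The cutoffs (|D| of .15, .35, and .65 for slight, moderate, and strong) are the commonly cited scoring conventions, used here as an assumption for the demonstration rather than as Project Implicit’s exact rules.

```python
def iat_category(d):
    """Map a D-score to a feedback label using commonly cited cutoffs
    (|D| >= .15 slight, >= .35 moderate, >= .65 strong).
    Treated here as an assumption, not Project Implicit's exact rules."""
    magnitude = abs(d)
    direction = "one direction" if d > 0 else "the other direction"
    if magnitude < 0.15:
        return "little or no preference"
    if magnitude < 0.35:
        return f"slight preference ({direction})"
    if magnitude < 0.65:
        return f"moderate preference ({direction})"
    return f"strong preference ({direction})"

# The illustrative 95% CI from above, [-0.20, 0.90], around a point
# estimate of 0.35: the interval spans categories ranging from a slight
# preference in the opposite direction to a strong preference.
for d in (-0.20, 0.35, 0.90):
    print(f"D = {d:+.2f} -> {iat_category(d)}")
```

A user who is shown only the label “moderate preference” has no way of knowing that the same data are also consistent with a slight preference in the opposite direction.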
Response from Project Implicit
Hello Professor Schimmack,
Thank you for your message. We appreciate you taking the time to share your concerns about the demonstration site. The folks at UVA asked me to follow up with you regarding your message.
Since July 2022, when you last contacted Project Implicit about your concerns, a committee of researchers from Project Implicit’s Scientific Advisory Board (SAB), the entity that manages the demonstration site, has updated the Frequently Asked Questions, the Ethical Considerations, as well as the debriefing shared with the participants upon completion of the IAT with more explanation about the measure and how feedback can be interpreted. Those changes were reviewed and approved by the other six members of the SAB.
Regarding the snippet of the disclaimer you included in your original message, if you read the full disclaimer, you’ll see that we directly address the validity of the suggested interpretations. Quoting from the screenshot below: “However, these Universities, as well as the individual researchers who have contributed to this site, make no claim for the validity of these suggested interpretations.” The sentence that follows then suggests that if a participant is unprepared to encounter interpretations they might find objectionable, they should not proceed.
While we recognize that there’s always room for improvement, we are confident that Project Implicit is appropriately communicating the complexity of this particular measure to the public. We are not hiding from scientific disagreement. Rather, we appreciate but disagree with your assessment of insufficient or misleading feedback.
Thank you for taking the time to contact us again about your concerns.
I hope you have a productive fall semester!
Amy Jin Johnson (she/her)
Executive Director
Project Implicit, Inc.
ChatGPT Evaluation of the Response
1. Tone and structure of the response
The tone is polite, institutional, and designed to convey:
- appreciation of your concerns
- confidence in their process
- reassurance that updates have been made
- the suggestion that your perspective has been “heard” but ultimately rejected
This is standard for organizations seeking to maintain public-facing legitimacy while minimizing controversy.
It is, however, a non-response to the core scientific issues you raised.
2. What they emphasize — and why it is strategically chosen
They emphasize:
- edits to FAQ, ethical considerations, and debriefing
- review by their own Scientific Advisory Board
- existence of a disclaimer stating they “make no claim for validity”
- that participants are told not to proceed if they may find interpretations objectionable
- that they “aren’t hiding from scientific disagreement”
- that they “disagree with your assessment”
Why these points are strategic:
These are process-based defenses, not substantive scientific defenses.
They redirect the issue away from:
- reliability
- validity
- measurement error
- misleading categorical feedback
- failure to display uncertainty
- known instability of individual IAT scores
- whether point estimates mean anything at all
and instead highlight:
- they made changes
- their committee approved the changes
- a disclaimer exists
This is a classic institutional move: frame the problem as communication style, not measurement validity.
3. What they omit — and why that matters
The response does not address the core scientific issue that:
Providing a single point estimate with categorical labels implies validity that the test does not have.
They do not respond to your actual concerns:
Your concern 1:
“Feedback is misleadingly precise and suggests diagnostic value.”
→ They do not address this.
Your concern 2:
“Disclaimers are hidden, ambiguous, and not placed at the point of interpretation.”
→ They do not dispute this.
Your concern 3:
“The IAT lacks sufficient reliability and validity for individual-level interpretation.”
→ They avoid the psychometric argument entirely.
Your concern 4:
“A scientifically honest presentation would include confidence intervals.”
→ They do not address this at all.
Your concern 5:
“Confidence intervals would reveal that the feedback is not meaningful.”
→ Silence.
Their “we disagree” is not accompanied by counterevidence, counterarguments, or any engagement with data.
4. Their central claim is rhetorical, not scientific
Their strongest claim in the email is:
“we are confident that Project Implicit is appropriately communicating the complexity…
We appreciate but disagree with your assessment of insufficient or misleading feedback.”
This is not a scientific argument.
It is an assertion of institutional confidence.
It avoids:
- citing reliability coefficients
- citing predictive validity
- addressing confidence intervals
- addressing test–retest instability
- addressing the meaningless nature of a single IAT score
- addressing the mismatch between disclaimers and feedback
- explaining why they choose not to display uncertainty
Thus:
They did not rebut your argument — they simply rejected it.
5. Their key defense is the phrase: “we make no claim for the validity of these suggested interpretations.”
This is a very weak defense.
Why?
Because:
- It is hidden in an FAQ / secondary page, not shown at the moment of feedback.
- The public-facing feedback contradicts this disclaimer by presenting categorical labels as if valid.
- The disclaimer is written in vague, technical language, and does not convey practical meaning to lay users.
- In practice, users overwhelmingly interpret the categories as indicative of something real.
This is equivalent to:
“We don’t claim the interpretations are valid —
but we will continue presenting them as if they are.”
From a research ethics perspective, this is deeply problematic.
6. Their solution (editing FAQs) does not solve the actual problem
Updating the FAQ, ethical guidelines, or debriefing:
- does not make the test more reliable
- does not make individual scores interpretable
- does not make the feedback scientifically meaningful
- does not solve the mismatch between claims and evidence
Improving the fine print does not change the false sense of precision created by the categorical labels.
This is what the organization is avoiding.
7. Why they cannot address your real concern
If they acknowledged that:
- real confidence intervals would be huge
- individual scores are not meaningful
- reliability is too low to classify individuals
then:
Project Implicit could no longer offer personalized feedback — which is the entire draw of the website.
Personalized feedback:
- is psychologically compelling
- is media friendly
- seems to teach “something”
- drives engagement
- supports training industries
Removing it would undermine the project.
Thus, the avoidance in their response is structural, not accidental.
8. Neutral conclusion
Your email raises scientific concerns.
Their response gives institutional reassurances.
They neither contest your psychometric arguments nor provide evidence against them.
Instead, they:
- point to procedural updates
- invoke disclaimers
- assert confidence
- decline to engage substantively with measurement error, reliability, validity, or uncertainty reporting