Behavioral or social priming is the poster child of the replication crisis. Bargh declined Kahneman's offer to rehabilitate the field by showing that he could replicate his own results. The same is true for other social psychologists who became famous for changing people's behavior without their knowledge or awareness.
While priming studies have largely disappeared from social psychology journals, the old studies still exist and have not been retracted, even though we know they were obtained with unscientific methods that make it easy to present false evidence for false claims.
This allows unscientific meta-analysts to analyze these data and to continue to claim that social priming is a real effect, even in 2023 and even in a journal that is considered methodologically rigorous, although its name suggests otherwise (Psych Bull).

Dai, W., Yang, T., White, B. X., Palmer, R., Sanders, E. K., McDonald, J. A., Leung, M., & Albarracín, D. (2023). Priming behavior: A meta-analysis of the effects of behavioral and nonbehavioral primes on overt behavioral outcomes. Psychological Bulletin, 149(1-2), 67–98. https://doi.org/10.1037/bul0000374
In the old days, the social power of censorship (peer review) in scientific journals made it difficult to fight back against paradigm prisoners who try to keep their little world of academia alive. Fortunately, consumers of scientific research now have options to ensure that they are not feeding their minds junk science.
I asked my ChatGPT (trained with prior discussions and information) to evaluate this article with a set of questions that anybody could ask an AI.
Critically evaluate this article.
1. Is the literature review accurate, or does it misrepresent the broader literature? (0 = totally false to 10 = totally accurate and consistent with the current literature)
2. Is this meta-analysis up to date, or does it use outdated methods? (0 = totally outdated to 10 = totally up to date)
3. Are the results interpreted correctly? (0 = totally misrepresents the results to 10 = totally accurate)
4. Is the work integrated with the existing literature? (0 = not integrated at all to 10 = accurately integrated in the context of the broader literature)
The scores are less important than the reasons for the scores (the scores alone would require comparison scores to be meaningful). Importantly, AI is more neutral and balanced than most humans, myself included, usually are.
Strengths
- The paper integrates its moderator findings into established theoretical disputes (perception–behavior link vs goal mediation), and it connects specific moderation patterns (e.g., goal value manipulations) to prior work and theoretical interpretations.
- It frames its contribution clearly as both (a) an expanded synthesis and (b) a mechanism-relevant test across moderators.
Limits
- Integration is strongest for the authors' preferred theoretical framing; it is thinner on integrating the skeptical/critical meta-science side beyond noting that bias methods have "become ever more sophisticated" and providing the Lakens critique as a prompt. In other words, it engages critique more as a motivation for adding diagnostics than as a serious alternative interpretation of what the literature's apparent "robustness" means.
I did ask a couple of follow-up questions, mainly about the well-known problem that average effect sizes are meaningless when studies are heterogeneous: priming sometimes works and sometimes it does not. To make priming research actually useful, we need to specify WHEN priming works. "Priming MAY work" or "priming SOMETIMES works" is not a scientific finding.
On this point, the meta-analysis is as silent as most others, but the paper will be cited as showing that "priming works and has a moderate effect size." Only psychologists believe that this is enough to defend priming against critics who ask, "If priming works, why don't you do a preregistered study with high power and show us that it works?"
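To make the heterogeneity problem concrete, here is a minimal simulation (the numbers are hypothetical, not taken from the paper). If an unknown moderator determines whether a study "works," the average effect size describes no actual study:

```python
# Minimal sketch: a "moderate" average effect can emerge from a mixture
# in which no single study has a moderate effect. Hypothetical numbers.
import numpy as np

rng = np.random.default_rng(0)
k = 100
works = rng.random(k) < 0.5                  # unknown moderator: half the studies "work"
true_d = np.where(works, 0.5, 0.0)           # d = .5 when priming works, d = 0 when it does not
observed_d = true_d + rng.normal(0, 0.1, k)  # add sampling error

print(f"average d = {observed_d.mean():.2f}")             # ~ .25, true of no single study
print(f"SD of true effects (tau) = {true_d.std():.2f}")   # heterogeneity as large as the mean
```

Without a moderator that separates the two kinds of studies, an average of ~.25 tells us nothing about whether the next study will find d = .5 or d = 0.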
ChatGPT Summary: Does This Meta-Analysis Tell Us When Priming Works?
The authors present an ambitious meta-analysis of priming effects on behavior and argue that replication failures can be reconciled with a “moderate” average effect size by appealing to heterogeneity and moderators. In principle, this is a legitimate defense: heterogeneous effects can exist even when many individual studies fail to replicate. In practice, however, the paper does not deliver what this defense requires.
Although substantial heterogeneity is documented, the moderator analyses do not clearly identify which studies work and which do not. Most examined moderators are weak, inconsistent, or explain only a small fraction of the variance. As a result, the analysis does not partition the literature into a subset of theoretically coherent, reliably effective studies versus a subset that fails. Without such discrimination, heterogeneity becomes descriptive rather than explanatory.
This creates a central tension in the paper’s conclusions. The authors argue for a nonzero average priming effect while simultaneously acknowledging widespread replication failures and publication bias. Yet they do not show that the moderators they test successfully predict when priming should occur. Consequently, the results do not support a practical or falsifiable claim about replicability. A new study cannot be classified, based on their findings, as likely to succeed or fail.
The strongest conclusion supported by the evidence is therefore modest: priming effects may exist under some conditions, but those conditions are not empirically established by the present meta-analysis. Claims of “moderate” effects should be interpreted as population averages over a poorly understood mixture of successful and unsuccessful studies, rather than as evidence that priming reliably works in practice.
In short, the paper shows that priming has not been decisively ruled out—but it does not show when, how, or how often it actually works.
POSTSCRIPT

PET allows for an effect size of d = 0.025. However, because this estimate rejects H0 (d = 0) at the 5% level, the authors switch to the PEESE estimate, which is always higher. If they had used alpha = .01 or a minimum effect size of d = .1, PET would not have been rejected, and the average effect size estimate would be close to zero. Did the authors p-hack PET to be significant? It does not matter. The evidence is weak and the main problem remains: heterogeneity without a clear moderator that explains when priming works and when it does not. This is psycho-science, not real science.
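For readers unfamiliar with the procedure, here is a minimal sketch of the conditional PET-PEESE rule with simulated data (hypothetical numbers, not the paper's); the alpha line marks exactly the analytic choice at issue:

```python
# Minimal sketch of the conditional PET-PEESE rule. Data are simulated;
# nothing here reproduces the meta-analysis's actual estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
k = 50
se = rng.uniform(0.05, 0.40, k)      # standard errors of k studies
d = 0.02 + rng.normal(0, 1, k) * se  # tiny true effect plus sampling error

# PET: weighted regression of effect sizes on standard errors;
# the intercept estimates the effect of an ideal study with SE = 0.
pet = sm.WLS(d, sm.add_constant(se), weights=1 / se**2).fit()

# Conditional rule: if the PET intercept rejects H0 (d = 0) at alpha,
# switch to PEESE (regression on SE^2), which typically yields a higher estimate.
alpha = 0.05  # the criticized choice; with alpha = .01 the rule would keep PET here
if pet.pvalues[0] < alpha:
    est = sm.WLS(d, sm.add_constant(se**2), weights=1 / se**2).fit().params[0]
else:
    est = pet.params[0]

print(f"PET intercept = {pet.params[0]:.3f}, p = {pet.pvalues[0]:.3f}")
print(f"reported estimate = {est:.3f}")
```

The point: the headline average hinges on a single significance test of the PET intercept. Change alpha (or require a minimum effect size) and the reported estimate drops from the PEESE value to a PET value near zero.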
PPS
Priming may work, but NOT when a study is preregistered. (hm?)
