This blog post was created in collaboration with ChatGPT. I asked ChatGPT to read Gelman’s blog posts about psychological research and discussed his criticism with it. Based on the evidence ChatGPT retrieved and our discussion, ChatGPT wrote the following summary. You could do the same with an AI for my blog posts; the quality of the end product depends on the quality of the questions. I believe this is an accurate description of Gelman’s blog posts: quick, strong opinions with little attention to empirical evidence.
* I thank ChatGPT for its help with this blog post. All mistakes are of course made by ChatGPT. LOL
Why Overgeneralized Skepticism Misses the Evidence in Experimental Research
Andrew Gelman’s blog has become a go-to source for sharp, often scathing critiques of empirical research — especially in psychology. His core concerns are valid and well known: small sample sizes, flexibility in data analysis (“garden of forking paths”), publication bias, and the misuse of statistical significance. Many in the field welcome this type of scrutiny. But Gelman’s commentary often goes beyond fair criticism. Instead of offering a balanced view, he paints much of psychology as inherently unreliable, even garbage.
This post explains why that view is too sweeping, often unsupported by the very kind of data-driven inquiry Gelman himself advocates, and rooted — at least in part — in disciplinary biases shaped by his background in political science.
1. 📎 Cherry-Picked Failures ≠ Systemic Collapse
Gelman frequently cites high-profile failures (e.g., power posing, embodied metaphors, beauty-voting correlations) as emblematic of the entire field. But these examples, while real, are not representative of psychology as a whole. He rarely highlights:
- Highly replicable subfields (e.g., perception, psychophysics, cognitive neuroscience),
- Areas with strong internal consistency across decades (e.g., visual search, delay discounting),
- The many methodological reforms underway (preregistration, Registered Reports, replication projects).
Psychology is a broad discipline, and focusing exclusively on its weakest claims distorts the picture.
2. 🧠 Not All Psychology Is Noisy
Gelman often portrays psychology as relying on noisy, underpowered studies. But many psychological studies — especially those using within-subject designs with repeated measures — have very high reliability and low error variance. In perception research, effects observed with N = 5 can replicate precisely across labs. Psychology spans a continuum from highly stable low-level phenomena to complex social behavior. Collapsing this range into a single judgment of unreliability misses the nuance entirely.
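The within-subject point can be made concrete with a small simulation. All numbers below are invented for illustration (a 20 ms reaction-time effect, 100 ms of trial-to-trial noise, 200 trials per condition): with enough trials per subject, even N = 5 can resolve a small effect buried in large trial-level noise.

```python
import random

random.seed(7)

# Hypothetical within-subject RT experiment (all values assumed, not from any study):
# each of N = 5 subjects completes many trials per condition, and averaging over
# trials shrinks the trial-level noise.
N_SUBJECTS, TRIALS, EFFECT_MS, NOISE_SD = 5, 200, 20.0, 100.0

def subject_diff(trials):
    """One subject's mean condition difference, averaged over `trials` trials."""
    a = sum(random.gauss(0.0, NOISE_SD) for _ in range(trials)) / trials
    b = sum(random.gauss(EFFECT_MS, NOISE_SD) for _ in range(trials)) / trials
    return b - a

diffs = [subject_diff(TRIALS) for _ in range(N_SUBJECTS)]
# The standard error of each subject's difference is NOISE_SD * sqrt(2 / TRIALS),
# about 10 ms here, so a 20 ms effect is visible in nearly every subject.
print([round(d, 1) for d in diffs])
```

The design choice doing the work is repetition within subjects: error variance falls with the number of trials, which is why low-N perception studies can nonetheless replicate precisely.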
3. 🔁 Psychologists Rarely Rely on Single Studies for Effect Size Claims
A major oversight in Gelman’s critique is the assumption that psychologists treat small, noisy studies as definitive evidence about effect size magnitude. In fact:
- Psychologists are well aware that small studies yield unstable estimates.
- Claims about effect sizes are rarely made based on individual experiments.
- Instead, researchers use meta-analysis to combine results across studies and reduce sampling noise.
And importantly: In well-controlled studies with p < .05, sign errors (wrong direction of the effect) are rare — as shown in projects like the Open Science Collaboration’s reproducibility study. This means that even small studies typically get the direction of an effect right, even if they are imprecise about its size. Psychologists know this. Gelman often implies they don’t.
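A quick simulation illustrates why sign errors are rare once a true effect exists and a result clears p < .05. The effect size, sample size, and known-variance z-test below are illustrative assumptions chosen for simplicity, not values from any particular study:

```python
import math
import random

random.seed(1)

N, TRUE_D, SIMS = 20, 0.4, 10_000  # small studies, modest true effect (assumed)
CRIT = 1.96  # two-sided critical value at alpha = .05 (z-test, known sd = 1)

sig, right_sign = 0, 0
for _ in range(SIMS):
    sample = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    mean = sum(sample) / N
    z = mean * math.sqrt(N)  # z statistic with known sd = 1
    if abs(z) > CRIT:
        sig += 1
        if mean > 0:  # did the study get the direction right?
            right_sign += 1

print(f"power ~ {sig / SIMS:.2f}, correct sign among significant: {right_sign / sig:.3f}")
```

Even though each simulated study is underpowered and its effect-size estimate is noisy (and inflated by selection for significance), essentially all significant results point in the right direction.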
4. 🔬 He Overlooks Tools that Handle Heterogeneity
Gelman claims that “there is no one true effect size” in many psychological literatures — a point that is actually well-acknowledged by psychologists themselves. Tools like random-effects meta-analysis, Bayesian model averaging, and z-curve explicitly model heterogeneity across studies and estimate how much variation exists in true effect sizes. In fact, analyses correcting for publication bias (e.g., Maier et al., 2022 using RoBMA) often show that some psychological interventions (like nudging) may have modest average effects but substantial variability — meaning that they do work in some contexts.
Rather than garbage, the data often show a mix of gold and dross.
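As a sketch of what “modeling heterogeneity” means in practice, here is a minimal DerSimonian-Laird random-effects meta-analysis on made-up study data. The quantity tau² is the estimated variance of the true effects across studies — exactly the heterogeneity that random-effects models report rather than ignore (the study effects and variances below are hypothetical):

```python
def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects estimate: pooled effect plus tau^2,
    the between-study variance in true effect sizes."""
    w = [1 / v for v in variances]                             # fixed-effect weights
    fe = sum(wi * e for wi, e in zip(w, effects)) / sum(w)     # fixed-effect pooled mean
    q = sum(wi * (e - fe) ** 2 for wi, e in zip(w, effects))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)              # DL estimator, floored at 0
    w_re = [1 / (v + tau2) for v in variances]                 # random-effects weights
    mu = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return mu, tau2

# Hypothetical studies: a modest average effect with visibly varied true effects
effects = [0.05, 0.10, 0.45, 0.60, 0.15, 0.50]
variances = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02]
mu, tau2 = random_effects_meta(effects, variances)
print(f"pooled d = {mu:.2f}, between-study tau^2 = {tau2:.3f}")
```

A nonzero tau² is the model saying, in numbers, what the prose above says: a modest average effect can coexist with contexts where the intervention works well and contexts where it does not.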
5. 🧭 Bias Against Generalization May Reflect Disciplinary Training
Gelman’s own background is in political science and survey statistics — fields that deal with singular, unrepeatable events and complex causal inference (e.g., “Why did Obama win in 2008?”). In that world, generalizing across contexts is precarious, and every event is historically unique. Psychology, especially experimental psychology, operates differently. It isolates regularities in human behavior — how long it takes to name a color, how memory decays over time — using tightly controlled designs and repeated trials. These are not historically unique events; they are law-like patterns observable across time, people, and settings.
What seems naïve to a political scientist may simply reflect a different (and valid) epistemology.
6. 🧪 Gelman Sometimes Critiques Without Engaging the Data
Perhaps most concerning is that Gelman occasionally dismisses entire literatures — such as the nudging literature — without directly engaging with the available evidence. In the case of the meta-analysis by Mertens et al. (2022), he calls it garbage, while more nuanced reanalyses (e.g., using RoBMA) show a small average effect but substantial heterogeneity, with an estimated 25% of studies likely showing true effects greater than d = 0.4. These are data-driven findings. Ignoring them in favor of blanket skepticism contradicts the evidence-based reasoning Gelman himself promotes.
🎯 Conclusion: Skepticism ≠ Cynicism
Andrew Gelman has done a great service to science by helping uncover many statistical flaws. But his broad-brush skepticism of psychology often veers into cynicism, undermining valid work and demoralizing reformers within the field. A better path forward is critical realism — recognizing the flaws, but also recognizing the value, nuance, and complexity of a diverse empirical discipline.
Psychology is not garbage. It’s a field in progress — flawed, yes, but full of replicable insights into how minds and behavior work.