Estimating Reproducibility of Psychology (No. 118): An Open Post-Publication Peer-Review


In 2015, Science published the results of the first empirical attempt to estimate the reproducibility of psychology. One key finding was that only 36% of 97 attempts to reproduce a significant result succeeded.

This finding fueled debates about a replication crisis in psychology. However, there have been few detailed examinations of individual studies that ask why a particular result could or could not be replicated. The main reason is probably that examining every study in detail is a daunting task. Another reason is that replication failures can be caused by several factors, and each study may have a different explanation. This means it is important to take an idiographic (study-centered) perspective.

The conclusions of these idiographic reviews will be used for a nomothetic research project that aims to predict actual replication outcomes on the basis of the statistical results reported in the original article. These predictions will only be accurate if the replication studies were close replications of the original study. Otherwise, differences between the original study and the replication study may explain why replication studies failed.

Summary of Original Article


The article presents one study with a 2 x 2 between-subjects design with 120 participants (n = 30 per cell). One experimental factor manipulated the intake of sugar: lemonade was sweetened with either sugar or Splenda. The second factor manipulated attention regulation. While participants watched an interview, words were displayed at the bottom of the screen. Half of the participants were instructed not to look at the words. The other half were given no instructions about their attentional focus. The dependent variable was a hypothetical decision task.

The authors used a focal contrast analysis that compared the condition combining Splenda with attention regulation against the other three conditions. This contrast was statistically significant, F(1,111) = 5.31.
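To see how strong this evidence is, the reported F-value can be converted into a p-value. The following is a minimal sketch, not the original authors' analysis. It assumes the focal contrast has one numerator degree of freedom, so that F(1, df) is the square of a t-value with the same error df, and it integrates the t density numerically with the Python standard library. The function names are hypothetical and chosen only for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_from_f(f_stat, df_error, steps=10_000):
    """Two-sided p-value for an F test with 1 numerator df.

    Relies on F(1, df) = t(df)^2 and integrates the t density from 0 to
    sqrt(F) with Simpson's rule (steps must be even).
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df_error)
    central_half = total * h / 3  # P(0 < T < t)
    return 1.0 - 2.0 * central_half


# Values as reported in the original article: F(1,111) = 5.31
p_original = p_value_from_f(5.31, 111)
print(f"original study: p = {p_original:.3f}")
```

Under these assumptions the p-value falls between .02 and .03, which is below the conventional .05 criterion but not by a wide margin.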

Replication Study

The replication study followed the same procedure with a slightly larger sample (N = 169).  The same statistical procedure produced a non-significant result, F(1,158) = 0.38.  The replication authors mention that the original study was carried out in Florida and that the replication study was carried out in Virginia.
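The two results can be put on a common scale by converting each F-value into an effect size. The sketch below uses the standard conversion for a test with one numerator degree of freedom, r = sqrt(F / (F + df_error)). It is an illustration under that assumption and not a computation reported in either article; the function name is hypothetical.

```python
import math


def contrast_r(f_stat, df_error):
    """Effect size r for an F test with 1 numerator df: sqrt(F / (F + df_error))."""
    return math.sqrt(f_stat / (f_stat + df_error))


# F-values and error df as reported: original F(1,111) = 5.31,
# replication F(1,158) = 0.38
original_r = contrast_r(5.31, 111)
replication_r = contrast_r(0.38, 158)

print(f"original r    = {original_r:.2f}")
print(f"replication r = {replication_r:.2f}")
```

With the reported values, the original effect size is roughly r = .21 and the replication effect size is roughly r = .05, so the replication estimate is about a quarter of the original.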


The replication study failed to replicate the original result. This is not surprising, given other replication failures for glucose effects and statistical problems in the original glucose studies (Schimmack, 2012).







