Robust Science with Low Replicability?

Stephan Lewandowsky and Klaus Oberauer (2020) published an article titled “Low replicability can support robust and efficient science.” A good example of low replicability is the cited finding that only 25% of results published in social psychology could be replicated (OSC, 2015). Thus, the title suggests that social psychology can be a robust science even if only one quarter of published findings are replicable. This seems to be a surprising conclusion that goes against initiatives to improve psychological science.

The authors note that low replication rates in social psychology have been used to call for more replication studies (Zwaan et al., 2018). The main point of the article is that this is not necessarily the best response to the replication crisis in social psychology.

“Highlighting the virtues of replications is, however, not particularly helpful without careful consideration of when, how, why, and by whom experiments should be replicated.”

Examining when replication studies are valuable or a waste of resources is an important and interesting question. However, this question is different from the replicability of the original studies that are being published in social psychology. That is, we can distinguish two questions: (a) how replicable should original studies be and (b) how many original studies should be replicated?

The first question about replicability is essentially a question about statistical power (Cohen, 1962; Brunner & Schimmack, 2019). Stating that low replicability can support a robust science implies that social psychology can be a robust science even if the average power of published studies is only 25%. Although the title implies that this is the topic of the article, the article does not address this question.
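To make the link between power and replicability concrete, here is a minimal simulation sketch (mine, not the authors'; the effect size and sample size are hypothetical): for exact replications of a true effect, the expected replication rate is simply the power of the design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
d, n, alpha = 0.4, 50, 0.05  # hypothetical effect size and per-group n
n_studies = 20_000

def run_study():
    """One two-sample study of a true effect d; True if p < alpha."""
    treatment = rng.normal(d, 1.0, n)
    control = rng.normal(0.0, 1.0, n)
    return stats.ttest_ind(treatment, control).pvalue < alpha

# "Publish" only significant originals, then replicate each one exactly.
original_sig = np.array([run_study() for _ in range(n_studies)])
replication_sig = np.array([run_study() for _ in range(int(original_sig.sum()))])

print(f"original success rate (= power): {original_sig.mean():.2f}")
print(f"replication success rate:        {replication_sig.mean():.2f}")
# Both converge on the design's power (about .50 here): with 25% power,
# only about a quarter of published results would replicate.
```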

The second question is not about replicability. Rather, it is about the value of actual replication studies. There is a way to connect the two questions. It may seem obvious that the value of replication studies decreases as (a) more studies test true hypotheses and (b) the power of original studies increases. The reason is that most replication studies are then likely to be successful and merely confirm that the original result was a true positive. In contrast, if most studies test false hypotheses and power is low, a high percentage of significant results are false positives, and true positives are published with inflated effect sizes. In this case, replication studies are likely to fail, and only the small number of studies that succeed in a replication attempt contribute robust evidence; the rest is wasted. Not surprisingly, calls for more direct replication studies have arisen in social psychology, where false positive rates are relatively high and power is low.
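The arithmetic behind this point is simple. Here is a back-of-the-envelope sketch (my numbers, purely illustrative) of how the share of false positives among significant results depends on the base rate of true hypotheses and on power:

```python
# Share of significant results that are false positives, given the base rate
# of true hypotheses, power, and alpha (standard false-discovery arithmetic).
def false_positive_share(prior_true, power, alpha=0.05):
    true_pos = prior_true * power
    false_pos = (1 - prior_true) * alpha
    return false_pos / (true_pos + false_pos)

# Idealized field: half of tested hypotheses are true, studies have 80% power.
print(false_positive_share(prior_true=0.5, power=0.80))  # ~0.06
# Troubled field: 10% true hypotheses tested with 25% power.
print(false_positive_share(prior_true=0.1, power=0.25))  # ~0.64
```

In the first scenario, replications mostly confirm what we already know; in the second, nearly two out of three significant results are false positives, and replication studies become indispensable.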

The need for replication studies also increases when researchers use questionable research practices (QRPs) to produce significant results. The use of these practices inflates the risk of false positives, which means that only results that have been replicated in honest replication studies can be trusted. Lewandowsky and Oberauer (2020) recognize that QRPs are a problem, but they are not interested in addressing this issue. Instead, they “simulate an idealized and transparent scientific community that eschews p-hacking and other questionable research practices and conducts studies with adequate power (P = .8).”
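To see why eschewing QRPs matters so much, consider optional stopping, one of the best-known QRPs. A small simulation (my sketch; the parameters are arbitrary) shows how repeatedly peeking at the data inflates the false positive rate well above the nominal 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims = 0.05, 5_000

def study_with_peeking(start=10, step=10, max_n=100):
    """Test after every batch of 10 subjects; stop as soon as p < alpha.
    The null hypothesis is true, so every 'success' is a false positive."""
    g1 = rng.normal(0, 1, start)
    g2 = rng.normal(0, 1, start)
    while len(g1) <= max_n:
        if stats.ttest_ind(g1, g2).pvalue < alpha:
            return True
        g1 = np.concatenate([g1, rng.normal(0, 1, step)])
        g2 = np.concatenate([g2, rng.normal(0, 1, step)])
    return False

rate = sum(study_with_peeking() for _ in range(n_sims)) / n_sims
print(f"false positive rate with optional stopping: {rate:.2f}")  # well above .05
```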

In philosophy it is well known that a conclusion can be logically valid and still false when it rests on false premises. Thus, the article simply does not speak to the actual problems or to the consequences of low replicability for psychology as a science. However, articles about the replication crisis attract a lot of attention and citations, so the authors present their fantasy simulations as if they spoke to the replication crisis or to ways of making psychology more robust and credible. They do not.

So, is it possible to have a robust science with low replicability? If you ask me, I don’t think so. I think social psychology first needs to become the science that Lewandowsky and Oberauer simulate: ban the use of QRPs and conduct studies with 80% power. The open science movement is trying to make this happen. Lewandowsky and Oberauer seem to suggest that the open science movement is misguided in its emphasis on replication studies: “Perhaps ironically, waste is reduced by withholding replication until after publication.” The problem with this conclusion is that it rests on the assumption that original results are trustworthy (no QRPs, adequate power). These are exactly the conditions that make actual replication studies less important. Once original results have high replicability, there is less need to probe that replicability with replication studies. Thus, we may all agree that a robust science cannot thrive with low replicability of original studies.
