How Many Correct Citations Could Be False Positives?
Of course, the example above is an exception. Most of the time, a cited reference contains an empirical finding that is consistent with the textbook claim. However, this does not mean that textbook claims are based on credible and replicable evidence. Until recently, it was common to assume that statistical significance ensures that most published results are true positives (i.e., not false positives produced by chance). However, this holds only if all results are reported, including non-significant ones. It has been known since 1959 that this is not the case in psychology (Sterling, 1959). Jerry Brunner and I developed a statistical tool that can be used to clean up the existing literature. Rather than actually redoing 50 years of research, we use the statistical results reported in the original studies to apply a significance filter post hoc. Our tool is called zcurve. Below, I use zcurve to examine the replicability of the studies cited in Chapter 2, about the self.
More detailed information about the interpretation of the graph above is provided elsewhere (link). In short, for each citation in the textbook chapter that is used as evidence for a claim, a team of undergraduate students retrieved the cited article and extracted the main statistical result that matches the textbook claim. These statistical results are then converted into z-scores that reflect the strength of evidence for a claim. Only significant results are used because non-significant results cannot support an empirical claim. Zcurve fits a model to the density distribution of significant z-scores (z > 1.96). The shape of this distribution provides information about the probability that a randomly drawn study from the set would replicate (i.e., produce a significant result again). The grey line shows the distribution predicted by zcurve; it closely matches the observed density, shown in dark blue. Simulation studies show that zcurve performs well.
Zcurve estimates that the average replicability of studies in this chapter is 56%. This number would be reassuring if all studies had 56% power. It would mean that all studies are true positives and that roughly every other replication attempt would be successful. However, reality does not match this rosy scenario. In reality, studies vary in replicability. Studies with z-scores greater than 5 have 99% replicability (see the numbers below the x-axis), whereas studies with just-significant results (z < 2.5) have only 21% replicability. As you can see, there are many more studies with z < 2.5 than with z > 5, so there are more studies with low replicability than with high replicability.
The next plot shows model fit for zcurve models with a fixed proportion of false positives; higher numbers indicate worse fit. If the data are inconsistent with a given proportion of false positives, the fit gets worse (higher numbers).
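The conversion and filtering steps described above (turn each reported result into a z-score, then keep only z > 1.96) can be sketched in a few lines of Python. This is a minimal illustration, not the zcurve implementation: it assumes each result is available as a two-sided p-value, the function names p_to_z and significant_only are made up for this example, and the p-values are invented rather than taken from the textbook chapter.

```python
from statistics import NormalDist

Z_CRIT = 1.96  # cutoff for two-sided significance at alpha = .05, as used in the text


def p_to_z(p_two_sided: float) -> float:
    """Convert a two-sided p-value into an absolute z-score (hypothetical helper)."""
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # A two-sided p-value leaves p/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)


def significant_only(z_scores):
    """Keep only the z-scores above the significance cutoff (hypothetical helper)."""
    return [z for z in z_scores if z > Z_CRIT]


# Invented p-values for illustration only; they are NOT data from the chapter.
p_values = [0.049, 0.03, 0.20, 0.001, 0.0000005]
z_scores = [p_to_z(p) for p in p_values]
kept = significant_only(z_scores)

# Print the z-scores that would enter the model-fitting step.
print([round(z, 2) for z in kept])
```

Only the filtered z-scores would be passed on to a model such as zcurve. The model fitting itself, which estimates replicability from the shape of this distribution, is not shown here.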
The graph shows that models with 100%, 90%, or 80% false positives clearly do not fit the data as well as models with fewer false positives. This shows that some textbook claims are based on solid empirical evidence. However, model fit looks very similar for models with 0% to 60% false positives. Thus, it is possible that the majority of claims in the self chapter of this textbook are false positives. It is even more problematic that textbook claims are often based on a single study with a student sample at one university. Social psychologists have warned repeatedly that their findings are very sensitive to minute variations in studies, which makes it difficult to replicate these effects even under very similar conditions (Van Bavel et al., 2016), and that it is impossible to reproduce exactly the same experimental conditions (Stroebe and Strack, 2014). Thus, the zcurve estimate of 56% replicability is a wildly optimistic estimate of the success rate in actual replication studies. In fact, the observed success rate for social psychology studies in an actual replication project was only 25% (Open Science Collaboration, 2015).
Social psychology textbooks present many findings as if they were established facts when they are not. It is time to audit psychology textbooks to ensure that students receive accurate scientific information to inform their beliefs about human behavior. Ideally, textbook authors will revise their textbooks to make them more scientific, and instructors will choose textbooks based on the credibility of the evidence they present.