Every year, some of our best undergraduate students apply to work with professors on a research project for one year. For several years, I have worked with students to examine the credibility of psychological science. After an intensive crash course in statistics, the students code published articles. The biggest challenge for them, as for everybody else, is to find the critical statistical test that supports the main conclusion of an article. Moreover, results are often reported incompletely (e.g., effect sizes without sampling error, or without exact p-values). For students, it is a good opportunity to see why a solid understanding of statistics is helpful when reading original research articles.
One advantage of my ROP (Research Opportunity Program) project is that it is based on secondary data. Thus, the Covid-19 pandemic didn't impede the project; in fact, it probably helped me recruit a larger number of students. In addition, Zoom made it easy to meet with students one on one to discuss critical articles.
The 2020 ROP team has 13 members.
The main aim of the project is to get a sense of the credibility of psychological research across the diverse areas of psychology. The reason is that actual replication initiatives have focused mostly on social and cognitive psychology, where participants are easy to recruit and studies are easy to run (Open Science Collaboration, 2015). Despite concerns about other areas, actual replication projects are lacking because of the huge costs involved. A statistical approach has the advantage that credibility can be assessed simply by examining the strength of evidence (the signal-to-noise ratio) in published articles.
The team started by coding articles from 2010, the year just before the replication crisis started. The journals represent a broad range of areas in psychology, with an emphasis on clinical psychology, because research in clinical psychology has the most direct practical implications.
Cognitive Therapy and Research
Journal of Anxiety Disorders
Journal of Consulting and Clinical Psychology
Journal of Counseling Psychology
Journal of Applied Psychology
The test statistics are converted into z-scores, a common metric that reflects the strength of evidence against the null-hypothesis. These z-scores are then analyzed with z-curve (Bartos & Schimmack, 2020; Brunner & Schimmack, 2020).
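The conversion to a common metric can be sketched with a minimal example. The snippet below is not the team's actual coding pipeline; it simply illustrates the standard mapping from a reported two-sided p-value to an absolute z-score, using only the Python standard library.

```python
from statistics import NormalDist

def p_to_z(p: float) -> float:
    """Convert a two-sided p-value into an absolute z-score.

    The conventional significance threshold p = .05 corresponds
    to z = 1.96, and smaller p-values map to larger z-scores
    (stronger evidence against the null-hypothesis).
    """
    return NormalDist().inv_cdf(1 - p / 2)

print(round(p_to_z(0.05), 2))   # 1.96
print(round(p_to_z(0.005), 2))  # 2.81
```

Reported t, F, or chi-square values can be converted analogously by first computing their p-values.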
The figure and statistical results are similar to those for social psychology (Schimmack, 2020). First, the graph shows the well-known practice in psychology of publishing mostly successful studies, that is, statistically significant results with p < .05 (z > 1.96) (Sterling, 1959). Here, the observed discovery rate is 88%, and the effective success rate is even higher because marginally significant results (p < .10, z > 1.65) are also often interpreted as sufficient evidence to reject the null-hypothesis.
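The observed discovery rate is simply the share of reported results that cross the significance threshold. A toy illustration with made-up z-scores (not the actual coded data):

```python
# Hypothetical z-scores coded from published test statistics.
zs = [0.8, 1.7, 2.1, 2.8, 3.5, 4.2]

significant = [z for z in zs if z > 1.96]          # p < .05, two-sided
marginal = [z for z in zs if 1.65 < z <= 1.96]     # .05 < p < .10

# Observed discovery rate: proportion of reported results with p < .05.
odr = len(significant) / len(zs)
print(f"observed discovery rate: {odr:.0%}")
```

Counting the marginal results as "successes" as well would push the rate even higher, which is the point made above.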
In comparison, the estimated discovery rate is much lower at 33%. The discrepancy between the observed and expected discovery rates provides clear evidence that questionable research practices (QRPs; John et al., 2012; Schimmack, 2014) were used. QRPs are research practices that increase the chances of reporting a statistically significant result, including the selective reporting of significant results or the highlighting of significant results as discoveries (Kerr, 1998). The presence of QRPs in psychological research in 2010 is expected, but information about the extent of QRPs has been lacking. Z-curve suggests that QRPs were used massively to boost an actual success rate of 33% to a nearly perfect success rate in published articles. This has important implications for replication attempts: if reported results are selected for significance from studies with low power, replication studies have a low probability of producing a significant result again.
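How selective reporting inflates the observed discovery rate can be shown with a small simulation. This is a sketch with assumed numbers, not the z-curve model itself: each hypothetical study is significant with probability equal to the 33% average power suggested above, and non-significant results are mostly left unpublished.

```python
import random

random.seed(1)

POWER = 0.33        # assumed average power before selection
N_STUDIES = 10_000  # hypothetical number of studies run

# Each study yields a significant result with probability POWER.
results = [random.random() < POWER for _ in range(N_STUDIES)]
actual_rate = sum(results) / len(results)

# Selective reporting: publish every significant result, but only
# a small fraction (here, 5%) of the non-significant ones.
published = [r for r in results if r or random.random() < 0.05]
observed_rate = sum(published) / len(published)

print(f"actual discovery rate:   {actual_rate:.0%}")   # near 33%
print(f"observed discovery rate: {observed_rate:.0%}")  # near 90%
```

Even this mild filter turns a one-in-three success rate into a near-perfect published record, which is the pattern the z-curve estimates imply.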
However, the chance of replicating a significant result depends on the mean power of the studies that produced significant results, and selection for significance increases this mean power (Brunner & Schimmack, 2020). The reason is that studies with higher power have a higher chance of producing significant results even without QRPs. The z-curve estimate of the expected replication rate is 52%. This would suggest that every second study could be replicated successfully. The problem with this estimate is that it assumes that exact replications are possible. However, psychological studies are difficult or impossible to replicate exactly. This may explain why the expected replication rate is higher than the success rate in actual replication studies (cf. Bartos & Schimmack, 2020). For actual replication studies, the expected discovery rate seems to be a better predictor.
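Why the expected replication rate exceeds the expected discovery rate can also be illustrated with a simulation. Under the assumption (hypothetical, for illustration only) of a mix of low-, medium-, and high-powered studies, the studies that reach significance are disproportionately the high-powered ones, so mean power conditional on significance is higher than mean power overall.

```python
import random

random.seed(2)

# Assumed mix of studies with heterogeneous power levels.
POWER_LEVELS = [0.10, 0.30, 0.80]
studies = [random.choice(POWER_LEVELS) for _ in range(30_000)]

# Keep the power values of the studies that happened to be significant.
significant = [pw for pw in studies if random.random() < pw]

mean_power_all = sum(studies) / len(studies)          # ~ expected discovery rate
mean_power_sig = sum(significant) / len(significant)  # ~ expected replication rate

print(f"mean power, all studies:         {mean_power_all:.2f}")
print(f"mean power, significant studies: {mean_power_sig:.2f}")
```

The gap between the two numbers mirrors the gap between the 33% expected discovery rate and the 52% expected replication rate reported above, though the exact values here follow only from the assumed power mix.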
In conclusion, the results for clinical psychology and other areas of psychology are similar to those for social psychology (Schimmack, 2020). This is consistent with a comparison of disciplines based on automatic extraction of all test statistics rather than hand-coding of focal hypothesis tests (Schimmack, 2020).
In the upcoming semester (aptly called the winter semester in Canada), the team will code articles from 2019 to see whether a decade of soul-searching about research practices in psychology has produced notable changes. There are two possibilities. On the one hand, journals could have become more accepting of non-significant results, leading to more publications of non-significant results (i.e., a decrease in the observed discovery rate). On the other hand, journals may have asked for a priori power analyses and bigger sample sizes to reduce sampling error and produce stronger evidence against the null-hypothesis (i.e., an increase in the expected discovery rate).
Stay tuned and check in again in May.