The new year and the new decade have just started, and I am excited to announce the publication of a preprint that introduces z-curve.2.0 (Bartoš & Schimmack, 2020; Preprint). The manuscript, “Z-Curve.2.0: Estimating Replication Rates and Discovery Rates,” is the product of a nearly year-long collaboration with František Bartoš.
Last year, František emailed me to introduce a new way of estimating z-curves: fitting finite mixture models with an expectation-maximization (EM) algorithm. We then worked together on comparing the kernel-density approach of Brunner and Schimmack (2019) with the EM algorithm. In the end, the EM algorithm performs a bit better, although the density approach leaves some room for improving the coverage of its confidence intervals. Both methods produce useful estimates and confidence intervals with good coverage.
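To make the mixture-model idea more concrete, here is a simplified sketch in Python. It is not the zcurve package's implementation; it only assumes the general setup described above. Absolute z-values of significant results are modeled as a mixture of folded normal components with fixed means (here 0 to 6, SD = 1), truncated at the two-sided critical value for alpha = .05. EM then estimates only the mixture weights. From the weights, the expected replication rate (ERR) is the average power of the significant studies, and the expected discovery rate (EDR) is the average power after undoing the selection for significance. The fixed means, the use of two-sided power, and the omission of any special handling for very large z-values are simplifying assumptions; all function names are hypothetical.

```python
import math

Z_CRIT = 1.959963984540054  # two-sided critical value for alpha = .05


def norm_pdf(x, mu):
    # Density of N(mu, 1) at x.
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)


def norm_cdf(x):
    # Standard normal CDF.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def power_two_sided(mu):
    # Probability that |Z| exceeds Z_CRIT when Z ~ N(mu, 1).
    return (1 - norm_cdf(Z_CRIT - mu)) + norm_cdf(-Z_CRIT - mu)


def folded_truncated_pdf(z, mu):
    # Density of |Z| for Z ~ N(mu, 1), conditional on |Z| > Z_CRIT.
    return (norm_pdf(z, mu) + norm_pdf(-z, mu)) / power_two_sided(mu)


def em_weights(zs, means, n_iter=200):
    # Estimate mixture weights of the significant z-values with EM;
    # component means stay fixed, only the weights are updated.
    k = len(means)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for z in zs:
            # E-step: responsibility of each component for this z-value.
            dens = [w[j] * folded_truncated_pdf(z, means[j]) for j in range(k)]
            s = sum(dens)
            for j in range(k):
                totals[j] += dens[j] / s
        # M-step: new weights are the average responsibilities.
        w = [t / len(zs) for t in totals]
    return w


def err_edr(w, means):
    # ERR: mean power of the selected (significant) studies.
    # EDR: mean power of all studies, obtained by reweighting each
    # component by 1 / power to undo the selection for significance.
    powers = [power_two_sided(m) for m in means]
    err = sum(wj * pj for wj, pj in zip(w, powers))
    edr = 1.0 / sum(wj / pj for wj, pj in zip(w, powers))
    return err, edr


if __name__ == "__main__":
    import random

    random.seed(1)
    zs = []
    # Simulated studies with true means 0, 2, or 3; keep only significant ones.
    while len(zs) < 300:
        z = abs(random.gauss(random.choice([0.0, 2.0, 3.0]), 1))
        if z > Z_CRIT:
            zs.append(z)
    w = em_weights(zs, means=[0, 1, 2, 3, 4, 5, 6])
    print(err_edr(w, [0, 1, 2, 3, 4, 5, 6]))
```

Because the EDR is a harmonic mean of the component powers and the ERR an arithmetic mean, the EDR can never exceed the ERR in this sketch.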
The collaboration with František was amazing and is another example of the power of social media. Social media not only allow a fast exchange of ideas; they also make it possible to collaborate with people you might otherwise never meet. Just as I met Rickard Carlsson in person only several years after we became Facebook friends and started a journal together, I have yet to meet František in person (hopefully this year).
František also created an R package for z-curve (Z-Curve Package). We are pleased to make this package publicly available. Please try it out and give us feedback so that we can improve it before František submits it to the R team as an official, downloadable package.
Here is the abstract of the manuscript and a figure that was created with the zcurve package.
This article introduces z-curve.2.0 as a method that estimates the expected replication rate and the expected discovery rate based on the test statistics of studies selected for significance. Z-curve.2.0 extends the work by Brunner and Schimmack (2019) in several ways. First, we show that a new estimation method using expectation-maximization outperforms the kernel-density approach of z-curve.1.0. Second, we examine the coverage of bootstrapped confidence intervals to provide information about the uncertainty in z-curve estimates. Third, we extend z-curve to estimate the number of all studies that were conducted, including studies with non-significant results that may not have been reported, solely on the basis of significant results. This allows us to estimate the expected discovery rate (EDR); that is, the percentage of significant results among all studies that were conducted. The EDR can be used to assess the size of the file drawer and to estimate the maximum number of false positive results, and it may provide a better estimate of the success rate in actual replication studies than the expected replication rate because exact replications are impossible.
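The abstract says that the EDR can be used to gauge the file drawer and to bound the number of false positives. A minimal sketch of both calculations follows, under stated assumptions. If the EDR is the share of all conducted studies that produced a significant result, then simple arithmetic gives (1 - EDR) / EDR non-significant results per significant one. For the false positives, the sketch assumes Sorić's (1989) upper bound on the false discovery rate, which expresses the maximum share of significant results that are false positives as a function of the discovery rate and alpha. The function names are hypothetical and are not taken from the zcurve package.

```python
ALPHA = 0.05  # assumed significance criterion


def file_drawer_ratio(edr):
    # Expected number of non-significant results per significant result,
    # if edr is the share of all conducted studies that were significant.
    return (1 - edr) / edr


def soric_max_fdr(edr, alpha=ALPHA):
    # Sorić's upper bound on the false discovery rate: the largest share of
    # significant results that could be false positives, given the discovery rate.
    return (1 / edr - 1) * alpha / (1 - alpha)


if __name__ == "__main__":
    edr = 0.39  # an EDR of 39%, used purely as an illustrative input
    print(file_drawer_ratio(edr), soric_max_fdr(edr))
```

Lower discovery rates imply both a larger file drawer and a higher ceiling on the false discovery rate; with an EDR of 1 (every study significant), the bound is 0.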
Keywords: Power estimation, Post-hoc power analysis, Publication bias, Expectation Maximization, Z-curve, Replicability, Meta-Science, Expected Replication Rate, Expected Discovery Rate, False Discovery Rate, File-Drawer
Example figure created with the zcurve package.
Data are the original test statistics of 90 studies with high-quality replication studies from the Open Science Collaboration (OSC) Reproducibility Project (OSC, 2015). Publication bias is indicated by the gap between the observed discovery rate (85/90 = 94% significant results) and the z-curve estimate of the expected discovery rate, which is only 39%. The expected replication rate of 62% successful replications is based on the assumption that studies can be replicated exactly. However, if results are sensitive to contextual differences between original and replication studies, the expected discovery rate is a better estimate of the success rate in replication studies, and it is more in line with the actual success rate (well, failure rate, really) in the OSC project.