P-Curve vs. Z-Curve: Why Meta-Analysts Are Moving On

Summary

P-curve can detect evidential value, but its power estimates become biased when power varies across studies. Z-curve models that heterogeneity and gives more accurate replicability estimates under realistic assumptions.



If you’ve been following developments in meta-analysis, you’ve probably heard of p-curve, a method introduced by Simonsohn, Nelson, and Simmons to assess whether a set of statistically significant results shows evidential value (i.e., that not all results are false positives). P-curve became popular because it was simple: take the significant p-values from a set of studies, plot their distribution, and see whether it is right-skewed. If there were no real effects, significant p-values would be spread evenly between 0 and .05, so an excess of very small p-values (e.g., < .01) signals evidential value.
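
The core intuition can be sketched in a few lines of Python. This is a minimal illustration, not the full p-curve procedure: it implements only a simple binomial check of whether more than half of the significant p-values fall below .025, which is the share expected when there is no effect. The function name and the example p-values are made up for illustration.

```python
from math import comb

def binomial_right_skew_test(p_values, alpha=0.05):
    """Count significant p-values below alpha/2 and test against a 50% share.

    Returns (k, n, p_binom): k of the n significant p-values fall below
    alpha/2, and p_binom is the one-sided probability of seeing k or more
    if significant p-values were uniform (no evidential value).
    """
    significant = [p for p in p_values if p < alpha]
    n = len(significant)
    if n == 0:
        raise ValueError("no significant p-values to analyze")
    k = sum(p < alpha / 2 for p in significant)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    p_binom = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_binom

# Hypothetical p-values; the non-significant one (0.20) is ignored.
example = [0.001, 0.003, 0.004, 0.012, 0.020, 0.021, 0.030, 0.041, 0.20]
k, n, p_binom = binomial_right_skew_test(example)
print(f"{k} of {n} significant p-values are below .025 (binomial p = {p_binom:.3f})")
```

A small binomial p-value would indicate more right skew than chance alone produces. With only a handful of studies, as in this toy example, the test has little power of its own.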

But here’s the catch: when p-curve is used to estimate average power, it assumes that all studies have the same underlying statistical power (homogeneity). In the real world, where effect sizes and sample sizes vary, this assumption rarely holds. And when it doesn’t, p-curve’s “average power” estimate can be severely biased, often overestimating the replicability of a research area.


The Heterogeneity Problem

In practice, studies differ in:

  • Sample size
  • Effect size
  • Measurement reliability
  • Design quality

These differences produce heterogeneity in statistical power. Under heterogeneity, p-curve’s single-parameter model is misspecified—it treats all studies as if they were equally powered, leading to misleadingly high and overconfident estimates.
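
To see how quickly power spreads out, here is a rough sketch that computes approximate power for a few hypothetical two-group studies. It assumes a two-sided test at alpha = .05 and uses a normal approximation (noncentrality = d × sqrt(n per group / 2)); the effect sizes and sample sizes are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison.

    Uses a normal approximation: the test statistic is treated as
    Normal(ncp, 1) with ncp = d * sqrt(n_per_group / 2).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - ncp)
    lower_tail = STD_NORMAL.cdf(-z_crit - ncp)
    return upper_tail + lower_tail

# Hypothetical studies: (standardized effect size d, sample size per group)
studies = [(0.2, 20), (0.2, 200), (0.5, 20), (0.5, 200), (0.8, 50)]
for d, n in studies:
    print(f"d = {d:.1f}, n per group = {n:3d}: power = {z_test_power(d, n):.2f}")
```

Even this small set ranges from power close to the 5% false-positive rate to power close to 100%, which is the kind of spread a single-parameter model cannot represent.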

This problem has been demonstrated in multiple simulation studies (Brunner & Schimmack, 2020; Bartoš & Schimmack, 2022) and in Chapter 5 of the Z-Curve 3.0 Tutorial (2025). The conclusion is consistent: p-curve works for detecting evidential value, but it’s unreliable for quantifying it when power varies.


Enter Z-Curve

Z-curve was developed by Ulrich Schimmack and Jerry Brunner to directly address the heterogeneity issue. Instead of assuming one “true” power level, z-curve models the distribution of significant z-values as a mixture of components, each with its own noncentrality parameter.
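
The mixture idea can be sketched as follows. This is an illustrative simplification, not the z-curve package’s implementation: it assumes a two-sided test at alpha = .05, works with absolute z-values, and treats each component as a unit-variance normal centered on its noncentrality parameter and truncated at the significance threshold. The component means and weights below are invented; in a real analysis the weights are estimated from the data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # about 1.96 for two-sided alpha = .05

def power(ncp):
    """Probability that |z| exceeds Z_CRIT when z ~ Normal(ncp, 1)."""
    return (1 - STD_NORMAL.cdf(Z_CRIT - ncp)) + STD_NORMAL.cdf(-Z_CRIT - ncp)

def component_density(z, ncp):
    """Density of |z| for one component, conditional on significance."""
    if z <= Z_CRIT:
        return 0.0
    folded = STD_NORMAL.pdf(z - ncp) + STD_NORMAL.pdf(z + ncp)
    return folded / power(ncp)  # renormalize to the significant region

def mixture_density(z, ncps, weights):
    """Weighted sum of truncated components; weights are shares among
    the significant results and should sum to 1."""
    return sum(w * component_density(z, m) for m, w in zip(ncps, weights))

# Hypothetical mixture: three components with different noncentrality.
ncps = [0.0, 2.0, 4.0]
weights = [0.2, 0.5, 0.3]

# Sanity check with a midpoint rule: the density should integrate to about 1.
step = 0.001
total = sum(
    mixture_density(Z_CRIT + (i + 0.5) * step, ncps, weights) * step
    for i in range(10_000)
)
print(f"integral over the significant region: {total:.4f}")
```

Fitting then amounts to finding the weights that make this mixture density match the observed distribution of significant z-values.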

This approach:

  • Accurately recovers the expected replication rate (ERR) and expected discovery rate (EDR) under heterogeneity.
  • Produces calibrated confidence intervals that reflect the true uncertainty.
  • Works with the same type of input data as p-curve (significant test statistics), so researchers can easily compare both.
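
To make the two quantities concrete, here is a simplified sketch that computes them for a known mixture. It assumes the EDR is mean power across all studies conducted and the ERR is mean power among the studies that reached significance (so each study is weighted by its own power); it ignores the refinement that a replication should also match the sign of the original result. The mixtures are hypothetical, and the weights are shares among all studies before selection.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided alpha = .05

def power(ncp):
    """Probability that |z| exceeds Z_CRIT when z ~ Normal(ncp, 1)."""
    return (1 - STD_NORMAL.cdf(Z_CRIT - ncp)) + STD_NORMAL.cdf(-Z_CRIT - ncp)

def edr_and_err(ncps, weights):
    """Return (EDR, ERR) for a mixture with pre-selection weights."""
    powers = [power(m) for m in ncps]
    # EDR: expected share of all studies that come out significant.
    edr = sum(w * p for w, p in zip(weights, powers))
    # ERR: mean power among significant studies, so high-powered
    # studies count more because they are more likely to be selected.
    err = sum(w * p * p for w, p in zip(weights, powers)) / edr
    return edr, err

homogeneous = edr_and_err([2.0], [1.0])
heterogeneous = edr_and_err([0.0, 2.0, 4.0], [0.5, 0.3, 0.2])
print(f"homogeneous:   EDR = {homogeneous[0]:.2f}, ERR = {homogeneous[1]:.2f}")
print(f"heterogeneous: EDR = {heterogeneous[0]:.2f}, ERR = {heterogeneous[1]:.2f}")
```

When every study has the same power, the two numbers coincide. Once power varies, selection for significance favors the high-powered studies and the ERR rises above the EDR, which is why a single “average power” figure is not enough.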

Simulation Comparisons

The Z-Curve 3.0 Tutorial, Chapter 5 ran head-to-head simulations:

Feature                      | p-curve                                 | z-curve
-----------------------------|-----------------------------------------|------------------------------------------
Model of power               | Single (homogeneous)                    | Mixture (heterogeneous)
Accuracy under heterogeneity | Overestimates power                     | Accurate estimates
Confidence intervals         | Too narrow, misleading                  | Proper coverage
Practical use                | OK for “is there any evidential value?” | Best for “how much?” and “how replicable?”

Results were striking: in realistic mixed-power scenarios, plots of estimated against true values showed z-curve’s points clustering along the 45° line (accurate estimates), while p-curve’s estimates often overshot it, giving an illusion of higher replicability than was actually present.


When to Use Each Method

  • If your only question is: Are all these studies just false positives? — p-curve can still be useful for a quick test of evidential value.
  • If you care about: How much evidential value? How replicable are these findings? — use z-curve, especially when power is likely to vary across studies.

Bottom Line

P-curve was an important step forward, but it’s been surpassed by methods that reflect the complexity of real research literatures. Z-curve is the better choice for quantifying evidential value and replicability, particularly under realistic heterogeneity.


Further Reading & Resources

  • Brunner, J., & Schimmack, U. (2020). Estimating population mean power under conditions of heterogeneity and selection for significance. Meta-Psychology, 4.
  • Schimmack, U. (2025). Z-Curve 3.0 Tutorial, Chapter 5: P-Curve vs Z-Curve. replicationindex.com
  • Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534–547.
