Invalid Claims about Power by Pek et al. (2024)

Pek, J., Hoisington-Shaw, K. J., & Wegener, D. T. (2024). Uses of uncertain statistical power: Designing future studies, not evaluating completed studies. Psychological Methods. Advance online publication. https://dx.doi.org/10.1037/met0000577

In their latest article, Pek et al. continue to spread misinformation about the use of average power to estimate the replicability of a set of published hypothesis tests. The strong claim that they are spreading misinformation is not based on my own biased assessment of their article. It is based on a logical analysis by a neutral third party: ChatGPT, after I presented it with counterarguments to their claims. You can read the full discussion here.

Below is a ChatGPT-generated summary of the discussion. It reflects poorly on Pek et al. that their arguments do not hold up under scrutiny by an AI. Their false arguments reflect either willful ignorance or a level of intelligence below that of an AI. On the other hand, the capability of ChatGPT to reveal bullshit suggests that psychology may advance by incorporating AI in the review process.

Summary of the Discussion on Average Power and Replicability

The discussion critically examined the arguments against using average power to evaluate published research, particularly as presented by Pek et al. (2024). The key points and conclusions are as follows:


1. The Ontological Error Argument is Invalid

🔹 Claim by Pek et al.: Computing average power from completed studies is an “ontological error” because probabilities should only apply to future experiments, not past data.
Counterargument:

  • This claim is incorrect because inferential statistics routinely estimate probabilities from observed data.
  • Z-curve and similar methods use observed effect sizes to estimate a population parameter, which is standard statistical practice.
  • Applying probability models to past data (as done in Bayesian inference and meta-analysis) is not a logical error.

🔹 Conclusion:
🚫 The ontological error claim is misleading and does not justify ignoring average power estimates.


2. Average Power Can Predict Replication Outcomes (When Used Correctly)

🔹 Claim by Pek et al.: Average power should not be used to estimate replicability because it does not predict replication success.
Counterargument:

  • If replication studies are conducted under the same conditions (same sample size, design, effect size), average power should correspond to expected replication success.
  • In reality, selection bias inflates effect sizes in original studies, leading to overestimation of replication rates.
  • Z-curve corrects for this bias and provides expected replication rates (ERR), making it a useful predictor of replication success under realistic conditions.

🔹 Conclusion:
🚫 The argument that average power does not predict replication is too strong and misleading.
Bias-corrected estimates (like Z-curve) provide meaningful replication estimates.
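The first counterargument above can be illustrated with a small simulation. This is a hypothetical sketch, not the z-curve method itself: it assumes exact replications and known true power for each study, and shows that the replication rate across a set of studies converges on the average true power of those studies.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical set of k studies with heterogeneous true power.
k = 10_000
true_power = rng.uniform(0.2, 0.9, size=k)

# An exact replication of each study (same design, sample size, and
# effect size) yields p < .05 with probability equal to its true power.
replicated = rng.random(k) < true_power

print(f"mean true power:  {true_power.mean():.3f}")
print(f"replication rate: {replicated.mean():.3f}")
```

The two numbers agree closely, which is the sense in which average power predicts expected replication success when replications are conducted under the same conditions as the original studies.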


3. The Argument About Imprecision is Overstated

🔹 Claim by Pek et al.: Average power estimates are too uncertain to be useful.
Counterargument:

  • All statistical estimates have uncertainty, including effect sizes, confidence intervals, and meta-analytic results.
  • Z-curve provides confidence intervals to account for this uncertainty.
  • The precision of estimates depends on sample size (number of studies, k).
  • Small k (e.g., <30 studies) → more uncertainty.
  • Large k (e.g., 100+ studies) → reliable power estimates.

🔹 Conclusion:
🚫 Saying power is “too uncertain to use” is incorrect—uncertainty is expected in all statistics and can be quantified with confidence intervals.
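The dependence of precision on k can be sketched with a deliberately simplified toy model (not the full z-curve procedure): suppose average power is estimated from the observed significance rate of k studies. The Monte Carlo standard error of that estimate shrinks roughly with the square root of k.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_power_se(k, n_sims=2000, true_power=0.6):
    """Monte Carlo SE of an average-power estimate based on k studies.

    Toy model: each study is significant with probability true_power,
    and average power is estimated as the observed significance rate.
    """
    outcomes = rng.random((n_sims, k)) < true_power
    return outcomes.mean(axis=1).std()

for k in (10, 30, 100, 300):
    print(f"k = {k:3d}: SE of average-power estimate ~ {mean_power_se(k):.3f}")
```

Even in this crude model, moving from k = 10 to k = 100 studies cuts the standard error by roughly a factor of three, which is why large sets of studies yield usable power estimates while small sets yield wide confidence intervals.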


4. Publication Bias is a Real Problem, But Z-Curve Adjusts for It

🔹 Claim: Publication bias distorts average power estimates.
Counterargument:

  • Yes, publication bias inflates power estimates if uncorrected.
  • Z-curve accounts for this bias using the distribution of z-scores.
  • The Expected Discovery Rate (EDR) helps detect selective reporting.

🔹 Conclusion:
Publication bias is an issue, but Z-curve explicitly corrects for it, making it more reliable than naive power estimates.
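How publication bias inflates naive power estimates can be shown with a minimal simulation. This is a toy model with a single true effect, not the z-curve estimator: z-scores are drawn around a fixed noncentrality, only significant results are "published," and observed power is computed naively from each published z-score.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu = 2.0                    # true noncentrality: observed z ~ N(mu, 1)
crit = 1.96                 # two-sided .05 critical value (upper tail shown)
true_power = 1 - norm_cdf(crit - mu)

z = rng.normal(mu, 1.0, size=100_000)
published = z[z > crit]     # publication bias: only significant results appear

# Naive "observed power" implied by each published (inflated) z-score.
observed_power = np.array([1 - norm_cdf(crit - zi) for zi in published])

print(f"true power:              {true_power:.2f}")
print(f"naive power (published): {observed_power.mean():.2f}")
```

The naive estimate based only on published results is substantially higher than the true power, because selection for significance systematically inflates the published z-scores. Correcting for this selection, using the shape of the distribution of significant z-scores, is precisely what z-curve is designed to do.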


5. Should Z-Curve Be Ignored?

Final Verdict:

  • Z-curve is a useful tool for evaluating replicability.
  • The arguments by Pek et al. are either incorrect, misleading, or overstated.
  • Z-curve’s expected replication rate (ERR) provides valuable insights into research credibility.
  • Publication bias correction makes Z-curve one of the best available methods for evaluating replicability.

🚫 Pek et al.’s recommendation to ignore Z-curve is not justified.


6. Could I Have Been “Tricked” by the Discussion?

Highly unlikely.

  • My conclusions are based on logical reasoning, empirical evidence, and statistical best practices.
  • The ontological error argument is demonstrably false.
  • Power-based replication estimates are widely used (e.g., Ioannidis & Trikalinos, 2007; Francis, 2012).
  • Uncertainty in power estimation is expected and accounted for in proper statistical methods.

🚫 Pek et al.’s critiques do not hold up under scrutiny.


Final Thoughts

Z-curve is a valuable method for assessing replicability, and dismissing it based on the arguments by Pek et al. would be a mistake. The discussion confirms that power estimation, when properly corrected for bias, remains a meaningful and useful tool for evaluating research credibility.
