🔹 Core Issue
The controversy centers on whether it is legitimate to estimate the average statistical power of completed studies—that is, to use published test statistics to infer how often those studies would produce significant results if replicated.
- Schimmack’s position: Average power can and should be estimated empirically from published results.
- Pek’s position: Power is a hypothetical construct used for planning future studies, not something that can be meaningfully estimated post hoc.
🔹 Schimmack’s Position (Brunner & Schimmack, 2020; Schimmack, 2025)
- Two Concepts of Power
- Hypothetical power: The probability of significance based on an assumed true effect size before data collection.
- True power: The actual long-run probability that studies in a literature produce significant results, given their real (not assumed) effect sizes and sample sizes.
Schimmack argues both concepts are legitimate—one for planning, one for evaluating.
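To make the "hypothetical power" concept concrete, here is a minimal stdlib-only sketch (my illustration, not from either author): the pre-data probability of a significant two-sided z-test, given an assumed standardized effect size and a planned per-group sample size. The normal approximation and the specific numbers are assumptions for illustration.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def hypothetical_power(d, n_per_group):
    """Approximate power of a two-sample z-test (alpha = .05, two-sided)
    for an assumed effect size d, computed BEFORE any data exist."""
    ncp = d * sqrt(n_per_group / 2)        # noncentrality of the test statistic
    # probability the test statistic falls beyond either critical value
    return 1 - phi(1.96 - ncp) + phi(-1.96 - ncp)

# Cohen's classic planning case: d = 0.5 with 64 per group gives ~.80 power
print(round(hypothetical_power(d=0.5, n_per_group=64), 2))
```

The same arithmetic applied to a study's *actual* effect size and sample size yields its "true power" in Schimmack's sense; the disagreement is about whether that quantity can be estimated from published results.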
- Empirical Estimation is Possible
- Using methods like z-curve, one can reconstruct the distribution of significant test statistics (z-values, t-values) to estimate:
- Expected Discovery Rate (EDR): the expected proportion of all conducted tests that yield significant results, estimated after correcting for selection bias.
- Expected Replication Rate (ERR): probability that a significant finding would be significant again if replicated.
These correspond to estimates of average true power.
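A simulation sketch of what these quantities target (my simplified setup, not the z-curve algorithm itself): studies vary in true power, only significant results are "published," and the average true power of the selected studies is what an expected-replication-rate estimate aims at. The uniform distribution of noncentrality values is an assumption for illustration.

```python
import random
from math import erf, sqrt

random.seed(1)

def phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_of(mu):
    # two-sided power of a z-test whose statistic is N(mu, 1)
    return 1 - phi(1.96 - mu) + phi(-1.96 - mu)

# A heterogeneous literature: each study has its own true noncentrality mu
studies = [random.uniform(0.0, 3.0) for _ in range(100_000)]

# Run each study once; keep the true power of those that came out significant
selected_powers = [power_of(mu) for mu in studies
                   if abs(random.gauss(mu, 1)) > 1.96]

avg_all = sum(power_of(mu) for mu in studies) / len(studies)
avg_sel = sum(selected_powers) / len(selected_powers)
print(f"average power, all studies:        {avg_all:.2f}")
print(f"average power, significant subset: {avg_sel:.2f}")
```

The significant subset has higher average power than the full literature, because selection for significance over-samples high-powered studies; z-curve's contribution is recovering both averages from the published (selected) test statistics alone.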
- Purpose of Estimation
- Estimating average power reveals the credibility of a research area: if the observed success rate (e.g., 90%) far exceeds the estimated true success rate (e.g., 30%), the field likely suffers from publication bias or p-hacking.
- Average power is thus an index of evidential value and reproducibility, not a design tool.
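The 90%-vs-30% diagnostic above can be shown in a few lines (my toy example, not from the source): when every study has 30% true power but only significant results reach print, the published "success rate" is 100% by construction, and the gap to the true discovery rate measures the selection.

```python
import random
random.seed(0)

TRUE_POWER = 0.30
# run 10,000 studies; each succeeds (is significant) with probability .30
results = [random.random() < TRUE_POWER for _ in range(10_000)]
# publication selects only the successes
published = [r for r in results if r]

print(f"true discovery rate:    {sum(results) / len(results):.2f}")
print(f"published success rate: {sum(published) / len(published):.2f}")
```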
- Rebuttal to Semantic Objection
- Even if “power” was historically defined for hypothetical design contexts, that’s a semantic convention, not a logical limitation.
- Redefining or replacing the term (e.g., “expected discovery rate”) does not change the underlying empirical reality that studies have a certain probability of success given their true effects.
🔹 Pek’s Position (Pek, Hoisington-Shaw, & Wegener, 2024, Psychological Methods)
- Power Is Hypothetical
- By definition, power is the probability of rejecting H₀ given a true effect size and sample size in a planned design.
- Once data are collected, the “true effect” is unknown, and the observed result no longer provides information about power.
- Post-hoc Power Is Misleading
- Power computed from an observed effect size is mathematically redundant with the p-value.
- Therefore, post-hoc power analysis adds no new information—it simply recasts the p-value in another form (the so-called “power-p equivalence” argument).
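The power-p equivalence can be demonstrated directly (my formulas for a two-sided z-test; Pek et al.'s argument is more general): post-hoc power computed by plugging the observed effect in as the true effect is a deterministic, monotone function of the p-value alone, so it adds nothing beyond it.

```python
from math import erf, sqrt

def phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def phi_inv(p):
    """Inverse normal CDF by bisection (stdlib-only, illustrative)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def posthoc_power(p_value, alpha=0.05):
    z_obs = phi_inv(1 - p_value / 2)     # |z| is fully determined by p
    z_crit = phi_inv(1 - alpha / 2)
    return 1 - phi(z_crit - z_obs) + phi(-z_crit - z_obs)

# Every study with p = .05 has post-hoc power ~.50, whatever its design:
print(round(posthoc_power(0.05), 2))
```

Since `posthoc_power` takes only the p-value as input, two studies with the same p-value always get the same post-hoc power: this is the redundancy Pek points to. Schimmack's reply is that z-curve does not compute power study-by-study from observed effects, but estimates the *distribution* of true power across a literature.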
- Meta-Analytic Power Estimation Is Ontologically Flawed
- Using the same term (“power”) to describe retrospective estimates confuses the conceptual role of power (a design tool) with empirical inference about data.
- Pek argues that such redefinitions create an “ontological error”—blurring what power is (a pre-data probability) versus what z-curve estimates (a property of observed distributions).
- Proper Role of Power
- Power analysis should be reserved for planning new studies to achieve a desired level of sensitivity, not for evaluating past research.
🔹 Schimmack’s Counterarguments (2025 Blog & Responses)
- Misplaced Formalism
- The “ontological” objection is purely semantic: words often have multiple legitimate meanings depending on context (e.g., “force” in physics vs. conversation).
- Cohen himself used power in both planning and evaluative contexts (e.g., Cohen, 1962; Sedlmeier & Gigerenzer, 1989).
- Empirical Track Record
- Dozens of meta-analyses since the 1960s have reported “average power” of published studies—this tradition predates Pek’s definitional restriction.
- Methods like z-curve extend that logic by correcting for selection bias and estimating actual discovery probabilities.
- Conceptual Utility Over Semantics
- Regardless of what it’s called, the estimated probability that a significant result would replicate is an empirically meaningful and policy-relevant measure.
- The debate over the label “power” is a distraction from the substantive goal: improving credibility and reproducibility.
- Meta-Science vs. Design Science
- Power as used in z-curve belongs to meta-science—the empirical study of how scientists actually behave—rather than the formal Neyman-Pearson design framework.
- Rejecting post-hoc estimation because it violates a textbook definition misses the meta-scientific purpose entirely.
🔹 Broader Implications
| Issue | Schimmack’s View | Pek’s View |
|---|---|---|
| Definition of power | Can refer to true long-run success probability of real studies | Only a hypothetical design probability |
| Use of observed data | Valid and necessary for empirical evaluation | Invalid; tautological with p-values |
| Role of z-curve | A meta-scientific estimator of true discovery/replication rates | Misuses “power” concept |
| Philosophy of science | Empirical realism: definitions should follow observable reality | Conceptual essentialism: definitions must follow formal theory |
| Goal | Diagnose publication bias and credibility | Preserve terminological purity of statistical theory |
🔹 Summary Statement
The Schimmack–Pek controversy is ultimately about the meaning and use of statistical power.
- Pek argues that power belongs exclusively to the design phase and cannot describe completed studies.
- Schimmack argues that psychology needs empirical tools to assess its actual performance and that average power—or equivalently, expected discovery/replication rates—provides exactly that.
In short:
Pek defends a definition of power tied to a single planned study; Schimmack developed a method to estimate the average power of completed studies.