The Journal Psychological Methods Decides that Psychological Research Is Not Underpowered

For over sixty years, psychologists have been told that their studies are underpowered.
Starting with Cohen (1962) and repeated by Sedlmeier and Gigerenzer (1989), Maxwell (2004), Button et al. (2013), and countless meta-analyses, the message has been consistent: our typical studies lack the statistical sensitivity to detect true effects.

This “power failure” has been a cornerstone of the open science movement and a standard explanation for the replication crisis.

But according to a recent Psychological Methods editorial decision, that entire literature rests on a fundamental ontological error.


Apparently, there is no power failure—because, by definition, completed studies do not have power.


The New Logic

In the journal’s interpretation (based on Pek, Hoisington-Shaw, & Wegener, 2024):

  • Statistical power is a property of a test, not of a study.
  • It is defined only before data are collected, for a hypothetical infinite series of replications with a fixed true effect size.
  • Once data exist, the test’s long-run property no longer applies; therefore, it makes no sense to speak of a study’s power—past or present. (A minimal calculation of this pre-data quantity is sketched after this list.)
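
Here is that sketch in Python, with illustrative values for the effect size, per-group n, and alpha (assumptions for illustration, not taken from any real study); the point is simply that nothing in the computation refers to data:

```python
# Power of a planned two-sided, two-sample test (normal approximation).
# Every input is fixed before data collection; no data appear anywhere.
from scipy.stats import norm

def planned_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """P(reject H0 | true standardized effect d) for the planned test."""
    ncp = d * (n_per_group / 2) ** 0.5        # noncentrality of the test statistic
    z_crit = norm.ppf(1 - alpha / 2)          # two-sided critical value
    return 1 - norm.cdf(z_crit - ncp)         # upper tail; lower tail is negligible

# Cohen's "medium" effect (d = 0.5) with 20 participants per group:
print(round(planned_power(0.5, 20), 2))       # -> 0.35, far below the .80 norm
```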

Hence, Cohen (1962) didn’t actually discover that psychology was underpowered.
He merely performed a few algebraic exercises about hypothetical tests.
The claim that real studies were underpowered is, under this logic, a category mistake.


The Consequences of Redefinition

If this view is taken seriously:

  • Cohen (1962, 1988), Sedlmeier & Gigerenzer (1989), Rossi (1990), Maxwell (2004), and Button et al. (2013) all committed the same “ontological error.”
  • There can be no such thing as observed or average power, since power is not a property of results. (A sketch of what “observed power” actually computes follows below.)
  • And since no real study can ever be “underpowered,” there can be no “power failure” to explain replication failures.

Replication failures must therefore be due to something else—perhaps the planets, bad luck, or metaphysical indeterminacy.
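
For reference, the “observed power” being outlawed here is nothing exotic: it is the same pre-data formula evaluated at the observed test statistic, which for a z-test makes it a one-to-one transform of the p-value (Hoenig & Heisey, 2001). A minimal sketch, assuming a two-sided z-test:

```python
# "Observed power": the planning formula with |z_obs| plugged in as if it
# were the true noncentrality (two-sided z-test, normal approximation).
from scipy.stats import norm

def observed_power(z_obs: float, alpha: float = 0.05) -> float:
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - abs(z_obs))

# A just-significant result always yields observed power of about one half:
print(round(observed_power(1.96), 2))         # -> 0.5
```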


The Irony

Of course, the same journal publishes meta-analyses that estimate “average power” and discuss its implications for replicability. It also accepts simulations in which average power determines expected replication rates. But when that same concept is applied to actual psychological research, it suddenly becomes “ontologically incoherent.” In other words, power matters—except when it matters.
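
The tension is easy to make concrete. Below is a toy version of the kind of simulation the journal accepts: studies with heterogeneous true effects, selection for significance, and exact replications. The average power of the published studies is precisely their expected replication rate; all numbers are illustrative assumptions, not estimates of any real literature:

```python
# Toy model: heterogeneous true effects, selection for significance, exact
# replications. Illustrative numbers only; nothing is estimated from data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)
n_studies = 200_000
z_crit = norm.ppf(0.975)                      # two-sided alpha = .05

ncp = rng.uniform(0.0, 3.0, n_studies)        # true noncentrality of each study
z_obs = rng.normal(loc=ncp)                   # one observed z per study
published = np.abs(z_obs) > z_crit            # journals keep the significant ones

# True two-sided power of each study, known here because we chose ncp.
power = (1 - norm.cdf(z_crit - ncp)) + norm.cdf(-z_crit - ncp)

z_rep = rng.normal(loc=ncp)                   # one exact replication per study
rep_rate = (np.abs(z_rep[published]) > z_crit).mean()

print("average power of published studies:", round(power[published].mean(), 3))
print("replication rate of published studies:", round(rep_rate, 3))
# The two numbers agree up to simulation error: conditional on ncp, the
# replication outcome is independent of the selection event, so the mean
# power of what got published *is* the expected replication rate -- the
# quantity that bias-corrected methods like z-curve estimate from real data.
```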


The Bigger Picture

Sarcasm aside, this position would erase two of the most robust empirical generalizations about psychology: that typical studies have a low probability of detecting true effects, and that publication bias explains the roughly 90% success rates in psychology journals (Sterling, 1959; Sterling et al., 1995; Motyl et al., 2017).

The low-power claim—whether estimated via Cohen’s early surveys, Sedlmeier & Gigerenzer’s analyses, or modern bias-corrected methods like z-curve—has accurately predicted the replication crisis.

To declare the concept meaningless is not theoretical progress; it’s conceptual retreat.
It protects the purity of Neyman–Pearson logic at the cost of empirical relevance.
If taken literally, the new Psychological Methods stance means that “power” applies only to imaginary studies—and that real studies, by definition, can never fail.


Closing Line

So congratulations, psychology.
After six decades of self-criticism, we can finally declare victory:
our research is not underpowered—because power no longer exists.


2 thoughts on “The Journal Psychological Methods Decides that Psychological Research Is Not Underpowered”

  1. I asked ChatGPT to find sources that make “empirical claims about the average probability of detecting real effects in psychology,” avoiding the term power.

    ChatGPT found only sources that claim this probability is low and used the term power to refer to this probability. Thus, the decision by the gatekeepers to reserve the term power only for study planning with hypothetical effect sizes does not reflect the actual use of the term power in the power literature.

    The decision also does not change the fact that the probability of obtaining significant results with the true population effect sizes in psychology is low, even if we cannot use the term power as a name for this probability. Maybe we should just reverse it and start talking about impotence: rather than low power, psychological research suffers from impotence.

    # Sources concluding the average is **low** (well below the 80% norm or Sterling’s ~90% success benchmark)

    * **Cohen (summarized in his power monograph):** classic surveys show median power far below .80 for medium effects; replication summaries report median ≈ .37 for medium effects at α=.05. ([utstat.toronto.edu][1])

    * **Sedlmeier & Gigerenzer (1989, Psychological Bulletin):** multi-journal survey; most papers had <50% power to detect medium effects, with no improvement since Cohen’s 1962 survey. ([library.mpib-berlin.mpg.de][2])

    * **Later surveys reaching the same conclusion:** Maxwell (2004) on the persistence of underpowered studies ([statmodeling.stat.columbia.edu][3]); Button et al. (2013), “Power failure,” on small samples in neuroscience ([PubMed][4]); an empirical assessment of published effect sizes and power ([PLOS Biology][5]); and reviews of questionable research practices and low statistical power ([PMC][6]; [royalsocietypublishing.org][7]).

    # Sources claiming the average is adequate (≈80%, or “>50% and that’s OK”)

    I did **not** find credible field-wide sources claiming that the *average* power of psychological studies is ≈80% (or that ≳50% is adequate) **for actual effect sizes**. What exists are (i) statements of the **normative target** (80%) and (ii) niche/design-specific notes (e.g., some within-subject paradigms can attain high power), but these do **not** claim that psychology’s *overall average* is adequate:

    * Articles and guides reiterating the **80% target** as a convention (not an achieved average): APS/Observer explainer; general primers. ([Association for Psychological Science][8])

    * Design-specific advice showing how to **reach** 80% (e.g., many trials/participants in within-subject tasks), while noting current practice is typically lower. (These are prescriptive, not claims that averages are already high.) ([PMC][9])

    **Bottom line:** The empirical literature converges that psychology’s *average* power is **low to moderate** (often ~.20–.50 depending on effect sizes/designs), **well below** the .80 norm and Sterling’s ~90% success rate—**and I found no credible sources asserting that the overall average is already adequate.**
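
    As a quick sanity check on those figures, here is what the normal approximation gives for a medium effect at per-group sample sizes assumed here to stand in for typical practice:

    ```python
    # Sanity check: two-sided alpha = .05, medium effect d = 0.5, at assumed
    # per-group sample sizes meant to illustrate typical practice.
    from scipy.stats import norm

    for n in (20, 25, 30):
        ncp = 0.5 * (n / 2) ** 0.5                # noncentrality for d = 0.5
        pw = 1 - norm.cdf(norm.ppf(0.975) - ncp)  # upper-tail power (normal approx.)
        print(n, round(pw, 2))                    # -> 0.35, 0.42, 0.49
    # Squarely in the "low to moderate" (~.20-.50) range summarized above.
    ```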

    [1]: https://utstat.toronto.edu/~brunner/oldclass/378f16/readings/CohenPower.pdf?utm_source=chatgpt.com “Statistical Power Analysis for the Behavioral Sciences”
    [2]: https://library.mpib-berlin.mpg.de/ft/gg/gg_do%20studies_1989.pdf?utm_source=chatgpt.com “Do Studies of Statistical Power Have an Effect on the …”
    [3]: https://statmodeling.stat.columbia.edu/wp-content/uploads/2017/07/maxwell2004.pdf?utm_source=chatgpt.com “The Persistence of Underpowered Studies in …”
    [4]: https://pubmed.ncbi.nlm.nih.gov/23571845/?utm_source=chatgpt.com “Power failure: why small sample size undermines the …”
    [5]: https://journals.plos.org/plosbiology/article?id=10.1371%2Fjournal.pbio.2000797&utm_source=chatgpt.com “Empirical assessment of published effect sizes and power in …”
    [6]: https://pmc.ncbi.nlm.nih.gov/articles/PMC9351632/?utm_source=chatgpt.com “Questionable Research Practices, Low Statistical Power …”
    [7]: https://royalsocietypublishing.org/doi/10.1098/rsos.160254?utm_source=chatgpt.com “Low statistical power in biomedical science: a review of three …”
    [8]: https://www.psychologicalscience.org/publications/observer/powerful-myths-statistical-power.html?utm_source=chatgpt.com “Powerful Myths: Common Misconceptions About Statistical …”
    [9]: https://pmc.ncbi.nlm.nih.gov/articles/PMC8329985/?utm_source=chatgpt.com “Power Contours: Optimising Sample Size and Precision in …”
