This is another blog post about post-hoc power. It was written by ChatGPT following a discussion with ChatGPT about the topic; you can find the longer discussion at the end of the post.
🔍 Introduction
You finish your study, run the stats, and the p-value is… not significant. What next?
Maybe you ask, “Did I just not have enough power to detect an effect?”
So you calculate post-hoc power — also called observed power — to figure out whether your study was doomed from the start.
But here’s the problem:
Post-hoc power doesn’t tell you what you think it does.
This post walks through why that’s the case — and what to do instead.
⚡ What Is Post-Hoc (Observed) Power?
Post-hoc power is a calculation of statistical power after your study is complete, using the effect size you just observed.
It answers the question:
“If the true effect size were exactly what I observed, how likely was I to find a significant result?”
It seems intuitive — but it’s built on shaky ground.
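To make the calculation concrete, here is a minimal sketch in Python using statsmodels. The effect size and sample size are made-up placeholders, not values from any real study.

```python
# A minimal sketch of a post-hoc ("observed") power calculation for a
# two-sample t-test. The observed effect size (Cohen's d) and the sample
# size per group are illustrative placeholders.
from statsmodels.stats.power import TTestIndPower

observed_d = 0.25    # hypothetical observed effect size (Cohen's d)
n_per_group = 40     # hypothetical sample size per group
alpha = 0.05

analysis = TTestIndPower()
observed_power = analysis.power(effect_size=observed_d,
                                nobs1=n_per_group,
                                ratio=1.0,
                                alpha=alpha)
print(f"Post-hoc (observed) power: {observed_power:.2f}")
```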
🚨 Why Post-Hoc Power Is Misleading
The main issue is circular logic.
Post-hoc power is based on your observed effect size. But in any given study, your observed effect size includes sampling error — sometimes wildly so, especially with small samples.
So if you got a small, non-significant effect, post-hoc power will always be low — but that doesn’t mean your study couldn’t detect a meaningful effect. It just means it didn’t, and now you’re using that fact to “prove” it couldn’t.
👉 In essence, post-hoc power just repackages your p-value. It doesn’t add new information.
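One way to see this: for a fixed test and sample size, the observed effect size determines both the p-value and the observed power, so one is just a monotone re-expression of the other. The sketch below, with made-up effect sizes and sample size, illustrates the mapping for a two-sample t-test; an observed p-value near 0.05 always corresponds to observed power near 50%.

```python
# A small sketch showing that observed power is a re-expression of the
# p-value: for a fixed design, both are deterministic functions of the
# observed effect size. All numbers below are invented for illustration.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

n_per_group = 40
analysis = TTestIndPower()

for observed_d in [0.10, 0.32, 0.45, 0.80]:
    # t statistic implied by the observed standardized effect size
    t_stat = observed_d * np.sqrt(n_per_group / 2)
    df = 2 * n_per_group - 2
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    obs_power = analysis.power(effect_size=observed_d,
                               nobs1=n_per_group,
                               alpha=0.05)
    print(f"d = {observed_d:.2f} -> p = {p_value:.3f}, "
          f"observed power = {obs_power:.2f}")
# Larger p-values always map to lower observed power, and vice versa:
# no new information beyond the p-value itself.
```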
🤔 But What If I Want to Know About Power?
Here’s where things get interesting.
Power analysis is still important — but it needs to be handled differently. The key distinction is between hypothetical power and observed power:
| Type of Power | Based on | When Used | Purpose |
|---|---|---|---|
| Hypothetical | Expected (e.g., theoretical or meta-analytic) effect size | Before study | To design the study |
| Observed | Effect size from current data | After study | Often (wrongly) used to explain a non-significant result |
But you can do something more useful with observed data…
✅ A Better Way: Confidence Intervals for Power
Rather than calculating a single post-hoc power number, calculate a confidence interval for the effect size, and then use that to compute a range of plausible power values.
Example:
Let’s say you observed an effect size of 0.3, with a 95% CI of [0.05, 0.55].
You can compute:
- Power if the true effect is 0.05 (low power)
- Power if the true effect is 0.55 (high power)
Now you can say:
“If the true effect lies anywhere within our 95% CI, then the power of our study was somewhere between 12% and 88%.”
That’s honest. It tells you what your data can say — and what they can’t.
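Here is a sketch of that calculation, again with statsmodels. The CI bounds are the ones from the example above, but the per-group sample size is an assumed placeholder, so the resulting range will not exactly reproduce the 12% to 88% figures.

```python
# A hedged sketch of the approach above: take the 95% CI for the observed
# standardized effect size and compute power at each bound. The sample size
# per group is an assumed placeholder.
from statsmodels.stats.power import TTestIndPower

ci_low, ci_high = 0.05, 0.55   # 95% CI for the effect size (from the example)
n_per_group = 64               # assumed sample size per group
alpha = 0.05

analysis = TTestIndPower()
power_low = analysis.power(effect_size=ci_low, nobs1=n_per_group, alpha=alpha)
power_high = analysis.power(effect_size=ci_high, nobs1=n_per_group, alpha=alpha)
print(f"Power if the true effect is {ci_low}: {power_low:.2f}")
print(f"Power if the true effect is {ci_high}: {power_high:.2f}")
```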
🧪 When Are Power Confidence Intervals Informative?
In small studies, the confidence interval for the effect size (and thus the power) will be wide — too wide to draw firm conclusions.
But if you base your effect size estimate on:
- a large study, or
- a meta-analysis,
your confidence interval can be narrow enough that the corresponding power range is actually informative.
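To illustrate the contrast, the sketch below evaluates the same planned design under two invented intervals: a wide CI such as a small study might give, and a narrow CI such as a meta-analysis might give.

```python
# A sketch of how the informativeness of the power range depends on the
# precision of the effect-size estimate. Both intervals are invented for
# illustration and evaluated against the same assumed design.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = 64   # assumed design being evaluated
alpha = 0.05

intervals = {
    "small study (wide CI)":     (0.05, 0.55),
    "meta-analysis (narrow CI)": (0.25, 0.35),
}

for label, (lo, hi) in intervals.items():
    p_lo = analysis.power(effect_size=lo, nobs1=n_per_group, alpha=alpha)
    p_hi = analysis.power(effect_size=hi, nobs1=n_per_group, alpha=alpha)
    print(f"{label}: power range {p_lo:.2f} to {p_hi:.2f}")
```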
✔️ Bottom line: Confidence intervals make power analysis meaningful — but only when your effect size estimate is precise.
💡 Final Thought: Use Power Thoughtfully
If you didn’t find a significant result, it’s tempting to reach for post-hoc power to explain it away.
But instead of asking, “Was my study underpowered?” try asking:
- “What effect sizes are consistent with my data?”
- “How much power would I have had for those?”
- “What sample size would I need to detect effects in that range reliably?”
These are the questions that lead to better science — and more replicable results.
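For the last of those questions, a sample-size calculation over the plausible range of effects is straightforward. The target power level and the effect sizes below are illustrative assumptions, echoing the CI bounds and point estimate from the earlier example.

```python
# A minimal sketch: what sample size per group would be needed to detect
# each effect size in the plausible range with a chosen target power?
# Effect sizes and target power are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
target_power = 0.90
alpha = 0.05

for d in [0.05, 0.30, 0.55]:   # e.g., CI bounds and point estimate
    n = analysis.solve_power(effect_size=d, power=target_power, alpha=alpha)
    print(f"d = {d:.2f}: about {n:.0f} participants per group "
          f"for {target_power:.0%} power")
```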
🛠️ TL;DR
- ❌ Post-hoc power (observed power) is often misleading.
- 🔁 It restates your p-value using your observed effect size.
- ✅ Better: Use the 95% CI of your effect size to calculate a range of power estimates.
- 📏 If your effect size estimate is precise (e.g., from a large or meta-analytic study), this range becomes actionable.