Expressing Uncertainty about Analysis Plans with Conservative Confidence Intervals

Unless researchers specify an analysis plan and follow it exactly, it is possible to analyze the same data in several ways. If all analyses lead to the same conclusion, this is not a problem. However, what should we do when the analyses lead to different conclusions? The problem typically arises when one analysis shows a p-value less than .05 and another plausible analysis shows a p-value greater than .05. The inconsistency introduces uncertainty about the proper conclusion. Traditionally, researchers selectively picked the more favorable analysis, which is known as a questionable research practice because it undermines the purpose of significance testing, namely to control the long-run error rate. However, what do we do if researchers honestly present both results, p = .02 and p = .08? As many statisticians have pointed out, the difference between these two results is itself not statistically significant and is negligible.

A simple solution to the problem is to switch from hypothesis testing with p-values to hypothesis testing with confidence intervals (Schimmack, 2020). With p = .02 and p = .08, the corresponding confidence intervals could be d = .05 to .40 and d = -.05 to .30. It is simple to express the uncertainty about the proper inference by taking the lower of the two lower limits and the higher of the two upper limits to create a conservative confidence interval, d = -.05 to .40. This confidence interval captures uncertainty about the proper analysis as well as uncertainty due to sampling error. Inferences can then be drawn based on this conservative interval. In this case, there is insufficient evidence to reject the null hypothesis. Yet the data still provide evidence that the effect size is unlikely to be moderate. If this is theoretically meaningful or contradicts previous studies (e.g., studies that used QRPs to inflate effect sizes), the results are still important and publishable.
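The construction is simple enough to show in a few lines of code. The following is a minimal sketch, assuming the two illustrative intervals from the example above; the function name conservative_ci and the specific limits are not from any real data set, they just demonstrate the rule of taking the lowest lower limit and the highest upper limit.

```python
# Minimal sketch: combine analysis-specific 95% confidence intervals for an
# effect size d into one conservative interval by taking the widest bounds.
# The interval values are the illustrative ones from the text, not real data.

def conservative_ci(intervals):
    """Return (lowest lower limit, highest upper limit) across analyses."""
    lowers, uppers = zip(*intervals)
    return min(lowers), max(uppers)

ci_p02 = (0.05, 0.40)   # interval from the analysis with p = .02
ci_p08 = (-0.05, 0.30)  # interval from the analysis with p = .08

low, high = conservative_ci([ci_p02, ci_p08])
print(f"Conservative 95% CI: d = {low:.2f} to {high:.2f}")  # d = -0.05 to 0.40
```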

One problem arises when there are many ways to analyze the data. A recent suggestion is to conduct a multiverse analysis; that is, to run all plausible analyses and examine the full set of results. The problem is that this may produce extremely divergent results, and it is not clear how the results of a multiverse analysis should be integrated. Conservative confidence intervals provide an easy way to do so, but they may become extremely wide if the multiverse is not limited to a small range of reasonable analyses. It is therefore crucial that researchers think carefully about reasonable alternative ways to analyze the data rather than trying every possible way, which would make the results uninformative.
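To make the multiverse case concrete, here is a hedged sketch of how the same rule extends to a grid of analysis specifications. The exclusion rules, covariate sets, and the run_analysis stand-in below are hypothetical and return simulated intervals; in practice each specification would be a real analysis of the data, and the conservative interval would simply span all of the resulting intervals.

```python
# Hedged sketch of a multiverse-style application: each specification
# (a hypothetical combination of exclusion rule and covariate set) yields
# its own 95% CI for d; the conservative CI spans all of them.
# Values are simulated for illustration, not taken from a real data set.

import itertools
import random

random.seed(1)

exclusion_rules = ["none", "drop_outliers", "drop_fast_rts"]
covariate_sets = ["none", "age", "age_gender"]

def run_analysis(exclusion, covariates):
    """Stand-in for a real analysis pipeline: returns an illustrative 95% CI."""
    center = random.uniform(0.10, 0.25)      # simulated effect size estimate
    half_width = random.uniform(0.12, 0.20)  # simulated margin of error
    return center - half_width, center + half_width

intervals = [run_analysis(e, c)
             for e, c in itertools.product(exclusion_rules, covariate_sets)]

low = min(ci[0] for ci in intervals)
high = max(ci[1] for ci in intervals)
print(f"Conservative CI across {len(intervals)} specifications: "
      f"d = {low:.2f} to {high:.2f}")
```

The sketch also illustrates the warning above: the more specifications are added, the wider the conservative interval tends to become, so the multiverse should be restricted to analyses that are genuinely defensible.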
