A Deep Dive Into Carter et al.’s (2019) Evaluation of Meta-Analytic Methods

This is mainly a landing page for individual blog posts that examine specific conditions in Carter et al.'s extensive and influential simulation study, which varied (1) the number of studies per meta-analysis (k), (2) the average effect size (d), (3) the heterogeneity of effect sizes (tau), (4) the presence of selection bias, and (5) the presence of p-hacking.

In my reexamination of this simulation study, I focus on the selection model implemented in the weightr package in R. Carter et al. (2019) used the default implementation of this model, known as the three-parameter selection model (3PSM). I examine whether modifications of the model improve its performance. Most important is the modeling of p-hacking with a separate bin for just significant results between p = .005 and .025 (one-tailed), which corresponds to two-tailed p-values between .01 and .05. P-hacking produces more of these just significant results than we would expect based on the true power of the studies and selection bias alone.
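To make the modification concrete, here is a minimal sketch, assuming the weightr package, of how the default 3PSM and the modified model can be fit with weightfunct(). The effect sizes and sampling variances below are simulated placeholders, and the steps argument takes one-tailed p-value cutpoints.

```r
library(weightr)

# Placeholder data: in practice, yi and vi come from the meta-analytic
# data set of k studies.
set.seed(1)
yi <- rnorm(100, 0.3, 0.25)   # observed effect sizes
vi <- runif(100, 0.01, 0.05)  # sampling variances

# Default 3PSM: one cutpoint at one-tailed p = .025 (two-tailed p = .05),
# i.e., one relative publication weight for nonsignificant results.
m3psm <- weightfunct(effect = yi, v = vi, steps = c(0.025, 1))

# Modified model: an extra cutpoint at one-tailed p = .005 (two-tailed
# p = .01) adds a separate bin, and a separate weight, for just
# significant results, the region that p-hacking inflates.
m4psm <- weightfunct(effect = yi, v = vi, steps = c(0.005, 0.025, 1))
```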

The first four simulations assumed a small average effect size (d = .2) with large heterogeneity (tau = .4) in a set of k = 100 “published” studies. With a normal distribution of population effect sizes, this implies that roughly 95% of the effect sizes fall in the range from d = -.6 to d = 1 (.2 ± 2 × .4). Crossing no versus high p-hacking with no versus high selection bias, both as implemented in Carter et al.’s (2019) simulation code, produced a 2 x 2 design. Aside from the selection model, I evaluated PET-PEESE, p-uniform, and z-curve.
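For readers who want to see the design in code, here is an illustrative sketch of one cell of the 2 x 2 design (high selection bias, no p-hacking). This is not Carter et al.’s actual simulation code; the per-group sample size and the 5% publication rate for nonsignificant results are assumptions made for illustration. The PET-PEESE fit at the end uses the common meta-regression implementation via metafor.

```r
library(metafor)

# Illustrative sketch (not Carter et al.'s code): k = 100 "published"
# studies with d = .2, tau = .4, and high selection bias.
set.seed(123)
k <- 100; d <- 0.2; tau <- 0.4; n <- 50   # n per group is an assumption
yi <- vi <- numeric(0)
while (length(yi) < k) {
  delta <- rnorm(1, d, tau)               # study-specific population effect
  g1 <- rnorm(n); g2 <- rnorm(n, delta)
  p  <- t.test(g2, g1, var.equal = TRUE)$p.value
  if (p < .05 || runif(1) < .05) {        # nonsignificant results rarely published
    es <- (mean(g2) - mean(g1)) / sqrt((var(g1) + var(g2)) / 2)
    yi <- c(yi, es)
    vi <- c(vi, 2 / n + es^2 / (4 * n))   # sampling variance of Cohen's d
  }
}

# PET-PEESE, one of the compared methods, as fixed-effect meta-regressions:
pet   <- rma(yi, vi, mods = ~ sqrt(vi), method = "FE")  # PET: SE as moderator
peese <- rma(yi, vi, mods = ~ vi,       method = "FE")  # PEESE: variance
```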

Studies 1-4:

The main finding of this limited set of simulations was that the improved selection model performed as well as or better than the other methods.

Simulation #296: No Selection Bias / No P-Hacking

Simulation #328: No Selection Bias / High P-Hacking

Simulation #392: High Selection Bias / No P-Hacking

Simulation #424: High Selection Bias / High P-Hacking

Study 5

Here I examined simulation #100. This simulation assumes that the null hypothesis is true in all studies, with high selection bias and no p-hacking. Replicating Carter et al. (2019), I find a high false positive rate for the selection model; the bias of the effect size estimate, however, is small. Nevertheless, z-curve outperforms the selection model and the other methods in this simulation. Thus, it is useful to complement selection-model meta-analyses of effect sizes with a z-curve analysis, especially when the average effect size estimate and the heterogeneity estimate are small.

Simulation #100: High Selection Bias / No P-Hacking
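As a sketch of what such a complementary analysis looks like, assuming the zcurve R package, two-tailed p-values can be converted to absolute z-scores and passed to zcurve(). The p-values below are placeholders for illustration.

```r
library(zcurve)

# Placeholder p-values; in practice these come from the published studies.
set.seed(1)
p <- runif(300, 0, 1)^3                # skewed toward significance, for illustration
z <- qnorm(p / 2, lower.tail = FALSE)  # two-tailed p-values to absolute z-scores

fit <- zcurve(z)   # the model is fit to the significant z-scores
summary(fit)       # reports expected replication and discovery rates
```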
