Snyder, J. S., & Henry, P. J. (2023). Regional measures of sexual-orientation bias predict where same-gender couples live. Psychological Science, 34(7), 794–808. https://doi.org/10.1177/09567976231173903
Multiverse Analysis
OSF | A Multiverse Analysis of Snyder and Henry (2023) “Regional Measures of Sexual-Orientation Bias”
Summary
Snyder and Henry (2023) argue that county-level aggregation of IAT scores yields a reliable regional measure of anti-LGB bias that predicts where same-gender couples live, and they highlight a key adjusted regression (Table 3, Column 3) in which the implicit measure appears to outperform a single-item explicit measure. While aggregation can reduce random error, it does not by itself establish that IAT scores capture a distinct implicit construct; aggregation also stabilizes systematic method variance and sampling artifacts, and regional differences in self-presentation could affect explicit reports.
A reanalysis using a multiverse framework shows that the “implicit > explicit” contrast is highly model-dependent. In simple associations, implicit and explicit measures show similar relationships with the outcome. Across 42 reasonable specifications that vary outcome handling (raw, log-transformed, count model), weighting (with/without), and covariate inclusion (none, single covariates, full published set), only the published specification yields a statistically significant advantage for the IAT, while multiple alternatives yield either no difference or a statistically significant advantage for the explicit measure. The main conclusion is that the paper’s headline inference—implicit bias is a stronger predictor than explicit bias—is not robust to reasonable analytic choices and should be interpreted more cautiously.
Full Article
This article asks whether regional measures of sexual-orientation bias predict where same-gender couples live. The central claim is that county-level implicit bias provides predictive value beyond explicit measures, and that this pattern remains when adjusting for a set of county-level covariates and region indicators.
The key evidence is a regression framework in which the outcome is a county-level measure of same-gender couple prevalence (and/or counts), with implicit and explicit bias entered jointly, and then a “full” specification that adds a covariate set (policy environment, religion, education, income, political orientation, rurality, and census region). They interpret the implicit coefficient as the stronger (or uniquely informative) predictor in the adjusted model.
They motivate covariates as adjustments for structural and cultural factors that could correlate with both attitudes and residential patterns of same-gender couples. They treat the adjusted model as closer to the causal quantity of interest: the association between bias and couple locations net of these background county characteristics.
What Do IATs Measure?
Researchers disagree about what IAT scores mean. The early interpretation was that IATs capture evaluative associations that are at least partly outside conscious awareness. Low correlations between IAT scores and self-reported attitudes were often taken as support for this view. This interpretation remains common, but a growing literature challenges it.
At the individual level, IAT scores correlate only modestly with other indirect measures and with behavior, suggesting that a substantial share of variance reflects random noise and systematic method variance rather than unique, construct-valid “implicit” content. One alternative view is that IATs are indirect, error-prone measures of largely the same evaluative attitudes people can report in questionnaires, with differences between methods driven in part by measurement artifacts rather than distinct underlying constructs.
Snyder and Henry (2023) adopt a related but distinct argument at the regional level. They propose that aggregation of individual IAT scores to the county level reduces random error and yields a more reliable measure of the local “implicit climate,” which can then predict county-level outcomes. This logic is reasonable as far as reliability is concerned. However, improved reliability is not the same as improved discriminant validity.
Aggregation reduces random noise, but it also stabilizes systematic components of the measure that may vary across counties (e.g., platform- and sampling-related artifacts, regional differences in who takes the test, and other method-specific influences). The same concern applies to self-reports. Social desirability and self-presentation may differ across regions, which could attenuate the implicit–explicit correlation even if the two methods track a single underlying construct.
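A small simulation illustrates the point (all quantities are hypothetical, not estimates from the paper's data): aggregating noisy individual scores to county means sharply improves the correlation with the true county-level attitude, but the aggregated measure remains correlated with a county-level method artifact, because aggregation averages out individual noise while leaving systematic county-level components intact.

```python
import numpy as np

rng = np.random.default_rng(2)
n_counties, n_per = 300, 50
latent = rng.normal(size=n_counties)             # true county attitude
method = rng.normal(scale=0.5, size=n_counties)  # county-level method artifact

# Individual scores: attitude + county artifact + individual noise
scores = (latent + method)[:, None] + rng.normal(scale=2.0, size=(n_counties, n_per))
county_mean = scores.mean(axis=1)

r_individual = np.corrcoef(scores[:, 0], latent)[0, 1]  # one person per county
r_aggregated = np.corrcoef(county_mean, latent)[0, 1]   # county means
r_artifact = np.corrcoef(county_mean, method)[0, 1]     # artifact survives aggregation
print(r_individual, r_aggregated, r_artifact)
```

The county means track the latent attitude far better than any single individual's score, yet they still carry the method artifact at essentially full strength; reliability improves, discriminant validity does not.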
In the present data, the county-level correlation between the IAT measure and a single-item explicit measure is reported as r = .60. This is substantial shared variance, but it still leaves considerable unique variance in each measure. With only two methods, it is difficult to draw strong conclusions about what that unique variance represents. In particular, it is possible for two imperfect measures of the same construct to show different “unique” predictive power in regression models when systematic measurement error and correlated predictors are present. Conversely, if one measure fails to predict the outcome across reasonable model specifications, that would cast doubt on claims that it contains unique valid information about the criterion.
A further question is why one measure would be a stronger predictor than the other. One possibility is validity: the more valid measure should predict relevant outcomes more strongly. Another possibility is model dependence: when two predictors are highly correlated and both contain measurement error, small analytic choices (model form, covariate set, weighting) can shift the apparent “winner” without reflecting a stable underlying difference.
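The model-dependence possibility can be demonstrated with a toy simulation (hypothetical data, not the paper's counties): two equally noisy measures of one latent attitude show near-identical zero-order correlations with an outcome, yet which one "wins" in a joint regression flips roughly half the time across simulated studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_study(n=500):
    # Two equally error-prone measures of the same latent construct
    latent = rng.normal(size=n)
    implicit = latent + rng.normal(scale=0.8, size=n)
    explicit = latent + rng.normal(scale=0.8, size=n)
    outcome = -latent + rng.normal(size=n)
    # Joint regression of outcome on both measures
    X = np.column_stack([np.ones(n), implicit, explicit])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    r_imp = np.corrcoef(implicit, outcome)[0, 1]
    r_exp = np.corrcoef(explicit, outcome)[0, 1]
    return r_imp, r_exp, beta[1], beta[2]

results = np.array([one_study() for _ in range(200)])
implicit_wins = np.mean(np.abs(results[:, 2]) > np.abs(results[:, 3]))
print("mean r (implicit, explicit):", results[:, :2].mean(axis=0).round(2))
print("share of runs where implicit 'wins':", implicit_wins)
```

Because the two measures are interchangeable by construction, any apparent "winner" in a given joint regression reflects sampling noise rather than a real validity difference.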
It is sometimes argued that indirect measures should outperform self-report on socially sensitive topics because self-reports are suppressed by social desirability. Yet evidence for this claim is mixed; in many anonymous contexts, people report socially sensitive attitudes with little apparent inhibition, and self-reports often predict behavior at least as well as IAT scores at the individual level. The key point for the present analysis is that differential predictive power does not, by itself, establish that IAT variance is uniquely “implicit.”
The paper’s central result for the “implicit beats explicit” claim appears in Table 3, Column 3. In that specification, the implicit measure shows a stronger negative association with the outcome than the explicit measure when both are entered together and additional county covariates are included. The authors interpret this as evidence that the aggregated IAT captures something distinct and more predictive than the explicit measure.
However, the corresponding zero-order correlations in Table 1 are comparatively balanced: the implicit and explicit measures show similar correlations with the outcome. This suggests that the divergence in Table 3 is driven by the particular multivariable specification—especially the inclusion of several covariates that are themselves strongly related to both attitudes and the outcome (e.g., political conservatism, rurality, and religiosity).
One way to address sensitivity to analytic choices is to provide a strong theoretical rationale for a specific model and, ideally, preregister it. Another is to examine robustness across a transparent set of reasonable alternatives. To that end, I conducted a multiverse analysis (MVA) focused on the robustness of the "implicit vs. explicit" contrast.
Because the outcome is highly skewed and the authors weighted counties by the number of IAT observations per county, models were estimated using (a) the raw outcome (as in the paper), (b) a log-transformed outcome, and (c) a count-model approach. Each model was estimated with and without weights, and with no covariates, with individual covariates, and with the full covariate set used in Table 3. This produced 42 specifications. For each specification, I computed and tested the difference between the implicit and explicit coefficients.
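The mechanics of such a grid can be sketched as follows, on synthetic data with illustrative variable names and a reduced grid (the count-model arm is omitted here, since it would require a Poisson or negative-binomial fitter): each cell crosses outcome handling, weighting, and covariate inclusion, fits weighted least squares, and tests the implicit-minus-explicit contrast.

```python
import itertools
import math
import numpy as np

# Synthetic data with illustrative names -- NOT the paper's counties.
rng = np.random.default_rng(1)
n = 400
implicit = rng.normal(size=n)
explicit = 0.6 * implicit + 0.8 * rng.normal(size=n)
conserv = 0.5 * implicit + rng.normal(size=n)            # stand-in covariate
couples = np.exp(1 - 0.3 * implicit - 0.3 * explicit + rng.normal(size=n))
weights = rng.integers(50, 5000, size=n).astype(float)   # per-county IAT sample sizes

def wls(y, X, w):
    """Weighted least squares: coefficients and their covariance matrix."""
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    sigma2 = (w * resid**2).sum() / (len(y) - X.shape[1])
    return beta, sigma2 * np.linalg.inv(XtWX)

def contrast_p(beta, cov, i, j):
    """Two-sided p-value for H0: beta_i == beta_j (normal approximation)."""
    se = math.sqrt(cov[i, i] + cov[j, j] - 2 * cov[i, j])
    z = (beta[i] - beta[j]) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

outcomes = {"raw": couples, "log": np.log(couples)}
weight_opts = {"weighted": weights, "unweighted": np.ones(n)}
covariate_sets = {"none": [], "conserv": [conserv]}

results = []
for (oname, y), (wname, w), (cname, covs) in itertools.product(
        outcomes.items(), weight_opts.items(), covariate_sets.items()):
    X = np.column_stack([np.ones(n), implicit, explicit, *covs])
    beta, cov = wls(y, X, w)
    p = contrast_p(beta, cov, 1, 2)  # implicit (col 1) vs. explicit (col 2)
    results.append((oname, wname, cname, beta[1] - beta[2], p))
    print(f"{oname:>4} | {wname:>10} | {cname:>7} | "
          f"b_imp - b_exp = {beta[1] - beta[2]:+.3f}, p = {p:.3f}")
```

Extending the covariate dictionary to the full published set and adding a count-model fitter recovers the full grid; the essential output is, for every cell, the sign and significance of the same contrast.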
The results indicate substantial model dependence. Only one specification produced a statistically significant “implicit > explicit” contrast—namely the specification corresponding to Table 3, Column 3. In contrast, 13 specifications produced a statistically significant advantage for the explicit measure over the implicit measure, and the remaining specifications were non-significant. In other words, the published pattern is atypical in the multiverse: modest changes to modeling decisions (e.g., outcome transformation or omitting weights) eliminate the reported contrast, and in a nontrivial subset of specifications the sign of the contrast reverses.
These findings illustrate the value of robustness checks in complex observational analyses. The Open Data badge made it possible to evaluate the sensitivity of the headline claim to reasonable analytic choices. The key conclusion from the MVA is not that the focal association is absent, but that the specific inference that “implicit bias is a stronger predictor than explicit bias” is not robust to alternative, defensible specifications.
At minimum, the results warrant a narrower interpretation: both regional measures relate to the outcome, but the direction and significance of the implicit–explicit contrast depend strongly on modeling decisions. A cautious reading is therefore that the evidence does not uniquely support the claim that the IAT measures a distinct "implicit" construct that outperforms explicit self-report at the county level.