No Discriminant Validity of Implicit and Explicit Prejudice Measures

Abstract

I reexamine Cunningham, Preacher, and Banaji’s claim that explicit and implicit attitude measures have discriminant validity. Contrary to their claim, a single factor model fits the data better than their hierarchical model with an explicit and an implicit attitude factor. I also show that attitudes over the two-month period were stable and not influenced by contextual factors. There is also no evidence that different implicit measures tap different types of unconscious bias. All measures have low validity as measures of prejudice. I conclude that the concept of unconscious or implicit prejudice lacks empirical support because implicit measures do not show discriminant validity from explicit measures.

Keywords:  Prejudice, Attitudes, Multi-Method, Discriminant Validity, Structural Equation Modeling

No Discriminant Validity of Implicit and Explicit Prejudice Measures

An article in Psychological Science (Cunningham, Preacher, & Banaji, 2001) reported the results of a longitudinal multi-method study of prejudice, that is, attitudes towards African Americans. The article is frequently cited (446 citations in total, 30 of them already in 2018 as of January 31, according to the Web of Science) as evidence that explicit and implicit measures of prejudice measure two different constructs. Explicit measures are assumed to assess consciously accessible and controllable attitudes, whereas implicit measures are assumed to assess uncontrollable aspects of attitudes that may exist outside of conscious awareness. Although the article was published nearly 20 years ago, it remains "the most sophisticated examination of measurement error and the interrelations among various implicit measures" (Fazio & Olson, 2003). Thus, it provides the single most important empirical evidence for the construct validity of implicit measures of prejudice. Without evidence for discriminant validity, implicit measures might simply be implicit measures of the same construct that is measured by means of self-report measures. Although implicit measures may have practical advantages over self-report measures, on this view there would be no need for a theoretical distinction between explicit and implicit forms of racial bias.

In this article, I reexamine the structural equation model that Cunningham et al. (2001) used to support the claim that "the two kinds of attitude measures also tap unique sources of variance; a single-factor solution does not fit the data" (p. 170). To be blunt, I will show that this claim is false. A single factor model actually does fit the data better than the model reported in the original article. Second, I use the data to examine the contribution of stable traits and situational factors to measures of racial bias. These results shed new light on the controversial question of the context sensitivity of implicit attitude measures. Some experimental studies suggest that implicit measures are sensitive to situational factors (Dasgupta & Greenwald, 2001). However, effect sizes in these small studies tend to be inflated. A large replication study with thousands of participants found an effect size of only d = .08, suggesting that implicit measures mostly reflect stable individual differences in prejudice and measurement error (Joy-Gaba & Nosek, 2010).

Description of the Design and Measures

Participants were 93 students with complete data. Each student completed a single explicit measure of prejudice, the Modern Racism Scale (McConahay, 1986), and three implicit measures: (a) the standard race IAT (Greenwald, McGhee, & Schwartz, 1998), (b) a response window IAT (Cunningham et al., 2001), and (c) a response window evaluative priming task (Fazio, Sanbonmatsu, Powell, & Kardes, 1986). The assessment was repeated on four occasions, two weeks apart.

Reproducing the Original Model

Although it was not common to publish original data in 2001, structural equation modeling does not require access to the original data. It is possible to reproduce or test alternative models simply on the basis of the correlations and standard deviations. Fortunately, Cunningham et al. (2001) published this information, and I was able to reproduce their model using Mplus 8.2. Figure 1 shows the parameter estimates. They closely correspond to the original results. The original article reported good model fit, "chi2(100, N = 93) = 111.58, p = .20; NNFI = .96; CFI = .97; RMSEA = 0.041 (90% confidence interval: 0.00, 0.071)" (p. 168). The model fit for the reproduced model was very similar, chi2(100, N = 93) = 112, CFI = .977, RMSEA = 0.036, 90%CI = .000 to .067. Thus, the model fit of the reproduced model serves as a comparison standard for the alternative models examined next.
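
For readers who want to reproduce this step, the sketch below shows roughly how a model can be fit to summary statistics in Mplus. The file name and variable names are hypothetical; the published standard deviations and correlations simply need to be saved in a plain-text file in the format Mplus expects for summary data.

    TITLE:    Reproduction of Cunningham, Preacher, and Banaji (2001).
    DATA:     FILE = cunningham2001.dat;        ! hypothetical file: SDs, then correlations
              TYPE = STDEVIATIONS CORRELATION;  ! summary data instead of raw data
              NOBSERVATIONS = 93;
    VARIABLE: NAMES = mrs1-mrs4 iat1-iat4 rw1-rw4 ep1-ep4;  ! hypothetical variable names
    MODEL:    ! model statements go here; see the sketches in the following sections
    OUTPUT:   STDYX;                            ! standardized estimates as in Figure 1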

Figure 1. Original Model with reproduced parameter estimates based on the published correlations and standard deviations.

Bi-Factor Model

The original model is a hierarchical model with an implicit attitude factor as a second-order factor and method-specific first-order factors. Each first-order factor has four indicators for the four measurement occasions. A hierarchical model imposes constraints on the first-order loadings because they contribute both to the first-order relations among indicators of the same method and to the second-order relations of the different implicit methods to each other. An alternative way to model multi-method data is with bi-factor models (Chen, West, & Sousa, 2006).
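
In Mplus-style notation, the hierarchical structure of the original model can be sketched roughly as follows (variable and factor names are hypothetical, and identification constraints are omitted):

    ! first-order method factors, one per implicit measure, four occasions each
    IAT      BY iat1-iat4;
    RWIAT    BY rw1-rw4;
    EP       BY ep1-ep4;
    ! second-order implicit attitude factor defined by the three method factors
    IMPLICIT BY IAT RWIAT EP;
    ! explicit attitude factor defined by the Modern Racism Scale
    EXPLICIT BY mrs1-mrs4;
    ! correlation between the implicit and explicit attitude factors
    IMPLICIT WITH EXPLICIT;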

A bi-factor model allows all measures to be directly related to the general trait factor that corresponds to the second-order factor in a hierarchical model. However, method factors in a bi-factor model may not be identified if a method contributes little or no method-specific variance. Thus, a first step is to allow for method-specific correlated residuals and to examine whether these correlations are positive.
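
The starting model described here, a single attitude factor with method-specific correlated residuals, could be sketched like this (hypothetical variable names; the WITH statements are repeated analogously for the other three methods):

    ! single attitude factor defined by all sixteen measures
    ATT BY mrs1-mrs4 iat1-iat4 rw1-rw4 ep1-ep4;
    ! method-specific residual correlations, shown here for the Modern Racism Scale
    mrs1 WITH mrs2 mrs3 mrs4;
    mrs2 WITH mrs3 mrs4;
    mrs3 WITH mrs4;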

The model with a single factor and method-specific residual correlations fit the data better than the hierarchical model, chi2(80, N = 93) = 87, CFI = .988, RMSEA = 0.029, 90%CI = .000 to .065. Inspection of the residual correlations showed high correlations for the Modern Racism Scale, but less evidence for method-specific variance for the implicit measures. The response window IAT had no significant residual correlations. This explains the high factor loading of the response window IAT in the hierarchical model. It does not mean that this is the most valid measure; rather, it shows that there is little method-specific variance. Fixing these residual correlations to zero improved model fit, chi2(86, N = 93) = 91, CFI = .991, RMSEA = 0.025, 90%CI = .000 to .062.

I then tried to create method factors for the remaining methods. For the standard IAT, a method factor could be created using only the first three occasions because the fourth occasion did not load on the method factor. However, model fit decreased unless occasion 2 was allowed to correlate with occasion 4. This unexpected finding is unlikely to reflect a real relationship. Thus, I retained the model with a method factor for the first three occasions only, chi2(89, N = 93) = 97, CFI = .986, RMSEA = 0.029, 90%CI = .000 to .064. I was able to fit a method factor for evaluative priming, but model fit decreased, chi2(91, N = 93) = 101, CFI = .983, RMSEA = 0.033, 90%CI = .000 to .065. The first occasion did not load on the method factor. Model fit could be improved by fixing this loading to zero and by allowing for an additional correlation between occasions 1 and 3, chi2(91, N = 93) = 98, CFI = .988, RMSEA = 0.027, 90%CI = .000 to .062. However, there is no rationale for this relationship, and I retained the more parsimonious model. Modeling the method variance of the Modern Racism Scale with a method factor also decreased fit, but fit remained better than for the model in the original article, chi2(94, N = 93) = 107, CFI = .977, RMSEA = 0.038, 90%CI = .000 to .068. This was the final model (Figure 2).
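
In Mplus-style notation, the final model (Figure 2) corresponds roughly to the sketch below, using the same hypothetical variable names as above. Identification details are omitted, so this is meant to convey the structure described in the text rather than the exact final specification.

    ! general prejudice (trait) factor defined by all sixteen measures
    PREJ BY mrs1-mrs4 iat1-iat4 rw1-rw4 ep1-ep4;
    ! method factor for the Modern Racism Scale
    MRSM BY mrs1-mrs4;
    ! method factor for the standard IAT, first three occasions only
    IATM BY iat1-iat3;
    ! method factor for evaluative priming
    EPM  BY ep1-ep4;
    ! no method factor for the response window IAT (no method-specific variance)
    ! method factors are uncorrelated with the trait factor and with each other
    PREJ WITH MRSM@0;  PREJ WITH IATM@0;  PREJ WITH EPM@0;
    MRSM WITH IATM@0;  MRSM WITH EPM@0;   IATM WITH EPM@0;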

Figure 2. Bi-Factor Model


The most important results are the factor loadings of the measures on the trait factor. Factor loadings for the Modern Racism Scale ranged from .35 to .45 (M = .40). Factor loadings for the standard IAT ranged from .43 to .54 (M = .47). Factor loadings for the response window IAT ranged from .41 to .69 (M = .51). The evaluative priming measures had the lowest factor loadings, ranging from .13 to .47 (M = .29). In terms of absolute validity, all of these coefficients are low; a single standard IAT administered on a single occasion, for example, has only .47^2 = 22% valid variance. Most importantly, these results suggest that the Modern Racism Scale and the IAT measure a single construct and that the low correlation between implicit and explicit measures reflects low convergent validity rather than high discriminant validity.

Context Sensitivity

The model in Figure 2 assumes that prejudice is stable over the two-month period of the study and that there are no systematic changes in prejudice levels. To test this assumption, I fitted a model with correlated residuals among measures taken on the same occasion. Model fit improved, chi2(70, N = 93) = 75, CFI = .991, RMSEA = 0.027, 90%CI = .000 to .066. However, the pattern of residual correlations did not reveal evidence for state variance. At time 1, the IAT was correlated with the RW-IAT and with evaluative priming, but the latter two were not correlated with each other. In addition, evaluative priming was negatively related to modern racism. At time 2, none of the correlations were significant, and fixing them to zero improved model fit, chi2(76, N = 93) = 78, CFI = .996, RMSEA = 0.016, 90%CI = .000 to .060. At time 3, the two IAT measures were negatively correlated, but they correlated positively with the Modern Racism Scale. Fixing the remaining four correlations to zero improved model fit, chi2(74, N = 93) = 78, CFI = .993, RMSEA = 0.023, 90%CI = .000 to .060. At time 4, there were no significant correlations, and constraining the correlations to zero did not alter fit, chi2(76, N = 93) = 81, CFI = .991, RMSEA = 0.026, 90%CI = .000 to .064. These analyses show that there are no systematic changes in prejudice over the course of the study.
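
The occasion-specific residual correlations that were added to the Figure 2 model can be sketched as follows, one WITH block per measurement occasion (same hypothetical variable names as above); non-significant correlations were then fixed to zero as described in the text.

    ! occasion-specific residual correlations among the four measures
    mrs1 WITH iat1 rw1 ep1;   iat1 WITH rw1 ep1;   rw1 WITH ep1;
    mrs2 WITH iat2 rw2 ep2;   iat2 WITH rw2 ep2;   rw2 WITH ep2;
    mrs3 WITH iat3 rw3 ep3;   iat3 WITH rw3 ep3;   rw3 WITH ep3;
    mrs4 WITH iat4 rw4 ep4;   iat4 WITH rw4 ep4;   rw4 WITH ep4;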

Conclusion

A reexamination of Cunningham et al.'s (2001) multi-measure study of racial attitudes challenges the original conclusion that a single factor model does not fit the data. In fact, a single factor model fits the data better than the original, hierarchical model. Moreover, the new model shows that the original article falsely suggested that each measure has stable method variance. A careful analysis of residual correlations showed that only the Modern Racism Scale has substantial and stable method variance on all four occasions. Another finding was that implicit measures taken on the same occasion did not share variance with each other. This suggests that prejudice is a stable disposition, at least over a two-month period, and not a malleable state. This is consistent with the weak effects of experimental manipulations on IAT scores (Joy-Gaba & Nosek, 2010).

Factor loadings of the two IAT measures on the prejudice factor were slightly higher than those for the Modern Racism Scale.  This might suggest that implicit measures have slightly higher validity than explicit measures. However, this conclusion is limited to the Modern Racism Scale, which tends to show lower convergent validity with the IAT than more direct prejudice measures (Axt, 2018). In addition, the evaluative priming task had lower validity. Thus, validity has to be evaluated for each measure and it is impossible to make general statements about higher or lower validity of implicit versus explicit measures.

The main practical implication of this new look at old data is that the claim that implicit racial bias is a distinct form of prejudice is not supported by scientific evidence. Although implicit measures are less susceptible to socially desirable responding, they do not necessarily assess some unconscious form of prejudice. This is not a criticism of implicit measures like the Implicit Association Test. The ability to measure prejudice without self-reports is extremely valuable for prejudice researchers. Given the low validity of a single IAT, it should not be used for the assessment of individuals. However, measurement error is reduced in comparisons of groups of participants, and the IAT can reveal important group differences in prejudice levels. Nevertheless, proponents of the IAT have argued that the IAT also measures some hidden form of prejudice that is not accessible to introspection (Kurdi et al., 2018). This claim requires a demonstration of discriminant validity (Campbell & Fiske, 1959), and evidence of discriminant validity is lacking. Evidence for the unique predictive validity of the IAT is also controversial (Kurdi et al., 2018). A meta-analysis suggests that about 1% of the variance in criterion variables is explained by IAT scores. However, the authors also note that most studies were severely underpowered to detect such small effect sizes. Moreover, even unique predictive variance in mono-method studies would not demonstrate that the IAT measures a different construct. I therefore urge prejudice researchers to conduct high-powered multi-method studies to examine the discriminant and predictive validity of implicit prejudice measures.

References

Axt, J. R. (2018). The best way to measure explicit racial attitudes is to ask about them. Social Psychological and Personality Science, 9, 896-906. https://doi.org/10.1177/1948550617728995

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.

Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41, 189-225. https://doi.org/10.1207/s15327906mbr4102_5

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001). Implicit attitude measures: Consistency, stability, and convergent validity. Psychological Science, 12, 163-170. http://dx.doi.org/10.1111/1467-9280.00328

Dasgupta, N., & Greenwald, A. G. (2001). On the malleability of automatic attitudes: Combating automatic prejudice with images of admired and disliked individuals. Journal of Personality and Social Psychology, 81, 800–814. doi:10.1037/0022-3514.81.5.800

Fazio, R. H., & Olson, M. A. (2003). Implicit measures in social cognition research: Their meaning and use. Annual Review of Psychology, 54, 297–327.

Fazio, R. H., Sanbonmatsu, D. M., Powell, M. C., & Kardes, F. R. (1986). On the automatic activation of attitudes. Journal of Personality and Social Psychology, 50, 229–238.

Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74, 1464–1480.

Joy-Gaba, J. A., & Nosek, B. A. (2010). The surprisingly limited malleability of implicit racial evaluations. Social Psychology, 41, 137–146. doi:10.1027/1864-9335/a000020

Kurdi, B., Seitchik, A. E., Axt, J. R., Carroll, T. J., Karapetyan, A., Kaushik, N., . . . Banaji, M. R. (2018). Relationship between the Implicit Association Test and intergroup behavior: A meta-analysis. American Psychologist. Advance online publication. http://dx.doi.org/10.1037/amp0000364

McConahay, J. B. (1986). Modern racism, ambivalence, and the modern racism scale. In J. F. Dovidio & S. L. Gaertner (Eds.), Prejudice, discrimination, and racism (pp. 91–125). Orlando, FL: Academic Press.
