The Implicit Association Test at Age 21: No Evidence for Construct Validity

PREPRINT (UNDER REVIEW)

Abstract

The Implicit Association Test (IAT) is 21 years old. Greenwald et al. (1998) proposed that the IAT measures individual differences in implicit social cognition. This claim requires evidence of construct validity. I review the evidence and show that it is insufficient. Most importantly, I show that few studies were able to test the discriminant validity of the IAT as a measure of implicit personality characteristics and that a single-construct model fits multi-method data as well as or better than a dual-construct model. Thus, the IAT appears to measure the same personality characteristics that are measured with explicit measures. I also show that the validity of the IAT varies across personality characteristics: it has low validity as a measure of self-esteem, moderate validity as a measure of racial bias, and high validity as a measure of political orientation. The existing evidence also suggests that the IAT measures stable characteristics rather than states and has low predictive validity for single behaviors. Based on these findings, it is important that users of the IAT clearly distinguish between implicit measures and implicit constructs. The IAT is an implicit measure, but there is no evidence that it measures implicit constructs.

Keywords:  Personality, Individual Differences, Social Cognition, Measurement, Construct Validity, Convergent Validity, Discriminant Validity, Structural Equation Modeling

The Implicit Association Test at Age 21: No Evidence for Construct Validity

Twenty-one years ago, Greenwald, McGhee, and Schwartz (1998) published one of the most influential articles in personality and social psychology. It is already the 4th most cited article (4582 citations in Web of Science) in the Journal of Personality and Social Psychology and will be number 3 this year. As the title “Measuring Individual Differences in Implicit Cognition” suggests, the article introduced a new individual difference measure that has been used in hundreds of studies to measure attitudes, stereotypes, self-concepts, well-being, and personality traits. Henceforth, I will refer to these constructs as personality characteristics.

A Critical Evaluation of Greenwald’s (1998) Evidence for Discriminant Validity

The Implicit Association Test (IAT) uses reaction times in classification tasks to measure individual differences in the strength of associations (Nosek et al., 2007).  However, the main purpose of the IAT is not to measure associations or to provide an indirect measure of personality characteristics.  The key constructs that the IAT was designed to measure are individual differences in implicit personality characteristics as suggested in the title of Greenwald et al.’s (1998) seminal article “Measuring Individual Differences in Implicit Cognition.” 

The notion of implicit cognition is based on a conception of human information processing that largely takes place outside of consciousness, and the IAT was supposed to provide a window into the unconscious. “There has been an increased interest in measuring aspects of thinking and feeling that may not be easily accessed or available to consciousness. Innovations in measurement have been undertaken with the purpose of bringing under scrutiny new forms of cognition and emotion that were previously undiscovered” (Nosek, Greenwald, & Banaji, 2007, p. 265). 

Thus, the IAT was not just a new way of measuring the same individual differences that were already measured with self-report measures.  It was designed to measure information that is “simply unreachable, in the same way that memories are sometimes unreachable [by introspection]” (Nosek et al., 2007, p. 266).

The promise to measure individual differences that were not accessible to introspection explains the appeal of the IAT, and many articles used the IAT to make claims about individual differences in implicit forms of self-esteem, prejudice, or craving for drugs. Thus, the hypothesis that the IAT measures something different from self-report measures is a fundamental feature of the construct validity of the IAT. In psychometrics, the science of test validation, this property of a measure is known as discriminant validity (Campbell & Fiske, 1959). If the IAT is a measure of implicit individual differences that are different from explicit individual differences, the IAT should demonstrate discriminant validity from self-report measures. Given the popularity of the IAT, one might expect ample evidence for the discriminant validity of the IAT. However, due to methodological limitations, this is not the case.
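Campbell and Fiske's (1959) criteria can be illustrated with a toy example. All numbers below are hypothetical, not data from any IAT study; the point is only the logic: discriminant validity requires that correlations between different methods measuring the same trait (validity diagonals) exceed correlations between different traits measured with the same method.

```python
# Toy multitrait-multimethod (MTMM) logic, Campbell & Fiske (1959).
# Hypothetical correlations among two traits (T1, T2) x two methods (M1, M2).
r = {
    ("T1M1", "T2M1"): 0.30,  # different traits, same method (method variance)
    ("T1M2", "T2M2"): 0.25,  # different traits, same method
    ("T1M1", "T1M2"): 0.55,  # same trait, different methods (convergent validity)
    ("T2M1", "T2M2"): 0.50,  # same trait, different methods
    ("T1M1", "T2M2"): 0.10,  # different traits, different methods
    ("T2M1", "T1M2"): 0.12,  # different traits, different methods
}

convergent = [r[("T1M1", "T1M2")], r[("T2M1", "T2M2")]]
same_method = [r[("T1M1", "T2M1")], r[("T1M2", "T2M2")]]

# Convergent validity: same-trait/different-method correlations are substantial.
print(all(c > 0.4 for c in convergent))    # True
# Discriminant validity: validity correlations exceed same-method correlations.
print(min(convergent) > max(same_method))  # True
```

With a single implicit measure, the validity diagonal for the implicit method cannot be computed at all, which is the methodological limitation discussed in the text.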

Confusion about Convergent and Discriminant Validity

Greenwald et al.’s seminal article promised a measure of individual differences, but failed to provide evidence for the convergent or discriminant validity of the IAT. Study 1 with N = 32 participants showed that, on average, participants preferred flowers to insects and musical instruments to weapons. These average tendencies cannot be used to validate the IAT as a measure of individual differences. However, Greenwald et al. (1998) also reported correlations across the N = 32 participants between the IAT and explicit measures. These correlations were low. Greenwald et al. (1998) suggested that this finding provides evidence of discriminant validity: “This conceptual divergence between the implicit and explicit measures is of course expected from theorization about implicit social cognition” (p. 1470). However, these low correlations are uninformative because discriminant validity requires a multi-method approach. As the IAT was the only implicit measure, low correlations with explicit measures may simply show that the IAT has low validity as a measure of individual differences.

Experiment 2 used the IAT with 17 Korean and 15 Japanese American students to assess their attitudes towards Koreans vs. Japanese.  In this study, Greenwald et al. found “unexpectedly the feeling thermometer explicit rating was more highly correlated with the IAT measure (average r = .59) than it was with another explicit attitude measure, the semantic differential (r = .43)” (p. 1473). This finding actually contradicts the hypothesis that the IAT measures some construct that is not measured with self-ratings because discriminant validity implies higher same-method than cross-method correlations (Campbell & Fiske, 1959).

Study 3 introduced the race IAT to measure prejudice with a sample of 26 participants. In this small sample, IAT scores were only weakly and not significantly correlated with explicit measures. The authors realized that this finding is open to multiple interpretations: “Although these correlations provide no evidence for convergent validity of the IAT, nevertheless because of the expectation that implicit and explicit measures of attitude are not necessarily correlated-neither do they damage the case for construct validity of the IAT” (p. 1476). In other words, the low correlations might reflect discriminant validity, but they could also reflect low convergent validity if the IAT and explicit measures assess the same construct.

The discussion has a section on “Discriminant Validity of IAT Attitude Measures,” although the design of the studies makes it impossible to provide evidence for discriminant validity. Nevertheless, Greenwald et al. (1998) claimed that they provided evidence for the discriminant validity of the IAT as a measure of implicit cognitions. “It is clear that these implicit-explicit correlations should be taken not as evidence for convergence among different methods of measuring attitudes but as evidence for divergence of the constructs represented by implicit versus explicit attitude measures” (p. 1477).   The scientific interpretation of these correlations is that they provide no empirical evidence about the validity of the IAT because multiple measures of a single construct are needed to examine construct validity (Campbell & Fiske, 1959). Thus, unlike most articles that introduce a new measure of individual differences, Greenwald et al. (1998) did not examine the psychometric properties of the IAT.  In this article, I examine whether evidence gathered over the past 21 years has provided evidence of construct validity of the IAT as a measure of implicit personality characteristics.

First Problems for the Construct Validity of the IAT

The IAT was not the first implicit measure in social psychology. Several different measures had been developed to measure self-esteem with implicit measures. A team of personality psychologists conducted the first multi-method validation study of the IAT as a measure of implicit self-esteem (Bosson, Swann, & Pennebaker, 2000). The main finding in this study was that several implicit measures, including the IAT, had low convergent validity. However, this finding has been largely ignored, and researchers started using the self-esteem IAT as a measure of some implicit form of self-esteem that operates outside of conscious awareness (Greenwald & Farnham, 2000).

At the same time, attitude researchers also found weak correlations between the race IAT and other implicit measures of prejudice. However, this lack of convergent validity was also ignored. An influential review article by Fazio and Olson (2003) suggested that low correlations might be due to different mechanisms. While it is entirely possible that evaluative priming and the IAT rely on different mechanisms, this is not relevant for the ability of either measure to be a valid measure of personality characteristics. Explicit ratings probably also rely on a different mechanism than the IAT. The mechanics of measurement have to be separated from the constructs that the measures aim to measure.

Continued Confusion about Discriminant Validity

Nosek et al. (2007) examined evidence for the construct validity of the IAT at age 7. The section on convergent and discriminant validity lists a few studies as evidence for discriminant validity. However, closer inspection of these studies shows that they suffer from the same methodological limitation as Greenwald et al.’s (1998) seminal study. That is, constructs were assessed with a single implicit method: the IAT. Thus, it was impossible to examine the construct validity of the IAT as a measure of implicit personality characteristics.

Take Nosek and Smyth’s (2007) “A Multi-trait-multi-method validation of the Implicit Association Test” as an example. The title clearly alludes to Campbell and Fiske’s approach to construct validation.  The data were 7 explicit ratings and 7 IATs of 7 attitude pairs (e.g., flower vs. insect).  The authors fitted several structural equation models to the data and claimed that a model with separate, yet correlated, explicit and implicit factors fitted the data better than a model with a single factor for each attitude pair.  This claim is invalid because each attitude pair was assessed with a single IAT and parcels were used to correct for unreliability.  This measurement model assumes that all of the reliable variance in an IAT that is not shared with explicit ratings or with IATs of other attitudes reflects implicit individual differences. However, it is also possible that this variance reflects systematic measurement error that is unique to a specific IAT.  A proper multi-method approach requires multiple independent measures of the same construct.   As demonstrated with real multi-method data below, there is consistent evidence that the IAT has systematic method variance that is unique to a specific IAT. 

Nevertheless, Nosek and Smyth’s (2007) multi-attitude study provided some interesting information. The correlation of the 7 means of the IAT and the 7 means of the explicit ratings was r = .86. For example, implicit and explicit measures showed a preference for flowers over insects and a dislike of evolution versus creation.  If implicit measures reflect distinct, unconscious processes, it is not clear why the means correspond to those based on self-reports. However, this finding is easily explained by a single-attitude model, where the mean structure depends on the mean structure of the latent attitude variable.

In sum, Nosek et al.’s claim that the IAT has demonstrated discriminant validity is based on a misunderstanding of Campbell and Fiske’s (1959) approach to construct validation. A proper assessment of construct validity requires demonstration of convergent validity before it is possible to demonstrate discriminant validity, and to demonstrate convergent validity it is necessary to use multiple independent measures of the same construct.  Thus, to demonstrate construct validity of the IAT as a measure of implicit personality characteristics requires multiple independent implicit measures.

First Evidence of Discriminant Validity in a Multi-Method Study

Cunningham, Preacher, and Banaji (2001) reported the results of the first multi-method study of prejudice. Participants were 93 students with complete data. Each student completed a single explicit measure of prejudice, the Modern Racism Scale (McConahay, 1986), and three implicit measures: (a) the standard race IAT (Greenwald et al., 1998), (b) a response-window IAT (Cunningham et al., 2001), and (c) a response-window evaluative priming task (Fazio et al., 1986). The assessment was repeated on four occasions two weeks apart.

I used the published correlation matrix to reexamine the claim that a single-factor model does not fit the data. First, I was able to reproduce the model fit of the published dual-attitude model with MPLUS 8.2 (original fit: chi2(100, N = 93) = 111.58, p = .20; NNFI = .96; CFI = .97; RMSEA = 0.041, 90%CI = .000 to .071; reproduced fit: chi2(100, N = 93) = 112, CFI = .977, RMSEA = 0.036, 90%CI = .000 to .067). Thus, the model fit of the reproduced model serves as a comparison standard for the alternative models that I examined next.
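As a rough check on reported fit statistics, the RMSEA point estimate can be recovered from the chi-square value, degrees of freedom, and sample size. Software packages differ in whether the denominator uses N or N - 1, but for the reproduced model both conventions yield .036:

```python
import math

def rmsea(chi2, df, n, use_n_minus_1=True):
    """Point estimate of RMSEA from a model chi-square statistic."""
    denom = df * ((n - 1) if use_n_minus_1 else n)
    return math.sqrt(max(chi2 - df, 0.0) / denom)

# Reproduced dual-attitude model: chi2(100, N = 93) = 112
print(round(rmsea(112, 100, 93), 3))                        # 0.036
print(round(rmsea(112, 100, 93, use_n_minus_1=False), 3))   # 0.036
```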

The original model is a hierarchical model with an implicit attitude factor as a second-order factor and method-specific first-order factors. Each first-order factor has four indicators for the four repeated measurements with the same method. This model imposes constraints on the first-order loadings because they contribute both to the relations among indicators of the same method and to the relations of the different implicit methods to each other.
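This constraint can be made concrete with hypothetical loadings (not estimates from Cunningham et al.'s data): the same first-order loading enters both the implied same-method correlation and the implied cross-method correlation, so it cannot be adjusted for one without affecting the other.

```python
# Implied correlations in a hierarchical (second-order) factor model.
# Hypothetical standardized loadings, for illustration only.
lam = 0.7           # first-order loading of each indicator on its method factor
g1, g2 = 0.8, 0.6   # second-order loadings of two method factors on the trait

same_method = lam * lam              # indicators of the same method
cross_method = lam * g1 * g2 * lam   # indicators of different methods

print(round(same_method, 3))   # 0.49
print(round(cross_method, 3))  # 0.235
# The same first-order loading (lam) appears in both implied correlations,
# which is the constraint discussed in the text.
```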

An alternative way to model multi-method data is a bifactor model (Chen, West, & Sousa, 2006). A bifactor model allows all measures to be directly related to a general trait factor that corresponds to the second-order factor in a hierarchical model. However, bifactor models may not be identified if there are no method factors. Thus, a first step is to allow for method-specific correlated residuals and to examine whether these correlations are positive.

The model with a single factor and method-specific residual correlations fit the data better than the hierarchical model, chi2(80, N = 93) = 87, CFI = .988, RMSEA = 0.029, 90%CI = .000 to .065. Inspection of the residual correlations showed high correlations for the Modern Racism Scale, but less evidence for method-specific variance for the implicit measures. The response window IAT had no significant residual correlations. This explains the high factor loading of the response window IAT in the hierarchical model. It does not suggest that this is the most valid measure; rather, it shows that there is little method-specific variance. Fixing these residual correlations to zero improved model fit, chi2(86, N = 93) = 91, CFI = .991, RMSEA = 0.025, 90%CI = .000 to .062.

I then tried to create method factors for the remaining methods. For the IAT, a method factor could only be created for the first three occasions. However, model fit decreased unless occasion 2 was allowed to correlate with occasion 4. This unexpected finding is unlikely to reflect a real relationship. Thus, I retained the model with a method factor for the first three occasions only, chi2(89, N = 93) = 97, CFI = .986, RMSEA = 0.029, 90%CI = .000 to .064. I was able to fit a method factor for evaluative priming, but model fit decreased, chi2(91, N = 93) = 101, CFI = .983, RMSEA = 0.033, 90%CI = .000 to .065, and the first occasion did not load on the method factor. Model fit could be improved by fixing this loading to zero and by allowing for an additional correlation between occasions 1 and 3, chi2(91, N = 93) = 98, CFI = .988, RMSEA = 0.027, 90%CI = .000 to .062. However, there is no rationale for this relationship, and I retained the more parsimonious model. Fitting the measurement model for the Modern Racism Scale also decreased fit, but fit remained better than for the model in the original article, chi2(94, N = 93) = 107, CFI = .977, RMSEA = 0.038, 90%CI = .000 to .068. This was the final model (Figure 1).

The most important results are the factor loadings of the measures on the trait factor. Factor loadings for the Modern Racism Scale ranged from .35 to .45 (M = .40). Factor loadings for the standard IAT ranged from .43 to .54 (M = .47). Factor loadings for the response window IAT ranged from .41 to .69 (M = .51). The evaluative priming measures had the lowest factor loadings, ranging from .13 to .47 (M = .29). Thus, there is no evidence that implicit measures are more strongly related to each other than to explicit measures, as stated in the original article.

In terms of absolute validity, all of these validity coefficients are low, suggesting that a single standard IAT measure on a single occasion has .47^2 = 22% valid variance.  Most important, these results suggest that the Modern Racism Scale and the IAT measure a single construct and that the low correlation between implicit and explicit measures reflects low convergent validity rather than high discriminant validity. 
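The valid-variance figures used here and throughout the article are simply squared standardized factor loadings expressed as percentages:

```python
# Percentage of valid variance implied by a standardized factor loading.
def valid_variance(loading):
    return round(100 * loading ** 2)

print(valid_variance(0.47))  # 22 (standard IAT, mean loading)
print(valid_variance(0.40))  # 16 (Modern Racism Scale, mean loading)
print(valid_variance(0.29))  # 8  (evaluative priming, mean loading)
```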

In conclusion, a reexamination of Cunningham et al.’s data shows that the data do not provide evidence of discriminant validity and that the IAT may simply be an alternative measure of the same construct that is being measured with explicit measures like the Modern Racism Scale. Thus, the study provides no evidence for the construct validity of the IAT as a measure of implicit individual differences in race attitudes.

Meta-Analysis of Implicit – Explicit Correlations

Hofmann, Gawronski, Gschwendner, Le, and Schmitt (2005) conducted a meta-analysis of 126 studies that had reported correlations between an IAT and an explicit measure of the same construct. Notably, over one hundred studies had been conducted without using multiple implicit measures. The mono-method approach taken in these studies suggests that authors took the construct validity of the IAT for granted and used the IAT as a measure of implicit constructs. As a result, these studies provide no test of the construct validity of the IAT.

Nevertheless, the meta-analysis produced an interesting result. Correlations between implicit and explicit measures varied across personality characteristics. Correlations were lowest for self-esteem, which is consistent with Bosson et al.’s (2000) finding, and highest for simple attitude objects like consumer products (e.g., Pepsi vs. Coke). Any theory of implicit attitude measures has to explain this finding. One explanation could be that explicit measures of self-esteem are less valid than explicit measures of preferences for consumer goods. However, it is also possible that the validity of the IAT varies. Once more, a comparison of different personality characteristics with multiple methods is needed to test these competing theories.

Problems with Predictive Validity

Ten years after the IAT was published, another problem emerged. Some critics voiced concerns that the IAT, especially the race IAT, lacks predictive validity (Blanton, Jaccard, Klick, Mellers, Mitchell, & Tetlock, 2009). To examine the predictive validity of the IAT, Greenwald and colleagues (2009) published a meta-analysis of IAT-criterion correlations. The key finding was that “for 32 samples with criterion measures involving Black–White interracial behavior, predictive validity of IAT measures significantly exceeded that of self-report measures” (p. 17). Specifically, the authors reported a correlation of r = .24 between the IAT and a criterion and a correlation of r = .12 between an explicit measure and a criterion, and these correlations were significantly different from each other. A few years later, Oswald, Mitchell, Blanton, Jaccard, and Tetlock (2013) published a critical reexamination of the literature and reported different results: “IATs were poor predictors of every criterion category other than brain activity, and the IATs performed no better than simple explicit measures” (p. 171). The only exception was fMRI studies with extremely small samples that produced extremely large correlations, often exceeding the reliability of the IAT. It is well known that these correlations are inflated and difficult to replicate (Vul, Harris, Winkielman, & Pashler, 2009). Moreover, correlations with neural activity are not evidence that IAT scores predict behavior.

More recently, Greenwald and colleagues published a new meta-analysis (Kurdi et al., 2018). This meta-analysis produced weaker criterion correlations than the previous one. The median IAT-criterion correlation was r = .050. This is also true if the analysis is limited to studies with the race IAT. After correcting for random measurement error, the authors report an average correlation of r = .14. However, correction for unreliability yields hypothetical correlations that could be obtained if the IAT were perfectly reliable, which it is not. Thus, for the practical evaluation of the IAT as a measure of individual differences, it is more important how well actual IAT scores can predict a validation criterion. With small IAT-criterion correlations around r = .1, large samples would be required to have sufficient power to detect effects, especially incremental effects above and beyond explicit measures. Given that most studies had sample sizes of less than 100 participants, “most studies were vastly underpowered” (Kurdi et al., 2018, p. 1). Thus, it is now clear that IAT scores have low predictive validity, but it is not clear whether IAT scores have any predictive validity, when they have predictive validity, and whether they have predictive validity after controlling for explicit predictors of behavior.
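The correction for unreliability mentioned above is Spearman's disattenuation formula. A brief sketch with hypothetical reliability values (not the estimates from the meta-analysis) shows how the correction inflates the observed correlation; actual prediction, of course, remains limited by the observed value.

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction for attenuation: the hypothetical correlation
    that would be observed if both measures were perfectly reliable."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Hypothetical example: observed IAT-criterion correlation of .10,
# with assumed reliabilities of .80 (IAT) and .50 (a single behavior).
print(round(disattenuate(0.10, 0.80, 0.50), 2))  # 0.16
```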

Greenwald et al.’s (2009) 2008 US Election Study

In 2008, a historic event occurred in the United States. US voters had the opportunity to elect the first Black president. Although the outcome is now a historic fact, it was uncertain before the election how much Barack Obama’s racial background would influence White voters. There was also considerable concern that voters might not reveal their true feelings. This provided a great opportunity to test the validity of implicit measures of racial bias. If White voters are influenced by racial bias, IAT scores should predict voting intentions above and beyond explicit measures. According to the abstract of the article, the results confirm this prediction. “The implicit race attitude measures (Implicit Association Test and Affect Misattribution Procedure) predicted vote choice independently of the self-report race attitude measures, and also independently of political conservatism and symbolic racism. These findings support construct validity of the implicit measures” (p. 242).

These claims were based on the results of multiple regression analyses: “When entered after the self-report measures, the two implicit measures incrementally explained 2.1% of vote intention variance, p = .001,” and, when political conservatism was also included in the model, “the pair of implicit measures incrementally predicted only 0.6% of voting intention variance, p = .05” (p. 247).
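The significance of such an R-squared increment is typically evaluated with a hierarchical-regression F test. The sketch below uses hypothetical R-squared values, since the article reports only the increments, not the full-model R-squared:

```python
# F test for the R^2 increment when predictors are added to a regression.
def f_increment(r2_full, r2_reduced, n, k_full, k_reduced):
    """F statistic for the increment of (k_full - k_reduced) added predictors."""
    num = (r2_full - r2_reduced) / (k_full - k_reduced)
    den = (1 - r2_full) / (n - k_full - 1)
    return num / den

# Hypothetical: N = 1035, R^2 rising from .400 (4 predictors) to .406
# (6 predictors), i.e., an increment of 0.6 percentage points.
F = f_increment(0.406, 0.400, 1035, 6, 4)
print(round(F, 2))  # 5.19, above the ~3.0 critical value for df (2, 1028)
```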

I tried to reproduce these results with the published correlation matrix and failed to do so. A multiple regression analysis with explicit measures, implicit measures, and political orientation as predictors showed non-significant effects for the IAT, b = .002, se = .024, t = 0.087, p = .930, and the AMP, b = .033, se = .023, t = 1.470, p = .142. I also obtained the raw data from Anthony Greenwald, but I was unable to recreate the sample size of N = 1,057. Instead, I obtained a similar sample size of N = 1,035. Performing the analysis on this sample also produced non-significant results: IAT, b = -.003, se = .044, t = 0.070, p = .944; AMP, b = -.014, se = .042, t = 0.344, p = .731.
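Reproducing a regression from a published correlation matrix works by solving for the standardized coefficients, beta = inverse(Rxx) * rxy. The matrix below is hypothetical and only illustrates the procedure; it is not Greenwald et al.'s (2009) data.

```python
import numpy as np

def betas_from_corr(Rxx, rxy):
    """Standardized OLS coefficients from a predictor correlation matrix Rxx
    and the vector rxy of predictor-criterion correlations."""
    return np.linalg.solve(np.asarray(Rxx, float), np.asarray(rxy, float))

# Hypothetical: explicit attitude, IAT, and political orientation
# predicting voting intention.
Rxx = [[1.0, 0.3, 0.5],
       [0.3, 1.0, 0.2],
       [0.5, 0.2, 1.0]]
rxy = [0.45, 0.15, 0.60]

beta = betas_from_corr(Rxx, rxy)
print(np.round(beta, 3))  # approximately [ 0.203 -0.011  0.501]
```

Note how the IAT's sizable zero-order correlation (.15 here) can shrink to a near-zero standardized coefficient once correlated predictors are controlled, which is the pattern reported in the text.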

To fully explore the relationships among the variables in this valuable dataset, I fitted a structural equation model to the raw data (N = 1,035). The model had good fit, chi2(9) = 18.27, CFI = .995, RMSEA = .032, 90%CI = .009 to .052. As shown in Figure 2, the IAT did not have incremental predictive validity, as its residual variance was unrelated to voting. There is also no evidence of discriminant validity because the residuals of the two implicit measures are not correlated. However, the model does show that a pro-White bias predicts voting above and beyond political orientation. Thus, the results do support the hypothesis that racial bias influenced voting in the 2008 election. This bias is reflected in explicit and implicit measures. Interestingly, the validity coefficients in this study differ from those in Cunningham et al.’s study with undergraduate students. The factor loadings suggest that the IAT was the most valid measure of racial bias in this study, with .59^2 = 35% valid variance. This makes the IAT as valid as the feeling thermometer, and more valid than the Modern Racism Scale was in Cunningham’s study. This finding has been replicated in subsequent studies (Axt, 2018).

In conclusion, a reexamination of the 2008 election study shows that the data are entirely consistent with a single-attitude model and that there is no evidence for incremental predictive validity or discriminant validity in these data. However, the study does show some predictive validity of the IAT and convergent validity with explicit measures. Thus, the results provide no evidence of construct validity of the IAT as a measure of implicit individual differences, but they can be interpreted as evidence for its validity as a measure of the same construct that is measured with explicit measures. This shows that claims about validity vary as a function of the construct that is being measured. A scale is a good measure of weight, but not of intelligence. The results here suggest that the race IAT is a moderately valid measure of racial bias, but an invalid measure of implicit bias, which may not even exist because scientific claims about implicit bias require valid measures of implicit bias.

Reexamining a Multi-Trait Multi-Method Study

The most recent and extensive multi-trait multi-method validation study of the IAT was published last year (Bar-Anan & Vianello, 2018).  The abstract claims that the results provide clear support for the validity of the IAT as a measure of implicit cognitions, including implicit self-esteem. “The evidence supports the dual-attitude perspective, bolsters the validation of 6 indirect measures, and clears doubts from countless previous studies that used only one indirect measure to draw conclusions about implicit attitudes” (p. 1264). 

Below I show that these claims are not supported by the data, and that single-attitude models fit the data as well as dual-attitude models. I also show that dual-attitude models show low convergent validity across implicit measures, while IAT variants share method variance because they rely on the same mechanisms to measure attitudes.

Bar-Anan and Vianello (2018) fitted a single model to measures of self-esteem, racial bias, and political orientation. This makes the model extremely complex and produced some questionable results (e.g., the implicit and explicit method factors were highly correlated; some measures had negative loadings on the method factors). In structural equation modeling, it is good practice to fit smaller models before creating a larger model. Thus, I first examined construct validity for each domain separately before fitting a single unified model.

Race IAT

I first fitted a dual-attitude model to measures of racial attitudes and included contact as the criterion variable. I did not specify a causal relationship between contact and attitudes because attitudes can influence contact and vice versa.  The dual-attitude model had good fit, chi2(48) = 109.41; CFI = .975; RMSEA = 0.010 (90% confidence interval: 0.007, 0.012).  The best indicator of the explicit factor was the preference rating (Figure 3).  The best indicator of the implicit factor was the BIAT.  However, all IAT-variants had moderate to high loadings on the implicit factor. In contrast, the evaluative priming measure had a low loading on the implicit factor and the AMP had a moderate loading on the explicit factor and no significant loading on the implicit factor.  These results show that Bar-Anan and Vianello’s model failed to distinguish between IAT-specific method variance and method variance for implicit measures in general. The present results show that IAT-variants share little valid variance or method variance with conceptually distinct implicit measures.

Not surprisingly, a single-attitude model with an IAT method factor (Figure 4) also fit the data well, chi2(46) = 112.04; CFI = .973; RMSEA = 0.010 (90% confidence interval: 0.008, 0.013). Importantly, the model has no shared method variance between conceptually different explicit measures like preference ratings and the Modern Racism Scale (MRS). The AMP and the EP are both valid measures of attitudes, but with relatively modest validity. The BIAT has a validity of .46, with 21% explained variance. This result is more consistent with the Cunningham et al. (2001) data than with the Greenwald et al. (2009) data. The model also shows a clear relationship between contact and less pro-White bias. Finally, the model shows that the IAT method factor is unrelated to contact. Thus, any relationship between IAT scores and contact is explained by the shared variance with explicit measures.

These results show that Bar-Anan and Vianello’s (2018) conclusion are not supported by the data. Although a dual-attitude model can be fitted to the data, it shows low convergent validity across different implicit measures, and a single-attitude model fits the data as well as a dual-attitude model.

Political Orientation

Figure 5 shows the dual-attitude model for political orientation. The explicit factor is defined by a simple rating of preference for Republicans versus Democrats, the Modern Racism Scale, the right-wing authoritarianism scale, and ratings of Hillary Clinton. The implicit factor is defined by the IAT, the brief IAT, the Go/No-Go Association Task, and single-category IATs. The remaining two implicit measures, the Affect Misattribution Procedure and evaluative priming, are allowed to load on both factors. Voting in the previous election is predicted by explicit attitudes. The model has good fit to the data, chi2(48) = 99.34; CFI = .991; RMSEA = 0.009 (90% confidence interval: 0.006, 0.011). The loading pattern shows that the AMP and EP load on the implicit factor. This supports the hypothesis that all implicit measures have convergent validity. However, the loadings for the IATs are much higher. In the dual-attitude framework, this would imply that the IAT is a much more valid measure of implicit attitudes than the AMP or EP. Evidence for discriminant validity is weak. The correlation between the explicit and the implicit factor is r = .89. The correlation in the original article was r = .91. Nevertheless, the authors concluded that the data favor the two-factor model because constraining the correlation to 1 reduced model fit.

However, it is also possible to fit a single-construct model by allowing for an IAT-variant method factor, chi2(50) = 86.25; CFI = .993; RMSEA = 0.007 (90% confidence interval: 0.005, 0.010).  This model (Figure 6) shows that voting is predicted by a single latent factor that represents political orientation and that simple self-report measures of political orientation are the most valid measures of political orientation.  The political orientation IAT shows stronger correlations with explicit measures because it is a more valid measure of political orientation, with .74^2 = 55% valid variance, than the race IAT (22% valid variance).

Self-Esteem

Figure 7 shows the results for a dual-attitude model of self-esteem.  Model fit was good, although the CFI was lower than in the previous models due to weaker factor loadings, chi2(16) = 28.62; CFI = .950; RMSEA = 0.008 (90% confidence interval: 0.003, 0.013).  The model showed a moderate correlation between the explicit and implicit factors, r = .46, which is stronger than in the original article, r = .29, but still suggestive of two distinct factors. However, the nature of these two factors is less clear. The implicit factor is defined by the three IAT measures, whereas the AMP and EP have very low loadings on this factor.  This was also true in the original article, with loadings of .24 for the AMP and .13 for the EP.  Thus, the results confirm Bosson et al.'s (2000) seminal finding that different implicit measures have low convergent validity.


As the implicit factor was mostly defined by the IAT measures, it was also possible to fit a single-factor model with an IAT method factor (Figure 8), chi2(16) = 31.50; CFI = .938; RMSEA = 0.009 (90% confidence interval: 0.004, 0.013). However, some of the results of this model are surprising.

According to this model, the validity coefficient of the widely used Rosenberg Self-Esteem Scale is only r = .35, suggesting that only 12% of the variance in the Rosenberg Self-Esteem Scale is valid variance. In addition, the IAT and the BIAT would be equally valid measures of self-esteem.  Thus, previous findings of low implicit-explicit correlations for self-esteem (Bosson et al., 2000; Hofmann et al., 2005) would imply low validity of both implicit and explicit measures.  This finding would have dramatic implications for the interpretation of low self-esteem-criterion correlations.  A true self-esteem-criterion correlation of r = .30 would produce an observed correlation of only r = .30*.35 = .11 with the Rosenberg Self-Esteem Scale or the IAT.  Correlations of this magnitude require large samples (N = 782) to have an 80% probability of obtaining a significant result with alpha = .05, or N = 1,325 with alpha = .005.  Thus, most studies that tried to predict performance criteria from self-esteem were underpowered.  However, the results of this study are limited by the use of an online sample and the lack of proper criterion variables to examine predictive validity.  The main conclusion from this analysis is that a single-factor model with an IAT method factor fit the data well and that the dual-attitude model failed to demonstrate convergent validity across different implicit measures; a finding that replicates Bosson et al. (2000), which Bar-Anan and Vianello do not cite.
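These sample-size figures can be approximated with a standard power calculation for a correlation based on the Fisher z transformation. The sketch below is a textbook formula, not code from the original analysis; it closely reproduces the reported values when the observed correlation is rounded to .10:

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r in a
    two-tailed test, via the Fisher z transformation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2) + 3

# Observed correlation of about .10 (the attenuated r = .30 * .35)
print(n_for_correlation(0.10, alpha=0.05))   # 783, close to the reported N = 782
print(n_for_correlation(0.10, alpha=0.005))  # 1326, close to the reported N = 1,325
```

The one-unit differences from the reported values presumably reflect different rounding conventions across power programs.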

A Unified Model

After establishing well-fitting models for each personality characteristic, it is possible to fit a unified model. Importantly, no changes to the individual models should be made, so that any decrease in fit can be attributed to the new relationships across different personality characteristics.  Without any additional modifications, the overall model in Figure 9 had good fit, XX.  The IAT method factors showed significant positive correlations of the race method factor with the self-esteem method factor (r = .4) and with the political orientation method factor (r = .2), but a negative correlation between the self-esteem and political orientation method factors (r = -.3).  This pattern is inconsistent with a simple method factor, which would be expected to produce uniformly positive correlations. Thus, it is impossible to fit a general method factor to different IATs. This finding replicates Nosek and Smyth's (2007) findings.

Correlations among the personality characteristics replicate the finding with Greenwald et al.'s (2009) data that Republicans are more likely to have a pro-White bias, r = .4.  Political orientation is unrelated to self-esteem, r = .0, but pro-White bias tends to be positively related to self-esteem, r = .2.

In conclusion, the present results show that Bar-Anan and Vianello's claims are not supported by the data.  Their data do not provide clear evidence for discriminant validity of implicit and explicit constructs.  The data are fully consistent with the alternative hypothesis that the IAT and other implicit measures measure the same construct that is measured with explicit measures. Thus, the data provide no support for the construct validity of the IAT as a measure of implicit personality characteristics.

Validity of the Self-Esteem IAT

Bosson et al.'s (2000) seminal article raised the first concerns about the construct validity of the self-esteem IAT. Since then, other critical articles have been published, none of which are cited in Kurdi et al. (2018). Gawronski, LeBel, and Peters (2007) wrote a Perspectives on Psychological Science article on the construct validity of implicit self-esteem measures. They found no conclusive evidence that (a) the self-esteem IAT measures unconscious self-esteem or that (b) low implicit-explicit correlations are due to self-report biases in explicit measures of self-esteem. Walker and Schimmack (2008) used informant ratings to examine the predictive validity of the self-esteem IAT. Informant ratings are the most widely used validation criterion in personality research, but they have not been used by social psychologists. One advantage of informant ratings is that they measure general personality characteristics rather than specific behaviors, which ensures higher construct-criterion correlations due to the power of aggregation (Epstein, 1980).  Walker and Schimmack (2008) found that informant ratings of well-being were more strongly correlated with explicit self-ratings of well-being than with a happiness IAT or a self-esteem IAT.
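The power of aggregation can be made concrete with the Spearman-Brown formula; the numbers below are illustrative, not values from the cited studies:

```python
def spearman_brown(r_single, k):
    """Reliability of an aggregate of k parallel indicators, each with
    reliability (or average inter-indicator correlation) r_single."""
    return k * r_single / (1 + (k - 1) * r_single)

# A single behavior vs. an aggregate of 10 behaviors (hypothetical r = .20)
print(round(spearman_brown(0.20, 1), 2))   # 0.2
print(round(spearman_brown(0.20, 10), 2))  # 0.71
```

Aggregating across behaviors (or across raters, as informant ratings do) averages out behavior-specific noise, which is why aggregate criteria yield higher construct-criterion correlations than single behaviors.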

The most recent and extensive review was conducted by Falk and Heine (2015), who found that "the validity evidence for the IAT in measuring ISE [implicit self-esteem] is strikingly weak" (p. 6).  They also point out that implicit measures of self-esteem "show a remarkably consistent lack of predictive validity" (p. 6).  Thus, an unbiased assessment of the evidence is fully consistent with the analyses of Bar-Anan and Vianello's data, which also showed low validity of the self-esteem IAT as a measure of self-esteem.

Currently, a study by Falk, Heine, Takemura, Zhang, and Hsu (2015) provides the most comprehensive examination of convergent and discriminant validity of self-esteem measures. I therefore used structural equation modeling of their data to see how consistent the data are with a dual-attitude model or a single-attitude model.  The biggest advantage of the study is the inclusion of informant ratings of self-esteem, which makes it possible to model method variance in self-ratings (Anusic et al., 2009).  Previous research showed that self-ratings of self-esteem have convergent validity with informant ratings of self-esteem (Simms, Zelazny, Yam, & Gros, 2010; Walker & Schimmack, 2008).  I also included self-report measures of positive affect and negative affect to examine criterion validity.

It was possible to fit a single-factor model to the data (Figure 10), chi2(67) = 115.85; CFI = .964; RMSEA = 0.050 (90% confidence interval: 0.034, 0.065).  The highest loadings were observed for self-ratings on the self-competence scale and the Rosenberg Self-Esteem Scale. However, informant ratings also had significant loadings on the self-esteem factor, as did self-ratings on the Narcissistic Personality Inventory.  A measure of halo bias in self-ratings of personality (SEL) also had moderate loadings, which confirms previous findings that self-esteem is related to evaluative biases in personality ratings (Anusic et al., 2009).  The false uniqueness measure (FU; Falk et al., 2015) had modest validity. In contrast, the implicit measures had no significant loadings on this factor.  In addition, the residual correlations among the implicit measures were weak and not significant. Given the lack of positive relations among the implicit measures, it was impossible to fit a dual-attitude model to these data.

It is not clear why Bar-Anan and Vianello’s data failed to show higher validity of explicit measures, but the current results are consistent with moderate validity of explicit self-ratings in the personality literature (Simms et al., 2010). Thus, there is consistent evidence that implicit self-esteem measures have low validity as measures of self-esteem and there is no evidence that they are measures of implicit self-esteem.

Explaining Variability in Explicit-Implicit Correlations

One well-established phenomenon in the literature is that correlations between IAT scores and explicit measures vary across domains (Bar-Anan & Vianello, 2018; Hofmann et al., 2005).  As shown earlier, correlations for political orientation are strong, correlations for racial attitudes are moderate, and correlations for self-esteem are weak.  Greenwald and Banaji (2017) offer a dual-attitude explanation for this finding: "The plausible interpretations of the more common pattern of weak implicit–explicit correlations are that (a) implicit and explicit measures tap distinct constructs or (b) they might be affected differently by situational influences in the research situation (cf. Fazio & Towles-Schwen, 1999; Greenwald et al., 2002) or (c) at least one of the measures, plausibly the self-report measure in many of these cases, lacks validity" (p. 868).

The evidence presented here offers a different explanation.  IAT-explicit correlations and IAT-criterion correlations increase with the validity of the IAT as a measure of the same personality characteristic that is measured with explicit measures.  Thus, low correlations of the self-esteem IAT with explicit measures of self-esteem reflect the low validity of the self-esteem IAT.  High correlations of the political orientation IAT with explicit measures of political orientation reflect the high validity of the IAT as a measure of political orientation; not implicit political orientation.  Finally, moderate correlations between the race IAT and explicit measures of racial bias reflect the moderate validity of the race IAT as a measure of racial bias. However, the validity of the race IAT as a measure of racial bias (not implicit racial bias!) varies considerably across studies. This variation may be due to differences in the variability of racial bias across samples, which may be lower in student samples.  Thus, contrary to Greenwald and Banaji's claims, the problem is not with the explicit measures, but with the IAT.

An important question is why the self-esteem IAT is less valid than the political orientation IAT.  I propose that one cause of variation in the validity of the IAT is the proportion of respondents on the two ends of a personality characteristic. To test this hypothesis, I used Bar-Anan and Vianello's data.  To determine the direction of the IAT score, I used a value of 0 as the neutral point.  As predicted, 90% of participants associated self with good, 78% associated White with good, and 69% associated Democrat with good.  Thus, validity decreases with the proportion of participants who fall on one side of the bipolar dimension.

Next, I regressed the preference measure on a simple dichotomous predictor that coded the direction of the IAT score.  I standardized the preference measure and report both standardized and unstandardized regression coefficients.  Standardized regression coefficients are influenced by the distribution of the predictor variable and should therefore show the expected pattern. In contrast, unstandardized coefficients are not sensitive to the proportions and should not show the pattern. In a second step, I added the continuous IAT scores as predictors to examine the incremental predictive validity provided by the reaction times.
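Why standardized coefficients are sensitive to the split while unstandardized coefficients are not can be seen in a small simulation. This is a sketch with simulated data and a hypothetical group difference, not the actual Bar-Anan and Vianello data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
B_true = 1.0  # fixed unstandardized group difference (hypothetical value)

results = {}
for p in (0.50, 0.69, 0.90):
    # dichotomous direction-of-attitude predictor with P(positive side) = p
    direction = rng.binomial(1, p, size=n).astype(float)
    preference = B_true * direction + rng.normal(size=n)   # noisy criterion
    X = np.column_stack([np.ones(n), direction])
    B = np.linalg.lstsq(X, preference, rcond=None)[0][1]   # unstandardized slope
    b = B * direction.std() / preference.std()             # standardized slope
    results[p] = (B, b)
    print(f"p = {p:.2f}: B = {B:.2f}, b = {b:.2f}")
```

Because b = B * SD(predictor) / SD(criterion), and the standard deviation of a dichotomous predictor, sqrt(p(1-p)), is maximal at p = .50, the standardized coefficient necessarily shrinks as the split becomes more lopsided, even when the unstandardized group difference is held constant.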

The standardized coefficients are consistent with predictions (Table 1). However, the unstandardized coefficients also show the same pattern. Thus, other factors also play a role. The amount of incremental variance explained by reaction times does not differ between the race and the political orientation task.  Most of the differences in validity are due to the direction of the attitude (4% explained variance for race bias vs. 38% explained variance for political orientation).

Table 1

          B       SE      b       se      r2      Δr2     z
SE       .310    .142    .093    .043    .009    .002    1.09
Race     .467    .010    .193    .041    .041    .060    5.79
PO      1.380    .080    .637    .037    .380    .070    7.83

Note. SE = self-esteem; PO = political orientation; B = unstandardized coefficient; b = standardized coefficient; Δr2 = incremental variance explained by the continuous IAT scores.

The results show the importance of taking the proportion of respondents with opposing personality characteristics into account. The IAT is least valid when most participants are high or low on a personality characteristic, and it is most valid when participants are split into two equally large groups. 

In conclusion, I provided an alternative explanation of variation in explicit-implicit correlations that is consistent with the data.  Implicit-explicit correlations vary at least partially as a function of the validity of the IAT as a measure of the same construct that is measured with explicit measures, and the validity of the IAT varies as a function of the proportion of respondents who are high versus low on a personality characteristic. As most respondents associate the self with good, and reaction times contribute little to the validity of the IAT, the IAT has particularly low validity as a measure of self-esteem.

The Elusive Malleability of Implicit Attitude Measures

Numerous experimental studies have tried to manipulate situational factors in order to change scores on implicit attitude measures (Lai, Hoffman, & Nosek, 2013).  Many of these studies focused on implicit measures of prejudice in order to develop interventions that could reduce prejudice. However, most studies were limited to brief manipulations with immediate assessment of attitudes (Lai et al., 2013).  The results of these studies are mixed.  In a seminal study, Dasgupta and Greenwald (2001) exposed participants to images of admired Black exemplars and disliked White exemplars. They reported that this manipulation had a large effect on IAT scores. In hindsight, however, this result is less convincing because it has become apparent that large effect sizes from small samples often do not replicate (Open Science Collaboration, 2015). Consistent with this skepticism, Joy-Gaba and Nosek (2010) had difficulty replicating this effect with much larger samples and found an average effect size of only d = .08.  With effect sizes of this magnitude, other reports of successful experimental manipulations were extremely underpowered.   Another study with large samples found stronger effects (Lai et al., 2016).  The strongest effect was observed for an instruction to fake the IAT.  However, Lai et al. also found that none of these manipulations had lasting effects in a follow-up assessment. This finding suggests that even when changes are observed, they reflect context-specific method variance rather than actual changes in the construct that is being measured.

This conclusion is also supported by one of the few longitudinal IAT studies. Cunningham et al.'s (2001) multi-method study repeated the measurement of racial bias on four separate occasions.  The model shown in Figure 1 includes no systematic relationships between measures taken on the same occasion, and adding these relationships produced non-significant correlated residuals. Thus, in this sample, naturally occurring factors did not change racial bias. This finding suggests that the IAT and explicit measures assess stable personality characteristics rather than context-specific states.

Only a few serious intervention studies with the IAT have been conducted (Lai et al., 2013).  The most valuable evidence so far comes from studies that examined the influence of living with an African American roommate on White students’ racial attitudes (Shook & Fazio, 2008; Shook, Hopkins, & Koech, 2016).  One study found effects on an implicit measure, F(1,236) = 4.33, p = .04 (Shook & Fazio, 2008), but not on an explicit measure (Shook, 2007).  The other study found effects on explicit attitudes, F(1,107) = 7.34, p = .008 but no results for implicit measures were reported (Shook, Hopkins, & Koech, 2016). Given the small sample sizes of these studies, inconsistent results are to be expected. 

In conclusion, the existing evidence shows that implicit and explicit attitude measures are highly stable over time (Cunningham et al., 2001). I also concur with Joy-Gaba and Nosek (2010) that moving scores on implicit bias measures “may not be as easy as implied by the existing experimental demonstrations” (p. 145), and a multi-method assessment is needed to distinguish effects on specific measures from effects on personality characteristics (Fazio & Olson, 2003).

Future studies of attitude change need a multi-method approach, powerful interventions, adequate statistical power, and multiple repeated measurements of attitudes to distinguish mere occasion-specific variability (malleability) from real attitude change (Anusic & Schimmack, 2016). Ideally, the study would also include informant ratings. For example, intervention studies with roommates could use African Americans as informants to rate their White roommates’ racial attitudes and behaviors.  The single-attitude model predicts that implicit and explicit measures will show consistent results and that variation in effect sizes is explained by the validity of each measure. 

Discussion

Does the IAT Measure Implicit Constructs?

Construct validation is a difficult and iterative process because scientific evidence can alter the understanding of constructs.  However, construct validation research has to start with a working definition of a construct.  The IAT was introduced as a measure of individual differences in implicit social cognition, and implicit social cognitions were defined as aspects of thinking and feeling that may not be easily accessed or available to consciousness (Nosek, Greenwald, & Banaji, 2007, p. 265).  This definition is vague, but it makes a clear prediction: the IAT should measure personality characteristics that cannot be measured with self-reports.  This leads to the prediction that explicit measures and the IAT have discriminant validity.  To demonstrate discriminant validity, unique variance in the IAT has to be related to other indicators of implicit personality characteristics.  This can be demonstrated with incremental predictive validity or with convergent validity with other measures of implicit personality characteristics.  Consistent with this line of reasoning, numerous articles have claimed that the IAT has construct validity as a measure of implicit personality characteristics because it shows incremental predictive validity (Greenwald et al., 2009; Kurdi et al., 2018) or because it shows convergent validity with other implicit measures and discriminant validity with explicit measures (Bar-Anan & Vianello, 2018).  I demonstrated that these claims are false and that the existing evidence does not support the construct validity of the IAT as a measure of implicit personality characteristics.  The main problem is that most studies that used the IAT assumed construct validity rather than testing it.  Hundreds of studies used the IAT as a single measure of implicit personality characteristics and made claims about implicit personality traits based on variation in IAT scores.
Thus, hundreds of studies made claims that are not supported by empirical evidence, simply because it has not been demonstrated that the IAT measures implicit personality constructs.  In this regard, the IAT is not alone.  Aside from the replication crisis in psychology (OSC, 2015), psychological science suffers from an even more serious validation crisis. All empirical claims rest on the validity of the measures that are used to test theoretical claims, yet many measures in psychology are used without proper validation evidence.  Personality research is a notable exception.  In response to criticism of low predictive validity (Mischel, 1968), personality psychologists embarked on a program of research that demonstrated predictive validity and convergent validity with informant ratings (Funder, $$$).

Another problem is that psychologists treat validity as a qualitative construct, so that any evidence of validity is taken to support claims that a measure is valid, as if it were 100% valid. However, most measures in psychology have only moderate validity (Schimmack, 2010). Thus, it is important to quantify validity and to use a multi-method approach to increase validity.  The popularity of the IAT reveals the problems of using measures without proper validation evidence.  Social psychologists have influenced public discourse, if not public policy, about implicit racial bias.  Most of these claims are based on findings with the IAT, assuming that IAT scores reflect implicit bias. As demonstrated here, these claims are not valid because the IAT lacks construct validity as a measure of implicit bias.  In the future, psychologists need to be more careful when they make claims based on new measures with limited validation evidence.  Perhaps psychological organizations should provide clear guidelines about minimal standards that need to be met before a measure can be used, just as there are guidelines for validity evidence in personality assessment.
In conclusion, psychology suffers as much from a validation crisis as it suffers from a replication crisis.  Fixing the replication crisis will not improve psychology if replicable results are obtained with invalid measures.

The Silver Lining

Psychologists are often divided into opposing camps (e.g., nature vs. nurture; person vs. situation; the IAT is valid vs. invalid).  Many fans of implicit measures are likely to dislike what I had to say about the IAT.  However, my position differs from previous criticisms of the IAT as entirely invalid (Oswald et al., 2013).  I have demonstrated with several multi-method studies that the IAT has convergent validity with other measures of some personality characteristics. In some domains this validity is too low to be meaningful.  In other domains, the validity of explicit measures is so high that using the IAT is not necessary. However, for sensitive attitudes like racial attitudes, the IAT offers a promising complementary measure to explicit measures of racial attitudes.  Estimates of valid variance ranged from 20% to 40%.  As the IAT does not appear to share method variance with explicit measures, it is possible to improve the measurement of racial bias by aggregating scores from explicit and implicit measures to obtain a more valid measure of racial bias than either measure can provide alone.  The IAT may also offer benefits in situations where socially desirable responding is a concern.  Thus, the IAT might complement other measures of personality characteristics. This changes the interpretation of explicit-IAT correlations. Rather than (mis)interpreting low correlations as evidence of discriminant validity, high correlations can reveal convergent validity. Similarly, improvements in implicit measures should produce higher correlations with explicit measures.  How useful the IAT and other implicit measures are for the measurement of other personality characteristics has to be examined on a case-by-case basis. Just as it is impossible to make generalized statements about the validity of self-reports, the validity of the IAT can vary across personality characteristics.
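The gain from aggregation can be quantified under the stated assumption that the two measures share no method variance. The sketch below uses hypothetical validities (.70 for an explicit measure, .55 for an IAT) and a unit-weighted sum of standardized scores:

```python
from math import sqrt

def aggregate_validity(v1, v2):
    """Validity of the unit-weighted sum of two standardized measures of the
    same construct, assuming their errors are uncorrelated (no shared method
    variance): corr(x1 + x2, T) = (v1 + v2) / sqrt(2 + 2 * v1 * v2)."""
    return (v1 + v2) / sqrt(2 + 2 * v1 * v2)

# Hypothetical validities: explicit measure .70, IAT .55
print(round(aggregate_validity(0.70, 0.55), 2))  # 0.75, above either measure alone
```

As long as the errors of the two measures are uncorrelated, the aggregate is always at least as valid as the better single measure, which is the formal rationale for combining explicit and implicit measures.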

Conclusion

Social psychologists have always distrusted self-reports, especially for the measurement of sensitive topics like prejudice.  Many attempts have been made to measure attitudes and other constructs with indirect methods.  The IAT was a major breakthrough because it has relatively high reliability compared to other indirect methods.  Thus, creating the IAT was a major achievement that should not be diminished by the finding that the IAT lacks construct validity as a measure of implicit personality characteristics. Even creating an indirect measure of attitudes is a formidable feat. However, in the early 1990s, social psychologists were enthralled by work in cognitive psychology that demonstrated unconscious or uncontrollable processes. Implicit measures were based on this work, and it seemed reasonable to assume that they might provide a window into the unconscious. However, the processes that are involved in the measurement of personality characteristics with implicit measures are not the personality characteristics that are being measured.  There is nothing implicit about being a Republican or Democrat, being gay or straight, or having low self-esteem.  Conflating implicit processes in the measurement of personality constructs with implicit personality constructs has created a lot of confusion. It is time to end this confusion. The IAT is an implicit measure of personality characteristics with varying validity.  It is not a window into people's unconscious feelings, attitudes, or personalities.

References

Axt, J. R. (2018). The Best Way to Measure Explicit Racial Attitudes Is to Ask About Them. Social Psychological and Personality Science, 9, 896-906. https://doi.org/10.1177/1948550617728995

Anusic, I., & Schimmack, U. (2016). Stability and change of personality traits, self-esteem, and well-being: Introducing the meta-analytic stability and change model of retest correlations. Journal of Personality and Social Psychology, 110(5), 766-781. http://dx.doi.org/10.1037/pspp0000066

Anusic, I., Schimmack, U., Pinkus, R., & Lockwood, P. (2009). The nature and structure of correlations among Big Five ratings: The halo-alpha-beta model. Journal of Personality and Social Psychology, 97(6), 1142-1156.

Bar-Anan, Y., & Vianello, M. (2018). A multi-method multi-trait test of the dual-attitude perspective. Journal of Experimental Psychology: General, 147(8), 1264-1272. http://dx.doi.org/10.1037/xge0000383

Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. E. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94(3), 567-582. http://dx.doi.org/10.1037/a0014665

Bosson, J. K., Swann, W. B., Jr., & Pennebaker, J. W. (2000). Stalking the perfect measure of implicit self-esteem: The blind men and the elephant revisited? Journal of Personality and Social Psychology, 79(4), 631-643. http://dx.doi.org/10.1037/0022-3514.79.4.631

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81-105. http://dx.doi.org/10.1037/h0046016

Chen, F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41(2), 189-225. doi:10.1207/s15327906mbr4102_5

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001). Implicit attitude measures: Consistency, stability, and convergent validity. Psychological Science, 12, 163-170. http://dx.doi.org/10.1111/1467-9280.00328

Dasgupta, N., & Greenwald, A. G. (2001). On the malleability of automatic attitudes: Combating automatic prejudice with images of admired and disliked individuals. Journal of Personality and Social Psychology, 81, 800–814. doi:10.1037/0022-3514.81.5.800

Epstein, S. (1980). The stability of behavior: II. Implications for psychological research. American Psychologist, 35(9), 790-806. http://dx.doi.org/10.1037/0003-066X.35.9.790

Falk, C. F., Heine, S. J., Takemura, K., Zhang, C. X., & Hsu, C. (2015). Are implicit self-esteem measures valid for assessing individual and cultural differences? Journal of Personality, 83, 56-68. doi:10.1111/jopy.12082

Falk, C. F., & Heine, S. J. (2015). What is implicit self-esteem, and does it vary across cultures? Personality and Social Psychology Review, 19, 177-198.

Greenwald, A. G., & Farnham, S. D. (2000). Using the Implicit Association Test to measure self-esteem and self-concept. Journal of Personality and Social Psychology, 79(6), 1022-1038. http://dx.doi.org/10.1037/0022-3514.79.6.1022

Fazio, R. H., & Olson, M. A. (2003). Implicit measures in social cognition research: Their meaning and use. Annual Review of Psychology, 54, 297-327. http://dx.doi.org/10.1146/annurev.psych.54.101601.145225

Fazio, R.H., Sanbonmatsu, D.M., Powell, M.C., & Kardes, F.R. (1986). On the automatic activation of attitudes. Journal of Personality and Social Psychology, 50, 229–238.

Joy-Gaba, J. A., & Nosek, B. A. (2010). The surprisingly limited malleability of implicit racial evaluations. Social Psychology, 41, 137–146. doi:10.1027/1864-9335/a000020

Gawronski, B., LeBel, E. P., & Peters, K. R. (2007). What do implicit measures tell us?: Scrutinizing the validity of three common assumptions. Perspectives on Psychological Science, 2(2), 181-193. http://dx.doi.org/10.1111/j.1745-6916.2007.00036.x

Greenwald, A.G., McGhee, D.E., & Schwartz, J.L.K. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality and Social Psychology, 74, 1464–1480.

Greenwald, A. G., Poehlman, T. A., Uhlmann, E. L., & Banaji, M. R. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97, 17–41. http://dx.doi.org/10.1037/a0015575

Greenwald, A. G., Smith, C. T., Sriram, N., Bar-Anan, Y., & Nosek, B. A. (2009). Race attitude measures predicted vote in the 2008 U. S. Presidential Election. Analyses of Social Issues and Public Policy, 9, 241–253.

Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., & Schmitt, M. (2005). A meta-analysis on the correlation between the Implicit Association Test and explicit self-report measures. Personality and Social Psychology Bulletin, 31, 1369 –1385. http://dx.doi.org/10.1177/0146167205275613

Kurdi, B., Seitchik, A. E., Axt, J. R., Carroll, T. J., Karapetyan, A., Kaushik, N., . . . Banaji, M. R. (2018). Relationship between the Implicit Association Test and intergroup behavior: A meta-analysis. American Psychologist. Advance online publication. http://dx.doi.org/10.1037/amp0000364

Lai, C.K., Hoffman, K.M., & Nosek, B.A. (2013). Reducing Implicit Prejudice. XX

McConahay, J.B. (1986). Modern racism, ambivalence, and the modern racism scale. In J.F. Dovidio & S.L. Gaertner (Eds.), Prejudice, discrimination, and racism (pp. 91–125). Orlando, FL: Academic Press

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 1-8.

Oswald, F. L., Mitchell, G., Blanton, H., Jaccard, J., & Tetlock, P. E. (2013). Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies. Journal of Personality and Social Psychology, 105(2), 171-192. http://dx.doi.org/10.1037/a0032734

Pelham, B. W., & Swann, W. B. (1989). From self-conceptions to self-worth: On the sources and structure of global self-esteem. Journal of Personality and Social Psychology, 57, 672– 680

Rosenberg, M. (1965). Society and the Adolescent Self-image. Princeton, NJ: Princeton University Press.

Schneider, D. J. (1973). Implicit personality theory: A review. Psychological Bulletin, 79(5), 294-309. http://dx.doi.org/10.1037/h0034496

Simms, L. J., Zelazny, K., Yam, W. H., & Gros, D. F. (2010). Self-informant agreement for personality and evaluative person descriptors: Comparing methods for creating informant measures. European Journal of Personality, 24(3), 207-221.

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4, 274-290. doi:10.1111/j.1745-6924.2009.01125.x

Walker, S. S., & Schimmack, U. (2008). Validity of a happiness implicit association test as a measure of subjective well-being. Journal of Research in Personality, 42(2), 490-497. http://dx.doi.org/10.1016/j.jrp.2007.07.005
