Category Archives: Implicit Attitudes

Hey Social Psychologists: Don’t Mess with Suicide!

Ten years ago, the foundations of psychological science were shaken by the realization that the standard scientific method of psychological science is faulty. Since then it has become apparent that many classic findings are not replicable and many widely used measures are invalid, especially in social psychology (Schimmack, 2020).

However, it is not uncommon to read articles in 2021 that ignore the low credibility of published results. There are too many of these pseudo-scientific articles, but some articles matter more than others; at least to me. I do care about suicide and, like many people my age, I know people who have committed suicide. I was therefore concerned when I saw a review article that examines suicide from a dual-process perspective.

Automatic and controlled antecedents of suicidal ideation and action: A dual-process conceptualization of suicidality.

My main concern about this article is that dual-process models in social cognition are based on implicit priming studies with low replicability and implicit measures with low validity (Schimmack, 2021a, 2021b). It is therefore unclear how dual-process models can help us to understand and prevent suicides.

After reading the article, it is clear that the authors make many false statements and present questionable studies that have never been replicated as if they produce a solid body of empirical evidence.

Introduction of the Article

The introduction cites outdated studies that have either not been replicated or produced replication failures.

“Our position is that even these integrative models omit a fundamental and well-established dynamic of the human mind: that complex human behavior is the result of an interplay between relatively automatic and relatively controlled modes of thought (e.g., Sherman et al., 2014). From basic processes of impression formation (e.g., Fiske et al., 1999) to romantic relationships (e.g., McNulty & Olson, 2015) and intergroup relations (e.g., Devine, 1989), dual-process frameworks that incorporate automatic and controlled cognition have provided a more complete understanding of a broad array of social phenomena.”

This is simply not true. For example, there is no evidence that we implicitly love our partners when we consciously hate them or vice versa, and there is no evidence that prejudice occurs outside of awareness.

Automatic cognitions can be characterized as unintentional (i.e., inescapably activated), uncontrollable (i.e., difficult to stop), efficient in operation (i.e., requiring few cognitive resources), and/or unconscious (Bargh, 1994) and are typically captured with implicit measures.

This statement ignores many articles that have criticized the assumption that implicit measures measure implicit constructs. Even the proponents of the most widely used implicit measure have walked back this assumption (Greenwald & Banaji, 2017).

The authors then claim that implicit measures of suicide have incremental predictive validity for suicidal behavior.

For example, automatic associations between the self and death predict suicidal ideation and action beyond traditional explicit (i.e., verbal) responses (Glenn et al., 2017).

This claim has been made repeatedly by proponents of implicit measures, so I meta-analyzed the small set of studies that tested this prediction (Schimmack, 2021). Some of these studies produced non-significant results and the literature showed evidence that questionable research practices were used to produce significant results. Overall, the evidence is inconclusive. It is therefore incorrect to point to a single study as if there is clear evidence that implicit measures of suicidality are valid.

Further statements are also based on outdated research and a single reference.

Research on threat has consistently shown that people preferentially process dangers to physical harm by prioritizing attention, response, and recall regarding threats (e.g., Öhman & Mineka, 2001)

There have been many proposals about stimuli that attract attention, and threatening stimuli are by no means the only attention-grabbing stimuli. Sexual stimuli also attract attention, and in general arousal, rather than valence or threat, is a better predictor of attention (Schimmack, 2005).

It is also not clear how threatening stimuli are relevant for suicide, which is related to depression rather than to anxiety disorders.

The introduction of implicit measures totally disregards the controversy about the validity of implicit measures or the fact that different implicit measures of the same construct show low convergent validity.

Much has been written about implicit measures (for reviews, see De Houwer et al., 2009; Fazio & Olson, 2003; March et al., 2020; Nosek et al., 2011; Olson & Fazio, 2009), but for the present purposes, it is important to note the consensus that implicit measures index the automatic properties of attitudes.

More relevant are claims that implicit measures have been successfully used to understand a variety of clinical topics.

The application of a dual-process framework has consequently improved explanation and prediction in a number of areas involving mental health, including addiction (Wiers & Stacy, 2006), anxiety (Teachman et al., 2012), and sexual assault (Widman & Olson, 2013). Much of this work incorporates advances in implicit measurement in clinical domains (Roefs et al., 2011).

The authors then make the common mistake of conflating self-deception and other-deception. The notion of implicit motives that can influence behavior without awareness implies self-deception. An alternative rationale for the use of implicit measures is that they are better measures of consciously accessible thoughts and feelings that individuals are hiding from others. Here we do not need to assume a dual-process model. We simply have to assume that self-report measures are easy to fake, whereas implicit measures can reveal the truth because they are difficult to fake. Thus, even incremental predictive validity does not automatically support a dual-process model of suicide. However, this question is only relevant if implicit measures of suicidality show incremental predictive validity, which has not been demonstrated.

Consistent with the idea that such automatic evaluative associations can predict suicidality later, automatic spouse-negative associations predicted increases in suicidal ideation over time across all three studies, even after accounting for their controlled counterparts (McNulty et al., 2019).

Conclusion Section

In the conclusion section, the authors repeat their false claim that implicit measures of suicidality reflect valid variance in implicit suicidality and that they are superior to explicit measures.

“As evidence of their impact on suicidality has accumulated, so has the need for incorporating automatic processes into integrative models that address questions surrounding how and under what circumstances automatic processes impact suicidality, as well as how automatic and controlled processes interact in determining suicide-relevant outcomes.”

Implicit measures are better-suited to assess constructs that are more affective (Kendrick & Olson, 2012), spontaneous (e.g., Phillips & Olson, 2014), and uncontrollable (e.g., Klauer & Teige-Mocigemba, 2007).

As recent work has shown (e.g., Creemers et al., 2012; Franck, De Raedt, Dereu, et al., 2007; Franklin et al., 2016; Glashouwer et al., 2010; Glenn et al., 2017; Hussey et al., 2016; McNulty et al., 2019; Nock et al., 2010; Tucker, Wingate, et al., 2018), the psychology of suicidality requires formal consideration of automatic processes, their proper measurement, and how they relate to one another and corresponding controlled processes.

We have articulated a number of hypotheses, several already with empirical support, regarding interactions between automatic and controlled processes in predicting suicidal ideation and lethal acts, as well as their combination into an integrated model.

Then they finally mention the measurement problems of implicit measures.

Research utilizing the model should be mindful of specific challenges. First, although the model answers calls to diversify measurement in suicidality research by incorporating implicit measures, such measures are not without their own problems. Reaction time measures often have problematically low reliabilities, and some include confounds (e.g., Olson et al., 2009). Further, implicit and explicit measures can differ in a number of ways, and structural differences between them can artificially deflate their correspondence (Payne et al., 2008). Researchers should be aware of the strengths and weaknesses of implicit measures.

Evaluation of the Evidence

Here I provide a brief summary of the actual results of studies cited in the review article so that readers can make up their own mind about the relevance and credibility of the evidence.

Creemers, D. H., Scholte, R. H., Engels, R. C., Prinstein, M. J., & Wiers, R. W. (2012). Implicit and explicit self-esteem as concurrent predictors of suicidal ideation, depressive symptoms, and loneliness. Journal of Behavior Therapy and Experimental Psychiatry, 43(1), 638–646

Participants: 95 undergraduate students
Implicit Construct / Measure: Implicit self-esteem / Name Letter Task
Dependent Variables: depression, loneliness, suicidal ideation
Results: No significant direct relationship. Interaction between explicit and implicit self-esteem for suicidal ideation only, b = .28.

Franck, E., De Raedt, R., & De Houwer, J. (2007). Implicit but not explicit self-esteem predicts future depressive symptomatology. Behaviour Research and Therapy, 45(10), 2448–2455.

Participants: 28 clinically depressed patients; 67 non-depressed participants.
Implicit Construct / Measure: Implicit self-esteem / Name Letter Task
Dependent Variable: change in depression controlling for T1
Result: However, after controlling for initial symptoms of depression, implicit, t(48) = 2.21, p = .03, b = .25, but not explicit self-esteem, t(48) = 1.26, p = .22, b = .17, proved to be a significant predictor for depressive symptomatology at 6 months follow-up.

Franck, E., De Raedt, R., Dereu, M., & Van den Abbeele, D. (2007). Implicit and explicit self- esteem in currently depressed individuals with and without suicidal ideation. Journal of Behavior Therapy and Experimental Psychiatry, 38(1), 75–85.

Participants: Depressed patients with suicidal ideation (N = 15), depressed patients without suicidal ideation (N = 14) and controls (N = 15)
Implicit Construct / Measure: Implicit self-esteem / IAT
Dependent variable. Group status
Contrast analysis revealed that the currently depressed individuals with suicidal ideation showed a significantly higher implicit self-esteem as compared to the currently depressed individuals without suicidal ideation, t(43) = 3.0, p < 0.01. Furthermore, the non-depressed controls showed a significantly higher implicit self-esteem as compared to the currently depressed individuals without suicidal ideation, t(43) = 3.7, p < 0.001.
[this finding implies that suicidal depressed patients have HIGHER implicit self-esteem than depressed patients who are not suicidal].

Glashouwer, K. A., de Jong, P. J., Penninx, B. W., Kerkhof, A. J., van Dyck, R., & Ormel, J. (2010). Do automatic self-associations relate to suicidal ideation? Journal of Psychopathology and Behavioral Assessment, 32(3), 428–437.

Participants: General population (N = 2,837)
Implicit Constructs / Measure: Implicit depression, Implicit Anxiety / IAT
Dependent variable: Suicidal Ideation, Suicide Attempt
Results: simple correlations
Depression IAT – Suicidal Ideation, r = .22
Depression IAT – Suicide Attempt, r = .12
Anxiety IAT – Suicide Ideation, r = .18
Anxiety IAT – Suicide Attempt, r = .11
Controlling for Explicit Measures of Depression / Anxiety
Depression IAT – Suicidal Ideation, b = .024, p = .179
Depression IAT – Suicide Attempt, b = .037, p = .061
Anxiety IAT – Suicide Ideation, b = .024, p = .178
Anxiety IAT – Suicide Attempt, b = .039, p = .046

Glenn, J. J., Werntz, A. J., Slama, S. J., Steinman, S. A., Teachman, B. A., & Nock, M. K. (2017). Suicide and self-injury-related implicit cognition: A large-scale examination and replication. Journal of Abnormal Psychology, 126(2), 199–211.

Participants: Self-selected online sample with high rates of self-harm (> 50%). Ns = 3,115 / 3,114
Implicit Constructs / Measure: Self-Harm, Death, Suicide / IAT
Dependent variables: Group differences (non-suicidal self-injury / control; suicide attempt / control)
Non-suicidal self-injury versus control
Self-injury IAT, d = .81/.97; Death IAT d = .52/.61, Suicide IAT d = .58/.72
Suicide Attempt versus control
Self-injury IAT, d = .52/.54; Death IAT d = .37/.32, Suicide IAT d = .54/.67
[these results show that self-ratings and IAT scores reflect a common construct; they do not show discriminant validity; there is no evidence that they measure distinct constructs, and they do not show incremental predictive validity]

Hussey, I., Barnes-Holmes, D., & Booth, R. (2016). Individuals with current suicidal ideation demonstrate implicit “fearlessness of death.” Journal of Behavior Therapy and Experimental Psychiatry, 51, 1–9.

Participants: 23 patients with suicidal ideation and 25 controls (university students)
Implicit Constructs / Measure: Death attitudes (general / personal) / IRAP
Dependent variable: Group difference
Results: No main effects were found for either group (p = .08). Critically, however, a three-way interaction effect was found between group, IRAP type, and trial-type, F(3, 37) = 3.88, p = .01. Specifically, the suicidal ideation group produced a moderate “my death-not-negative” bias (M = .29, SD = .41), whereas the normative group produced a weak “my death-negative” bias (M = -.12, SD = .38, p < .01). This differential performance was of a very large effect size (Hedges’ g = 1.02).
[This study suggests that evaluations of personal death show stronger relationships than generic death]

McNulty, J. K., Olson, M. A., & Joiner, T. E. (2019). Implicit interpersonal evaluations as a risk factor for suicidality: Automatic spousal attitudes predict changes in the probability of suicidal thoughts. Journal of Personality and Social Psychology, 117(5), 978–997

Participants: Integrative analysis of 399 couples from three longitudinal studies of marriage.
Implicit Construct / Measure: Partner attitudes / evaluative priming task
Dependent variable: Change in suicidal thoughts (yes/no) over time
Result: (preferred scoring method)
without covariates, b = -.69, se = .27, p = .010.
with covariate, b = -.64, se = .29, p = .027

Nock, M. K., Park, J. M., Finn, C. T., Deliberto, T. L., Dour, H. J., & Banaji, M. R. (2010). Measuring the suicidal mind: Implicit cognition predicts suicidal behavior. Psychological Science, 21(4), 511–517.

Participants. 157 patients with mental health problems
Implicit Construct / Measure: death attitudes / IAT
Dependent variable: Prospective Prediction of Suicide
Result: controlling for prior attempts / no explicit covariates
b = 1.85, SE = 0.94, z = 2.03, p = .042

Tucker, R. P., Wingate, L. R., Burkley, M., & Wells, T. T. (2018). Implicit Association with Suicide as Measured by the Suicide Affect Misattribution Procedure (S-AMP) predicts suicide ideation. Suicide and Life-Threatening Behavior, 48(6), 720–731.

Participants. 138 students oversampled for suicidal ideation
Implicit Construct / Measure: suicide attitudes / AMP
Dependent variable: Suicidal Ideation
Result: simple correlation, r = .24
regression controlling for depression, b = .09, se = .04, p = .028

Taken together, the references show a mix of constructs, measures, and outcomes, and the p-values cluster just below .05. Not one of these p-values is below .005. Moreover, many studies relied on small convenience samples. The most informative study is the one by Glashouwer et al., which examined incremental predictive validity of a depression IAT in a large, population-wide sample. The result was not significant and the effect size was less than r = .1. Thus, the references do not provide compelling evidence for dual-process models of suicidality.
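The clustering of these p-values just below .05 can be checked directly by converting each reported two-sided p-value into the corresponding z-score, the same first step the z-curve method uses. A minimal Python sketch (the p-values are taken from the study summaries above; the study labels are abbreviations of those references):

```python
from statistics import NormalDist

# Two-sided p-values reported in the study summaries above
p_values = {
    "Franck, De Raedt, & De Houwer (2007), implicit self-esteem": .03,
    "Glashouwer et al. (2010), anxiety IAT - suicide attempt": .046,
    "McNulty et al. (2019), with covariates": .027,
    "Nock et al. (2010)": .042,
    "Tucker et al. (2018), controlling for depression": .028,
}

def p_to_z(p):
    """Convert a two-sided p-value into the corresponding z-score."""
    return NormalDist().inv_cdf(1 - p / 2)

for study, p in p_values.items():
    print(f"{study}: p = {p:.3f}, z = {p_to_z(p):.2f}")
# Every z-score lands in the narrow band just above 1.96 (p = .05) and
# well below 2.81, the z-score that corresponds to p = .005.
```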


Social psychologists have abused the scientific method for decades. Over the past decade, criticism of their practices has become louder, but many social psychologists ignore this criticism and continue to abuse significance testing and to misrepresent the results as if they provide empirical evidence that can inform understanding of human behavior. This article is just another example of the unwillingness of social psychologists to “clean up their act” (Kahneman, 2012). Readers of this article should be warned that the claims made in it are not scientific. Fortunately, there is credible research on depression and suicide outside of social psychology.

Did Social Psychologists Really Develop a Love Test?

Until 2011, social psychologists were able to believe that they were actually doing science. They conducted studies, often rigorous experiments with random assignment, analyzed the data and reported results only when they achieved statistical significance, p < .05. This is how they were trained to do science and most of them believed that this is how science works.

However, in 2011 an article by a well-respected social psychologist changed all this. Daryl Bem published an article that claimed to show time-reversed causal processes. Seemingly, people were able to feel the future (Bem, 2011). This article shook the foundations of social psychology because most social psychologists did not believe in paranormal phenomena. Yet Bem presented evidence for his crazy claim in 8 out of 9 studies. The only study that did not work was one with supraliminal stimuli. The other studies used subliminal stimuli, suggesting that only our unconscious self can feel the future.

Over the past decade it has become apparent that Bem and other social psychologists had misused significance testing. They only paid attention to significant results, p < .05, and ignored non-significant results, p > .05. Selective publishing of significant results means that statistical results no longer distinguished between true and false findings. Everything was significant, even time-reversed implicit priming.

Some areas of social psychology have been hit particularly hard by replication failures. Most prominently, implicit priming research has been called out as a poster child of doubt about social psychological results by Nobel Laureate Kahneman. The basic idea of implicit priming is that stimuli outside of participants’ awareness can influence their behavior. Many implicit priming studies have failed to replicate.

Ten years later, we can examine how social psychologists have responded to the growing evidence that many classic findings were obtained with questionable practices (not reporting the failures) and cannot be replicated. Unfortunately, the response is consistent with psychodynamic theories of ego-defense mechanisms and with social psychologists’ own theories of motivated reasoning. For the most part, social psychologists have simply ignored the replication failures of the 2010s and continue to treat old articles as if they provide scientific insights into human behavior. For example, Bargh – a leading figure in the implicit priming world – wrote a whole book about implicit priming that does not mention replication failures and presents questionable research as if it were a collection of well-established facts (Schimmack, 2017).

Given the questionable status of implicit priming research, it is not surprising that concerns are also growing about measures that were designed to reflect individual differences in implicit cognitions (Schimmack, 2019). These measures often have low reliability (when you test yourself, you get different results each time) and show low convergent validity (one measure of your unconscious feelings towards your spouse doesn’t correlate with another measure of your unconscious feelings towards your spouse). It is therefore suspicious when researchers consistently find significant results with these measures, because measurement error should make it difficult to get significant results all the time.

Implicit Love

In an article from 2019 (i.e., when the replication crisis in social psychology had been well established), Hicks and McNulty make the following claims about implicit love, that is, feelings that are not reflected in self-reports of affection or marital satisfaction.

Their title is based on a classic article by Bargh and Chartrand.

Readers are not informed that the big claims made by Bargh twenty years ago have failed to be supported by empirical evidence. In particular, the claim that stimuli often influence behavior without awareness lacks any credible evidence. It is therefore sad to say that social psychologists have moved on from self-deception (they thought they were doing science, but they were not) to other-deception (spreading false information knowing that credible doubts have been raised about this research). Just as it is time to reclaim humility and honesty in American political life, it is important to demand humility and honesty from American social psychologists, who dominate the field.

The empirical question is whether research on implicit love has produced robust and credible results. One advantage for relationship researchers is that a lot of this research was published after Bem (2011). Thus, researchers could have improved their research practices. This could result in two outcomes. Either relationship researchers reported their results more honestly and did report non-significant results when they emerged, or they increased sample sizes to ensure that small effect sizes could produce statistically significant results.

Hicks and McNulty’s (2019) narrative review makes the following claims about implicit love.

1. The frequency of various sexual behaviors was prospectively associated with automatic partner evaluations assessed with an implicit measure but not with self-reported relationship satisfaction. (Hicks, McNulty, Meltzer, & Olson, 2016).

2. Participants with less responsive partners who felt less connected to their partners during conflict-of-interest situations had more negative automatic partner attitudes at a subsequent assessment but not more negative subjective evaluations (Murray, Holmes, & Pinkus, 2010).

3. Pairing the partner with positive affect from other sources (i.e., positive words and pleasant images) can increase the positivity of automatic partner attitudes relative to a control group.

4. The frequency of orgasm during sex was associated with automatic partner attitudes, whereas sexual frequency was associated only with deliberate reports of relationship satisfaction for participants who believed frequent sex was important for relationship health.

5. More positive automatic partner attitudes have been linked to perceiving fewer problems over time (McNulty, Olson, Meltzer, & Shaffer, 2013).

6. More positive automatic partner attitudes have been linked to self-reporting fewer destructive behaviors (Murray et al., 2015).

7. More positive automatic partner attitudes have been linked to more cooperative relationship behaviors (LeBel & Campbell, 2013).

8. More positive automatic partner attitudes have been linked to displaying attitude-consistent nonverbal communication in conflict discussions (Faure et al., 2018).

9. More positive automatic partner attitudes were associated with a decreased likelihood of dissolution the following year, even after controlling for explicit relationship satisfaction (Lee, Rogge, & Reis, 2010).

10. Newlyweds’ implicit partner evaluations within the first few months of marriage, but not their explicit satisfaction, predicted their satisfaction 4 years later.

11. People with higher motivation to see their relationship in a positive light because of barriers to exiting their relationships (i.e., high levels of relationship investments and poor alternatives) demonstrated a weaker correspondence between their automatic attitudes and their relationship self-reports.

12. People with more negative automatic evaluations are less trusting of their partners when their working memory capacity is limited (Murray et al., 2011).

These claims are followed by the assurance that “these studies provide compelling evidence that automatic partner attitudes do have implications for relationship outcomes” (p. 256).

Should anybody who reads this article or similar claims in the popular media believe them? Have social psychologists improved their methods to produce more credible results over the past decade?

Fortunately, we can answer this question by examining the statistical evidence that was used to support these claims, using the z-curve method. First, all test statistics are converted into z-scores that represent the strength of evidence against the null-hypothesis (i.e., implicit love has no effect or does not exist) in each study. These z-scores are a function of the effect size and the amount of sampling error in a study (signal/noise ratio). Second, the z-scores are plotted as a histogram to show how many of the reported results provide weak or strong evidence against the null-hypothesis. The data are here for full transparency (Implicit.Love.xlsx).

The figure shows the z-curve for the 30 studies that reported usable test results. Most published z-scores are clustered just above the threshold value of 1.96 that corresponds to the .05 criterion for claiming a discovery. This clustering indicates that significant results were selected from a much larger set of analyses that were run and produced non-significant results. The grey curve from z = 0 to 1.96 shows the predicted number of analyses that were not reported. The file-drawer ratio implies that for every significant result there were 12 analyses with non-significant results.

Another way to look at the results is to compare the observed discovery rate with the expected discovery rate. The observed discovery rate is simply the percentage of studies that reported a significant result, which is 29 out of 30 or 97%. The expected discovery rate is the average power of the studies to produce a significant result. It is only 8%. This shows that social psychologists still continue to select only successes and do not report or interpret the failures. Moreover, in this small sample of studies, there is considerable uncertainty around the point estimates. The 95% confidence interval for the replication success probability includes 5%, which is not higher than chance. The complementary finding is that the maximum false discovery rate is estimated to be 63%, but could be as high as 100%. In other words, the results make it impossible to conclude that even some of these studies produced a credible result.
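The file-drawer arithmetic implied by these discovery rates is simple: if only a fraction EDR of all attempted analyses is expected to be significant, the file drawer must hold (1 − EDR)/EDR non-significant analyses for every published significant one. A quick sketch using the 29/30 and 8% figures reported above:

```python
# Observed discovery rate: share of reported results that were significant
significant = 29
total_reported = 30
odr = significant / total_reported
print(f"Observed discovery rate: {odr:.0%}")  # 97%

# Expected discovery rate estimated by z-curve (average power of the studies)
edr = 0.08

# Implied file drawer: non-significant analyses per significant result
file_drawer_ratio = (1 - edr) / edr
print(f"File-drawer ratio: {file_drawer_ratio:.1f} to 1")  # about 11.5, i.e. roughly 12
```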

In short, the entire research on implicit love is bullshit. Ten years ago, social psychologists had the excuse that they did not know better and misused statistics because they were trained the wrong way. This excuse is wearing thin in 2020. They know better, but they continue to report misleading results and write unscientific articles. In psychology, this is called other-deception, in everyday life it is called lying. Don’t trust social psychologists. Doing so is as stupid as believing Donald Trump when he claims that he won the election.

Reexamining Cunningham, Preacher, and Banaji’s Multi-Method Model of Racism Measures

William A. Cunningham, Kristopher J. Preacher, and Mahzarin R. Banaji. (2001).
Implicit Attitude Measures: Consistency, Stability, and Convergent Validity, Psychological Science, 12(2), 163-170.

In recent years, several techniques have been developed to measure implicit social cognition. Despite their increased use, little attention has been devoted to their reliability and validity. This article undertakes a direct assessment of the interitem consistency, stability, and convergent validity of some implicit attitude measures. Attitudes toward blacks and whites were measured on four separate occasions, each 2 weeks apart, using three relatively implicit measures (response window evaluative priming, the Implicit Association Test, and the response-window Implicit Association Test) and one explicit measure (Modern Racism Scale). After correcting for interitem inconsistency with latent variable analyses, we found that (a) stability indices improved and (b) implicit measures were substantially correlated with each other, forming a single latent factor. The psychometric properties of response-latency implicit measures have greater integrity than recently suggested.

Critique of Original Article

This article has been cited 362 times (Web of Science, January 2017).  It still is one of the most rigorous evaluations of the psychometric properties of the race Implicit Association Test (IAT).  As noted in the abstract, the strength of the study is the use of several implicit measures and the repeated measurement of attitudes on four separate occasions.  This design makes it possible to separate several variance components in the race IAT.  First, it is possible to examine how much variance is explained by causal factors that are stable over time and shared by implicit and explicit attitude measures.  Second, it is possible to measure the amount of variance that is unique to the IAT.  As this component is not shared with other implicit measures, this variance can be attributed to systematic measurement error that is stable over time.  A third variance component is variance that is shared only with other implicit measures and that is stable over time. This variance component could reflect stable implicit racial attitudes.  Finally, it is possible to identify occasion specific variance in attitudes.  This component would reveal systematic changes in implicit attitudes.
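The variance components that this design can separate are additive shares of the total variance of a single measurement. The numbers in the sketch below are purely hypothetical, not estimates from the article; the point is only to make the bookkeeping of the four components concrete:

```python
# Hypothetical variance shares for a single IAT measurement; illustrative only,
# NOT estimates from Cunningham, Preacher, and Banaji (2001).
components = {
    "stable variance shared with the explicit measure": 0.10,
    "stable variance shared only with other implicit measures": 0.15,
    "stable IAT-specific variance (systematic method error)": 0.15,
    "occasion-specific attitude variance": 0.05,
    "random measurement error": 0.55,
}

# The shares are proportions of total variance and must sum to 1.
assert abs(sum(components.values()) - 1.0) < 1e-9

for name, share in components.items():
    print(f"{name}: {share:.0%}")
```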

The original article presents a structural equation model that makes it possible to identify some of these variance components.  However, the model is not ideal for this purpose and the authors do not test some of these variance components.  For example, the model does not include any occasion specific variation in attitudes.  This could be because attitudes do not vary over the one-month interval of the study, or it could mean that the model failed to specify this variance component.

This reanalysis also challenges the claim by the original authors that they provided evidence for a dissociation of implicit and explicit attitudes: “We found a dissociation between implicit and explicit measures of race attitude: Participants simultaneously self-reported nonprejudiced explicit attitudes toward black Americans while showing an implicit difficulty in associating black with positive attributes” (p. 169). The main problem is that the design does not permit this claim because the study included only a single explicit racism measure. Consequently, it is impossible to determine whether unique variance in the explicit measure reflects systematic measurement error in explicit attitude measures (socially desirable responding, acquiescence response styles) or whether this variance reflects consciously accessible attitudes that are distinct from implicit attitudes. In this regard, the authors’ claim that “a single-factor solution does not fit the data” (p. 170) is inconsistent with their own structural equation model, which shows a single second-order factor that explains the covariance among the three implicit measures and the explicit measure.

The authors caution that a single IAT measure is not very reliable, but their statement about reliability is vague. “Our analyses of implicit attitude measures suggest that the degree of measurement error in response-latency measures can be substantial; estimates of Cronbach’s alpha indicated that, on average, more than 30% of the variance associated with the measurements was random error.” (p. 160). More than 30% random measurement error leaves a rather large range of reliability estimates, ranging from 0% to 70%. The respective parameter estimates for the IAT in Figure 4 are .53^2 = .28, .65^2 = .42, .74^2 = .55, and .38^2 = .14. These reliability estimates vary considerably due to the small sample size, but the lowest loading implies that as little as 14% of the variance in a single IAT is reliable. As reliability is the upper limit for validity, this would imply that often no more than 20% of the variance in a single IAT captures variation in implicit racial attitudes.
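Under the latent-variable logic used here, the reliability implied by a standardized factor loading is simply the squared loading, and reliability caps validity. A short sketch using the four IAT loadings reported in Figure 4:

```python
# Standardized loadings of the four IAT measurements (Figure 4 of the article)
loadings = [0.53, 0.65, 0.74, 0.38]

# Implied reliability of each single measurement = squared standardized loading
reliabilities = [round(l ** 2, 2) for l in loadings]
print(reliabilities)  # [0.28, 0.42, 0.55, 0.14]

# Since reliability is the upper limit for validity, a single IAT with a
# loading of .38 can contain at most 14% valid variance.
```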

The authors caution readers about the use of a single IAT to measure implicit attitudes: "When using latency-based measures as indices of individual differences, it may be essential to employ analytic techniques, such as covariance structure modeling, that can separate measurement error from a measure of individual differences. Without such analyses, estimates of relationships involving implicit measures may produce misleading null results" (p. 169). However, the authors fail to mention that the low reliability of a single IAT also has important implications for the use of the IAT to assess implicit prejudice. Given this low estimate of validity, users of the Harvard website that provides feedback about individuals' performance on the IAT should be warned that the feedback is neither reliable nor valid by conventional standards for psychological tests.

Reanalysis of Published Correlation Matrix

The Table below reproduces the correlation matrix. The standard deviations in the last row are rescaled to avoid rounding problems. This has no effect on the results.

 1    1
 2   .80   1
 3   .78  .82   1
 4   .76  .77  .86   1
 5   .21  .15  .15  .14   1
 6   .13  .14  .10  .08  .31   1
 7   .16  .26  .23  .20  .42  .50   1
 8   .14  .17  .16  .13  .16  .33  .17   1
 9   .20  .16  .19  .26  .33  .11  .23  .07   1
10   .26  .29  .18  .19  .20  .27  .36  .29  .26   1
11   .35  .33  .34  .25  .28  .29  .34  .33  .36  .39   1
12   .19  .17  .08  .07  .12  .25  .30  .14  .01  .17  .24   1
13   .00  .11  .07  .04  .27  .18  .19  .02  .03  .01  .02  .07   1
14   .16  .08  .04  .08  .26  .27  .24  .22  .14  .32  .32  .17  .13   1
15   .12  .01  .02  .07  .13  .19  .18  .00  .02  .00  .11  .04  .17  .30   1
16   .33  .18  .26  .31  .14  .24  .31  .15  .22  .20  .27  .04  .01  .48  .42   1

SD   0.84 0.82 0.88 0.86 2.2066 1.2951 1.0130 0.9076 1.2 1.0 1.1 1.0 0.7 0.8 0.8 0.9

Variables 1-4 = Modern Racism Scale (occasions 1-4); 5-8 = Implicit Association Test (occasions 1-4); 9-12 = Response Window IAT (occasions 1-4); 13-16 = Response Window Evaluative Priming (occasions 1-4)
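For readers who want to refit these models, most SEM software accepts a covariance matrix, which can be recovered from the correlations and the standard deviations in the last row. A minimal sketch, using only the four Modern Racism Scale variables for brevity (numpy assumed):

```python
import numpy as np

# Correlations among the four Modern Racism Scale measurements (variables 1-4)
R = np.array([
    [1.00, 0.80, 0.78, 0.76],
    [0.80, 1.00, 0.82, 0.77],
    [0.78, 0.82, 1.00, 0.86],
    [0.76, 0.77, 0.86, 1.00],
])
sd = np.array([0.84, 0.82, 0.88, 0.86])

# Covariance matrix = diag(sd) @ R @ diag(sd), computed via broadcasting
cov = np.outer(sd, sd) * R
print(np.round(cov, 3))
```

The full 16 x 16 covariance matrix is obtained the same way using all 16 standard deviations.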


Fitting the data to the original model reproduced the original results. I then fitted the data to a model with a single attitude factor (see Figure 1). The model also allowed for measure-specific variances. An initial model showed no significant measure-specific variances for the two versions of the IAT; hence, these method factors were not included in the final model. To control for variance that is clearly consciously accessible, I modeled the relationship between the explicit factor and the attitude factor as a causal path from the explicit factor to the attitude factor. This path should not be interpreted as a causal relationship in this case. Rather, the path can be used to estimate how much of the variance in the attitude factor is explained by consciously accessible information that influences the explicit measure. In this model, the residual variance is variation that is shared among implicit measures, but not with the explicit measure.

The model had good fit to the data. I then imposed constraints on the factor loadings. The constrained model had better fit than the unconstrained model (delta AIC = 4.60, delta BIC = 43.53). The main finding is that the standard IAT had a loading of .55 on the attitude factor. The indirect path from the implicit attitude factor to a single IAT measure is only slightly smaller, .55 * .92 = .51. The 95% CI for this parameter ranged from .41 to .60. The upper bound of the 95% CI would imply that at most 36% of the variance in a single IAT reflects implicit racial attitudes. However, it is important to note that the model in Figure 1 assumes that the Modern Racism Scale is a perfectly valid measure of consciously accessible attitudes. Any systematic measurement error in the Modern Racism Scale would reduce the amount of variance in the attitude factor that reflects unconscious factors. Again, the lack of multiple explicit measures makes it impossible to separate systematic measurement error from valid variance in explicit measures. Thus, the amount of variance in a single IAT that reflects unconscious racial attitudes can range from 0% to 36%.
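The path arithmetic behind these figures is simple multiplication of standardized coefficients; a minimal sketch using the estimates quoted above:

```python
# Indirect effect of the implicit attitude factor on a single IAT score
# is the product of the two standardized paths.
attitude_to_method = 0.55  # attitude factor → IAT method factor
method_to_iat = 0.92       # method factor → single IAT measure
indirect = attitude_to_method * method_to_iat
print(round(indirect, 2))   # → 0.51

# Upper bound of the 95% CI (.60) implies an upper bound on explained variance.
print(round(0.60 ** 2, 2))  # → 0.36, i.e., at most 36% of IAT variance
```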

How Variable are Implicit Racial Attitudes?

The design included repeated measurement of implicit attitudes on four occasions. If recent experiences influence implicit attitudes, we would expect implicit measures taken on the same occasion to correlate more highly with each other than implicit measures taken on different occasions. Given the low validity of implicit attitude measures, I examined this question with constrained parameters. By estimating a single parameter, the model has more power to reveal a consistent relationship between implicit measures that were obtained during the same testing session. Neither the two IATs nor the IAT and the evaluative priming task (EP) showed significant occasion-specific variance. Although this finding may be due to low power to detect occasion-specific variation, it suggests that most of the variance in an IAT is due to stable variation and random measurement error.


Cunningham et al. (2001) conducted a rigorous psychometric study of the Implicit Association Test. The original article reported results that could be reproduced. The authors correctly interpret their results as evidence that a single IAT has low reliability. However, they falsely imply that their results provide evidence that the IAT and other implicit measures are valid measures of an implicit form of racism that is not consciously accessible. My new analysis shows that their results are consistent with this hypothesis only if one assumes that the Modern Racism Scale is a perfectly valid measure of consciously accessible racial attitudes. Under this assumption, about 25% (95% CI 16-36) of the variance in a single IAT would reflect implicit attitudes. However, it is rather unlikely that the Modern Racism Scale is a perfect measure of explicit racial attitudes, and the amount of variance in performance on the IAT that reflects unconscious racism is likely to be smaller.

Another important finding that was implicit, but not explicitly mentioned, in the original article is that there is no evidence for situation-specific variation in implicit attitudes. At least over the one-month period of the study, racial attitudes remained stable and did not vary as a function of naturally occurring events that might influence racial attitudes (e.g., positive or negative intergroup contact). This finding may explain why experimental manipulations of implicit attitudes also often produce very small effects (Joy-Gaba & Nosek, 2010).

One surprising finding was that the IAT showed no systematic measurement error in this model. This would imply that repeated measures of the IAT could be used to measure racial attitudes with high validity.  Unfortunately, most studies with the IAT rely on a single testing situation and ignore that most of the variance in a single IAT is measurement error.  To improve research on racial attitudes and prejudice, social psychologists should use multiple explicit and implicit measures and use structural equation models to examine which variance components of a measurement model of racial attitudes predict actual behavior.

Validity of the Implicit Association Test as a Measure of Implicit Attitudes

This blog post reports the results of an analysis of correlations among four explicit and three implicit attitude measures published by Ranganath, Smith, and Nosek (2008).

Original article:
Kate A. Ranganath, Colin Tucker Smith, & Brian A. Nosek (2008). Distinguishing automatic and controlled components of attitudes from direct and indirect measurement methods. Journal of Experimental Social Psychology, 44, 386–396. doi:10.1016/j.jesp.2006.12.008

Distinct automatic and controlled processes are presumed to influence social evaluation. Most empirical approaches examine automatic processes using indirect methods, and controlled processes using direct methods. We distinguished processes from measurement methods to test whether a process distinction is more useful than a measurement distinction for taxonomies of attitudes. Results from two studies suggest that automatic components of attitudes can be measured directly. Direct measures of automatic attitudes were reports of gut reactions (Study 1) and behavioral performance in a speeded self-report task (Study 2). Confirmatory factor analyses comparing two factor models revealed better fits when self-reports of gut reactions and speeded self-reports shared a factor with automatic measures versus sharing a factor with controlled self-report measures. Thus, distinguishing attitudes by the processes they are presumed to measure (automatic versus controlled) is more meaningful than distinguishing based on the directness of measurement.

Description of Original Study

Study 1 measured relative attitudes towards heterosexuals and homosexuals with seven measures: four explicit measures and three reaction time tasks. Specifically, the four explicit measures were

Actual = Participants were asked to report their "actual feelings" towards gay and straight people when given enough time for full consideration, on a scale ranging from 1 = very negative to 8 = very positive.

Gut = Participants were asked to report their "gut reaction" towards gay and straight people, on a scale ranging from 1 = very negative to 8 = very positive.

Time0 and Time5: A second explicit rating task assessed an “attitude timeline”. Participants reported their attitudes toward the two groups at multiple time points: (1) instant reaction, (2) reaction a split-second later, (3) reaction after 1 s, (4) reaction after 5 s, and (5) reaction when given enough time to think fully. Only the first (Time0) and the last (Time5) rating were included in the model.

The three reaction time measures were the Implicit Association Test (IAT), the Go-NoGo Association Test (GNAT), and a Four-Category Sorting Paired Features Task (SPF). All three measures use differences in response times to measure attitudes.

Table A1 in the Appendix reported the correlations among the seven tasks.

        IAT   GNAT  SPF   GUT   Actual Time0
IAT      1
GNAT    .36    1
SPF     .26   .18    1
GUT     .23   .33   .12    1
Actual  .16   .31   .01   .65    1
Time0   .19   .31   .16   .85   .50    1
Time5   .01   .24   .01   .54   .81   .50
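To reanalyze these data, the lower-triangular entries first have to be expanded into a full symmetric correlation matrix. A minimal sketch with numpy, assuming the variable order IAT, GNAT, SPF, GUT, Actual, Time0, Time5:

```python
import numpy as np

# Below-diagonal correlations from Table A1, row by row
lower = [
    [0.36],                                # GNAT
    [0.26, 0.18],                          # SPF
    [0.23, 0.33, 0.12],                    # GUT
    [0.16, 0.31, 0.01, 0.65],              # Actual
    [0.19, 0.31, 0.16, 0.85, 0.50],        # Time0
    [0.01, 0.24, 0.01, 0.54, 0.81, 0.50],  # Time5
]
k = 7
R = np.eye(k)
for i, vals in enumerate(lower, start=1):
    R[i, :len(vals)] = vals
R = R + R.T - np.eye(k)  # mirror the lower triangle, keep the unit diagonal
print(R[3, 5])  # GUT–Time0 correlation
```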

The authors tested a variety of structural equation models. The best-fitting model, preferred by the authors, was a model with three correlated latent factors: "In this three-factor model, self-reported gut feelings (GutFeeling, Instant Feeling) comprised their own attitude factor distinct from a factor comprised of the indirect, automatic measures (IAT, GNAT, SPF) and from a factor comprised of the direct, controlled measures (Actual Feeling, Fully Considered Feeling). The data were an excellent fit (chi^2(12) = 10.8)."

The authors then state that "while self-reported gut feelings were more similar to the indirect measures than to the other self-reported attitude measures, there was some unique variance in self-reported gut feelings that was distinct from both" (p. 391), and they go on to speculate that "one possibility is that these reports are a self-theory that has some but not complete correspondence with automatic evaluations" (p. 391). They also consider the possibility that "measures like the IAT, GNAT, and SPF partly assess automatic evaluations that are "experienced" and amenable to introspective report, and partly evaluations that are not" (p. 391). But they favor the hypothesis that "self-report of 'gut feelings' is a meaningful account of some components of automatic evaluation" (p. 391). They interpret these results as strong support for their "contention that a taxonomy of attitudes by measurement features is not as effective as one that distinguishes by presumed component processes" (p. 391). The conclusion reiterates this point: "The present studies suggest that attitudes have distinct but related automatic and controlled factors contributing to social evaluation and that parsing attitudes by underlying processes is superior to parsing attitude measures by measurement features" (p. 393).

Surprisingly, the authors do not mention the three-factor model in the Discussion and instead claim support for a two-factor model that distinguishes processes rather than measures (explicit vs. implicit): "In both studies, model comparison using confirmatory factor analysis showed the data were better fit to a two-factor model distinguishing automatic and controlled components of attitudes than to a model distinguishing attitudes by whether they were measured directly or indirectly" (p. 393). The authors then suggest that some explicit measures (ratings of gut reactions) can measure automatic attitudes: "These findings suggest that direct measures can be devised to capture automatic components of attitudes despite suggestions that indirect measures are essential for such assessments" (p. 393).

New Analysis 

The main problem with this article is that the authors never report parameter estimates for the model. Depending on the pattern of correlations among the three factors and the factor loadings, the interpretation of the results can change. I first tried to fit the three-factor model to the published correlation matrix (setting variances to 1). MPLUS 7.1 showed problems with negative residual variance for Actual, and the model had one fewer degree of freedom than the published model. Fixing the residual variance of Actual did not solve the problem. I then proceeded to fit my own model. The model is essentially the same as the three-factor model, except that I modeled the correlations among the three latent factors with a single higher-order factor. This factor represents variation in common causes that influence all attitude measures. The problem of negative variance in the Actual measure was solved by allowing for an extra correlation between the Actual and Gut ratings. As seen in the correlation table, these two explicit measures correlated more highly with each other (r = .65) than the corresponding Time0 and Time5 measures (rs = .54, .50). As in the original article, model fit was good (see Figure). Figure 1 shows for the first time the parameter estimates of the model.



The loadings of the explicit measures on the primary latent factors are above .80. For single-item measures, this implies that these ratings are essentially measuring the same construct with some random error. Thus, the latent factors can be interpreted as explicit ratings of affective responses, either immediate or after some reflection. The loadings of these two factors on the higher-order factor show that reflective and immediate responses are strongly influenced by the common factor. This is not surprising. Reflection may alter the immediate response somewhat, but it is unlikely to reverse or dramatically change the response a few seconds later. Interestingly, the immediate response has a higher loading on the attitude factor, although in this small sample the difference in loadings is not significant (chi^2(1) = 0.22).

The third primary factor represents the shared variance among the three reaction time measures. It also loads on the general attitude factor, but the loading is weaker than the loading for the explicit measures. The parameter estimates suggest that about 25% of the variance is explained by the common attitude factor (.51^2) and 75% is unique to the reaction time measures. This variance component can be interpreted as unique variance in implicit measures. The factor loadings of the three reaction time measures are also relevant. The loading of the IAT suggests that only 28% (.53^2) of the observed variance in the IAT reflects the effect of causal factors that influence reaction time measures of attitudes. As some of this variance is also shared with explicit measures, only 21% ((.86*.53)^2) of the variance in the IAT represents variance in the implicit attitude factor. This has important implications for the use of the IAT to examine potential effects of implicit attitudes on behavior. Even if implicit attitudes had a strong effect on a behavior (r = .5), the correlation between IAT scores and the behavior would only be r = .86 * .53 * .5 = .23.

A sample size of N = 146 participants would be needed to have 80% power to provide significant evidence for such a relationship (p < .05, two-tailed). Given a more modest effect of attitudes on behavior, r = .86 * .53 * .30 = .14, the sample size would need to be larger (N = 398). As many studies of implicit attitudes and behavior used smaller samples, we would expect many non-significant results, unless non-significant results remain unreported and published results report inflated effect sizes. One solution to the problem of low power in studies of implicit attitudes would be the use of multiple implicit attitude measures. This study suggests that a battery of different reaction time tasks can be used to remove random and task-specific measurement error. Such a multi-method approach to the measurement of implicit attitudes is highly recommended for future studies because it would also help to interpret results of studies in which implicit attitudes do not influence behavior. If a set of implicit measures shows convergent validity, a null result would indicate that implicit attitudes did not influence the behavior. In contrast, a null result with a single implicit measure may simply show that the measure failed to measure implicit attitudes.
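The required sample sizes can be checked with the standard Fisher z approximation for a two-tailed correlation test; a minimal sketch (the critical z values are hardcoded so no scipy dependency is needed):

```python
import math

def n_for_power(r, z_alpha=1.960, z_beta=0.842):
    """Approximate N for a two-tailed test of a correlation r
    with alpha = .05 and 80% power (Fisher z approximation)."""
    return round((z_alpha + z_beta) ** 2 / math.atanh(r) ** 2 + 3)

print(n_for_power(0.23))  # → 146
print(n_for_power(0.14))  # → 398
```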


This article reported some interesting data but failed to report the actual results. This analysis of the data showed that explicit measures are highly correlated with each other and show discriminant validity from implicit, reaction time measures. The new analysis also made it possible to estimate the amount of variance in the Implicit Association Test that reflects variance that is not shared with explicit measures but is shared with other implicit measures. The estimate of about 20% suggests that most of the variance in the IAT is due to factors other than implicit attitudes and that the test cannot be used to diagnose individuals. Whether the 20% of variance that is uniquely shared with other implicit measures reflects unconscious attitudes or method variance that is common to reaction time tasks remains unclear. The model also implies that the predictive validity of a single IAT for prejudiced behaviors is expected to be small to moderate (r < .30), which means large samples are needed to study the effect of implicit attitudes on behavior.