A hallmark characteristic of science is its ability to correct itself. When empirical observations accumulate that are inconsistent with theoretical claims, false theories are abandoned. Unfortunately, this self-correcting process can take decades because scientists are often invested in their theories and reluctant or unwilling to abandon them. This motivated bias is particularly strong among researchers who have benefited from a popular theory.
Another problem that slows down the self-correction of science is that scientists often hide evidence that contradicts their theoretical predictions. This unscientific practice is prevalent because scientific organizations, which are run by scientists, are unwilling to sanction it. As a result, consumers of scientific information (e.g., undergraduate students) are introduced to false claims backed by false empirical evidence.
Over the past 10 years, meta-scientists have developed powerful statistical tools to reveal the use of unscientific practices that stand in the way of scientific progress. In this blog post, I use z-curve to demonstrate that research on implicit self-esteem is unscientific and untrustworthy.
The concept of implicit self-esteem emerged in the 1990s, when social psychologists started to believe in powerful unconscious processes that guide human behavior without our knowledge. The research on implicit self-esteem became particularly popular when Anthony G. Greenwald invented a computerized task, the Implicit Association Test (IAT), to measure implicit associations and used it to measure implicit self-esteem (Greenwald & Farnham, 2000).
Figure 1 shows the number of articles (bars) and citations (graph) for articles listed in Web of Science with the topic implicit self-esteem. Publications increased rapidly in the 2000s, and citations continue to increase, with over 7,000 citations in 2021. This shows that the scientific community is unaware of major problems with the validity of implicit measures of self-esteem that have been known since 2000 (Bosson et al., 2000; Buhrmester et al., 2011; Falk et al., 2015; Jusepeitis & Rothermund, 2022; Schimmack, 2021). Greenwald and colleagues simply ignore this evidence and perpetuate the illusion that the self-esteem IAT is a valid measure of implicit self-esteem (cf. Schimmack, 2021).

A typical argument for the validity of a measure is to point to the large number of published articles that produced statistically significant results with it. We would not expect so many findings from a measure that has no validity. However, this argument ignores that published results are selected for significance. Thus, publications give an overly positive impression of the support for a theory. To examine the extent of publication bias in the implicit self-esteem literature, I downloaded the list of 1,585 articles in Web of Science. I then looked for matching articles in a database of 121 major psychology journals (Schimmack, 2022). This search produced 604 matching articles (71 JPSP, 58 JESP, 45 Self & Identity, 44 PSPB, 40 PAID). A search for statistical test results in these articles produced 11,637 tests. Figure 2 shows a z-curve plot of these results. All tests are converted into absolute z-scores that show the strength of evidence (signal/noise ratio) against the null hypothesis of no effect.
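For readers who want to see what this conversion step looks like, here is a minimal Python sketch. It is not the actual z-curve code (z-curve is distributed as an R package); the function name and the example test statistics are purely illustrative.

```python
from scipy import stats

def absolute_z(p_two_tailed):
    """Convert a two-tailed p-value into an absolute z-score,
    the signal/noise ratio against the null hypothesis of no effect."""
    return stats.norm.isf(p_two_tailed / 2)

# Illustrative inputs: a t-test and the equivalent F-test.
p_from_t = 2 * stats.t.sf(abs(2.50), df=48)    # t(48) = 2.50
p_from_f = stats.f.sf(2.50**2, dfn=1, dfd=48)  # F(1, 48) = 6.25, same test
print(round(absolute_z(p_from_t), 2))  # ~2.41
print(round(absolute_z(p_from_f), 2))  # ~2.41
```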
The key finding is that 72% of the results are statistically significant at p < .05 (two-tailed). However, a z-curve analysis of the distribution of significant results (z > 1.96) shows that these results are selected from a much larger set of statistical tests that were not reported. The expected discovery rate (EDR) is the proportion of significant results implied by the grey curve, which models the full distribution of conducted tests, including those that remained unreported. The EDR is 20%, and the upper limit of the 95% confidence interval is 28%. Thus, the observed percentage of 72% is dramatically inflated by selection for significance.
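To see how selection for significance produces this kind of inflation, consider the small simulation sketch below. It is not the z-curve estimation algorithm; the mixture of true effects and the publication probability are made-up numbers, chosen only to reproduce the gap between a discovery rate near 20% and an observed rate near 72%.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up mixture of studies: half test true nulls, the rest have
# modest to decent power (true mean z of 1.0 or 2.5).
true_mean_z = rng.choice([0.0, 1.0, 2.5], size=100_000, p=[0.5, 0.3, 0.2])
observed_z = np.abs(rng.normal(loc=true_mean_z, scale=1.0))
significant = observed_z > 1.96  # p < .05, two-tailed

print(f"Discovery rate across all studies: {significant.mean():.0%}")  # ~22%

# Publication filter: keep every significant result but only ~11% of
# the nonsignificant ones; the rest go into the file drawer.
published = significant | (rng.random(significant.size) < 0.11)
print(f"Observed discovery rate among 'published' studies: "
      f"{significant[published].mean():.0%}")  # ~72%
```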

The EDR can also be used to estimate the false discovery risk; that is, the risk that a significant result was obtained without a real effect. The false discovery risk is estimated at 22%, but it could be as high as 37% given the uncertainty about the actual discovery rate. Moreover, an analysis of all test statistics includes tests of hypotheses that do not involve the implicit measures. A focal analysis limited to tests of the implicit measures is likely to produce an even lower discovery rate and a higher false positive risk.
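Z-curve converts the EDR into a false discovery risk using Soric's (1989) formula, which gives the maximum proportion of significant results that could be false positives at a given discovery rate. A quick sketch with the numbers above; the small difference from the reported 22% presumably reflects rounding of the EDR.

```python
def max_false_discovery_rate(edr, alpha=0.05):
    """Soric's (1989) upper bound on the proportion of significant
    results that are false positives, given a discovery rate."""
    return (1 / edr - 1) * alpha / (1 - alpha)

print(f"{max_false_discovery_rate(0.20):.0%}")  # ~21% at the estimated EDR of 20%
print(f"{max_false_discovery_rate(0.72):.0%}")  # ~2% if the observed 72% were taken at face value
```

The contrast between the two numbers shows why selection for significance matters: taking the published 72% success rate at face value would suggest a false positive risk of only about 2%, an order of magnitude too optimistic.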
The key conclusion is that published significant results cannot be used to make claims about implicit self-esteem. To make credible claims about implicit self-esteem, results need to be replicated in credible, pre-registered replication studies. It is noteworthy that Greenwald has not conducted any replication studies of his seminal 2000 validation study.
It is interesting to compare these results with the literature on the serotonin transporter gene (Schimmack, 2022). Behavioral geneticists also had a period of euphoria when it became possible to measure variation in human genes. A large literature focused on variation in the gene responsible for the reuptake of serotonin because this mechanism is the target of the selective serotonin reuptake inhibitors (SSRIs) used to treat mood disorders. After one decade, it became apparent that most results did not replicate, and interest in single gene variations decreased. This can be seen in decreasing publications and citations, an example of scientific self-correction. A z-curve analysis of this literature produced nearly identical results (EDR = 19%, FDR = 22%). The notable difference is that geneticists listened to their data and mostly abandoned this line of research. In contrast, the Web of Science statistics suggest that social psychologists are ignoring the warning signs that they are chasing a phenomenon that does not exist or that they have not been able to measure properly. Time will tell how long it will take for social psychology to correct itself.