Auditing Social Psychology Textbooks: Hitler had High Self-Esteem

Social psychologists see themselves as psychological “scientists” who study people and therefore believe that they know people better than the rest of us. Often, however, their claims are not based on credible scientific evidence; they are personal opinions disguised as science.

For example, a popular undergraduate psychology textbook claims that

Hitler had high self-esteem.

citing an article in the journal Psychological Science in the Public Interest that has been cited over 500 times. At the end of that article, titled “Does High Self-Esteem Cause Better Performance, Interpersonal Success, Happiness, or Healthier Lifestyles?”, the authors write:

“High self-esteem feels good and fosters initiative. It may still prove a useful tool to promote success and virtue, but it should be clearly and explicitly linked to desirable behavior. After all, Hitler had very high self-esteem and plenty of initiative, too, but those were hardly guarantees of ethical behavior.”

In the textbook, this quote is linked to boys who engage in sex at an “inappropriately young age,” which is not further specified (in Canada, this would be 14, according to recent statistics).

“High self-esteem does have some benefits—it fosters initiative, resilience, and pleasant feelings (Baumeister & others, 2003). Yet teen males who engage in sexual activity at an “inappropriately young age” tend to have higher than average self-esteem. So do teen gang leaders, extreme ethnocentrists, terrorists, and men in prison for committing violent crimes (Bushman & Baumeister, 2002; Dawes, 1994, 1998). “Hitler had very high self-esteem,” note Baumeister and co-authors (2003).”  (Myers, 2011, Social Psychology, 12th edition)

Undergraduate students pay a lot of money to be informed that people with high self-esteem are like sexual deviants, terrorists, violent criminals, and Hitler (maybe we should add scientists with big claims and small samples to the list).

The problem is that this is not even true. Students who worked with me on fact-checking the textbook found this quote in the original article:

“There was no [!] significant difference in self-esteem scores between violent offenders and non-offenders, Ms = 28.90 and 28.89, respectively, t(7653) = 0.02, p > .9, d = 0.0001.”

Although the degrees of freedom of the t-test look impressive, the study compared only 63 violent offenders to 7,590 unmatched, mostly undergraduate participants (gender not specified, probably mostly female). The sampling error for the small offender group is therefore large, and the theoretical importance of comparing these two groups is questionable.
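A minimal sketch (assuming independent groups and using the standard large-sample approximation for the standard error of Cohen's d) makes the point concrete: with only 63 offenders, the effect-size estimate is very noisy, no matter how many comparison participants are added.

```python
import math

def d_standard_error(n1, n2, d=0.0):
    # Large-sample approximation for the standard error of Cohen's d
    # with two independent groups of sizes n1 and n2.
    return math.sqrt(1 / n1 + 1 / n2 + d ** 2 / (2 * (n1 + n2)))

# 63 violent offenders vs. 7,590 comparison participants, d = 0.0001
se = d_standard_error(63, 7590, d=0.0001)
lo, hi = 0.0001 - 1.96 * se, 0.0001 + 1.96 * se
print(f"SE = {se:.3f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

Note that the small group dominates the standard error (1/63 is far larger than 1/7590), so the confidence interval spans roughly -0.25 to +0.25: the data are consistent with a null effect, but also with small effects in either direction.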

[The latest edition, the 13th from 2018, still contains the quote.]

How Many Correct Citations Could be False Positives?  

Of course, the example above is an exception. Most of the time, a cited reference contains an empirical finding that is consistent with the textbook claim. However, this does not mean that textbook findings are based on credible and replicable evidence. Until recently, it was common to assume that statistical significance ensures that most published results are true positives (i.e., not false-positive random findings). However, this is only the case if all results are reported, and it has been known since 1959 that this is not the case in psychology (Sterling, 1959). Jerry Brunner and I developed a statistical tool that can be used to clean up the existing literature. Rather than actually redoing 50 years of research, we use the statistical results reported in the original studies to apply a significance filter post hoc. Our tool is called zcurve. Below, I used zcurve to examine the replicability of studies that were cited in Chapter 2, about the self.
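The first step of such an analysis can be sketched with Python's standard library (a sketch of the idea, not the actual zcurve code, with hypothetical p-values): each reported two-sided p-value is converted into the z-score with the same tail area, and only significant z-scores (z > 1.96) enter the model.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def p_to_z(p_two_sided):
    # Convert a two-sided p-value into the absolute z-score
    # that has the same two-sided tail probability.
    return STANDARD_NORMAL.inv_cdf(1 - p_two_sided / 2)

# Hypothetical p-values extracted from cited articles
p_values = [0.049, 0.01, 0.001, 0.20, 3e-7]
z_scores = [p_to_z(p) for p in p_values]

# Only significant results (z > 1.96) are modeled by zcurve
significant = [z for z in z_scores if z > 1.96]
```

The conversion is monotone, so stronger evidence (smaller p) always maps to a larger z-score; the non-significant p = .20 drops out before the model is fit.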


More detailed information about the interpretation of the graph above is provided elsewhere (link). In short, for each citation in the textbook chapter that is used as evidence for a claim, a team of undergraduate students retrieved the cited article and extracted the main statistical result that matches the textbook claim. These statistical results were then converted into z-scores that reflect the strength of evidence for a claim. Only significant results matter here, because non-significant results cannot support an empirical claim. Zcurve fits a model to the density distribution of the significant z-scores (z-scores > 1.96). The shape of this distribution provides information about the probability that a randomly drawn study from the set would replicate (i.e., reproduce a significant result). The grey line shows the distribution predicted by zcurve; it matches the observed density (dark blue) well, and simulation studies show good performance of zcurve.

Zcurve estimates that the average replicability of studies in this chapter is 56%. This number would be reassuring if all studies had 56% power: all studies would then be true positives, and every second replication attempt would succeed. In reality, however, studies vary in replicability. Studies with z-scores greater than 5 have 99% replicability (see the numbers below the x-axis), whereas studies with just-significant results (z < 2.5) have only 21% replicability. As the plot shows, there are many more studies with z < 2.5 than with z > 5, so studies with low replicability outnumber studies with high replicability.

The next plot shows model fit (higher numbers = worse fit) for zcurve models with a fixed proportion of false positives. If the data are inconsistent with a fixed proportion of false positives, model fit decreases (higher numbers).
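The replicability numbers below the x-axis follow from a simple power calculation, sketched here under the simplifying assumption of an exact replication (so the probability of a significant replication equals the study's true power):

```python
from statistics import NormalDist

norm = NormalDist()
CRIT = 1.96  # two-sided alpha = .05

def replication_probability(true_z):
    # Probability that an exact replication is significant again,
    # given the true (noncentral) mean z of the design.
    return norm.cdf(true_z - CRIT) + norm.cdf(-true_z - CRIT)

for tz in (0.0, 2.0, 3.0, 5.0):
    print(f"true z = {tz}: P(replication) = {replication_probability(tz):.2f}")
```

A true z of 5 yields a replication probability of about 99%, matching the figure for strong results. For just-significant results, simply plugging in the observed z would give far more than 21%; zcurve's lower estimate reflects its correction for selection bias, since observed z-scores of selected studies overstate the true z.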


The graph shows that models with 100%, 90%, or 80% false positives clearly do not fit the data as well as models with fewer false positives. This shows that some textbook claims are based on solid empirical evidence. However, model fit for models with 0% to 60% false positives looks very similar. Thus, it is possible that the majority of claims in the self chapter of this textbook are false positives. It is even more problematic that textbook claims are often based on a single study with a student sample at one university. Social psychologists have warned repeatedly that their findings are very sensitive to minute variations in studies, which makes it difficult to replicate these effects even under very similar conditions (Van Bavel et al., 2016), and that it is impossible to reproduce exactly the same experimental conditions (Stroebe & Strack, 2014). Thus, the zcurve estimate of 56% replicability is a wildly optimistic estimate of replicability in actual replication studies. In fact, the average replicability of studies in social psychology is only 25% (Open Science Collaboration, 2015).
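What a false positive means for replicability can be illustrated with a quick simulation (a sketch): if the true effect is zero, only the nominal 5% of exact replications come out significant.

```python
import random

random.seed(1)

# Simulate exact replications of a false positive: the true effect is zero,
# so each replication's test statistic is (approximately) standard normal.
n_reps = 100_000
successes = sum(1 for _ in range(n_reps) if abs(random.gauss(0, 1)) > 1.96)

replication_rate = successes / n_reps  # close to the nominal alpha of 5%
print(f"replication rate of a null effect: {replication_rate:.1%}")
```

This is why a literature with a large share of false positives drags average replicability down toward 5%, and why a significance filter is so corrosive: every one of these chance findings looks like a discovery in print.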


Social psychology textbooks present many findings as if they are established facts, when this is not the case. It is time to audit psychology textbooks to ensure that students receive accurate scientific information to inform their beliefs about human behavior. Ideally, textbook authors will revise their textbooks to make them more scientific, and instructors will choose textbooks based on the credibility of the evidence they present.

1 thought on “Auditing Social Psychology Textbooks: Hitler had High Self-Esteem”

  1. Great post, thank you for sharing! To add my two cents, I think part of the problem is trying to find a ‘magic bullet’ as the explanation. By that I mean ‘X’ will explain phenomenon ‘Y’ under ‘Z’ conditions. However, at the level of a person there will likely be factors ranging from genes to cognition that contribute to explaining ‘Y’.

    Gigerenzer’s quote that social psychology theories are like toothbrushes (it’s fine to use your own, but not another’s) is apt.
