Self-esteem is one of the most popular constructs in personality/social psychology. The common approach to study self-esteem is to give participants a questionnaire with one or more questions (items). To study gender differences, the scores of multiple items are added up or averaged separately for men and women, and then subtracted from each other. If this difference score is not zero, the data show a gender difference. Of course, the difference will never be exactly zero. So, it is impossible to confirm the nil-hypothesis that men and women are exactly the same. A more interesting question is whether gender differences in self-esteem are fairly consistent across different samples and how large gender differences, on average, are. To answer this question, psychologists conduct meta-analyses. A meta-analysis combines findings from small samples into one large sample.
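The scoring procedure described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular study: the scores are made up, and the function names `raw_difference` and `cohens_d` are hypothetical. The second function divides the raw difference by the pooled standard deviation, which yields the standard-deviation units that effect sizes are usually reported in.

```python
from math import sqrt
from statistics import mean, variance


def raw_difference(men, women):
    """Mean score of men minus mean score of women."""
    return mean(men) - mean(women)


def cohens_d(men, women):
    """Raw difference divided by the pooled standard deviation."""
    n_m, n_w = len(men), len(women)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_m - 1) * variance(men) + (n_w - 1) * variance(women)) / (n_m + n_w - 2)
    return raw_difference(men, women) / sqrt(pooled_var)


# Hypothetical scale scores (one averaged score per participant), for illustration only.
men = [4, 5, 6]
women = [3, 4, 5]

print(raw_difference(men, women))  # difference in raw scale points
print(cohens_d(men, women))        # difference in standard-deviation units
```

A positive value means that men scored higher than women in this sample; a meta-analysis would combine such standardized differences across many samples.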
The first comprehensive meta-analysis of self-esteem reported a small difference between men and women, with men reporting slightly higher levels of self-esteem than women (Kling et al., 1999). What does a small difference look like? First, imagine that you have to predict whether 50 men and 50 women are above or below the average (median) in self-esteem, but the only information that you have is their gender. If there were no difference between men and women, gender would give you no reliable information; you might just flip a coin and have a 50% chance of guessing correctly. However, given the information that men are slightly more likely to be above average in self-esteem, you guess above average for men and below average for women. This blatant stereotype helps you to be correct 54% of the time, but you are still incorrect in your guesses 46% of the time.
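The 54% figure can be reproduced with a short calculation. The sketch below rests on assumptions that are not stated in the text: self-esteem scores are normally distributed with equal variance in both groups, the groups are equally large, and the standardized difference is d = 0.21 (the value reported for self-esteem ratings). Under these assumptions the overall median lies halfway between the two group means, so the share of correct guesses is the normal cumulative probability at d / 2. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def guess_accuracy(d):
    """Share of correct guesses when predicting 'above median' for the
    higher-scoring group and 'below median' for the lower-scoring group.

    Assumes equal group sizes, normal distributions, and equal variances.
    """
    return normal_cdf(d / 2.0)


print(round(guess_accuracy(0.0), 2))   # 0.5, no better than a coin flip
print(round(guess_accuracy(0.21), 2))  # 0.54
```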
Another way to get a sense of the magnitude of the effect size is to compare it to well-known, large gender differences. One of the largest gender differences that is also easy to measure is height. Men are 1.33 standard deviations taller than women, while the difference in self-esteem ratings is only 0.21 standard deviations. This means the difference in self-esteem is only about 16% of the difference in height (0.21 / 1.33 ≈ 0.16).
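The comparison with height is a simple ratio of the two standardized differences quoted above. The variable names below are hypothetical; the two values are the ones given in the text.

```python
# Standardized gender differences quoted in the text.
height_d = 1.33       # men vs. women in height
self_esteem_d = 0.21  # men vs. women in self-esteem ratings

# Size of the self-esteem difference relative to the height difference.
ratio = self_esteem_d / height_d
print(f"{ratio:.1%}")  # 15.8%
```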

A more recent meta-analysis found an even smaller difference of d = .11 (Zuckerman & Hall, 2016). When the average difference is this small, it becomes more likely that the gender difference in self-esteem ratings is even smaller, or even reversed, in some populations. That is, while the difference in height is so large that it can be observed in all human populations, the difference in self-esteem is so small that it may not be universally observed.
Another problem with small effects is that they are more susceptible to the influence of systematic measurement error. Unfortunately, psychologists rarely examine the influence of measurement error on their measures. Thus, this possibility has not been explored.
Another problem is that psychologists tend to rely on convenience samples, which makes it difficult to generalize findings to the general population. For example, psychology undergraduate samples select for specific personality traits that may make male or female psychology students less representative of their respective gender.
It is therefore problematic to draw premature conclusions about gender differences in self-esteem on the basis of meta-analyses of self-esteem ratings in convenience samples.
What Explains Gender Differences in Self-Esteem Ratings?
The most common explanations for gender differences in self-esteem are gender roles (Zuckerman & Hall, 2016) or biological differences (Schmitt et al., 2016). However, there are few direct empirical tests of these hypotheses. Even biologically oriented researchers recognize that self-esteem is influenced by many different factors, including environmental ones. It is therefore unlikely that biological sex differences have a direct influence on self-esteem. A more plausible model would assume that gender differences in self-esteem are mediated by a trait that shows stronger gender differences and that predicts self-esteem. The same holds for social theories. It seems unlikely that women rely on gender stereotypes to evaluate their self-esteem. It is more plausible that they rely on attributes that show gender differences. For example, Western societies have different beauty standards for men and women, and women tend to have lower self-esteem in ratings of their attractiveness (Gentile et al., 2009). Thus, a logical next step is to test mediation models. Surprisingly, few studies have explored well-known predictors of self-esteem as potential mediators of gender differences in self-esteem.
Personality Traits and Self-Esteem
Since the 1980s, thousands of studies have measured personality from the perspective of the Five Factor Model. The Big Five capture variation in negative emotionality (Neuroticism), positive energy (Extraversion), curiosity and creativity (Openness), cooperation and empathy (Agreeableness), and goal-striving and impulse-control (Conscientiousness). Given the popularity of self-esteem and the Big Five in personality research, many studies have examined the relationship between the Big Five and self-esteem, while other studies have examined gender differences in the Big Five traits.
Studies of gender differences show the biggest and most consistent differences for neuroticism and agreeableness. Women tend to score higher on both dimensions than men. The results for the Big Five and self-esteem are more complicated. Simple correlations show that higher self-esteem is associated with lower Neuroticism and higher Extraversion, Openness, Agreeableness, and Conscientiousness (Robins et al., 2001). The problem is that Big Five measures have a denotative and an evaluative component. Being neurotic does not only mean responding more strongly with negative emotions; it is also undesirable. Using structural equation modeling, Anusic et al. (2009) separated the denotative and evaluative components and found that self-esteem was strongly related to the evaluative component of personality ratings. This evaluative factor in personality ratings was first discovered by Thorndike (1920) one hundred years ago. The finding that self-esteem is related to overly positive self-ratings of personality is also consistent with a large literature on self-enhancement. Individuals with high self-esteem tend to exaggerate their positive qualities.
Interestingly, there are very few published studies of gender differences in self-enhancement. One possible explanation for this is that there is only a weak relationship between gender and self-enhancement. The rationale is that gender is routinely measured and that many studies of self-enhancement could have examined gender differences. It is also well known that psychologists are biased against null findings. Thus, ample data without publications suggest that there is no strong relationship. However, a few studies have found stronger self-enhancement for men than for women. For example, one study showed that men overestimate their intelligence more than women (von Stumm et al., 2011). There is also evidence that self-enhancement and halo predict biases in intelligence ratings (Anusic et al., 2009). However, it is not clear whether gender differences are related to halo or are specific to ratings of intelligence.
In short, a review of the literature on gender and personality and personality and self-esteem suggests three potential mediators of the gender differences in self-esteem. Men may report higher levels of self-esteem because they are lower in neuroticism, lower in agreeableness, or higher in self-enhancement.
Empirical Test of the Mediation Model
I used data from the Gosling–Potter Internet Personality Project (Gosling, Vazire, Srivastava, & John, 2004). Participants were visitors to a website who were interested in taking a personality test and receiving feedback about their personality. The advantage of this sampling approach is that it creates a very large dataset with millions of participants. The disadvantage is that men and women who visited this site might differ in personality traits or self-esteem. The questionnaire included a single-item measure of self-esteem. This item shows the typical gender difference in self-esteem (Bleidorn et al., 2016).
To separate descriptive factors of the Big Five from evaluative bias and acquiescence bias, I fitted a measurement model to the 44-item Big Five Inventory. I demonstrated earlier that this measurement model has good fit for Canadian participants (Schimmack, 2019). To test the mediation model, I added gender and self-esteem to the model. In this study, gender was measured with a simple dichotomous male vs. female question.
Gender was a predictor of all 7 factors (Big Five + Halo + Acquiescence). Exploratory analysis examined whether gender had unique relationships with specific BFI items. These relationships could be due to unique relationships of gender with specific personality traits called facets. However, few notable relationships were observed. Self-esteem was predicted by all seven personality traits and gender. However, openness to experience showed weak relationships with self-esteem. To stabilize the model, this path was fixed to zero.
I fitted the model to data from several nations. I selected nations with (a) a large number of complete cases (N = 10,000) and (b) familiarity with English as a first or common second language (e.g., India = yes, Japan = no), while trying to sample a diverse range of cultures because gender differences in self-esteem tend to vary across cultures (Bleidorn et al., 2016; Zuckerman & Hall, 2016). I fitted the model to samples from four nations: the US, the Netherlands, India, and the Philippines, with N = 10,000 for each nation. Table 1 shows the results.

The first two rows show the fit of the Canadian model to the other four nations. Fit is a bit lower for Asian samples, but still acceptable.
The results for sex differences in the Big Five are a bit surprising. Although all four samples show the typical gender difference in neuroticism, the effect sizes are relatively small. For agreeableness, the gender differences in the two Asian samples are negligible. This raises some concerns about the conclusion that gender differences in personality traits are universal and possibly due to evolved genetic differences (Schmitt et al., 2016). The most interesting novel finding is that there are no notable gender differences in self-enhancement. This also implies that self-enhancement cannot mediate gender differences in self-esteem.
The strongest predictor of self-esteem is self-enhancement. Effect sizes range from d = .27 in the Netherlands to d = .45 in the Philippines. The second strongest predictor is neuroticism. As neuroticism also shows consistent gender differences, neuroticism partially mediates the effect of gender on self-esteem. Although weak, agreeableness is a consistent negative predictor of self-esteem. This replicates Anusic et al.’s (2009) finding that the sign of the relationship reverses when halo bias in agreeableness ratings is removed from measures of agreeableness.
The total effects show the gender differences in the four samples. Consistent with the meta-analyses, the gender differences in self-esteem are weak, with effect sizes ranging from d = .05 to d = .15. Personality explains some of this relationship. The unexplained direct effect of gender is very small.
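The relationship between total, indirect, and direct effects can be illustrated with a small sketch. It assumes a linear mediation model in which each indirect effect is the product of two paths (gender to mediator, mediator to self-esteem) and the total effect is the direct effect plus the sum of the indirect effects. All path values below are hypothetical and are not taken from Table 1; gender is assumed to be coded 0 = female, 1 = male, and the function names are made up for illustration.

```python
def indirect_effect(a, b):
    """Effect of gender on self-esteem that runs through one mediator.

    a: path from gender to the mediator
    b: path from the mediator to self-esteem
    """
    return a * b


def total_effect(direct, paths):
    """Direct effect plus the sum of all indirect effects (linear model)."""
    return direct + sum(indirect_effect(a, b) for a, b in paths.values())


# Hypothetical paths (a, b) for three mediators; NOT the values in Table 1.
paths = {
    "neuroticism": (-0.20, -0.30),    # men lower; neuroticism lowers self-esteem
    "agreeableness": (-0.20, -0.10),  # men lower; agreeableness lowers self-esteem
    "halo": (0.00, 0.40),             # no gender difference, so nothing is mediated
}
direct = 0.02  # hypothetical unexplained direct effect of gender

for name, (a, b) in paths.items():
    print(name, round(indirect_effect(a, b), 3))
print("total", round(total_effect(direct, paths), 3))
```

The halo row shows why a strong predictor of self-esteem cannot act as a mediator when it is unrelated to gender: the product of the two paths is zero regardless of how large the second path is.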
Conclusion
A large literature and several meta-analyses have documented small but consistent gender differences in self-ratings of self-esteem. Few studies have examined whether these differences are mere rating biases or tested causal models of these gender differences. This article addressed these questions by examining seven potential mediators: the Big Five traits as well as halo bias and acquiescence bias.
The results replicated previous findings that gender differences in self-esteem are small, d < .2. They also showed that neuroticism is a partial mediator of gender differences in self-esteem. Women tend to be more sensitive to negative information and this disposition predicts lower self-esteem. It makes sense that a general tendency to focus on negative information also extends to evaluations of the self. Women appear to be more self-critical than men. A second mediator was agreeableness. Women tend to be more agreeable and agreeable people tend to have lower self-esteem. However, this relationship was only observed in Western nations and not in Asian nations. This cultural difference explains why gender differences in self-esteem tend to be stronger in Western than in Asian cultures. Finally, a general evaluative bias in self-ratings of personality was the strongest predictor of self-esteem, but showed no notable gender differences. Gender also still had a very small relationship with self-esteem after accounting for personality mediators.
Overall, these results are more consistent with models that emphasize similarities between men and women (Men and Women are from Earth) than models that emphasize gender differences (Women are from Venus and Men are from Mars). Even if evolutionary theories of gender differences are valid, they explain only a small amount of the variance in personality traits and self-esteem. As one evolutionary psychologist put it, “it is undeniably true that men and women are more similar than different genetically, physically and psychologically” (p. 52). The results also undermine claims that women internalize negative stereotypes about them and have notably lower self-esteem as a result. Given the small effect sizes, it is surprising how much empirical and theoretical attention gender differences in self-esteem have received. One reason is that psychologists often ignore effect sizes and only care about the direction of an effect. Given the small effect size of gender on self-esteem, it seems more fruitful to examine factors that produce variation in self-esteem for men and women.