Making Mountains Out of Molehills: Illusory “Huge” Sex Differences



When you hear claims that “men and women differ a lot”, see effect sizes such as D > 2, or read that a set of variables classifies people as men or women with over 90% accuracy, it is worth asking what is being measured and how.

In many cases, researchers have taken many small sex differences (e.g., in spatial ability, aggression, and interests) and combined them into a single composite that distinguishes men from women. That composite may show a large mean difference, but this does not mean that a single large biological cause explains all of it.

Why this matters:

  • Each individual trait difference may have a modest effect size (say d ~ 0.2-0.4).
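  • Aggregating traits whose differences point in the same direction boosts the composite’s reliability and amplifies its mean difference; the amplification is largest when the traits are only weakly correlated with one another.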
  • A large composite difference is useful for classification (distinguishing male vs. female) but does not support the claim of a unified biological process underlying all those traits.
  • Because biological sex can already be determined with perfect accuracy, building a model to predict sex from traits is largely redundant from a causal perspective. What matters is how much sex explains trait variation, not how well traits predict sex.
  • When students or the media see a big number, they often infer a big innate difference, but this reverses the direction of causality. Such analyses use behaviors like masturbation and pornography use to predict whether somebody is a man or a woman, rather than examining how much sex causes variation in those behaviors.
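The arithmetic behind the first two points can be sketched in a few lines of Python. This is a minimal illustration, not the analysis of any particular study: it assumes k normally distributed traits with unit within-group SD, the same sex difference d on each, and the same intercorrelation r between every pair, and the function names `composite_d` and `accuracy_from_D` are hypothetical. Under these symmetric assumptions the unweighted sum is also the best linear discriminant, so its standardized difference equals the multivariate (Mahalanobis) D, and the expected accuracy for two equal-sized groups is Φ(D/2).

```python
from math import sqrt
from statistics import NormalDist


def composite_d(d: float, k: int, r: float) -> float:
    """Standardized mean difference of the unweighted sum of k traits.

    Assumes each trait has within-group SD 1, the same sex difference d,
    and the same within-group intercorrelation r with every other trait.
    """
    mean_diff = k * d                   # the k mean differences add up
    sd_sum = sqrt(k + k * (k - 1) * r)  # within-group SD of the sum
    return mean_diff / sd_sum


def accuracy_from_D(D: float) -> float:
    """Expected accuracy for two equal-sized normal groups with equal
    covariance, classified at the midpoint between the group means."""
    return NormalDist().cdf(D / 2)


# Hypothetical scenarios (d = 0.4 per trait): one trait, 10 correlated
# traits, 10 uncorrelated traits, 50 uncorrelated traits.
for k, r in [(1, 0.0), (10, 0.3), (10, 0.0), (50, 0.0)]:
    D = composite_d(0.4, k, r)
    print(f"k={k:2d}, r={r:.1f}: composite d = {D:.2f}, "
          f"accuracy = {accuracy_from_D(D):.0%}")
```

With d = 0.4, fifty uncorrelated traits give a composite difference of about 2.8 and roughly 92% correct classification, although a single such trait classifies only about 58% of people correctly. Positive intercorrelations shrink the gain, so the most impressive composite numbers come from many weakly related traits rather than from one strong underlying difference.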

Critics have long argued that psychological sex/gender differences are, on average, small and that many claims of large or unified differences collapse under closer scrutiny (Hyde, 2005, 2014). Meta-research shows that for most psychological and cognitive domains the differences are small (Zell et al., 2023; Szymanski & Henning, 2022). The largest and most consistent sex differences are observed for height (d ≈ 1.5), pornography use (d ≈ 1.5), and sex drive (d ≈ 1.0).

Reversing the direction of analysis creates another misunderstanding. When many dimensional traits are used to predict whether someone has XX or XY chromosomes, little variability appears to remain once more than 90 percent of people are classified correctly. In the opposite direction, however, even d = 1.5 implies considerable unexplained variation within each group. Some men rarely watch pornography and some women do; some women have higher sex drives than many men. This within-group variation is psychologically meaningful, but it is ignored by analyses that treat variability in the predictors as error variance when predicting a dichotomous outcome (male = XY / penis; female = XX / vagina).
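The within-group point can be made concrete by converting d into a few descriptive quantities. This sketch assumes two normal distributions with equal SDs and equal group sizes, and the helper names are hypothetical. It reports the overlapping coefficient 2Φ(−d/2), the share of trait variance explained by sex (the squared point-biserial correlation, d²/(d² + 4)), and the probability that a random member of the lower-scoring group outscores a random member of the higher-scoring group, Φ(−d/√2).

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def overlap(d: float) -> float:
    """Overlapping coefficient of two unit-SD normal curves d apart."""
    return 2 * PHI(-abs(d) / 2)


def variance_explained_by_sex(d: float) -> float:
    """Squared point-biserial correlation, assuming equal group sizes."""
    return d * d / (d * d + 4)


def reversal_probability(d: float) -> float:
    """P(a random member of the lower-scoring group outscores a random
    member of the higher-scoring group)."""
    return PHI(-abs(d) / sqrt(2))


for d in (0.3, 1.0, 1.5):
    print(f"d = {d}: overlap = {overlap(d):.0%}, "
          f"variance explained by sex = {variance_explained_by_sex(d):.0%}, "
          f"reversal probability = {reversal_probability(d):.0%}")
```

For d = 1.5, the two distributions still overlap by about 45%, sex accounts for 36% of the trait variance (leaving 64% within groups), and in roughly one random pairing out of seven the member of the lower-scoring group scores higher.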

Conclusion

Combining many small effects into one large number is not the same as discovering a deep, singular sex-difference mechanism. There is no scientific purpose in creating a statistical predictor of sex when sex is directly observable. The only reason to compute such values is rhetorical: to make biological effects on variation in personality and other traits appear larger and more coherent than they really are.


References

Archer, J. (2019). The reality and evolutionary significance of human psychological sex differences. Biological Reviews, 94(4), 1381–1415. https://doi.org/10.1111/brv.12507

Eliot, L., Ahmed, A., Khan, H., & Patel, J. (2021). Dump the “dimorphism”: Comprehensive synthesis of human brain studies reveals few male–female differences beyond size. Neuroscience & Biobehavioral Reviews, 125, 667–697. https://doi.org/10.1016/j.neubiorev.2021.03.013

Hyde, J. S. (2005). The gender similarities hypothesis. American Psychologist, 60(6), 581–592. https://doi.org/10.1037/0003-066X.60.6.581

Hyde, J. S. (2014). Gender similarities and differences. Annual Review of Psychology, 65, 373–398. https://doi.org/10.1146/annurev-psych-010213-115057

Joel, D., & Fausto-Sterling, A. (2016). Beyond sex differences: New approaches for thinking about variation in human behavior. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1688), 20150451. https://doi.org/10.1098/rstb.2015.0451

Lippa, R. A. (2010). Gender differences in personality and interests: When, where, and why? Social and Personality Psychology Compass, 4(11), 1098–1110. https://doi.org/10.1111/j.1751-9004.2010.00320.x

Su, R., Rounds, J., & Armstrong, P. I. (2009). Men and things, women and people: A meta-analysis of sex differences in interests. Psychological Bulletin, 135(6), 859–884. https://doi.org/10.1037/a0017364

Szymanski, D. M., & Henning, S. L. (2022). Are many sex/gender differences really power differences? PNAS Nexus, 3(2), pgae025. https://doi.org/10.1093/pnasnexus/pgae025

Zell, E., Strickhouser, J. E., Sedikides, C., & Alicke, M. D. (2023). The gender similarities hypothesis 2.0: Meta-analytic synthesis of psychological gender differences across the life span. Psychological Bulletin, 149(2), 109–137. https://doi.org/10.1037/bul0000380

2 thoughts on “Making Mountains Out of Molehills: Illusory ‘Huge’ Sex Differences”

  1. Perhaps, but “most” is not “all”, Uli, as you already know.

    In social psychology, three of the most important differences, which are often ignored or minimised, are:

    1) People vs. Things: d ≈ 1.1

    2) Jungian Thinking vs. Feeling: d ≈ 0.7–1.0

    3) Empathising vs. Systemising: d ≈ 0.8–1.0

    Most people think that Jungian T vs. F correlates with Agreeableness, but that is only half true: the sex difference in Agreeableness by itself is only about d = 0.45.

    So when you say “There is no scientific purpose in creating a statistical predictor of sex when sex is directly observable”, for me that is putting the cart before the horse.

    It is not a statistical predictor of sex per se, but perhaps a statistical predictor of behavior × sex.

    And while one can argue over correlation versus causation or socialisation effects, one should be obliged to explain why twin studies show that the variance explained by shared environment is essentially negligible by late adolescence.

    That encapsulates a frequent problem in social psychology: trying to measure presumed socialisation effects that are essentially idiosyncratic in nature. The sex × shared-environment effect on behavior is low for the bulk of the distribution.

    Can you enlighten me as to why so many researchers are not more bothered by the fact that the variance explained by shared environment almost always decreases with age?

    1. Please read my chapter where I discuss actual sex differences and their sizes.

      A standardized mean difference of d = 1 is large compared to most effect sizes in psychology, but it is not large in an absolute sense.

      My comparison standard is the evolved, biological sex difference in height, d ≈ 1.3.

      If other evolved differences had been equally important targets of selection, they should be equally large.

      Most are not, and the ones you are listing are also a bit smaller. So men and women have overlapping distributions for most traits, not categorical differences like those in testosterone levels, where variation within one group does not overlap with variation in the other group.

      So, evolution did not think it was important to make men and women very different. For example, men also needed empathy, feelings, and anxiety (not being afraid of a tiger was not very adaptive).

      All of this can be discussed, but it has nothing to do with the post. Here I point out that it is silly to average many specific traits to get a large sex difference for the average score. What is the point of this statistical exercise? It can be done, and it leads to categorical distinctions between men and women, but what do we learn from that? We can do the same for statisticians and non-statisticians, or for any two groups. Given enough predictors, we can discriminate between them. The question is: what is the point of doing so?
