Category Archives: personality stability

The Black Box of Meta-Analysis: Personality Change

Psychologists treat meta-analyses as the gold standard for answering empirical questions. The idea is that a meta-analysis combines all of the relevant information into a single number that reveals the answer. The problem with this naive interpretation is that a meta-analysis cannot provide more information than the original studies contained. If the original studies have major limitations, a meta-analytic integration does not make these limitations disappear. Meta-analyses can reduce random sampling error, but they cannot fix the problems of the original studies. However, once a meta-analysis is published, these problems are often ignored and its preliminary conclusion is treated as the ultimate truth.

In this regard, meta-analyses are like collateralized debt obligations (CDOs), which were popular until problems with CDOs triggered the financial crisis of 2008. A CDO pools cash-flow-generating assets and repackages this asset pool into discrete tranches that can be sold to investors. The problem arises when a CDO is considered less risky than the underlying debt actually is. Investors then believe they are getting high returns at low risk, when the underlying debt is much riskier than they think.

In psychology, the review process and publication in a top journal give the impression that the information is trustworthy and can be cited as solid evidence. However, a closer inspection of the original studies may reveal that the results of a meta-analysis rest on shaky foundations.

Roberts et al. (2006) published a highly cited meta-analysis in the prestigious journal Psychological Bulletin. The key finding of this meta-analysis was that personality levels change with age in longitudinal studies of personality.

The strongest change was observed for conscientiousness. According to the figure, conscientiousness does not change much during adolescence, when the prefrontal cortex is still developing. It then increases from d ~ .4 at age 30 to d ~ .9 at age 70, that is, by about half a standard deviation.

Like many other consumers, I bought the main finding and used the results in my Introduction to Personality lectures without carefully checking the meta-analysis. However, when I analyzed new data from longitudinal studies with large, nationally representative samples, I could not find the predicted pattern (Schimmack, 2019a, 2019b, 2019c). Thus, I decided to take a closer look at the meta-analysis.

Roberts and colleagues list all the studies that were used with information about sample sizes, personality dimensions, and the ages that were studied. Thus, it is easy to find the studies that examined conscientiousness with participants who were 30 years or older at the start of the study.

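| Study | N | Weight | Start1 | Max. | Interval | ES |
|---|---|---|---|---|---|---|
| Costa et al. (2000) | 2274 | 0.44 | 41 | 9 | 9 | 0.00 |
| Costa et al. (1980) | 433 | 0.08 | 36 | 6 | 44 | 0.00 |
| Costa & McCrae (1988) | 398 | 0.08 | 35 | 6 | 46 | NA |
| Labouvie-Vief & Jain (2002) | 300 | 0.06 | 39 | 6 | 39 | NA |
| Branje et al. (2004) | 285 | 0.06 | 42 | 2 | 4 | NA |
| Small et al. (2003) | 223 | 0.04 | 68 | 6 | 6 | NA |
| P. Martin (2002) | 179 | 0.03 | 65 | 5 | 46 | 0.10 |
| Costa & McCrae (1992) | 175 | 0.03 | 53 | 7 | 7 | 0.06 |
| Cramer (2003) | 155 | 0.03 | 33 | 14 | 14 | NA |
| Haan, Millsap, & Hartka (1986) | 118 | 0.02 | 33 | 10 | 10 | NA |
| Helson & Kwan (2000) | 106 | 0.02 | 33 | 42 | 47 | NA |
| Helson & Wink (1992) | 101 | 0.02 | 43 | 9 | 9 | 0.20 |
| Grigoriadis & Fekken (1992) | 89 | 0.02 | 30 | 3 | 3 | |
| Roberts et al. (2002) | 78 | 0.02 | 43 | 9 | 9 | |
| Dudek & Hall (1991) | 70 | 0.01 | 49 | 25 | 25 | |
| Mclamed et al. (1974) | 62 | 0.01 | 36 | 3 | 3 | |
| Cartwright & Wink (1994) | 40 | 0.01 | 31 | 15 | 15 | |
| Weinryb et al. (1992) | 37 | 0.01 | 39 | 2 | 2 | |
| Wink & Helson (1993) | 21 | 0.00 | 31 | 25 | 25 | |
| Total N / Average | 5144 | 1.00 | 41 | 11 | 19 | |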

There are 19 studies with a total sample size of N = 5,144 participants. However, sample sizes vary dramatically across studies, from a low of N = 21 to a high of N = 2,274. Table 1 shows the proportion of participants (Weight) that would be used to weight effect sizes by sample size. By far the largest study found no significant increase in conscientiousness. I tried to find effect sizes for the other studies, but the published articles did not report means, or the information came from an unpublished source. I did not bother to obtain information from samples with fewer than 100 participants, because together they contribute only 8% of the total sample size. Even big effects in these studies would be washed out by the larger samples.
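Each weight in the table is simply a study's N divided by the total N. The short Python sketch below reproduces two numbers from the text: the 0.44 weight of the largest study, and the roughly 8% of participants who come from samples with fewer than 100 people. It uses the sample sizes listed in the table. The function names are illustrative and are not taken from any published analysis.

```python
# Sample sizes copied from the table of studies (participants aged 30+ at the start).
SAMPLE_SIZES = {
    "Costa et al. (2000)": 2274,
    "Costa et al. (1980)": 433,
    "Costa & McCrae (1988)": 398,
    "Labouvie-Vief & Jain (2002)": 300,
    "Branje et al. (2004)": 285,
    "Small et al. (2003)": 223,
    "P. Martin (2002)": 179,
    "Costa & McCrae (1992)": 175,
    "Cramer (2003)": 155,
    "Haan, Millsap, & Hartka (1986)": 118,
    "Helson & Kwan (2000)": 106,
    "Helson & Wink (1992)": 101,
    "Grigoriadis & Fekken (1992)": 89,
    "Roberts et al. (2002)": 78,
    "Dudek & Hall (1991)": 70,
    "Mclamed et al. (1974)": 62,
    "Cartwright & Wink (1994)": 40,
    "Weinryb et al. (1992)": 37,
    "Wink & Helson (1993)": 21,
}


def sample_size_weights(sizes: dict[str, int]) -> dict[str, float]:
    """Weight of each study in an N-weighted average: its share of the total N."""
    total = sum(sizes.values())
    return {study: n / total for study, n in sizes.items()}


def share_of_small_studies(sizes: dict[str, int], cutoff: int = 100) -> float:
    """Proportion of the total N contributed by studies with fewer than `cutoff` participants."""
    total = sum(sizes.values())
    return sum(n for n in sizes.values() if n < cutoff) / total


weights = sample_size_weights(SAMPLE_SIZES)
print(sum(SAMPLE_SIZES.values()))                      # 5144
print(round(weights["Costa et al. (2000)"], 2))        # 0.44
print(round(share_of_small_studies(SAMPLE_SIZES), 2))  # 0.08
```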

The main conclusion that can be drawn from this information is that there is no reliable evidence for claims about personality change throughout adulthood. If we assume that conscientiousness changes by half a standard deviation over a 40-year period, the average change per decade is only d = .12 (.5 / 4). For studies with even shorter retest intervals, the predicted effect size is even smaller. It is therefore highly speculative to extrapolate from this patchwork of data to claims about personality change during adulthood.
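The per-decade figure follows from spreading the assumed half standard deviation evenly over 40 years. The minimal Python sketch below makes that linear-change assumption explicit and applies it to some of the retest intervals listed in the table. The function name and its defaults are illustrative assumptions, not values estimated from any study.

```python
def predicted_change_d(interval_years: float,
                       total_change_d: float = 0.5,
                       span_years: float = 40.0) -> float:
    """Expected standardized mean change over a retest interval,
    assuming the total change is spread evenly (linearly) across the span."""
    return total_change_d * interval_years / span_years


print(predicted_change_d(10))  # 0.125, i.e. about d = .12 per decade

# Shorter retest intervals (values taken from the table) imply even smaller changes.
for years in (2, 6, 9, 14, 25):
    print(f"{years:>2} years: d = {predicted_change_d(years):.3f}")
```

A nonlinear trajectory would shift these numbers somewhat. Under any smooth trajectory, however, a short interval can only capture a small slice of a .5 SD change.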

Fortunately, much better information is now available from longitudinal panels with more than a thousand participants who have been followed for 12 (SOEP) or 20 (MIDUS) years with three or four retests. Theories of personality stability and change need to be revisited in light of this new evidence. Updating theories in the face of new data is the basis of science. Citing an outdated meta-analysis as if it provided a timeless answer to a question is not.

Open-SOEP: Personality and Wellbeing Revisited

[corrected 8/6/2019 5.29pm – there was a mistake in the model for worry]

After behaviorism banned emotions as scientific constructs and cognitivism viewed humans as computers, the 1980s witnessed the affective revolution. Finally, psychologists were again allowed to study feelings.

The 1980s were also a time when personality psychologists agreed on the Big Five as a unified model of personality traits. Accordingly, personality can be efficiently summarized by individuals’ standing on five dimensions: Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness.

Not surprisingly, the 1980s also produced a model of personality, emotions (affect), and well-being that has survived until today. The model was first proposed by Costa and McCrae in 1980 (see Schimmack, 2019, for details). This model assumed that extraversion is a disposition to experience more positive affect, that neuroticism is a disposition to experience more negative affect, and that the balance of positive and negative affect is a major determinant of life-satisfaction. As extraversion and neuroticism are independent dimensions, the model also assumed that positive affect and negative affect are independent. This assumption led to the creation of the widely used Positive and Negative Affect Schedule (PANAS; Watson et al., 1988) as a measure of well-being.

The model also assumed that general affective dispositions account for most of the stability in well-being over time, while environmental factors produce only momentary and short-lived fluctuations around dispositional levels of well-being (Diener, 1984; Lykken & Tellegen, 1996). This model dominated well-being research in psychology for 20 years (see Diener, Suh, Lucas, & Smith, 1999, for a review).

However, when Positive Psychology emerged at the beginning of the new millennium, psychologists’ focus shifted from the influence of stable dispositions to factors that could be changed with interventions to boost individuals’ well-being (Seligman & Csikszentmihalyi, 2000). Some articles even questioned the influence of dispositions on well-being (Diener, Lucas, & Scollon, 2006). As a result, the past 20 years have seen very little new research on dispositional influences on well-being. The last major article is a meta-analysis showing that extraversion (positively) and neuroticism (negatively) are correlated with several well-being indicators (Steel, Schmidt, & Shultz, 2008).

Revisiting the Evidence

There is robust evidence for the influence of neuroticism on well-being. Most importantly, this relationship has been demonstrated in multi-method studies that control for shared method variance, which arises when self-ratings of personality are correlated with self-ratings of well-being (McCrae & Costa, 1991; Schimmack, Oishi, Furr, & Funder, 2004). However, the relationship between extraversion and well-being is not as strong or consistent as one would expect based on Costa and McCrae’s (1980) model. For example, McCrae and Costa (1991) failed to find evidence for this relationship in a multi-method study. Other studies that controlled for response styles also failed to find the predicted effect (Schimmack, Schupp, & Wagner, 2008).

Taking a closer look at Costa and McCrae’s (1980) article, we see that they did not include life-satisfaction measures in their study. The key empirical finding supporting their model is that extraversion facets like sociability, measured at time 1, predict positive affect and hedonic balance (positive affect minus negative affect) both concurrently and longitudinally. These correlations also remain fairly stable over time. This suggests that personality is stable and contributes to the stable variance in the affect measures. However, the effect size is small (r = .22 to .24), which implies that extraversion accounts for only about 5% of the variance in affect (r² = .05 to .06). This finding hardly supports the claim that extraversion accounts for half of the stable variance in well-being.
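The step from a correlation to "variance accounted for" is just squaring r. The tiny Python sketch below applies this to the two correlations reported by Costa and McCrae. The helper name is illustrative.

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables with correlation r (r squared)."""
    return r ** 2


for r in (0.22, 0.24):
    print(f"r = {r:.2f} -> {variance_explained(r):.1%} of the variance")
# r = 0.22 -> 4.8% of the variance
# r = 0.24 -> 5.8% of the variance
```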

It is symptomatic of psychology that subsequent articles run with the story while ignoring gaps in the actual empirical evidence. As longitudinal studies in psychology are rare, there have been few attempts to replicate Costa and McCrae’s findings.

Headey and Wearing (1989) replicated and extended Costa and McCrae’s study by including life-satisfaction measures as an indicator of well-being. They replicated the key findings and showed that personality also predicts future life-satisfaction. However, the effect size for extraversion was again fairly small, as was the effect of neuroticism. This suggests that most of the stable variance in life-satisfaction is not explained by extraversion and neuroticism.

A key limitation of both studies is that they do not take shared method variance into account. Although method variance may be transient, it is also possible that it is stable over time (Anusic et al., 2009). Thus, even the already modest effect sizes may still be inflated by shared method variance.

New Evidence

Data and Model

Fortunately, better data are now available to revisit the longitudinal relationships between personality and life-satisfaction. I used the data from the German Socio-Economic Panel (SOEP). The SOEP measured the Big Five personality traits on four occasions (waves) spanning a period of 12 years (2005, 2009, 2013, 2017). Personality was measured with the 15-item BFI-S. I created a measurement model for the BFI-S that shows measurement invariance across the four occasions (Schimmack, 2019a). I also related personality to the single-item life-satisfaction rating in the SOEP (Schimmack, 2019b). Here, I extend this analysis by taking advantage of the fourth measurement of personality in 2017, which makes it possible to separate trait and state variance in personality and well-being.

The SOEP measures life-satisfaction in two ways. First, it includes several domain-satisfaction items (health, finances, recreation, housing). Second, it includes a global life-satisfaction item. In a different post (Schimmack, 2019c), I examined the relationship between these items. Global judgments were influenced by a general disposition factor and by satisfaction with finances and health, while the other two domains were relatively unimportant. Based on this finding and related evidence (Zou, Schimmack, & Gere, 2013), I averaged the domain-satisfaction judgments and used this average as an indicator of life-satisfaction. This makes it possible to remove random measurement error from the measurement of life-satisfaction on a single occasion. I then fitted latent-trait-state (LST) models to the personality factors and the well-being factor. These models separate the longitudinal correlations into two components: a stable trait component and a changing state component. A third parameter estimates how stable the state variance is over time.
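To make the decomposition concrete, here is a minimal Python sketch of the retest correlations implied by a stationary trait-state model with an autoregressive (AR(1)) state component. It is an illustration under simplifying assumptions (equal variances at every wave, equally spaced waves). It is not the exact model that was fitted to the SOEP data. The function name and the parameter values in the example are hypothetical.

```python
def implied_retest_r(lag: int, trait_share: float, state_stability: float,
                     reliability: float = 1.0) -> float:
    """Retest correlation implied by a stationary trait-state model.

    lag:             number of waves between the two measurements
    trait_share:     proportion of reliable variance due to the stable trait
    state_stability: autoregressive coefficient of the state component per wave
    reliability:     proportion of observed variance that is not random error
                     (1.0 for error-free latent factors)
    """
    state_share = 1.0 - trait_share
    # The trait part never decays; the state part shrinks by state_stability per wave.
    return reliability * (trait_share + state_share * state_stability ** lag)


# Hypothetical values: 55% trait variance, state stability of .5 per wave.
for lag in range(1, 4):
    print(lag, round(implied_retest_r(lag, trait_share=0.55, state_stability=0.5), 3))
```

As the lag grows, the state contribution decays toward zero and the implied correlation levels off at the trait share (times the reliability). This plateau is what allows a model with enough waves to separate trait variance from state variance.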

There are several ways to relate personality to life-satisfaction in this model. I chose to predict life-satisfaction variance on each occasion from the personality variance on the same occasion. The model’s indirect effects can then be used to examine how much of the variance is due to stable personality traits and how much is due to personality states.

The availability of four waves of data also makes it possible to model the stability of the residual variances in personality items. Typically, these residuals are allowed to correlate to capture item-specific stability, but correlated residuals make it impossible to relate this variance to other constructs. With four waves, it is possible to fit an LST model to the item residuals. Exploration of the data showed that the neuroticism item “worry” was consistently related to well-being. Thus, I fitted an LST model to this item and allowed for an influence of worry on life-satisfaction.

The syntax and the complete results are posted on OSF (SOEP.4W.B5.DSX.LS).


Overall model fit was acceptable, CFI = .967, RMSEA = .019, SRMR = .030.

Trait Variance and Stability of State Variance

Table 1 shows the amount of trait variance and the stability of state variance in the personality predictor variables. A more detailed discussion of the implications of these results for personality research can be found elsewhere (Schimmack, 2019a). The results for the Big Five serve as a comparison for the trait variance in life-satisfaction.


Table 2 shows how life-satisfaction at each time point is related to the personality predictors. For model identification purposes, it is necessary to fix one relationship to zero. I used openness because meta-analyses show that it is the weakest predictor of life-satisfaction (Steel et al., 2008). I did not impose constraints across the four waves.


The results show that, among the Big Five, neuroticism is the only notable predictor of life-satisfaction, with a moderate effect size (r = -.26 to -.29). Notably, extraversion is a weak predictor of life-satisfaction (r = .07 to .09). This finding is inconsistent with Costa and McCrae’s (1980) model. The results for agreeableness and conscientiousness are also weak. This finding is inconsistent with meta-analytic findings and with McCrae and Costa’s (1991) suggestion that high agreeableness and conscientiousness are also instrumental for higher life-satisfaction. Both halo and acquiescence bias are stronger predictors of life-satisfaction judgments than extraversion, agreeableness, and conscientiousness. Another notable finding is that the worry facet of neuroticism is the strongest personality predictor, even stronger than the neuroticism factor (rs = -.33 to -.35). This finding is consistent with previous studies showing that facets of neuroticism and extraversion are better predictors of life-satisfaction than the global factors (Schimmack, Oishi, Furr, & Funder, 2004).

Table 3 shows how much of the variance in life-satisfaction is explained by trait factors that remain stable over time.


Given the weak effects of extraversion, agreeableness, and conscientiousness, it is not surprising that these Big Five traits explain less than 1% of the variance in life-satisfaction judgments. The only notable predictor is neuroticism, which explains 5-6% of the variance. In addition, the worry facet of neuroticism is an even stronger predictor of trait variance in life-satisfaction. This finding shows that more specific traits below the Big Five add to the prediction of life-satisfaction (Schimmack, Oishi, Furr, & Funder, 2004). Halo adds only 2% and acquiescence only 1%. By far the largest portion of the trait variance, 41% of the total variance, remained unexplained. Combined, these components imply that approximately half of the variance in life-satisfaction is trait variance. This finding is consistent with estimates from a meta-analysis and other analyses of the SOEP data (Anusic & Schimmack, 2016; Schimmack, Krupp, Wagner & Schupp, 2010). The estimate of 55% trait variance is also smaller than the estimate of 70% trait variance in the Big Five personality traits. This difference is consistent with a meta-analytic comparison of personality and well-being measures (Anusic & Schimmack, 2016).

Table 4 shows the results for the state predictors of life-satisfaction. Once more, extraversion, agreeableness, and conscientiousness predict less than 1% of the variance. This time, neuroticism and worry are also relatively weak predictors, because most of the relationship for these traits stems from the stable component. Still, the results suggest that some changes in neuroticism and worry are related to changes in life-satisfaction. However, most of the state variance in life-satisfaction is not explained by the personality predictors: 33% of the 44% state variance remains unexplained.



These results challenge Costa and McCrae’s (1980) model of personality and well-being in several ways. First, extraversion is not a strong predictor of the stable variance in life-satisfaction. Second, even neuroticism accounts for only about 10% of the stable trait variance in life-satisfaction (5-6% of the total variance, out of roughly 55% trait variance). Adding other Big Five predictors does not help, because they have negligible relationships with life-satisfaction. Thus, most of the trait variance in life-satisfaction remains unexplained. It is explained either by personality traits more specific than the Big Five (facets) or by stable environmental factors (e.g., income). The SOEP data provide ample opportunity to look for additional predictors of trait variance. Researchers should also conduct studies with broader personality questionnaires to find additional predictors of life-satisfaction. Searching for these predictors is an important task in a research area that has stagnated over the past two decades.

Costa and McCrae’s model also underestimated the importance of state factors. State factors are highly stable over fairly long periods of time and account for 50% of the reliable variance in life-satisfaction. As the Big Five mostly reflect stable traits, they cannot account for this important variance in life-satisfaction. Schimmack and Lucas (2010) argued that these state factors are environmental, because changes in life-satisfaction are shared between spouses. Thus, changes in actual life circumstances may contribute to state variance in life-satisfaction. Consistent with this model, spouses were more similar in domains that are shared (housing, income) than in domains that are less shared (health, recreation).

Of course, the conclusions are based on a single German sample. As impressive as these data are, it is important to compare results across samples from different populations. At least regarding extraversion, the present results are consistent with other studies suggesting that extraversion has only a weak influence on life-satisfaction (Kim, Schimmack, & Tsutsui, 2019). The idea that extraverts are happier was exaggerated by Costa and McCrae’s model, even though their own empirical results did not warrant this claim. The reason is that psychologists often ignore effect sizes.


The present results also have implications for developmental theories of personality. The idea of development implies a process with an ideal outcome. For humans, the outcome is an adult human being with optimal capabilities. A collective of personality psychologists suggested that optimal personality development results in a personality type with optimal personality characteristics. I criticized this idea and argued that there is no such thing as an optimal personality. Just as there is no optimal height as the end goal of human growth, there is no optimal level of extraversion or conscientiousness. In clinical psychology, the key criterion is whether an intervention is beneficial for a patient’s well-being. Thus, we could argue that an optimal personality is one that maximizes individuals’ well-being. Meta-analyses suggest that extraverted, agreeable, and conscientious people have higher well-being. Thus, it might be beneficial for individuals to become more extraverted, agreeable, and conscientious. However, the present results challenge this view. After removing the evaluative aspect of personality from the Big Five, only neuroticism remains a notable predictor of well-being. Thus, the key personality trait for self-improvement is neuroticism. Not surprisingly, this is also the key aspect targeted by self-help books and well-being programs. Until we have a better understanding of the relationship between personality and well-being, it seems premature to propose interventions aimed at changing individuals’ personality. Just as personality psychologists no longer endorse conversion therapy for sexual orientation, I urge caution about submitting individuals who are carefree and impulsive to a conscientiousness conversion program. You never know when acting on the spur of the moment is the best course of action.

Open-SOEP: No Significant Personality Change over 12 Years

Studying personality stability and change is both easy and hard. It is easy because the method is straightforward: administer a valid measure of personality to a group of participants and repeat the measurement several times. Describing the method takes a sentence or two, compared to the pages needed to describe an intricate laboratory experiment with an elaborate deception. It is hard because it takes time and participants may drop out of a study. Meanwhile, there is nothing to publish while a researcher is waiting for the next retest. In our fast-paced world of academic publishing, where researchers are expected to publish five or more articles a year, there is no place for slow research. As a result, evidence on personality change is scarce. The best evidence so far comes from a meta-analysis that patched together small studies with different measures and populations. Although this meta-analysis is the best evidence available, it cannot be trusted, because the evidence it combines is inconclusive.

Psychologists have to thank economists and sociologists, who are used to collaborating on big data collections. One of these collaborations is the German Socio-Economic Panel (SOEP), an ongoing longitudinal study of a representative sample that started in 1984. In 2005, the SOEP included the BFI-S, a 15-item personality measure that assesses the Big Five. Since then, the BFI-S has been administered at four-year intervals in 2009, 2013, and 2017. Thus, we now have longitudinal data spanning 12 years with four waves of data. This makes it possible to revisit the question of personality stability with much better data than a meta-analysis of heterogeneous studies can provide. Admittedly, the results are based on a German sample, but there is little evidence that personality development varies across cultures.


One drawback of the SOEP is that each personality dimension is measured with just three items. This makes scale scores unreliable and scale scores can be contaminated with method variance (e.g., evaluative bias, acquiescence bias). To avoid these problems and to examine measurement invariance, it is better to analyze the data with a measurement model that examines personality change at the level of latent variables that correct for measurement error. I developed a measurement model for the SOEP (Schimmack, 2019a) and I already demonstrated invariance across the first three waves of the SOEP (Schimmack, 2019b). Here I added the fourth wave of data from 2017 to the dataset to produce even better information about long-term changes in personality.

To analyze the data, I first fitted the measurement model for the BFI-S to the data from each wave and imposed equality constraints to ensure measurement invariance. The longitudinal stability of personality was examined using the latent-trait-state (LTS) model. This model decomposes stability over time into two components: (a) a stable trait component that never changes and (b) a changing state component. The changing state component allows factors that influence personality to change over time and thereby to change personality. These changing factors may produce changes that last a long time or changes that are more temporary. The time course of changes in personality is modeled with an autoregressive parameter that reflects how much of the change present at time 1 is still present at time 2.

The LTS model is typically fitted without modeling mean-level changes. However, the model can also be used to model the mean structure of the data. In latent variable models, changes in personality are assumed to occur at the level of the latent traits, while item means (intercepts) are assumed to be constant over time. As the latent trait is stable, it cannot be used to model mean-level changes. One option is to free the means of the state factors. However, the influence of the state factors decreases over time, which is inconsistent with the idea of lasting changes in personality. A better option is to let the means of the occasion-specific factors vary freely, even if the occasion-specific variance is zero. Although this model may lack realism, it shows the pattern of mean-level changes in the data without imposing a particular model on the data (e.g., a linear trend).

The model specification and the complete results can be found on OSF (ttps:// The overall model fit was acceptable, CFI = .971, RMSEA = .019, SRMR = .031.

Rank-order Stability and Change

A study of the first three waves of the SOEP replicated earlier findings of high retest stability in personality, with stabilities above .9 over a one-year period (Conley, 1984; Schimmack, 2019c). However, three waves are insufficient to separate trait variance from state variance, and few studies with four waves of personality data are available. Anusic and Schimmack (2016) used a meta-analytic approach to separate them on the basis of smaller studies. Their model suggested that about 70% of the reliable variance in personality is trait variance. The remaining 30% is state variance, which is rather unstable, with a low annual stability of .3. This would suggest that changes in personality do not last long and that individuals quickly revert to their trait level of personality.

Table 1 shows the results for the SOEP data.


The results show a similar split between trait and state variance as the meta-analysis, with about two-thirds of the variance being trait variance and one-third being state variance. A new finding is that the halo factor, an evaluative bias in personality ratings, also has 60% trait variance. Thus, this response style can also be considered a stable trait. In contrast, acquiescence bias has less trait variance and seems to be more influenced by momentary factors that are inconsistent over time.

The results for the stability of the state variance differ from the meta-analysis. The SOEP data suggest that changes in personality are more persistent than the meta-analysis suggested. The annual stability estimates are around .7. Thus, changes that are evident at time 2 would still be evident over the next few years. The stability over four years is around .3. These results are more encouraging for researchers who are interested in personality change than the meta-analytic results in Anusic and Schimmack (2016). Nevertheless, the relatively small amount of state variance and the high stability of the state variance imply that it takes time to detect even small changes in personality. Not surprisingly, it has been difficult to uncover predictors of personality change even in large samples like the SOEP (Specht et al., 2011).
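If the state component decays geometrically (an AR(1) process), annual and multi-year stabilities are linked by a power. The minimal Python sketch below uses that assumption to compare the meta-analytic annual stability of .3 with the SOEP estimate of about .7. It also converts a four-year stability of .3 back into an annual value. The helper names are hypothetical.

```python
def state_stability_after(years: float, annual_stability: float) -> float:
    """Share of a state change still present after `years`, assuming AR(1) decay."""
    return annual_stability ** years


def annual_from_multi_year(stability: float, years: float) -> float:
    """Annual stability implied by a stability observed over `years`, under AR(1)."""
    return stability ** (1.0 / years)


# Annual stability of .3 (meta-analytic estimate) vs. about .7 (SOEP)
for annual in (0.3, 0.7):
    print(annual, round(state_stability_after(4, annual), 3))

print(round(annual_from_multi_year(0.3, 4), 2))  # 0.74
```

Under this assumption, an annual stability of .3 leaves less than 1% of a change after four years, while an annual stability of roughly .7 to .74 leaves about a quarter to a third of it.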

In sum, the results confirm that personality ratings are highly stable over extended periods of time and that a large portion of this stability is caused by stable factors that ensure persistent individual differences in personality over the life span.

Mean Levels

Table 2 shows the results for the mean levels. Means in the first year, 2005, serve as the reference. The results provide little evidence for personality change in adulthood. None of the Big Five dimensions shows a consistent trend over time. The results for conscientiousness are most important, because a meta-analysis suggested that conscientiousness increases substantially throughout adulthood. There is no evidence for such a trend in the SOEP.


The general pattern of decreases for all five dimensions suggests that acquiescence bias might have changed over time. I therefore also fitted a model with free means for acquiescence bias, but the results did not change, so acquiescence does not account for the small decrease in the Big Five. Freeing the means of the halo factor instead reduces the changes for most scales, but suggests a stronger decrease in neuroticism. However, the pattern is never a gradual change. It is a drop from time 1 to time 2 with no major changes afterwards. This suggests that panel or period effects have small effects on personality ratings, but there is no evidence to support the claim that personality changes systematically throughout adulthood.


Personality research was attacked by situationists who claimed that personality is a mere social construction. By the 1980s, personality researchers had presented evidence from twin studies, multi-rater studies, and longitudinal studies that personality traits are real and stable. However, two meta-analyses by Roberts and colleagues suggested that personality exists but is less stable than personality psychologists had assumed. These meta-analyses had a strong influence on personality psychology in the 2000s. They are featured in personality textbooks and are often cited as evidence that personality continues to develop throughout adulthood. However, more recent evidence is more consistent with the view that personality is mostly stable throughout adulthood. Costa and McCrae famously compared personality to plaster: while it can be shaped and molded early on, it eventually sets into a shape that cannot be altered. Yes, there may be cracks here and there, but the overall shape is set. While this image may be too rigid, it is consistent with the evidence that even major life events in adulthood seem to have very little influence on personality (Specht et al., 2011).

The idea of personality change is often coupled with the notion that personality develops and that there can be personal growth in adulthood. The problem with these notions is that they imply a normative or desirable direction of personality change. For example, an increase in conscientiousness is seen as evidence of growing maturity. However, the measurement model that I used distinguishes between the denotative (descriptive) and connotative (evaluative) aspects of personality. A word like “lazy” is both descriptive and evaluative. However, evaluations are rooted in cultural norms and values. Why is it good to work as much as possible and to avoid mistakes at any cost? Should education and policies try to increase conscientiousness levels? Is there an optimal level? These are all very difficult questions that go well beyond the existing science of personality. Once we focus on the denotative aspect of personality, we see that some people work harder than others and that some people are more creative than others. We also see that these differences are fairly stable, without clear evidence about what causes this stability. Just as people differ in personality, they differ in other characteristics that have received more attention. Current culture aims toward greater acceptance of differences in sexual orientation, gender identity, body types, religion, etc. Maybe we should also include personality traits in this list and let introverts be proud introverts and disagreeable people be proud disagreeable people. Maybe personality differences exist only because they were not a problem during human evolution. Or diversity may even be an advantage that allows humans as a group to adapt to different circumstances. Thus, the strong evidence of personality stability is not necessarily a problem that needs to be solved, because there is no normal personality. There is only normal variation in personality.

The Hierarchy of Consistency Revisited

In 1984, James J. Conley published one of the most interesting studies of personality stability. However, this important article appeared in Personality and Individual Differences and has been largely ignored. Even today, the article has only 184 citations in Web of Science. In contrast, the more recent meta-analysis of personality stability by Roberts and DelVecchio (2000) has 1,446 citations.

Sometimes more recent and more highly cited does not mean better. The biggest problem in studies of stability is that random and occasion-specific measurement error attenuates observed retest correlations. Thus, observed retest correlations tend to underestimate the true stability of personality traits. With a single retest correlation, it is impossible to separate measurement error from real change. However, when more than two repeated measurements are available, it is possible to separate random measurement error from true change, using a statistical approach developed by Heise (1969).

The basic idea of Heise’s model is that change accumulates over time. Thus, if traits change from T1 to T2 and from T2 to T3, the trait changed even more from T1 to T3.

Without going into mathematical details, the observed retest correlation from T1 to T3 should match the product of the retest correlations from T1 to T2 and T2 to T3.

For example, if r12 = .8 and r23 = .8, r13 should be .8 * .8 = .64.

The same is also true if the retest correlations are not identical. Maybe more change occurred from T1 to T2 than from T2 to T3. The total stability is still a function of the product of the two partial stabilities. For example, r12 = .8 and r23 = .5 yields r13 = .8 * .5 = .4.

However, if there is random measurement error, the r13 correlation will be larger than the product of the r12 and r23 correlations. For example, assume true stabilities of .8 from T1 to T2 and .4 from T2 to T3 (so that the true stability from T1 to T3 is .8 * .4 = .32) and a reliability of .8 at each wave. The observed correlations are then r12 = .8 * .8 = .64 and r23 = .4 * .8 = .32, and their product is .64 * .32 = .20, while the actual r13 correlation is .32 * .8 = .256. Assuming that reliability is constant, we have three equations with three unknowns (reliability and the two stabilities), and it is possible to solve the equations to estimate reliability.
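To make the arithmetic concrete, here is a minimal Python sketch of the forward direction of the model. It assumes, as the text does, that reliability is the same at all three waves and that true change accumulates multiplicatively. The function name `implied_retest_correlations` is hypothetical and chosen only for illustration.

```python
def implied_retest_correlations(rel: float, s1: float, s2: float) -> tuple[float, float, float]:
    """Observed (r12, r23, r13) implied by a Heise-style three-wave model.

    rel: reliability, assumed equal at all three waves
    s1:  true stability from T1 to T2
    s2:  true stability from T2 to T3
    """
    r12 = rel * s1
    r23 = rel * s2
    r13 = rel * s1 * s2  # true change accumulates: stability T1->T3 = s1 * s2
    return r12, r23, r13


# No measurement error (rel = 1): r13 equals the product r12 * r23.
r12, r23, r13 = implied_retest_correlations(1.0, 0.8, 0.5)
print(round(r13, 3), round(r12 * r23, 3))  # 0.4 0.4

# Reliability .8: r13 is larger than the product r12 * r23.
r12, r23, r13 = implied_retest_correlations(0.8, 0.8, 0.4)
print(round(r12, 3), round(r23, 3), round(r13, 3), round(r12 * r23, 4))  # 0.64 0.32 0.256 0.2048
```

The gap between r13 and r12 * r23 is the signature of measurement error that the next step exploits.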

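(1) r12 = rel*s1, so s1 = r12/rel
(2) r23 = rel*s2, so s2 = r23/rel
(3) r13 = rel*s1*s2, so rel = r13/(s1*s2)

Substituting (1) and (2) into (3) gives r13 = rel*(r12/rel)*(r23/rel) = (r12*r23)/rel, and therefore

rel = (r12*r23)/r13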

with r12 = .64, r23 = .32, and r13 = .256, we get (.64*.32)/.256 = .8.
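The inverse step can be sketched the same way. This is only an illustration of the three equations above under Heise's assumptions, not the software used for the MIDUS analysis below; `heise_decompose` is a hypothetical name.

```python
def heise_decompose(r12: float, r23: float, r13: float) -> tuple[float, float, float]:
    """Split three observed retest correlations into reliability and two stabilities.

    Assumes Heise's (1969) model: constant reliability across waves and
    true change that accumulates multiplicatively from wave to wave.
    """
    if r13 <= 0:
        raise ValueError("r13 must be positive to solve the equations")
    rel = (r12 * r23) / r13  # from (1)-(3): r12*r23 / r13 = rel
    if rel <= 0:
        raise ValueError("estimated reliability must be positive")
    s1 = r12 / rel  # equation (1)
    s2 = r23 / rel  # equation (2)
    return rel, s1, s2


rel, s1, s2 = heise_decompose(0.64, 0.32, 0.256)
print(round(rel, 3), round(s1, 3), round(s2, 3))  # 0.8 0.8 0.4
```

With real data, sampling error can push these estimates above 1 (for example, whenever r13 happens to be smaller than r12 * r23), so the output is best read as a rough estimate rather than an exact decomposition.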

Heise’s model is an autoregressive model, which implies that retest correlations become smaller and smaller over time until they approach zero. However, if stability is high, this can take a long time. For example, Conley (1984) estimated that the annual stability of IQ tests is r = .99. With this high stability, the retest correlation over 40 years is still r = .67 (.99^40 ≈ .67). Consistent with Conley’s prediction, a study found a retest correlation of r = .67 from age 11 to age 70 (ref), which is even higher than Conley’s model predicts for this 59-year interval (.99^59 ≈ .55).
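The projection is simple compounding. Below is a minimal sketch, assuming a constant annual stability and ignoring measurement error, so the numbers describe true-score correlations. The names `projected_retest` and `implied_annual_stability` are hypothetical.

```python
def projected_retest(annual_stability: float, years: float) -> float:
    """True-score retest correlation after `years`, assuming a constant annual stability."""
    return annual_stability ** years


def implied_annual_stability(retest: float, years: float) -> float:
    """Constant annual stability that would compound to `retest` over `years`."""
    return retest ** (1.0 / years)


print(round(projected_retest(0.99, 40), 2))          # 0.67, Conley's 40-year projection
print(round(projected_retest(0.99, 59), 2))          # 0.55, prediction for age 11 to age 70
print(round(implied_annual_stability(0.67, 59), 3))  # 0.993, implied by an observed r = .67
```

Because the observed r = .67 is not corrected for measurement error, the implied annual stability of about .993 is, if anything, a lower bound.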

The figure below shows Conley’s estimates for personality traits such as extraversion and neuroticism. It shows that reliability varies across studies and instruments, from as low as .4 to as high as .9. After correcting for unreliability, the estimated annual stability of personality traits is s = .98.

The figure also shows that most studies in this meta-analysis of retest correlations covered short time intervals, from a few months up to 10 years. Studies spanning 10 or more years are rare. As a result, Conley’s estimates are not very precise.

To test Conley’s predictions, I used the three waves of the Midlife in the United States study (MIDUS). The waves were approximately 10 years apart, for a total time span of 20 years. To analyze the data, I fitted a measurement model to the personality items in the MIDUS. The fit of the measurement model has been examined elsewhere (Schimmack, 2019). The measurement model was constrained to be equal across all three waves (see OSF for syntax). The model had acceptable overall fit, CFI = .963, RMSEA = .018, SRMR = .035 (see OSF for output).

The key findings are the retest correlations r12, r23, and r13 for the Big Five and two method factors: evaluative bias (halo) and acquiescence bias.


For all traits except acquiescence bias, the r13 correlation is lower than the r12 or r23 correlation, indicating some real change. However, for all traits, the r13 correlation is higher than the product of r12 and r23, indicating the presence of random measurement error or occasion specific variance.

The next table shows the decomposition of the retest-correlations into a reliability component and a stability component.

(Table: reliability, 20-year stability, and 1-year stability estimates)

The reliability estimates range from .84 to .92 for the Big Five scales. Reliability of the method factors is estimated to be lower. After correcting for unreliability, 20-year stability estimates increase from observed levels of .72 to .85 to estimated levels of .83 to .1. The implied annual stability estimates are above .99, which is higher than Conley’s estimate of .98.
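The correction and the annualization can be sketched in a few lines. This assumes, as in Heise’s model, equal reliability at the first and last wave; the function names and the disattenuation inputs are hypothetical and are not taken from the MIDUS table.

```python
def disattenuate(observed_retest: float, reliability: float) -> float:
    """True-score retest correlation, assuming equal reliability at both waves.

    Under Heise's model r13 = rel * s1 * s2, so the true T1-to-T3 stability is r13 / rel.
    """
    return observed_retest / reliability


def annualize(stability: float, years: float) -> float:
    """Constant annual stability that compounds to `stability` over `years`."""
    return stability ** (1.0 / years)


# Hypothetical inputs, for illustration only
print(round(disattenuate(0.75, 0.90), 3))  # 0.833

# An annual stability of .99 compounds to about .818 over 20 years, so any
# corrected 20-year stability above that implies an annual stability above .99.
print(round(0.99 ** 20, 3))           # 0.818
print(round(annualize(0.83, 20), 4))  # 0.9907, lower bound of the corrected estimates
```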

Unfortunately, three time points are not enough to test the assumptions of Heise’s model. Maybe reliability increases over time. Another possibility is that some of the variance in personality is influenced by stable factors that never change (e.g., genetic variance). In this case, retest correlations do not approach zero but level off at a value set by the influence of the stable factors.

Anusic and Schimmack’s meta-analysis suggested that for the oldest age group, the amount of stable variance is 80% and that this asymptote is reached very quickly (see figure). However, this model predicts that 10-year retest correlations are equivalent to 20-year retest correlations, which is not consistent with the results in Table 1. Thus, the MIDUS data suggest that the model in Figure 1 overestimates the amount of stable trait variance in personality. More data are needed to model the contribution of stable factors to the stability of personality traits. However, both models predict high stability of personality over a long period of 20 years.
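The contrast between the two models can be illustrated with a simplified trait-plus-autoregressive sketch. This is an assumption for illustration, not necessarily the exact model that Anusic and Schimmack fitted: a share of true variance never changes, the rest follows an autoregressive process, and reliability is constant. All parameter values below are hypothetical.

```python
def retest_with_stable_trait(lag: float, reliability: float,
                             stable_share: float, annual_ar: float) -> float:
    """Observed retest correlation over `lag` years (simplified, hypothetical model).

    stable_share: share of true variance that never changes
    annual_ar:    annual stability of the changing (autoregressive) part
    """
    true_stability = stable_share + (1 - stable_share) * annual_ar ** lag
    return reliability * true_stability


# Stable-trait model: 80% stable variance, quickly decaying changing part.
print(round(retest_with_stable_trait(10, 0.9, 0.8, 0.7), 3),
      round(retest_with_stable_trait(20, 0.9, 0.8, 0.7), 3))   # 0.725 0.72

# Pure autoregressive model (no stable trait) with annual stability .99.
print(round(retest_with_stable_trait(10, 0.9, 0.0, 0.99), 3),
      round(retest_with_stable_trait(20, 0.9, 0.0, 0.99), 3))  # 0.814 0.736
```

When the asymptote is reached quickly, 10-year and 20-year correlations are nearly identical; the purely autoregressive model instead predicts a noticeable drop between 10 and 20 years.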


Science can be hard. Astronomy required telescopes to study the universe. Psychologists need longitudinal studies to examine the stability of personality and personality development. The first telescopes were imperfect and led to false beliefs about canals and life on Mars. Similarly, longitudinal data are messy and provide imperfect glimpses into the stability of personality. However, the accumulating evidence shows impressive stability in personality differences. Many psychologists are dismayed by this finding because they have a fixation on disorders and negative traits. However, the Big Five traits are not disorders or undesirable traits. They are part of human diversity. When it comes to normal diversity, stability is actually desirable. Imagine you train for a job and after ten years of training you don’t like it anymore. Imagine you marry a quiet introvert and five years later he is a wild party animal. Imagine you never know who you are because your personality is constantly changing. The grass on the other side of the fence is often greener, but self-acceptance and building on one’s true strengths may be a better way to live a happy life than trying to change your personality to fit cultural norms or parental expectations. Maybe stability and predictability aren’t so bad after all.

The results also have implications for research on personality change and development. If natural variation in the factors that influence personality produces only very small changes over periods of a few years, it will be difficult to study personality change. Moreover, small real changes will be contaminated by relatively large amounts of random measurement error. Studying such changes requires good measurement models that can separate real change from noise.


Conley, J. J. (1984). The hierarchy of consistency: A review and model of longitudinal findings on adult individual differences in intelligence, personality and self-opinion. Personality and Individual Differences, 5(1), 11–25.

Heise, D. R. (1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34(1), 93–101.

Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126(1), 3–25.

Should Governments Shape Personality

Dear Wiebke, Patrick, Mitja, Jaap, Marie, Christian, Richard, Maike, Ulrich, Jenny, Cornelia, Johannes, and Brent.

You suggested that personality traits are actionable targets for public policy (Bleidorn et al., 2019).  I was surprised and actually shocked by this proposal.  I have taught personality psychology for over a decade and I always emphasize that individual differences are normal and should be celebrated like we celebrate other aspects of human diversity in culture and in sports.  Therefore I don’t think personality interventions are needed or desirable. Maybe there is some fundamental misunderstanding, but reading your article suggests that you are really proposing that public policy should target personality traits.

This idea is not new. Socialist and fascist governments had ideals of the model citizen and aimed to fit their citizens into this mold.

In marked contrast, democracies and market economies are built on the idea that citizens’ well-being is maximized by personal choice. The role of governments is mainly to protect the safety of citizens and to regulate conflict when individual preferences clash. Well-being surveys consistently show that free and individualistic societies produce higher well-being than societies that impose ideological or religious norms on their citizens.

The history of psychology also casts a shadow on attempts to shape individuals’ personality. When homosexuality was a taboo, the Diagnostic and Statistical Manual of Mental Disorders included homosexuality as a mental illness. Today most psychologists regard it as a success that homosexuality is seen as an expression of personal preferences and that conversion therapy to cure homosexuals of some invented illness is considered unethical. More generally, mental illness has been defined in terms of patients’ suffering and concerns about patients’ well-being rather than in terms of cultural norms of acceptable or unacceptable characteristics.

New insights into biological influences on many illnesses (e.g., cancer) have given rise to personalized medicine which is based on the idea that the same treatment can have different effects for different individuals.  Rather than trying to fit patients to treatments, personalized medicine aims to fit treatments to patients.

Given these general trends, one could argue that modern societies need personality psychology because a better understanding of individual differences is needed to create policies that respect individual freedom and create opportunities for individuals to pursue their own well-being and to maximize their own potential. The call to shape personality, however, seems to suggest the opposite. In fact, the call for governments to regulate personality development seems so absurd that it seems improbable that a group of modern, presumably liberal-leaning psychologists would argue for it. Does this mean I misunderstood your article? I hope so, but reading it didn’t help me to understand your position.

We agree that personality traits are enduring factors (a.k.a. causes, dispositions) within an individual that influence their thoughts, feelings, and behaviors. You propose that governments should influence personality traits because personality traits influence life outcomes. For example, personality traits influence divorce. If governments want to reduce divorce rates, they could target the personality traits that lead to divorce. Another advantage of changing personality traits is that they are broad dispositions that influence a range of diverse behaviors. For example, conscientiousness influences class attendance, health behaviors, and making your bed every morning. Instead of having different interventions for each behavior, making individuals more conscientious would influence all three behaviors.

Most of the article discusses empirical research on whether it is actually possible to change personality traits. I am not going to quibble with you about the evidence here because it is irrelevant to the main question that your article raises: if it were possible to change personality, should governments roll out interventions that shape personality? As the article focused on the Big Five traits, the question is whether governments should make citizens more or less neurotic, extraverted, agreeable, conscientious, or open to experience.

“Our most general assertion is that personality traits are both stable and changeable, which makes personality trait change a powerful and hitherto relatively underused resource for policy makers”

You appear to be so convinced that government interventions that target personality are desirable that you ask only when to intervene, what intervention to use, who to target, and how to intervene. You never stop to wonder whether interventions are a good idea in the first place.

For example, you suggest that increasing conscientiousness in adolescence is a desirable policy goal because “it could elicit a cascade of positive outcomes” (p. 19).  And decreasing neuroticism is good because it “could significantly reduce one’s likelihood of experiencing negative life events” (p. 19).

In passing you mention the main problem of your proposal to regulate personality: “This is not to say that there are optimal trait levels that should be universally promoted in all people.” However, you do not reconcile this observation with your call for personality policies. If there are no optimal levels, then what should be the target of personality policies? And are the previous examples not evidence that you consider higher conscientiousness and lower neuroticism optimal? If they are not considered optimal, why should governments develop interventions to increase conscientiousness and to reduce neuroticism?

You end with the conclusion that “personality traits are ideal targets for interventions designed to improve life success,” which once more raises the question of what the goal of personality interventions should be. What is life success? We know the answer is 42 (h/t Hitchhiker’s Guide to the Galaxy), but we don’t really understand the question very well.

To end on a more positive note, I do think that governments can play a role in helping individuals to have better lives with higher well-being, and national rankings of quality of life and well-being show that some governments are doing a better job than others.  One main indicator of a good life is a healthy and long life, and health care is both a major contributor to GDP and a major policy agenda. Good health includes physical health and mental health.  Prevention and treatment of mental health problems such as anxiety, depression, or addiction are important. Unlike personality, health can be defined in terms of optimal functioning and we can evaluate policies in terms of their effectiveness to maximize optimal functioning. Addressing those concerns is an important policy agenda and psychologists can play an important role in addressing these issues. But I prefer to leave normal variation in personality alone. As you noted yourself, there are no optimal personality traits. The best personality policy is to create free societies that let individuals pursue their own happiness in the way they want to pursue it.

Your disagreeable colleague,