The Structure of Agreeableness

In 1934, Thurstone published his groundbreaking article “The Vectors of Mind” that introduced factor analysis as an objective method to examine the structure of personality traits. His first application of his method to a list of 60 trait adjectives rated by 1,300 participants yielded five factors. It would take several more decades for personality psychologists to settle on a five-factor model of personality traits (Digman, 1990).

Although the five-factor model dominates personality, it is not the only theory of personality traits. The biggest rival of the Big Five model is the HEXACO model that postulates six factors (Ashton, Lee, Perugini et al, 2004; Lee & Ashton, 2004).

A recent special issue in the European Journal of Personality contained a target article by Ashton and Lee in favor of replacing the Big Five model with the HEXACO model and responses by numerous prominent personality psychologists in defense of the Big Five model.

The key difference between the Big Five model and the HEXACO model is the representation of pro-social (self-transcending) versus egoistic (self-enhancing) traits. Whereas the Big Five model assumes that a single general factor called agreeableness produces shared variance among pro-social traits, the HEXACO model proposes two distinct factors called Honesty-Humility and Agreeableness. While the special issue showcases the disagreement among personality researchers, it fails to offer an empirical solution to this controversy.

I argue that the main reason for stagnation in research on the structure of personality is the reliance on Thurstone’s outdated method of factor analysis. Just as Thurstone’s multi-factor method was an important methodological advance over Spearman’s single-factor method, Jöreskog’s (1969) confirmatory factor analysis (CFA) addresses several limitations of Thurstone’s method. Neither Ashton and Lee nor any of the commentators suggested the use of CFA to test empirically whether pro-social traits are represented by one general or two broad traits.

There are several reasons why personality psychologists have resisted the use of CFA to study personality structure. Some of these reasons were explicitly stated by McCrae, Zonderman, Costa, Bond, and Paunonen (1996).

1. “In this article we argue that maximum likelihood confirmatory factor analysis (CFA), as it has typically been applied in investigating personality structure, is systematically flawed” (p. 552).

2. “The CFA technique may be inappropriately applied” (p. 553)

3. “CFA techniques are best suited to the analysis of simple structure models” (p. 553)

4. “Even proponents of CFA acknowledge a long list of problems with the technique, ranging from technical difficulties in estimation of some models to the cost in time and effort involved.”

5. “The major advantage claimed for CFA is its ability to provide statistical tests of the fit of empirical data to different theoretical models. Yet it has been known for years that the chi-square test on which most measures of fit are based is problematic”

6. A variety of alternative measures of goodness-of-fit have been suggested, but their interpretation and relative merits are not yet clear, and they do not yield tests of statistical

7. Data showing that chi-square tests lead to overextraction in this sense call into question the appropriateness of those tests in both exploratory and confirmatory maximum likelihood models. For example, in the present study the largest single problem was a residual correlation between NEO-PI-R facet scales E4: Activity and C4: Achievement Striving. It would be possible to specify a correlated error term between these two scales, but the interpretation of such a term is unclear. Correlated error usually refers to a nonsubstantive source of variance. If Activity and Achievement Striving were, say, observer ratings, whereas all other variables were self-reports, it would make sense to control for this difference in method by introducing a correlated error term. But there are no obvious sources of correlated error among the NEO-PI-R facet scales in the present study.

8. With increasing familiarity of the technique and the availability of convenient computer programs (e.g., Bentler, 1989; Joreskog & Sorbom, 1993), it is likely that many more researchers will conduct CFA analyses in the future. It is therefore essential to point out the dangers in an uncritical adoption and simplistic application of CFA techniques (cf. Breckler, 1990).

9. Structures that are known to be reliable showed poor fits when evaluated by CFA techniques. We believe this points to serious problems with CFA itself when used to examine personality structure.

I may be giving McCrae et al. (1996) too much credit for the lack of CFA studies of personality structure, but their highly cited article clearly did not encourage future generations to explore personality structure with CFA. This is unfortunate because CFA has many advantages over traditional EFA.

The first advantage is the ability to compare model fit. McCrae et al. are overly concerned about the chi-square statistic, which is sensitive to sample size and rewards overly complex models. Even when they published their article, other fit indices were already used to address these problems. Today researchers have even more experience with the evaluation of model fit. More important, well-established fit indices also reward parsimony and can favor a more parsimonious model with five factors over a less parsimonious model with six factors. This opens the door to head-to-head comparisons of the Big Five and HEXACO models.

The second advantage of CFA is that factors are theoretically specified by researchers rather than empirically driven by patterns in the data. This means that CFA can find factors that are represented by as few as two items and factors that are represented by 10 or more items. In contrast, EFA will favor factors with many items. This means that researchers need to know the structure to represent all factors equally, which is not possible in studies that try to discover a structure and the number of factors. Different sampling from the item space may explain differences in personality structures with EFA, but is not a problem for CFA as long as factors are represented by a minimum of two items.

A third advantage of CFA is that it is possible to model hierarchical structures. Theoretically, most personality researchers agree that the Big Five or HEXACO are higher-order factors that explain only some of the variance in so-called facets or primary traits like modesty, altruism, forgiveness, or morality. However, EFA cannot represent hierarchies. This makes it necessary to examine the higher order structure of personality with scales that average several items. However, these scales are impure indicators of the primary factors and the impurities can distort the higher-order factor structure.

A fourth advantage is that CFA can also model method factors that can distort the actual structure of personality. For example, Thurstone’s results appeared to be heavily influenced by an evaluative factor. With CFA it is possible to model evaluative biases and other response styles like acquiescence bias to separate systematic method variance from the actual correlations between traits (Anusic et al., 2009).

Finally, CFA requires researchers to think hard about the structure of personality to specify a plausible model. In contrast, EFA produces a five-factor or six-factor solution in less than a minute. Most of the theoretical work then consists of finding post-hoc explanations for the structure.

In short, most of the problems listed by McCrae et al. (1996) are not bugs, but features of CFA. The fact that they were unable to create a fitting model to their data only shows that they didn’t take advantage of these features. In contrast, I was able to fit a hierarchical model with method factors to Costa and McCrae’s NEO-PI-R questionnaire (Schimmack, 2019). The results mostly confirmed the Big Five model, but some facets did not have primary loadings on the predicted factor.

Here I am using hierarchical CFA to examine the structure of pro-social and anti-social traits. The Big Five model predicts that all traits are related to one general factor that is commonly called Agreeableness. The HEXACO model predicts that there are two relatively independent factors that are called Agreeableness and Honesty-Humility (Ashton & Lee, 2005).


The data were collected by Crowe, Lynam, and Miller (2017). A total of 1,205 participants provided self-ratings on 104 items that were selected from various questionnaires to measure Big Five agreeableness or HEXACO agreeableness and honesty-humility. The data were analyzed with EFA. In the recent debate about the structure of personality, Lynam, Crowe, Vize, and Miller (2020) pointed to the finding of a general factor to argue that there is “Little Evidence That Honesty-Humility Lives Outside of FFM Agreeableness” (p. 530). They also noted that “at no point in the hierarchy did a separate Honesty-Humility factor emerge” (p. 530).

In their response, Ashton and Lee criticize the item selection and argue that the authors did not sample enough items that reflect HEXACO-Agreeableness. “Now, the Crowe et al. variable set did contain a substantial proportion of items that should represent good markers of Honesty-Humility, but it was sharply lacking in items that represent some of the best markers of HEXACO Agreeableness: for example, one major omission was item content related to even temper versus anger-proneness, which was represented only by three items of the Patience facet of HEXACO-PI-R Agreeableness” (p. 565). They are also concerned about oversampling of other facets. “The Crowe et al. “Compassion” factor is one of the all-time great examples of a ‘bloated specific.’” (p. 565). These are valid concerns for analyses with EFA, which allows researchers to influence the factor structure by undersampling or oversampling items. However, CFA is not influenced by the number of items that reflect a specific facet. Even two items are sufficient to create a measurement model of a specific factor and to examine whether two such factors are fairly independent or substantially correlated. Thus, it is a novel contribution to examine the structure of pro-social and anti-social traits using CFA.

Exploratory Analyses

Before presenting the CFA results, I used EFA as implemented in MPLUS to examine the structure of the 104 items. Given the predicted hierarchical structure, it is obvious that neither a one-factor nor a two-factor solution should fit the data. The main purpose of an EFA analysis would be to explore the number of primary factors / facets that are represented in the 104 items. To answer this question, it is inappropriate to rely on the scree test, which is overly influenced by item selection, or on the chi-square test, which leads to over-extraction. Instead, factor solutions can be compared with standard fit indices from CFA such as the Comparative Fit Index (CFI), the Root Mean Square Error of Approximation (RMSEA), the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the sample-size adjusted BIC (SSA-BIC).
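For readers who want to see how these indices relate to the chi-square statistic and the log-likelihood, the sketch below computes them from their textbook definitions. It is an illustration only: the function names and all input numbers are hypothetical, and software packages differ in small details (for example, whether RMSEA divides by N or N - 1), so the values need not match MPLUS output exactly.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 variant, an assumption)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def information_criteria(loglik, k, n):
    """AIC, BIC, and sample-size adjusted BIC for k free parameters."""
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * math.log(n)
    ssa_bic = -2 * loglik + k * math.log((n + 2) / 24)
    return aic, bic, ssa_bic

# Hypothetical numbers, chosen only to show the arithmetic.
print(round(cfi(200, 100, 2100, 100), 3))   # 1 - 100 / 2000
print(round(rmsea(200, 100, 501), 3))       # sqrt(100 / (100 * 500))
print(information_criteria(-1000.0, 10, 1000))
```

Because the BIC penalty per parameter, ln(N), is larger than the AIC penalty of 2 in any sample with more than about 7 cases, BIC tends to prefer solutions with fewer factors than AIC does, and the sample-size adjusted BIC falls in between for samples of this size.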

Although all indices take parsimony into account, three of the indices favor the most complex structure with 20 factors. BIC favors parsimony the most and settles for 12 factors as the optimal number. The sample-size adjusted BIC favors 16 factors. Evidently, the number of primary factors is uncertain. The reason is that many larger primary factors may be split into highly correlated more specific factors that are called nuances. The structure of primary factors can be explored with CFA analyses of primary factors. Thus, I settled for the 12 factor solution to start the CFA analyses.

Another way to determine the number of primary factors is to look at the scales that were represented in the items. There were 9 HEXACO scales: forgiving (A), gentle (A), flexible (A), patient (A), modest (H), fair (H), greed avoidant (H), sincere (H), and altruistic. In addition, there were the Big Five facets empathetic, trusting, straightforward, modest, compassionate, and polite. Some of these facets overlap with HEXACO facets, suggesting that the 12 factor solution may reflect the full content of the HEXACO and Big Five facets.

Exploration of the Primary Factors with CFA

Before I start, it is important to point out a major misconception about CFA. The term confirmatory has misled researchers to assume that CFA should only be used to confirm a theoretically expected structure. Any post-hoc modifications of a model to fit actual data would then be a questionable research practice. This is a misconception that is based on Joreskog’s unfortunate decision to label his statistical method confirmatory. Joreskog’s actual description of his method, which few researchers have read, makes it clear that CFA can be used for exploration.

“We shall give examples of how a preliminary interpretation of the factors can be successively modified to determine a final solution that is acceptable from the point of view of both goodness of fit and psychological interpretation. It is highly desirable that a hypothesis that has been generated in this way should subsequently be confirmed or disproved by obtaining new data and subjecting these to a confirmatory analysis.” (p. 183).

Using CFA to explore data is no different from running an exploratory multiple regression analysis or an exploratory ANOVA. Data exploration is an important part of science. It is only questionable when exploratory analyses are falsely presented as confirmatory. For example, I could pretend that I came up with an elaborate theory of agreeableness and present the final model as theoretically predicted. This is known as HARKing (Kerr, 1998). However, exploration is needed to generate a model that is worth testing in a confirmatory study. As nobody has examined the structure of agreeableness with CFA, the first attempt to fit a model is naturally exploratory and can only serve as a starting point for the development of a structural model of agreeableness.

Items were considered to be candidate items of a factor if they loaded at least .3 on a factor. This may seem a low level, but it is unrealistic to expect high loadings of single items on a single factor.
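The selection rule can be sketched in a few lines. This is a minimal illustration, not the script used for the analyses: the loading matrix and item names are made up, and taking the absolute value (so that reverse-keyed items with negative loadings also qualify) is my assumption about how the rule would be applied.

```python
import numpy as np

def candidate_items(loadings, item_names, threshold=0.30):
    """Return, for each factor, the items whose |loading| reaches the threshold."""
    loadings = np.asarray(loadings, dtype=float)
    candidates = {}
    for f in range(loadings.shape[1]):
        keep = np.abs(loadings[:, f]) >= threshold
        candidates[f] = [name for name, k in zip(item_names, keep) if k]
    return candidates

# Hypothetical EFA loadings: three items (rows) on two factors (columns).
example_loadings = [[0.62, 0.05],
                    [0.28, 0.41],
                    [-0.35, 0.10]]
example_names = ["item_a", "item_b", "item_c"]
print(candidate_items(example_loadings, example_names))
```

Note that an item can be a candidate for more than one factor under this rule; such items would need to be resolved in the subsequent CFA step.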

The next step was to fit a simple structure to the items of each factor. When possible this model also included an acquiescence factor that coded direct versus reverse scored items. Typically, this model did not fit the data well. The next step was to look for correlated residuals that show shared variance between items. These items violate the assumption of local independence. That is, the only reason for the correlation between items should be the primary factor. Items can show additional shared variance for a number of reasons such as similar wording or shared specific content (nuances). When many items were available, one of the items with correlated residuals was deleted. Another criterion for item selection was the magnitude of primary loadings. A third criterion aimed to get a balanced number of direct and reverse scored items when this was possible.
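To make the idea of local independence concrete, the sketch below compares observed item correlations with the correlations implied by a single factor (the product of two items' standardized loadings) and flags pairs with large residuals. The loadings, the injected dependence, and the cutoff of .10 are all hypothetical; in practice the residuals or modification indices reported by the CFA software would be inspected instead.

```python
import numpy as np

def residual_correlations(R, loadings):
    """Observed minus model-implied correlations under a one-factor model."""
    R = np.asarray(R, dtype=float)
    lam = np.asarray(loadings, dtype=float)
    resid = R - np.outer(lam, lam)   # implied r_ij = lambda_i * lambda_j
    np.fill_diagonal(resid, 0.0)     # the diagonal is not of interest here
    return resid

def flag_pairs(resid, cutoff=0.10):
    """List item pairs whose absolute residual correlation exceeds the cutoff."""
    n = resid.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(resid[i, j]) > cutoff]

# Hypothetical example: four items that follow a one-factor model exactly,
# except that items 0 and 1 share extra variance (e.g., similar wording).
lam = np.array([0.7, 0.6, 0.5, 0.6])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
R[0, 1] = R[1, 0] = R[0, 1] + 0.20
print(flag_pairs(residual_correlations(R, lam)))   # prints [(0, 1)]
```

A flagged pair is then handled as described above: either one of the two items is dropped, or the residual correlation is modeled explicitly.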

Out of the 12 factors, 9 factors were interpretable and matched one of the a priori facets. The 9 primary factors were allowed to correlate freely with each other. The model had acceptable overall fit, CFI = .954, RMSEA = .030. Table 2 shows the factors and the items with their source and primary loadings.

The factor analysis failed to distinguish clearly only between the gentle and flexible facets of HEXACO agreeableness. Thus, HEXACO agreeableness is represented by three rather than four factors. More important is that the four Honesty-Humility facets of the HEXACO model, namely Greed-Avoidance (F11), Sincerity (F9), Modesty (F7), and Morality (F5), were clearly identified. Thus, it is possible to examine the relationship of Honesty-Humility to Big Five agreeableness with these data. Importantly, CFA is not affected by the number of indicators. Three items with good primary loadings are sufficient to identify a factor.

Table 2 shows the pattern of correlations among the 9 factors. The aim of a structural model of agreeableness is to explain this pattern of correlations. However, visual inspection of these correlations alone can provide some valuable insights about the structure of agreeableness. Strong evidence for a two-factor model would be provided by high correlations among the Honesty-Humility facets, high correlations among the Agreeableness facets, and low correlations between Honesty-Humility and Agreeableness facets (Campbell & Fiske, 1959).

The pattern of correlations is only partially consistent with the two-factor structure of the HEXACO model. All of the correlations among the Honesty-Humility facets and all of the correlations among the Agreeableness facets are above .38. However, 6 out of the 20 cross-trait correlations are also above .38. Moreover, all of the correlations are positive, suggesting that Honesty-Humility and Agreeableness are not independent.
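The comparison of same-trait and cross-trait correlations can be expressed as a simple count. The sketch below is only an illustration of the logic: the four-facet correlation matrix and the labels are invented and are not the values from the actual tables.

```python
from itertools import combinations
import numpy as np

def compare_same_and_cross(R, labels):
    """Return the lowest same-trait correlation, the number of cross-trait
    correlations that exceed it, and the total number of cross-trait pairs."""
    R = np.asarray(R, dtype=float)
    same, cross = [], []
    for i, j in combinations(range(len(labels)), 2):
        (same if labels[i] == labels[j] else cross).append(R[i, j])
    lowest_same = min(same)
    n_exceeding = sum(1 for r in cross if r > lowest_same)
    return lowest_same, n_exceeding, len(cross)

# Hypothetical facets: two Honesty-Humility (H) and two Agreeableness (A).
toy_labels = ["H", "H", "A", "A"]
toy_R = [[1.00, 0.50, 0.30, 0.47],
         [0.50, 1.00, 0.20, 0.10],
         [0.30, 0.20, 1.00, 0.45],
         [0.47, 0.10, 0.45, 1.00]]
print(compare_same_and_cross(toy_R, toy_labels))
```

In a clean two-factor structure the middle number would be zero; every cross-trait correlation that exceeds the lowest same-trait correlation counts against a sharp separation of the two factors.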

For the sake of comparability, I also computed scales corresponding to the nine factors. Table 3 shows the correlations for scales that could be used for a two-step hierarchical analysis by means of EFA. In general, correlations in Table 3 are weaker than correlations in Table 2 because correlations among scale scores are attenuated by random measurement error. The pattern of correlations remains the same, but there are more cross-trait correlations that exceed the lowest same-trait correlation of .31.
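The size of this attenuation follows from the classic correction-for-attenuation formula, in which the observed correlation equals the latent correlation multiplied by the square root of the product of the two reliabilities. The sketch below shows the arithmetic; the correlation of .50 and the reliabilities of .70 and .80 are hypothetical values, not estimates from these data.

```python
import math

def attenuated_r(true_r, rel_x, rel_y):
    """Expected correlation between two scale scores with given reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)

def disattenuated_r(observed_r, rel_x, rel_y):
    """Estimate of the latent correlation from an observed scale correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)

# Hypothetical example: latent correlation .50, reliabilities .70 and .80.
observed = attenuated_r(0.50, 0.70, 0.80)
print(round(observed, 2))
print(round(disattenuated_r(observed, 0.70, 0.80), 2))
```

With short scales, reliabilities in this range are plausible, which is why scale correlations can be noticeably lower than the factor correlations estimated by CFA.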

The Structure of Agreeableness

A traditional EFA has severe limitations in examining the structure of correlations in Table 3. One major limitation is that structural relations have to be due to higher-order factors. The other limitation is that 9 variables can only identify a small number of factors. These problems are typically overlooked because traditionally EFA results are not examined for goodness of fit. However, an EFA analysis with fit indices in MPLUS shows that the two-factor model does not meet the standard fit criteria of .95 for CFI and .06 for RMSEA, CFI = .939, RMSEA = .092. The two factors clearly did correspond to the HEXACO Honesty-Humility and Agreeableness factors, but even with secondary loadings, the model fails to fully account for the pattern of correlations. Moreover, the two factors were correlated at r = .41, suggesting that they are not independent of each other.

Figure 1 shows a CFA model that is based on the factor correlations in Table 2. This model does not fit the data quite as well (CFI = .950, RMSEA = .031) as the simple measurement model with correlated factors (CFI = .954, RMSEA = .030), but I was unable to find a plausible model with better fit. I encourage others to do so. On the other hand, the model fits the data much better than the two-factor EFA model, CFI = .939, RMSEA = .092.

The model does show the expected separation of Honesty-Humility and Agreeableness facets, but the structure is more complex. First, morality has nearly equal loadings on the Honesty-Humility and the Agreeableness factors. The markers of Honesty-Humility are the other three facets: modest, manipulative (reversed), and materialistic (reversed). I suggest that the common element of these facets is self-enhancement. The loading of morality suggests that individuals who are highly motivated to self-enhance are more likely to engage in immoral behaviors to do so.

Four of the remaining factors have direct loadings on the agreeableness factor. Considerate (caring/altruistic) has the highest loading, but all four have substantial loadings. The aggressive factor has the weakest relationship with agreeableness. One reason is that it is also related to neuroticism in the Big Five model. In this model, aggressiveness is linked to agreeableness indirectly by consideration and self-enhancement, suggesting that individuals who do not care about others (low consideration) and who care about themselves (self-enhancement) are more likely to aggress against others.

In addition, there were correlated residuals between some facets. Trusting and forgiving shared unique variance. Maybe forgiveness is more likely to occur when individuals trust that others had good intentions and will not repeat their transgression in the future. Aggression showed shared variance with modesty and morality. One explanation could be that modesty is related to low assertiveness and that assertive people are more likely to use aggression. Morality may relate to aggression because it is typically considered immoral to harm others. Morality was also related to manipulativeness. Here the connection is rather obvious because manipulating people is immoral.

The model in Figure 1 should not be considered the ultimate solution to the controversy about the structure of pro-social and anti-social behaviors. To the contrary. The model should be considered the first structural model that actually fits the data. In contrast, previous results based on EFA produced models that approximated the structure, but never fit the actual data. Future research should test alternative models and these models should be evaluated in terms of model fit and theoretical plausibility (Joreskog, 1969).

Do these results answer the question whether there are five or six higher-order factors of personality? The answer is no. Consistent with the Five Factor model, the Honesty-Humility or Self-Enhancement factor is not independent of agreeableness. It is therefore reasonable to think about Honesty-Humility as subordinate to Agreeableness in a hierarchical model of traits. Personally, I favor this interpretation of the results. However, proponents of the HEXACO model may argue that the correlation between the agreeableness factor and the honesty-humility factor is low enough to make honesty-humility a separate factor. Moreover, it was not possible to control for evaluative bias (halo) variance in this model, and halo bias may have inflated the correlation between the two factors. On the other hand, if correlations of .4 are considered low enough to split Big Five factors, it is possible that, on closer inspection, other Big Five domains can also be split into distinct, yet positively correlated factors. The main appeal of the Big Five model is that the five factors are fairly independent after controlling for evaluative bias variance. Moreover, many facets have loadings of .4 or even lower on the Big Five factor. It is therefore noteworthy that all correlations among the 9 factors were positive, which suggests that a general factor produces covariation among them. The common factor can also be clearly interpreted in terms of the focus on self-interest versus others’ interests or needs to guide behaviors. Individuals high in agreeableness take others’ needs and feelings into account, whereas those low in agreeableness are guided strongly by self-interest. The split into two factors may be due to the fact that the importance of self and the importance of others are not always in conflict with each other. In particular, individuals low in self-enhancement may still differ in terms of their pro-social behaviors.


The main contribution of this blog post is to show the importance of testing model fit in investigations of the structure of personality traits. While it may seem self-evident that a theoretical model should fit the data, personality psychologists have failed to test model fit or ignored feedback that their models do not fit the data. Not surprisingly, personality psychologists continue to argue over models because it is easy to propose models if they do not have to fit actual data. If structural research wants to be an empirical science, it has to subject models to empirical tests that can falsify models that do not fit the data.

I empirically showed that a simple two-factor model does not fit the data. At the same time, I showed that a model with a general agreeableness factor and several independent facets also does not fit the data. Thus, neither of the dominant models fits the data. At the same time, the data are consistent with the idea of a general factor underlying pro-social and anti-social behaviors, while the relationship among facets remains to be explored in more detail. Future research needs to control for evaluative bias variance and examine how the structure of agreeableness is embedded in a larger structural model of personality.


Ashton, M. C., Lee, K., Perugini, M., Szarota, P., de Vries, R. E., Di Blas, et al. (2004). A six-factor structure of personality-descriptive adjectives: Solutions from psycholexical studies in seven languages. Journal of Personality and Social Psychology, 86, 356–366.

Lee, K., & Ashton, M. C. (2004). Psychometric properties of the HEXACO Personality Inventory. Multivariate Behavioral Research, 39, 329–358.
