Tag Archives: Higher Order Structure

Confirmation Bias is Everywhere: Serotonin and the Meta-Trait of Stability

September 13, 2019 | DeYoung, Digman, Meta-Traits, Personality, Personality Measurement, Personality Structure | Higher Order Structure | Ulrich Schimmack

Most psychologists have at least a vague understanding of the scientific method. Somewhere, they have probably heard about Popper and the idea that empirical data can be used to test theories. As all theories are false, these tests should at some point lead to an empirical outcome that is inconsistent with a theory. This outcome is not a failure; it is an expected outcome of good science. It also does not mean that the theory was bad. Rather, it was a temporary theory that is now modified or replaced by a better theory. And so, science makes progress…

However, psychologists do not use the scientific method popperly. Null-hypothesis significance testing adds some confusion here. After all, over 90% of the results that psychologists publish are successful rejections of the nil-hypothesis. Doesn’t that show they are good Popperians? The answer is no, because the nil-hypothesis is not predicted by a theory. The nil-hypothesis is only useful as something to reject in order to claim that there is a predicted relationship between two variables. Thus, over 90% of the findings reported in psychology journals confirm theoretical predictions. While this may look like a major success, it actually reveals a major problem: psychologists never publish results that disconfirm a theoretical prediction. As a result, there is never a need to develop better theories. Thus, a root evil that prevents psychology from being a real science is verificationism.

The need to provide evidence for, rather than against, a theory led to the use of questionable research practices (QRPs). QRPs are used to report results that confirm theoretical predictions. For example, researchers may simply not report the results of studies that did not reject the nil-hypothesis. Other practices help to produce significant results by inflating the risk of a false positive result. The use of QRPs explains why more than 90% of the results published in psychology journals have confirmed theoretical predictions for 60 years (Sterling, 1959). Only recently has it become more acceptable to report studies that failed to support a theoretical prediction and to question the validity of a theory. However, these studies are still a small minority. Thus, psychological science suffers from confirmation bias.

Structural Equation Modelling

Multivariate, correlational studies are different from univariate experiments. In a univariate experiment, a result is either significant or not. Thus, only tampering with the evidence can produce confirmation bias. In multivariate statistics, data are analyzed with complex statistical tools that give researchers flexibility in their data analysis. Thus, it is not necessary to alter the data to produce confirmatory results. Sometimes it is sufficient to analyze the data in a way that confirms a theoretical prediction without checking whether alternative models fit the data equally well or better.

It is also easier to combat confirmation bias in multivariate research by fitting alternative models to the same data. Model comparison also avoids the problem of significance testing, where non-significant results are considered inconclusive, while significant results are used to confirm and cement a theory. In SEM, statistical inferences work the other way around. A model with good fit (a non-significant chi-square test or acceptable values on fit indices such as CFI and RMSEA) is a possible model that can explain the data, while a model with a significant deviation from the data is rejected. The reason is that the significance test (or model fit) is used to test an actual theoretical model rather than the nil-hypothesis. This forces researchers to specify an actual set of predictions and subject them to an empirical test. Thus, SEM is ideally suited to test theories popperly.
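To make the logic of model fit more concrete, here is a minimal Python sketch of the two fit indices reported later in this post (CFI and RMSEA), computed from a model’s chi-square. It assumes the common textbook formulas; SEM programs such as MPLUS may differ in details (for example, N versus N − 1 in the RMSEA denominator), and the chi-square values below are hypothetical, not taken from the study discussed here.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (textbook form using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index: misfit of the model relative to a baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)        # model non-centrality
    d_b = max(chi2_b - df_b, d_m)        # baseline non-centrality (never below the model's)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


# Hypothetical values for illustration only.
print(round(rmsea(chi2=100.0, df=50, n=501), 3))   # 0.045
print(round(cfi(100.0, 50, 1050.0, 60), 3))        # 0.949
```

A model whose chi-square does not exceed its degrees of freedom gets RMSEA = 0 and CFI = 1; the larger the excess misfit, the worse both indices become, which is what allows a theoretical model to be rejected.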

Confirmation Bias in SEM Research

Although SEM is ideally suited to test competing theories against each other, psychology journals are not used to model comparisons and tend to publish SEM research in the same flawed confirmatory way as other research is conducted and reported. For example, an article in Psychological Science this year published an investigation of the structure of personality and the hypothesis that several personality traits are linked to a bio-marker (Wright et al., 2019).

Their preferred model assumes that the Big Five traits neuroticism, agreeableness, and conscientiousness are not independent, but systematically linked by a higher-order trait called alpha or stability (Digman, 1997; DeYoung, 2007). In their model, the stability factor is linked to a marker of the serotonin (5-HT) prolactin response. This model implies that all three traits are related to the biomarker because there are indirect paths from all three traits to the biomarker that are “mediated” by the stability factor (for technical reasons, the path goes from stability to the biomarker, but theoretically, we would expect the relationship to go the other way, from a neurological mechanism to behaviour).
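Path tracing shows why the higher-order model commits all three traits to the biomarker. A minimal sketch, assuming standardized coefficients and a single stability factor: each trait’s implied correlation with the biomarker is its loading on stability times the stability-to-biomarker path, so the model also fixes the ratios among these three correlations. The loadings and the path value are hypothetical, not the estimates of Wright et al. (2019).

```python
def implied_correlations(loadings: dict[str, float], beta: float) -> dict[tuple[str, str], float]:
    """Implied correlations in a standardized single higher-order factor model.

    Each trait loads on the higher-order factor; the factor is linked to the
    biomarker ("bio") by the path beta. By path tracing:
      corr(trait_i, trait_j) = l_i * l_j
      corr(trait_i, bio)     = l_i * beta
    """
    out: dict[tuple[str, str], float] = {}
    names = list(loadings)
    for i, a in enumerate(names):
        out[(a, "bio")] = loadings[a] * beta
        for b in names[i + 1:]:
            out[(a, b)] = loadings[a] * loadings[b]
    return out


# Hypothetical standardized values for illustration only.
lam = {"N": -0.6, "A": 0.5, "C": 0.5}
r = implied_correlations(lam, beta=0.2)
print(round(r[("N", "bio")], 2), round(r[("A", "C")], 2))  # -0.12 0.25
```

Because every trait-biomarker correlation must equal its loading times the same beta, a trait with a strong loading cannot have a weak link to the biomarker (and vice versa). In a standardized model, reversing the arrow between stability and the biomarker leaves these implied correlations unchanged, which is why the direction can be chosen for technical convenience.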

Thanks to the new world of open science, the authors shared actual MPLUS outputs of their models on OSF ( https://osf.io/h5nbu/ ). All the outputs also included the covariance matrix among the predictor variables, which made it possible to fit alternative models to the data.

Alternative Models

Another source of confirmation bias in psychology is that literature reviews fail to mention evidence that contradicts the theory that authors try to confirm. This is pervasive and by no means a specific criticism of the authors. Contrary to the claims in the article, the existence of a meta-trait of stability is actually controversial. Digman (1997) reported some SEM results that were false and could not be reproduced (cf. Anusic et al., 2009). Moreover, alpha could not be identified when the Big Five were modelled as latent factors (Anusic et al., 2009). This led me to propose that meta-traits may be an artifact of using impure Big Five scales as indicators of the Big Five. For example, if some agreeableness items have negative secondary loadings on neuroticism, the agreeableness scale is contaminated with valid variance in neuroticism. Thus, we would observe a negative correlation between neuroticism and agreeableness even across raters (e.g., self-ratings of neuroticism and informant ratings of agreeableness). Here I fitted a model with secondary loadings and independent Big Five factors to the data. I also examined the prediction that the biomarker is related to all three Big Five traits. The alternative model had acceptable fit, CFI = .976, RMSEA = .056.
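The cross-rater argument can be checked with a small calculation. This is a sketch under simplifying assumptions (standardized items, orthogonal sources of variance, unit-weighted sum scores, hypothetical loadings). A secondary loading of agreeableness items on neuroticism reflects the rated person’s neuroticism, so it also correlates with self-rated neuroticism when agreeableness is rated by an informant, whereas a rater-specific halo factor (the hypothetical sources "halo_self" and "halo_inf") only inflates correlations within one rater.

```python
import math
from itertools import product

# An item maps each source of variance to its standardized loading.
# Assumption: sources are orthogonal with unit variance; items are standardized.
Item = dict[str, float]


def item_cov(x: Item, y: Item) -> float:
    """Covariance of two distinct items: sum of loading products over shared sources."""
    return sum(x[s] * y[s] for s in x.keys() & y.keys())


def scale_var(items: list[Item]) -> float:
    """Variance of a unit-weighted sum of standardized items."""
    k = len(items)
    return k + sum(item_cov(items[i], items[j]) for i in range(k) for j in range(k) if i != j)


def scale_correlation(s1: list[Item], s2: list[Item]) -> float:
    """Correlation between two sum scores built from non-overlapping items."""
    cov = sum(item_cov(a, b) for a, b in product(s1, s2))
    return cov / math.sqrt(scale_var(s1) * scale_var(s2))


# Hypothetical: agreeableness items load -0.2 on the rated person's neuroticism.
self_n = [{"N": 0.6, "halo_self": -0.3}] * 3
self_a = [{"A": 0.6, "N": -0.2, "halo_self": 0.3}] * 3
informant_a = [{"A": 0.6, "N": -0.2, "halo_inf": 0.3}] * 3
informant_a_pure = [{"A": 0.6, "halo_inf": 0.3}] * 3

print(round(scale_correlation(self_n, self_a), 2))            # -0.32: secondary loading + halo
print(round(scale_correlation(self_n, informant_a), 2))       # -0.19: secondary loading only
print(round(scale_correlation(self_n, informant_a_pure), 2))  # 0.0: halo does not cross raters
```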

The main finding of the alternative model is that the biomarker has a significant relationship only with conscientiousness, while the relationship with agreeableness trended in the predicted direction but was not significant (p = .089), and the relationship with neuroticism was even weaker (p = .474). Aside from the question of significance, we also have to take effect sizes into account. Given the parameter estimates, the biomarker would produce very small correlations among the Big Five traits (e.g., r(A,C) = .19 * .10 = .019). Thus, even if these relationships were significant, they would not provide compelling evidence that a source of shared variance among the three traits has been identified.
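The effect-size argument relies on the path-tracing rule: with standardized coefficients, the correlation that a common source induces between two traits is the product of its paths to them. Using the two estimates cited above (.19 and .10):

```latex
r^{\text{bio}}_{AC} = b_{A}\, b_{C} = .019 \quad (.19 \times .10), \qquad \bigl(r^{\text{bio}}_{AC}\bigr)^{2} \approx .0004
```

In other words, the induced correlation corresponds to less than 0.1% of shared variance.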

The next model shows that the authors’ model ignored the stronger relationship between conscientiousness and the biomarker. When this relationship is added to the model, there is no significant relationship between the stability factor and the biomarker.

Thus, the main original finding of this study was that a serotonin-related biomarker was significantly related to conscientiousness, but not significantly related to neuroticism. This finding is inconsistent with theories that link neuroticism to serotonin and with evidence that serotonin reuptake inhibitors reduce neuroticism (at least in depressed patients). However, such results are difficult to publish because a single study with a non-significant result does not provide sufficient evidence to falsify a theory. Yet fitting data only to the preferred theory leads to confirmation bias.

The good news is that the authors were able to publish the results of an impressive study and that their data are openly available and can provide credible information for meta-analytic evaluations of structural models of personality, while the results of this study alone are inconclusive and compatible with many different theories of personality.

One way to take more advantage of these data would be to share the item-level covariance matrix. This would make it possible to model personality structure with a proper measurement model of the Big Five traits and to avoid the problem of contaminated scale scores, which is best practice for the use of structural equation models. Such models provide no evidence for Digman’s meta-traits (Schimmack, 2019a, 2019b).

In conclusion, the main point of this post is that (a) SEM can be used to test and falsify models, (b) SEM can be used to realize that data are consistent with multiple models and that better data are needed to find the better model, (c) studies of Big Five factors require a measurement model with Big Five factors and cannot rely on messy scale scores as indicators of the Big Five, and (d) personality psychologists need better training in the use of SEM.

When Personality Psychologists are High

August 13, 2019 | Personality, Personality Measurement | Higher Order Structure, Personality Structure | Ulrich Schimmack

Correction (8/31/2019): In an earlier version, I misspelled Colin DeYoung’s name. I wrote DeYoung with a small d. I thank Colin DeYoung for pointing out this mistake.

Introduction

One area of personality psychology aims to classify personality traits. I compare this activity to research in biology where organisms are classified into a large taxonomy.

In a hierarchical taxonomy, the higher levels are more abstract and less descriptive, but they also comprise a larger group of members. For example, there are more mammals (class) than dogs (species).

In the 1980s, personality psychologists agreed on the Big Five. The Big Five represent a rather abstract level of description that groups many distinct traits according to the Big Five dimension to which they are predominantly related. For example, talkative falls into the extraversion group.

To illustrate the level of abstraction, we can compare the Big Five to the levels in biology. After distinguishing vertebrate and invertebrate animals, there are five classes of vertebrate animals: mammals, fish, reptiles, birds, and amphibians. This suggests that the Big Five are a fairly high level of abstraction that covers a broad range of distinct traits within each dimension.

The Big Five were found using factor analysis or principal component analysis (PCA). PCA is a mathematical method that reduces the covariances among personality ratings to a smaller number of components. The goal of PCA is to capture as much of the variance as possible with the smallest number of components. Evidently, there is a trade-off: fewer components are simpler, but they explain less variance. However, the first components often account for most of the variance, while additional components add very little information. Using various criteria, five components seemed to account for most of the variance in personality ratings, and the first five components could be identified in different datasets. So, the Big Five were born.

One important feature of PCA is that the components are independent (orthogonal). This is helpful to maximize the information that is captured with five dimensions. If the five dimensions were correlated, they would share overlapping variance, and this redundancy would reduce the amount of explained variance. Thus, the Big Five are conceptually independent because they were discovered with a method that enforced independence.
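The trade-off and the orthogonality described above can be illustrated with a small PCA. This is a minimal pure-Python sketch (power iteration with deflation, a swapped-in textbook technique rather than what any particular statistics package does internally) applied to a hypothetical 4 × 4 correlation matrix with two clusters of items. The eigenvalues give the variance that each component captures, and each new component is extracted from the variance left over by the previous ones.

```python
import math


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_component(m: list[list[float]], iters: int = 1000) -> tuple[float, list[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    v = [1.0 + 0.1 * i for i in range(len(m))]  # arbitrary, non-degenerate start vector
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


def principal_components(r: list[list[float]], k: int) -> list[tuple[float, list[float]]]:
    """Extract k components one at a time, removing each one's variance (deflation)."""
    m = [row[:] for row in r]
    components = []
    for _ in range(k):
        lam, v = leading_component(m)
        components.append((lam, v))
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return components


# Hypothetical correlations: items 1-2 and items 3-4 form two clusters.
r = [[1.0, 0.5, 0.1, 0.1],
     [0.5, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.5],
     [0.1, 0.1, 0.5, 1.0]]
(l1, v1), (l2, v2) = principal_components(r, 2)
total = len(r)  # total variance of standardized items
print(round(l1, 3), round(l2, 3))                       # 1.7 1.3
print(round(l1 / total, 3), round(l2 / total, 3))       # 0.425 0.325
print(abs(sum(a * b for a, b in zip(v1, v2))) < 1e-9)  # True: orthogonal components
```

Two components capture 75% of the variance of four items. Because the second component is extracted from what the first leaves behind, their vectors are orthogonal and their scores are uncorrelated (v1' R v2 = lambda2 * v1'v2 = 0), which is the sense in which PCA enforces independence.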

Scale Scores are not Factors

While principal component analysis is useful to classify personality traits, it is not useful to do basic research on the causes and consequences of personality. For this purpose, personality psychologists create scales. Scales are usually created by summing items that belong to a common factor. For example, responses to the items “talkative,” “sociable,” and “reserved” are added up to create an extraversion score. Ratings of the item “reserved” are reversed so that higher scores reflect extraversion. Importantly, sum scores are only proxies of the components or factors that were identified in a factor analysis or a PCA. Thus, we need to distinguish between extraversion-factors and extraversion-scales. They are not the same thing. Unfortunately, personality psychologists often treat scales as if they were identical with factors.
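A scale score is simple arithmetic on observed responses. A minimal sketch, assuming a hypothetical 1 to 5 response format and the three extraversion items named above, shows the reverse-keying step; the result is a manifest sum score, not the latent extraversion factor.

```python
def sum_score(responses: dict[str, int], reverse_keyed: set[str], low: int = 1, high: int = 5) -> int:
    """Unit-weighted sum score; reverse-keyed items are recoded as (low + high) - x."""
    return sum((low + high - x) if item in reverse_keyed else x for item, x in responses.items())


# Hypothetical responses on a 1-5 scale.
answers = {"talkative": 4, "sociable": 5, "reserved": 2}
print(sum_score(answers, reverse_keyed={"reserved"}))  # 4 + 5 + (6 - 2) = 13
```

Every item contributes with the same weight, so any secondary loading or rating bias carried by an item is carried into the scale score as well.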

Big Five Scales are not Independent

Now something strange happened when personality psychologists examined the correlations among Big Five SCALES. Unlike the factors, which were independent by design, Big Five scales were not independent. Moreover, the correlations among Big Five scales were not random. Digman (1997) was the first to examine these correlations. The article has garnered over 800 citations.

Digman examined these correlations and conducted another principal component analysis of them. He found two factors: one for extraversion and openness, and the other for agreeableness and conscientiousness (and maybe low neuroticism). He proposed that these two factors represent an even higher level in a hierarchy of personality traits, perhaps like moving from the level of classes (mammals, fish, reptiles) to the level of phylum, a level so abstract that few people who are not biologists are familiar with it.

Digman’s article stimulated further research on higher-order factors of personality, where higher means even higher than the Big Five, which are already at a fairly high level of abstraction. Nobody stopped to wonder how there could be higher-order factors if the Big Five are actually independent factors, and why Big Five scales show systematic correlations that were not present in factor analyses.

Instead, personality psychologists speculated about the biological underpinnings of the higher-order factors. For example, Jordan B. Peterson (yes, him) and colleagues proposed that serotonin is related to higher stability (high agreeableness, high conscientiousness, and low neuroticism) (DeYoung, Peterson, & Higgins, 2002).

Rather than interpreting this finding as evidence that response tendencies contribute to correlations among Big Five scales, they interpreted it as a substantive finding about personality and society in the context of psychodynamic theories.

Only a few years later, separated from the influence of his advisor, DeYoung (2006) published a more reasonable article that used a multi-method approach to separate personality variance from method variance. This article provided strong evidence that a general evaluative bias (socially desirable responding) contributes to correlations among Big Five scales, which was formalized in Anusic et al.’s (2009) model with an explicit evaluative bias (halo) factor.

However, the idea of higher-order factors was sustained by finding cross-method correlations that were consistent with the higher-order model.

After battling with Colin, who reviewed a manuscript we submitted on halo bias in personality ratings, we were finally able to publish a compromise model that also included the higher-order factors (stability/alpha; plasticity/beta), although we had problems identifying the alpha factor in some datasets.

The Big Mistake

Meanwhile, another article built on the 2002 model, which did not control for rating biases, and proposed that the correlation between the two higher-order factors implies that there is an even higher level in the hierarchy. According to this view, a single Big Trait of personality makes people genuinely have more desirable personalities: they are less neurotic and more sociable, open, agreeable, and conscientious. Who wouldn’t want one of them as a spouse or friend? However, the 2006 article by DeYoung showed that this Big One exists only in the imagination of individuals and is not shared with perceptions by others. This finding was replicated in several datasets by Anusic et al. (2009).

Although claims about the Big One had already been invalidated when the article was published, the idea appealed to some personality psychologists. In particular, white supremacist J. Philippe Rushton found the idea of a generally good personality very attractive and spent the rest of his life promoting it (Rushton & Irwing, 2011; Rushton, Bons, & Hur, 2008). He never recognized the distinction between a personality factor, which is a latent construct, and a personality scale, which is the manifest sum score of some personality items, and he ignored the evidence from DeYoung (2006) and others (Anusic et al., 2009) that the evaluative portion of personality ratings is a rating bias and not substantive covariance among the Big Five traits.

Peterson and Rushton exemplify pseudo-science that mixes some empirical findings with grand ideas about human nature that are only loosely related to these findings. Fortunately, interest in the general factor of personality seems to be decreasing.

Higher Order Factors or Secondary Loadings?

Ashton, Lee, Goldberg, and de Vries (2009) poured some cold water on the idea of higher-order factors. They pointed out that correlations between Big Five scales may result from secondary loadings of items on Big Five factors. For example, the item adventurous may load on extraversion and openness. If the item is used to create an extraversion scale, the openness and extraversion scales will be positively correlated.

As it turns out, it is always possible to model the Big Five as independent factors with secondary loadings instead of correlations among factors. After all, this is how exploratory factor analysis (EFA) and PCA are able to account for correlations among personality items with independent factors or components. In an EFA, all items have secondary loadings on all factors, although some of these loadings may be small.
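The secondary-loading explanation can be made concrete with the implied covariance matrix of a factor model with orthogonal factors (Sigma = Lambda Lambda' + Theta). In this sketch the loadings are hypothetical: only the item adventurous loads on both extraversion and openness, yet the two unit-weighted scales become correlated while the factors themselves stay uncorrelated.

```python
import math


def implied_scale_corr(loadings: dict[str, list[float]], scale_a: list[str], scale_b: list[str]) -> float:
    """Correlation of two unit-weighted sum scores implied by a model with orthogonal factors.

    Items are standardized: an item's variance is 1, and the covariance of two
    different items is the dot product of their loading rows (Sigma = L L' + Theta).
    """
    def cov(i: str, j: str) -> float:
        if i == j:
            return 1.0
        return sum(a * b for a, b in zip(loadings[i], loadings[j]))

    def total_cov(xs: list[str], ys: list[str]) -> float:
        return sum(cov(i, j) for i in xs for j in ys)

    return total_cov(scale_a, scale_b) / math.sqrt(total_cov(scale_a, scale_a) * total_cov(scale_b, scale_b))


# Hypothetical loadings on [extraversion, openness]; the factors are uncorrelated.
lam = {
    "talkative": [0.7, 0.0],
    "sociable": [0.7, 0.0],
    "adventurous": [0.5, 0.4],  # the only secondary loading
    "curious": [0.0, 0.7],
    "imaginative": [0.0, 0.7],
}
e_items = ["talkative", "sociable", "adventurous"]
o_items = ["curious", "imaginative"]
print(round(implied_scale_corr(lam, e_items, o_items), 2))  # 0.14

lam["adventurous"] = [0.5, 0.0]  # remove the secondary loading
print(round(implied_scale_corr(lam, e_items, o_items), 2))  # 0.0
```

A single secondary loading of .4 is enough to produce a scale correlation of about .14, in the range of the scale correlations reported in this post, without any higher-order factor.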

There are only two ways to distinguish empirically between a higher-order model and a secondary-loading model. One solution is to obtain measures of the actual causes of personality (e.g., genetic markers, shared environmental factors). If there are higher-order factors, some of the causes should influence more than one Big Five dimension. The problem is that it has been difficult to identify causes of personality traits.

The second approach is to examine the number of secondary loadings. If all openness items load on extraversion in the same direction (e.g., adventurous, interest in arts, interest in complex issues), it suggests that there is a real common cause. However, if secondary loadings are unique to one item (adventurous), it suggests that the general factors are independent. This is by no means a definitive test of the structure of personality, but it is instructive to examine how many items from one trait have secondary loadings on another trait. Even more informative would be the use of facet-scales rather than individual items.
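The second approach amounts to a tally over a loading matrix. In this hypothetical sketch, for the items keyed to one trait, we count how many have a secondary loading on another trait that reaches a chosen cutoff, separately by sign. The 0.10 cutoff and the loadings are assumptions for illustration, not values from the tables below; many same-signed secondary loadings would point to a common cause, whereas one isolated loading points to item-specific overlap.

```python
def count_secondary(items: dict[str, tuple[str, dict[str, float]]], primary: str, other: str,
                    cutoff: float = 0.10) -> tuple[int, int, int]:
    """Tally items keyed to `primary` whose loading on `other` reaches the cutoff.

    items maps item name -> (keyed trait, {trait: loading}); missing loadings count as 0.
    Returns (number positive, number negative, number of items keyed to `primary`).
    """
    pos = neg = total = 0
    for trait, loadings in items.values():
        if trait != primary:
            continue
        total += 1
        x = loadings.get(other, 0.0)
        if x >= cutoff:
            pos += 1
        elif x <= -cutoff:
            neg += 1
    return pos, neg, total


# Hypothetical loadings for illustration only.
items = {
    "o1": ("O", {"O": 0.60, "E": 0.30}),
    "o2": ("O", {"O": 0.50}),
    "o3": ("O", {"O": 0.55}),
    "o4": ("O", {"O": 0.45, "E": -0.15}),
    "e1": ("E", {"E": 0.70, "O": 0.25}),
}
print(count_secondary(items, "O", "E"))  # (1, 1, 4)
```

Here one of four openness items loads positively and one negatively on extraversion, a pattern that does not suggest a common cause of both traits.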

I have examined this question in two datasets. One dataset is an online sample with items from the IPIP-100 (Johnson). The other dataset is an online sample with the BFI (Gosling and colleagues). The factor loading matrices have been published in separate blog posts and the syntax and complete results have been posted on OSF (Schimmack, 2019b; 2019c).

IPIP-100

Eight of the 16 neuroticism items show secondary loadings on agreeableness, and 4 of the 16 show secondary loadings on conscientiousness.

	Item#	N	E	O	A	C	EVB	ACQ
Neuroticism
easily disturbed	3	0.44					-0.25
not easily bothered	10	-0.58			-0.12	-0.11	0.25
relaxed most of the time	17	-0.61		0.19	-0.17		0.27
change my mood a lot	25	0.55				-0.15	-0.24
feel easily threatened	37	0.50					-0.25
get angry easily	41	0.50			-0.13
get caught up in my problems	42	0.56			0.13
get irritated easily	44	0.53			-0.13
get overwhelmed by emotions	45	0.62			0.30
stress out easily	46	0.69			0.11
frequent mood swings	56	0.59				-0.10
often feel blue	77	0.54	-0.27			-0.12
panic easily	80	0.56			0.14
rarely get irritated	82	-0.52
seldom feel blue	83	-0.41	0.12
take offense easily	91	0.53
worry about things	100	0.57			0.21	0.09
	SUM	0.83	-0.05	0.00	0.07	-0.02	-0.38	0.12

Agreeableness items show only one secondary loading on conscientiousness and one on neuroticism.

Agreeableness
indifferent to feelings of others	8				-0.58		-0.27	0.16
not interested in others’ problems	12				-0.58		-0.26	0.15
feel little concern for others	35				-0.58		-0.27	0.18
feel others’ emotions	36				0.60		0.22	0.17
have a good word for everybody	49				0.59		0.10	0.17
have a soft heart	51				0.42		0.29	0.17
inquire about others’ well-being	58				0.62		0.32	0.19
insult people	59	0.19	0.12		-0.32	-0.18	-0.25	0.15
know how to comfort others	62		0.26		0.48		0.28	0.17
love to help others	69		0.14		0.64		0.33	0.19
sympathize with others’ feelings	89				0.74		0.30	0.18
take time out for others	92				0.53		0.32	0.19
think of others first	94				0.61		0.29	0.17
	SUM	-0.03	0.07	0.02	0.84	0.03	0.41	0.09

Finally, conscientiousness items show only one secondary loading on agreeableness.

Conscientiousness
always prepared	2					0.62	0.28	0.17
exacting in my work	4		-0.09			0.38	0.29	0.17
continue until everything is perfect	26			0.14		0.49	0.13	0.16
do things according to a plan	28					0.65	-0.45	0.17
do things in a half-way manner	29					-0.49	-0.40	0.16
find it difficult to get down to work	39	0.09				-0.48	-0.40	0.14
follow a schedule	40					0.65	0.07	0.14
get chores done right away	43					0.54	0.24	0.14
leave a mess in my room	63					-0.49	-0.21	0.12
leave my belongings around	64					-0.50	-0.08	0.13
like order	65					0.64	-0.07	0.16
like to tidy up	66				0.19	0.52	0.12	0.14
love order and regularity	68				0.15	0.68	-0.19	0.15
make a mess of things	72	0.21				-0.50	-0.26	0.15
make plans and stick to them	75					0.52	0.28	0.17
neglect my duties	76					-0.55	-0.45	0.16
forget to put things back	79					-0.52	-0.22	0.13
shirk my duties	85					-0.45	-0.40	0.16
waste my time	98					-0.49	-0.46	0.14
	SUM	-0.03	-0.01	0.01	0.03	0.84	0.36	0.00

Of course, there could be additional relationships that are masked by fixing most secondary loadings to zero. However, it also matters how strong the secondary loadings are. Weak secondary loadings will produce weak correlations among Big Five scales. Even the secondary loadings in the model are weak. Thus, there is little evidence that neuroticism, agreeableness, and conscientiousness items are all systematically related as predicted by a higher-order model. At best, the data suggest that neuroticism has a negative influence on agreeable behaviors. That is, people differ in their altruism, but agreeable neurotic people are less agreeable when they are in a bad mood.

Results for extraversion and openness are similar. Only one extraversion item loads on openness.

Extraversion
hard to get to know	7		-0.45				-0.23	0.13
quiet around strangers	16		-0.65				-0.24	0.14
skilled handling social situations	18		0.65		0.13		0.39	0.15
am life of the party	19		0.64				0.16	0.14
don’t like drawing attention to self	30		-0.54			0.13	-0.14	0.15
don’t mind being center of attention	31		0.56				0.23	0.13
don’t talk a lot	32		-0.68				0.23	0.13
feel at ease with people	33	-0.20	0.64		0.16		0.35	0.16
feel comfortable around others	34	-0.23	0.65		0.15		0.27	0.16
find it difficult to approach others	38		-0.60				-0.40	0.16
have little to say	57	-0.14	-0.52				-0.25	0.14
keep in the background	60		-0.69				-0.25	0.15
know how to captivate people	61		0.49	0.29			0.28	0.16
make friends easily	73	-0.10	0.66		0.14		0.25	0.15
feel uncomfortable around others	78	0.22	-0.64				-0.24	0.14
start conversations	88		0.70		0.12		0.27	0.16
talk to different people at parties	93		0.72				0.22	0.13
	SUM	-0.04	0.88	0.02	0.06	-0.02	0.37	0.01

Likewise, only one openness item loads on extraversion, and this loading is in the opposite direction from the one predicted by the higher-order model. While open people tend to like reading challenging materials, extraverts do not.

Openness
full of ideas	5			0.65			0.32	0.19
not interested in abstract ideas	11			-0.46			-0.27	0.16
do not have good imagination	27			-0.45			-0.19	0.16
have rich vocabulary	50			0.52			0.11	0.18
have a vivid imagination	52			0.41		-0.11	0.28	0.16
have difficulty imagining things	53			-0.48			-0.31	0.18
difficulty understanding abstract ideas	54	0.11		-0.48			-0.28	0.16
have excellent ideas	55			0.53	-0.09		0.37	0.22
love to read challenging materials	70		-0.18	0.40			0.23	0.14
love to think up new ways	71			0.51			0.30	0.18
	SUM	-0.02	-0.04	0.75	-0.01	-0.02	0.40	0.09

The next table shows the correlations among the Big Five SCALES.

Scale Correlations	N	E	O	A	C
Neuroticism (N)	–
Extraversion (E)	-0.21	–
Openness (O)	-0.16	0.13	–
Agreeableness (A)	-0.13	0.27	0.17	–
Conscientiousness (C)	-0.17	0.11	0.14	0.20	–

The pattern mostly reflects the influence of the evaluative bias factor, which produces negative correlations of neuroticism with the other scales and positive correlations among the other scales. There is no evidence that extraversion and openness are more strongly correlated in the IPIP-100, which is consistent with the absence of notable secondary loadings between these traits. There is also no evidence that agreeableness and conscientiousness are more strongly related to neuroticism. Overall, these results are rather disappointing for higher-order theorists, and they show that DeYoung’s (2006) higher-order model is not consistent across different Big Five questionnaires.

Big Five Inventory

DeYoung found the higher-order factors with the Big Five Inventory. Thus, it is particularly interesting to examine the secondary loadings in a measurement model with independent Big Five factors (Schimmack, 2019b).

Neuroticism items have only one secondary loading on agreeableness and one on conscientiousness and the magnitude of these loadings is small.

	Item#	N	E	O	A	C	EVB	ACQ
Neuroticism
depressed/blue	4	0.33	-0.15	0.20			-0.48	0.06
relaxed	9	-0.72					0.23	0.18
tense	14	0.51					-0.25	0.20
worry	19	0.60	-0.08		0.07		-0.21	0.17
emotionally stable	24	-0.61					0.27	0.18
moody	29	0.43					-0.33	0.18
calm	34	-0.58	-0.04	-0.14		-0.12	0.25	0.20
nervous	39	0.52					-0.25	0.17
	SUM	0.79	-0.08	-0.01	-0.05	-0.02	-0.42	0.05

Four of the nine agreeableness items have secondary loadings on neuroticism, but the magnitude of these loadings is small. Four items also have loadings on conscientiousness, but one item (forgiving) has a loading opposite to the one predicted by the higher-order model.

Agreeableness
find faults w. others	2	0.15			-0.42		-0.24	0.19
helpful / unselfish	7				0.44	0.10	0.29	0.23
start quarrels	12	0.13	0.20		-0.50	-0.09	-0.24	0.19
forgiving	17				0.47	-0.14	0.24	0.19
trusting	22			0.15	0.33		0.26	0.20
cold and aloof	27		-0.19	0.14	-0.46		-0.35	0.17
considerate and kind	32	0.04			0.62		0.29	0.23
rude	37	0.09	0.12		-0.63	-0.13	-0.23	0.18
like to cooperate	42		0.15	-0.10	0.44		0.28	0.22
	SUM	-0.07	0.00	-0.07	0.78	0.03	0.44	0.04

For conscientiousness, only two items have a secondary loading on neuroticism and two items have a secondary loading on agreeableness.

Conscientiousness
thorough job	3					0.59	0.28	0.22
careless	8				-0.17	-0.51	-0.23	0.18
reliable worker	13			-0.09	0.09	0.55	0.30	0.24
disorganized	18			0.15		-0.59	-0.20	0.16
lazy	23					-0.52	-0.45	0.17
persevere until finished	28					0.56	0.26	0.20
efficient	33	-0.09				0.56	0.30	0.23
follow plans	38		0.10	-0.06		0.46	0.26	0.20
easily distracted	43	0.19	0.09			-0.52	-0.22	0.17
	SUM	-0.05	0.00	-0.05	0.04	0.82	0.42	0.03

Overall, these results provide no support for the higher-order model, which predicts correlations among all neuroticism, agreeableness, and conscientiousness items. These results are also consistent with the difficulty that Anusic et al. (2009) had in identifying the alpha/stability factor in a study with the BFI-S, a shorter version of the BFI.

However, Anusic et al. (2009) did find a beta factor with BFI-S scales. The present analysis of the BFI does not replicate this finding. Only two extraversion items have small loadings on the openness factor.

Extraversion
talkative	1	0.13	0.70			-0.07	0.23	0.18
reserved	6		-0.58			0.09	-0.21	0.18
full of energy	11		0.34		-0.11		0.58	0.20
generate enthusiasm	16	0.07	0.44	0.11			0.50	0.20
quiet	21		-0.81		0.04		-0.21	0.17
assertive	26	-0.09	0.40	0.14	-0.24	0.18	0.24	0.19
shy and inhibited	31	0.18	0.64				-0.22	0.17
outgoing	36		0.72		0.09		0.35	0.18

And only one openness item has a small loading that is opposite to the predicted direction. Extraverts are less likely to like reflecting.

Openness
original	5			0.53	-0.11		0.38	0.21
curious	10			0.41		-0.07	0.31	0.24
ingenious	15			0.57			0.09	0.21
active imagination	20	0.13		0.53		-0.17	0.27	0.21
inventive	25	-0.09		0.54	-0.10		0.34	0.20
value art	30	0.12		0.46	0.09		0.16	0.18
like routine work	35			-0.28	0.10	0.13	-0.21	0.17
like reflecting	40		-0.08	0.58			0.27	0.21
few artistic interests	41			-0.26			-0.09	0.15
sophisticated in art	44	0.07		0.44		-0.06	0.10	0.16
	SUM	0.04	-0.03	0.76	-0.04	-0.05	0.36	0.19

In short, there is no support for the presence of a higher-order factor that produces overlap between extraversion and openness.

The pattern of correlations among the BFI scales, however, might suggest that there is an alpha factor because neuroticism, agreeableness and conscientiousness tend to be more strongly correlated with each other than with other dimensions. This shows the problem of using scales to study higher-order factors. However, there is no evidence for a higher-order factor that combines extraversion and openness as the correlation between these traits is an unremarkable r = .18.

Scale Correlations	N	E	O	A	C
Neuroticism (N)	–
Extraversion (E)	-0.26	–
Openness (O)	-0.11	0.18	–
Agreeableness (A)	-0.28	0.16	0.08	–
Conscientiousness (C)	-0.23	0.18	0.07	0.25	–

So, why did DeYoung (2006) find evidence for higher-order factors? One possible explanation is that BFI scale correlations are not consistent across different samples. The next table shows the self-report correlations from DeYoung (2006) below the diagonal and discrepancies above the diagonal. Three of the four theoretically important correlations tend to be stronger in DeYoung’s (2006) data. It is therefore possible that the secondary loading pattern differs across the two datasets. It would be interesting to fit an item-level model to DeYoung’s data to explore this issue further.

Scale Correlations	N	E	O	A	C
Neuroticism (N)	–	0.10	0.03	-0.06	-0.08
Extraversion (E)	-0.16	–	0.07	0.01	0.03
Openness (O)	-0.08	0.25	–	-0.02	0.02
Agreeableness (A)	-0.36	0.15	0.06	–	-0.01
Conscientiousness (C)	-0.31	0.21	0.09	0.24	–
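The “three of four” comparison can be reproduced from the two tables. A minimal sketch, assuming that the four theoretically important correlations are the three among neuroticism, agreeableness, and conscientiousness (alpha) and the extraversion-openness correlation (beta); the values are copied from the present-sample table and from the lower triangle of the DeYoung (2006) table. A formal test of these differences would also require the sample sizes, which are not used here.

```python
# Correlations copied from the tables in this post.
present = {("N", "A"): -0.28, ("N", "C"): -0.23, ("A", "C"): 0.25, ("E", "O"): 0.18}
deyoung = {("N", "A"): -0.36, ("N", "C"): -0.31, ("A", "C"): 0.24, ("E", "O"): 0.25}


def stronger_in(reference: dict[tuple[str, str], float],
                comparison: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Pairs whose correlation is larger in absolute value in `reference` than in `comparison`."""
    return [pair for pair in reference if abs(reference[pair]) > abs(comparison[pair])]


pairs = stronger_in(deyoung, present)
print(pairs)       # [('N', 'A'), ('N', 'C'), ('E', 'O')]
print(len(pairs))  # 3
```

Only the agreeableness-conscientiousness correlation is not stronger in DeYoung’s data.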

In conclusion, an analysis of the BFI also does not support the higher-order model. However, results seem to be inconsistent across different samples. While this suggests that more research is needed, it is clear that this research needs to model personality at the level of items and not with scale scores that are contaminated by evaluative bias and secondary loadings.

Conclusion

Hindsight is 20/20, and after 20 years of research on higher-order factors, a lot of this research looks silly. How could there be higher-order factors for the Big Five factors if the Big Five are independent factors (or components) by default? The search for higher-order factors with Big Five scales can be attributed to methodological limitations, although higher-order models in structural equation modeling have been around since the 1980s. It is rather obvious that scale scores are impure measures and that correlations among scales are influenced by secondary loadings. However, even when this fact was pointed out by Ashton et al. (2009), it was ignored. The problem is mainly due to a lack of proper training in methods. Specifically, scales are used as indicators of factors, although scale scores contain measurement error and method variance, so that the resulting higher-order factors are method artifacts.

The fact that it is possible to recover independent Big Five factors from questionnaires that were designed to measure five independent dimensions says nothing about the validity of the Big Five model. To examine the validity of the Big Five as a model of the highest level in a taxonomy of personality traits, it is important to examine the relationship of the Big Five with the diverse population of personality traits. This is an important area of research that could also benefit from proper measurement models. This post merely focused on the search for higher-order factors for the Big Five and showed that searching for higher-order factors of independent factors is a futile endeavor that only leads to wild speculations that are not based on empirical evidence (Peterson, Rushton).

Even DeYoung and Peterson seem to have realized that it is more important to examine the structure of personality below rather than above the Big Five (DeYoung, Quilty, & Peterson, 2007). Whether 10 aspects, 16 factors (Cattell), or 30 facets (Costa & McCrae) represent another meaningful level in a hierarchical model of personality traits remains to be examined. Removing method variance and taking secondary loadings into account will be important to separate valid variance from noise. Also, factor analysis is superior to principal component analysis unless the goal is simply to describe personality with atheoretical components that capture as much variance as possible.

Correct me if you can

This blog post is essentially a scientific article without peer review. I prefer this mode of communication over submitting manuscripts to traditional journals, where a few reviewers have the power to prevent research from being published. This happened with a manuscript that Ivana Anusic and I submitted and that was killed by Colin DeYoung as a reviewer. I prefer open reviews, and I invite Colin to write an open review of this “article.” I am happy to be corrected, and any constructive comments would be a welcome contribution to advancing personality science. Simply squashing critical work so that nobody gets to see it does not advance science. The new way of conducting open science, with open submissions and open reviews, is the way to go. Of course, others are also invited to engage in the debate. So, let’s start a debate with the thesis “Higher-order factors of the Big Five do not exist.”

Replicability-Index

Improving the replicability of empirical research