The first measures of the Big Five used a large number of items to measure personality. This made it difficult to include personality measures in studies as the assessment of personality would take up all of the survey time. Over time, shorter scales became available. One important short Big Five measure is the BFI-S (Lang et al., 2011). This 15-item measure has been used in several national representative, longitudinal studies such as the German Socio-Economic Panel (Schimmack, 2019a). These results provide unique insights into the stability of personality (Schimmack, 2019b) and the relationship of personality with other constructs such as life-satisfaction (Schimmack, 2019c). Some of these results overturn textbook claims about personality. However, critics argue that these results cannot be trusted because the BFI-S is an invalid measure of personality.
Thus, it is is critical importance to evaluate the validity of the BFI-S. Here I use Gosling and colleagues data to examine the validity of the BFI-S. Previously, I fitted a measurement model to the full 44-item BFI (Schimmack, 2019d). It is straightforward to evaluate the validity of the BFI-S by examining the correlation of the 3-item BFI-S scale scores with the latent factors based on all 44 BFI items. For comparison purposes, I also show the correlations for the BFI scale scores. The complete results for individual items are shown in the previous blog post (Schimmack, 2019d).
The measurement model for the BFS has seven independent factors. Five factors represent the Big Five and two factors represent method factors. One factor represents acquiescence bias. The other factor represents evaluative bias that is present in all self-ratings of personality (Anusic et al., 2009). As all factors are independent, the squared coefficients can be interpreted as the amount of variance that a factor explains in a scale score.
The results show that the BFI-S scales are nearly as valid as the longer BFI scales (Table 1).
For example, the factor-scale correlations for neuroticism, extraversion, and agreeableness are nearly identical. The biggest difference was observed for openness with a correlation of r = .76 for the BFI-scale and r = .66 for the BFI-S scale. The only other notable systematic variance in scales is the evaluative bias influence which tends to be stronger for the longer scales with the exception of conscientiousness. In the future, measurement models with an evaluative bias factor can be used to select items with low loadings on the evaluative bias factor to reduce the influence of this bias on scale scores. Given these results, one would expect that the BFI and BFI-S produce similar results. The next analyses tested this prediction.
I examined gender differences three ways. First, I examined standardized mean differences at the level of latent factors in a model with scalar invariance (Schimmack, 2019d). Second, I computed standardized mean differences with the BFI scales. Finally, I computed standardized mean differences with the BFI-S scales. Table 2 shows the results. Results for the BFI and BFI-S scales are very similar. The latent mean differences show somewhat larger differences for neuroticism and agreeablness because these mean differences are not attenuated by random measurement error. The latent means also show very small gender differences for the method factors. Thus, mean differences based on scale scores are not biased by method variance.
Table 2. Standardized Mean Differences between Men and Women
Note. Positive values indicate higher means for women than for men.
In short, there is no evidence that using 3-item scales invalidates the study of gender differences.
I demonstrated measurement invariance for different age groups (Schimmack, 2019d). Thus, I used simple correlations to examine the relationship between age and the Big Five. I restricted the age range from 17 to 70. Analyses of the full dataset suggest that older respondents have higher levels of conscientiousness and agreeableness (Soto, John, Gosling, & Potter, 2011).
Table 3 shows the results. The BFI and the BFI-S both show the predicted positive relationship with conscientiousness and the effect size is practically identical. The effect size for the latent variable model is stronger because the relationship is not attenuated by random measurement error. Other relationships are weaker and also consistent across measures except for Openness. The latent variable model reveals the reason for the discrepancies. Three items (#15 ingenious, #l35 like routine work, and #10 sophisticated in art) showed unique relationships with age. The art-related items showed a unique relationship with age. The latent factor does not include the unique content of these items and shows a positive relationship between openness and age. The scale scores include this content and show a weaker relationship. The positive relationship of openness with age for the latent factor is rather surprising as it is not found in nationally representative samples (Schimmack, 2019b). One possible explanation for this relationship is that older individuals who take an online personality test are more open.
In sum, the most important finding is that the 3-item BFI-S conscientiousness scale shows the same relationship with age as the BFI-scale and the latent factor. Thus, the failure to find aging effects in the longitudinal SOEP data with the BFI-S cannot be attributed to the use of an invalid short measure of conscientiousness. The real scientific question is why the cross-sectional study by Soto et al. (2011) and my analysis of the longitudinal SOEP data show divergent results.
Science has changed since researchers are able to communicate and discuss research findings on social media. I strongly believe that open science outside of peer-controlled journals is beneficial for the advancement of science. However, the downside of social media of open science is that it becomes more difficult to evaluate expertise of online commentators. True experts are able to back up their claims with scientific evidence. This is what I did here. I showed that Brenton Wiernik’s comment has as much scientific validity as a Donald Trump tweet. Whatever the reason for the lack of personality change in the SOEP data will be, it is not the use of the BFI-S to measure the Big Five.
2 thoughts on “How Valid are Short Big-Five Scales?”