Every scientist who has read Kuhn’s influential book “The Structure of Scientific Revolutions” might wonder about the long-term impact of their work. According to Kuhn, scientific progress is marked by periods of normal growth and periods of revolutionary change (Stanford Encyclopedia of Philosophy, 2004). During times of calm, scientific research is guided by paradigms. Paradigms are defined as “the key theories, instruments, values and metaphysical assumptions that guide research and are shared among researchers within a field.” Paradigm shifts occur when one or more of these fundamental assumptions are challenged and shown to be false.
Revolutionary paradigm shifts can have existential consequences for scientists who are invested in a paradigm. Just as revolutionary technologies can threaten incumbent technologies (e.g., electric vehicles), scientific research may lose its value after a paradigm shift. For example, while the general principles of operant conditioning hold, many of the specific studies with Skinner boxes lost their significance after the demise of behaviorism. Similarly, the replicability revolution invalidated many social psychological experiments on priming after it became apparent that selective publication of significant results from small samples produces results that cannot be replicated, even though replicability is a hallmark feature of science.
Personality research has seen a surprisingly long period of paradigmatic growth over the past 40 years. In the 1980s, a consensus emerged that many personality traits can be related to five higher-order factors that came to be known as the Big Five. Paradigmatic research on the Big Five has produced thousands of results that show how the Big Five are related to other trait measures, genetic and environmental causes, and life outcomes. The key paradigmatic assumption of the Big Five paradigm is that self-reports of personality are accurate measures of the Big Five traits. Aside from this basic assumption, the Big Five paradigm is surprisingly vague about other common features of paradigms. For example, no dominant theory exists about the nature of the Big Five traits (i.e., what are these dimensions?). It is also unclear why there would be five rather than four or six higher-order traits. Moreover, there is no agreement about the relationships among the Big Five (are they independent or correlated?) or the relationship of specific traits to the Big Five (e.g., is trust related to neuroticism, agreeableness, or both?). These questions can be considered paradigmatic questions that researchers aim to answer by conducting studies with self-report measures of the Big Five.
Research on personality disorders has an even longer history, with its roots in psychiatry and psychodynamic theories. However, the diagnosis of personality disorders also witnessed a scientific revolution when clinical psychologists started to examine disorders from the perspective of Big Five theories of normal personality (Millon & Frances, 1987). Psychological research on personality disorders was explicitly framed as developing an alternative model of personality disorders that would replace the model developed by psychiatrists (Widiger & Simonsen, 2005). This scientific revolution is ongoing, and the Diagnostic and Statistical Manual of Mental Disorders currently lists several approaches to the diagnosis of personality disorders.
The common assumptions of the Personality Disorder Paradigm (PDP) in Clinical Psychology are that (a) there is no clear distinction between normal and disordered personality and disorders are defined by arbitrary values on a continuum (Markon, Krueger, & Watson, 2005), (b) the Big Five traits account for most of the meaningful variance in personality disorders (Costa & McCrae, 1980), and (c) self-reports of personality disorders are valid measures of actual personality disorders (Markon, Quilty, Bagby, & Krueger, 2013).
Over the past two decades, paradigmatic research within the PDP has examined personality disorders using the following paradigmatic steps: (a) write items that are intended to measure personality disorders or maladaptive traits, (b) administer these items to a sample of participants, (c) demonstrate that these items have internal consistency and can be summed to create scale scores, and (d) examine the correlations among scale scores. These studies typically show that personality disorder scales (PDS) are correlated and that five factors can represent most, but not all, of these correlations (Kotov et al., 2017). Four of these five factors appear to be similar to four of the Big Five factors, but Openness is not represented in factor models of personality disorders and Psychoticism is not represented in the Big Five. This has led to paradigmatic questions about the relationship between Openness and Psychoticism among personality disorder researchers.
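Steps (a) through (d) can be sketched with simulated data. The item counts, loadings, and resulting values below are hypothetical illustrations I chose for the sketch, not numbers from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5000, 4  # hypothetical sample size and items per scale

def simulate_scale(trait, loading=0.6):
    """Steps (a)/(b), simulated: k items that share one latent trait."""
    return loading * trait[:, None] + rng.normal(scale=0.8, size=(n, k))

def cronbach_alpha(items):
    """Step (c): internal consistency of the sum score."""
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

t1, t2 = rng.normal(size=n), rng.normal(size=n)
scale1 = simulate_scale(t1)
scale2 = simulate_scale(0.5 * t1 + 0.87 * t2)  # partially overlapping trait

alpha = cronbach_alpha(scale1)  # ~ .69 with these hypothetical loadings
# Step (d): correlate the sum scores of the two scales (~ .35 here)
r = np.corrcoef(scale1.sum(axis=1), scale2.sum(axis=1))[0, 1]
print(round(alpha, 2), round(r, 2))
```

Note that this mono-method sketch already shows the core ambiguity: the correlation between the two scale scores could reflect overlapping traits, as simulated here, or a shared method factor, and the data themselves cannot distinguish the two.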
Another finding has been that factor analytic models that allow for correlations among factors show replicable patterns of correlations. This finding is surprising because the Big Five factors and measures of the Big Five were developed using factor models that impose independence on factors, and items were selected to be representative of these orthogonal factors. The correlations among personality scales have been the topic of various articles in the Big Five paradigm and the personality disorder paradigm. This research has led to hierarchical models of personality with a single factor at the top of the hierarchy (Musek, 2007). In the personality disorder paradigm, this factor is often called “general personality pathology,” the “general factor of personality pathology,” or simply the p-factor (Asadi, Bagby, Krueger, Pollock, & Quilty, 2021; Constantinou et al., 2022; Hopwood, Good, & Morey, 2018; Hyatt et al., 2021; McCabe, Oltmanns, & Widiger, 2022; Oltmanns, Smith, Oltmanns, & Widiger, 2018; Shields, Giljen, Espana, & Tackett, 2021; Uliaszek, Al-Dajani, & Bagby, 2015; Van den Broeck, Bastiaansen, Rossi, Dierckx, De Clercq, & Hofmans, 2014; Widiger & Oltmanns, 2017; Williams, Scalco, & Simms, 2018).
It is symptomatic of a pathological paradigm that researchers within the paradigm have uncritically accepted that the general factor in a factor analysis represents a valid construct and that alternative interpretations of this finding are ignored or dismissed with flawed arguments. Most of the aforementioned articles do not even mention alternative explanations for the general factor in self-ratings. Others mention, but dismiss, the possibility that this general factor at least partially reflects method variance in self-ratings. McCabe et al. (2022) note that “the results of the current study are consistent with, or at least don’t rule out, the social undesirability or evaluation bias hypothesis” (p. 151). They dismiss this alternative explanation with a reference to a single study from 1983 that showed “much of the variance in socially desirability scales was substantively meaningful individual differences (McCrae & Costa, 1983)” (p. 151). Notably, the authors cite several more recent articles that provided direct evidence for the presence of evaluative biases in self-ratings of personality (Anusic, Schimmack, Pinkus, & Lockwood, 2009; Backstrom, Bjorklund, & Larsson, 2009; Chang, Connelly, & Geeza, 2012; Pettersson, Turkheimer, Horn, & Menatti, 2012), but do not explain why these studies do not challenge their interpretation of the general factor in self-ratings of personality disorders.
The strongest evidence for the interpretation of the general factor as a method factor comes from multi-trait-multi-method studies (Campbell & Fiske, 1959). True traits should show convergent validity across raters. In contrast, method factors produce correlations among ratings by the same rater, but not across different raters. Most factors are likely to be a mixture of trait and method variance. Thus, it is essential to quantify the amount of method and trait variance and avoid general statements of validity (Cronbach & Meehl, 1955; Schimmack, 2021). A few studies of personality disorders have used multiple methods. However, most publications have not analyzed these data using a multi-trait-multi-method approach to separate trait and method variance. I could find only one article that modeled multi-method data (Blackburn, Donnelly, Logan, & Renwick, 2004). Consistent with multi-method studies of normal personality, the results showed modest convergent validity across raters and a clear method factor that often explained more variance in self-report scales than the trait factors. However, this finding has been ignored by subsequent researchers. To revisit this issue, I analyzed three multi-method datasets.
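The MTMM logic can be illustrated with a small simulation, in which all loadings are hypothetical values I chose for the sketch: a method factor inflates correlations among scales rated by the same rater but contributes nothing to cross-rater correlations, which reflect only the shared trait.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large n so sample correlations approximate population values

# Latent trait shared by all raters, plus a rater-specific method factor
trait = rng.normal(size=n)
method_self = rng.normal(size=n)  # e.g., evaluative bias in self-ratings

# Two self-rated scales load on the trait AND on the self-rater method factor
self_a = 0.5 * trait + 0.5 * method_self + rng.normal(scale=0.7, size=n)
self_b = 0.5 * trait + 0.5 * method_self + rng.normal(scale=0.7, size=n)
# An informant-rated scale loads on the trait only
informant_a = 0.5 * trait + rng.normal(scale=0.87, size=n)

r_same = np.corrcoef(self_a, self_b)[0, 1]        # inflated by method variance (~.50)
r_cross = np.corrcoef(self_a, informant_a)[0, 1]  # convergent validity (~.25)
print(round(r_same, 2), round(r_cross, 2))
```

A mono-method analysis of `self_a` and `self_b` alone would extract a strong general factor from their correlation, even though half of that correlation is method variance; only the cross-rater correlation isolates the trait.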
Study 1
Niemeyer, L. M., Grosz, M. P., Zimmermann, J., & Back, M. D. (2022). Assessing maladaptive personality in the forensic context: Development and validation of the Personality Inventory for DSM-5 Forensic Faceted Brief Form (PID-5-FFBF). Journal of Personality Assessment, 104(1), 30–43. https://doi.org/10.1080/00223891.2021.1923522
This study was conducted in Germany with male prisoners. Personality was measured with self-reports and informant ratings by the prisoners’ psychologist or social worker and a penal service officer. This made it possible to separate method and trait variance, where trait variance is defined as variance that is shared among the three raters.
Normal personality was assessed with a 15-item measure of the Big Five. Given the small sample, scale scores rather than items were used as indicators of normal personality. Maladaptive personality was measured with a forensic adaptation of the German version of the PID-5 Faceted Brief Form. This measure has 25 scales. Given the small sample size, the focus was on scales that can serve as indicators of four higher-order factors that are related to neuroticism, extraversion, agreeableness, and conscientiousness. A fifth psychoticism factor could not be identified in this dataset. The scales were Anxiousness, Separation Insecurity, and Depression for Neuroticism (Negative Affectivity); Withdrawal, Intimacy Avoidance, and Anhedonia for low Extraversion (Detachment); Manipulativeness, Deceitfulness, and Grandiosity for low Agreeableness (Antagonism); and Impulsivity, Irresponsibility, and Distractibility for low Conscientiousness (Disinhibition).
There are multiple ways to separate method and trait variance in hierarchical multi-trait-multi-method models. I used Anusic et al.’s (2009) approach that first modeled the hierarchical structure separately for each rater and then defined trait factors at the highest level of the hierarchy. This approach makes it possible to examine the amount of convergent validity for the higher-order factors that are the primary focus in this analysis. Additional agreement for the unique variance in facets was modeled using a bi-factor approach where additional facet factors reflect only the unique variance in facets.
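As a sketch, the first level of such a model for one trait might be specified in lavaan-style syntax as follows. The variable names are placeholders I invented, not the actual PID-5-FFBF item names, and this is only an illustration of the Anusic et al. (2009) logic, not the exact model fitted here:

```python
# Hypothetical lavaan-style specification for one trait (Neuroticism) with
# three raters (self, penal officer, clinician); indicator names are invented.
model_spec = """
# rater-specific Neuroticism factors (three facet indicators each)
N_self      =~ anx_self + sep_self + dep_self
N_officer   =~ anx_off  + sep_off  + dep_off
N_clinician =~ anx_cli  + sep_cli  + dep_cli

# trait factor: variance shared by the three rater-specific factors
Trait_N =~ N_self + N_officer + N_clinician

# bi-factor style facet factor: rater agreement in the unique facet variance
Facet_anx =~ anx_self + anx_off + anx_cli
"""
print(model_spec)
```

Method factors (e.g., an evaluative bias factor per rater) would then be added as additional orthogonal factors loading on all of one rater's indicators, as described in the models below.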
The first model assumed that there are no secondary loadings, no correlations among the four trait factors, and no correlations among the rater-specific indicators. This model had poor fit, CFI = .732, RMSEA = .079.
The second model added correlations among the four Big Five factors. The general personality pathology model predicts correlations among the four factors: Neuroticism should be negatively correlated with the other three factors, and the other three factors should be positively correlated with each other. Allowing for these correlations improved model fit, CFI = .754, RMSEA = .076, but overall fit was still poor. Furthermore, the pattern of correlations did not conform to predictions. Mainly, neuroticism was positively correlated with agreeableness, r = .24, and extraversion was negatively correlated with agreeableness, r = -.12.
Exploration of the other relationships suggested that secondary loadings accounted for some of the correlations among the trait factors. Namely, Anxiousness and Depression had negative loadings on Extraversion and Anhedonia had a secondary loading on Neuroticism (N-E); Deceitfulness had a secondary loading on Conscientiousness and Impulsivity and Irresponsibility had secondary loadings on Agreeableness (A-C); finally, Impulsivity, Irresponsibility, and Distractibility had secondary loadings on Neuroticism (N-C). Adding these secondary loadings to the measurement model of each rater improved model fit, and RMSEA suggested acceptable fit, CFI = .853, RMSEA = .060. In this model, none of the correlations among the trait factors were significant at the .01 level, and the pattern still did not conform to predictions of the g-factor model. However, it was possible to replace the correlations among the four factors with a fixed loading pattern on a general factor. This model had only slightly worse fit, CFI = .850, RMSEA = .060. Loadings on this factor ranged from .19 for Extraversion to .46 for Conscientiousness. At the same time, a model without correlations or a GFP factor fit the data equally well, CFI = .850, RMSEA = .060.
The next models examined potential method factors. Evaluative bias factors were added for each of the three raters with fixed loadings. This improved model fit, CFI = .854, RMSEA = .059. The standardized loadings for the halo factor in self-ratings ranged from .31 (Extraversion) to .67 (Conscientiousness). A general factor could not be identified for one of the informant factors, and none of the loadings on the other informant factor were significant at alpha = .01. This suggests that evaluative biases were mostly present in self-ratings. Removing the method factors for the two informants did not change model fit, CFI = .854, RMSEA = .059.
To examine the unconstrained relationships among the self-rating factors, I replaced the method factor with free correlations among the four self-rating factors. Model fit decreased slightly, indicating that the more parsimonious model with a method factor did not severely distort the pattern of correlations, CFI = .854, RMSEA = .060. The pattern of correlations matched predictions of the halo model, but only one of the correlations was significant at alpha = .01.
In conclusion, model comparisons suggested the presence of an evaluative bias factor in self-ratings of German male prisoners and provided no evidence that a general personality pathology factor produces correlations among measures of normal and maladaptive personality. Of course, these results from a relatively small sample drawn from a unique population cannot be generalized to other populations, but the results are consistent with multi-method studies of normal personality in various samples.
Study 2
Brauer, K., Sendatzki, R. & Proyer, R. T. (2022). Localizing gelotophobia, gelotophilia, and katagelasticism in domains and facets of maladaptive personality traits: A multi-study report using self- and informant ratings. Journal of Research in Personality, 98, No. 104224.
This study examined personality disorders in a German community sample. For each target, one close other provided informant ratings. Personality disorders were assessed with the German version of the brief PID-5 that is designed to measure five higher-order dimensions of personality pathology, namely Negative Affectivity (Neuroticism), Detachment (low Extraversion), Antagonism (low Agreeableness), Disinhibition (low Conscientiousness), and Psychoticism (not represented in the Big Five model of normal personality).
With only two methods, it is necessary to make assumptions about the validity of each rater. A simple way of doing so is to constrain the unstandardized loadings of self-ratings and informant ratings to be equal, under the assumption that self-ratings and informant ratings are approximately equally valid. The first model assumed that there is no method variance and that the five factors are independent. This model had poor fit, CFI = .295, RMSEA = .236. I then allowed the five factors to correlate freely with each other. Model fit improved, but remained low, CFI = .605, RMSEA = .204. The pattern of correlations conformed to the predictions of the general personality pathology model. I then added method factors for self-ratings and informant ratings to the model. Loadings on these factors were constrained to be equal for all five scales. This modification improved model fit considerably, CFI = .957, RMSEA = .067. Loadings on the method factor were substantial (> .4). Furthermore, several of the trait correlations were no longer significant at the .01 level, suggesting that some of these correlations were spurious and reflected unmodeled method variance.
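The equal-validity assumption implies a simple relationship: if both raters measure a trait with the same standardized validity v, the expected cross-rater correlation for a scale is v squared, so v can be recovered as the square root of the convergent correlation. The value below is a hypothetical illustration, not a correlation from this study:

```python
import math

# Hypothetical convergent correlation between self- and informant ratings
# of one scale (chosen for illustration only).
r_convergent = 0.40

# Under the equal-validity assumption, r_convergent = v * v, hence:
validity = math.sqrt(r_convergent)
print(round(validity, 2))  # 0.63
```

This is why the assumption matters: with two methods the data only identify the product of the two validities, and the equal-validity constraint splits it symmetrically.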
The next model examined whether some of the variance in the method factors reflected an actual g-factor of personality pathology. To do so, I removed the correlations among the trait factors and let the two method factors correlate. A correlation between these method factors can be interpreted as convergent validity for independent measures of the g-factor. This modification produced a reduction in model fit, CFI = .865, RMSEA = .107, and showed a significant correlation between the two method factors, r = .33. This finding suggests that one-third of the variance in these factors may reflect a real g-factor. However, model fit suggests that this model misrepresents the actual pattern of correlations in the data. Exploratory analyses suggested that Extraversion and Psychoticism were negatively related to Conscientiousness and that Psychoticism had a stronger loading on the method factor. Adding these modifications raised model fit to acceptable levels, CFI = .957, RMSEA = .064. In this model the two method factors remained correlated, but the confidence interval shows that a substantial amount of the variance is unique to each method factor, r = .27, 95% CI = .08 to .44, although the lower bound of the confidence interval is close to zero. In sum, these results provide further evidence that the general factor in self-ratings of personality pathology partially reflects method variance rather than a general disposition to have many personality disorders.
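The interpretation of the factor correlation as shared g-factor variance rests on simple variance arithmetic: if both standardized method factors load equally on a common g-factor, the implied loading is the square root of their correlation, and the proportion of variance each factor shares with g equals the correlation itself.

```python
import math

r = 0.33  # correlation between the two method factors reported above
loading_on_g = math.sqrt(r)  # implied loading of each method factor on g
shared = loading_on_g ** 2   # proportion of variance attributable to g (= r)
unique = 1 - shared          # method-specific variance in each factor
print(round(loading_on_g, 2), round(shared, 2), round(unique, 2))  # 0.57 0.33 0.67
```

Under this equal-loading reading, two-thirds of the variance in each method factor remains rater-specific, which is why the correlation alone cannot rescue a substantive g-factor interpretation.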
Study 3
Oltmanns, J. R., & Widiger, T. A. (2021). The self- and informant-personality inventories for ICD-11: Agreement, structure, and relations with health, social, and satisfaction variables in older adults. Psychological Assessment, 33(4), 300–310. https://doi.org/10.1037/pas0000982
The data of this study are from a longitudinal study of personality disorders. Participants nominated close relatives who provided informant ratings. Personality disorders were measured using the Personality Inventory for ICD-11 and an informant version of the same questionnaire. The questionnaire assesses five dimensions. Four of these dimensions correspond to the Big Five, namely Negative Affectivity (Neuroticism), Detachment (low Extraversion), Dissociality (low Agreeableness), and Disinhibition (low Conscientiousness). The fifth dimension is called Anankastia, which may be related to a maladaptive form of high Conscientiousness (e.g., perfectionism).
I used the published MTMM matrix in Table 2 to examine the presence of a general personality factor and method variance. Like the original authors, I fitted a four-factor model with Disinhibition and Anankastia as opposite indicators of a bipolar Conscientiousness factor. The four factors were clearly identified, but the model without method factors and correlations among the four factors did not fit the data, CFI = .279, RMSEA = .243. Allowing for correlations among the four factors improved model fit, but fit remained poor, CFI = .443, RMSEA = .232. The pattern of correlations was consistent with the p-factor predictions. The next model added method factors for self-ratings and informant ratings. All loadings except those for the Anankastia scales were fixed. The Anankastia loadings were left free because high conscientiousness is desirable and should load less on a factor that reflects undesirable content in other scales. The inclusion of method factors improved model fit, CFI = .844, RMSEA = .131, but fit was not acceptable. All scales except the Anankastia scales had notable (> .4) loadings on the method factors. Only the trait correlation between Agreeableness and Conscientiousness was significant at alpha = .01, r = .22. The next model removed all of the other correlations and allowed for a correlation between the two method factors. This model had similar fit, CFI = .845, RMSEA = .123. The correlation between the two method factors was r = .24. Exploratory analysis showed rater-specific correlations between the Disinhibition and Anankastia (i.e., low and high conscientiousness) scales. Adding these parameters to the model improved model fit, CFI = .920, RMSEA = .091, but did not alter the correlation between the two method factors, r = .26. Freeing the loading of the Negative Affectivity scale on the method factors further improved model fit, CFI = .948, RMSEA = .076, but did not alter the correlation between the two method factors.
Freeing the loading of the Disinhibition scales on the method factors further improved model fit, CFI = .972, RMSEA = .058, but the correlation between the two method factors remained the same, r = .24. The 95% confidence interval ranged from .14 to .33. These results are consistent with Study 2.
General Discussion
Factor analyses of personality disorder questionnaires have suggested the presence of one general factor that predicts higher scores on all measures of maladaptive personality traits. A major limitation of these studies was the reliance on a single method, typically self-reports. Mono-method studies are unable to distinguish between method and trait variance (Campbell & Fiske, 1959). Although a few studies have used multiple methods to measure personality traits, personality disorder researchers did not analyze these data with multi-method models. I provide results of multi-method modeling of three datasets. All three datasets show a method factor in self-reports of personality disorders that is either independent of informant ratings by observers (Study 1) or only weakly related to informant ratings by close others (Studies 2 and 3). These results are consistent with the presence of method factors in self-ratings of normal personality (Anusic et al., 2009; Biesanz & West, 2004; Chang et al., 2012). It is therefore not surprising that the same results were obtained. However, it is surprising that personality disorder researchers have ignored this evidence. The reason does not appear to be a lack of awareness that multi-method data are important. For example, nearly a decade ago several prominent personality disorder researchers noted that “because most of these studies (including our own study) are based on self-report measures, substantial parts of the multitrait-multimethod matrix currently remain unexplored” (Zimmermann, Altenstein, Krieger, Holtforth, Pretsch, Alexopoulus, Spitzer, Benecke, Krueger, Markon, & Leising, 2014). One possible explanation for the lack of multi-method analyses might be that they threaten the construct validity of personality disorder instruments if a substantial portion of the variance in personality disorder scales reflects method factors.
Nearly two decades ago, Blackburn and colleagues fitted a multi-method model of personality and showed that method factors explained more variance than trait factors (Blackburn, Donnelly, Logan, & Renwick, 2004). However, this article received only 43 citations, whereas articles that interpret method variance as a general trait have garnered over 1,000 citations (Kotov et al., 2017). The uncritical reliance on self-ratings reveals a pathological paradigm that is built on false assumptions. Self-reports of personality disorders are not highly valid measures of personality disorders. Even if scale scores are internally consistent and reliable over time, they cannot be accepted as unbiased measures of actual pathology. The limitations of self-reports are well known in other domains and have led to reforms in clinical assessments of other disorders. For example, the DSM-5 now explicitly states that the diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) requires assessment with symptom ratings by at least two raters (Martel, Schimmack, Nikolas, & Nigg, 2015). The present results show the importance of a multi-rater approach to assess personality disorders.
In contrast, the assessment of personality disorders in the DSM-5 is not clearly specified. A traditional system is still in place, but two alternative approaches are also mentioned. One approach is called Criterion A. It is based on the assumption that distinct personality disorders are highly correlated and that it is sufficient to measure a single dimension that is assumed to reflect severity of dysfunction (Sharp & Wall, 2021). A popular self-report measure of Criterion A is the Levels of Personality Functioning Scale (LPFS; Morey et al., 2011). Aside from conceptual problems, it has been shown that the LPFS is nearly identical to measures of evaluative bias in self-ratings of normal personality (Schimmack, 2022). The present results show that most of this variance is rater-specific and reflects method factors. Thus, there is currently no evidence for the construct validity of measures of general personality functioning. Unless such evidence can be provided with multi-method data, Criterion A should be removed from the next version of the DSM.
Meta-Psychological Reflections
Psychology is not a paradigmatic science that rests on well-established assumptions. Instead, psychology is best characterized as a collection of mini-paradigms that are based on assumptions that are not shared across paradigms. For example, many experimentalists would reject evidence based on cross-sectional studies of self-reports. In contrast, research on personality disorders rests nearly entirely on the assumption that self-reports provide valid information about personality disorders. While most researchers would probably acknowledge that method factors exist, there are no scientific attempts to assess and minimize their impact. Instead, method effects are downplayed by false appeals to the validity of self-reports. For example, without any empirical evidence and in total disregard of existing evidence, Widiger and Oltmanns (2017) state “It is evident that most persons are providing reasonably accurate and
honest self-descriptions. It would be quite unlikely that such a large degree of variance would reflect simply impression management” (p. 182). The need for a multi-method assessment is often acknowledged in the limitations section and delegated to future research that is never conducted, even if these data are available (Asadi, Bagby, Krueger, Pollock, & Quilty, 2021). These examples are symptomatic of a pathological paradigm in need of a scientific revolution. Unlike other pathological paradigms in psychology, the assessment of mental illnesses has huge practical implications that require a thorough examination of the assumptions that underpin clinical diagnoses. At present, the diagnosis of personality disorders is not based on scientific evidence, and claims about the validity of personality disorder measures are unscientific. Most likely, a valid assessment of personality disorders requires a multi-method approach that follows the steps outlined by Cronbach and Meehl (1955). To make progress, clinical psychologists need better training in psychometrics, and journals need to prioritize multi-method studies over quick and easy studies that rely exclusively on self-reports with online samples (e.g., Hopwood et al., 2018). Most importantly, it is necessary to overcome the human tendency to confirm prior beliefs and to allow empirical data to disconfirm fundamental assumptions. It is also important to listen to scientific criticism, even if it challenges fundamental assumptions of a paradigm. Too often, scientists take such criticisms as personal attacks because their self-esteem and identity are wrapped up in their paradigmatic achievements. As a result, threats to the paradigm become existential threats. A healthy distinction between self and work is needed to avoid defensive reactions to valid criticism. Clinical psychologists should be the first to realize the importance of this recommendation for the long-term well-being of their science and their own well-being.