In this blog post (pre-print), I examine the construct validity of the Elemental Psychopathy Assessment Super-Short Form (EPA-SSF) using Rose et al.’s (2022) open data. I examine construct validity by means of structural equation modeling. I find that the proposed three-factor structure does not fit the data, and I find support for a four-factor structure. I also find evidence for a fifth factor that reflects a tendency to endorse desirable traits more and undesirable traits less. Most of the reliable variance in the scale scores is predicted by this general factor, whereas substantive traits play a small role. I also show that the general factor contributes to the prediction of self-reported criminal behaviors. I find no evidence to support the inclusion of Emotional Stability in the definition of psychopathy. Finally, I raise theoretical objections to the use of sum scores to measure multi-trait constructs. Based on these concerns, I argue that the EPA-SSF is not a valid measure of psychopathy and that results based on this measure do not contribute to a nomological net surrounding the construct of psychopathy.
Measurement combines invention and discovery. The invention of the microscope made it possible to see germs and to discover the causes of many diseases. Turning a telescope to the skies allowed Galileo to make new astronomical discoveries. Psychology emerged as a scientific discipline in the late 19th century, and its history is marked by the development of psychological measures. Today, the science of psychological measurement is called psychometrics. Unfortunately, psychometrics is not a basic, fundamental part of mainstream psychological science. Instead, it is mostly taught in education departments and used for the applied purposes of educational testing. As a result, many psychologists who use measures in their research have very little understanding of psychological measurement.
For any measure to be able to discover new things, it has to be valid. That is, the numbers produced by a measure should mostly reflect variation in the actual objects being examined. Science progresses when new measures are invented that can produce more accurate, detailed, and valid information about the objects being studied. For example, advances in technology have created powerful microscopes and telescopes that can measure objects at the scale of nanometers and galaxies billions of light-years away. In contrast, psychological measures are more like kaleidoscopes: they show pretty images, but these images do not reflect actual objects in the real world. While this criticism may be harsh, it is easily supported by the simple fact that psychologists do not quantify the validity of their measures and that there are often multiple measures that claim to measure the same construct even though they are only moderately correlated. For example, at least eight different measures claim to measure narcissism without a clear definition of narcissism and without validity information that would make it possible to pick the best measure of narcissism (Schimmack, 2022).
A fundamental problem in psychological science is the way scientific findings are produced. Typically, a researcher has an idea, conducts a study, and then publishes the results if they support the initial idea. This bias is easily demonstrated by the fact that 95% of articles in psychology journals report results that support researchers’ ideas, which is an unrealistically high success rate (Sterling, 1959; Sterling et al., 1995). Journals are also reluctant to publish work that is critical of previous articles, especially if these articles are highly cited, and authors are often invited to serve as expert reviewers of work that criticizes their own work. It would take superhuman strength to be impartial in these reviews, and these self-serving reviews are often the death of critical work. Thus, psychological science lacks the basic mechanism that drives scientific progress: falsification of bad theories, or learning from errors. Evidence for this lack of self-correction, a necessary element of science, emerged during the past decade in what has been called the replication crisis, when researchers dared to publish replication failures of well-known findings. However, while the replication crisis has focused on empirical tests of hypotheses, criticism of psychological measures has remained relatively muted (Flake & Fried, 2020). It is time to take the critical attitude that fueled the replication crisis and apply it to psychological measurement. I predict that many existing measures lack sufficient construct validity or are redundant with other measures. As a result, progress in psychological measurement would be marked by a consolidation of measures based on a comparison of their construct validity. As one of my favorite psychologists once observed in a different context, in science "less is more" (Cohen, 1990), and this is also true for psychological measurement. While cuckoo clocks are fun, they are not used for the scientific measurement of time.
A very recent article reviewed the literature on psychopathy (Patrick, 2022). The article describes psychopathy as a combination of three personality traits.
A conceptual framework that is helpful for assimilating different theoretical perspectives and integrating findings across studies using different measures of psychopathy is the triarchic model (Patrick et al. 2009, Patrick & Drislane 2015b, Sellbom 2018). This model characterizes psychopathy in terms of three trait constructs that correspond to distinct symptom features of psychopathy but relate more clearly to biobehavioral systems and processes. These are (a) boldness, which encompasses social dominance, venturesomeness, and emotional resilience and connects with the biobehavioral process of threat sensitivity; (b) meanness, which entails low empathy, callousness, and aggressive manipulation of others and relates to biobehavioral systems for affiliation (social connectedness and caring); and (c) disinhibition, which involves boredom proneness, lack of restraint, irritability, and irresponsibility and relates to the biobehavioral process of inhibitory control. (p. 389).
This definition of psychopathy raises several questions about the relationship of boldness, meanness, and disinhibition to psychopathy that are important for its valid measurement. First, psychopathy is clearly a formative construct. That is, psychopathy is not a common cause of boldness, meanness, and disinhibition, and the definition imposes no restrictions on the correlations among the three traits. Boldness could be positively or negatively correlated with meanness, or the two could be independent. In fact, models of normal personality would predict that these three dimensions are relatively independent, because boldness is related to extraversion, meanness to low agreeableness, and disinhibition to low conscientiousness, and these three broader traits are independent. Thus, defining psychopathy as a combination of three relatively independent traits implies that psychopaths are characterized by high levels of all three traits. This definition raises the question of how information about the three traits should be combined to produce a valid score that reflects psychopathy. In practice, however, scores on these dimensions are often averaged without a clear rationale for this scoring method.
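To make the scoring problem concrete, here is a small Python sketch that contrasts three hypothetical ways of turning trait scores into a psychopathy classification: a conjunctive rule (high on all traits, which is what the combination definition implies), a disjunctive rule (high on any trait), and the common practice of averaging. The 1 SD cut-off and the three profiles are invented for illustration; they are not taken from any EPA scoring manual or from Rose et al.’s data.

```python
from statistics import mean

# Hypothetical cut-off (in SD units) for calling a trait score "high".
THRESHOLD = 1.0

def conjunctive(scores, cut=THRESHOLD):
    """High on ALL traits: the 'combination' reading of the triarchic definition."""
    return all(s >= cut for s in scores)

def disjunctive(scores, cut=THRESHOLD):
    """High on ANY trait: an either/or reading of the definition."""
    return any(s >= cut for s in scores)

def average_rule(scores, cut=THRESHOLD):
    """Mean of the trait scores at or above the cut-off: the usual sum-score practice."""
    return mean(scores) >= cut

# Hypothetical z-scores on (boldness, meanness, disinhibition).
profiles = {
    "high on all three": (1.5, 1.5, 1.5),
    "extreme on one trait": (3.0, 0.5, 0.0),
    "moderately high on two": (1.2, 1.2, 0.3),
}

for label, scores in profiles.items():
    print(f"{label:24s} AND={conjunctive(scores)!s:5s} "
          f"OR={disjunctive(scores)!s:5s} MEAN={average_rule(scores)}")
```

Under these assumptions, the person who is extreme on one trait but unremarkable on the others exceeds the cut-off on the average without being high on all three traits, whereas the person who is moderately high on two traits is missed by the average but flagged by the either/or rule. Averaging therefore implements neither reading of the definition.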
Patrick’s (2022) review also points out that multiple measures aim to measure psychopathy with self-reports: “multiple scale sets exist for operationalizing biobehavioral traits corresponding to boldness, disinhibition, and meanness in the modality of self-report (denoted in Figure 3 by squares labeled with subscript-numbered S’s)” (p. 405). It is symptomatic of the lack of measurement theory that Patrick uses the term operationalize instead of measure, because psychometricians rejected the notion of operational measurement more than 50 years ago (Cronbach & Meehl, 1955). The problem with operationalism is that every measure is by definition a valid measure of a construct, because the construct is essentially defined by the measurement instrument. Accordingly, any psychopathy measure would be a valid measure of psychopathy, and if different measures produce different scores, they would simply measure different forms of psychopathy. However, few researchers would be willing to accept that their measure is just an arbitrary collection of items without a claim to measure something that exists independently of the measurement instrument. Yet, they also fail to provide evidence that their measure is a valid measure of psychopathy.
Here, I examine the construct validity of one self-report measure of psychopathy, namely the 18-item super-short form of the Elemental Psychopathy Assessment (EPA; Lynam, Gaughan, Miller, Miller, Mullins-Sweatt, & Widiger, 2011; Collison, Miller, Gaughan, Widiger, & Lynam, 2016), using the open data shared by researchers who used this measure (Rose, Crowe, Sharpe, Til, Lynam, & Miller, 2022).
Rose et al.’s description of the EPA is brief.
The EPA-SSF (Collison et al., 2016) yields a total psychopathy score (alpha = .70/.77) as well as scores for each of three subscales: Antagonism (alpha = .61/.72), Emotional Stability (alpha = .66/.65), and Disinhibition (alpha = .68/.71).
The description suggests that the EPA-SSF aims to measure psychopathy as a combination of three traits, although boldness (high Extraversion) is replaced with Emotional Stability (low Neuroticism).
Based on their empirical findings, Rose et al. (2022) conclude that two of the three traits predict the negative outcomes that are typically associated with psychopathy: “It is the ATM Antagonism and Impulsivity [Disinhibition] domains that are most responsible for psychopathy, narcissism, and Machiavellianism’s more problematic correlates – antisocial behavior, substance use, aggression, and risk taking” (p. 10). In contrast, Emotional Stability and boldness are associated with more adaptive outcomes: “Conversely, the Emotional Stability and Agency factors are more responsible for the more adaptive aspects including self-reported political and interpersonal skill” (p. 11).
This observation might be used to modify the construct of psychopathy in an iterative process known as construct validation (Cronbach & Meehl, 1955). In this process, disconfirming evidence can be attributed to problems with a measure or problems with a construct. In the present case, the initial assumption appears to be that psychopaths have to be low in Neuroticism, or bold, to commit horrible crimes. Yet, the evidence suggests that there can also be neurotic psychopaths who are violent, and maybe the cause of violence is a combination of high neuroticism (especially anger) and low conscientiousness (lack of impulse control). We might therefore limit the construct of psychopathy to low agreeableness and low conscientiousness, which would be consistent with some older models of psychopathy (van Kampen, 2009). Even this definition of psychopathy can be critically examined, given the independence of these two traits. If the actual personality factors underlying antisocial behaviors are independent, we might want to focus on these independent causes. The term psychopath would then be akin to the word girl: just as girl describes the combination of two independent attributes (young and female), psychopath would describe the combination of being disagreeable and impulsive. The term psychopath does not add anything to the theoretical understanding of antisocial behaviors because it is defined as nothing more than being mean and impulsive.
Does the EPA-SSF measure Antagonism, Emotional Stability, and Disinhibition?
The EPA was based on the assumption that psychopathy is related to 18 specific personality traits and that these 18 traits are related to four of the Big Five dimensions. Empirical evidence supported this assumption. Five traits were related to low Neuroticism, namely Unconcerned, Self-Contentment, Self-Assurance, Impulsivity, and Invulnerability, and one was related to high Neuroticism (Anger). Evidently, a measure that combines items reflecting the high and the low pole of a factor is not a good measure of that factor. Another problem is that several of these scales had notable secondary loadings on other Big Five factors. Anger loaded more strongly, and negatively, on Agreeableness than on Neuroticism, and Self-Assurance loaded more highly on Extraversion. Thus, it is misleading to refer to the Emotional Stability scale as a measure of Emotional Stability. If the theoretical model assumes that Emotional Stability is a component of psychopathy, it would be sufficient to use a validated measure of Emotional Stability to measure this component. Presumably, the choice of different items was motivated by the hypothesis that the specific item content of the EPA scales adds to the measurement of psychopathy. In that case, however, it is misleading to ignore this content in the description of the measure and to focus on the shared variance among items.
Another six traits loaded negatively on Agreeableness, namely Distrust, Manipulation, Self-Centeredness, Opposition, Arrogance, and Callousness, and these six scales were good indicators of Agreeableness. A minor problem is that this scale is called Antagonism, a common term among personality disorder researchers. Antagonism and Agreeableness are generally understood to be strongly negatively correlated, and there is no evidence of discriminant validity between them. Thus, it may be confusing to give this factor a different name when the name merely refers to the low end of Agreeableness (Disagreeableness). Aside from this terminological confusion, it is an open question whether the specific item content of the Antagonism scale adds to the definition of psychopathy. For example, the item “I could make a living as a con artist” may not just be a measure of Agreeableness, but may also measure specific aspects of psychopathy.
Another three constructs were clearly related to low Conscientiousness, namely Disobliged, Impersistence, and Rashness. A problem arises when each of these constructs is measured with a single item, as in the EPA-SSF, because exploratory factor analysis may fail to identify a factor that has only three indicators, especially when factors are not independent. Once again, calling this factor Disinhibition can create confusion if it is not clearly stated that Disinhibition is merely a label for low Conscientiousness.
Most surprising is the finding that the last three constructs were unrelated to the three factors that the EPA is supposed to capture. Coldness was related to low Extraversion and low Agreeableness, Dominance was related to high Extraversion and low Agreeableness, and Thrill-Seeking had low loadings on all Big Five factors. It is not clear why these items would be retained in a measure of psychopathy unless it is assumed that the specific content of these scales adds to the measurement, and therefore the operational definition, of psychopathy.
In conclusion, the EPA is based on a theory that psychopathy is a multidimensional construct that reflects the influence of 18 narrow personality traits. Although these narrow traits are not independent and are related to four of the Big Five factors, the EPA psychopathy scale is not identical to a measure that combines Emotional Stability, low Agreeableness, and low Conscientiousness.
Lynam et al. (2011) also examined how the 18 scales of the EPA are related to other measures of antisocial behavior. Most notably, none of the low Neuroticism scales was related to antisocial behavior. The only Neuroticism-related scale that was a predictor was Anger, but Anger reflects not only high Neuroticism but also low Agreeableness. These results raise questions about the inclusion of Emotional Stability in the definition of psychopathy. Yet, the authors conclude that “overall, the EPA appears to be a promising new instrument for assessing the smaller, basic units of personality that have proven to be important to the construct of psychopathy across a variety of epistemological approaches” (p. 122). It is unclear what evidence could have convinced the authors that their newly created measure is not a valid measure of psychopathy or that their initial speculation about the components of psychopathy was wrong. The continued use of the 18-item scale in 2022 suggests that the authors have not found evidence to revise their theory of psychopathy or to improve its measurement. It is therefore important to examine the construct validity of the EPA critically from an independent perspective. I focus on the 18-item EPA-SSF because Rose et al. (2022) used this scale and I was able to use their open data.
Collison et al. (2016) conducted exploratory factor analyses (EFA) with Promax rotation to examine the factor structure of the 18-item EPA-SSF. Although Lynam et al. (2011) had demonstrated that the traits were related to four of the Big Five dimensions, Collison et al. favored a three-factor solution. One problem with exploratory analyses is that, as typically reported, they provide no evidence about the fit of a model to the data. Another problem is that factor solutions are atheoretical and influenced by item selection and arbitrary rotations. This might explain why the factor solution did not identify the expected factors. I replicated Collison et al.’s EFAs with Rose et al.’s (2022) data from Study 1 and Study 2. I conducted these analyses in Mplus, which provides fit indices that can be used to evaluate the fit of a model to the data. I used Geomin rotation, the Mplus default, because Mplus reports more fit indices for this method, and the fit index that is reported for both methods (RMSEA) is identical because rotation does not change model fit. Evidently, the choice of a rotation method has no influence on the validity of the results, because neither rotation method is based on substantive theory about psychopathy.
The results are consistent across the two datasets. RMSEA and CFI favor five factors, whereas BIC, the criterion that favors parsimony the most, favors four factors. A three-factor model does not have bad fit, but it fails to capture some of the structure in the data.
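For readers unfamiliar with these indices, the following Python sketch shows how RMSEA, CFI, and BIC are commonly computed from a model’s chi-square, degrees of freedom, log-likelihood, and number of free parameters, and why they can disagree. All chi-square and log-likelihood values are hypothetical, chosen only so that the pattern resembles the one described above; only the degrees of freedom and parameter counts follow the standard formulas for a maximum-likelihood EFA of 18 items. Software packages differ in details (for example, some use N rather than N - 1 in RMSEA), so this is an illustration, not a reproduction of Mplus output.

```python
import math

def efa_df(p, k):
    """Degrees of freedom of a k-factor ML EFA model for p observed variables."""
    return ((p - k) ** 2 - (p + k)) // 2

def rmsea(chi2, df, n):
    """Root mean square error of approximation (one common formula)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

def bic(loglik, n_params, n):
    """Bayesian information criterion; lower values mean a better fit/parsimony trade-off."""
    return -2.0 * loglik + n_params * math.log(n)

def acceptable(cfi_value, rmsea_value):
    """Conventional cut-offs used in this post: CFI > .95 and RMSEA < .06."""
    return cfi_value > 0.95 and rmsea_value < 0.06

P, N = 18, 500                          # 18 items; hypothetical sample size
CHI2_BASE, DF_BASE = 2000.0, P * (P - 1) // 2   # hypothetical baseline chi-square
# Hypothetical results: number of factors -> (chi-square, log-likelihood).
models = {3: (300.0, -10150.0), 4: (180.0, -10090.0), 5: (120.0, -10060.0)}

summary = {}
for k, (chi2, ll) in models.items():
    df = efa_df(P, k)
    n_params = P * (P + 1) // 2 - df    # free parameters = moments - df
    summary[k] = {"RMSEA": rmsea(chi2, df, N),
                  "CFI": cfi(chi2, df, CHI2_BASE, DF_BASE),
                  "BIC": bic(ll, n_params, N)}
    s = summary[k]
    print(f"{k} factors: RMSEA={s['RMSEA']:.3f} CFI={s['CFI']:.3f} "
          f"BIC={s['BIC']:.1f} acceptable={acceptable(s['CFI'], s['RMSEA'])}")

best_bic = min(summary, key=lambda k: summary[k]["BIC"])
best_cfi = max(summary, key=lambda k: summary[k]["CFI"])
print("BIC prefers", best_bic, "factors; CFI prefers", best_cfi)
```

With these invented inputs, RMSEA and CFI favor five factors, while BIC, which charges ln(N) for every additional parameter, favors four: the same kind of disagreement as in the EPA-SSF analyses.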
To examine the actual factor structure, I first replicated Collison et al.’s EFA using a three-factor structure and Promax rotation. Factor loadings greater than .4 (16% explained variance) are highlighted. The results show that the Disinhibition factor is clearly identified and all five items have notable (> .4) loadings on this factor. In contrast, only three items (Coldness, Callous, and Self-Centered) have consistent loadings on the Antagonism factor. The Emotional Stability factor is not identified in the first replication sample because factor 3 shows high loadings for the Extraversion items. The variability of factor-loading patterns across datasets may be caused by the arbitrary rotation of factors.
It is unclear why Collison et al. did not use confirmatory factor analysis (CFA) to test their a priori theory that the 18 items represent different facets of Big Five factors. Rather than relying on arbitrary statistical criteria, CFA makes it possible to examine whether the pattern of correlations is consistent with a substantive theory. I fitted a CFA model with four factors to the data and specified the loading pattern based on Lynam et al.’s (2011) correlations of the EPA scales with a Big Five measure: correlations greater than .3 were specified as free parameters.
Fit of this model did not meet standard criteria of acceptable model fit (CFI > .95, RMSEA < .06), but it was not terrible, CFI = .728, RMSEA = .088. Of the 33 free parameters, 29 were statistically significant at p < .05, and many were significant at p < .001. Twenty of the coefficients were greater than .3. Somewhat smaller effect sizes are expected because the indicators were single items and because replication studies are expected to show some regression to the mean due to selection effects. Overall, these results show similarity between Lynam et al.’s (2011) results and the pattern of correlations in the replication study.
The next step was to build a revised model to improve fit. I first added a general evaluative factor to the model. Numerous studies of self-ratings on Big Five and personality disorder instruments have demonstrated the presence of this factor. Adding a general evaluative factor improved model fit, but fit remained below standard criteria of acceptable model fit, CFI = .790, RMSEA = .078.
I then added parameters that were suggested by large modification indices. First, I added a loading of Impersistence on Extraversion. This loading was just below the arbitrary cut-off value of .30 in Lynam et al.’s study (r = .29). Another suggested parameter was a loading of Invulnerability on Extraversion (Lynam r = .18). A third parameter was a negative loading of Self-Assurance on Agreeableness. This correlation was r = .00 in Lynam et al.’s (2011) study, but this could be due to the failure to control for evaluative bias, which inflates ratings on Self-Assurance and Agreeableness items (Anusic et al., 2009). Another suggested parameter was a positive loading of Opposition on Neuroticism (Lynam r = .18). These modifications improved model fit but were not sufficient to achieve an acceptable RMSEA value, CFI = .852, RMSEA = .066. I did not add more parameters to avoid overfitting the model.
The next step was to fit these two models to Rose et al.’s second dataset. The direct replication of Lynam et al.’s (2011) structure did not fit the data well, CFI = .744, RMSEA = .090, whereas the modified model with the general factor fit even better than in Study 1, CFI = .886, RMSEA = .062, with RMSEA close to the criterion for acceptable fit (.060). These results suggest that I did not overfit the data. I tried further improvements, but the suggested parameters were not consistent across the two datasets.
In the final step, I deleted free parameters that were not significant in both datasets. Surprisingly, the Disobliged and Impersistence items did not load on Conscientiousness. This suggests problems with these single-item indicators rather than a conceptual problem, because these constructs have been related to Conscientiousness in many studies. Self-Contentment did not load on Conscientiousness either. Distrust was not related to Extraversion, and Disobliged was not related to Neuroticism. I then fitted this revised model to the combined dataset. This model had acceptable fit based on RMSEA, CFI = .887, RMSEA = .058.
This final model captures the main structure of the correlations among the 18 EPA-SSF items and is consistent with Lynam et al.’s (2011) investigation of the structure by correlating EPA scales with a Big Five measure. It is also consistent with measurement models that show a general evaluative factor in self-ratings. Thus, I am proposing this model as the first validated measurement model of the EPA-SSF. This does not mean that it is the best model, but critics have to present a plausible model that fits the data as well or better. It is not possible to criticize the use of CFA because CFA is the only method to evaluate measurement models. Exploratory factor analysis cannot confirm or disconfirm theoretical models because EFA relies on arbitrary statistical rules that are not rooted in substantive theories. As I showed, EFA led to the proposal of a three-factor model that has poor fit to the data. In contrast, CFA confirmed that the 18 EPA-SSF items are related to four of the Big Five scales. Thus, four – not three – factors are needed to describe the pattern of correlations among the 18 items. I also showed the presence of a general evaluative factor that is common to self-reports of personality. This factor is often ignored in EFA models that rotate factors.
After establishing a plausible measurement model for the EPA-SSF, it is possible to link the factors to the scale scores that are assumed to measure psychopathy, using Mplus’s MODEL INDIRECT option. The results showed that the general factor explained most of the variance in the scale scores, r = .82, r^2 = 67%. Agreeableness/Antagonism explained only 3% of the variance, r = -.17, r^2 = 3%. This is a surprisingly low percentage, given the general assumption that antagonism is a core personality predictor of antisocial behaviors. Conscientiousness/Disinhibition was a stronger predictor but also explained less than 10% of the variance, r = -.30, r^2 = 9%. The contributions of Neuroticism and Extraversion were negligible. Thus, the remaining variance reflects random measurement error and unique item content. In short, these results raise concerns about the ability of the EPA-SSF to measure psychopathy rather than a general factor that is related to many personality disorders or that may simply reflect method variance in self-ratings.
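The logic of linking factors to a sum score can be illustrated with a short Python sketch. Assuming standardized items, orthogonal factors with unit variance, and uncorrelated residuals, the variance of a unit-weighted sum score equals the sum of the squared column totals of the loading matrix plus the summed residual variances, and each factor’s share of that variance equals its squared correlation with the sum score. The loadings below are invented for illustration; they are not the estimates from the model described in this post, and the sketch does not describe how Mplus computes indirect effects internally.

```python
import math

def sum_score_decomposition(loadings, factor_names):
    """Share of a unit-weighted sum score's variance due to each orthogonal factor.

    Assumes standardized items, uncorrelated unit-variance factors, and
    uncorrelated residuals, so each residual variance is 1 minus the item's
    summed squared loadings.
    """
    n_factors = len(factor_names)
    column_totals = [sum(row[f] for row in loadings) for f in range(n_factors)]
    residual_total = sum(1 - sum(l ** 2 for l in row) for row in loadings)
    total_variance = sum(t ** 2 for t in column_totals) + residual_total
    shares = {name: t ** 2 / total_variance
              for name, t in zip(factor_names, column_totals)}
    correlations = {name: t / math.sqrt(total_variance)
                    for name, t in zip(factor_names, column_totals)}
    return shares, correlations, residual_total / total_variance

# Hypothetical loadings of six items on (general, antagonism, disinhibition).
loadings = [
    [0.5, 0.3, 0.0],
    [0.5, 0.3, 0.0],
    [0.4, 0.0, 0.3],
    [0.4, 0.0, 0.3],
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
shares, correlations, unique_share = sum_score_decomposition(
    loadings, ["general", "antagonism", "disinhibition"])
for name in shares:
    print(f"{name:14s} r = {correlations[name]:.2f}  r^2 = {shares[name]:.0%}")
print(f"{'unique':14s} share = {unique_share:.0%}")
```

In this invented example, the sum score is dominated by the general factor even though four of the six items also load on a substantive factor, because the general loadings accumulate across all items while each substantive factor contributes through only two of them.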
I next examined predictive validity by adding measures of non-violent and violent criminal behaviors. The first model used the EPA-SSF scale to predict the shared variance of non-violent and violent crime, based on the assumption that psychopathy is related to both types of criminal behavior. Compared with the model without the crime variables, this model had a slightly better RMSEA but a lower CFI, CFI = .789 vs. .854, RMSEA = .059 vs. .064. In this model, the EPA-SSF scale was a strong predictor of the crime factor, r = .56, r^2 = 32%. I then fitted a model that used the factors as predictors of crime. This model had slightly better fit than the model that used the EPA-SSF scale as predictor of crime, CFI = .796 vs. .789, RMSEA = .059 vs. .059. Most importantly, neuroticism and extraversion were not significant predictors of crime, but the general factor was. I deleted the parameters for neuroticism and extraversion from the model. This further improved model fit, CFI = .802, RMSEA = .058. More importantly, the three remaining factors explained more variance in the crime factor than the EPA-SSF scale, R = .70, R^2 = 49%. There were no major modification indices suggesting that unique variance of the items contributed to the prediction of crime. Nevertheless, I examined a model that used only the general factor as predictor and added items if they explained additional variance in the crime factor, akin to stepwise regression. This model selected three specific items and explained 44% of the variance. The items were Manipulativeness (“I could make a living as a con artist”), b = .22, Self-Centeredness (“I have more important things to worry about than other people’s feelings”), b = .25, and Thrill-Seeking (“I like doing things that are risky or dangerous”), b = .39.
Dimensional Models of Psychopathy
The EPA-SSF is a dimensional measure of psychopathy: higher scores on the EPA-SSF scale reflect more severe levels of psychopathy. Dimensional models have the advantage that they do not require validation of a threshold that distinguishes normal personality variation from pathological variation. However, this advantage comes with the disadvantage that there is no clear distinction between low agreeableness (normal and healthy) and psychopathy (abnormal and unhealthy). Another problem is that the multidimensional nature of psychopathy makes it difficult to assess. To illustrate, I focus on the key components of psychopathy, namely antagonism (disagreeableness) and disinhibition (low conscientiousness). One possible way to define psychopathy in relation to these two components would be to define it as being high on both dimensions. Another would be an either/or rule, assuming that each dimension alone may be pathological. A third option is to create an average, but this definition has the problem that the average of two independent dimensions no longer captures all of the information about the components. As a result, the average will be a weaker predictor of actual behavior whenever the two components have different effects. The same problem affects sum-score definitions such as socioeconomic status, which averages income and education and can therefore explain less variance than income and education explain as separate predictors.
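A small simulation illustrates the information-loss argument. This Python sketch assumes two independent, standardized components with deliberately unequal, hypothetical effects on an outcome. It compares the variance explained by their average with the variance explained by both components in an ordinary least-squares regression. The effect sizes are invented and are not estimates from the EPA-SSF data.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r2_two_predictors(x1, x2, y):
    """R^2 of an OLS regression of y on x1 and x2, computed from the three correlations."""
    r1, r2, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

random.seed(2022)
n = 20_000
b_antagonism, b_disinhibition = 0.5, 0.1      # hypothetical, deliberately unequal
error_sd = math.sqrt(1 - b_antagonism ** 2 - b_disinhibition ** 2)  # var(crime) = 1

antagonism = [random.gauss(0, 1) for _ in range(n)]
disinhibition = [random.gauss(0, 1) for _ in range(n)]  # independent of antagonism
crime = [b_antagonism * a + b_disinhibition * d + random.gauss(0, error_sd)
         for a, d in zip(antagonism, disinhibition)]
average = [(a + d) / 2 for a, d in zip(antagonism, disinhibition)]

r2_average = corr(average, crime) ** 2
r2_both = r2_two_predictors(antagonism, disinhibition, crime)
print(f"R^2 using the average score:  {r2_average:.3f}")
print(f"R^2 using both components:    {r2_both:.3f}")
```

Under these assumptions the population values are (.5 + .1)^2 / 2 = .18 for the average and .5^2 + .1^2 = .26 for the two components; the two coincide only when the components have equal effects.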
One way to test the definition of psychopathy as being high in both antagonism and disinhibition is to examine whether the two factors interact in the prediction of criminal behaviors. According to this definition, crimes should be most likely to be committed by individuals who are both antagonistic and disinhibited, whereas each dimension alone should be only a weak predictor of crime. I fitted a model with an interaction term as a predictor of the crime factor. The interaction effect was not significant, b = .04, se = .14, p = .751. Thus, there is presently no justification for defining psychopathy as a combination of antagonism and disinhibition. Instead, psychopathy appears to be better defined as being either antagonistic or disinhibited to such an extent that individuals engage in criminal or other harmful behaviors. Yet, this definition does not really add anything to our understanding of personality and criminal behavior. It is like the term infection, which may refer to a viral or a bacterial infection.
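The logic of an interaction test can be sketched with ordinary least squares on simulated observed scores. This is a simplification: the analysis reported above used latent factors in a structural equation model, whereas this Python sketch uses observed variables, reports only point estimates (a significance test would also need standard errors), and uses invented effect sizes. Under a conjunctive definition, crime should depend on the product of antagonism and disinhibition, so the product term has a non-zero coefficient; under an additive model it does not.

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def moderated_regression(x, z, y):
    """OLS estimates [b0, b1, b2, b3] for y = b0 + b1*x + b2*z + b3*x*z."""
    rows = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(xtx, xty)

random.seed(7)
n = 10_000
antagonism = [random.gauss(0, 1) for _ in range(n)]
disinhibition = [random.gauss(0, 1) for _ in range(n)]

def simulate_crime(b_interaction):
    """Hypothetical outcome: main effects of .3 each plus an optional product term."""
    return [0.3 * a + 0.3 * d + b_interaction * a * d + random.gauss(0, 1)
            for a, d in zip(antagonism, disinhibition)]

additive = moderated_regression(antagonism, disinhibition, simulate_crime(0.0))
conjunctive = moderated_regression(antagonism, disinhibition, simulate_crime(0.3))
print(f"product-term estimate, additive data:    {additive[3]:.2f}")
print(f"product-term estimate, conjunctive data: {conjunctive[3]:.2f}")
```

With these assumptions, the product-term estimate should be close to 0 for the additive data and close to .3 for the conjunctive data. A non-significant product term in real data, as reported above, provides no support for the conjunctive definition.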
The Big Five Facets and Criminal Behavior
Investigation of the construct validity of the EPA-SSF showed that the 18 items reflect four of the Big Five dimensions and that two of the Big Five factors predicted criminal behavior. However, the 18 items are poor indicators of the Big Five factors. Fortunately, Rose et al. (2022) also included a 120-item Big Five measure that also measures 30 Big Five facets (4 items per facet). It is therefore possible to examine the personality predictors of criminal behaviors with a better personality instrument. To do so, I first fitted a measurement model to the 30 facet scales. This model was informed by previous CFA analyses of the 30 facets. Most importantly, the model included a general evaluative factor that was independent of the Big Five factors. I then added the items about non-violent and violent crime and created a factor for their shared variance. Finally, I added the three EPA-SSF items that appeared to predict variance in the crime factor and related these items to the facets that predicted variance in them. The final model had acceptable fit according to the RMSEA criterion (< .06), RMSEA = .043, but not according to the CFI criterion (> .95), CFI = .874, and I was not able to find meaningful ways to improve model fit.
The personality predictors accounted for 61% of the variance in the crime factor. This is more variance than the EPA-SSF factors explained (49%). The strongest predictor was the general evaluative or halo factor, b = -.49, r^2 = 24%. Surprisingly, the second strongest predictor was the Intellect facet of Openness, and the relationship was positive, b = .41, r^2 = 17%. More expected was a significant contribution of the Compliance facet of Agreeableness, b = -.31, r^2 = 9%. Finally, the unique variance in the three EPA-SSF items (controlling for evaluative bias and variance explained by the 30 facets) added another 10% explained variance, r = .312.
These results further confirm that Emotional Stability is not a predictor of crime, suggesting that it should not be included in the definition of psychopathy. They also raise questions about the importance of disinhibition: surprisingly, conscientiousness was not a notable predictor of crime. It is also notable that Agreeableness is only indirectly related to crime, because only the Compliance facet was a significant predictor. This means that disagreeableness is only problematic in combination with other, unidentified factors that make disagreeable people non-compliant. As a result, it is problematic to treat the broader agreeableness/antagonism factor as a disorder. Similarly, all murderers are human, but we would not consider being human a pathology.
Concerns about the validity of psychological measures led to the creation of a task force to establish scientific criteria of construct validity (Cronbach & Meehl, 1955). The key recommendation was to evaluate construct validity within a nomological net. A nomological net aims to explain a set of empirical findings related to a measure in terms of a theory that predicts these relationships. Psychometricians developed structural equation modeling (SEM), which makes it possible to test nomological nets. Here, I used SEM to examine the construct validity of the Elemental Psychopathy Assessment – Super-Short Form (EPA-SSF).
My examination of the psychometric properties of this scale raises serious questions about its construct validity. The first problem is that the scale was developed without a clear definition of psychopathy. The measure is based on the hypothesis that psychopathy is related to 18 distinct, maladaptive personality traits (Lynam et al., 2011). This initial assumption could have launched a program of validation research that suggested revisions to the theory; perhaps some traits were missing or unnecessary. However, neither the measure nor its short form has been revised. This could mean either that Lynam et al. (2011) discovered the nature of psychopathy in a stroke of genius or that they failed to test the construct validity of the EPA. My analyses suggest the latter. Most importantly, I showed that there is no evidence to include Emotional Stability in the definition and measurement of psychopathy.
I am not the first to point out this problem of the EPA. Collins et al. (2016) discuss the inclusion of Emotional Stability in a definition of psychopathy at length:
“[I]t may seem counter-intuitive that Emotional Stability would be included as a factor of the super-short form (and other EPA forms). We do not believe its inclusion is inconsistent with our previous positions as the EPA was developed, in part, from clinical expert ratings of personality traits of the prototypical psychopath. Given that many of these ratings came from proponents of the idea that FD is a central component of psychopathy, it is natural that traits resembling FD, or emotional stability, would be present in the obtained profiles. While we present Emotional Stability as a factor of the EPA measure, however, we do not claim Emotional Stability to be a central feature of psychopathy. Its relatively weak relations to other measures of psychopathy and external criteria traditionally related with psychopathy support this argument (Gartner, Douglas, & Hart, 2016; Vize, Lynam, Lamkin, Miller, & Pardini, 2016).” (p. 2016).
Yet Rose et al. (2022) treat the EPA-SSF as a valid measure of psychopathy and make numerous theoretical claims that rely on this assumption. It is of course possible to define psychopathy in terms of low neuroticism, but it should be made clear that such a definition is stipulative and cannot be empirically tested. The construct being measured is then an artifact created by the researchers. While neuroticism is a construct that describes something in the real world (some people are more anxious than others), psychopathy defined this way is merely a list of traits. Some people may want to include low neuroticism on the list and others may not. A problem arises only when the term psychopathy is used for different lists. The EPA scale is best understood as a measure of 18 traits. We may call this construct psychopathy-18 to distinguish it from other constructs and measures of psychopathy.
The list definition of psychological constructs creates serious problems for measurement because list theories imply that a construct can be defined in terms of its necessary and sufficient components. Accordingly, a psychopath could be somebody who is high in Emotional Stability, low in Agreeableness, and low in Conscientiousness, or somebody who possesses all of the specific traits included in the definition of psychopathy-18. However, traits are continuous, and it is not clear how individual profiles should be related to quantitative variation in psychopathy. Lynam et al. sidestepped this problem by simply averaging the profile scores and treating this sum score as a measure of psychopathy. However, averaging results in a loss of information, and the sum score depends on the correlations among the traits. This is not a problem when the intended construct is the factor that produces the correlations, but it is a problem when the construct is the profile of trait scores. As I showed, the sum score of the 18 EPA-SSF items mainly reflects the variance that is shared among all 18 items, which reflects a general evaluative factor. This general factor is not mentioned by Lynam et al. and is clearly not the construct that the EPA-SSF was intended to measure. Thus, even if psychopathy were defined in terms of 18 specific traits, the EPA-SSF sum score does not provide information about psychopathy, because the information that the items were supposed to measure is destroyed by averaging them.
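To illustrate why a sum score can be dominated by a general factor even when every item also measures a specific trait, here is a small Python sketch with purely hypothetical loadings (not estimates from Rose et al.’s data): each of 18 standardized items loads .40 on a general factor and .40 on one of four uncorrelated content factors. Because the general factor is shared by all 18 items, its contribution to the sum-score variance grows with the square of the number of items, whereas each content factor contributes only through its own cluster. The last lines show the loss of profile information with three made-up 18-trait profiles.

```python
# Hypothetical setup (illustrative numbers, not estimates from the data).
N_ITEMS = 18
LOADING_GENERAL = 0.40        # every item loads on one general evaluative factor
LOADING_CONTENT = 0.40        # each item also loads on one content factor
CLUSTER_SIZES = [5, 5, 4, 4]  # items per content factor
assert sum(CLUSTER_SIZES) == N_ITEMS

# Standardized items: unique variance is whatever the two factors leave over.
unique_per_item = 1.0 - LOADING_GENERAL ** 2 - LOADING_CONTENT ** 2

# Variance of the unweighted sum score, split by source. With orthogonal
# factors, each factor contributes the square of the sum of its loadings.
var_general = (N_ITEMS * LOADING_GENERAL) ** 2
var_content = sum((size * LOADING_CONTENT) ** 2 for size in CLUSTER_SIZES)
var_unique = N_ITEMS * unique_per_item
total = var_general + var_content + var_unique
reliable = var_general + var_content

print(f"general factor:  {var_general / total:.1%} of total variance")
print(f"content factors: {var_content / total:.1%} of total variance")
print(f"unique/error:    {var_unique / total:.1%} of total variance")
print(f"general factor:  {var_general / reliable:.1%} of reliable variance")

# Loss of profile information: three very different hypothetical profiles
# (18 trait scores on a 1-5 scale) produce the same sum score.
profile_a = [5] * 9 + [1] * 9
profile_b = [1] * 9 + [5] * 9
profile_c = [3] * 18
print(sum(profile_a), sum(profile_b), sum(profile_c))  # 54 54 54
```

With these made-up loadings, the general factor accounts for about two-thirds of the total sum-score variance and about 80% of its reliable variance, even though each item loads equally on the general factor and on its content factor.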
In conclusion, I am not an expert on personality disorders or psychopathy, and I don’t know what psychopathy is. However, I am an expert on psychological measurement, and I am able to evaluate construct validity based on the evidence that the authors of psychological measures provide. My examination of the construct validity of the EPA-SSF with the authors’ own data makes it clear that the EPA-SSF lacks construct validity. Even if we follow the authors’ proposal that psychopathy can be defined in terms of 18 specific traits, the EPA-SSF sum score does not capture the theoretical construct. If you take this test and get a high score, it does not mean that you are a psychopath. More importantly, research findings based on this measure do not help us explore the nomological network of psychopathy.