In 2007, I was asked to review a manuscript about the relationship between self-esteem and depression. The authors used a cross-lagged panel model to examine “prospective prediction,” which is a code word for causal claims in a non-experimental study. The problem is that the cross-lagged panel model is fundamentally flawed because it ignores stable traits and therefore underestimates stability. To compensate for this flaw, the model relies on the cross-lagged paths. This produces false and inflated cross-lagged effects, especially from the more stable to the less stable construct.

I wrote a long and detailed review, but the editor and the authors ignored it, and the flawed cross-lagged panel model was published (Orth, Robins, & Roberts, 2008). The article served as the basis for several follow-up articles (Orth, Robins, Meier, & Conger, 2016; Rieger, Göllner, Trautwein, & Roberts, 2016; Orth, Robins, Widaman, & Conger, 2014; Orth & Robins, 2013; Sowislo & Orth, 2013; Kuster, Orth, & Meier, 2012; Orth, Robins, Trzesniewski, Maes, & Schmitt, 2009; Orth, Robins, & Meier, 2009). The main author continues to push the flawed cross-lagged panel model (Orth, Clark, Donnellan, & Robins, 2020), even though he himself published a model with a trait factor to explain the stability of self-esteem (Kuster & Orth, 2013). It is scientifically unjustified to omit this trait factor from bivariate models that relate self-esteem to depression when ample evidence shows that a trait factor underlies the stability of self-esteem (Kuster & Orth, 2013). So, an entire literature is based on a statistical artifact that has been well known for four decades (Rogosa, 1980).

I just found my old review while looking through a folder called “file drawer,” and I thought I would share it here. It shows that peer review does not serve its purpose of quality control and that ambition often trumps the search for truth.

Review – Dec / 3 / 2007

This article tackles an important question: What is the causal relation between depression and self-esteem? As always, at the most abstract level, there are three answers to this question. Self-esteem causes (low) depression. Depression causes (low) self-esteem. Or the correlation is due to a third, unobserved variable. To complicate matters, these causal models are not mutually exclusive. All three may contribute to the observed correlations between self-esteem and depression.

The authors hope to test causal models by means of longitudinal studies, and their empirical data are better suited to examine this question than the data of many previous studies. However, the statistical analyses have shortcomings that may lead to false inferences about causality.

The first important question is how depression and self-esteem are defined, because both can be measured in different ways. Self-esteem measures can assess state self-esteem or self-esteem in general. Similarly, depression measures can ask about depressive symptoms over a short time interval (a few weeks) or about dispositional depression. The nature of the measure will have a strong impact on the observed retest correlations, even after random measurement error is taken into account.

In the empirical studies, self-esteem was measured with a questionnaire that asks about general tendencies (Rosenberg’s self-esteem scale), whereas depression was assessed by asking about symptoms within the preceding seven days (CES-D). Surprisingly, Study 1 shows no differences in the retest correlations of depression and self-esteem. Less surprisingly, in the absence of different stabilities, the cross-lagged effects in Study 1 are small and hardly different from each other, whereas Study 2 shows differences in stability and an asymmetrical pattern of cross-lagged coefficients. This pattern of results suggests that the cross-lagged coefficients are driven by the stability of the measures (see Rogosa, 1980, for an excellent discussion of cross-lagged panel studies).

The good news is that the authors’ data are suitable for testing alternative models. One important alternative model would postulate two latent dispositions, one for depression and one for self-esteem (not a single common factor). These latent dispositions would produce stability in depression and self-esteem over time, and the lower retest correlations of depression would be due to greater situational fluctuation of depressive symptoms. The model could allow for a correlation between the latent trait factors of depression and self-esteem. Based on Watson’s model, one would predict a very strong negative correlation between the two trait factors (but not a perfect -1), whereas situational fluctuations in depression could be relatively weakly related to fluctuations in self-esteem.
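The claim that cross-lagged coefficients can track stability rather than causality can be checked with population algebra. The sketch below assumes a simplified, two-wave version of the two-trait model just described. Scores are standardized and error-free, and each score is the sum of a stable trait and an occasion-specific state. There are no causal paths between the constructs. The function name `implied_cross_lags` and the parameter values in the example are hypothetical illustrations, not estimates from the manuscript.

```python
import math


def implied_cross_lags(trait_se, trait_dep, r_traits, r_states):
    """Population cross-lagged OLS coefficients implied by a two-trait model.

    Assumed data-generating model (no causal effects between constructs):
        SE_t  = T_se  + s_t,  Var(T_se)  = trait_se,  Var(s_t) = 1 - trait_se
        DEP_t = T_dep + d_t,  Var(T_dep) = trait_dep, Var(d_t) = 1 - trait_dep
    Corr(T_se, T_dep) = r_traits; Corr(s_t, d_t) = r_states within a wave;
    states are independent across waves; all scores have unit variance.
    """
    # Cross-wave, cross-construct covariance comes only from the traits
    c_cross = r_traits * math.sqrt(trait_se * trait_dep)
    # Same-wave covariance adds the correlated states
    c_within = c_cross + r_states * math.sqrt((1 - trait_se) * (1 - trait_dep))
    det = 1 - c_within ** 2  # determinant of the 2x2 predictor correlation matrix

    # DEP_2 regressed on DEP_1 and SE_1 (retest covariance of DEP = trait_dep)
    se_to_dep = (c_cross - c_within * trait_dep) / det
    # SE_2 regressed on SE_1 and DEP_1 (retest covariance of SE = trait_se)
    dep_to_se = (c_cross - c_within * trait_se) / det
    return se_to_dep, dep_to_se


# Hypothetical values: self-esteem more stable (0.7) than depression (0.4)
se_to_dep, dep_to_se = implied_cross_lags(0.7, 0.4, r_traits=-0.7, r_states=-0.3)
print(round(se_to_dep, 2), round(dep_to_se, 2))
```

With these illustrative values, the self-esteem-to-depression path is much larger in magnitude (about -.23) than the depression-to-self-esteem path (about -.03), although neither construct causes the other. Setting the two trait shares equal makes the two paths identical, which is the pattern one would expect if Study 1 and Study 2 differ mainly in the relative stability of the measures.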

The main difference between the cross-lagged model and the trait model concerns the pattern of correlations across different retest intervals. The cross-lagged model predicts a simplex structure (i.e., correlations decrease as the retest interval increases). In contrast, the trait model predicts that retest correlations are unrelated to the length of the retest interval. With adequate statistical power, it is therefore possible to test these models against each other. With more complex statistical methods, it is even possible to test a hybrid model that allows for all three causal effects (Kenny & Zautra, 1995).
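The two predictions can be made concrete for a single construct. The sketch below compares a first-order autoregressive (simplex) process with a pure trait-plus-state process. It also includes a univariate trait-plus-autoregressive-state decomposition, which is a simplified relative of Kenny and Zautra’s trait-state-error model with the error component dropped. It is not their full bivariate model. The function names and parameter values are hypothetical and were chosen so that all three versions imply the same lag-1 retest correlation of .60.

```python
def simplex_retest(stability, lag):
    """Lag-k retest correlation under a first-order autoregressive (simplex) process."""
    return stability ** lag


def trait_state_retest(trait_share, state_ar, lag):
    """Lag-k retest correlation for a stable trait plus an autoregressive state.

    trait_share: proportion of variance due to the stable trait.
    state_ar: wave-to-wave autoregression of the state (0 gives the pure trait model).
    """
    return trait_share + (1 - trait_share) * state_ar ** lag


# Columns: lag, simplex, pure trait model, trait + autoregressive state
for lag in (1, 2, 3):
    print(lag,
          round(simplex_retest(0.6, lag), 3),
          round(trait_state_retest(0.6, 0.0, lag), 3),
          round(trait_state_retest(0.5, 0.2, lag), 3))
```

All three columns agree at lag 1, so a design with only adjacent-wave correlations cannot discriminate between them. At lags 2 and 3, the simplex correlations drop quickly, the pure trait correlations stay flat, and the hybrid levels off at the trait share.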

The present manuscript simply presents one model with adequate model fit. However, model fit is largely driven by the measurement model. The measurement model fits the data well because it is based on parcels (parcels are constructed to be parallel indicators of a construct and are bound to fit well). Therefore, the fit indices are insensitive to the specification of the longitudinal pattern of correlations. To illustrate, global fit is based on the fit to a correlation matrix with 276 correlations (3 indicators * 2 constructs * 4 waves = 24 indicators; 24 * 23 / 2 = 276 correlations). At the latent level, there are only 28 correlations (2 constructs * 4 waves = 8 latent factors; 8 * 7 / 2 = 28 correlations). The cross-lagged model constrains only 12 of these (12 / 276 < 5%). Thus, the fit of the causal model should be evaluated by comparing the fit of the structural model with the fit of the measurement model. Table 2 shows the relevant information. Surprisingly, it shows a difference of only 6 degrees of freedom between Models 2 and 3, whereas I would have expected a difference of 12 degrees of freedom (?). More important, with six degrees of freedom, the chi-square difference of 59 is quite large. Although the chi-square test may be overly sensitive, it would be important to know why the model fit is not better. My guess is that the model underestimates long-term stability because it does not include a trait component. The same test for Study 2 suggests a better fit of the cross-lagged model. However, even a good fit does not indicate that the model is correct. A trait model may fit the data as well or even better.
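The counting argument and the chi-square difference test can be reproduced in a few lines. The count of free latent parameters assumes a standard bivariate four-wave cross-lagged model in correlation metric, with lag-1 autoregressive and cross-lagged paths and within-wave residual correlations at waves 2 to 4. The manuscript’s Model 3 may be specified differently. The helper names are hypothetical. The p-value uses the closed-form chi-square tail for even degrees of freedom, so no statistics library is needed.

```python
import math


def n_correlations(n_vars):
    """Number of distinct off-diagonal correlations among n_vars variables."""
    return n_vars * (n_vars - 1) // 2


def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square statistic with even df.

    Closed form: P(X > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
    """
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


indicators = 3 * 2 * 4   # parcels * constructs * waves = 24
latent = 2 * 4           # constructs * waves = 8
observed_r = n_correlations(indicators)   # 276
latent_r = n_correlations(latent)         # 28

# Assumed standard model: 1 wave-1 correlation + 6 autoregressive paths
# + 6 cross-lagged paths + 3 residual correlations (waves 2-4) = 16 parameters
clpm_free = 1 + 6 + 6 + 3
constraints = latent_r - clpm_free        # 12 testable constraints

print(observed_r, latent_r, constraints, round(constraints / observed_r, 3))
print(chi2_sf_even_df(59, 6))  # p for a chi-square difference of 59 on 6 df
```

Under this assumption, the expected difference is 12 degrees of freedom, and a chi-square difference of 59 on 6 degrees of freedom corresponds to p of roughly 7e-11.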

Regarding Study 1, the authors commit the common fallacy of interpreting a non-significant result as evidence for the absence of an effect. Even if, in Study 1, self-esteem was a significant (p < .05) lagged predictor of depression and depression was not a significant (p > .05) lagged predictor of self-esteem, it is incorrect to conclude that self-esteem has an effect but depression does not. Indeed, given the small magnitude of the two effects (-.04 vs. -.10 in Figure 1), it is likely that these effects are not significantly different from each other (it is good practice in SEM studies to report confidence intervals, which would make the results easier to interpret).
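The right test compares the two coefficients directly. The sketch below uses a Wald z-test for a difference and normal-theory 95% confidence intervals. The standard errors (0.04 each) and the zero sampling covariance are hypothetical placeholders, because they are not reported here. Both coefficients come from the same model, so in practice their sampling covariance should be taken from the fitted SEM. The function names are illustrative.

```python
import math


def diff_test(b1, se1, b2, se2, cov12=0.0):
    """Wald z-test for the difference between two coefficients.

    cov12 is the sampling covariance of the two estimates; 0 treats them
    as independent, which is only an approximation within one SEM.
    Returns (z, two-sided p).
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * cov12)
    z = (b1 - b2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return z, p


def ci95(b, se):
    """Normal-theory 95% confidence interval for one coefficient."""
    return b - 1.96 * se, b + 1.96 * se


# The two cross-lagged coefficients from Figure 1; SEs are HYPOTHETICAL
b1, b2 = -0.04, -0.10
se_hyp = 0.04
z, p = diff_test(b1, se_hyp, b2, se_hyp)
print(round(z, 2), round(p, 2))
print(ci95(b1, se_hyp), ci95(b2, se_hyp))
```

With these placeholder standard errors, -.10 is individually significant and -.04 is not, yet the difference between them (z of about 1.06, p of about .29) is far from significant. This is exactly the inferential gap the fallacy ignores.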

The limitation section does acknowledge that “the study designs do not allow for strong conclusions regarding the causal influence of self-esteem on depression.” However, without more detail and an explicit discussion of alternative models, this disclaimer in the fine print is lost on most readers who are unfamiliar with structural equation modeling. The statement also seems to contradict the conclusions drawn in the abstract and the causal interpretations of the results in the discussion (e.g., Future research should seek to identify the mediating processes of the effect of self-esteem on depression).

I have no theoretical reason to favor any causal model. I am simply trying to point out that alternative models are plausible and likely to fit the data as well as the model presented in the manuscript. At a minimum, a revision should acknowledge this and present the actual empirical data (correlation tables) so that other researchers can test alternative models.