Category Archives: Measurement

Measuring Well-Being in the SOEP

Psychology has a measurement problem. Big claims about personality, self-esteem, or well-being are based on sum-scores of self-ratings; or sometimes a single rating. This would be a minor problem if thorough validation research had demonstrated that sum-scores of self-ratings are valid measures of the constructs they are intended to represent, but such validation research is often missing. As a result, the validity of widely used measures in psychology and claims based on these measures is unknown.

The well-being literature is an interesting example of the measurement crisis because two opposing views about the validity of well-being measures co-exist. On the one hand, experimental social psychologists argue that life-satisfaction ratings are invalid and useless (Schwarz & Strack, 1999); a view that has been popularized by Noble Laureate Daniel Kahneman in his book “Thinking: Fast and Slow” (cf. Schimmack, 2018). On the other hand, well-being scientists often assume that life-satisfaction ratings are near perfect indicators of individuals’ well-being.

An editor of JPSP, which presumably means he or she is an expert, has no problem to mention both positions in the same paragraph without noting the contradiction.

There is a huge literature on well-being. Since Schwarz and Strack (1999), to take that arbitrary year as a starting point, there have been more than 11,000 empirical articles with “wellbeing” (or well-being or well being) in the title, according to PsychInfo. The vast majority of them, I submit, take the subjective evaluation of one’s own life as a perfectly valid and perhaps the best way to assess one’s own evaluation of one’s life. “

So, since Schwarz and Strack concluded that life-satisfaction judgments are practically useless, 11,000 articles have used life-satisfaction judgments as perfectly valid measures of life-satisfaction and nobody thinks this is a problem. No wonder, natural scientists don’t consider psychology a science.

The Validity of Well-Being Measures

Any attempt at validating well-being measures requires a definition of well-being that leads to testable predictions about correlations of well-being measures with other measures. Testing these predictions is called construct validation (Cronbach & Meehl, 1955; Schimmack, 2019).

The theory underlying the use of life-satisfaction judgments as measures of well-being assumes that well-being is subjective and that (healthy, adult) individuals are able to compare their actual lives to their ideal lives and to report the outcome of these comparison processes (Andrews & Whithey, 1973; Diener, Lucas, Schimmack, & Helliwell, 2009).

One prediction that follows from this model is that global life-satisfaction judgments should be correlated with judgments of satisfaction in important life domains, but not in unimportant life domains. The reason is that satisfaction with life as a whole should be related to satisfaction with (important) parts. It would make little sense for somebody to say that they are extremely satisfied with their life as a whole, but not satisfied with their family life, work, health, or anything else that matters to them. The whole point of asking a global question is the assumption that people will consider all important aspects of their lives and integrate this information into a global judgment (Andrews & Whithey, 1973). The main criticism of Schwarz and Strack (1999) was that this assumption does not describe the actual judgment process and that actual life-satisfaction judgments are based on transient and irrelevant information (e.g., current mood, Schwarz & Clore, 1983).

Top-Down vs. Bottom-Up Theories of Global and Domain Satisfaction

To muddy the waters, Diener (1984) proposed on the one hand that life-satisfaction judgments are, at least somewhat, valid indicators of life-satisfaction, while also proposing that correlations between satisfaction with life as a whole and satisfaction with domains might reflect a top-down effect.

A top-down effect implies that global life-satisfaction influences domain satisfaction. That is, health satisfaction is not a cause of life-satisfaction because good health is an important part of a good life. Instead, life-satisfaction is a content-free feeling of satisfaction that creates a halo in evaluations of specific life aspects independent of the specific evaluations of a life domain.

Diener overlooked that top-down processes invalidate life-satisfaction judgments as valid measures of wellbeing because a top-down model implies that global life-satisfaction judgments reflect only a general disposition to be satisfied without information about the actual satisfaction in important life domains. In the context of a measurement model, we can see that the top-down model implies that life-satisfaction judgments only capture the shared variance among specific life-satisfaction judgments, but fail to represent the part of satisfaction that reflects unique variance in satisfaction with specific life domains. In other words, top-down models imply that well-being does not encompass evaluations of the parts that make up an individuals entire life.

The problem that measurement models in psychology often consider unique or residual variances error variances that are often omitted from figures does not help. In the figure, the residual variances are shown and represent variation in life-aspects that are not shared across domains.

Some influential articles that examined top-down and bottom-up processes have argued in favor of top-down processes without noticing that this invalidates the use of life-satisfaction judgments as indicators of well-being or at least requires a radically different conception of well-being (well-being is being satisfied independent of how things are actually going in your life) (Heller, Watson, & Ilies, 2004).

An Integrative Top-Down vs. Bottom-Up Model

Brief et al. (1993) proposed an integrative model of top-down and bottom-up processes in life-satisfaction judgments. The main improvement of this model was to distinguish between a global disposition to be more satisfied and a global judgment of important aspects of life. As life-satisfaction judgments are meant to represent the latter, life-satisfaction judgments are the ultimate outcome of interest, not a measure of the global disposition. Brief et al. (1993) used neuroticism as an indicator for the global disposition to be less satisfied, but there are probably other factors that can contribute to a general disposition to be satisfied. The integrative model assumes that any influence of the general disposition is mediated by satisfaction with important life domains (e.g., health).

FIGURE 1. DisSat = Dispositional Satisfaction, DS1 = Domain Satisfaction 1 (e.g., health), DS2 = Domain Satisfaction 2, DS3 = Domain Satisfaction 3, LS = Life-Satisfaction.

It is important to realize that the mediation model separates two variances in domain satisfaction judgments, namely the variance that is explained by dispositional satisfaction and the variance that is not explained by dispositional satisfaction (residual variance). Both variances contribute to life-satisfaction. Thus, objective aspects of health that contribute to health satisfaction can also influence life-satisfaction. This makes the model an integrative model that allows for top-down and bottom-up effects.

One limitation of Brief et al.’s (1993) model was the use of neuroticism as sole indicator of dispositional satisfaction. While it is plausible that neuroticism is linked to more negative perceptions of all kinds of life-aspects, it may not be the only trait that matters.

Another limitation was the use of a health satisfaction as a single life domain. If people also care about other life domains, other domain satisfactions should also contribute to life-satisfaction and they could be additional mediators of the influence of neuroticism on life-satisfaction. For example, neurotic individuals might also worry more about money and financial satisfaction could influence life-satisfaction, making financial satisfaction another mediator of the influence of neuroticism on life-satisfaction.

One advantage of structural equation modeling is the ability to study constructs that do not have a direct indicator. This makes it possible to examine top-down effects without “direct” indicators of dispositional satisfaction. The reason is that dispositional satisfaction should influence satisfaction with various life domains. Thus, dispositional satisfaction is reflected in the shared variance among different domain satisfaction judgments and domain satisfaction judgments serve as indicators that can be used to measure dispositional satisfaction (see Figure 2).

Domain Satisfactions in the SOEP

It is fortunate that the creators of the Socio-Economic Panel in the 1980s included domain satisfaction measures and that these measures have been included in every wave from 1984 to 2017. This makes it possible to test the integrative top-down bottom-up model with the SOEP data.

The five domains that have been included in all surveys are health, household income, recreation, housing, and job satisfaction. However, job satisfaction is only available for those participants who are employed. To maximize the number of domains, I used all five domains and limited the analysis to working participants. The model can be used to build a model with four domains for all participants.

One limitation of the SOEP is the use of single-item indicators. This makes sense for expensive panel studies, but creates some psychometric problems. Fortunately, it is possible to estimate the reliability of single-item indicators in panel data by using Heise’s (1969) model which estimates reliability based on the pattern of retest correlations for three waves of data.

REL = r12 * r23 / r13

More data would be better and are available, but the goal was to combine the well-being model with a model of personality ratings that are available for only three waves (2005, 2009, & 2013). Thus, the same three waves for used to create an integrative top-down bottom-up model that also examined how domain satisfaction is related to global life-satisfaction across time.

The data set consisted of 3 repeated measures of 5 domain satisfaction judgments and a single life-satisfaction judgments for a total of 18 variables. The data were analyzed with MPLUS (see OSF for syntax and detailed results https://osf.io/vpcfd/ ).

Results

Overall model fit was acceptable, CFI = .988, RMSEA = .023, SRMR = .029.

The first results are the reliability and stability estimates of the five domain satisfactions and global life satisfaction (Table 1). For comparison purposes, the last column shows the estimates based on a panel analyses with annual retests (Schimmack, Krause, Wagner, & Schupp, 2010). The results show fairly consistent stability across domains with the exception of job satisfaction. Job satisfaction is less stable than other domains. The four-year stability is high, but not as high as for personality traits (Schimmack, 2019). A comparison with the panel data shows higher stability, which indicates that some of the error variance in 4-year retest studies is reliable variance that fluctuates over the four-year retest period. However, the key finding is that there is high stability in domain satisfaction judgments and life-satisfaction judgments. which makes it theoretically interesting to examine the relationship between the stable variances in domain satisfaction and life-satisfaction.

ReliabilityStability1Y-StabilityPanel
Job Satisfaction0.620.620.89
Health Satisfaction0.670.790.940.93
Financial Satisfaction0.740.810.950.91
Housing Satisfaction0.660.810.950.89
Leisure Satisfaction0.670.800.950.92
Life Satisfaction0.660.780.940.89

Table 2 examines the influence of top-down processes on domain satisfaction. Results show the factor loadings of domain satisfaction on a common factor that reflects dispositional satisfaction; that is, a general disposition to report higher levels of satisfaction. The results show that somewhere between 30% and 50% of the reliable variance in life-satisfaction judgments is explained by a general disposition factor. While this leaves ample room for domain-specific factors to influence domain satisfaction judgments, the results show a strong top-down influence.

T1T2T3
Job Satisfaction0.690.680.68
Health Satisfaction0.680.660.65
Financial Satisfaction0.600.610.63
Housing Satisfaction0.720.740.76
Leisure Satisfaction0.610.610.61

Table 3 shows the unique contribution of the disposition and the five domains to life-satisfaction concurrently and longitudinally.

DS1-LS1DS1-LS2DS1-LS3DS2-LS2DS2-LS3DS3-LS3
Disposition0.560.590.570.610.590.60
Job 0.140.100.050.170.080.12
Health0.230.220.210.280.270.33
Finances0.340.200.140.240.180.22
Housing0.040.030.030.040.040.06
Leisure0.060.100.060.130.070.09

The first notable finding is that the disposition factor accounts for the lion share of the explained variance in life-satisfaction judgments. The second important finding is that the relationship is very stable over time. The disposition measured at time 1 is an equally good predictor of life-satisfaction at time 1 (r = .56), time 2 (r = .59), and at time 3 (r = .57). This suggests that about one-third of the reliable variance in life-satisfaction judgments reflects a stable disposition to report higher or lower levels of satisfaction.

Regarding domain satisfaction, health is the strongest predictor with correlations between .21 and .33. Finances is the second strongest predictor with correlations between .14 and .34. For health satisfaction there is high stability over time. That is, time 1 health satisfaction predicts time 1 life-satisfaction nearly as well (r = .23) as time 3 life-satisfaction (r = .21). In contrast, financial satisfactions shows a bit more change over time with concurrent correlations at time 1 of r = .34 and a drop to r = .14 for life-satisfaction at time 3. This suggests that changes in financial satisfaction produces changes in life-satisfaction.

Job satisfaction has a weak influence on life-satisfaction with correlations ranging from r = .14 to .05. Like financial satisfaction, there is some evidence that changes in job satisfaction predict changes in life-satisfaction.

Housing and leisure have hardly any influence on life-satisfaction judgments with most relationships being less than .10. There is also no evidence that changes in these domain produce changes in life-satisfaction judgments.

These results show that most of the reliable variance in global life-satisfaction judgments remains unexplained and that a stable disposition accounts for most of the explained variance in life-satisfaction judgments.

Implications for the Validity of Life-Satisfaction Judgments

There are two ways to interpret the results. One interpretation is that is common in the well-being literature and hundreds of studies with the SOEP data is that life-satisfaction judgments are valid measures of well-being. Accordingly, well-being in Germany is determined mostly by a stable disposition to be satisfied. Accordingly, changing actual life-circumstances will have negligible effects on well-being. For example, Nakazato et al. (2011) used the SOEP data to examine the influence of moving on well-being. They found that decreasing housing satisfaction triggered a decision to move and that moving produces lasting increases in housing satisfaction. However, moving had no effect on life-satisfaction. This is not surprising given the present results that housing satisfaction has a negligible influence on life-satisfaction judgments. Thus, we would conclude that people are irrational by investing money in a better house, if we assume that life-satisfaction judgments are a perfectly valid measure of well-being.

The alternative interpretation is that life-satisfaction judgments are not as good as well-being researchers think they are. Rather than reflecting a weighted summary of all important aspects of life, they are based on accessible information that does not include all relevant information. The difference to Schwarz and Strack’s (1999) criticism is that bias is not due to temporarily accessible information (e.g., mood) that makes life-satisfaction judgments unreliable. As demonstrated here and elsewhere, a large portion of the variance in life-satisfaction judgments is stable. The problem is that the stable factors may be biases in life-satisfaction ratings rather than real determinants of well-being.

It is unfortunate that psychologist and other social sciences have neglected proper validation research of a measure that has been used to make major empirical claims about the determinants of well-being, and that this research has been used to make policy recommendation (Diener, Lucas, Schimmack, & Helliwell, 2009). The present results suggest that any policy recommendations based on life-satisfaction ratings alone are premature. It is time to take measurement more seriously and to improve the validity of measuring well-being.

Measuring Personality in the SOEP

The German Socio-Economic-Panel (SOEP) is a longitudinal study of German households. The core questions address economic issues, work, health, and well-being. However, additional questions are sometimes added. In 2005, the SOEP included a 15-item measure of the Big Five; the so-called BFI-S (Lang et al., 2011). As each personality dimension is measured with only three items, scale scores are rather unreliable measures of the Big Five. A superior way to examine personality in the SOEP is to build a measurement model that relates observed item scores to latent factors that represent the Big Five.

Anusic et al. (2009) proposed a latent variable model for an English version of the BFI-S.

The most important feature of this model is the modeling of method factors in personality ratings. An acquiescence factor accounts for general response tendencies independent of item content. In addition, a halo factor accounts for evaluative bias that inflates correlations between two desirable or two undesirable items and attenuates correlations between a desirable and an undesirable item. The Figure shows that the halo factor is bias because it correlates highly with evaluative bias in ratings of intelligence and attractiveness.

The model also includes a higher-order factor that accounts for a correlation between extraversion and openness.

Since the article was published I have modified the model in two ways. First, the Big Five are conceptualized as fully independent which is in accordance with the original theory. Rather than allowing for correlations among Big Five factors, secondary loadings are used to allow for relationships between extraversion and openness items. Second, halo bias is modeled as a characteristic of individual items rather than the Big Five. This approach is preferable because some items have low loadings on halo.

Figure 2 shows the new model.

I fitted this model to the 2005 data using MPLUS (syntax and output: https://osf.io/vpcfd/ ). The model had acceptable fit to the data, CFI = .962, RMSEA = .035, SRMR = .029.

Table 1 shows the factor loadings. It also shows the correlation of the sum scores with the latent factors.

Item#NEOACEVBACQ
Neuroticism
worried50.49-0.020.19
nervous100.64-0.310.18
relaxed15-0.550.350.21
SUM0.750.000.000.000.00-0.300.09
Extraversion
talkative20.600.130.400.23
sociable80.640.370.22
reserved12-0.520.20-0.110.19
SUM0.000.750.00-0.100.050.360.09
Openess
original40.260.41-0.330.380.22
artistic90.150.360.290.17
imaginative140.300.550.220.21
SUM0.000.300.57-0.130.000.390.26
Agreeableness
rude30.12-0.51-0.320.19
forgiving60.230.320.24
considerate130.490.480.29
SUM0.00-0.070.000.580.000.500.11
Conscientiousness
thorough10.710.350.30
lazy7-0.16-0.41-0.350.20
efficient110.390.480.28
SUM0.000.000.000.090.640.510.11

The results show that all items load on their primary factor although some loadings are very small (e.g., forgiving). Secondary loadings tend to be small (< .2), although they are highly significant in the large sample. All items load on the evaluative bias factor, with some fairly large loadings for considerate, efficient, and talkative. Reserved is the most evaluatively neutral item. Acquiescence bias is rather weak.

The scale scores are most strongly related to the intended latent factor. The relationship is fairly strong for neuroticism and extraversion, suggesting that about 50% of the variance in scale scores reflects the intended construct. However, for the other three dimensions, correlations suggest that less than 50% of the variance reflects the intended construct. Moreover, the remaining variance is not just random measurement error. Evaluative bias contributes from 10% up to 25% of additional variance. Acquiescence bias plays a minor role because most scales have a reverse scored item. Openness is an exception and acquiescence bias contributes 10% of the variance in scores on the Openness scale.

Given the good fit of this model, I recommend it for studies that want to examine correlates of the Big Five or that want to compare groups. Using this model will produce better estimates of effect sizes and control for spurious relationships due to method factors.