
Construct Validity of the Satisfaction with Life Scale

With close to 10,000 citations in Web of Science, Ed Diener’s article that introduced the “Satisfaction with Life Scale” (SWLS) is a citation classic in well-being science. While single-item measures are used in large nationally representative surveys (e.g., General Social Survey, German Socio-Economic Panel, World Values Survey), psychologists prefer multi-item scales because they have higher reliability and, as a result, higher validity.

Study 1 in Diener et al. (1985) demonstrated that the SWLS shows convergent validity with single-item measures like Cantril’s ladder (r = .62, .66) and Andrews and Withey’s Delighted-Terrible scale (r = .68, .62). Attesting to the higher reliability of the 5-item SWLS is the finding that the internal consistency was .87 and the retest reliability was r = .82. These results suggest that the SWLS and single-item measures measure a single construct with different amounts of random measurement error.

The important question for well-being scientists who use the SWLS and other global well-being measures is whether these items measure what they are intended to measure. To answer this question, we need to know what life-satisfaction measures are intended to measure.

Diener et al. (1985) draw on Andrews and Withey’s (1976) model of well-being perceptions. Accordingly, life-satisfaction judgments are based on subjective evaluations of important concerns.

Judgments of satisfaction are dependent upon a comparison of one’s circumstances with what is thought to be an appropriate standard. It is important to point out that the judgment of how satisfied people are with their present state of affairs is based on a comparison with a standard which each individual sets for him- or herself; it is not externally imposed. It is a hallmark of the subjective well-being area that it centers on the person’s own judgments, not upon some criterion which is judged to be important by the researcher (Diener, 1984).

This definition of life-satisfaction makes two important points. First, it is assumed that respondents are thinking about their circumstances when they judge their life-satisfaction. That is, we can think about life-satisfaction as an attitude with an individual’s life as the attitude object. Just as individuals are assumed to think about the important features of Coca-Cola when they are asked to report their attitudes towards Coca-Cola, respondents are assumed to think about the important features of their lives when they report their attitudes towards their lives.

The second part of the definition makes it clear that attitudes towards lives are based on subjectively chosen criteria to evaluate lives. Just as individuals may like or dislike the taste of Coke, the same life circumstance can be evaluated differently by different individuals. Some may be extremely satisfied with an income of $100,000 and some may be extremely dissatisfied with the same income. Some students may be happy with a GPA of 2.9, while others may be unhappy with the same GPA. The reason is that the evaluation criteria or standards can vary across individuals and that there is no objective criterion that is used to evaluate life circumstances. This makes life-satisfaction judgments an indicator of subjective well-being.

The reliance on subjective evaluation criteria also implies that individuals can give different weights to different life domains. For some people, family life may be the most important domain, for others it may be work (Andrews & Withey, 1976). The same point is made by Diener et al. (1985).

For example, although health, energy, and so forth may be desirable, particular individuals may place different values on them. It is for this reason that we need to ask the person for their overall evaluation of their life, rather than summing across their satisfaction with specific domains, to obtain a measure of overall life-satisfaction (p. 71).

This point makes sense. If life-satisfaction judgments are based on evaluations of life circumstances and individuals place different emphasis on different life domains, more important domains should have a stronger influence on global life-satisfaction judgments (Schimmack, Diener, & Oishi, 2002). However, starting with Andrews and Withey (1976), empirical tests of this prediction have failed to confirm it. When individuals are asked to rate the importance of life domains, and these weights are used to compute a weighted average, the weighted average is not a better predictor of global judgments than a simple unweighted average (Rohrer & Schmukle, 2018).
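To make the comparison concrete, here is a minimal Python sketch of the two composites for a single respondent. The domain names, the rating scales, and all numbers are hypothetical and are not taken from any of the cited studies; the sketch only shows how an importance-weighted average differs from a simple unweighted average.

```python
def unweighted_average(satisfaction):
    """Simple mean of domain satisfaction ratings."""
    return sum(satisfaction.values()) / len(satisfaction)


def importance_weighted_average(satisfaction, importance):
    """Mean of domain satisfaction ratings, weighted by rated importance."""
    total_weight = sum(importance[domain] for domain in satisfaction)
    weighted_sum = sum(satisfaction[domain] * importance[domain]
                       for domain in satisfaction)
    return weighted_sum / total_weight


# Hypothetical respondent: satisfaction on a 0-10 scale, importance on a 1-5 scale.
satisfaction = {"health": 4, "family": 9, "work": 6, "housing": 8}
importance = {"health": 5, "family": 5, "work": 3, "housing": 1}

print(unweighted_average(satisfaction))                       # 6.75
print(importance_weighted_average(satisfaction, importance))  # 6.5
```

The empirical question is which of the two composites, computed across many respondents, correlates more strongly with the global judgment. The finding discussed above is that the weighted version does not do better.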

Although this fact has been known since 1974, its theoretical significance has been ignored. There are two possible interpretations of this finding. On the one hand, it could be that importance ratings are invalid. That is, people don’t really know what is important to them and the actual importance is best revealed by the regression weights when global life-satisfaction ratings are regressed on domain satisfaction either across participants or within-participants over time. The alternative explanation is more troubling. In this case, global life-satisfaction judgments are invalid. Maybe these judgments are not based on subjective evaluations of life-circumstances.

Schwarz and Strack (1999) made the point that global life-satisfaction judgments are based on quick heuristics that produce invalid information. The problem with their criticism is that they focused on unstable sources such as mood or temporarily accessible information as the main sources of life-satisfaction judgments. This model fails to explain the high temporal stability of life-satisfaction judgments (Schimmack & Oishi, 2005).

However, it is possible that stable factors produce systematic method variance in life-satisfaction judgments. For example, Andrews and Withey (1976) suggested that halo bias could influence ratings of domain satisfaction and life-satisfaction. They used informant ratings to rule out this possibility, but their test of this hypothesis was statistically flawed (Schimmack, 2019). Thus, it is possible that a substantial portion of the reliable variance in SWLS scores is halo bias.

Diener et al. (1985) tried to address the problem of systematic measurement error in two ways. First, they included the Marlowe-Crowne Social Desirability (MCSD) scale to measure socially desirable responding and found no correlation with SWLS scores, r = .02. The problem is that the MCSD is not a valid measure of socially desirable responding or halo bias, but rather a measure of agreeableness and conscientiousness. Thus, the correlation is better interpreted as evidence that life-satisfaction is fairly independent of these personality traits. Second, Study 3 with 53 elderly residents of Urbana-Champaign included an interview with two trained interviewers. Afterwards, the interviewers made ratings of the interviewees’ well-being. The averaged interviewers’ ratings correlated r = .43 with the self-ratings of well-being. The problem here is that individuals who are motivated to present a positive image in their SWLS ratings are also likely to present a positive image in an interview. Moreover, the conveyed sense of well-being could reflect individuals’ personality more than their life-circumstances. Thus, it is not clear how much of the agreement between self-ratings and interviewer-ratings reflects evaluations of actual life-circumstances.

The most recent review article by Ed Diener was published last year: “Advances and Open Questions in the Science of Subjective Well-Being” (Diener, Lucas, & Oishi, 2018). The article makes it clear that the construct has not changed since 1985.

“Subjective well-being (SWB) reflects an overall evaluation of the quality of a person’s life from her or his own perspective” (p. 1).

“As the term implies, SWB refers to the extent to which a person believes or feels that his or her life is going well. The descriptor “subjective” serves to define and limit the scope of the construct: SWB researchers are interested in evaluations of the quality of a person’s life from that person’s own perspective.” (p. 2)

The authors also explicitly state that subjective well-being measures are subjective because individuals can focus on different aspects of their lives depending on their importance to them.

“it is the subjective nature of the construct that gives it its power. This is due to the fact that different people likely weight different objective circumstances differently depending on their goals, their values, and even their culture” (p. 3).

The fact that global measures allow individuals to assign different weights to different domains is seen as a strength.

Presumably, subjective evaluations of quality of life reflect these idiosyncratic reactions to objective life circumstances in ways that alternative approaches (such as the objective list approach) cannot. Thus, when evaluating the impact of events, interventions, or public-policy decisions on quality of life, subjective evaluations may provide a better mechanism for assessment than alternative, objective approaches (p. 3).

The problem is that this claim requires empirical evidence to show that global life-satisfaction judgments are indeed more valid measures of subjective well-being than simple averages because they properly weigh information in accordance with individuals’ subjective preferences, and since 1976 this evidence has been lacking.

Diener et al.’s (2018) review glosses over this glaring problem for the construct validity of the SWLS and other global well-being measures.

Because most measures are simple self-reports, considerable research addresses the psychometric properties of these types of assessments. This research consistently shows that existing self-report measures exhibit strong psychometric properties including high internal consistency when multiple-item measures are used; moderately strong test-retest reliability, especially over short periods of time; reasonable convergence with alternative measures (especially those that have also been shown to have high levels of reliability and validity); and theoretically meaningful patterns of associations with other constructs and criteria (see Diener et al., 2009, and Diener, Inglehart, & Tay, 2013, for reviews). There is little debate about the quality of SWB measures when evaluated using these traditional criteria.

While it is true that there is little debate, this does not mean that there is strong evidence for the construct validity of the SWLS. The open question is to what extent respondents really conduct a memory search for information about important life domains, evaluate these domains based on subjective criteria, and then report an overall summary of these evaluations. If they do, subjective importance weights should improve predictions, but they often do not. Moreover, in regression models individual life domains often contribute small amounts of unique variance (Andrews & Withey, 1976), and some important aspects like health often account for close to zero percent of the variance in life-satisfaction judgments.

Convergent Validity

One key feature of construct validity is convergent validity between two independent methods that measure the same construct (Campbell & Fiske, 1959). Ideally, multiple methods are used and it is possible to examine whether the pattern of correlations matches theoretical predictions (Cronbach & Meehl, 1955; Schimmack, 2019). Diener et al. (2018) mention some evidence of convergent validity.

For example, Schneider and Schimmack (2009) conducted a meta-analysis of the correlation between self and informant reports, and they found that there is reasonable agreement (r = .42) between these two methods of assessing SWB.

The problem with this evidence is that the correlation between two measures only shows that both methods have some validity, but it is not possible to quantify the amount of valid variance in self-ratings or informant ratings, which requires at least three methods (Andrews & Withey, 1976; Zou, Schimmack, & Gere, 2013). Theoretically, it would be possible that most of the variance in self-ratings is valid and that informant ratings are rather invalid. This is what Andrews and Withey (1976) claimed with estimates of 65% valid variance in self-ratings and 15% valid variance in informant ratings, with a correlation of r = .32. However, their model was incorrect and allowed for method variance in self-ratings to inflate the factor loading of self-ratings.

Zou et al. (2013) avoided this problem by using self-ratings and ratings by two informants as independent methods and found no evidence that self-ratings are more valid than informant ratings, a finding that is mirrored in ratings of personality traits (Anusic et al., 2009). Thus, if self-ratings and informant ratings are equally valid, a correlation of r = .3 implies that 30% of the variance in self-ratings is valid and 30% of the variance in informant ratings is valid.
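The arithmetic behind this conclusion can be sketched in a few lines of Python. The sketch assumes a simple one-factor measurement model with independent method errors, so that the correlation between two methods is the product of their validity coefficients (loadings). The function names and the three-method correlations are hypothetical illustrations, not the model or the estimates reported by Zou et al. (2013).

```python
import math


def loading_if_equally_valid(r_self_informant):
    """With two equally valid methods, r = loading * loading, so loading = sqrt(r)."""
    return math.sqrt(r_self_informant)


def valid_variance_three_methods(r_ab, r_ac, r_bc):
    """Share of valid variance in method A when three methods measure one factor.

    In a one-factor model r_ab = a*b, r_ac = a*c, r_bc = b*c,
    so a**2 = r_ab * r_ac / r_bc.
    """
    return r_ab * r_ac / r_bc


# Two methods: the equal-validity assumption is needed to get a number at all.
loading = loading_if_equally_valid(0.30)
print(round(loading, 2), round(loading ** 2, 2))

# Three methods (hypothetical correlations): no equal-validity assumption needed.
print(round(valid_variance_three_methods(0.30, 0.30, 0.30), 2))
print(round(valid_variance_three_methods(0.40, 0.30, 0.20), 2))
```

With only two methods, the same correlation of .3 is also compatible with unequal validities (e.g., loadings of .75 and .40), which is why a third method is needed to quantify validity.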

While this evidence shows that self-ratings of life-satisfaction show convergent validity with informant ratings, it also shows that a substantial portion of the reliable variance in self-ratings is not shared with informants. Moreover, it is not clear what information produces agreement between self-ratings and informant ratings. This question has received surprisingly little attention, although it is critical for the construct validity of life-satisfaction judgments. Two articles have examined this question with opposite conclusions. Schneider and Schimmack (2010) found some evidence that satisfaction in important life domains contributed to self-informant agreement. This finding would support the bottom-up model of well-being judgments, in which raters actually consider life circumstances when they make well-being judgments. In contrast, Dobewall, Realo, Allik, Esko, and Metspalu (2013) proposed that personality traits like depression and cheerfulness accounted for self-informant agreement. In this case, informants do not need to know anything about life circumstances. All they need to know is whether an individual has a positive or negative lens to evaluate their lives. If informants are not using information about life circumstances, they cannot be used to validate self-ratings to show that self-ratings are based on evaluations of life circumstances.

Diener et al. (2018) cite a number of additional findings as evidence of convergent validity.

Physiological measures, including brain activity (Davidson, 2004) and hormones (Buchanan, al’Absi, & Lovallo, 1999), along with behavioral measures such as the amount of smiling (e.g., Oettingen & Seligman, 1990; Seder & Oishi, 2012) and patterns of online behaviors (Schwartz, Eichstaedt, Kern, Dziurzynski, Agrawal et al., 2013) have also been used to assess SWB. (p. 7).

This evidence has several limitations. First, hormones do not reflect evaluations and are at best indirectly related to life-evaluations. Second, asymmetries in prefrontal brain activity (Davidson, 2004) have been shown to reflect approach and avoidance motivation more than pleasure and displeasure, and brain activity is a better measure of momentary states than of the evaluation of fairly stable life circumstances. Finally, these measures may also reflect individuals’ personality more than their life circumstances. The same is true for the behavioral measures. Most important, correlations with a single indicator do not provide information about the amount of valid variance in life-satisfaction judgments. To quantify validity it is necessary to examine these findings within a causal network (Schimmack, 2019).

Diener et al. (2018) agree with my assessment in their final conclusions about measurement of subjective well-being.

The first (and perhaps least controversial) is that many open questions remain regarding the associations among different SWB measures and the extent to which these measures map on to theoretical expectations; therefore, understanding how the measures relate and how they diverge will continue to be one of the most important goals of research in the area of SWB. Although different camps have emerged that advocate for one set of measures over others, we believe that such advocacy is premature. More research is needed about the strengths, weaknesses, and relative merits of the various approaches to measurement that we have documented in this review (p. 7).

The problem is that well-being scientists have made no progress on this front since Andrews and Withey (1976) conducted the first thorough construct validation studies. The reason is that social and personality psychology suffers from a validation crisis (Schimmack, 2019). Researchers simply assume that measures are valid rather than testing their validity, or they use necessary but insufficient criteria like internal consistency (alpha) or retest reliability as evidence. Moreover, there is a tendency to ignore inconvenient findings. As a result, 40 years after Andrews and Withey’s (1976) seminal work was published, it remains unclear (a) whether respondents aggregate information about important life domains to make global judgments, (b) how much of the variance in life-satisfaction judgments is valid, and (c) which factors produce systematic biases in life-satisfaction judgments that may lead to false conclusions about the causes of life-satisfaction and to false policy recommendations.

Health is probably the best example to illustrate the importance of valid measurement of subjective well-being. It makes intuitive sense that health has an influence on well-being. Illness often prevents individuals from pursuing their goals and enjoying life, as everybody who has had the flu knows. Diener et al. (2018) agree.

“One life circumstance that might play a prominent role in subjective well-being is a person’s health” (p. 15).

It is also difficult to see how there could be dramatic individual differences in the criteria that are used to evaluate health. Sure, fitness levels may be a matter of personal preference, but nobody enjoys a stroke, a heart attack, cancer, or even the flu.

Thus, it was a surprising finding that health seemed to have a small influence on global well-being judgments.

“Initial research on the topic of health conditions often concluded that health played only a minor role in wellbeing judgments (Diener et al., 1999; Okun, Stock, Haring, & Witter, 1984).”

More problematic was the finding that subjective evaluations of health seemed to play no role in these judgments in multivariate analyses that controlled for shared variance among ratings of several life domains. For example, in Andrews and Withey’s (1976) studies satisfaction with health contributed only 1% unique variance in the global measure.

In contrast, direct importance ratings show that health is rated as the second most important domain (Rohrer & Schmukle, 2018).

Thus, we have to conclude that health doesn’t seem to matter for people’s subjective well-being. Or we can conclude that global measures are (partially) invalid measures because respondents do not weigh life domains in accordance with their importance. This question clearly has policy relevance, as health care costs are a large part of wealthy nations’ GDP and financing health care is a controversial political issue, especially in the United States. Why would this be the case if health is actually not important for well-being? We could argue that it is important for life expectancy (Veenhoven’s happy life-years) or that it matters for objective well-being, but not for subjective well-being. Either way, the question of why health satisfaction plays a small role in global measures of subjective well-being is an important one. The problem is that 40 years of well-being science have passed without addressing this important question. But as they say, better late than never. So, let’s get on with it and figure out how responses to global well-being questions are made and whether these cognitive processes are in line with the theoretical model of subjective well-being.

Measuring Well-Being in the SOEP

Psychology has a measurement problem. Big claims about personality, self-esteem, or well-being are based on sum-scores of self-ratings, or sometimes on a single rating. This would be a minor problem if thorough validation research had demonstrated that sum-scores of self-ratings are valid measures of the constructs they are intended to represent, but such validation research is often missing. As a result, the validity of widely used measures in psychology and of claims based on these measures is unknown.

The well-being literature is an interesting example of the measurement crisis because two opposing views about the validity of well-being measures co-exist. On the one hand, experimental social psychologists argue that life-satisfaction ratings are invalid and useless (Schwarz & Strack, 1999), a view that has been popularized by Nobel Laureate Daniel Kahneman in his book “Thinking, Fast and Slow” (cf. Schimmack, 2018). On the other hand, well-being scientists often assume that life-satisfaction ratings are near perfect indicators of individuals’ well-being.

An editor of JPSP, which presumably means he or she is an expert, has no problem mentioning both positions in the same paragraph without noting the contradiction.

“There is a huge literature on well-being. Since Schwarz and Strack (1999), to take that arbitrary year as a starting point, there have been more than 11,000 empirical articles with “wellbeing” (or well-being or well being) in the title, according to PsychInfo. The vast majority of them, I submit, take the subjective evaluation of one’s own life as a perfectly valid and perhaps the best way to assess one’s own evaluation of one’s life.”

So, since Schwarz and Strack concluded that life-satisfaction judgments are practically useless, 11,000 articles have used life-satisfaction judgments as perfectly valid measures of life-satisfaction and nobody thinks this is a problem. No wonder natural scientists don’t consider psychology a science.

The Validity of Well-Being Measures

Any attempt at validating well-being measures requires a definition of well-being that leads to testable predictions about correlations of well-being measures with other measures. Testing these predictions is called construct validation (Cronbach & Meehl, 1955; Schimmack, 2019).

The theory underlying the use of life-satisfaction judgments as measures of well-being assumes that well-being is subjective and that (healthy, adult) individuals are able to compare their actual lives to their ideal lives and to report the outcome of these comparison processes (Andrews & Withey, 1973; Diener, Lucas, Schimmack, & Helliwell, 2009).

One prediction that follows from this model is that global life-satisfaction judgments should be correlated with judgments of satisfaction in important life domains, but not in unimportant life domains. The reason is that satisfaction with life as a whole should be related to satisfaction with (important) parts. It would make little sense for somebody to say that they are extremely satisfied with their life as a whole, but not satisfied with their family life, work, health, or anything else that matters to them. The whole point of asking a global question is the assumption that people will consider all important aspects of their lives and integrate this information into a global judgment (Andrews & Withey, 1973). The main criticism of Schwarz and Strack (1999) was that this assumption does not describe the actual judgment process and that actual life-satisfaction judgments are based on transient and irrelevant information (e.g., current mood, Schwarz & Clore, 1983).

Top-Down vs. Bottom-Up Theories of Global and Domain Satisfaction

To muddy the waters, Diener (1984) proposed on the one hand that life-satisfaction judgments are, at least somewhat, valid indicators of life-satisfaction, while also proposing that correlations between satisfaction with life as a whole and satisfaction with domains might reflect a top-down effect.

A top-down effect implies that global life-satisfaction influences domain satisfaction. That is, health satisfaction is not a cause of life-satisfaction because good health is an important part of a good life. Instead, life-satisfaction is a content-free feeling of satisfaction that creates a halo in evaluations of specific life aspects independent of the specific evaluations of a life domain.

Diener overlooked that top-down processes undermine life-satisfaction judgments as valid measures of well-being because a top-down model implies that global life-satisfaction judgments reflect only a general disposition to be satisfied without information about the actual satisfaction in important life domains. In the context of a measurement model, we can see that the top-down model implies that life-satisfaction judgments only capture the shared variance among specific domain satisfaction judgments, but fail to represent the part of satisfaction that reflects unique variance in satisfaction with specific life domains. In other words, top-down models imply that well-being does not encompass evaluations of the parts that make up an individual’s entire life.

It does not help that measurement models in psychology often treat unique or residual variances as error variances and omit them from figures. In the figure, the residual variances are shown; they represent variation in satisfaction with life aspects that is not shared across domains.

Some influential articles that examined top-down and bottom-up processes have argued in favor of top-down processes without noticing that this invalidates the use of life-satisfaction judgments as indicators of well-being or at least requires a radically different conception of well-being (well-being is being satisfied independent of how things are actually going in your life) (Heller, Watson, & Ilies, 2004).

An Integrative Top-Down vs. Bottom-Up Model

Brief et al. (1993) proposed an integrative model of top-down and bottom-up processes in life-satisfaction judgments. The main improvement of this model was to distinguish between a global disposition to be more satisfied and a global judgment of important aspects of life. As life-satisfaction judgments are meant to represent the latter, life-satisfaction judgments are the ultimate outcome of interest, not a measure of the global disposition. Brief et al. (1993) used neuroticism as an indicator for the global disposition to be less satisfied, but there are probably other factors that can contribute to a general disposition to be satisfied. The integrative model assumes that any influence of the general disposition is mediated by satisfaction with important life domains (e.g., health).

FIGURE 1. DisSat = Dispositional Satisfaction, DS1 = Domain Satisfaction 1 (e.g., health), DS2 = Domain Satisfaction 2, DS3 = Domain Satisfaction 3, LS = Life-Satisfaction.

It is important to realize that the mediation model separates two variances in domain satisfaction judgments, namely the variance that is explained by dispositional satisfaction and the variance that is not explained by dispositional satisfaction (residual variance). Both variances contribute to life-satisfaction. Thus, objective aspects of health that contribute to health satisfaction can also influence life-satisfaction. This makes the model an integrative model that allows for top-down and bottom-up effects.

One limitation of Brief et al.’s (1993) model was the use of neuroticism as sole indicator of dispositional satisfaction. While it is plausible that neuroticism is linked to more negative perceptions of all kinds of life-aspects, it may not be the only trait that matters.

Another limitation was the use of health satisfaction as the only life domain. If people also care about other life domains, other domain satisfactions should also contribute to life-satisfaction and they could be additional mediators of the influence of neuroticism on life-satisfaction. For example, neurotic individuals might also worry more about money and financial satisfaction could influence life-satisfaction, making financial satisfaction another mediator of the influence of neuroticism on life-satisfaction.

One advantage of structural equation modeling is the ability to study constructs that do not have a direct indicator. This makes it possible to examine top-down effects without “direct” indicators of dispositional satisfaction. The reason is that dispositional satisfaction should influence satisfaction with various life domains. Thus, dispositional satisfaction is reflected in the shared variance among different domain satisfaction judgments and domain satisfaction judgments serve as indicators that can be used to measure dispositional satisfaction (see Figure 2).

Domain Satisfactions in the SOEP

It is fortunate that the creators of the Socio-Economic Panel in the 1980s included domain satisfaction measures and that these measures have been included in every wave from 1984 to 2017. This makes it possible to test the integrative top-down bottom-up model with the SOEP data.

The five domains that have been included in all surveys are health, household income, recreation, housing, and job satisfaction. However, job satisfaction is only available for those participants who are employed. To maximize the number of domains, I used all five domains and limited the analysis to working participants. The same approach can be used to build a model with four domains for all participants.

One limitation of the SOEP is the use of single-item indicators. This makes sense for expensive panel studies, but creates some psychometric problems. Fortunately, it is possible to estimate the reliability of single-item indicators in panel data by using Heise’s (1969) model which estimates reliability based on the pattern of retest correlations for three waves of data.

REL = r12 * r23 / r13
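As a rough illustration, the formula can be applied in a few lines of Python. The retest correlations below are hypothetical and are not SOEP estimates. The two stability expressions are not stated in the text above; they follow from the same model assumptions (constant reliability across waves and change that accumulates from one wave to the next) and are included only to show how reliability and stability are separated.

```python
def heise_estimates(r12, r23, r13):
    """Reliability and stability from three-wave retest correlations.

    Under the model, r12 = REL * s12, r23 = REL * s23, r13 = REL * s12 * s23.
    """
    reliability = r12 * r23 / r13   # REL = r12 * r23 / r13
    stability_12 = r13 / r23        # true-score stability, wave 1 to wave 2
    stability_23 = r13 / r12        # true-score stability, wave 2 to wave 3
    return reliability, stability_12, stability_23


# Hypothetical retest correlations for a single-item measure.
rel, s12, s23 = heise_estimates(r12=0.50, r23=0.50, r13=0.40)
print(round(rel, 3), round(s12, 3), round(s23, 3))
```

In this made-up case the observed retest correlation of .50 splits into a reliability of about .62 and a true-score stability of .80, which shows why raw retest correlations of single items understate the stability of the construct.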

More data would be better and are available, but the goal was to combine the well-being model with a model of personality ratings that are available for only three waves (2005, 2009, & 2013). Thus, the same three waves were used to create an integrative top-down bottom-up model that also examined how domain satisfaction is related to global life-satisfaction across time.

The data set consisted of 3 repeated measures of 5 domain satisfaction judgments and a single life-satisfaction judgment for a total of 18 variables. The data were analyzed with Mplus (see OSF for syntax and detailed results https://osf.io/vpcfd/ ).

Results

Overall model fit was acceptable, CFI = .988, RMSEA = .023, SRMR = .029.

The first results are the reliability and stability estimates of the five domain satisfactions and global life satisfaction (Table 1). For comparison purposes, the last column shows the estimates based on a panel analysis with annual retests (Schimmack, Krause, Wagner, & Schupp, 2010). The results show fairly consistent stability across domains with the exception of job satisfaction. Job satisfaction is less stable than other domains. The four-year stability is high, but not as high as for personality traits (Schimmack, 2019). A comparison with the panel data shows higher stability, which indicates that some of the error variance in 4-year retest studies is reliable variance that fluctuates over the four-year retest period. However, the key finding is that there is high stability in domain satisfaction judgments and life-satisfaction judgments, which makes it theoretically interesting to examine the relationship between the stable variances in domain satisfaction and life-satisfaction.

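| | Reliability | Stability | 1Y-Stability | Panel |
|---|---|---|---|---|
| Job Satisfaction | 0.62 | 0.62 | 0.89 | |
| Health Satisfaction | 0.67 | 0.79 | 0.94 | 0.93 |
| Financial Satisfaction | 0.74 | 0.81 | 0.95 | 0.91 |
| Housing Satisfaction | 0.66 | 0.81 | 0.95 | 0.89 |
| Leisure Satisfaction | 0.67 | 0.80 | 0.95 | 0.92 |
| Life Satisfaction | 0.66 | 0.78 | 0.94 | 0.89 |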
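
The 1Y-Stability values in Table 1 are consistent with annualizing the four-year stability as its fourth root, that is, assuming that stability compounds equally across the four years. This is an inference from the tabled numbers rather than something stated in the text; the helper name is hypothetical. A short Python check:

```python
# Four-year stability estimates taken from Table 1
four_year_stability = {
    "Job": 0.62,
    "Health": 0.79,
    "Financial": 0.81,
    "Housing": 0.81,
    "Leisure": 0.80,
    "Life": 0.78,
}


def annualize(stability: float, years: int = 4) -> float:
    """Convert a multi-year stability into an implied one-year stability.

    Assumption: the same one-year stability applies in every year,
    so the multi-year value is the one-year value raised to `years`.
    """
    return stability ** (1 / years)


one_year = {name: round(annualize(s), 2) for name, s in four_year_stability.items()}
print(one_year)
```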

Table 2 examines the influence of top-down processes on domain satisfaction. It shows the factor loadings of domain satisfaction on a common factor that reflects dispositional satisfaction, that is, a general disposition to report higher levels of satisfaction. The loadings range from .60 to .76, which implies that roughly 36% to 58% of the reliable variance in domain satisfaction judgments is explained by a general disposition factor. While this leaves ample room for domain-specific factors to influence domain satisfaction judgments, the results show a strong top-down influence.

| | T1 | T2 | T3 |
|---|---|---|---|
| Job Satisfaction | 0.69 | 0.68 | 0.68 |
| Health Satisfaction | 0.68 | 0.66 | 0.65 |
| Financial Satisfaction | 0.60 | 0.61 | 0.63 |
| Housing Satisfaction | 0.72 | 0.74 | 0.76 |
| Leisure Satisfaction | 0.61 | 0.61 | 0.61 |
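
Squaring a standardized factor loading gives the share of variance that the factor explains. The sketch below applies this to the loadings in Table 2, assuming they are standardized loadings on the disposition factor; the variable names are illustrative.

```python
# Standardized loadings on the disposition factor (Table 2), waves T1 to T3
loadings = {
    "Job": (0.69, 0.68, 0.68),
    "Health": (0.68, 0.66, 0.65),
    "Financial": (0.60, 0.61, 0.63),
    "Housing": (0.72, 0.74, 0.76),
    "Leisure": (0.61, 0.61, 0.61),
}

# Variance explained by the disposition factor = squared loading
variance_explained = {
    domain: [round(value ** 2, 2) for value in waves]
    for domain, waves in loadings.items()
}

lowest = min(min(values) for values in variance_explained.values())
highest = max(max(values) for values in variance_explained.values())

for domain, values in variance_explained.items():
    print(domain, values)
print("range:", lowest, "to", highest)
```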

Table 3 shows the unique contribution of the disposition and the five domains to life-satisfaction concurrently and longitudinally.

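| | DS1-LS1 | DS1-LS2 | DS1-LS3 | DS2-LS2 | DS2-LS3 | DS3-LS3 |
|---|---|---|---|---|---|---|
| Disposition | 0.56 | 0.59 | 0.57 | 0.61 | 0.59 | 0.60 |
| Job | 0.14 | 0.10 | 0.05 | 0.17 | 0.08 | 0.12 |
| Health | 0.23 | 0.22 | 0.21 | 0.28 | 0.27 | 0.33 |
| Finances | 0.34 | 0.20 | 0.14 | 0.24 | 0.18 | 0.22 |
| Housing | 0.04 | 0.03 | 0.03 | 0.04 | 0.04 | 0.06 |
| Leisure | 0.06 | 0.10 | 0.06 | 0.13 | 0.07 | 0.09 |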

The first notable finding is that the disposition factor accounts for the lion’s share of the explained variance in life-satisfaction judgments. The second important finding is that the relationship is very stable over time. The disposition measured at time 1 is an equally good predictor of life-satisfaction at time 1 (r = .56), time 2 (r = .59), and time 3 (r = .57). This suggests that about one-third of the reliable variance in life-satisfaction judgments reflects a stable disposition to report higher or lower levels of satisfaction.
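
The "about one-third" figure can be checked by squaring the time 1 disposition correlations from Table 3. This is a minimal sketch that treats the tabled values as correlations with the reliable variance in life-satisfaction; the variable names are illustrative.

```python
# Time 1 disposition correlations with life-satisfaction at each wave (Table 3)
disposition_r = {"LS1": 0.56, "LS2": 0.59, "LS3": 0.57}

# Shared variance = squared correlation
shared_variance = {wave: round(r ** 2, 2) for wave, r in disposition_r.items()}
print(shared_variance)
```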

Regarding domain satisfaction, health is the strongest predictor, with correlations between .21 and .33. Finances is the second strongest predictor, with correlations between .14 and .34. For health satisfaction, there is high stability over time. That is, time 1 health satisfaction predicts time 3 life-satisfaction (r = .21) nearly as well as time 1 life-satisfaction (r = .23). In contrast, financial satisfaction shows a bit more change over time, with a concurrent correlation at time 1 of r = .34 that drops to r = .14 for life-satisfaction at time 3. This suggests that changes in financial satisfaction produce changes in life-satisfaction.

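Job satisfaction has a weak influence on life-satisfaction, with correlations ranging from r = .05 to .17. Like financial satisfaction, there is some evidence that changes in job satisfaction predict changes in life-satisfaction: the correlation for time 1 job satisfaction drops from r = .14 concurrently to r = .05 for life-satisfaction at time 3.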

Housing and leisure have hardly any influence on life-satisfaction judgments, with most relationships being less than .10. There is also no evidence that changes in these domains produce changes in life-satisfaction judgments.

These results show that most of the reliable variance in global life-satisfaction judgments remains unexplained and that a stable disposition accounts for most of the explained variance in life-satisfaction judgments.

Implications for the Validity of Life-Satisfaction Judgments

There are two ways to interpret the results. One interpretation, common in the well-being literature and in hundreds of studies with the SOEP data, is that life-satisfaction judgments are valid measures of well-being. Accordingly, well-being in Germany is determined mostly by a stable disposition to be satisfied, and changing actual life circumstances will have negligible effects on well-being. For example, Nakazato et al. (2011) used the SOEP data to examine the influence of moving on well-being. They found that decreasing housing satisfaction triggered a decision to move and that moving produced lasting increases in housing satisfaction. However, moving had no effect on life-satisfaction. This is not surprising given the present finding that housing satisfaction has a negligible influence on life-satisfaction judgments. Thus, if we assume that life-satisfaction judgments are a perfectly valid measure of well-being, we would have to conclude that people are irrational when they invest money in a better house.

The alternative interpretation is that life-satisfaction judgments are not as good as well-being researchers think they are. Rather than reflecting a weighted summary of all important aspects of life, they are based on accessible information that does not include all relevant information. The difference from Schwarz and Strack’s (1999) criticism is that the bias is not due to temporarily accessible information (e.g., mood) that makes life-satisfaction judgments unreliable. As demonstrated here and elsewhere, a large portion of the variance in life-satisfaction judgments is stable. The problem is that the stable factors may be biases in life-satisfaction ratings rather than real determinants of well-being.

It is unfortunate that psychologists and other social scientists have neglected proper validation research of a measure that has been used to make major empirical claims about the determinants of well-being, and that this research has been used to make policy recommendations (Diener, Lucas, Schimmack, & Helliwell, 2009). The present results suggest that any policy recommendations based on life-satisfaction ratings alone are premature. It is time to take measurement more seriously and to improve the validity of measuring well-being.