Reliability - Replicability-Index

In this review of Schwarz and Strack’s (1999) “Reports of Subjective Well-Being: Judgmental Processes and Their Methodological Implications”, I present verbatim quotes form their chapter and explain why these statements are misleading or false, and how the authors distort the actual evidence by selectively citing research that supports their claims, while hiding evidence that contradicts their claims. I show that the empirical evidence for the claims made by Schwarz and Strack is weak and biased.

Unfortunately, this chapter has had a strong influence on Daniel Kahneman’s attitude towards life-satisfaction judgments and his fame as a Noble laureate has led many people to believe that life-satisfaction judgments are highly sensitive to the context in which these questions are asked and practically useless for the measurement of well-being. This has led to claims that wealth is not a predictor of well-being, but only a predictor of invalid life-satisfaction judgments (Kahneman et al., 2006) or that the effects of wealth on well-being are limited to low incomes. None of these claims are valid because they rely on the unsupported assumption that life-satisfaction judgments are invalid measures of well-being.

The original quotes are highlighted in bold followed by my comments.

Much of what we know about individuals’ subjective well-being (SWB) is based on self-reports of happiness and life satisfaction.

True. The reason is that sociologists developed brief, single-item measures of well-being that could be included easily in large surveys such as the World Value Survey, the German Socio-Economic Panel, or the US General Social Survey. As a result, there is a wealth of information about life-satisfaction judgments that transcends scientific disciplines. The main contribution of social psychologists to this research program that examines how social factors influence human well-being has been to dismiss the results based on claims that the measure of well-being is invalid.

As Angus Campbell (1981) noted, the “use of these measures is based on the assumption that all the countless experiences people go through from day to day add to . . . global feelings of well-being, that these feelings remain relatively constant over extended periods, and that people can describe them with candor and accuracy”

Half true. Like all self-report measures, the validity of life-satisfaction judgments depends on respondents’ ability and willingness to provide accurate information. However, it is not correct to suggest that life-satisfaction judgments assume that feelings remain constant over extended periods of time or that respondents have to rely on feelings to answer questions about their satisfaction with life. There is a long tradition in the well-being literature to distinguish cognitive measures of well-being like Cantrill’s ladder and affective measures that focus on affective experiences in the recent past like Bradburn’s affect balance scale. The key assumption underlying life-satisfaction judgments is that respondents have chronically accessible information about their lives or can accurately estimate the frequency of positive and negative feelings. It is not necessary that the feelings are stable.

These assumptions have increasingly been drawn into question, however, as the empirical work has progressed.

It is not clear which assumptions have been drawn into question. Are people unwilling to report their well-being, are they unable to do so, or are feelings not as stable as they are assumed to be? Moreover, the statement ignores a large literature that has demonstrated validity of well-being measures going back to the 1930s (see Diener et al., 2009; Scheider & Schimmack, 2009, for a meta-analysis).

First, the relationship between individuals’ experiences and objective conditions of life and their subjective sense of well-being is often weak and sometimes counter-intuitive. Most objective life circumstances account for less than 5 percent of the variance in measures of SWB, and the combination of the circumstances in a dozen domains of life does not account for more than 10 percent (Andrews and Whithey 1976; Kammann; 1982; for a review, see Argyle, this volume).

First, it is not clear what weak means. How strong should the correlation between objective conditions of life and subjective well-being be? For example, should marital status be a strong predictor of happiness? Maye it matters more whether people are happily married or unhappily married than whether they are married or single? Second, there is no explanation for the claim that these relationships are counter-intuitive. Employment, wealth, and marriage are positively related to well-being as most people would expect. The only finding in the literature that may be considered counter-intuitive is that having children does not notably increase well-being and sometimes decreases well-being. However, this does not mean well-being measures are false, it may mean that people’s intuitions about the effects of life-events on well-being are wrong. If intuitions would always be correct, we would not need scientific studies of determinants of well-being.

Second, measures of SWB have low test-retest reliabilities, usually hovering around .40, and not exceeding .60 when the same question is asked twice during the same one-hour interview (Andrews and Whithey 1976; Glatzer 1984).

This argument ignores that responses to a single self-report item often have a large amount of random measurement error, unless participants can recall their previous answer. The typical reliability of a single-item self-report measure is about r =.6 +/- .2. There is nothing unique about the results reported here for well-being measures. Moreover, the authors blatantly ignore evidence that scales with multiple items like Diener’s Satisfaction with Life Scale have retest correlations over r = .8 over a one-month period (see Schimmack & Oishi, 2005, for a meta-analysis). Thus, this statement is misleading and factually incorrect.

Moreover, these measures are extremely sensitive to contextual influences.

This claim is inconsistent with the high retest correlation over periods of one month. Moreover, survey researchers have conducted numerous studies in which they examined the influence of the survey context on well-being measures and a meta-analysis of these studies shows only a small effect of previous items on these judgments and the pattern of results is not consistent across studies (see Schimmack & Oishi, 2005 for a meta-analysis).

Thus, minor events, such as finding a dime (Schwarz 1987) or the outcome of soccer games (Schwarz et al. 1987), may profoundly affect reported satisfaction with one’s life as a whole.

As I will show, the chapter makes many statements about what may happen. For example, finding a dime may profoundly affect well-being report or it may not have any effect on these judgments. These statements are correct because well-being reports can be made in many different ways. The real question is how these judgments are made when well-being measures are used to measure well-being. Experimental studies that manipulate the situation cannot answer this question because they purposefully create the situation to demonstrate that respondents may use mood (when mood is manipulated) or may use information that is temporarily accessible, when relevant information is made salient and temporarily accessible. The processes underlying judgments in these experiments may reveal influences on life-satisfaction judgment in a real survey context or they may reveal processes that do not occur under normal circumstances.

Most important, however, the reports are a function of the research instrument and are strongly influenced by the content of preceding questions, the nature of the response alternatives, and other “technical” aspects of questionnaire design (Schwarz and Strack 1991a, 1991b).

We can get different answers to different questions. The item “So far, I have gotten everything I wanted in life” may be answered differently than the item “I feel good about my life, these days.” If so, it is important to examine which of these items is a better measure of well-being. It does not imply that all well-being items are flawed. The same logic applies to the response format. If some response formats produce different results than others, it is important to determine which response formats are better for the measurement of well-being. Last, but not least, the claim that well-being reports are “strongly influenced by the content of preceding questions” is blatantly false. A meta-analysis shows that strong effects were only observed in two studies by Strack, but that other studies find much weaker or no effects (see Schimmack & Oishi, 2005, for a meta-analysis).

Such findings are difficult to reconcile with the assumption that subjective social indicators directly reflect stable inner states of well-being (Campbell 1981) or that the reports are based on careful assessments of one’s objective conditions in light of one’s aspirations (Glatzer and Zapf 1984). Instead, the findings suggest that reports of SWB are better conceptualized as the result of a judgment process that is highly context-dependent.

Indeed. A selective and bias list of evidence is inconsistent with the hypothesis that well-being reports are valid measures of well-being, but this only shows that the authors misrepresent the evidence, not that well-being reports lack validity, which was carefully examined in Andrew & Withey’s book (1976), which the authors cite without mentioning the evidence presented in the book for the usefulness of well-being reports.

[A PREVIEW]

Not surprisingly, individuals may draw on a wide variety of information when asked to assess the subjective quality of their lives.

Indeed. This means that it is impossible to generalize from an artificial context created in an experiment to the normal conditions of a well-being survey because respondents may use different information in the experiment than in the naturalistic context. The experiment may led respondents to use information that they normally would not use.

[USING INFORMATION ABOUT ONE’S OWN LIFE: INTRAINDIVIDUAL COMPARISONS]

Comparison-based evaluative judgments require a mental representation of the object of judgment, commonly called a target, as well as a mental representation of a relevant standard to which the target can be compared.

True. In fact, Cantril’s ladder explicitly asks respondents to compare their actual life to the best possible life they could have and the worst possible life they could have. We can think about these possible lives as imaginary intrapersonal comparisons.

When asked, “Taking all things together, how would you say things are these days?” respondents are ideally assumed to review the myriad of relevant aspects of their lives and to integrate them into a mental representation of their life as a whole.”

True, this is the assumption underlying the use of well-being reports as measures of well-being.

In reality, however, individuals rarely retrieve all information that may be relevant to a judgment

This is also true. It is impossible to retrieve ALL of the relevant information. But it is possible that respondents retrieve most of the relevant information or enough relevant information to make these judgments valid. We do not require 100% validity for measures to be useful.

Instead, they truncate the search process as soon as enough information has come to mind to form a judgment with sufficient subjective certainty (Bodenhausen and Wyer 1987).

This is also plausible. The question is what would be the criterion for sufficient certainty for well-being judgments and whether this level of certainty is reached without retrieval of relevant information. For example, if I have to report how satisfied I am with my life overall and I am thinking first about my marriage would I stop there or would I think that my overall life is more than my marriage and also think about my work? Depending on the answer to this question, well-being judgments may be more or less valid.

Hence, the judgment is based on the information that is most accessible at that point in time. In general, the accessibility of information depends on the recency and frequency of its use (for a review, see Higgins 1996).

This also makes sense. A sick person may think about their health. A person in a happy marriage may think about their loving wife, and a person with financial problems may think about their problems paying bills. Any life domain that is particularly salient in a person’s life is also likely to be a salient when they are confronted with a life-satisfaction question. However, we still do not know which information people will use and how much information they will use before they consider their judgment sufficiently accurate to provide an answer. Would they use just one salient temporarily accessible piece of information or would be continue to look for more information?

Information that has just been used-for example, to answer a preceding question in the questionnaire-is particularly likely to come to mind later on, although only for a limited time.

Wait a second. Higgins emphasized that accessibility is driven by recency and frequency (!) of use. Individual who are going through a divorce or cancer treatment have probably thought frequently about this aspect of their lives. A single question about their satisfaction with their recreational activities may not make them judge their lives based on their hobbies. Thus, it does not follow from Higgins’s work on accessibility that preceding items have a strong influence on well-being judgments.

This temporarily accessible information is the basis of most context effects in survey measurement and results in variability in the judgment when the same question is asked at different times (see Schwarz and Strack 1991b; Strack 1994a; Sudman, Bradburn, and Schwarz 1996, chs. 3 to 5; Tourangeau and Rasinski 1988)

Once more, the evidence for these temporary accessibility effects is weak and it is not clear why well-being judgments would be highly stable over time, if they were driven by making irrelevant information temporarily accessible. In fact, the evidence is more consistent with Higgins’ suggests that frequent of use influences well-being judgments. Life domains that are salient to individuals are likely to influence life-satisfaction judgments because they are chronically accessible even if other information is temporarily accessible or primed by preceding questions.

Other information, however, may come to mind because it is used frequently-for example, because it relates to the respondent’s current concerns (Klinger 1977) or life tasks (Cantor and Sanderson, this volume). Such chronically accessible information reflects important aspects of respondents’ lives and provides for some stability in judgments over time.

Indeed, but look at the wording. “This temporarily accessible information IS the basis of most context effects in survey measurement” vs. “Other information, however, MAY come to mind.” The wording is not balanced and it does not match the evidence that most of the variation in well-being reports across individuals is stable over time and only a small proportion of the variance changes systematically over time. The wording is an example of how some scientists create the illusion of a balanced literature review while pushing their biased opinions.

As an example, consider experiments on question order. Strack, Martin, and Schvvarz (1988) observed that dating frequency was unrelated to students’ life satisfaction when a general satisfaction question preceded a question about the respondent’s dating frequency, r = – 12. Yet reversing the question order increased the correlation to r = .66. Similarly, marital satisfaction correlated with general life satisfaction r = .32 when the general question preceded the marital one in another study (Schwarz, Strack, and Mai 1991). Yet reversing the question order again increased this correlation to r = .67.

The studies that are cited here are not representative. They show the strongest item-order effects and the effects are much stronger than the meta-analytic average (Schimmack & Oishi, 2005). Both studies were conducted by Strack. Thus, these examples are at best considered examples what might happen under very specific conditions that differ from other specific conditions where the effect was much smaller. Moreover, it is not clear why dating frequency should be a strong positive predictor of life-satisfaction. Why is my life better when I have a lot of dates as opposed to somebody who is in a steady relationship, and we would not expect a married respondent with lots of dates to be happy with their marriage. The difference between r = .32 and r = .66 is strong, but it was obtained with small samples and it is common that small samples overestimate effect sizes. In fact, large survey studies show much weaker effects. In short, by focusing on these two examples, the authors create the illusion that strong effects of preceding items are common and that these studies are just an example of these effects. In reality, these are the only two studies with extremely and unusually strong effects that are not representative of the literature. The selective use of evidence is another example of unscientific practices that undermine a cumulative science.

Findings of this type indicate that preceding questions may bring information to mind that respondents would otherwise not consider.

Yes, it may happen, but we do not know under what specific circumstances it happens. At present, the only predictor of these strong effects is that the studies were conducted by Fritz Strack. Nobody else has reported such strong effects.

If this information is included in the representation that the respondent forms of his or her life, the result is an assimilation effect, as reflected in increased correlations. Thus, we would draw very different inferences about the impact of dating frequency or marital satisfaction on overall SWB, depending on the order in which the questions are asked.

Now the authors extrapolate from extreme examples and discuss possible theoretical implications if this were a consistent and replicable finding. “We would draw different inferences.” True. If this were a replicable finding and we would ask about specific life domains first, we would end up with false inferences about the importance of dating and marriage for life-satisfaction. However, it is irrelevant what follows logically from a false assumption (if Daniel Kahneman had not won the Noble price, it would be widely accepted that money buys some happiness). Second, it is possible to ask global life-satisfaction question first without making information about specific aspects of life temporarily salient. This simple procedure would ensure that well-being reports are more strongly influenced on chronically accessible information that reflects people’s life concerns. After all, participants may draw on chronically accessible information or temporarily accessible information and if no relevant information was made temporarily accessible, respondents will use chronically accessible information.

Theoretically, the impact of a given piece of accessible information increases with its extremity and decreases with the amount and extremity of other information that is temporarily or chronically accessible at the time of judgment (see Schwarz and Bless 1992a). To test this assumption, Schwarz, Strack, and Mai ( 1991) asked respondents about their job satisfaction, leisure time satisfaction, and marital satisfaction prior to assessing their general life satisfaction, thus rendering a more varied set of information accessible. In this case, the correlation between marital satisfaction and life satisfaction increased from r = .32 (in the general-marital satisfaction order) to r = .46, yet this increase was less pronounced than the r = .67 observed when marital satisfaction was the only specific domain addressed.

This finding also suggests that strong effects of temporarily accessible information are highly context dependent. Just asking for satisfaction with several life-domains reduces the item order effect and with the small samples in Schwarz et al. (1991), the difference between r = .32 and r = .46 is not statistically significant, meaning it could be a chance finding. So, their own research suggests that temporarily accessible information may typically have a small effect on life-satisfaction and this conclusion would be consistent with the evidence in the literature.

In light of these findings, it is important to highlight some limits for the emergence of question-order effects. First, question-order effects of the type discussed here are to be expected only when answering a preceding question increases the temporary accessibility of information that is not chronically accessible anyway… Hence, chronically accessible current concerns would limit the size of any emerging effect, and the more they do so, the more extreme the implications of these concerns are.

Here the authors acknowledge that there are theoretical reasons why item-order effects should typically not have a strong influence on well-being reports. One reason is that some information such as marital satisfaction is likely to be used even if marriage is not made salient by a preceding question. It is therefore, not clear why marital satisfaction would produce a big increase from r = .32 to r = .67, as this would imply that numerous respondents do not consider their marriage when they made the judgment and it would explain why other studies found much weaker effects for item-order effects with marital satisfaction and higher correlations between marital satisfaction and life-satisfaction than r =.32. However, it is interesting that this important theoretical point is offered only as a qualification after presenting evidence from two studies that did show strong item-order effects. If the argument had been presented first, the question would arise why these studies did produce strong item-order effects and it would be evident that it is impossible to generalize from these specific studies to well-being reports in general.

[CONSERVATION NORMS]

“Complicating things further, information rendered accessible by a preceding question may not always be used.”

How is this complicating things further? If there are ways to communicate to respondents that they should not be influenced by previous items (e.g., “Now on to another topic” or “take a moment to think about the most important aspects of your life”) and this makes context effects disappear, why don’t we just use the proper conversational norms to avoid these undesirable effects? And some surveys actually do this and we would therefore expect that they elicit valid reports of well-being that are not based on responses to previous questions in the survey.

In the above studies (Strack et al. 1988; Schwarz ct al. 1991), the conversational norm of nonrcdundancy was evoked by a joint lead-in that informed respondents that they would now be asked two questions pertaining to their well-being. Following this lead-in, they first answered the specific question (about dating frequency or marital satisfaction) and subsequently reported their general life satisfaction. In this case, the previously observed correlations of r = .66 between dating frequency and life satisfaction, or of r = .67 between marital satisfaction and life satisfaction, dropped to r = -15 and .18, respectively. Thus, the same question order resulted in dramatically different correlations, depending on the elicitation of the conversational norm of nonredundancy.

The only evidence for these effects comes from a couple of studies by the authors. Even if these results hold, they suggest that it should be possible to use conversational norms to get the same results for both item-orders if conversational norms suggest that participants should use all relevant chronically accessible information. However, the authors did not conduct such as study. One reason may be that the prediction would be that there is no effect and researchers are only interested in using manipulations that show effects so that they can reject the null-hypothesis. Another explanation could be that Schwarz and Strack’s program of research on well-being reports was built on the heuristics and bias program in social psychology that is only interested in showing biases and ignores evidence for accuracy (Funder, 1987). The only result that is deemed relevant and worthy of publishing are experiments that successfully created a bias in judgments. The problem with this approach is that it cannot reveal that these judgments are also accurate and can be used as valid measures of well-being.

[SUMMARY]

Judgments arc based on the subset of potentially applicable information that is chronically or temporarily accessible at the time.

Yes, it is not clear what else the judgments could be based on.

Accessible information, however, may not be used when its repeated use would violate conversational norms of nonredundancy.

Interestingly this statement would imply that participants are not influenced by subtle information (priming). The information has to be consciously accessible to determine whether it is relevant and only accessible information that is considered relevant is assumed to influence judgments. This also implies that making information accessible that is not considered relevant will not have an influence on well-being reports. For example, asking people about their satisfaction with the weather or the performance of a local sports team does not lead to a strong influence of this information on life-satisfaction judgments because most people do not consider this information relevant (Schimmack et al., 2002). Once more, it is not clear how well-being reports can be highly context dependent, if information is carefully screened for relevance and responses are only made when sufficient relevant information was retrieved.

[MENTALCONSTRUALSOFONE’S LIFE AND A RELEVANT STANDARD: WHAT IS, WAS,WILL BE, AND MIGHT HAVE BEEN]

Suppose that an extremely positive (or negative) life event comes to mind. If this event is included in the temporary representation of the target “my life now,” it results in a more positive (negative) assessment of SWB, reflecting an assimilation effect, as observed in an increased correlation in the studies discussed earlier. However, the same event may also be used in constructing a standard of comparison, resulting in a contrast efficient: compared to an extremely positive (negative) event, one’s life in general may seem relatively bland (or pretty benign). These opposite influences of the same event are sometimes referred to as endowment (assimilation) and contrast effects (Tversky. and Griffin 1991).

This is certainly a possibility, but it not necessarily limited to temporarily accessible information. A period in an individuals’ life may be evaluated relative to other periods in a person’s life. In this way, subjective well-being is subjective. Objectively identical lives can be evaluated differently because past experiences created different ideals or comparison standards (see Cantril’s early work on human concerns). This may happen for chronically accessible information just as much as for temporarily accessible information and it does not imply that well-being reports are invalid; it just shows that they are subjective.

Strack, Schwarz, and Gschneidingcr (1985, Experiment 1) asked respondents to report either three positive or three negative recent life events, thus rendering these events temporarily accessible. As shown in the top panel of Table 1, these respondents reported higher current life satisfaction after they recalled three positive rather than negative recent events. Other respondents, however, had to recall events that happened at least five years before. These respondents reported higher current life satisfaction after recalling negative rather than positive past events.

This finding shows that contrast effects can occur. However, it is important to note that these context effects were created by the experimental manipulation. Participants were asked to recall events from 5 years ago. In the naturalistic scenario, where participants are simply asked to report “how is your life these days” participants are unlikely to suddenly recall events from 5 years ago. Similarly, if you were asked about your happiness with your last vacation you are unlikely to recall earlier vacations and contrast your most recent vacation with it. Indeed, Suh et al. (1996) showed that life-satisfaction judgments are influenced by recent events and that older events do not have an effect. They found no evidence for contrast effects when participants were not asked to recall events from the distant past. So, this research shows what can happen in a specific context where participants were to recall extreme negative or positive from their past, but without prompting by an experimenter this context hardly ever would occur. Thus, this study has no ecological or external validity for the question how participants actually make life-satisfaction judgments.

These experimental results are consistent with correlational data (Elder 1974) indicating that U.S. senior citizens, the “children of the Great Depression,” are more likely to report high subjective well-being the more they suffered under adverse economic conditions when they were adolescents.

This finding again does not mean that elderly US Americans who suffered more during the Great Depression were actively thinking about the Great Depression when they answered questions about their well-being. It is more likely that they may have lower aspirations and expectations from life (see Easterlin). This means that we can interpret this result in many ways. One explanation would be that well-being judgments are subjective and that cultural and historic events can shape individuals’ evaluation standards of their lives.

[SUMMARY]

In combination, the reviewed research illustrates that the same life event may affect judgments of SWB in opposite directions, depending on its use in the construction of the target “my life now” and of a relevant standard of comparison.

Again, the word “may” makes this statement true. Many things may happen, but that tells us very little about what actually is happening when respondents report on their well-being. How past negative events can become positive events (a divorce was terrible, but it feels like a blessing after being happily remarried, etc.) and positive events can become negative events (e.g., the dream of getting tenure comes true, but doing research for life happens to be less fulfilling than one anticipated) is an interesting topic for well-being research, but none of these evaluative reversals undermine the usefulness of well-being measures. In fact, they are needed to reveal that subjective evaluations have changed and that past evaluations may have carry over effects on future evaluations.

It therefore comes as no surprise that the relationship between life events and judgments of SWB is typically weak. Today’s disaster can become tomorrow’s standard, making it impossible to predict SWB without a consideration of the mental processes that determine the use of accessible information.

Actually, the relationship between life-events and well-being is not weak. Lottery winners are happier and accident victims are unhappier. And cross-cultural research shows that people do not simply get used to terrible life circumstances. Starving is painful. It does not become a normal standard for well-being reports on day 2 or 3. Most of the time, past events simply lose importance and are replaced by new events and well-being measures are meant to cover a certain life period rather than an individual’s whole life from birth to death. And because subjective evaluations are not just objective reports of life-events, they depend on mental processes. The problem is that a research program that uses experimental manipulations does not tell us about the mental processes that are underlying life-satisfaction judgments when participants are not manipulated.

[WHAT MIGHT HAVE BEEN: COUNTERFACTUALS]

Counterfactual thinking can influence affect and subjective well-being in several ways (see Roese 1997; Roese and Olson 1995b).

Yes, it can, it may, and it might, but the real question is whether it does influence well-being reports and if so, how it influences these reports.

For example, winners of Olympic bronze medals reported being more satisfied than silver medalists (Medvec, Madey, and Gilovich 1995), presumably because for winners of bronze medals, it is easier to imagine having won no medal at all (a “downward counterfactual”), while for winners of silver medals, it is easier to imagine having won the gold medal (an “upward counterfactual”).

This is not an accurate summary of the article that contained three studies. Study 1 used ratings of video clips of Olympic medalists immediately after the event (23 silver & 18 bronze medalists). The study showed a strong effect that bronze medalists were happier than silver medalists, F(1,72) = 18.98. The authors also noted that in some events the silver medal means that an athlete lost a finals match, whereas in other events they just placed second in a field of 8 or more athletes. An analysis that excluded final matches showed weaker evidence for the effect, F(1,58) = 6.70. Most important, this study did not include subjective reports of satisfaction as claimed in the review article. Study 2 examined interviews of 13 silver and 9 bronze medalists. Participants in Study 2 rated interviews of silver medal athletes to contain more counter-factual statements (e.g., I almost), t(20) = 2.37, p< ,03. Importantly, no results regarding satisfaction are reported. Study 3 actually recruited athletes for a study and had a larger sample size (N = 115). Participants were interviewed by the experimenters after they won a silver or bronze medal at an athletic completion (not the Olympics). The description of the procedure is presented verbatim here.

Procedure. The athletes were approached individually following their events and asked to rate their thoughts about their performance on the same 10-point scale used in Study 2. Specifically, they were asked to rate the extent to which they were concerned with thoughts of “At least I. . .” (1) versus “/ almost” (10). Special effort was made to ensure that the athletes understood the scale before making their ratings. This was accomplished by mentioning how athletes might have different thoughts following an athletic competition, ranging from “I almost did better” to “at least I did this well.”

What is most puzzling about this study is why the experiments seemingly did not ask questions about emotions or satisfaction with performance. It would have taken only a couple of questions to obtain reports that speak to the question of the article whether winning a silver medal is subjectively better than winning a bronze medal. Alas, these questions are missing. The only result from Study 3 is “as predicted, silver medalists’ thoughts following the competition were more focused on “I almost” than were bronze medalists’. Silver medalists described their thoughts with a mean rating of 6.8 (SD = 2.2), whereas bronze medalists assigned their thoughts an average rating of 5.7 (SD = 2.7), t(113) = 2.4, p < .02.

In sum, there is no evidence in this study that winning an Olympic silver medal or any other silver medal for that matter makes athletes less happy than winning a bronze medal. The misrepresentation of the original study by Schwarz and Strack is another example of unscientific practices that can lead to the fabrication of false facts that are difficult to correct and can have a lasting negative effect on the creation of a cumulative science.

In summary, judgments of SWB can be profoundly influenced by mental constructions of what might have been.

This statement is blatantly false. The cited study on medal winners does not justify this claim and thre is no scientific basis for the claim that these effects are profound.

In combination, the discussion in the preceding sections suggests that nearly any aspect of one’s life can be used in constructing representations of one’s “life now” or a relevant standard, resulting in many counterintuitive findings.

A collection of selective findings that were obtained using different experimental procedures does not mean that well-being reports obtained under naturalistic conditions produce many counterintuitive findings, nor is there any evidence that they do produce many counterintuitive findings. This statement lacks any empirical foundation and is inconsistent with other findings in the well-being literature.

Common sense suggests that misery that lasts for years is worse than misery that lasts only for a few days.

Indeed. Extended periods of severed depression can drive some people to attempt suicide. A week with the flu does not. Consistent with this common sense observation, well-being reports of depressed people are much lower than those of other people, once more showing that well-being reports often produce results that are consistent with intuitions.

Recent research suggests, however, that people may largely neglect the duration of the episode, focusing instead on two discrete data points, namely, its most intense hedonic moment (“peak”) and its ending (Fredrickson and Kahneman 1993; Varey and Kahneman 1992). Hence, episodes whose worst (or best) moments and endings are of comparable intensity are evaluated as equally (un)pleasant, independent of their duration (for a more detailed discussion, see Kahneman, this volume).

Yes, but this research focusses on brief episodes with a single emotional event. It is interesting that duration of episodes seems to matter very little, but life is a complex series of events and episodes. Having sex for 20 minutes or 30 minutes may not matter, but having sex regularly, at least once a week, does seem to matter for couples’ well-being. As Diener et al. (1985) noted, it is the frequency, not the intensity (or duration) of positive and negative events in people’s lives that matters.

Although the data are restricted to episodes of short duration, it is tempting to speculate about the possible impact of duration neglect on the evaluation of more extended episodes.

Yes, interesting, but this statement clearly indicates that the research on duration neglect is not directly relevant for well-being reports.

Moreover, retrospective evaluations should crucially depend on the hedonic value experienced at the end of the respective episode.

This is a prediction not a fact. I have actually examined this question and found that frequency of positive and negative events has a stronger influence on satisfaction judgments with a day than how respondents felt at the end of the day when they reported daily satisfaction.

[SUMMARY]

As our selective review illustrates, judgments of SWB are not a direct function of one’s objective conditions of life and the hedonic value of one’s experiences.

First, it is great that the authors acknowledge here that their review is selective. Second, we do not need a review to know that subjective well-being is not a direct function of objective life conditions. The whole point of subjective well-being reports is to allow respondents to evaluate these events from their own subjective point of view. And finally, at no point has this selective review shown that these reports do not depend on the hedonic value of one’s experiences. In fact, measures of hedonic experiences are strong predictors of life-satisfaction judgments (Schimmack et al., 2002; Lucas et al., 1996; Zou et al., 2012).

Rather they crucially depend on the information that is accessible at the time of judgment and how this information is used in constructing mental representations of the to-be-evaluated episode and a relevant standard.

This factual statement cannot be supported by a selective review of the literature. You cannot say, my selective review of creationist literature shows that evolution theory is wrong. You can say that a selective review of creationist literature would suggest that evolution theory is wrong, but you cannot say that it is wrong. To make scientific statements about what is (highly probable to be) true and what is (highly probable to be) false, you need to conduct a review of the evidence that is not selective and not biased.

As a result of these construal processes, judgments of SWB are highly malleable and difficult to predict on the basis of objective conditions.

This is not correct. Evaluations do not directly depend on objective conditions. This is not a feature of well-being reports but a feature of evaluations. At the same time, the construal processes that related objective events to subjective well-being are systematic, predictable, and depend on chronically accessible and stable information. Well-being reports are highly correlated with objective characteristics of nations, bereavement, unemployment, and divorce have negative effects on well-being and winning the lottery, marriage, and remarriage have positive effects on well-being. Schwarz and Strack are fabricating facts. This is not considered fraud. Only data manipulation and fabricating data is considered scientific fraud, but this does not mean that fabricated facts are less harmful than fabricated data. Science can only provide a better understanding if it is based on empirically verified and replicable facts. Simply stating ‘judgments of SWB are difficult to predict” without providing any evidence for this claim is unscientific.

[USING INFORMATION ABOUT OTHERS: SOCIAL COMPARISONS]

The causal impact of comparison processes has been well supported in laboratory experiments that exposed respondents to relevant comparison standards…For example, Strack and his colleagues (1990) observed that the mere presence of a handicapped confederate was sufficient to increase reported SWB under self-administered questionnaire conditions, presumably because the confederate served as a salient standard of comparison….As this discussion indicates, the impact of social comparison processes on SWB is more complex than early research suggested. As far as judgments of global SWB are concerned, we can expect that exposure to someone who is less well off will usually result in more positive-and to someone who is better off in more negative assessments of one’s own life. However, information about the other’s situation will not always be used as a comparison standard.

The whole section about social comparison does not really address the question of the influence of social comparison effects on well-being reports. Only a single study with a small sample is used to provide evidence that respondents may engage in social comparison processes when they report their well-being. The danger of this occurring in a naturalistic context is rather slim. Even in face-to-face interviews, the respondent is likely to have answered several questions about themselves and it seems far-fetched that they would suddenly think about the interviewer as a relevant comparison standard, especially if the interviewer does not have a salient characteristic like a disability that may be considered relevant. Once more the authors generalize from one very specific laboratory experiment to the naturalistic context in which SWB reports are normally made without considering the possibility that the experimental results are highly contextual sensitive and do not reveal how respondents normally judge their lives.

[Standards Provided by the Social Environment]

In combination, these examples draw attention to the possibility that salient comparison standards in one’s immediate environment, as well as socially shared norms, may constrain the impact of fortuitous temporary influences. At present, the interplay of chronically and temporarily accessible standards on judgments of SWB has received little attention. The complexities that are likely to result from this interplay provide a promising avenue for future research.

Here the authors acknowledge that their program of research is limited and fails to address how respondents use chronically accessible information. They suggest that this is a promising avenue for future research, but they fail to acknowledge why they haven’t conducted studies that start to address this question. The reason is that their research program with experimental manipulations of the situation doesn’t allow to study the use of chronically accessible information. The use of information that by definition comes to mind spontaneously independent of researchers’ experimental manipulations is a blind-spot of the experimental approach.

[Interindividual Standards Implied by the Research Instrument]

Finally, we extend our look at the influences of the research instrument by addressing a frequently overlooked source of temporarily accessible comparison information…As numerous studies have indicated (for a review, see Schwarz 1996), respondents assume that the list of response alternatives reflects the researcher’s knowledge of the distribution of the behavior: they assume that the “average” or “usual” behavioral frequency is represented by values in the middle range of the scale, and that the extremes of the scale correspond to the extremes of the distribution. Accordingly, they use the range of the response alternatives as a frame of reference in estimating their own behavioral frequently, resulting in different estimates of their own behavioral frequency, as shown in table 4.2. More important for our present purposes, they further extract comparison information from their low location on the scale…Similar findings have been obtained with regard to the frequency of physical symptoms and health satisfaction (Schwarz and Scheuring 1992), the frequency of sexual behaviors and marital satisfaction (Schwarz and Scheuring 1988), and various consumer behaviors (Menon, Raghubir, and Schwarz 1995).

One study is in German and not available. I examined the study by Schwarz and Scheurig (1988) in European Journal of Social Psychology. Study 1 had four conditions with n = 12 or 13 per cell (N = 51). The response format varied frequencies so that having sex or masturbating once a week was either a high or low frequency occurrence. Subsequently, participants reported their relationship satisfaction. The relationship satisfaction ratings were analyzed with an ANOVA. “Analysis of variance indicates a marginally reliable interaction of both experimental variables, F(1,43) = 2.95, p < 0.10, and no main effects.” The result is not significant by conventional standards and the degrees of freedom show that some participants were excluded from this analysis without further mentioning of this fact. Study 2 manipulated the response format for frequency of sex and masturbation within subject. That is, all subjects were asked to rate frequencies of both behaviors in four different combinations. There were n = 16 per cell, N = 64. No ANOVA is reported presumably because it was not significant. However, a PLANNED contrast between the high sex/low masturbation and the low/sex high masturbation group showed a just significant result, t(58) = 2.17, p = .034. Again, the degrees of freedom do not match sample size. In conclusion, the evidence that subtle manipulations of response formats can lead to social comparison processes that influence well-being reports is not conclusive. Replication studies with larger samples would be needed to show that these effects are replicable and to determine how strong these effects are.

In combination, they illustrate that response alternatives convey highly salient comparison standards that may profoundly affect subsequent evaluative judgments.

Once, more the word “may” makes the statement true in a trivial sense that many things may happen. However, there is no evidence that these effects actually have profound effects on well-being reports, and the existing studies show statistically weak evidence and provide no information about the magnitude of these effects.

Researchers are therefore well advised to assess information about respondents’ behaviors or objective conditions in an open-response format, thus avoiding the introduction of comparison information that respondents would not draw on in the absence of the research instrument.

There is no evidence that this would improve the validity of frequency reports and research on sexual frequency shows similar results with open and closed measures of sexual frequency (Muise et al., 2016).

[SUMMARY]

In summary, the use of interindividual comparison information follows the principle of cognitive accessibility that WC have highlighted in our discussion of intraindividual comparisons. Individuals often draw on the comparison information that is rendered temporarily accessible by the research instrument or the social context in which they form the judgment, although chronically accessible standards may attenuate the impact of temporarily accessible information.

The statement that people often rely on interpersonal comparison standards is not justified by the research. By design, experiments that manipulate one type of information and make it salient cannot determine how often participants use this type of information when it is not made salient.

[THE IMPACT OF MOOD STATES]

In the preceding sections, we considered how respondents use information about their own lives or the lives of others in comparison-based evaluation strategies. However, judgments of well-being are a function not only of what one thinks about but also of how one feels at the time of judgment.

Earlier, the authors stated that respondents are likely to use a minimum of information that is deemed sufficient. “Instead, they truncate the search process as soon as enough information has come to mind to form a judgment with sufficient subjective certainty (Bodenhausen and Wyer 1987)” Now we are supposed to believe that they use intrapersonal and interpersonal information that is temporarily and chronically accessible and their feelings. That is a lot of information and it is not clear how all of this information is combined into a single judgment. A more parsimonious explanation for the host of findings is that each experiment carefully created a context that made respondents use the information that the experimenters wanted respondents to use to confirm the hypothesis that they use this information. The problem is that this only shows that a particular source of information may be used in one particular context. It does not mean that all of these sources of information are used and need to be integrated into a single judgment under naturalistic conditions. The program of research simply fails to address the question which information respondents actually use when they are asked to judge their well-being in a normal context.

A wide range of experimental data confirms this intuition. Finding a dime on a copy machine (Schwarz 1987), spending time in a pleasant rather than an unpleasant room (Schwarz ct al. 1987, Expcrimcnt 2), or watching the German soccer team win rather than lose a championship game (Schwarz et al. 1987, Experiment 1) all resulted in increased reports of happiness and satisfaction with one’s life as a whole…Experimental evidence supports this assumption. For example, Schwarz and Clore (1983, Experiment 2) called respondents on sunny or rainy days and assessed reports of SWB in telephone interviews. As expected, respondents reported being in a better mood, and being happier and more satisfied with their life as a whole, on sunny rather than on rainy days. Not so, however, when respondents’ attention was subtly drawn to the weather as a plausible cause of their current feelings.

The problem is that all of the cited studies were conducted by Schwarz and that other studies that produced different results are not mentioned. The famous weather study has recently been called into question. However, the weather effect on life-satisfaction judgments is not ideal because weather effects on mood are not very strong either. Respondents in sunny California do not report higher life-satisfaction than respondents in Ohio (Schkade & Kahneman), and several large scale studies have now failed to replicate the famous weather effect on well-being reports (Lucas & Lawless, 2013; Schmiedeberg, 2014).

On theoretical grounds, we may assume that people are more likely to use the simplifying strategy of consulting their affective state the more burdensome it would be to form a judgment on the basis of comparison information.

Here it is not clear why it would be burdensome to make global life-satisfaction judgments. The previous chapters suggested that respondents have access to large amount of chronically and temporarily information that they apparently used in the previous studies. Suddenly, it is claimed that retrieving relevant information is too hard and mood is used. It is not clear why respondents would consider their current mood sufficient to evaluate their lives, especially if inconsistent accessible information also comes to mind.

Note in this regard that evaluations of general life satisfaction pose an extremely complex task that requires a large number of comparisons along many dimensions with ill-defined criteria and the subsequent integration of the results of these comparisons into one composite judgment. Evaluations of specific life domains, on the other hand, are often less complex.

If evaluations of specific life domains are less complex and global questions are just an average of specific domains, it is not clear why it would be so difficult to evaluate satisfaction in a few important life domains (health, family, work) and integrate this information. The hypothesis that mood is only used as a heuristic for global well-being reports also suggests that it would be possible to avoid the use of this heuristic by asking participants to report satisfaction with specific life domains. As these questions are supposed to be easier to answer, participants would not use mood. Moreover, preceding items are less likely to make information accessible that is relevant for a specific life domain. For example, a dating question is irrelevant for satisfaction with academic or health satisfaction. Thus, participants are most likely to draw on chronically accessible information that is relevant for answering a question about satisfaction with specific domains. It follows that averages of domain satisfaction judgments would be more valid than global judgments, if participants were relying on mood to judge global judgments. For example, finding a dime would make people judge their lives more positively, but not their health, social relationships, and income. Thus, many of the alleged problems with global well-being reports could be avoided by asking for domain specific reports and then aggregate them (Andrews & Whithey, 1976; Zou et al., 2013).

If judgments of general well-being are based on respondents’ affective state, whereas judgments of domain satisfaction are based on comparison processes, it is conceivable that the same event may influence evaluations of one’s life as a whole and evaluations of specific domains in opposite directions. For example, an extremely positive event in domain X may induce good mood, resulting in reports of increased global SWB. However, the same event may also increase the standard of comparison used in evaluating domain X, resulting in judgments of decreased satisfaction with this particular domain. Again, experimental evidence supports this conjecture. In one study (Schwarz ct al. 1987, Experiment 2), students were tested in cither a pleasant or an unpleasant room, namely, a friendly office or a small, dirty laboratory that was overheated and noisy, with flickering lights and a bad smell. As expected, participants reported lower general life satisfaction in the unpleasant room than in the pleasant room, in line with the moods induced by the experimental rooms. In contrast, they reported higher housing satisfaction in the unpleasant than in the pleasant room, consistent with the assumption that the rooms served as salient standards of comparison.

The evidence here is a study with 22 female students assigned to two conditions (n = 12 and 10 per condition). The 2 x 2 ANOVA with room (pleasant vs. unpleasant) and satisfaction judgment (life vs. housing) produced a significant interaction of measure and room, F(1,20) = 7.25, p = .014. The effect for life-satisfaction was significant, F(1,20) = 8.02, p = .010 (reported as p < .005), and not significant for housing satisfaction, F(1,20) = 1.97, p = .18 (reported as p < .09 one-tailed).

This weak evidence in a single study with a very small sample is used to conclude that life-satisfaction judgments and domain satisfaction judgments may diverge. However, numerous studies have shown high correlations between average domain satisfaction judgments and global life-satisfaction judgments (Andrews & Whithey, 1976; Schimmack & Oishi, 2005; Zou et al., 2013). This finding cannot occur if respondents use mood for life-satisfaction judgments and other information for domain satisfaction judgments. Yet readers are not informed about this finding that undermines Schwarz and Stracks’ model of well-being reports and casts doubt on the claim that the same information has opposite effects on global life-satisfaction judgments and domain specific judgments. This may happen in highly artificial laboratory conditions, but it does not happen often in normal survey contexts.

The Relative Salience of Mood and Competing Information

If recalling a happy or sad life event elicits a happy or sad mood at the time of recall, however, respondents are likely to rely on their feelings rather than on recalled content as a source of information. This overriding impact of current feelings is likely to result in mood-congruent reports of SWB, independent of the mental construal variables discussed earlier. The best evidence for this assumption comes from experiments that manipulated the emotional involvement that subjects experienced while thinking about past life events.

This section introduces a qualification of the earlier claim that recall of events in the remote past leads to a contrast effect. Here the claim is that recalling a positive event from the remote past (a happy time with a deceased spouse) will not lead to a contrast effect (intensify dissatisfaction of a bereaved person), if the recall of the event triggers an actual emotional experiences (My life these days is good because I feel good when I think about the good times in the past). The problem with this theory is that it is inconsistent with the earlier claims that people will discount their current feelings if they think they are irrelevant. If respondents do not use mood to judge their lives when they attribute it to the weather, it is not clear why they would use their feelings if they are triggered by recall of an emotional event from their past? Why would a widower evaluate his current life as a widower more favorably when he is recalling the good times with his wife?

Even if this were a reliable finding, it would be practically irrelevant for actual ratings of life-satisfaction because respondents are unlikely to recall specific events in sufficient detail to elicit strong emotional reactions. The studies that demonstrated the effect instructed participants to do so, but under normal circumstances participants make judgments very quickly often without recall of detailed, specific emotional episodes. In fact, even the studies that showed these effects showed only weak evidence that recall of emotional events had notable effects on mood (Strack et al.. 1985).

[REPORTING THE JUDGMENT]

Self-presentation and social desirability concerns may arise at the reporting stage, and respondents may edit their private judgment before they communicate it

True. All subjective ratings are susceptible to reporting styles. This is why it is important to corroborate self-ratings of well-being with other evidence such as informant ratings of well-being. However, the problem of reporting biases would be irrelevant, if the judgment without these biases is already valid. A large literature on reporting biases in general shows that these biases account for a relatively small amount of the total variance in ratings. Thus, the key question remains whether the remaining variance provides meaningful information about respondents’ subjective evaluations of their lives or whether this variance reflects highly unreliable and context-dependent information that has no relationship to individuals’ subjective well-being.

[A JUDGMENT MODEL OF SUBJECTIVE WELL-BEING]

Figure 4.2 summarizes the processes reviewed in this chapter. If respondents are asked to report their happiness and satisfaction with their “life as a whole,” they are likely to base their judgment on their current affective state; doing so greatly simplifies the judgmental task.

As noted before, this would imply that global well-being reports are highly unstable and strongly correlated with measures of current mood, but the empirical evidence does not support these predictions. Current mood has a small effect on global well-being reports (Eid & Diener, 2004) and they are highly stable (Schimmack & Oishi, 2005) and predicted by personality traits even when these traits are measured a decade before the well-being reports (Costa & McCrae, 1980).

If the informational value of their affective state is discredited, or if their affective state is not pronounced and other information is more salient, they are likely to use a comparison strategy. This is also the strategy that is likely to be used for evaluations of less complex specific life domains.

Schwarz and Strack’s model would allow for weak mood effects. We only have to make the plausible assumption that respondents often have other information to judge their lives and that they find this information more relevant than their current feelings. Therefore, this first stage of the judgment model is consistent with evidence that well-being judgments are only weakly correlated with mood and highly stable over time.

When using a comparison strategy, individuals draw on the information that is chronically or temporarily most accessible at that point in time.

Apparently the term “comparison strategy” is now used to refer to the retrieval of any information rather than an active comparison that takes place during the judgment process. Moreover, it is suddenly equally plausible that participants draw on chronically accessible information or on temporarily accessible information. While the authors did not review evidence that would support the use of chronically accessible information, their model clearly allows for the use of chronically accessible information.

Whether information that comes to mind is used in constructing a representation of the target “my life now” or a representation of a relevant standard depends on the variables that govern the use of information in mental construal (Schwarz and Bless 1992a; Strack 1992).

This passage suggests that participants have to go through the process of evaluating their live each time when they are asked to make a well-being report. They have to construct what their live is like, what they want from life, and make a comparison. However, it is also possible that they can draw on previous evaluations of life domains (e.g., I hate my job, I am healthy, I love my wife, etc.). As life-satisfaction judgments are made rather quickly within a few seconds, it seems more plausible that some pre-established evaluations are retrieved than to assume that complex comparison processes are being made at the time of judgments.

If the accessibility of information is due to temporary influences, such as preceding questions in a questionnaire, the obtained judgment is unstable over time and a different judgment will be obtained in a different context.

This statement makes it obvious that retest correlations provide direct evidence on the use of temporarily accessible information. Importantly, low retest stability could be caused by several factors (e.g. random responding). So, we cannot verify that participants rely on temporarily accessible information when retest correlations are low. However, we can use high retest stability to falsify the hypothesis that respondents rely heavily on temporarily accessible information because the theory makes the opposite prediction. It is therefore highly relevant that retest correlations show high temporal consistency in global well-being reports. Based on this solid empirical evidence we can infer that responses are not heavily influenced by temporarily accessible information (Schimmack & Oishi, 2005).

On the other hand, if the accessibility of information reflects chronic influences such as current concerns or life tasks, or stable characteristics of the social environment, the judgment is likely to be less context dependent.

This implies that high retest correlations are consistent with the use of chronically accessible information, but high retest correlations do not prove that participants use chronically accessible information. It is also possible that stable variance is due to reporting styles. Thus, other information is needed to test the use of chronically accessible information. For example, agreement in well-being reports by several raters (self, spouse, parent, etc.) cannot be attributed to response styles and shows that different raters rely on the same chronically accessible information to provide well-being reports (Schneider & Schimmack, 2012).

The size of context-dependent assimilation effects increases with the amount and extremity of the temporarily accessible information that is included in the representation of the target.

This part of the model would explain why experiments and naturalistic studies often produce different results. Experiments make temporarily accessible information extremely salient, which may lead participants to use it. In contrast, such extremely salient information is typically absent in naturalistic studies, which explains why chronically accessible information is used. The results are only inconsistent if results from experiments with extreme manipulations are generalized to normal contexts without these extreme conditions.

[METHODOLOGICAL IMPLICATIONS]

Our review emphasizes that reports of well-being are subject to a number of transient influences.

This is correct. The review emphasized evidence from the authors’ experimental research that showed potential threats to the validity of well-being judgments. The review did not examine how serious these threats are for the validity of well-being judgments.

Although the information that respondents draw on reflects the reality in which they live, which aspects of this reality they consider and how they use these aspects in forming a judgment is profoundly influenced by features of the research instrument.

This statement is blatantly false. The reviewed evidence suggests that the testing situation (a confederate, a room) or an experimental manipulation (recall positive or negative events) can influence well-being reports. There was very little evidence that the research instrument influenced well-being reports and there was no evidence that these effects are profound.

[Implications for Survey Research]

The reviewed findings have profound methodological implications.

This is wrong. The main implication is that researchers have to consider a variety of potential threats to the validity of well-being judgments. All of these threats can be reduced and many survey studies do take care to avoid some of these potential problems.

First, the obtained reports of SWB are subject to pronounced question-order effects because the content of preceding questions influences the temporary accessibility of relevant information.

As noted earlier, this was only true in two studies by the authors. Other studies do not replicate this finding.

Moreover, questionnaire design variables, like the presence or absence of a joint lead-in to related questions, determine how respondents use the information that comes to mind. As a result, mean reported well-being may differ widely, as seen in many of the reviewed examples

The dramatic shifts in means are limited to experimental studies that manipulated lead-ins to demonstrate these effects. National representative surveys show very similar means year after year.

Moreover, the correlation between an objective condition of life (such as dating frequency) and reported SWB can run anywhere from r = – .l to r = .6, depending on the order in which the same questions are asked (Strack et al. 1988), suggesting dramatically different substantive conclusions.

Moreover? This statement just repeats the first false claim that question order has profound effects on life-satisfaction judgments.

Second, the impact of information that is rendered accessible by preceding questions is attenuated the more the information is chronically accessible (see Schwarz and Bless 1992a).

So, how can we see pronounced item-order effects for marital satisfaction if marital satisfaction is a highly salient and chronically accessible aspects of married people’s lives? So, this conclusion directly undermines the previous claim that item-order has profound effects.

Third, the stability of reports of SWB over time (that is, their test-retest reliability) depends on the stability of the context in which they are assessed. The resulting stability or change is meaningful when it reflects the information that respondents spontaneously consider because the same, or different, concerns are on their mind at different points in time.

There is no support for this claim. If participants draw on chronically accessible information, which the authors’ model allows, the judgments do not depend on the stability of the context because chronically accessible information is by definition context-independent.

Fourth, in contrast to influences of the research instrument, influences of respondents’ mood at the time of judgment are less likely to result in systematic bias. The fortuitous events that affect one respondent’s mood are unlikely to affect the mood of many others.

This is true, but it would still undermine the validity of the judgments. If participants rely on their current mood, variation in these responses will be unreliable and unreliable measures are by definition invalid. Moreover, the average mood of participants during the time of a survey is also not a valid measure of average well-being. So, even though mood effects may not be systematic, they would undermine the validity of well-being reports. Fortunately, there is no evidence that mood has a strong influence on these judgments, while there is evidence that participants draw on chronically accessible information from important life domains (Schimmack & Oishi, 2005).

Hence, mood effects are likely to introduce random variation.

Yes, this is a correct prediction, but evidence contradicts this prediction, and the correct conclusion is that mood does not introduce a lot of random variation in well-being reports because it is not heavily used by respondents to evaluate their lives or specific aspects of their lives.

Fifth, as our review indicates, there is no reason to expect strong relationships between the objective conditions of life and subjective assessments of well-being under most circumstances.

There are many reasons not to expect strong correlations between life-events and well-being reports. One reason is that a single event is only a small part of a whole life and that few life events have such dramatic effects on life-satisfaction that they make any notable contribution to life-satisfaction judgments. Another reason is that well-being is subjective and the same life event can be evaluated differently by different individuals. For example, the publication of this review in a top journal in psychology would have different effects on my well-being and on the well-being of Schwarz and Strack.

Specifically, strong positive relationships between a given objective aspect of life and judgments of SWB are likely to emerge when most respondents include the relevant aspect in the representation that they form of their life and do not draw on many other aspects. This is most likely to be the case when (a) the target category is wide (“my life as a whole”) rather than narrow (a more limited episode, for example); (b) the relevant aspect is highly accessible; and (c) other information that may be included in the representation of the target is relatively less accessible. These conditions were satisfied, for example, in the Strack, Martin, and Schwarz (1988) dating frequency study, in which a question about dating frequency rendered this information highly accessible, resulting in a correlation of r = .66 with evaluations of the respondent’s life as a whole. Yet, as this example illustrates, we would not like to take the emerging correlation seriously when it reflects only the impact of the research instrument, as indicated by the fact that the correlation was r = – .l if the question order was reversed.

The unrepresentative extreme result from Strack’s study is used again as evidence, when other studies do not show the effect (Schimmack & Oishi, 2005).

Finally, it is worth noting that the context effects reviewed in this chapter limit the comparability of results obtained in different studies. Unfortunately, this comparability is a key prerequisite for many applied uses of subjective social indicators, in particular their use in monitoring the subjective side of social change over time (for examples see Campbell 198 1; Glatzer and Zapf 1984).

This claim is incorrect. The experimental demonstrations of effects under artificial conditions that were needed to manipulate judgment processes do not have direct implications for the way participants actually judge well-being and the authors model allows for chronically accessible information to have a strong influence on these judgments under less extreme and less artificial conditions, and the authors model makes predictions that are disconfirmed by evidence of high stability and low correlations with mood.

Which Measures Are We to Use?

By now, most readers have probably concluded that there is little to be learned from self-reports of global well-being being.

If so, the authors succeeded with their biased presentation of the evidence to convince readers that these reports are highly susceptible to a host of context effects that make the outcome of the judgment process essentially unpredictable. Readers would be surprised to learn that well-being reports of twins who never met are positively correlated (Lykken & Tellgen, 1996).

Although these reports do reflect subjectively meaningful assessments, what is being assessed, and how, seems too context dependent to provide reliable information about a population’s well- being, let alone information that can guide public policy (but see Argyle, this volume, for a more optimistic take).

The claim that well-being reports are too context dependent to provide reliable information about a population’s well-being is false for several reasons. First, the authors did not show that well-being reports are context dependent. They showed that with very extreme manipulations in highly contrived and unrealistic contexts, judgments moved around statistically significantly in some studies. They did not show that these shifts are large, as it would require larger samples to estimate effect sizes. They did not show that these effects have a notable influence on well-being reports in actual surveys of populations well-being. And finally, they already pointed out that some of these effects (e.g., mood effects) would only add random noise, which would lower the reliability of individuals’ well-being reports, but when aggregated across responses would not alter the mean of a sample. And last, but not least, the authors blatantly ignore evidence (reviewed in this volume by Diener and colleagues) that variation across nationally representative samples shows highly reliable variation across populations in different nations that are highly correlated with objective life circumstances that are correlated with nations’ wealth.

In short, Schwarz and Strack’s claims are not scientifically founded and merely express the authors’ pessimistic take on the validity of well-being reports. This pessimistic view is a direct consequence of a myopic focus on laboratory experiments that were designed to invalidate well-being reports and ignoring evidence from actual well-being surveys that are more suitable to examine the reliability and validity of well-being reports when well-being reports are provided under naturalistic conditions.

As an alternative approach, several researchers have returned to Bentham’s ( 1789/1948) notion of happiness as the balance of pleasure over pain (for examples, SW Kahneman, this volume; Parducci 1995).

This statement ignores the important contribution of Diener (1984) who argued that the concept of well-being may consist of life evaluations as well as the balance of pleasure over pain or Positive Affect and Negative Affect, as these constructs are called in contemporary psychology. As a result of Diener’ s (1984) conception of well-being as a construct with three components, researchers have routinely measured global life-evaluations along with measures of positive and negative affect. A key finding is that these measures are highly correlated, although not perfectly identical (Lucas et al., 1996; Zou et al., 2013). Schwarz and Strack ignore this evidence, presumably because it would undermine their view that global life-satisfaction judgments are highly context sensitive and that measures of positive and negative affect could produce notably different results.

END OF REVEW: CONCLUSIONS

In conclusion, Schwarz and Strack’s (1999) chapter is a prototypical example of several bad scientific practices. First, the authors conduct a selective review of the literature that focuses on one specific paradigm and ignores evidence from other approaches. Second, the review focuses strongly on original studies conducted by the authors themselves and ignores studies by other researchers that produced different results. Third, the original studies are often obtained with small samples and there are no independent replications by other researchers, but the results are discussed as if they are generalizable. Fourth, life-satisfaction judgments are influenced by a host of factors and any study that focuses on one possible predictor of these judgments is likely to account for only a small amount of the variance. Yet, the literature review does not take effect sizes into account and the theoretical model overemphasizes the causes that were studied and ignores causes that were not studied. Fifth, the experimental method has the advantage of isolating single causes, but it has the disadvantage that results cannot be generalized to ecologically valid contexts in which well-being reports are normally obtained. Nevertheless, the authors generalize from artificial experiments to the typical survey context without examining whether their predictions are confirmed. Finally, the authors make broad and profound claims that do not logically follow from their literature review. They suggest that decades of research with global well-being reports can be dismissed because the measures are unreliable, but these claims are inconsistent with a mountain of evidence that shows the validity of these measures that the authors willfully ignore (Diener et al., 2009).

Unfortunately, the claims in this chapter were used by Noble Laureate Daniel Kahneman as arguments to push for an alternative conception and measurement of well-being. In combination, the unscientific review of the literature and the political influence of a Noble price has had a negative influence on well-being science. The biggest damage to the field has been the illusion that the processes underlying global well-being reports are well-understood. In fact, we know very little how respondents make these judgments and how accurate these judgments are. The chapter lists a number of possible threats to the validity of well-being reports, but it is not clear how much these threats actually undermine the validity of well-being reports and what can be done to reduce biases in these measures to improve their validity. A program that relies exclusively on experimental manipulations that create biases in well-being reports is unable to answer these questions because well-being judgments can be made in numerous ways and results that are obtained in artificial laboratory contexts may or may not generalize to the context that is most relevant, namely when well-being reports are used to measure well-being.

What is needed is a real scientific program of research that examines accuracy and biases in well-being reports and creates well-being measures that maximize accuracy and minimize biases. This is what all other sciences do when they develop measures of theoretical important constructs. It is time for well-being researchers to act like a normal science. To do so, research on well-being reports needs a fresh start and needs an objective and scientific review of the empirical evidence regarding the validity of well-being measures.

Replicability-Index

Improving the replicability of empirical research

Tag Archives: Reliability

Subjective Wellbeing – Chapter 03

Subscribe to continue reading