Frequency Judgments of Emotions: How Happy Were You Last Week?

Cite as: Schimmack, U. (1997). Frequency Judgments of Emotions: How Accurate are They and How are They Made? Unpublished dissertation. Free University Berlin.

Preface

“If you haven’t read it, it is new to you.”

I received my Ph.D. from the Free University Berlin in 1997. My dissertation contained two daily diary studies and two laboratory experiments. The main question that intrigued me at that time was how individuals make judgments about the frequency of their emotions (e.g., how often did you feel happy in the past week or month). I was also interested in the accuracy of these judgments because they are routinely used in personality questionnaires and in the measurement of subjective well-being. My dissertation never got published. I was fortunate that I was able to publish other work. Otherwise, my life might have turned out differently.

Although 23 years is a long time in real sciences, psychology is not a real science. The past 20 years have been wasted on questionable research to support false theories like Tversky and Kahneman’s (1973) influential availability heuristic. It now turns out that key results cannot be replicated (Schimmack, 2019). As you can see from my dissertation, cognitive psychologists already showed in the 1980s that ease of retrieval is not a plausible model of frequency estimation. Social psychologists simply ignored this work. So, now that the ease-of-retrieval model has failed, it may be a good time to introduce social and personality psychologists to cognitive models of frequency estimation that were developed in the 1980s. These models may provide a framework for applied research on frequency judgments of emotions and behaviors that are routinely used to measure personality traits (Fleisher, Woehr, Edwards, & Cullen, 2011).

DISSERTATION

Frequency Judgments of Emotions: How Accurate are They and How are They Made?

Ulrich Schimmack

Freie Universität Berlin
Fachbereich Erziehungswissenschaften, Psychologie und Sportwissenschaften

Betreuung durch Prof. Dr. Hubert Feger

Januar 1997

DEDICATION
To my grandparents Dr. Gerhard Walpert and Martha Walpert

ACKNOWLEDGMENT
Thanks are due to the many individuals and institutions that made this dissertation possible: Hubert Feger for the academic freedom and the resources needed to carry out the empirical investigations; “Studienstiftung des deutschen Volkes,” “Deutscher Akademischer Austauschdienst,” and my parents Frank and Liesel Schimmack for financial support; Ed Diener, Shigeh Oishi, and Mark Suh for collaboration on Study 1; Stephan Dutke, Hubert Feger, Bärbel Knäuper, Rainer Reisenzein, Thomas Rodenhausen, Matthias Siemer, and Germmi Temme for valuable comments on drafts of the dissertation; and, last but not least, Phanikiran Radhakrishnan and Joachim Stöber for social support.

1 ABSTRACT

The frequency of emotional experiences is an important topic for several basic and applied domains in psychology. Most studies investigating the frequency of emotions rely on retrospective self-reports of emotional experiences (e.g., “How frequently did you feel happy in the last month?”). However, relatively little is known about (a) the accuracy of such retrospective frequency estimates of emotions, (b) the representation of information about the frequency of emotions in memory, and (c) the cognitive processes underlying frequency judgments of emotions. The present dissertation addresses these questions in four studies. In two field studies, averaged daily frequency estimates of emotions are compared with frequency judgments of emotions extending over several weeks, to test the accuracy of the latter judgments. In two experiments, the cognitive processes underlying frequency judgments of emotions are investigated under controlled conditions. In these studies, participants first rated their likely emotional reactions to several hypothetical scenarios and then judged the frequency of emotions in these scenarios. The results indicate that absolute frequency estimates of emotions underestimate the actual frequencies of emotional experiences, but they accurately discriminate between the frequencies of emotions across different emotions as well as across participants. In addition, the studies provide further support for the familiarity model of frequency judgments of emotions (Hintzman, 1988; Schimmack & Reisenzein, in press). This model assumes that memories of emotional experiences are stored as separate memory traces in episodic memory. When a frequency judgment of an emotion is required, memories of experiences of this emotion are activated in parallel. This generates a feedback signal, which is experienced as a feeling of familiarity. The more memory traces are activated, the stronger the feeling of familiarity and the higher the frequency judgment.
This model is contrasted with (a) models that assume the direct encoding of frequency information in memory and (b) models that assume the retrieval of memories into consciousness.

2 INTRODUCTION

2.1 What is the Frequency of Emotions?

More than a hundred years after James’s (1884) question “What is emotion?”, researchers of emotions still search for an answer to this question. In the last decade, however, several researchers used prototype theory (Rosch, 1975) to address this question empirically. Prototype theory at least allows one to identify a set of typical emotions (cf. Fehr & Russell, 1984). Typical emotions are, for example, love, hate, joy, anger, fear, and sadness.

Schimmack and Reisenzein (1994) extended this work, demonstrating that emotions can be differentiated from moods by means of typicality ratings: Some concepts denote emotions but not moods (e.g., love, pride), others denote moods but not emotions (e.g., relaxed, nervous), and there are also concepts that denote both moods and emotions (e.g., happy, sad). Furthermore, Schimmack and Siemer (1995) found that intentionality or object directedness is a characteristic that differentiates typical emotion concepts (i.e., one is proud about something) from typical mood concepts (i.e., one feels relaxed, but not relaxed about something). This finding supports cognitive emotion theories, such as Stumpf’s theory (cf. Reisenzein & Schönpflug, 1992), in which intentionality is a necessary feature of emotions that differentiates them from other affective states.

Furthermore, the object directedness of emotions explains their episodic nature; that is, emotions are elicited by the cognitive appraisal of an event for one’s own well-being, are maintained as long as the feeling is directed at an object, and are terminated once the thoughts are no longer directed at the object (Lazarus, 1991). The aroused affect might remain; this is, however, often considered a mood and no longer an emotion (cf. Ekman & Davidson, 1994, chapter 2). Due to the episodic nature of emotional experiences like pride, disappointment, gratitude, or shame, there are times in which individuals do not experience emotions at all, compared to times when they experience emotions. As a consequence, it is possible to ask people at any moment in time whether they feel a particular emotion or not. If this question is asked repeatedly, one can determine the frequency with which the individual experienced the emotion. In other words, the question about the frequency of emotions is about the number of times that a particular emotion has been elicited. The frequency of emotional experiences has to be differentiated from two other important characteristics of emotional experiences, namely their duration and their intensity. The differences between these three features of emotional experiences are illustrated in Figure 1, which shows an individual’s experience of a particular emotion over time. Most of the time the individual does not experience this emotion at all (i.e., the intensity equals zero).

Over the time interval displayed in Figure 1, the emotion is elicited two times (i.e., a change from zero to an intensity greater than zero), so that the frequency of the emotion equals two. The duration of the first emotional experience is longer than that of the second episode (i.e., a longer distance along the time axis with intensity greater than zero). Regarding the intensity of an emotional experience, different definitions have been proposed: (a) the maximum intensity at the peak of the experience or (b) the integral of the area under the curve from the beginning to the end of the emotional episode (Frijda, Ortony, Sonnemans, & Clore, 1992).
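The three features described above can be illustrated with a minimal sketch. It assumes that the intensity of one emotion is sampled at equal time steps; the function name summarize_episodes, the sampling interval dt, and the example trace are hypothetical and are not data from Figure 1.

```python
from typing import Dict, List


def summarize_episodes(intensity: List[float], dt: float = 1.0) -> List[Dict[str, float]]:
    """Split an intensity time series into episodes and describe each one.

    An episode is a maximal run of consecutive samples with intensity > 0.
    `dt` is the (assumed) time between two samples.
    """
    episodes: List[List[float]] = []
    current: List[float] = []
    for value in intensity:
        if value > 0:
            current.append(value)      # the episode continues
        elif current:
            episodes.append(current)   # intensity fell back to zero
            current = []
    if current:                        # trace ended during an episode
        episodes.append(current)

    return [
        {
            "duration": len(ep) * dt,
            "peak_intensity": max(ep),             # definition (a)
            "integrated_intensity": sum(ep) * dt,  # definition (b), rectangle rule
        }
        for ep in episodes
    ]


# Hypothetical trace: two episodes, the first longer, the second more intense.
trace = [0, 0, 1, 3, 4, 2, 1, 0, 0, 0, 2, 5, 0, 0]
episodes = summarize_episodes(trace)
frequency = len(episodes)  # number of times the emotion was elicited
```

In this made-up trace the frequency is two, the first episode lasts longer, and the second episode has the higher peak, so the two definitions of intensity rank the episodes differently.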

Duration and intensity are important aspects of emotional experiences which can vary independently from the frequency of emotional experiences. For example, Schimmack and Diener (in press) demonstrated that individuals who experience emotions frequently do not necessarily experience emotions more intensely. Hence, frequency, intensity, and duration should be studied separately. The present dissertation focuses exclusively on the frequency of emotions.

2.2 Why Study Frequency Judgments of Emotions?

The frequency of emotions is important in many everyday life situations. For example, a person who often feels joy seeing a friend is likely to spend time with this friend in the future, whereas a person who often feels fear of flying is likely to use other means of transportation. Apparently, information about the frequency of emotions in past situations can serve as a guide for future behavior (cf. Emmons & Diener, 1986a).

The frequency of emotions is also used as information in the formation of impressions of oneself (e.g., “I am an emotional person”) or of others (e.g., “She is cold-blooded”). Because frequency of emotions is an important source of information in everyday life, it is not surprising that it is also relevant for psychological disciplines, such as personality, social, cross-cultural, clinical, and industrial and organizational psychology.

I briefly summarize some prevalent questions regarding the frequency of emotions in these fields of inquiry. This short review shows that a valid assessment of the frequency of emotions is a necessary requirement for some of the research in these diverse fields, but that the validity of the most often used measure of the frequency of emotions, namely retrospective frequency judgments of emotions, is not yet firmly established. Indeed, various researchers are skeptical about the validity of retrospective reports in general. For example, Lewinsohn and Rosenbaum (1987) state that “retrospective memory should probably never be construed to represent what really occurred” (p. 618). However, evidence for such extreme claims is scarce (cf. Brewin, Andrews, & Gotlib, 1993). The present set of studies examines this issue for retrospective judgments of the frequency of emotional experiences.

2.2.1 Relevance for Personality Psychology

In recent years, research on stable individual differences in the experience of affect has increased considerably. In several studies, retrospective estimates of experienced pleasant affect were correlated with extraversion and retrospective estimates of experienced unpleasant affect were correlated with neuroticism (Costa & McCrae, 1980; Emmons & Diener, 1986b; Izard, Libero, Putnam, & Haynes, 1993; Larsen & Diener, 1992; Meyer & Shack, 1989; Pavot, Diener, & Fujita, 1990; Watson & Clark, 1992). Some researchers even found these personality dimensions and experiences of affect to be so highly correlated that they equated neuroticism with the disposition to experience unpleasant affect and extraversion with the disposition to experience pleasant affect (Meyer & Shack, 1989).

The finding that extraversion and neuroticism are also correlated with averaged daily reports of affect suggests that this relation is substantial and that the retrospective judgments do possess some validity (Emmons & Diener, 1986b). However, it is possible that the correlations between retrospective estimates of experienced affect and personality traits overestimate the true strength of the relation between these traits and the actual frequencies of experienced affect. This hypothesis is suggested by a meta-analysis (Schimmack, 1996b), which shows that correlations between the two traits and the amount of experienced pleasant and unpleasant affect were higher for retrospective estimates than for averaged daily ratings of affect. Therefore, at least part of the shared variance between self-reports of personality traits and amount of experienced affect could be due to a so-called personality-congruent memory bias (Martin, 1985). That is, people overestimate experiences of affect that are consistent with their personality. For example, neurotic individuals tend to overestimate the amount of unpleasant affect, whereas extraverted individuals tend to overestimate the amount of pleasant affect (Diener, Larsen, & Emmons, 1984). One aim of the present dissertation is to explore the presence of personality-congruent biases in retrospective frequency estimates of emotions.

Furthermore, previous studies of the relation between personality and trait affect often did not differentiate between emotions and moods, and often did not separate the frequency of emotions from their typical intensity or duration. However, it has been demonstrated that individual differences in the frequency and the typical intensity of emotional experiences are separable constructs (cf. Diener, Larsen, Levine, & Emmons, 1985; Schimmack & Diener, in press). Therefore, the present dissertation also explores the structure of individual differences in the frequency of emotional experiences.

2.2.2 Relevance for Clinical Psychology

Abnormally frequent or infrequent experiences of emotions are symptoms of several psychological disorders (cf. Andreasen & Black, 1991). For example, in the DSM-III-R diagnostic system, symptoms of paranoid personality disorder are frequent experiences of distrust, fear, jealousy, and resentment. Symptoms of schizoid personality disorder are very infrequent experiences of rage and joy. And a symptom of narcissistic personality disorder is the frequent experience of envy.

For a practitioner, it is difficult to assess these symptoms because epidemiological data about the frequency of these emotions in the general population are lacking. The results of the present dissertation can serve as a first guideline about the prevalence of emotions like envy or joy, although the results are limited to a student population. One aim of the dissertation is to suggest strategies that allow an economical but accurate estimation of the prevalence of emotions in the general population.

A second problem for the diagnostician is that it is unknown whether a patient’s reported frequencies of an emotion are accurate. It could be that psychological disorders bias the self-report of past emotional experiences. For example, depressed patients might overestimate the frequency of their unpleasant emotional experiences (Fitzgerald, Slade, & Lawrence, 1988). However, the evidence concerning long-term memory deficits due to psychological disorders is mixed (see Brewin et al., 1993, for a review). An investigation of the cognitive processes underlying frequency judgments of emotions could help to determine when a patient’s self-report is likely to be accurate and when it might be biased. It might also help to develop measurement instruments that are least susceptible to distortions.

2.2.3 Relevance for Social and Cross-Cultural Psychology

People often experience emotions in social situations (cf. Scherer, Wallbott, & Summerfield, 1986). Because the structure of a society influences the type of social situations that its members encounter, it is likely that social factors influence the frequency of emotions. For example, Briggs (1970) claimed that the Inuit never experience anger (but see Briggs, 1987). In contrast, the high homicide rate in the USA suggests that anger-related emotions such as anger, rage, or hate are experienced quite frequently in the United States, especially in the South (Cohen, 1996).

Evidently, understanding the cultural factors that influence the frequency of emotions – and the actions motivated by them – is relevant for political decisions. One important cultural dimension that is likely to influence the frequency of emotional experiences is the individualism-collectivism dimension. Because individualistic cultures provide looser social networks than collectivistic cultures (cf. Hofstede, 1980; Triandis, 1994), members of individualistic cultures might experience loneliness more frequently, but shame less frequently. Markus and Kitayama (1994) report a study in which the frequency of joy was more highly correlated with the frequency of pride in the USA (an individualistic culture) than in Japan (a collectivistic culture), indicating that achievement situations are a stronger source of happiness in individualistic cultures.

Studying the frequency of emotions is also important because the frequency of pleasant versus unpleasant emotions is one component of Subjective Well-Being (cf. Diener, 1984; Diener, Sandvik, & Pavot, 1991). Recently, Diener, Diener, and Diener (1995; see also Diener & Diener, 1995) explored what differentiates “happy” from “unhappy” nations. They found happy nations to be more affluent, individualistic, and democratic. These findings have potential implications for political questions such as whether China can promote the happiness of its people by economic growth, but without changing its political system. The interpretation of cross-cultural studies is, however, based on a number of assumptions. One basic assumption is that people can accurately estimate the frequency of their emotional experiences.

2.2.4 Relevance for Industrial and Organizational Psychology

Traditionally, job satisfaction is measured as an evaluative judgment or an attitude (Pekrun & Frese, 1992). Temme and Tränkle (1996) pointed out that this approach neglects the emotional aspects of work. An individual might be satisfied with his or her job, but only rarely experience emotions such as joy or pride. Similarly, global evaluations of one’s life are correlated with, but distinct from, measures of the frequency of pleasant versus unpleasant emotions (cf. Pavot & Diener, 1993). Therefore, the traditional assessment of job satisfaction should be complemented with the assessment of the frequencies of pleasant and unpleasant emotions at work. It is therefore an important question whether people can make accurate estimates of the frequencies of their emotional experiences in different contexts. Otherwise the frequency judgments would not only reflect the number of joy experiences at work, but also joy experiences in other contexts. Actually, several frequency judgment models, which are reviewed later, predict that people lack the ability to discriminate frequencies in different contexts. If this were true, assessing frequencies of emotions at work would be difficult.

2.3 Previous Research on Frequency Judgments of Emotions

2.3.1 Accuracy

Accuracy of frequency judgments can be globally defined as the agreement between frequency judgments and actual frequencies. Conversely, lack of accuracy is indicated by deviations of frequency judgments from actual frequencies. Because these deviations can be computed in various ways, several types of accuracy can be distinguished (cf. Naveh-Benjamin & Jonides, 1986; Thomas & Diener, 1990).

Absolute and relative accuracy both compare the absolute level of actual and estimated frequencies. Absolute accuracy is defined as the ability of individuals to estimate the absolute frequency of an emotion accurately, irrespective of the direction of the estimation error; that is, irrespective of whether the actual frequencies are over- or underestimated. A common index of absolute accuracy is the standard deviation of the frequency judgments from the actual frequencies (Naveh-Benjamin & Jonides, 1986). In contrast, relative accuracy takes the direction of the error into account. A common measure of relative accuracy is the signed difference between the estimated and the actual absolute frequency (estimated minus actual). In contrast to absolute accuracy, relative accuracy indicates whether the actual frequencies are over- or underestimated.

In the literature on frequency judgments of emotions, absolute and relative accuracy have been neglected. So far, the only study of relative accuracy compared the actual number of pleasant days (i.e., days on which a person experienced more pleasure than displeasure) with the estimated number of pleasant days (Thomas & Diener, 1990). The results indicated that participants underestimated the number of pleasant days.
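The difference between the two indices defined above can be shown with a small sketch. It rests on an assumption: the “standard deviation of the frequency judgments from the actual frequencies” is read here as a root-mean-square deviation, which may differ in detail from the index used by Naveh-Benjamin and Jonides (1986). The function names and all numbers are hypothetical.

```python
import math
from typing import Sequence


def relative_accuracy(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Mean signed error (estimated minus actual).

    Negative values indicate underestimation, positive values overestimation.
    """
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


def absolute_accuracy(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square deviation of estimates from actual frequencies.

    Assumed reading of the index in the text; the direction of error is ignored.
    """
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical actual and estimated frequencies of four emotions.
actual = [10, 4, 6, 2]
estimated = [7, 5, 4, 2]

bias = relative_accuracy(estimated, actual)    # signed: direction of error
spread = absolute_accuracy(estimated, actual)  # unsigned: size of error

# Errors in opposite directions cancel in the signed index but not in the unsigned one.
cancelling_bias = relative_accuracy([12, 2], [10, 4])
cancelling_spread = absolute_accuracy([12, 2], [10, 4])
```

The second pair of calls shows why both indices are needed: a signed mean of zero can hide sizable errors in opposite directions.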

Because the number of unpleasant days is by definition perfectly inversely related to the number of pleasant days, the study also demonstrated that participants overestimated the number of unpleasant days. Because most participants experienced more pleasant than unpleasant days, this finding is consistent with the finding that frequency estimates regress toward the mean due to information loss (Howell, 1973; Fiedler, 1991). In contrast to the number of pleasant versus unpleasant days, the present dissertation explores for the first time the absolute and relative accuracy of frequency estimates of single emotions such as joy, anger, or sadness.

Two other types of accuracy are not concerned with the absolute frequency of a single entity (e.g., an emotion), but rather test how accurately frequency estimates discriminate between actual frequencies of different entities, where the entities can be the stimuli or the participants. These types of accuracy are subsequently called discriminative accuracy. A common index of discriminative accuracy is the Pearson correlation coefficient between actual and estimated frequencies computed across different entities.
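As a sketch of this index, the Pearson correlation can be computed either across the emotions of one participant or across the participants for one emotion. The data below are invented for illustration: every estimate is set to half the actual frequency, so that absolute underestimation and perfect discrimination occur together.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: rows are participants, columns are emotions.
actual = [
    [8, 2, 4],
    [6, 1, 5],
    [10, 3, 2],
]
# Every estimate is half the actual frequency (systematic underestimation).
estimated = [[value / 2 for value in row] for row in actual]

# Across emotions: one participant's row of actual vs. estimated frequencies.
r_across_emotions = pearson_r(actual[0], estimated[0])

# Across participants: one emotion's column of actual vs. estimated frequencies.
r_across_participants = pearson_r(
    [row[0] for row in actual],
    [row[0] for row in estimated],
)
```

Both correlations are at ceiling here even though every single estimate is too low, which is why discriminative accuracy has to be assessed separately from absolute and relative accuracy.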

Experimental studies investigate the ability of frequency estimates to discriminate the actual frequencies of the stimuli, which are experimentally manipulated. In the present context, the stimuli are different emotions such as joy, fear, or gratitude. Therefore, this type of accuracy is called discriminative accuracy across emotions, which has not been tested in previous studies of frequency judgments of emotions. Exploring the discriminative accuracy across emotions has, however, several advantages. First, it can be tested in field and experimental studies of frequency judgments of emotions, which is not true for the discriminative accuracy across participants introduced below.

Second, tests of different models of frequency judgments are often based on the ability of a potential mediator variable (e.g., the number of recalled exemplars) to discriminate between actual frequencies of different emotions (cf. Fitzgerald et al., 1988). This is, however, only meaningful if the frequency estimates possess discriminative accuracy across emotions. As a consequence, the accuracy of frequency judgments across emotions is investigated in the present dissertation.

In contrast to experimental studies, personality psychologists are mainly concerned with the ability of frequency judgments of emotions to discriminate actual frequencies of emotions experienced by different participants (Diener, Smith, & Fujita, 1995; Feldman Barrett, in press; Parkinson, Briner, Reynolds, & Totterdell, 1995; Thomas & Diener, 1990). This type of accuracy is subsequently called discriminative accuracy across participants. For example, Diener, Smith et al. (1995) studied individual differences in the experience of six types of emotions (see below). Each type was measured by four items. At the end of each of 52 consecutive days, the participants rated the time that they experienced each emotion on that day. Before and after the diary period, the participants also made time judgments for the previous month. The correlations between the averaged pre- and post-diary judgments and the averaged daily judgments were r = .69 for threat emotions (e.g., fear), r = .61 for bad-other emotions (e.g., anger), r = .52 for bad-self emotions (e.g., shame), r = .64 for separation emotions (e.g., sadness), r = .65 for good-other emotions (e.g., love), and r = .68 for pleasure emotions (e.g., joy). This finding shows quite high discriminative accuracy across participants.

In addition, the correlations between actual and estimated times of emotions were higher within the same type of emotions (pleasure–pleasure) than across different types of emotions (pleasure–good-other), indicating that the frequency judgments were emotion-specific. This pattern of results rules out a simple response set explanation. Feldman Barrett (in press) reported correlations ranging from r = .59 to .76 between averages of repeated momentary mood ratings over a 90-day period and retrospective estimates of these averages after the diary period. Again, in this study frequency was not measured in a pure fashion, because the average of repeated intensity ratings comprises frequency and intensity information (Schimmack & Diener, in press). Furthermore, previous studies might overestimate the true discriminative accuracy across participants, because the frequency judgments were made (partly) after the diary study. Therefore, it is possible that participants based their post-diary judgments on memories of their daily judgments, rather than their daily experiences, a source of information that is not available under natural circumstances.

An influence of daily ratings on the subsequent post-diary judgments is especially likely because the post-diary judgments were made on the same response scale as the daily ratings. This hypothesis is further strengthened by Thomas and Diener’s (1990) finding that the correlation between pre-diary estimates of the number of happy days and the number of happy days experienced during the diary period was lower than the correlation obtained for the post-diary estimates. To test this hypothesis, the present dissertation followed Thomas and Diener’s (1990) approach of comparing pre- and post-diary frequency judgments of emotions. Furthermore, different response scales were used for the assessment of the daily frequencies and the retrospective frequency judgments.

In sum, previous studies suggest that the discriminative accuracy across participants is fairly high (Diener, Smith et al., 1995; Feldman Barrett, in press; Parkinson et al., 1995; Thomas & Diener, 1990). However, in none of these studies was the frequency of emotions measured in a pure fashion: Diener, Smith et al. (1995) studied frequency and duration, Feldman Barrett (in press) and Parkinson et al. (1995) investigated frequency and intensity, and Thomas and Diener (1990) studied the number of pleasant versus unpleasant days. Thomas and Diener’s study suggests that people underestimate the frequency of pleasant emotions and overestimate the frequency of unpleasant emotions. However, this finding might be limited to their measure of pleasant versus unpleasant days. Different results might be obtained for absolute frequency estimates of single pleasant and unpleasant emotions such as anger, joy, or sadness. Furthermore, several types of accuracy, namely absolute and relative accuracy as well as discriminative accuracy across emotions, have been neglected in previous studies. One major aim of the present dissertation is to explore the different types of accuracy for pure frequency judgments of emotions.

2.3.2 Underlying Cognitive Processes

Only two studies explored the cognitive processes underlying frequency judgments of emotions (Fitzgerald et al., 1988; MacLeod, Andersen, & Davies, 1994; see 2.4 for research on frequency judgments in general). Both studies compared frequency judgments of emotions with the latencies to retrieve a single autobiographical memory in which the target emotion occurred. The results consistently showed an inverse relation between these two variables: Retrieval latencies decreased with increasing frequency of emotions. Furthermore, retrieval latencies were shorter for the more frequent pleasant emotions than for the less frequent unpleasant emotions (MacLeod et al., 1994). These findings have been interpreted as support for the assumption of the ease-of-retrieval model (see next paragraph for more detail) that people base frequency judgments on the ease (speed) with which they can retrieve exemplars from memory. However, this design has two shortcomings. First, the evidence is only correlational. Second, the relation of both measures to the actual frequencies of emotions is unknown. Only if both measures show the same correlation with the actual frequencies of emotions can the frequency judgments be based on ease-of-retrieval. If, however, the frequency judgments were more highly correlated with the actual frequencies than a measure of ease-of-retrieval, ease-of-retrieval could not explain the accuracy of the frequency judgments. Therefore, more rigorous tests are needed to uncover the cognitive processes underlying frequency judgments of emotions.

2.4 The Experimental Literature on Frequency Judgments in General

2.4.1 Theoretical Models

Experimental research on frequency judgments has often relied on stimuli (e.g., word lists with concepts of natural objects such as fruits, furniture, birds, etc.) which, at first sight, bear little resemblance to experiences of emotions in everyday life. Therefore, one might be skeptical whether this research helps to understand frequency judgments of emotions. This skepticism is not justified for two reasons.

First, frequency judgments of emotions also employ emotion concepts. Although emotion concepts differ from concepts of natural objects – for example, concepts of natural objects are hierarchically organized and mutually exclusive at the same level of the hierarchy, whereas emotion concepts are not (Reisenzein, 1995) – frequency judgments might not be affected by these differences.

Second, even if frequency judgments of emotions employ other cognitive processes than frequency judgments of natural objects, the theories and experimental methods developed in the general frequency judgment literature are at least heuristically fruitful for the investigation of frequency judgments of emotions.

Various frequency judgment models have been proposed in the psychological literature, which are not mutually exclusive (Brown, 1995; Hintzman, 1988; Howell, 1973; Tversky & Kahneman, 1973). Each model might be correct in specific contexts and for specific domains. For example, Manis, Shedler, Jonides, and Nelson (1993) argued that direct-encoding models might account for frequency judgments of repeated occurrences of the same stimulus, whereas retrieval-based models might account for frequency judgments of categories (see also Brown, 1995).

Other studies show that the expectation about the frequency of the event to be estimated is also important. People are likely to use a counting strategy for rare events (e.g., seeing the dentist in the last year), but estimation strategies for more frequent events (e.g., restaurant visits in the last year) (Blair & Burton, 1987).

Figure 2 provides a taxonomy of the different frequency judgment models proposed in the literature. Two major characteristics differentiate between the frequency judgment models. The first distinction is between direct encoding models that assume frequency information to be encoded directly at the time of encoding (Hasher & Zacks, 1979; Jonides & Jones, 1992; Underwood, 1969) and indirect encoding models, which assume frequency information to be stored indirectly in memory in the form of multiple memory traces (see Figure 2).

Direct encoding models are, for example, the counter model, which is based on the idea that concepts are linked to a frequency counter that registers every activation of the concept (Underwood, 1969), or the concept strength model, which assumes that concepts are strengthened by each activation so that frequency judgments can be based on a readout of a concept’s strength (cf. Howell, 1973).
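The distinction between direct and indirect encoding can be made concrete with a toy sketch. The class names CounterModel and MultipleTraceModel, their methods, and the sequence of experiences are hypothetical illustrations and not implementations of any published model; the sketch only shows where the frequency information resides in each case.

```python
from collections import Counter
from typing import List


class CounterModel:
    """Direct encoding: each concept has a counter that is updated at encoding."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def encode(self, concept: str) -> None:
        self.counts[concept] += 1      # frequency is stored explicitly

    def judge(self, concept: str) -> int:
        return self.counts[concept]    # judgment is a simple readout


class MultipleTraceModel:
    """Indirect encoding: each experience leaves its own memory trace."""

    def __init__(self) -> None:
        self.traces: List[str] = []

    def encode(self, concept: str) -> None:
        self.traces.append(concept)    # no frequency value is stored

    def judge(self, concept: str) -> int:
        # Frequency has to be derived from the set of matching traces.
        return sum(1 for trace in self.traces if trace == concept)


# Hypothetical sequence of emotional experiences.
experiences = ["joy", "anger", "joy", "joy"]
direct, indirect = CounterModel(), MultipleTraceModel()
for emotion in experiences:
    direct.encode(emotion)
    indirect.encode(emotion)

joy_direct, joy_indirect = direct.judge("joy"), indirect.judge("joy")
```

Both toy models return the same judgments for this input, so they can only be told apart by manipulations that affect encoding and judgment processes differently.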

The second major distinction is between retrieval-based and retrieval-free models. Retrieval-based models assume that frequency judgments are based on information that is obtained by the retrieval of relevant exemplars to the level of consciousness. A straightforward strategy would be to count all available instances in memory (Brown, 1995; Meudall, 1971). However, research suggests that people do not use the counting strategy for irregular and frequent events (Menon, 1994). Because emotional experiences are irregular, and quite frequent over longer time periods, it is unlikely that people rely on a counting strategy when they judge the frequency of emotions.

The other possibility is that people use simple heuristics to make frequency judgments. Tversky and Kahneman (1973) suggested several heuristics that people might use to make frequency judgments. Somewhat confusingly, all of the proposed heuristics came to be known as availability heuristics, although they assume clearly distinct cognitive processes. Most commonly, the availability heuristic has been interpreted as the retrieval of a limited number of exemplars followed by an estimation based on the number of retrieved exemplars (cf. Watkins & LeCompte, 1991). “The subject could, therefore, use the number of instances retrieved in a short period to estimate the number of instances that could be retrieved in a much longer period of time” (Tversky & Kahneman, 1973, p. 210). To distinguish this heuristic from other heuristics, it has been named recall-estimate theory (Watkins & LeCompte, 1991).

Empirical support for the recall-estimate model stems from the finding that the number of recalled exemplars is correlated with frequency judgments and that both variables are influenced in the same way by experimental manipulations at the time of encoding (Manis et al., 1993; Tversky & Kahneman, 1973). For example, in a now classic study, Tversky and Kahneman demonstrated that people recall more female names than male names from a list with an equal number of female and male names when the female names refer to famous people. In addition, participants made higher frequency estimates for female than for male names.

Today, the second availability heuristic proposed by Tversky and Kahneman is known as the ease-of-retrieval model (Schwarz, Bless, Strack, Klumpp, Rittenauer-Schatka, & Simons, 1991). According to this model, an individual “attempts to recall some instances and judges overall frequency by availability, i.e., by the ease with which instances come [italics added] to mind” (Tversky & Kahneman, 1973, p. 220). The ease-of-retrieval model has been empirically tested and supported in some studies (Gabrielcik & Fazio, 1984; Schwarz et al., 1991). For example, Schwarz et al. (1991) asked participants to recall either six instances when they were assertive, which was easy, or twelve instances, which was difficult. Subsequently, the participants in the easy recall condition judged themselves to be more assertive than those in the difficult recall condition.

In contrast to the retrieval-based models, retrieval-free models assume that frequency judgments do not involve retrieval of exemplars to the level of consciousness. Interestingly, Tversky and Kahneman (1973) also suggested a retrieval-free model of frequency judgments. “To assess availability it is not necessary to perform the actual operations of retrieval. It suffices to assess the ease with which these operations could [italics added] be performed, much as the difficulty of a puzzle or mathematical problem can be assessed without considering specific solutions” (p. 208).

This proposition bears a close resemblance to the finding in the metamemory literature that people often have a feeling of knowing the answer to a question, even when they cannot recall the answer. Nevertheless, the strength of this feeling predicts people’s performance in a later recognition test (see Nelson, 1988). Metcalfe (1993) proposed that this seemingly paradoxical ability is based on the familiarity of the question. Similarly, Hintzman (1988) proposed that people can judge the frequency of events without actual retrieval of related memories by means of a direct familiarity signal from memory. Hintzman’s familiarity model assumes that a question such as “How frequently did you experience joy in the last week?” automatically activates, in parallel, memories of joy experiences in the last week. The activation is based on a feature-matching process: The more features of a typical joy experience a memory possesses, the stronger the activation of this memory and the stronger the familiarity signal. Furthermore, some features encode the time of the experience, so that joy experiences in the last week are activated more strongly than joy experiences at other times. The automatic activation process produces an echo. The intensity of this echo reflects the amount of information that was activated in memory. This echo intensity is experienced as a feeling of familiarity. The major distinction between the familiarity model on the one hand, and the retrieval-based models on the other hand, is that the familiarity model does not require the retrieval of emotional memories to a conscious level. Therefore, it is possible that someone says: “I cannot recall a specific situation in which I felt joy last week, but I think I felt joy about 20 times.”
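
The feature-matching account described above can be made concrete with a small simulation. The following is only a minimal sketch in the spirit of a multiple-trace model: the feature coding (+1, -1, or 0 for a lost feature), the cubing of similarities, the feature-loss rate, and all names are illustrative assumptions, not details taken from this text. Time-of-experience features are not modeled.

```python
import random

def similarity(probe, trace):
    """Normalized feature overlap between a probe and one memory trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant

def echo_intensity(probe, traces):
    """Summed activation of all traces; no single trace is retrieved.

    Cubing the similarity (an assumed activation rule) lets close matches
    dominate while keeping the sign of the similarity.
    """
    return sum(similarity(probe, trace) ** 3 for trace in traces)

def store(prototype, rng, loss=0.2):
    """Store an experience as a degraded copy: some features are lost (0)."""
    return [0 if rng.random() < loss else feature for feature in prototype]

rng = random.Random(1)

# Two hypothetical emotion prototypes with uncorrelated features.
joy = [1] * 10 + [-1] * 10
fear = [1, -1] * 10

# Hypothetical memory: 12 joy experiences and 3 fear experiences.
memory = [store(joy, rng) for _ in range(12)] + [store(fear, rng) for _ in range(3)]

joy_echo = echo_intensity(joy, memory)
fear_echo = echo_intensity(fear, memory)
print(f"echo intensity: joy = {joy_echo:.2f}, fear = {fear_echo:.2f}")
```

In this sketch the more frequent emotion produces the stronger echo even though no individual episode is ever recalled, which is the core claim of the familiarity model. The sketch says nothing about how an echo is converted into a number such as "20 times".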

One important limitation of the familiarity model is that it does not explain how participants make absolute frequency estimates. The familiarity model only predicts that the familiarity signal will be stronger for frequent stimuli and weaker for rare stimuli, but the model does not explain how a feeling of familiarity is converted into an absolute numerical estimate (cf. Brown, 1995; Brown & Siegler, 1993). This problem, however, exists also for the ease-of-retrieval and the recall-estimate model.

2.4.2 Empirical Paradigms

In the experimental literature, several experimental paradigms have been developed to differentiate between the various frequency judgment models. In the following, I review those paradigms that were employed in the present studies to explore the cognitive processes underlying frequency estimates of emotions. More specifically, I first review paradigms that differentiate direct from indirect encoding models, and then paradigms that differentiate between indirect encoding models.

One paradigm is modeled after a study by Hintzman and Block (1971). The authors asked participants to learn two lists of words in which the frequency of words was independently varied. Subsequently, the participants estimated the frequency of words separately for the first and the second list. The authors found that the participants were able to make accurate frequency judgments for each list. This finding is difficult to explain by direct encoding models, which assume that frequency information is constantly updated at the time of encoding. Under these models, only the total frequency is stored in memory, so it should be impossible to differentiate frequencies in different contexts. Study 2 of the present dissertation used a similar paradigm. Participants first made daily frequency estimates of emotions for two weeks. Subsequently, they made separate frequency estimates for the first and second week of the diary study.

A second paradigm that has been used to test direct versus indirect encoding of frequency information relies on a manipulation of the salience of category membership (e.g., Bruce, Hockley, & Craik, 1991; Greene, 1989). In Greene’s study, participants were asked to study a list of words. In this list, words of different categories occurred with varying frequencies (e.g., fruits: orange, apple, banana, grapes; trees: oak, pine). In one study, he manipulated the salience of category membership in that exemplars of the same category either appeared in one block or were spread across the list. The direct encoding model assumes that category members automatically activate category labels and that the frequency of the category is counted (Alba, Chromiak, Hasher, & Attig, 1980). Hence, making category membership salient should not have an effect on frequency judgments. However, Greene (1989) found that frequency was judged to be higher when category membership was salient; that is, in the blocked condition. This is once again difficult to explain by the direct encoding model.

Manipulations of the salience of emotion concepts were employed in studies 1, 3, and 4 of the present dissertation to test whether salience has an influence on frequency judgments of emotions. In Study 1, participants made frequency judgments of emotions before and after a diary study for salient emotions (those that were on the rating form during the diary study) and non-salient emotions (those that were not on the form). First, it was expected that frequency judgments of all emotions would increase due to the participation in a diary study. In addition, it was expected that frequency judgments of salient emotions would increase more strongly than those of non-salient emotions. In studies 3 and 4, participants first rated, for a number of emotions, whether they would experience these emotions in various hypothetical scenarios. Subsequently, they were asked to estimate how frequently they would have experienced emotions in the set of scenarios. This question was asked for salient emotions (those that had been included in the previous rating task) and non-salient emotions (those that had not been included in the scenario rating task). Note that frequency judgments of non-salient emotions are meaningful, because the fact that these emotion concepts were not included in the scenario rating task does not imply that these emotions could not have been experienced in the hypothetical scenarios. It was expected that the frequency judgments of salient emotions would be higher than those of non-salient emotions (Bruce et al., 1991; Greene, 1989; Hintzman, 1988).

The previously described paradigms can test direct and indirect encoding models against each other, but they do not allow one to distinguish between retrieval-based and retrieval-free models, because all indirect models predict that different frequency judgments can be provided for different contexts (Hintzman & Block, 1971), or that salience at the time of encoding enhances frequency judgments (Bruce et al., 1991). Even the finding that frequency judgments and recall measures show discriminative accuracy across stimuli does not provide conclusive evidence that the frequency judgments were actually based upon a recall-estimate strategy (Bruce et al., 1991; Hastie & Park, 1986; Watkins & LeCompte, 1991). The correlation could simply be due to the fact that frequency judgments and the number of recalled exemplars both depend on the number of exemplars stored in memory. Frequency judgments could therefore correlate with recall without being based on the retrieval of exemplars, and a closer examination of the frequency judgment process is needed. The major difference between the two classes of models is the assumption of the retrieval-based models that exemplars are retrieved to the level of consciousness. Therefore, the time needed for a frequency judgment should be longer than the latency to retrieve at least a single exemplar. Similarly, Reder (1987) argued in the metamemory literature that feeling-of-knowing judgments should take more time than the retrieval of answers, if they are retrieval-based; however, consistent with retrieval-free models (Metcalfe, 1993), the feeling-of-knowing judgments were faster than retrieval times of answers.

Furthermore, if frequency judgments are based on the retrieval of exemplars, the judgment times of frequency judgments should be systematically related to the judged frequency: According to the recall-estimate theory, higher frequency judgments should need more time because more exemplars were retrieved, and the retrieval of more exemplars takes more time (Brown, 1995; Meudall, 1971). The opposite prediction is made by the ease-of-retrieval model. Higher frequency judgments are based on easier retrieval of exemplars, which implies that the exemplars come to mind faster so that the judgment can be made faster. In contrast to these predictions of the two retrieval-based models, Hanson and Hirst (1988) found neither a positive nor a negative relation between judgment times and size of the frequency judgments. Brown (1995) found a positive correlation when participants used the counting or the recall-estimate strategy, but not when participants used a retrieval-free strategy. In studies 3 and 4, the size of the frequency judgments, the time needed to make these judgments, and response times in a latency-to-retrieve task were compared to test the indirect encoding models of frequency judgments in the emotion domain.

2.5 Proposing a Familiarity Model of Frequency Judgments of Emotions

A finding by Schimmack and Reisenzein (in press) casts doubt on the basic assumption of the retrieval-based models that participants rely on the recall of exemplars to judge the frequency of emotions. Fitzgerald et al. (1988) found that on average the retrieval of emotional episodes from autobiographic memory needed more than 7s. MacLeod et al. (1994) also reported average retrieval times of more than 7s for unpleasant emotion memories, although pleasant memories were retrieved within 4s. In Schimmack and Reisenzein’s (in press) study, participants made conditional probability judgments of emotions (i.e., “If you experience joy, how frequently do you experience euphoria?”) on average within 5s. These judgment times appear to be too fast, especially given that the judgments were made only for unpleasant emotions, to be based on the recall of past emotional experiences. Of course, the fast responses could be due to random responding. However, covariation judgments of emotions are not only fast, but also reflect actual covariations between emotions fairly accurately (Reisenzein & Schimmack, 1996).

Finally, Schimmack and Reisenzein analyzed asymmetries in conditional probability judgments. According to Bayes’s theorem (Wiggins, 1973), p(A) > p(B) exactly if p(A|B) > p(B|A); that is, because sadness is in general a more frequent emotion than embarrassment, the conditional probability of sadness given embarrassment should be higher than the conditional probability of embarrassment given sadness. This prediction was confirmed for most emotion pairs. Therefore, conditional probability judgments reflect not only the actual co-occurrence of emotions, but also the separate frequencies of the two emotions. It is difficult to imagine how the participants (a) retrieved a sufficient number of exemplars and (b) carried out the necessary computations on a conscious level within 5s. Therefore, Schimmack and Reisenzein concluded that both frequency and co-occurrence judgments of emotions are either already pre-stored in memory, or the judgments are based on a feeling of familiarity (Hintzman, 1988; Metcalfe, 1993). As a consequence, a test between direct encoding models and the familiarity model seems to be highly desirable. Nevertheless, previous findings by Hintzman and Block (1971) and others (Greene, 1989) have challenged direct encoding models in other domains. Furthermore, the familiarity model has been successfully applied to other social judgments. For example, the familiarity model, but not the direct-encoding models, explains the phenomenon of illusory correlations (Smith, 1991; Smith & Zárate, 1993; see also Fiedler, 1991). Therefore, Schimmack and Reisenzein recommended the familiarity model as an “inference to the best explanation” for frequency and co-occurrence judgments of emotions.
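
The asymmetry prediction in the preceding paragraph follows directly from the definition of conditional probability. Written out for two emotions A and B that co-occur at least occasionally, so that p(A ∩ B) > 0 (an assumption needed for the ratio to be defined):

```latex
\begin{align*}
p(A \mid B) &= \frac{p(A \cap B)}{p(B)}, &
p(B \mid A) &= \frac{p(A \cap B)}{p(A)}, \\[4pt]
\frac{p(A \mid B)}{p(B \mid A)} &= \frac{p(A)}{p(B)}, &
\text{so}\quad p(A \mid B) > p(B \mid A) &\iff p(A) > p(B).
\end{align*}
```

The joint probability cancels in the ratio, which is why the direction of the asymmetry depends only on the separate frequencies of the two emotions.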

2.6 Biases in Frequency Judgments of Emotions

2.6.1 Mood-Congruent Biases

Up to now, frequency judgments of emotions have been treated just like frequency judgments of natural objects (e.g., fruits, cities, furniture). However, emotions differ from these stimuli in that emotions have a hedonic tone: they are either pleasant or unpleasant (cf. Clore, 1994). Several information processing models predict that affective information is processed differently from non-affective information. According to the mood-congruent-memory hypothesis (Bower, 1981), the current affective state renders mood-congruent memories more accessible. In combination with the indirect encoding models of frequency judgments, this leads to the prediction that frequency estimates of emotions are biased in a mood-congruent direction. In contrast, the competing model of mood effects on social judgments, the mood-as-information model (Schwarz & Clore, 1983), does not make this prediction. According to this model, people directly use their current mood to make evaluative judgments whenever they consider their current mood a valid and relevant source of information for the judgment. In an intriguing experiment, Schwarz and Clore (1983) demonstrated that participants rated the satisfaction with their lives to be higher in a good mood (e.g., on sunny days) than in a bad mood (e.g., on rainy days). This effect, however, disappeared when the influence of the weather on participants’ current mood was made salient to them. As a consequence, participants no longer considered their current mood a valid source of information and used other information. Subsequent studies showed that current mood was not used for judgments about satisfaction with specific life domains, presumably because participants considered their current mood as irrelevant (Schwarz, 1987). Because current mood does not appear to be a particularly relevant source of information for frequency judgments of specific emotions such as love, hate, joy, and fear, people should not use their current mood as information for these judgments. Therefore, the mood-congruent-memory model, but not the mood-as-information model, predicts an influence of current mood on frequency judgments of emotions.

To address this question empirically, individual differences in naturally occurring mood at the time of the frequency judgments were assessed in the two field studies. Naturally occurring mood, rather than a mood-induction procedure, was used for two reasons. First, an experimental mood manipulation may have distorted the results in the more important analyses of the accuracy of frequency judgments. Second, I believe that it is useful to start a scientific investigation with a demonstration of a phenomenon under natural conditions. If naturally occurring mood is unrelated to frequency judgments of emotions, an experimental investigation is at least of secondary importance to the present research question. This research strategy seems especially desirable in the light of a series of studies by Parrott and Sabini (1990), who did not find mood-congruent recall in naturalistic settings; indeed, the authors found mood-incongruent recall. In addition, mood effects in experimental studies are often quite small and inconsistent (Blaney, 1986; Brewin et al., 1993), suggesting that current mood leads only to small distortions in retrospective frequency estimates of emotions.

2.6.2 Personality-Congruent Biases

Martin’s (1985) notion of a personality-congruent memory bias suggests that a person’s personality might also influence frequency judgments of emotions. Specifically, participants might overestimate the frequency of personality-congruent emotions. In support of this hypothesis, Diener et al. (1984) found that neurotic individuals overestimated the amount of their unpleasant affect, whereas extravert individuals overestimated the amount of their pleasant affect (see also Feldman Barrett, in press). In addition, Larsen (1992) found that neurotic individuals tended to overestimate the frequency of some physical symptoms.

Personality-congruent biases can be explained in several ways. First, personality-congruent memories are more accessible (Martin, 1985); therefore, at least the retrieval-based models would predict higher frequency judgments for personality-congruent emotions. Second, people may have generalized beliefs about their personality that can be based on various kinds of information, such as communication with others or abstractions from their own experiences (Fiske & Taylor, 1984; Hastie & Park, 1986). For example, a person might think that he or she is a “jealous”, “choleric”, or “happy” person. People might rely on such generalized beliefs when they are asked to estimate the frequency of their emotional experiences (Feldman Barrett, in press; Zuroff, 1989). The use of generalized beliefs is, of course, just another judgment model of frequency judgments of emotions: Once an individual has determined the frequency of his or her emotions, for example by means of one of the other judgment strategies, he or she simply retrieves this prestored frequency information to make subsequent frequency judgments. As long as this information accurately reflects the actual frequencies of emotional experiences, this judgment strategy provides a fast and efficient way to answer frequency questions about emotions. However, if the formerly derived frequency judgments deviate from the actual frequencies of emotional experiences, reliance on this information leads to personality-congruent biases. With regard to frequency questions over limited time periods (e.g., “in the last month”), a third explanation is possible: People may have difficulty distinguishing between episodes that fall within and those that fall outside of the time period asked about (cf. Schwarz, 1990). In this case, the frequency judgments would cover a longer time period than intended by the investigator’s question. Furthermore, personality tends to be a better predictor of emotional experiences the more they are aggregated over longer time periods (Epstein, 1983). As a consequence, personality explains additional variance in the frequency judgments of emotions that is not accounted for by the actual frequencies of emotions during the limited time period under investigation. Finally, personality-congruent memory biases could be a simple method artifact due to the fact that personality traits are measured by judgments that are very similar to frequency judgments of emotions.

In studies 1 and 2 the aim was simply to further explore whether a personality-congruent memory bias exists. If so, this could be the starting point for further analyses, differentiating between the different explanations described above. Note that a personality-congruent bias is consistent with the familiarity model, if it is due to the activation of memory traces of experiences outside of the time period under investigation. The bias should, however, account for much less variance than the actual frequencies of emotions, because the familiarity model assumes that memory traces can be activated for different contexts, including the time of experience (Hintzman & Block, 1971).

2.7 Summary of Hypotheses

The present dissertation has two main aims: (a) to test the accuracy of frequency judgments of emotions and (b) to explore the cognitive processes underlying these judgments.

With regard to the first aim, the following predictions are made:

1. Participants should underestimate the absolute frequency of their emotional experiences. A related prediction is that people underestimate especially the frequencies of more frequent emotions. Both predictions are based primarily on the fact that these effects have been consistently obtained in the frequency judgment literature (cf. Thompson & Mingay, 1991; Williams & Durso, 1986). However, an explanation of this effect is lacking, because many frequency judgment models do not address the question of how frequency information (e.g., a feeling of familiarity) is converted into absolute frequencies (cf. Brown, 1995).

2. Frequency judgments of emotions should show discriminative accuracy across emotions because the familiarity signal reflects the number of stored memories, and thereby the number of experiences of an emotion, fairly accurately (Hintzman, 1988). High discriminative accuracy across stimuli has been reported for frequency judgments of other stimuli (Hasher & Zacks, 1984; Hintzman, 1988).

3. The discriminative accuracy across participants is expected to be moderate. This prediction is based on some earlier findings (e.g., Thomas & Diener, 1990). The correlations reported by Diener et al. (1995) and Feldman Barrett (in press) in the range from r = .50 to .70 are predicted to overestimate the true discriminative accuracy across participants, because the estimates were made (a) on the same scale that was used during the diary period and (b) after participation in a diary study.
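
The accuracy measures in the three predictions above are different computations on the same judgment-by-criterion data. The sketch below illustrates the distinction between discriminative accuracy across emotions (prediction 2) and across participants (prediction 3). All numbers are made up, and the data layout, the function names, and the use of a Pearson correlation are assumptions for illustration rather than a description of the analyses reported in the studies.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical weekly frequencies: rows = participants, columns = emotions.
emotions = ["joy", "sadness", "anger", "envy"]
actual = [[14, 5, 4, 1], [10, 6, 2, 0], [20, 3, 5, 2]]
judged = [[10, 4, 3, 1], [9, 5, 2, 0], [12, 2, 4, 2]]

def accuracy_across_emotions(actual, judged):
    """One correlation per participant, computed over that person's emotions."""
    return [pearson(a, j) for a, j in zip(actual, judged)]

def accuracy_across_participants(actual, judged):
    """One correlation per emotion, computed over participants."""
    return [pearson(a, j) for a, j in zip(zip(*actual), zip(*judged))]

def mean_bias(actual, judged):
    """Mean judged minus mean actual frequency; negative = underestimation."""
    flat_a = [v for row in actual for v in row]
    flat_j = [v for row in judged for v in row]
    return sum(flat_j) / len(flat_j) - sum(flat_a) / len(flat_a)

within = accuracy_across_emotions(actual, judged)
between = accuracy_across_participants(actual, judged)
bias = mean_bias(actual, judged)
```

A design choice worth noting: the two correlations can diverge. Judgments may rank a person's emotions well while ranking people on a single emotion poorly, which is why they are treated as separate predictions.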

With regard to the cognitive processes underlying frequency judgments of emotions, the following predictions were made:

1. Participants should be able to judge accurately the frequencies of emotions in the first and in the second week of a diary study. The reason is that it is possible to activate memory traces of different time periods separately, so that the familiarity signal reflects predominantly the frequencies in a specified time period (Hintzman & Block, 1971).

2. Making some emotion concepts salient during the encoding process should increase the judged frequency of salient emotion concepts compared to non-salient ones (i.e., those emotion concepts that were not presented at the time of encoding), because salience leads to deeper encoding and less information loss (Greene, 1989; Hintzman, 1988).

3. The retrieval latency of an emotional episode from memory should be unable to account for the discriminative accuracy of frequency judgments across emotions. The basis for this prediction is that frequency judgments are assumed to be based on a sense of familiarity and that the familiarity signal reflects the actual frequencies more accurately than the ease of retrieval of exemplars (Watkins & LeCompte, 1991). This hypothesis allows for the possibility that frequency judgments of emotions and retrieval latencies of emotional episodes are negatively correlated (Fitzgerald et al., 1988; MacLeod et al., 1994). It only predicts that this correlation is not strong enough to account for the discriminative accuracy across emotions.

4. The familiarity model does not predict a relation between the size and the speed of frequency judgments. In contrast, the retrieval-based models predict such a relation; the counting and the recall-estimate model predict a positive correlation (Brown, 1995), whereas the ease-of-retrieval model predicts a negative correlation.

5. Finally, it is predicted that the recall of a single emotional episode takes longer than the complete frequency judgment process. This prediction is again based on the assumption of the familiarity model that frequency judgments are not based on the retrieval of emotional experiences from memory. This prediction is in agreement with previous results that frequency judgments are faster than the retrieval of exemplars (Alba et al., 1980; Schimmack & Reisenzein, in press).

No explicit predictions were made concerning the influence of current mood or personality on frequency judgments of emotions because the familiarity model predicts such biases only under certain conditions. Mood-congruent effects could be due to a stronger activation of mood-congruent memory traces, leading to a stronger familiarity signal. Personality-congruent effects could be due to the activation of memory traces outside of the time frame of the question. However, it is predicted on account of the familiarity model that biases, if they exist, are relatively small compared to the amount of variance that is explained by the actual frequencies of emotions.

3 STUDY 1

Studies 1 and 2 used the pre-post design of Thomas and Diener (1990). In a pre-post design, participants first judge the frequency of emotions for a time period prior to the diary study. Then, they take part in a diary study, which serves to obtain a measure of the actual frequencies of emotions. Finally, they judge the frequency of emotions during the diary period. The advantage of the pre-post design is that it allows one to test salience effects; that is, whether the participation in the diary study influenced the post-diary judgments. The drawback of the design is that pre-diary estimates necessarily cover a different time period than the one during which the actual frequencies of emotions are measured. Hence, changes in the true frequencies of emotions from the pre-diary period to the actual diary period can attenuate correlations between pre-diary estimates and the actual frequencies of emotions.

Study 1 served several goals: first, to test the accuracy of frequency judgments of emotions, using various measures of accuracy, and second, to test whether the salience of emotions at the time of encoding influences subsequent frequency judgments. Third, Study 1 tested the presence of mood- or personality-congruent biases in frequency judgments of emotions. Fourth, the strengths and weaknesses of two different response formats were compared: (a) the participants made absolute estimates; that is, they estimated the absolute number of occurrences of an emotion (X times a week), and (b) they made vague quantifier ratings (Pepper, 1981; Wright, Gaskell, & O’Muircheartaigh, 1994); that is, they checked which of a number of common frequency expressions (e.g., never, rarely, sometimes, often) most appropriately described how frequently they experienced an emotion. The use of both response formats appeared to be especially desirable because Schaeffer (1991) obtained different results for absolute estimates and vague quantifier ratings. In a survey study, respondents were first asked to rate the frequency of excitement and boredom in their lives by means of vague quantifiers. Then, they were asked to indicate which absolute frequency the chosen quantifier indicates. Black participants appeared to be more bored than white participants according to the vague quantifier ratings, but not according to the absolute estimates.

Two ancillary analyses were carried out. First, I explored the time period covered by frequency judgments of emotions. It is well known that memories increasingly decay over time (Hintzman, 1988). Therefore, frequency judgments of emotions should be influenced predominantly by more recent emotional experiences. Nevertheless, it remains to be discovered whether frequency judgments of emotions cover only experiences in the last few days or extend over much longer time periods. Second, the structure of individual differences in the frequencies of pleasant and unpleasant emotions was explored, which is an important topic in personality psychology (Bradburn, 1969; Diener, Smith et al., 1995; Green, Goldman, & Salovey, 1993; Meyer & Shack, 1989).

3.1 Method

3.1.1 Participants

One hundred and fifty students in a semester-long course on research in personality at the University of Illinois took part in this study. Four participants were excluded because of missing data. The final sample consisted of 107 female and 39 male participants. Although the topic of the validity of retrospective frequency judgments of emotions was discussed in this course, this happened only after the data collection relevant to this study had been completed.

3.1.2 Material and Procedure

3.1.2.1 Daily Estimates

At the core of the present study, participants estimated the absolute frequencies of 20 emotions (see Table 1) at the end of each day for 23 days. The first two days were used to practice the use of the questionnaire and were excluded from all analyses. Using a free response format, participants entered any number that seemed to be appropriate as an estimate of the absolute frequency with which they experienced an emotion on a particular day. Participants returned the forms the next day, except for the weekend forms which were due on Monday.

In Studies 1 and 2, the averaged daily ratings are used as a standard of comparison for the long-term frequency judgments. Therefore, the averaged daily judgments are labeled actual frequencies, although the measure can be expected to give only an approximation of participants’ actual frequencies of emotional experiences. However, random- or event-sampling methods (see Schimmack & Diener, in press) would reflect the frequency of emotional experiences only if the number of daily measurement points were extremely high, which would probably overtax the motivation of the participants. Therefore, daily, or, as in Study 2, twice-daily, frequency estimates were considered the most appropriate measure of actual frequencies of emotions in everyday life.

3.1.2.2 Vague Quantifier Ratings

Before (pre-diary) and after (post-diary) the diary study, the participants made vague quantifier ratings concerning the frequencies of emotions during the last three weeks. The emotions were the 20 emotions included in the daily form and 9 additional ones. For the rating task, participants were provided with frequency expressions commonly used in everyday language, and they had to check the most appropriate one (i.e., In the last three weeks, I experienced joy [never], [very rarely], [rarely], [sometimes], [often], [very often], [extremely often]). For statistical analyses, the vague quantifier ratings were later converted to numbers from 0=never to 6=extremely often.

3.1.2.3 Absolute Estimates

The participants also made pre- and post-diary absolute estimates of the frequency of emotions experienced during the last three weeks. These judgments were made after the vague quantifier ratings. This order was chosen because participants might have relied on their absolute estimates to make the vague quantifier ratings, whereas a transfer effect in the other direction seemed to be less likely. The emotions were the same as in the vague quantifier questionnaire. Also, the item sequence was the same for vague quantifier ratings and absolute estimates. The absolute estimates were made using a free response format; that is, the participants wrote down any number that they deemed appropriate. Although the estimates were required to cover the last three weeks, the questionnaire asked for weekly frequencies (i.e., “In the last three weeks, I experienced joy _ times a week”). A weekly time frame was used for the following reasons. A daily time frame (_ times a day) seemed problematic. First, a daily time frame might have especially encouraged the participants to memorize their daily estimates to make the post-diary estimates. Second, a daily time frame does not allow discrimination between the frequencies of very rare emotions (e.g., envy, hate), which on many days are not experienced at all; therefore the modal response for these emotions would be zero. A three-week time frame (_ times in the last three weeks) was not used because a weekly time frame has the advantage that it can be used for different time periods, ranging from one week (“In the last week, I experienced joy _ times a week”) to people’s frequency of emotional experiences in general (“In general, I experience joy _ times a week”).

3.1.2.4 Mood Questionnaire

After completing the two frequency judgment tasks, the participants rated their current mood on the ELMI (Everyday Language Mood Inventory; Schimmack, 1996a) which is an English adaptation of the BASTI (Berliner Alltagssprachliche Stimmungsinventar; Schimmack, in press). The ELMI measures 10 specific mood dimensions, namely indifference, sentimentality, depression, grouchiness, irritation, anxiety, nervousness, euphoria, cheerfulness, and relaxation, and three global mood dimensions, pleasure-displeasure, aroused-calm, and wakeful-tired, with two items each. Ratings were made on an intensity scale ranging from 0=not at all to 6=extremely intense.

3.1.2.5 Personality Questionnaire

The personality dimensions of neuroticism and extraversion were measured by means of the NEO-PI-R (Costa & McCrae, 1992). Each trait is measured by six subscales, and each subscale comprises 8 items. Therefore, the NEO-PI-R provides for a reliable and broad assessment of these two personality dimensions.

3.2 Results

3.2.1 Absolute Accuracy

The absolute and relative accuracy of frequency judgments of emotions can only be tested for the absolute estimates, because vague quantifiers do not correspond to a fixed absolute frequency (Pepper, 1981; Wright et al., 1994). To test the absolute accuracy, each participant’s standard deviation from his or her actual frequencies (i.e., the square root of the mean squared difference between estimated and actual frequencies) was computed for the 20 pre- and the 20 post-diary estimates. A comparison of the standard deviations indicated that absolute accuracy increased from the pre-diary (mean SD = 11.88) to the post-diary estimates (mean SD = 9.09), t(145) = 5.95, p < .01.
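The deviation score described above can be illustrated with a minimal sketch. The function name and the frequencies below are hypothetical, and the sketch assumes that the squared differences were averaged over emotions (dividing by n) before the square root was taken; the exact divisor used in the original analysis is not stated.

```python
import math

def rms_deviation(estimated, actual):
    """Root-mean-square deviation of estimated from actual frequencies.

    Assumption: squared differences are averaged over emotions (divisor n).
    """
    if len(estimated) != len(actual):
        raise ValueError("need one estimate per actual frequency")
    squared = [(e - a) ** 2 for e, a in zip(estimated, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical weekly frequencies of four emotions for one participant.
actual = [28.0, 10.0, 4.0, 2.0]
pre_diary = [7.0, 5.0, 3.0, 1.0]
post_diary = [20.0, 8.0, 4.0, 2.0]

pre_rmsd = rms_deviation(pre_diary, actual)
post_rmsd = rms_deviation(post_diary, actual)
print(f"pre-diary deviation:  {pre_rmsd:.2f}")
print(f"post-diary deviation: {post_rmsd:.2f}")
```

In the study, one such score per participant was obtained for the pre-diary and for the post-diary estimates, and the two sets of scores were then compared with a paired t-test.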

This finding suggests that the participation in the diary study increased the accuracy of the estimates. One problem in the interpretation of this finding is, however, that the post-diary estimates, but not the pre-diary estimates, cover the time period of the diary study. Therefore, the present finding might also be due to the fact that the actual emotion frequencies changed from the pre-diary period to the diary period.

The previous analysis measured absolute accuracy at the individual level. It is, however, also possible to compare the absolute accuracy of pre- and post-diary estimates at the group level. To this aim, the frequency judgments of each emotion were first averaged across participants. Then the standard deviations of the averaged pre- and post-diary estimates from the averaged actual frequencies were compared. This analysis also shows an improvement in absolute accuracy from pre-diary (mean SD = 7.54) to post-diary (mean SD = 3.60) estimates, t(19) = 4.46, p < .01.

This finding is stronger evidence for an improvement in absolute accuracy, because it is less likely that the average frequency of an emotion changed from the pre-diary to the diary period. That is, it is less likely that all participants experienced less anger or more joy in one of the two weeks. In sum, the analyses suggest that the participation in the diary study improved the absolute accuracy of the judgments.

3.2.2 Relative Accuracy

Table 1 shows the weekly frequencies of the 20 emotions, as derived from the daily frequency estimates, which were added up and divided by 3, and as estimated before and after the diary study. For all 20 emotions, the actual frequencies were higher than the estimated frequencies. The other notable finding, that all pre-diary estimates were lower than the post-diary estimates, is discussed later. In some cases the absolute differences were quite dramatic: For example, participants experienced contentment on average 28 times a week, but estimated in their pre-diary judgments that they did so only 7 times a week.

As a measure of relative accuracy, each participant’s actual frequencies were subtracted from his or her estimated frequencies of emotions. Underestimation was more severe for the pre-diary estimates (mean d = -7.54) than for the post-diary estimates (mean d = -3.59), t(145) = 7.09, p < .01. Before the diary study only 6 (of 146) participants revealed overestimation, whereas after the diary study 23 participants overestimated their frequencies of emotions.
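The signed difference score can be sketched in the same way. The function name and data are hypothetical; the last example illustrates why relative and absolute accuracy can diverge when over- and underestimation both occur.

```python
def mean_signed_difference(estimated, actual):
    """Mean of (estimated - actual); negative values indicate underestimation."""
    diffs = [e - a for e, a in zip(estimated, actual)]
    return sum(diffs) / len(diffs)

# Hypothetical weekly frequencies of four emotions for one participant.
actual = [28.0, 10.0, 4.0, 2.0]
pre_diary = [7.0, 5.0, 3.0, 1.0]
post_diary = [20.0, 8.0, 4.0, 2.0]

print(mean_signed_difference(pre_diary, actual))   # underestimation before the diary
print(mean_signed_difference(post_diary, actual))  # weaker underestimation afterwards

# Over- and underestimation cancel in the signed score even though
# every single estimate is off by two occurrences.
print(mean_signed_difference([12.0, 2.0], [10.0, 4.0]))
```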

The analysis at the group level provided the same results as the previous analysis of the absolute accuracy because the averaged estimates always underestimated the averaged actual frequencies (absolute and relative accuracy differ only when both over- and underestimation occur). In sum, the analyses provide clear support for the first hypothesis that people in general underestimate the frequency of their emotional experiences.

Figure 3 shows the means of Table 1 to test the second part of hypothesis 1, that people underestimate higher frequencies more than lower frequencies (cf. Watkins & LeCompte, 1991). Clearly, underestimation increases with higher actual frequencies. In addition, it can be seen that the frequency estimates follow a linear trend. This finding is in agreement with results reported by Watkins and LeCompte (1991). To demonstrate higher underestimation for higher actual frequencies quantitatively, the relative accuracy score was correlated with the actual frequencies. This analysis produced, as predicted, negative correlations for both pre- and post-diary estimates: rs = -.99 and -.86 (ps < .01), respectively. Underestimation was more severe for higher actual frequencies. In addition, the relative accuracy scores of pre- and post-diary estimates were highly correlated with each other, r = .88, p < .01. In sum, analyses of the relative accuracy of frequency judgments of emotions revealed that (a) people underestimate the frequency of their emotions and (b) that they do so increasingly with increasing actual frequencies of emotions. This finding is consistent with experimental studies (cf. Watkins & LeCompte, 1991; Williams & Durso, 1986), and estimates of daily activities (Mingay, Shevell, Bradburn, & Ramirez, 1994). Furthermore, underestimation was more pronounced for pre- than for post-diary estimates. Because it is unlikely that the actual frequencies of all emotions increased from the pre-diary to the diary period, this finding can be interpreted as evidence that the relative accuracy of the estimates increased due to the participation in the diary studies.

3.2.3 Discriminative Accuracy across Emotions

Discriminative accuracy across emotions can be assessed at the group level as well as at the individual level. For the analysis at the group level, one simply has to compute the correlation between the actual frequencies and the estimated frequencies across the 20 emotions included in the daily report form (Table 1). The discriminative accuracy across emotions was very high for pre- and post-diary estimates, rs = .96 and .98, respectively, both ps < .01; the pre- and post-diary estimates were also highly correlated with each other, r = .96, p < .01. The same analyses were performed for the vague quantifier ratings. The correlations with the actual frequencies as well as with the absolute estimates were very high (all rs > .90, all ps < .01).

To test discriminative accuracy across emotions at the level of each participant, the correlations between actual frequencies and the four frequency judgments (pre- and post-diary absolute estimates and vague quantifier ratings) were computed for each individual. Subsequently, the correlation coefficients⁴ were used as dependent variables in a 2 x 2 ANOVA with the within-subject factors response format (absolute estimates vs. vague quantifier ratings) and time of judgment (pre-diary vs. post-diary). This analysis revealed that the absolute estimates produced higher correlations (mean r = .79) than the vague quantifier ratings (mean r = .73), F(1,145) = 61.54, p < .01. Furthermore, post-diary judgments were more highly correlated (mean r = .85) with actual frequencies than pre-diary judgments (mean r = .67), F(1,145) = 415.51, p < .01. The interaction was not significant, F(1,145) = 0.02.
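As an illustration of the individual-level analysis, the following sketch computes the correlation across emotions for a single hypothetical participant. All names and numbers are invented for the example, and any transformation of the coefficients before they entered the ANOVA is not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for one participant and four emotions.
actual = [28.0, 10.0, 4.0, 2.0]
absolute_estimates = [14.0, 5.0, 2.0, 1.0]   # every frequency underestimated by half
vague_ratings = [5, 4, 2, 2]                 # 0 = never ... 6 = extremely often

r_absolute = pearson_r(actual, absolute_estimates)
r_vague = pearson_r(actual, vague_ratings)
print(f"absolute estimates: r = {r_absolute:.2f}")
print(f"vague quantifiers:  r = {r_vague:.2f}")
```

The absolute estimates in this example are far too low, yet they order the emotions perfectly. Discriminative accuracy across emotions is therefore independent of absolute and relative accuracy.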

The higher correlations obtained for post-diary estimates suggest that participation in the diary study also increased the discriminative accuracy across emotions. However, the effect could also be due to changes in the true frequencies of emotions from the pre-diary to the diary period. The fact that vague quantifier ratings possessed less discriminative accuracy across emotions in the analysis at the individual level could be due to the limited number of response categories, which limits the ability to discriminate between emotions with similar frequencies.

In sum, the results support hypothesis 2 that frequency judgments of emotions possess discriminative accuracy across emotions; that is, people are sensitive to the different frequencies with which they experience different emotions. This finding is consistent with studies of frequency judgment in other domains which also show that people are sensitive to variation in the frequencies of different stimuli (Hasher & Zacks, 1984; Hintzman, 1988).

3.2.4 Discriminative Accuracy across Participants

To test the discriminative accuracy across participants, the pre- and post-diary frequency judgments of both response formats were correlated with the actual frequencies across participants, separately for each of the 20 emotions (Table 2). Table 2 also shows the test-retest correlations of the absolute estimates and the vague quantifier ratings. In the last column of Table 2, the internal consistency of the daily estimates across the 21 days of the diary period is reported as a measure of the stability of individual differences in the frequency of emotional experiences during the diary period (Diener & Larsen, 1984).
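The internal consistency across days can be sketched as Cronbach's alpha with the diary days treated as items. The function name and the small data set (four participants, three days) are hypothetical, and the sketch assumes the standard alpha formula with sample variances.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha; scores[p][d] is participant p's estimate on day d."""
    k = len(scores[0])  # number of days, treated as items
    day_variances = [variance([p[d] for p in scores]) for d in range(k)]
    total_variance = variance([sum(p) for p in scores])
    return k / (k - 1) * (1 - sum(day_variances) / total_variance)

# Hypothetical daily frequencies of one emotion.
stable = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]     # same rank order every day
unstable = [[1, 3, 0], [2, 1, 4], [3, 4, 2], [4, 2, 5]]   # rank order changes daily

alpha_stable = cronbach_alpha(stable)
alpha_unstable = cronbach_alpha(unstable)
print(f"stable individual differences:   alpha = {alpha_stable:.2f}")
print(f"unstable individual differences: alpha = {alpha_unstable:.2f}")
```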

As can be seen in Table 2, nearly all correlations between frequency estimates and actual frequencies of emotions were significant. Nonsignificant correlations were obtained only for pre-diary absolute estimates. These results support hypothesis 3 that frequency judgments of emotions possess discriminative accuracy across participants. However, the correlations in Table 2 vary considerably, ranging from r = -.03 to .78. A 2 x 2 ANOVA with the within-subject factors response format (absolute estimates vs. vague quantifier ratings) and time of judgment (pre- vs. post-diary judgments) was used to test whether these factors influence the discriminative accuracy across participants.

This analysis revealed significant main effects for response format, F(1,19) = 8.72, p < .01, and for time of estimate, F(1,19) = 148.54, p < .01. In addition, the interaction was also significant, F(1,19) = 12.93, p < .01. Follow-up analyses of the mean correlations⁵ indicated that the post-diary correlations were higher than the pre-diary correlations (Figure 4). In addition, the significant interaction is due to the fact that the pre-diary vague quantifier ratings produced higher correlations than the pre-diary absolute estimates, whereas both response formats produced equally high correlations when the judgments were made after the diary period.

Again, the finding that post-diary estimates possess higher discriminative accuracy across participants can be due to two factors that are not mutually exclusive. First, rating the frequency of emotions on a daily basis might make emotional experiences more salient, leading to more accurate judgments. Second, individual differences in the actual frequency of emotional experiences may have changed from the pre-diary period to the actual diary period. It is important to distinguish between these two explanations, because the first explanation implies that the post-diary correlations overestimate the true discriminative accuracy of frequency judgments of emotions, whereas the second explanation implies that the pre-diary correlations underestimate the discriminative accuracy. Additional analyses were carried out to test the viability of the two accounts in greater detail.

The last column in Table 2 shows that the individual differences in the frequencies of emotions were highly stable over the three-week diary period. On the basis of this finding, a fairly high stability of individual differences in the frequencies of emotions can also be expected from the three weeks prior to the diary study to the three weeks of the diary period. If so, the higher correlations obtained for the post-diary judgments were at least partly due to the participation in the diary study. Table 2, however, also shows that emotions differ in their stability over time. For example, the frequency of affection is more stable (alpha = .93) than the frequency of feeling hurt (alpha = .71). If the correlation between pre-diary estimates and actual frequencies is attenuated by changes in the true frequencies of emotions, emotions with more variable frequencies over time (e.g., hurt) should be more affected than emotions with very stable frequencies over time (e.g., affection). To test this hypothesis, the pre-diary correlations (columns 1 and 3 in Table 2) were correlated with the stability coefficient (i.e., alpha in the last column in Table 2) across the 20 emotions. Both correlations indicate that the discriminative accuracy of pre-diary estimates increased with the stability of individual differences in the frequency of an emotion (absolute estimates r = .44, p = .05; vague quantifier ratings r = .54, p < .05), although the correlation for the absolute estimates was only marginally significant. This finding suggests that the temporal stability of an emotion influenced the size of the correlations between pre-diary estimates and actual frequencies during the diary study. Therefore, these correlations tend to underestimate the discriminative accuracy of frequency judgments of emotions.

In sum, the analyses show empirically that the true discriminative accuracy is higher than the correlation obtained for pre-diary estimates and lower than the correlation obtained for post-diary estimates. Therefore, a point-estimate of the true discriminative accuracy across participants is not possible, but it is on average in a range from r = .30 to .60. This finding suggests that the discriminative accuracy across participants has been overestimated in previous studies which used only post-diary judgments (Feldman Barrett, in press). The fact that the post-diary estimates in the present study are still lower than in previous studies can be attributed to the use of single-item measures in the present study, whereas previous studies used multiple-item measures which are bound to have a higher reliability.

In a further set of analyses the specificity of the frequency judgments was explored; that is, whether individual differences in the judged frequency of an emotion are more highly correlated with individual differences in the actual frequencies of the same emotion than with those of other emotions (cf. Diener, Smith et al., 1995). The actual frequencies of each emotion were correlated with the frequency judgments of the remaining 19 emotions and the highest correlation was recorded (see Appendix 1). Subsequently, this correlation was compared to the correlation with the frequency judgment of the same emotion (Table 2). Specificity was established if the correlation with the judgments of the same emotion exceeded the highest correlation with judgments of another emotion. These analyses were carried out for all four frequency judgments (pre- and post-diary absolute estimates and vague quantifier ratings). The strongest evidence for specificity was obtained for the post-diary vague quantifier ratings: Estimates for all 20 emotions revealed specificity. For the other judgments, specificity existed for frequency judgments of 18 (post-diary absolute estimates), 17 (pre-diary vague quantifier ratings) and 14 emotions (pre-diary absolute estimates). Even 14 cases of specificity are much more than what would be expected by chance (expected = 1), χ²(N = 20) = 177.89, p < .01. These results show that the participants clearly used information about specific emotions. This finding eliminates a simple response set explanation of discriminative accuracy across participants (cf. Diener, Smith et al., 1995). Furthermore, the results suggest that frequency judgments are not based on generalized beliefs, unless one assumes that participants have different beliefs for each of the 20 emotions.
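The chance comparison can be retraced with a short calculation. The sketch assumes a goodness-of-fit test over two cells (emotions with and without specificity) and a chance probability of 1/20 that the same-emotion judgment yields the highest of 20 correlations; the function name is hypothetical. Under these assumptions the calculation reproduces the reported value.

```python
def chi_square_hits(observed_hits, n, p_chance):
    """Goodness-of-fit chi-square over two cells: hits and misses."""
    expected_hits = n * p_chance
    expected_misses = n - expected_hits
    observed_misses = n - observed_hits
    return ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_misses - expected_misses) ** 2 / expected_misses)

# 14 of 20 emotions showed specificity; 1 of 20 is expected by chance.
chi2 = chi_square_hits(observed_hits=14, n=20, p_chance=1 / 20)
print(f"chi-square = {chi2:.2f}")
```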

In sum, frequency judgments of emotions were found (a) to possess moderate discriminative accuracy across participants and (b) to show remarkable specificity for each emotion. With regard to the two response formats, the vague quantifier ratings yielded higher correlations and more specificity than the absolute estimates, despite the use of absolute estimates on the daily report form to measure actual frequencies.

3.2.5 The Influence of Daily Ratings on Frequency Judgments of Emotions

Daily ratings of emotions during the diary study might make these emotions salient. According to hypothesis 5, this should increase the absolute level of the frequency estimates of these emotions. To test this prediction, 9 emotions were included in the pre- and post-diary questionnaires that had not been on the daily rating form. Furthermore, these emotions were selected to be related to one of the emotions on the daily form (not daily form-daily form: happiness-joy, love-affection, fear-anxiety, rage-anger, dislike-contempt, regret-guilt, shame-embarrassment, depression-sadness, helplessness-hopelessness).

The previous analysis of relative accuracy already demonstrated that people underestimated actual frequencies less in the post-diary judgments than in the pre-diary judgments. This effect implies that the frequency judgments increased from pre- to post-diary ratings (see Figure 3). If, however, the daily ratings increased especially the salience of those emotions on the daily report form, the increase should be stronger for those emotions on the report form than for their counterparts that were not on the form. In other words, salient emotions should reveal a higher increase from pre- to post-diary estimates than non-salient emotions.

To test this prediction, repeated-measures ANOVAs were carried out with the within-subject factors time (pre- vs. post-diary), salience (on the form vs. not on the form) and type of emotion (9 pairs of emotions). The first analysis was based on the absolute estimates and the second analysis on the vague quantifier ratings. The ANOVAs revealed significant effects for all main effects and interactions (Table 3). However, not all of the effects are theoretically important. For example, the strong effect⁶ for the salience x emotion interaction simply shows that the frequencies of emotions were not equivalent across and within the 9 emotion-pairs.

The most important finding is that the predicted time x salience interaction was significant. Furthermore, Figure 5 shows that the interaction is due to the predicted stronger increase from pre- to post-diary frequency estimates of salient emotions.

However, the significant three-way interaction indicates that this effect differed across emotion pairs. Table 4 shows the pre- and post-diary absolute estimates for all 9 emotion pairs. Inspection of the data shows that the increase over time was replicated for all emotions, but three emotion pairs did not show the expected stronger increase for the salient emotion, namely joy-happiness, contempt-dislike, and hopelessness-helplessness.

Visual inspection of the effects suggests that frequent emotions showed a stronger increase in the frequency estimates from pre- to post-diary estimates, a hypothesis that is also suggested by the regression lines in Figure 3. To explore this hypothesis more thoroughly, I went back to the data in Table 2 and correlated the actual emotion frequencies with a change score, subtracting pre-diary from post-diary absolute estimates. The correlation proved to be highly significant, r(20) = .96, p < .01, indicating that more frequent emotions show a higher increase in the estimated absolute frequency from pre- to post-diary judgments. This finding is most likely due to the stronger underestimation of these emotions in the pre-diary estimates. Therefore, frequency judgments of more frequent emotions benefit in particular from making them salient.

Figure 6 shows the means of the pre- and post-diary vague quantifier ratings of salient and non-salient emotions. An unexpected finding was that the ratings of both salient and non-salient emotions decreased from pre- to post-diary judgments. This is exactly the opposite of what was expected. Furthermore, this effect occurred although the same participants had just made the absolute estimates which showed the expected increase. This finding is strong support for the hypothesis that vague quantifiers do not correspond in a one-to-one fashion to absolute frequencies (Pepper, 1981; Schaeffer, 1991; Wright et al., 1994). However, Figure 6 also shows that the significant time x salience interaction is due to a smaller decrease for the salient than the non-salient emotions. This finding is consistent with the predicted influence of salience: Given that vague quantifier ratings decrease over repeated assessments, they do so less for emotions which were made salient.

Table 5 shows the results for each emotion pair. All except three emotions showed the unexpected decrease from pre- to post-diary judgments. Next it was explored whether frequent emotions showed a smaller decrease than less frequent emotions, which would be equivalent to the stronger increase obtained for absolute estimates. Again, this hypothesis was tested by means of the data reported in Table 2. The actual emotion frequencies were correlated with a change score, subtracting pre-diary from post-diary vague quantifier ratings. As for the absolute estimates, a significant positive correlation was obtained, r(20) = .79, p < .01. In addition, the change scores of absolute estimates and vague quantifier ratings were significantly correlated, r(20) = .75, p < .01.

This finding suggests that absolute estimates and vague quantifier ratings also responded in the same way to the participation in the diary study. Frequent emotions show a higher increase for absolute estimates and a smaller decrease for vague quantifier ratings. In sum, the salience manipulation had the expected effect on both types of frequency judgments. However, for the vague quantifier ratings the expected salience effect was overshadowed by the unexpected and counterintuitive finding that vague quantifier ratings decreased from pre- to post-diary judgments.

A search in the psychological literature uncovered that this finding could have been predicted on the basis of earlier findings. As early as 1954, Windle conducted a meta-analysis and reported a decreasing mean in test-retest comparisons of social-adjustment measures. Although this effect has many practical implications, very little research tried to illuminate its causal mechanisms (see Knowles, Coker, Scott, Cook, & Neville, 1996). Recently, Knowles et al. (1996) suggested that the mean change is due to a meaning change of the items from test to retest. That is, participants better learn the common theme of the items in a questionnaire, which changes the meaning of single items. For example, participants might first think of all episodes of crying when they answer an item such as “I cry easily.” However, after learning that the questionnaire is about anxiety, the item is understood in this context and certain episodes of crying (e.g., crying for joy) are discounted, leading to the choice of lower response categories. Although this explanation might explain decreasing means in questionnaires which assess a single construct, it can hardly explain the findings in the present study. First, the items were not intended to measure a common construct, and it is unlikely that the participants falsely detected such a common theme. Second, meaning changes should have influenced the absolute estimates in the same way, but these estimates increased.

A different explanation could be Parducci’s range-frequency principle. Parducci (1968) demonstrated that people’s assignment of numbers between 100 and 1000 to vague quantifiers such as very small, small, large, very large was context dependent. 550 was high if most numbers fell in the range from 100 to 550, but low if most numbers fell in the range from 550 to 1000. In other words, the meaning of a response category depends on the distribution of the stimuli that have to be assigned to the response categories. Parducci demonstrated in several experiments with various types of stimuli that the assignment function is a compromise between a range and a frequency principle. The frequency principle implies that people try to accommodate an equal number of stimuli (i.e., in the present context a stimulus is the frequency of an emotion) in each category; that is, the same number of emotions should be in the “not at all”, “rarely,” or “often” category. Clearly, the frequency principle ignores the actual distribution of the stimuli; rather it forces the data into a uniform distribution. The range principle is most easily understood by its mathematical formula: Ric = (Si - Smin) / (Smax - Smin), where S is the actual scale value. As the actual scale represents frequencies, it is reasonable to assume that Smin equals zero; hence, Ric = Si / Smax. Although it is not clear which frequency corresponds to Smax, probably the number of all emotional experiences during a specified time period, it is clear that the range principle preserves the distribution of the stimuli, as long as the respondent has a sufficient number of response alternatives (Parducci & Wedell, 1986). Previous studies showed that in the final assignment of an item to a category, the two principles are weighted equally: Jic = wRic + (1-w)Fic with w = .50 (Parducci, 1968). More recently, Parducci and Wedell (1986) demonstrated that the weight of the two principles is context dependent. For example, a higher number of response categories decreased the influence of the frequency principle.
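The compromise between the two principles can be made concrete with a small sketch. The range value and the weighting follow the formulas given above with Smin = 0. The frequency value is computed here as a rank-based proportion, (rank - 1) / (N - 1), which is an assumption taken from a common formulation of range-frequency theory and is not spelled out in the text. The function name and the emotion frequencies are hypothetical.

```python
def range_frequency_judgments(frequencies, w=0.5, s_min=0.0, s_max=None):
    """Judgments J = w * R + (1 - w) * F on a 0-1 scale.

    R is the range value (S - Smin) / (Smax - Smin).
    F is a rank-based frequency value (assumed form; tied values
    all receive the lowest of their ranks).
    """
    if s_max is None:
        s_max = max(frequencies)
    n = len(frequencies)
    ordered = sorted(frequencies)
    judgments = []
    for s in frequencies:
        range_value = (s - s_min) / (s_max - s_min)
        frequency_value = ordered.index(s) / (n - 1)
        judgments.append(w * range_value + (1 - w) * frequency_value)
    return judgments

# Hypothetical, positively skewed weekly frequencies of five emotions.
frequencies = [1, 2, 3, 4, 30]

equal_weights = range_frequency_judgments(frequencies, w=0.5)
range_dominant = range_frequency_judgments(frequencies, w=0.8)

mean_equal = sum(equal_weights) / len(equal_weights)
mean_range = sum(range_dominant) / len(range_dominant)
print(f"mean judgment, w = .50: {mean_equal:.3f}")
print(f"mean judgment, w = .80: {mean_range:.3f}")
```

With a positively skewed distribution, giving less weight to the frequency principle lowers the judgments of the many small frequencies and therefore lowers the mean judgment.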

The range-frequency principle would predict the observed mean change from pre- to post-diary vague quantifier ratings under certain conditions, namely, (a) if the distribution of the emotion frequencies is positively skewed, in which case the frequency principle leads to the assignment of small frequencies to medium response categories, and (b) if people weight the frequency principle less during the post-diary judgments, in which case the small frequencies are assigned to low categories. The major problem with this explanation is that it remains unclear why participants would shift the weights of the two principles. This problem is closely related to Haubensak’s (1994) criticism of range-frequency theory: It is descriptive but not explanatory; that is, a combination of the range and the frequency principle can predict outcomes of experiments, but this does not illuminate the underlying processes of the effect. To overcome this limitation of range-frequency theory, Haubensak (1994) developed a consistency model which might explain the finding in the present study that vague quantifier ratings decrease from pre- to post-diary judgments. According to the consistency model, respondents prefer to start the rating task with medium rating categories. If the distribution of stimuli is positively skewed, this implies that small frequencies are often assigned to medium rather than small categories. Furthermore, the first ratings influence all subsequent ratings because (a) the initial assignments of frequencies to response categories remain a standard for the complete task, and (b) the participants want to be consistent with their initial standard. Therefore, the tendency to assign small frequencies to moderate categories prevails throughout the task. This model could explain the decreasing mean of vague quantifier ratings: The second time, participants have a better sense of the distribution of emotion frequencies, and they no longer try to be consistent with the standard of the first task, which might have been forgotten anyway. To test the viability of this post-hoc explanation, additional analyses were carried out.

As noted above, a basic assumption of this explanation is that the distribution of the actual frequencies of the emotions is positively skewed. As can be seen in Figure 3 this is indeed the case, which can also be shown quantitatively (skewness = 1.04). The pre- and post-diary absolute estimates show a similar skewness (pre-diary 1.00; post-diary 1.15). The prediction of the consistency model that the skewness of the vague quantifier ratings is reduced was also confirmed (pre-diary skewness = 0.24). The additional assumption made to explain the decreasing mean is that participants became more sensitive to the actual distribution of the stimuli so that the distribution of the vague quantifier ratings should be more similar to the actual distribution of the stimuli (this is equivalent to a decreasing influence of the frequency principle in range-frequency theory). This is also the case (post-diary skewness = 0.41). In sum, analyses of the distribution of the actual frequencies and the vague quantifier ratings are in agreement with Haubensak’s (1994) consistency model. Problems with the initial choice of rating categories lead to distorted assignments of vague quantifiers to frequencies. This problem persists within the same questionnaire because people want to be consistent. However, experience with the set of stimuli and the lack of a need (or ability) to be consistent from one measurement point to the other allow participants to improve their ratings. Because the finding was unexpected and the explanation is post-hoc, it was further explored in Study 2.
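For readers who want to retrace the skewness comparison, a minimal sketch follows. It uses the moment-based coefficient (third central moment divided by the second central moment raised to the power 1.5); whether the reported values were computed with this formula or with a bias-adjusted sample formula is not stated, so the choice is an assumption. The function name and data are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical mean weekly frequencies: a few frequent, many rare emotions.
actual_frequencies = [1, 1, 2, 3, 20]
# Hypothetical mean ratings that spread the same emotions evenly over the scale.
vague_ratings = [1, 2, 3, 4, 5]

print(f"actual frequencies: skewness = {skewness(actual_frequencies):.2f}")
print(f"vague ratings:      skewness = {skewness(vague_ratings):.2f}")
```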

3.2.5 Exploration of Mood- and Personality-Congruent Biases

Personality- and mood-congruent biases were investigated simultaneously because extraversion and neuroticism are often correlated with current mood (Matthews, Jones, & Chamberlain, 1990; Schimmack, in press; Steyer, Schwenkmezger, Notz, & Eid, 1994). Current mood was measured with the 10 specific mood scales and the global pleasure-displeasure and aroused-unaroused dimensions of the ELMI (Schimmack, in press, 1996a).

To reduce the number of variables, a factor analysis of the 10 specific mood scales was carried out and the factor scores of the first two unrotated factors were retained. The obtained factors were very similar for the pre- and post-diary administration of the ELMI; therefore, only the post-diary factor analysis is reported in detail. The first factor was a displeasure-pleasure factor: The specific mood scales Depression, Grouchiness, and Irritation had high positive loadings on this factor, whereas the scales Good Humor and Relaxation had high negative loadings. The second factor was an arousal factor; the scales Nervousness, Anxiety, and Euphoria had high positive loadings on this factor. This interpretation of the factors was also supported by the simple correlations between the two factors and the directly measured pleasure and arousal dimensions. The first factor correlated highly negatively with the pleasure dimension (r = -.84, p < .01), and slightly with the arousal dimension (r = -.18, p < .05). The second factor correlated mainly with the arousal dimension (r = .45, p < .01), and slightly with the pleasure dimension (r = .18, p < .05).

In the following analysis, the factor scores were used, because they are based on a greater number of items than the direct measures of pleasure and arousal. To facilitate the interpretation of results, the factor scores of the first factor were inverted so that higher values indicate more pleasure. Extraversion and neuroticism were measured by the respective scales of the NEO-PI-R. To reduce the number of analyses, the frequency estimates of pleasant and those of unpleasant emotions were averaged (analyses for each emotion separately are included in Appendix 2).

In the first set of analyses, the post-diary frequency judgments were regressed simultaneously onto the actual frequencies, the mood and the personality variables to control for the intercorrelations between the predictor variables. Table 6 shows that for all analyses the daily averages were the strongest predictor of frequency estimates, indicating that the frequency judgments primarily reflect individual differences in the actual frequencies of experienced emotions. The personality and mood variables, however, were also related to the frequency judgments.

The absolute estimates showed a consistent bias for extraversion: Extraverted individuals estimated their pleasant and unpleasant emotions to occur more frequently than introverted individuals. Because extraversion is not generally assumed to be congruent with frequent experience of unpleasant emotions, this result does not indicate a personality-congruent effect. In contrast, the vague quantifier ratings showed a personality-congruent bias: Extraversion was a significant predictor of frequency estimates of pleasant emotions even after controlling for actual frequencies of emotions, and neuroticism predicted a bias in the frequency estimates of unpleasant emotions.

A mood-congruent effect was obtained in that a pleasant mood predicted lower vague quantifier ratings of unpleasant emotions, but not higher ratings of pleasant emotions.

In a second analysis, the simple correlations between the personality and mood variables with the post-diary judgments were compared to the correlations with the pre-diary judgments. Conceivably, participation in the diary study could attenuate personality or mood biases, because the participants are more aware of their emotional experiences. If this is true, the simple correlations between personality and mood measures and frequency judgments should be higher for the pre- than for the post-diary judgments. However, Table 7 provides little support for this hypothesis. Only the absolute level of the correlation between current pleasure and vague quantifier ratings of pleasant and unpleasant emotions was higher for pre- than for post-diary estimates.

In sum, the results are mixed; only vague quantifier ratings showed a consistent personality-congruent bias (see also Feldman Barrett, in press). This finding could be a method artifact because the measurement of extraversion and neuroticism in the NEO-PI-R is partly based on items that include vague quantifiers (e.g., “I rarely feel depressed”). Therefore, individual differences in the interpretation of vague quantifiers could explain the finding that the personality measure explained additional variance on top of the actual frequencies, which are based on absolute estimates. Finally, it should be noted that the actual frequencies were by far the strongest predictor of the frequency judgments of emotions. This shows that frequency judgments are mainly based on the actual frequencies of emotions in the past, and that biases play only a minor role.

3.3 Additional Analyses

3.3.1 The Time Extension of Frequency Judgments of Emotions

People’s frequency judgments of emotions might only reflect the frequencies of emotions in the most recent past, or they may extend over longer time periods. To address this question empirically, the daily frequency estimates were averaged separately for the first, second, and third week of the diary period. Then, the post-diary estimates were regressed onto the three weekly averages in hierarchical regression analyses. In one set of analyses, the third week was entered first, followed by the second and first week, whereas in a second set of analyses, the predictors were entered in the reverse order. If information about more remote time periods; that is, the first week, is weighted less heavily in the frequency judgments, then entering the first week as the last predictor should explain less additional variance than entering the third week as the last predictor. Figure 7 shows the averaged incremental amount of explained variance for the separate analyses of the 20 emotions (see Appendix 3 for the results of each emotion).
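
The logic of comparing the two orders of entry can be sketched as follows. This is a minimal illustration under stated assumptions: the weekly averages and post-diary judgments are invented, and the helper names `r_squared` and `increments` are hypothetical. The total explained variance is the same for both orders; only its allocation to the steps differs, which is what the comparison exploits.

```python
def r_squared(y, predictors):
    """R^2 of an ordinary least squares regression of y on the predictors plus an intercept."""
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    fitted = [sum(c * x for c, x in zip(coefs, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def increments(y, ordered_predictors):
    """Additional R^2 gained at each step when predictors are entered in the given order."""
    gains, previous = [], 0.0
    for step in range(1, len(ordered_predictors) + 1):
        current = r_squared(y, ordered_predictors[:step])
        gains.append(current - previous)
        previous = current
    return gains


# Hypothetical weekly averages of one emotion for eight participants.
week1 = [2, 5, 3, 8, 6, 1, 7, 4]
week2 = [3, 4, 4, 7, 7, 2, 6, 5]
week3 = [2, 6, 5, 9, 5, 1, 8, 3]
# Hypothetical post-diary judgments of the same participants.
post = [3, 5, 5, 8, 6, 1, 7, 4]

recent_first = increments(post, [week3, week2, week1])
remote_first = increments(post, [week1, week2, week3])
print("week 3 entered first:", [round(g, 3) for g in recent_first])
print("week 1 entered first:", [round(g, 3) for g in remote_first])
```

A recency effect would show up as a larger first-step increment when week 3 is entered first and a smaller last-step increment when week 1 is entered last.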

A comparison of the increment in explained variance for the two orders in which the predictor variables were entered revealed that in the first step more variance was explained by week 3 than by week 1, t(19) = 2.24, p < .05. Week 2, entered always in the second step, explained more variance when it was entered after week 1 rather than after week 3, t(19) = 3.49, p < .01. There was no significant difference in the amount of explained variance for step 3, t(19) = 0.11, p = .91. This pattern of results indicates a slight recency effect in the vague quantifier ratings. However, week 1 and week 2 still explain 3% additional variance when they were entered after week 3. Therefore, vague quantifier ratings reflect the emotional experiences over the whole three weeks of the diary study (in individual analyses, an increment of 3% explained variance was significant).

The same analyses were performed for the absolute estimates. For these judgments, very different results were obtained (Figure 8): When week 3 was entered first, adding the second and first week hardly increased the amount of explained variance (2% and 1% respectively). In contrast, when week 1 was entered first, week 2 and 3 still explained a considerable amount of additional variance. The differences between the two orders of entry in amount of explained variance for all three steps were highly significant, all ts(19) > 5.00, ps < .01. This pattern of results reveals a strong recency effect for the absolute estimates.

The differences between vague quantifier ratings and absolute estimates can also be shown quantitatively. The third week entered in Step 1 explained more variance for the absolute estimates than for the vague quantifier ratings, t(19) = 2.14, p < .05. In contrast, week 1 entered in Step 3 explained more additional variance for the vague quantifier ratings than for the absolute estimates, t(19) = 5.84, p < .01. There were no differences for the second week entered in step 2, t(19) = 1.72, p = .10. In sum, both response formats show a recency effect; that is, frequency judgments are biased toward the frequencies in the more recent past. However, this effect is more pronounced for the absolute estimates than for the vague quantifier ratings.

The stronger recency effect for absolute estimates might be due to the use of absolute estimates to assess the actual frequencies during the diary study. Therefore, participants might have been influenced by recollections of their last daily absolute estimates when they made the absolute estimates, but not when they made the vague quantifier ratings. This could also explain why the absolute estimates were much less stable than the vague quantifier ratings from the pre- to the post-diary judgments (Table 2). Nevertheless, both response formats achieve an equally good prediction of emotional frequencies in the last three weeks (see Figure 4), but they do so differently: Whereas the absolute estimates better capture the frequencies in the most recent past, the vague quantifier ratings more accurately reflect frequencies in the remoter past.

3.3.2 Interrelations between the Frequencies of Pleasant and Unpleasant Emotions

The relation between the frequencies of pleasant and unpleasant emotions is an important topic in personality psychology. Previous researchers found the frequencies of pleasant and unpleasant affects to be independent or negatively correlated (Bradburn, 1969; Diener, Smith et al., 1995; Green et al., 1993; Watson, Clark, & Tellegen, 1988). However, these studies relied mostly on retrospective frequency estimates (for an exception see Diener, Smith et al., 1995). More importantly, the studies exclusively used vague quantifiers to assess the frequency of emotions. Because the present study already revealed several differences between vague quantifier ratings and absolute estimates, it seemed worthwhile to explore whether the response format also influenced the relation between frequency estimates of pleasant and unpleasant emotions. This is indeed the case as can be seen in Table 8: The absolute response format produced positive correlations, whereas the vague quantifier ratings produced negative correlations. These conflicting correlations indicate that one response format produces misleading results due to a method artifact.

Green et al. (1993) have argued that the experience of pleasant and unpleasant moods is highly negatively correlated, but that this negative correlation is often obscured by random and systematic measurement errors. At first sight, this argument would suggest that the positive correlation obtained for the absolute estimates is an artifact, for example due to an extremity bias. Furthermore, the low negative correlations obtained for the vague quantifier ratings could be attenuated by random measurement error. However, this interpretation of the data does not recognize the important distinction between mood and emotion. Green et al. (1993) asked their respondents how much pleasant or unpleasant mood they experienced in the last month. Assuming that a person is most of the time in a mood state (cf. Ekman & Davidson, 1994, chapter 2), that is, feels either pleasant or unpleasant, and that pleasant and unpleasant affects are rarely experienced at the same moment in time (Diener & Iran-Nejad, 1986; Green et al., 1993; Schimmack, in press; Steyer et al., 1994), it follows that the amount of pleasant mood must be negatively correlated with the amount of unpleasant mood experienced in the last month. This logical necessity, however, does not apply to the relation between the frequencies of pleasant and unpleasant emotions (see Figure 1), because emotions are not elicited and experienced all the time. Therefore, the number of times that pleasant emotions are elicited can vary independently from the number of times that unpleasant emotions are elicited. In the following, I want to argue that the empirical relation between frequencies of pleasant and unpleasant emotional experiences is positive; that is, that some individuals experience more pleasant and unpleasant emotions than others, and that the negative correlation obtained for the vague quantifier ratings is an artifact.

A major problem of vague quantifier ratings is that it is unknown how the participants use the vague quantifiers for their judgments. One possibility could be that participants use vague quantifiers to indicate ranges of absolute frequencies; that is, rarely might mean 2-5 times a week. If this were true, however, vague quantifier ratings should show a similar pattern of results to the absolute estimates. The previous findings contradict this hypothesis. Another possibility could be that the participants used vague quantifiers to describe ranges of percentages (cf. Reisenzein, 1995). For example, experiencing an emotion often might mean in 80 to 90% of all emotional experiences. This, however, would produce a method artifact in the analysis of individual differences in the frequencies of experienced emotions, because percentages eliminate such individual differences. For example, one person might have only 10 emotional experiences a week, of which 8 elicited happiness and 2 elicited sadness. Another person might have 100 emotional experiences a week, of which 80 elicited happiness and 20 sadness. Both respondents might say that they experience happiness often, meaning in 80-90% of their emotional experiences, and sadness rarely, meaning in 10-20% of their emotional experiences. But the second person clearly experienced both emotions more frequently than the first person. Furthermore, because pleasant and unpleasant emotions co-occur very infrequently during a single emotional episode (Reisenzein, 1995; Schimmack & Reisenzein, in press), percentages, in contrast to absolute frequencies, of pleasant and unpleasant emotion frequencies are bound to be negatively correlated across participants.

The hypothesis that participants use vague quantifiers to indicate percentages makes the prediction that vague quantifier ratings of, for example, pleasant emotions reflect not only the actual frequencies of pleasant emotions, but also the actual frequencies of unpleasant emotions, although in the opposite direction; that is, higher frequency judgments are obtained for lower actual frequencies of emotions of the opposite valence. This follows from the fact that percentages take all emotional experiences into account (i.e., rating of pleasure = actual pleasure / (actual pleasure + actual displeasure)). To test this hypothesis, the vague quantifier ratings of pleasant and unpleasant emotions were regressed onto the actual frequencies of pleasant and unpleasant emotions. Table 8 shows the predicted pattern that vague quantifier ratings of pleasant (unpleasant) emotions are positively related to the actual frequencies of pleasant (unpleasant) emotions, but also negatively related to the actual frequencies of unpleasant (pleasant) emotions.
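
The percentage artifact can be made concrete with a small sketch. The first two respondents are the 8/2 and 80/20 cases described earlier; the remaining counts are invented for illustration. The sketch assumes the limiting case in which ratings are pure percentages, computed as pleasant / (pleasant + unpleasant).

```python
def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


# (pleasant, unpleasant) emotion counts per week for each respondent.
# The first two rows follow the example in the text; the rest are hypothetical.
counts = [(8, 2), (80, 20), (30, 15), (12, 9), (50, 10), (20, 4)]

pleasant = [p for p, u in counts]
unpleasant = [u for p, u in counts]
# Pure percentage responding: each valence as a share of all emotional experiences.
share_pleasant = [p / (p + u) for p, u in counts]
share_unpleasant = [u / (p + u) for p, u in counts]

print("r for absolute frequencies:", round(pearson(pleasant, unpleasant), 2))
print("r for percentages:", round(pearson(share_pleasant, share_unpleasant), 2))
```

In this example the absolute frequencies are positively correlated across respondents. The two shares always sum to 1, so their correlation is forced to -1 whatever the absolute frequencies are. The first two respondents receive identical shares although one of them experiences ten times as many emotions.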

However, the correlations across types of affects are not close to -1, which is what one would expect if vague quantifiers were pure measures of percentages. It is therefore conceivable that they reflect partly absolute frequencies and partly percentages of emotional experiences. If the vague quantifier ratings also reflect individual differences in the average actual frequencies of all emotions, the sum of all vague quantifier ratings should be correlated with the sum of all absolute estimates. Indeed, the correlations are rs = .40 and .45, ps < .01, for the pre- and post-diary vague quantifier ratings. Apparently, vague quantifier ratings indicate partly absolute frequencies and partly percentages of individuals’ emotional experiences.

In sum, the major implication of the present findings is that the negative correlations between frequencies of pleasant and unpleasant emotions, obtained with vague quantifier ratings, are an artifact; that is, they conceal that the actual frequencies of pleasant and unpleasant emotions are positively correlated. Additional support for this claim stems from a study by Schimmack and Diener (in press), who also found a positive correlation between frequencies of pleasant and unpleasant emotions by means of a different method. In several studies, the participants indicated their emotional reactions to hypothetical or real life events, using an intensity scale ranging from 0 = not at all to 6 = extremely intense. Frequency of emotions was then determined as the number of non-zero ratings; that is, the number of times the emotion was experienced at all. In all studies, the frequencies of pleasant emotions were found to be positively correlated with the frequency scores of unpleasant emotions. Hence, yet another method supports a positive correlation between frequencies of pleasant and unpleasant emotions. Furthermore, Suh, Diener, and Fujita (1996) demonstrated that people with more positive life events also have more negative life events, presumably because they lead more active lives. For example, researchers who submit many papers to a journal have both more positive and negative reviews, and as a consequence more pleasant and unpleasant emotions, than researchers who submit only few papers. This can explain the positive correlation between the frequencies of pleasant and unpleasant emotions, because positive events elicit pleasant emotions and negative events elicit unpleasant emotions.
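
The frequency score derived from intensity ratings can be sketched in a few lines. The ratings below are invented, and the function name is hypothetical; the sketch only illustrates counting non-zero ratings on the 0-6 intensity scale.

```python
def frequency_from_intensities(ratings):
    """Number of events on which the emotion was experienced at all (rating above 0)."""
    return sum(1 for rating in ratings if rating > 0)


# Hypothetical intensity ratings (0 = not at all, 6 = extremely intense) for eight events.
joy_intensity = [0, 3, 5, 0, 1, 6, 0, 2]
sadness_intensity = [0, 0, 4, 0, 0, 2, 0, 0]

print("joy frequency:", frequency_from_intensities(joy_intensity))
print("sadness frequency:", frequency_from_intensities(sadness_intensity))
```
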

In sum, the present results indicate that the frequencies of pleasant and unpleasant emotions are positively correlated across individuals. This finding contradicts previous findings. However, previous studies either studied moods and not emotions (Green et al., 1993), or used exclusively vague quantifier ratings to measure the frequency of emotions. For reasons outlined above, the use of vague quantifiers is likely to produce method artifacts and lead to false conclusions about individual differences in the frequency of experienced emotions.

3.4 Discussion

The findings of Study 1 support the first three hypotheses: Frequency judgments of emotions possess discriminative accuracy across emotions as well as across participants. They underestimate the actual frequencies of emotions, and they do so increasingly with increasing frequencies of actual occurrences. Study 1 also provided evidence for hypothesis 5 that the salience of emotions during the encoding stage increases subsequent frequency judgments: After participating in a diary study, participants provided higher absolute frequency estimates and did so especially for salient emotions; that is, emotions that were included in the daily report form. Furthermore, frequent emotions showed stronger increases than infrequent emotions. This is very likely due to the stronger underestimation of these emotions when they are not salient; there is simply more room for salience to boost the estimates of frequent emotions.

The increased salience also had an effect on the accuracy of the frequency judgments. All measures of accuracy showed higher accuracy for the post-diary than for the pre-diary judgments. Although the interpretation of this finding is ambiguous, because the pre-diary judgments necessarily covered a different time period than the period when the actual frequencies were measured, the consistency of the effects suggests that participants were better able to judge the frequencies of their emotions after participating in a diary study. Influences of salience on the accuracy of frequency judgments have also been reported in experimental studies (Naveh-Benjamin & Jonides, 1986).

The findings concerning mood- or personality-congruent biases were mixed. Only the vague quantifier ratings showed a personality-congruent bias: neurotic individuals overestimated the frequencies of their unpleasant emotions, and extraverted individuals overestimated the frequency of their pleasant emotions (Diener et al., 1984; Feldman Barrett, in press).

Study 1 also revealed interesting new findings that deserve attention in future research. First, in various analyses different results were obtained for the absolute estimates and the vague quantifier ratings. Concerning absolute estimates, pre-diary estimates revealed low discriminative accuracy across participants, and the test-retest correlations between pre- and post-diary estimates were low, despite a relatively high stability of emotion frequencies. It was also found that the absolute estimates reflect mostly frequencies in the most recent past (last week), and that they were unaffected by mood- or personality-congruent biases. On the other hand, vague quantifier ratings showed high temporal stability, reflected frequencies of emotions over the last three weeks, and appeared to be slightly biased in a personality- and mood-congruent direction. First of all, this finding indicates that the choice of the response format matters (Schaeffer, 1991), a factor that has been neglected in the frequency judgment literature. The present study allows only speculation about the causal mechanisms that produced these differences. Brown (1995) made the important point that most frequency judgment models do not explain how frequency information (e.g., a feeling of familiarity) is converted into an absolute estimate. The present study shows that the conversion into a vague quantifier rating also requires explanation. The finding that the two response formats produced divergent results, even though both formats were administered at the same time to the same participants, suggests that the response formats influence predominantly the conversion of frequency information into a response.

One very interesting difference between the two response formats was that absolute estimates increased considerably from pre- to post-diary judgments. In contrast, vague quantifier ratings, which were made right before the absolute estimates, decreased from pre- to post-diary judgments. This finding demonstrates clearly that vague quantifier ratings do not correspond to absolute frequencies. The unexpected decrease of vague quantifier ratings in a test-retest design has been observed in other studies (Knowles et al., 1996). Post-hoc analyses were in agreement with Haubensak’s consistency model. Vague quantifier ratings, but not absolute estimates, distorted the positively skewed distribution of the emotion frequencies. However, the post-diary vague quantifier ratings reflected the actual distribution better than the pre-diary estimates. This finding suggests that the participants had problems converting their frequency impressions into ratings along a limited number of vague quantifier categories.

Finally, the two response formats produced different correlations between frequencies of pleasant and unpleasant emotions: For absolute estimates the correlation was positive, whereas it was negative for the vague quantifier ratings. Regression analyses suggested that the negative correlations obtained with vague quantifier ratings were due to a method artifact: Participants used vague quantifiers partly to indicate percentages of the overall number of their emotional experiences (Reisenzein, 1995), which (a) eliminates individual differences in absolute frequencies of emotions and (b) pushes the correlation in a negative direction. Therefore, the absolute estimates are better suited to explore individual differences in the frequency of emotional experiences. According to the present study, people who experience pleasant emotions frequently are also likely to experience more unpleasant emotions. This finding challenges current structural models of personality which assume that the frequencies of pleasant and unpleasant emotions are independent (cf. Costa & McCrae, 1992; Watson et al., 1988).

4 STUDY 2

Study 2 was similar to Study 1 in that the participants again made daily frequency judgments and estimated the frequency of emotions before and after the diary study. Minor differences between Study 1 and 2 are that in Study 2 (a) the diary period extended only over two weeks, (b) the actual frequencies were based on twice-daily absolute estimates, and (c) the participants made only vague quantifier ratings, but not absolute estimates, before and after the diary period. The most important difference was that the participants in Study 2 estimated the frequencies of emotions separately for the first and second week of the diary period. In analogy to Hintzman and Block’s (1971) seminal experiment on frequency judgments for separate word lists, this made it possible to test whether information about the frequency of emotions is stored directly or indirectly in memory. If frequencies of emotions are stored directly in memory, participants should be unable to provide accurate frequency estimates separately for the first and second week. However, if frequency estimates are based on the activation of multiple memory traces, and if memory traces of particular time periods can be selectively activated, then participants should be able to judge accurately the frequencies of their emotions in the first and the second week.

Study 2 also provided an opportunity to replicate several of the findings in Study 1, namely to test (a) the discriminative accuracy of frequency judgments of emotions across emotions and participants, (b) the presence of personality- and mood-congruent biases, and (c) the unexpected decrease of vague quantifier ratings over repeated measurements.

4.1 Method

4.1.1 Participants

The participants were undergraduate students at the Free University Berlin who took part in a course on emotions in everyday life. A total of 80 participants (24 men and 56 women) with a mean age of 25 completed all data collections.

4.1.2 Material and Procedure

4.1.2.1 Daily Estimates

At the core of Study 2 was the two-week diary period. Participants rated the frequency of 34 emotions twice a day, which probably provides a better estimate of the actual frequencies of emotions than the end-of-day judgments in Study 1. The response format for the daily estimates differed from Study 1. Whereas in Study 1 a free response format was used, the participants in Study 2 used a seven-point scale, ranging from 0 to 6, for their absolute estimates. All categories of this scale represented absolute frequencies (i.e., 0 = never, 1 = once, 2 = twice, etc.) except the last category, which comprised all absolute frequencies greater than five. For most emotions, however, this category was used infrequently, so that the sum of the twice-daily estimates approximates the actual frequencies of emotional experiences during the diary period. Furthermore, the absolute level of the frequencies is less relevant in Study 2, because participants did not make absolute estimates before and after the diary period. Therefore, absolute and relative accuracy were not tested in Study 2.
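
How the twice-daily reports add up to a weekly frequency can be sketched as follows. The reports are invented, and the function name is hypothetical. Because the top category stands for "more than five", counting it as 6 makes the sum a lower bound whenever that category is used.

```python
TOP_CATEGORY = 6  # stands for "more than five occurrences" on the daily form


def weekly_total(reports):
    """Sum of the twice-daily reports and the number of reports in the open-ended top category."""
    total = sum(reports)
    capped = sum(1 for report in reports if report == TOP_CATEGORY)
    return total, capped


# Hypothetical reports for one emotion: 7 days x 2 reports per day.
joy_reports = [2, 1, 3, 0, 6, 2, 1, 1, 4, 0, 2, 6, 1, 3]

total, capped = weekly_total(joy_reports)
print("weekly total (lower bound if capped > 0):", total)
print("reports in the top category:", capped)
```
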

In Study 1, participants had been asked to return questionnaires on a daily basis to ensure that the ratings were completed daily. This procedure was not feasible in Study 2 because students in Berlin do not live “on campus” and many students do not visit the university each day. Therefore, the report forms for the daily ratings were given to the participants in the form of two booklets; one for each week, so that at least the weekly completion of the report forms could be controlled. Afterwards, the twice-daily ratings were averaged across the repeated assessments to obtain a measure of the actual frequencies with which emotions were experienced.

4.1.2.2 Vague Quantifier Ratings

The vague quantifier ratings were always made for the time period of one week (How often did you experience joy in the last week?). The questionnaire included the 34 emotions that were on the daily report form, although in a different order, to discourage participants from using a stereotyped response pattern when they made the post-diary judgments. Judgments were made on the following scale: 0 = never, 1 = very rarely, 2 = rarely, 3 = sometimes, 4 = often, 5 = very often, and 6 = nearly always. Even though the daily report form and the vague quantifier ratings used a rating scale from 0 to 6, participants could not use the modal response on the daily form to make accurate vague quantifier ratings, because the same numeric category has different meanings in the two questionnaires. For example, a participant who always checks the response category “1” on the daily form for hate would report experiencing hate 14 times a week; this means experiencing hate quite often and not “very rarely,” as a vague quantifier rating of “1” would imply.

The vague quantifier ratings were made four times: twice prior to the diary study, with a two-week interval between the two assessments, and twice immediately after the diary study to make separate judgments of the first and the second week of the diary period.

4.1.2.3 Mood Questionnaire

Current mood was assessed with the BASTI (Schimmack, in press). The BASTI is the German counterpart of the ELMI used in Study 1 (see 3.1.2.4). The BASTI was completed after each administration of the vague quantifier ratings.

4.1.2.4 Personality Questionnaires

Extraversion and neuroticism were assessed two weeks after the diary study with the NEO-FFI (Borkenau & Ostendorf, 1991). In this questionnaire, neuroticism and extraversion are assessed with 12 items each. The NEO-FFI is a short version of the NEO-PI-R (Costa & McCrae, 1992) used in Study 1. The NEO-FFI is somewhat less reliable than the NEO-PI-R, but the reliability is still good (Borkenau & Ostendorf, 1991).

4.2 Results

Because the participants in Study 2 did not estimate absolute frequencies before or after the diary study, it was not possible to test the absolute or relative accuracy of frequency judgments in this study.

4.2.1 Discriminative Accuracy across Emotions

As in Study 1, the discriminative accuracy across emotions was assessed at the group and at the individual level. For the analysis at the group level, the actual frequencies and the vague quantifier ratings of the 34 emotions were averaged across the 80 participants. Then, the correlations of the actual frequencies with the two pre-diary and the averaged post-diary estimates were computed across the 34 emotions. All correlations were very high and statistically significant (pre-diary 1 r = .88, pre-diary 2 r = .92, averaged post-diary r = .96); a trend toward higher accuracy after repeated assessments is also apparent.

To test discriminative accuracy at the level of each participant, the same correlations were computed for each participant. Subsequently, the correlation coefficients were used as dependent variables in an ANOVA with the within-subject factor time of judgment. This analysis revealed a significant main effect, F(2,158) = 151.88, p < .01. Follow-up analyses revealed that all three mean correlations differed significantly from each other (Figure 9).
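
The difference between the group-level and the individual-level analysis can be sketched as follows. The data (three participants, five emotions) are invented, and the names are hypothetical. At the group level, frequencies and ratings are first averaged across participants and then correlated across emotions; at the individual level, the correlation across emotions is computed separately for each participant.

```python
def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical data: one list of five emotions per participant.
actual = {"p1": [10, 4, 2, 7, 1], "p2": [6, 5, 1, 9, 2], "p3": [12, 3, 3, 5, 0]}
ratings = {"p1": [5, 3, 2, 4, 1], "p2": [4, 4, 1, 5, 2], "p3": [5, 2, 3, 3, 1]}


def emotion_means(data):
    """Mean of each emotion across participants."""
    columns = zip(*data.values())
    return [sum(column) / len(data) for column in columns]


# Group level: correlate the emotion means across emotions.
group_r = pearson(emotion_means(actual), emotion_means(ratings))

# Individual level: one correlation across emotions per participant.
individual_r = {pid: pearson(actual[pid], ratings[pid]) for pid in actual}
mean_individual_r = sum(individual_r.values()) / len(individual_r)

print("group-level r:", round(group_r, 2))
print("mean individual-level r:", round(mean_individual_r, 2))
```

The per-participant coefficients are the values that would enter a repeated-measures analysis such as the ANOVA reported above.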

The increased accuracy from the first to the second pre-diary ratings is especially important, because both ratings do not cover the period during which the actual frequencies of emotions were assessed. Therefore, the effect can be more easily attributed to an increase in people’s ability to discriminate the frequencies of emotions. Furthermore, the increased accuracy from pre- to post-diary ratings replicates the finding of Study 1; and it probably reflects an influence of the participation in the diary study on the discriminative accuracy across emotions. In sum, the results largely replicate the finding of Study 1 that frequency judgments of emotions possess discriminative accuracy across emotions which increased over repeated assessments.

4.2.2 Discriminative Accuracy across Participants

To estimate the discriminative accuracy across participants, the two post-diary estimates were averaged (separate analyses for each of the two diary weeks are reported later). As in Study 1, the actual frequencies were correlated with the pre- and the post-diary ratings. Figure 10 shows that the frequency judgments of emotions in general possessed discriminative accuracy across participants (the separate correlations for each of the 34 emotions can be found in Appendix 5).

However, follow-up tests indicated that all three mean correlations differed significantly from each other. Study 2 also replicated the finding that pre-diary estimates are a better predictor of actual frequencies of emotions with higher temporal stability. However, this finding was only supported for the second pre-diary judgments (r = .49, p < .01), but not for the first pre-diary judgments (r = .10, p = .59). This pattern of results indicates that pre-diary judgments attenuate the discriminative accuracy of frequency judgments because they cover a different time period than the one during which the actual frequencies of emotions were assessed. Nevertheless, the high accuracy of the post-diary judgments is probably due to the participation in the diary study and overestimates the discriminative accuracy across participants under natural conditions.

Next, the discriminative accuracy across participants in Study 2 was quantitatively compared to the one in Study 1 for the 18 emotions that were included in both studies (all emotions of Study 1 except hurt and worry). These analyses revealed a very similar degree of accuracy in both studies (Study 1: pre-diary r = .39; post-diary r = .58; Study 2: second pre-diary r = .36; post-diary r = .58), which did not differ significantly from each other, ts < 1.50, ps > .20. Furthermore, it was explored whether emotions that revealed high discriminative accuracy across participants in one study also revealed high discriminative accuracy across participants in the other study; this was, however, not the case (pre-diary r = .09, post-diary r = .22, both ps > .10). Therefore, at present it is not possible to recommend emotions that guarantee a high degree of discriminative accuracy across participants.

As in Study 1, the specificity of the frequency judgments was investigated in that the correlation between the actual frequencies of an emotion with the frequency judgment of this emotion was compared to the correlations with the frequency judgments of all other 33 emotions. Specificity was established if the correlation with the same emotion was higher than any of the other 33 correlations. For the post-diary estimates this was the case for all emotions except discontentment (see Appendix 6). However, for the two pre-diary estimates, only about half of the emotions revealed specificity (1st pre-diary estimates N = 16, 2nd pre-diary estimates N = 15 out of 34). Although this number is still significantly different from chance (expected value N = 1, both χ2s > 200, ps < .01), it indicates lower specificity for the pre-diary estimates.
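The specificity criterion can be sketched as follows. The data are invented (three emotions, five participants), and the function name `specific_emotions` is hypothetical. An emotion counts as specific only if the correlation of its actual frequencies with its own frequency judgment exceeds the correlations with the judgments of every other emotion.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def specific_emotions(actual, judged):
    """Return the emotions whose same-emotion correlation is the highest.

    actual, judged: dicts mapping emotion -> list of per-participant values.
    """
    result = []
    for target, freq in actual.items():
        same = pearson(freq, judged[target])
        others = [pearson(freq, judged[e]) for e in judged if e != target]
        if all(same > r for r in others):
            result.append(target)
    return result

# Invented toy data: one value per participant.
actual = {
    "joy":   [1, 2, 3, 4, 5],
    "anger": [5, 3, 4, 1, 2],
    "fear":  [2, 5, 1, 3, 4],
}
judged = {
    "joy":   [1, 2, 3, 5, 4],
    "anger": [4, 3, 5, 1, 2],
    "fear":  [3, 1, 2, 5, 4],
}

print(specific_emotions(actual, judged))
```

In this toy example the judgments of fear track the actual frequencies of joy better than those of fear, so fear fails the criterion while joy and anger pass.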

A similar effect had not been observed in Study 1. A possible explanation for this difference could be that Study 1 included more similar emotions. Another explanation would be the longer delay between pre-diary ratings and the assessment of the actual frequencies, which renders it more likely that the actual frequencies changed between the time periods covered by the pre-diary judgments and the time period during which the actual frequencies were assessed.

In sum, Study 2 closely replicated the findings of Study 1 that frequency judgments of emotions possess moderate discriminative accuracy across participants. The size of the correlations between pre-diary and post-diary estimates with actual frequencies was very similar in both studies in the range from r = .30 to r = .60. Furthermore, frequency judgments of emotions in Study 2 were quite often specific for each emotion, indicating that the respondents used different information for frequency judgments of each emotion. As in Study 1, this finding demonstrates that discriminative accuracy across participants is not due to response sets.

4.2.3 Exploration of Mood- and Personality-Congruent Biases

The analysis follows closely the procedure in Study 1. First, the 10 specific mood dimensions of the BASTI were submitted to a factor analysis and the factor scores of the first two unrotated factors were retained for further analyses. Replicating Study 1, the factor scores of the first factor were highly negatively correlated with the directly measured pleasure-displeasure dimension (r = -.81, p < .01), and the factor scores of the second factor were moderately correlated with the directly measured arousal dimension (r = .52, p < .01). To facilitate the interpretation, the factor scores of the first factor were inverted so that positive values indicate pleasure. The frequency estimates of pleasant and of unpleasant emotions were averaged to reduce the number of analyses. Emotion words denoting “mixed feelings” (e.g., sympathy) were dropped at this step. Next, multiple regression analyses were carried out, in which the post-diary vague quantifier ratings were regressed onto the actual frequencies of emotions, neuroticism, extraversion, current pleasure, and current arousal. Table 10 shows that the actual frequencies of emotions were the best predictor. The only additional significant effect was that current arousal predicted frequency estimates of pleasant emotions. However, current arousal did not predict vague quantifier ratings of pleasant emotions in Study 1; although it predicted absolute estimates. Similarly, Study 2 did not replicate the personality-congruency effects obtained for vague quantifier ratings in Study 1.

Finally, the simple correlations between the pre- and post-diary vague quantifier ratings and the five predictors of the previous analyses were compared to each other. If participation in a diary study attenuates biases, the simple correlations between personality and mood measures and pre-diary frequency judgments should be higher than those for the averaged post-diary estimates. Table 11 shows that only extraversion was more highly correlated with both pre-diary judgments than with the post-diary judgments. However, extraversion revealed no higher correlations with pre-diary estimates in Study 1. Hence, the present studies do not support the hypothesis that participation in the diary study reduces the influence of personality- or mood-congruent biases on frequency judgments of emotions.

In sum, only actual frequencies were consistently related to frequency judgments of emotions in studies 1 and 2. Personality and mood effects were not consistent across studies or response formats in Study 1. Although the present results do not indicate that personality- or mood-congruent effects do not exist, especially given the fact that other studies found at least personality-congruent biases (Diener et al., 1984; Feldman Barrett, in press), the studies clearly show that biases are small relative to the amount of accuracy in frequency judgments. This conclusion is in agreement with the previous studies, in which the bias was also small compared to the accuracy in the retrospective judgments (Diener et al., 1984; Larsen, 1992; Feldman Barrett, in press).

4.2.4 Accuracy of Separate Estimates of the First and Second Diary Week

The next analyses go beyond a simple test of the accuracy of frequency judgments of emotions, and start to investigate the cognitive processes underlying frequency judgments of emotions. The participants in Study 2 were asked to provide separate frequency estimates for the first and second week of the diary period. If frequency information is stored directly in memory and frequency judgments are simply based upon the retrieval of this prestored information, participants should not be able to make accurate frequency judgments for the two separate weeks. In contrast, if frequency information is stored indirectly in memory (e.g., in the form of multiple episodes) and frequencies are computed only at the time of the judgments, the judgments should accurately reflect differences between the frequencies of emotions in the two weeks, because contextual cues can be used to activate memory traces of specific time periods (Hintzman & Block, 1971).

First, the daily estimates were averaged separately for the first and second week. Then, a set of hierarchical regression analyses was computed to determine three components of the shared variance between the actual frequencies in the two weeks and the vague quantifier ratings of one week: (a) the variance that is uniquely explained by the actual frequencies of emotions in the first week, (b) the variance that is uniquely explained by the actual frequencies of emotions in the second week, and (c) the variance that is shared by the two predictor variables. Figure 11 illustrates how the shared variance was decomposed into these three components. The amount of shared variance is simply the total amount of explained variance minus the two unique variances (R²(total) = R²(unique, week 1) + R²(unique, week 2) + R²(shared)). This decomposition of the overall amount of explained variance is possible because all variables were positively intercorrelated so that suppression effects can be ruled out.

If participants are able to make accurate frequency estimates separately for the two weeks, entering the judged week in the second step should produce a higher increase in explained variance than entering the week that was not the target of the judgment (i.e., for week 1, R²(unique, week 1) > R²(unique, week 2); for week 2, R²(unique, week 2) > R²(unique, week 1)). Furthermore, if participants discriminate frequencies of emotions in the two weeks perfectly, adding the non-target week in the second step should not increase the amount of explained variance (i.e., for week 1, R²(unique, week 2) = 0; for week 2, R²(unique, week 1) = 0). Finally, if participants are better able to estimate the frequencies of emotions in the more recent second week, the unique variance explained by the target week should be higher for estimates of week 2 than for estimates of week 1 (i.e., [for week 2, R²(unique, week 2)] > [for week 1, R²(unique, week 1)]).
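As a rough illustration of this decomposition, the three components can be computed from the three pairwise correlations alone when there are only two predictors. The sketch below uses the standard formula for R² in a two-predictor regression; the function name `decompose` and the two validity correlations (.60 and .45) are invented for illustration, whereas .71 is the mean correlation between the actual frequencies of the two weeks reported in this study.

```python
def decompose(r_y1, r_y2, r_12):
    """Split R^2 of a two-predictor regression into unique and shared parts.

    r_y1: correlation of the rating with actual frequencies in week 1
    r_y2: correlation of the rating with actual frequencies in week 2
    r_12: correlation between the actual frequencies of the two weeks
    """
    # Total explained variance with both weeks entered.
    r2_total = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    # Unique variance = increase in R^2 when a week is entered in the second step.
    unique_week1 = r2_total - r_y2 ** 2
    unique_week2 = r2_total - r_y1 ** 2
    # Shared variance = total minus the two unique components.
    shared = r2_total - unique_week1 - unique_week2
    return {
        "total": r2_total,
        "unique_week1": unique_week1,
        "unique_week2": unique_week2,
        "shared": shared,
    }

# Hypothetical rating of week 1: more strongly related to week 1 than to week 2.
parts = decompose(r_y1=0.60, r_y2=0.45, r_12=0.71)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

With these invented values, the target week explains far more unique variance than the non-target week, which is the pattern predicted if participants can discriminate between the two weeks.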

Figure 12 shows the amount of the three variance components averaged across the analyses of each of the 34 emotions (see Appendix 7 for the results of the individual analyses). Most important are the findings that (a) the actual frequencies in week 1 uniquely explained more variance in vague quantifier ratings of week 1 than the actual frequencies in week 2, t(33) = 5.24, p < .01, and (b) the actual frequencies in week 2 uniquely explained more variance in vague quantifier ratings of week 2 than the actual frequencies in week 1, t(33) = 3.11, p < .01. This confirms the prediction of the familiarity model that people are sensitive to frequencies of emotions in different contexts.

In addition, the amount of uniquely explained variance by the target week (e.g., actual frequencies in week 1 for ratings of week 1) did not differ between ratings of week 1 and ratings of week 2, t(33) = 1.15, p = .25. There were also no significant differences between the ratings of the two weeks in the variance uniquely explained by the actual frequencies in the non-target week (e.g., actual frequencies in week 1 for ratings of week 2), t(33) = 0.66, p = .52, or the amount of shared variance, t(33) = 0.60, p = .56. This pattern of results shows that the participants were equally able to make accurate frequency estimates for the first and the second week of the diary study, and that the accuracy of the frequency judgments for the remoter first week was as good as for the more recent second week. Finally, the amount of variance uniquely explained by the non-target week was significantly different from zero for ratings of both weeks, both Fs(1,33) > 20.00, both ps < .01. This finding indicates that participants are not perfect in discriminating the frequencies of emotions experienced in the two weeks: Ratings of one week were also influenced by actual frequencies in the other week. This influence is, however, relatively small (see Figure 12).

In sum, the findings show a high ability of the participants to detect changes in the frequencies of their emotional experiences from week 1 to week 2. This finding is particularly noteworthy because it replicates an experimental finding (Hintzman & Block, 1971) under natural conditions over a much longer interval between the encoding of stimuli (i.e. the experience of emotions) and the moment when the frequency judgments were made. Furthermore, it demonstrates this effect for the first time with regard to discriminative accuracy across participants. Finally, the present design provides for a strong test of the hypothesis of context sensitivity because the actual frequencies in the two weeks were highly correlated (mean r = .71). Therefore, the changes in the emotion frequencies from one week to the other were relatively small. Nevertheless, the participants detected these changes, a finding that contradicts direct encoding models of frequency information. It also speaks against the hypothesis that frequency judgments of emotions are based on generalized beliefs or are pre-stored in memory.

4.3 Additional Analyses

4.3.1 Repeated Assessment of Vague Quantifier Ratings

In Study 1 the mean of the vague quantifier ratings decreased from pre- to post-diary judgments, whereas the absolute estimates increased. In Study 2 the participants made vague quantifier ratings two times prior to the diary study. Therefore, it could be explored whether the decrease also occurs when participants do not participate in a diary study between the two ratings. Vague quantifier ratings at each measurement point were first averaged across participants. Then, a repeated-measures ANOVA was computed across emotions with the within-subject factor time. A strong effect was obtained, F(3,33) = 62.50, p < .01, partial η2 = .65.

Follow up analyses indicated that the mean decreased from the first to the second assessment and again to the post-diary assessment. The means of the two post-diary ratings were practically identical (Figure 13). Next, the prediction of Haubensak’s consistency model was tested that decreasing means should be paralleled by a better approximation of the distribution of the actual frequencies of emotions, which should be positively skewed. The data support this prediction: The skewness of the actual frequencies was 1.50. The skewness of the first vague quantifier ratings was 0.45. It increased to 0.57 for the second pre-diary judgments, and once more to 0.91 and 0.93 for the two post-diary ratings.
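The skewness comparison can be sketched with a few lines of code. The sketch assumes the simple moment-based coefficient (third central moment divided by the 1.5th power of the second central moment); the source does not state which skewness formula was used, and statistical packages often apply a small-sample correction. All values are invented.

```python
from statistics import mean

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no small-sample correction)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Invented mean values per emotion: most emotions are rare, a few are frequent.
actual_frequencies = [0, 0, 1, 1, 2, 3, 8]
# Invented mean ratings of the same emotions: less positively skewed.
vague_ratings = [1, 1, 2, 2, 3, 3, 4]

print(round(skewness(actual_frequencies), 2), round(skewness(vague_ratings), 2))
```

According to the consistency model, repeated ratings should move the skewness of the ratings closer to the skewness of the actual frequencies.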

In sum, the decreasing mean of vague quantifier ratings in Study 1 has been replicated. The finding has been extended by showing an additional decrease from a second to a third assessment. Furthermore, additional analyses suggest that the consistency model (Haubensak, 1994) is a promising candidate for a theory that can explain this effect. Future research should try to test the consistency model more directly. A better understanding of the effect seems to be highly desirable because the decreasing mean of vague quantifier ratings has important practical implications (Knowles et al., 1996). Most importantly, it makes the interpretation of changes in pre-post designs (e.g. in therapy evaluation studies) very difficult. A better understanding of the effect might help to develop measurement instruments that minimize this effect.

4.3.2 Interrelations between the Frequencies of Pleasant and Unpleasant Emotions

In Study 1 it was found that the relation between frequency estimates of pleasant and unpleasant emotions depended on the response format: The absolute format produced positive correlations whereas the vague quantifier ratings produced negative correlations. In Study 1 the absolute estimates were made using a free response format. This response format might be especially susceptible to an extremity response style. In contrast, the participants in Study 2 made (daily) absolute estimates on a predefined response scale which limited the range of responses. Therefore, the positive correlation should disappear, if it were simply due to the free response format. However, despite the modified procedure used in Study 2, it replicated the finding of a high positive correlation between absolute frequency estimates of pleasant and unpleasant emotions (Table 12). In contrast, the significant negative correlations obtained for the vague quantifier ratings in Study 1 were not replicated: The two pre-diary estimates revealed non-significant correlations close to zero, whereas the post-diary estimates produced low, but significant positive correlations.

As in Study 1, regression analyses were carried out to test the prediction that vague quantifier ratings of pleasant (unpleasant) emotions are influenced by the actual frequencies of unpleasant (pleasant) emotions. These analyses in general replicated the results of Study 1 (Table 13). Besides high correlations with the actual frequencies of emotions of the same valence, the vague quantifier ratings also show negative correlations to the actual frequencies of emotions of the opposite valence.

As in Study 1, actual and estimated frequencies were averaged across all emotions to test whether people who experience on average more emotions also used higher vague quantifiers. Again, positive correlations were obtained: the correlations were r = .36 for the first pre-diary judgments, r = .47 for the second pre-diary estimates, and r = .64 for the averaged post-diary estimates. Especially for the post-diary estimates the correlation is higher than in Study 1 (r = .45). This finding indicates that vague quantifier ratings partly reflect percentages and partly absolute frequencies of emotions. It seems that the participants in Study 2 used the vague quantifiers more to express absolute frequencies.

In sum, the results of Study 2 replicated a positive correlation between frequencies of pleasant and unpleasant emotions, and the finding that vague quantifier ratings mask this positive correlation because participants partly use vague quantifiers to indicate percentages and only partly to indicate absolute frequencies of emotional experiences. As noted before, this finding undermines the empirical support of prevailing theories of individual differences in emotional experiences (Costa & McCrae, 1992; Watson et al., 1988).

4.4 Discussion

Study 2 replicated several findings of Study 1: The discriminative accuracy across emotions was high, the discriminative accuracy across participants was in the same moderate range (r = .30 to .60), and the mean of vague quantifier ratings decreased over repeated assessments. On the other hand, the few personality- and mood-congruent biases obtained in Study 1 were not replicated, suggesting that these biases are not an important factor in frequency judgments of emotions. Study 2 also replicated the finding of a positive correlation between the frequencies of pleasant and unpleasant emotions, which, this time, was even supported by the post-diary vague quantifier ratings. Most importantly, Study 2 provided a first test between direct and indirect frequency judgment models.

Study 2 showed that participants were able to estimate accurately the frequency of emotions in two separate weeks, even after controlling for the highly correlated frequencies in the other week. This finding is incompatible with direct encoding models of frequency information, which assume that frequency information is constantly updated (cf. Hintzman & Block, 1971). According to these models, it should be impossible to estimate frequencies in a remoter time period independently from the frequencies in the recent past. Contrary to this prediction, participants were sensitive to differences in the emotion frequencies in the two weeks. This result suggests that the information about the frequency of emotional experiences is indirectly stored in memory, probably in the form of multiple memory traces of emotional episodes. Study 2, however, does not allow one to differentiate between the different indirect encoding models (see Figure 2). To this end, the next two studies were carried out.

5 STUDY 3

The aim of studies 3 and 4 was to study the cognitive processes underlying frequency judgments of emotions under more controlled conditions. To do so, real emotional experiences investigated in the previous studies were replaced by emotional reactions that participants would experience in hypothetical scenarios. That is, participants indicated their likely emotional reactions to a number of hypothetical scenarios. Subsequently, the number of times participants indicated that they would feel joy, anger, or gratitude was used as the measure of the actual frequencies of emotions. The advantage of this new way to determine the actual frequency of emotions is that actual frequencies can be objectively determined. This avoids the problem of the previous studies that the measure of the actual frequencies was based on frequency judgments. The disadvantage of this paradigm is clearly that the frequency judgments concern only hypothetical situations and not real life emotional events. Nevertheless, the approach is similar to the strategy of experimental psychologists to study frequency judgments of words denoting natural objects and to assume that the results generalize to frequency judgments of objects or events in real life (cf. Hasher & Zacks, 1984).

The use of hypothetical scenarios as stimulus material allows one to manipulate to a certain degree the frequencies of “emotions,” because people’s emotional reactions to some kinds of situations (e.g., death of a loved one) are fairly universal (cf. Mesquita & Frijda, 1992). However, there are also individual differences in emotional reactions to the same situations, because people appraise situations differently (cf. Lazarus, 1991; Reisenzein & Hofmann, 1993). Therefore, the manipulation of emotion frequencies in the present studies is less reliable than the manipulation of the frequency of natural objects in previous experimental studies (e.g., Greene, 1989). For example, not everybody feels angry if he or she has to wait in line, but everybody agrees that a banana is a fruit. This “weakness” of the present approach can also be a strength, when researchers want to study individual differences in the processing of emotional information under controlled conditions (cf. Schimmack & Hartmann, in press). In the present context, however, individual differences constitute error variance, in accordance with experimental studies of frequency judgments in general (cf. Naveh-Benjamin & Jonides, 1986).

The standard paradigm used in the following two studies consists of an initial scenario rating task (SRT) and a subsequent frequency judgment task (FJT). In the SRT, participants indicated for several scenarios which emotions they would experience if they were in the described situation. These ratings allow the researcher to determine the actual frequencies of emotions, that is, the number of times that a respondent indicated that he or she would have experienced an emotion in a scenario. In the FJT, participants have to estimate how often they would have experienced various emotions in the situations of the SRT. These judgments can then be compared to the actual frequencies as defined above. As the following experiments were carried out on a personal computer, it was also possible to measure the judgment times of the frequency judgments.

In addition to the SRT and the FJT, other tasks can be added to explore the cognitive processes underlying frequency judgments of emotions. In the following two studies, a latency-of-retrieval task (LRT) was added (Fitzgerald et al., 1988; MacLeod et al., 1994). In this task, participants were asked to recall, as fast as possible, one of the scenarios in which an emotion would have been experienced. This allows one to test the ease-of-retrieval model: If the ease-of-retrieval hypothesis is correct, higher frequency judgments should be related to shorter retrieval times (see hypothesis 6). Furthermore, the judgment times of the frequency judgments should be correlated with the retrieval latencies as well as with the size of the frequency judgments (hypothesis 7), and frequency judgments should take longer than the retrieval of scenarios from memory (hypothesis 8). In contrast, the familiarity model does not predict these effects.

5.1 Method

5.1.1 Participants

Forty-eight undergraduate students at the Free University Berlin participated in the study for course credit.

5.1.2 Material

5.1.2.1 Scenarios

Reisenzein and Hofmann (1993) asked 20 students at the Free University Berlin to report for each of 23 emotions one personal experience of this emotion. Subsequently, they asked a different sample of 51 participants to indicate which of the 23 emotions was most likely felt by the protagonist of each scenario. The authors found that for most scenarios, the target emotion (that is, the emotion that triggered the reported scenario) was recognized by the majority of the participants. For the present study, 12 of the 23 target emotions investigated by Reisenzein and Hofmann were selected: anger, anxiety, contempt, disappointment, disgust, embarrassment, gratitude, jealousy, joy, love, pride, and sadness. For each target emotion the ten scenarios with the highest recognition rate of the target emotion were selected, yielding a total of 120 scenarios. Each scenario was about 10 to 60 words long. The following example is a description of an anger experience:

A while ago, I bought some apples at the supermarket, because they were so cheap. At home, I found out that they were already rotten inside. I thought: “And this supermarket always advertises with its fresh fruits.”

For the present study, the selected 120 episodes were split into four sets of 30 episodes and one of these four sets was presented to each participant. To manipulate the frequency of emotions between the four scenario sets, unequal numbers of scenarios of one target emotion were assigned to each set (see Table 14). It has to be noted, however, that this procedure allows only a rough manipulation of emotion frequencies between sets of scenarios because each scenario tends to elicit several emotions besides the target emotion (cf. Reisenzein, 1995). For example, jealousy scenarios often also elicit anger, disappointment, and sadness. Therefore, another aim of Study 3 was to determine the pattern of emotions that is elicited in each scenario. This would provide for a better manipulation of emotion frequencies in Study 4 and other future studies.

5.1.2.2 Emotions

Although the scenarios elicited most strongly the 12 target emotions, it is likely that the scenarios also elicit various other emotions. To obtain ratings for a comprehensive list of emotions, 32 emotions were selected for the rating task. Besides the 12 target emotions, the 11 additional emotions studied by Reisenzein and Hofmann (1993) were also included, namely compassion, discontentment, envy, guilt, hope, helplessness, loneliness, regret, relief, shame, and surprise. Furthermore, contentment, depression, euphoria, hate, and hopelessness were included because they had been studied by Reisenzein (1995, Study 3; see also Schimmack & Reisenzein, in press) in related research. Rage was included upon request from participants in a small pilot study. Finally, the global descriptions “a pleasant feeling” and “an unpleasant feeling” were added to investigate frequency judgments of broad categories compared to those of specific emotions. For the scenario rating task, the 12 target emotions were split into two sets of 6 emotions each. Similarly, the 20 remaining emotions, labeled non-target emotions, were split into two sets of 10 emotions each (see Appendix 8 for the assignment of emotions to sets of target and non-target emotions). Each participant received only one set of target emotions and one set of non-target emotions in the SRT. In sum, the SRT was divided between participants according to a three-factorial design, with four sets of episodes, two sets of target emotions, and two sets of non-target emotions (Figure 14). However, analyses are not based upon the 4 x 2 x 2 design because the number of participants in each cell of the design is too small (N = 3). The purpose of this design was that an equal number of participants rated the intensity of each emotion in each of the four sets of scenarios. Because each scenario set was presented to 12 participants, and each participant rated the intensity of half of the target and half of the non-target emotions, in each set of scenarios six participants rated the intensity of the same emotion.

5.1.2.3 Rating Scale

In the SRT, participants were mainly asked to rate whether they would feel an emotion or not. However, the intensity of the emotional reactions appeared to be of interest as well, especially regarding other research questions (Schimmack & Hartmann, in press; Schimmack & Diener, in press). Therefore, the participants were also asked to indicate how intense their emotional reactions would be, provided that they experienced an emotion at all. Reisenzein (1995, Study 3) used two separate ratings to obtain independent information about the presence and intensity of an emotion. To simplify the judgment process, Schimmack (in press; Schimmack & Diener, in press) proposed to decompose a single rating on an intensity scale into information about the presence/frequency and intensity of an emotional reaction. One simply uses zero-responses as information that an emotion is not experienced; then all non-zero responses indicate the experience of an emotion. That is, it is proposed to use a dichotomization of intensity ratings into ratings equal to zero and those greater than zero as information about the absence versus presence of emotions. Summed across scenarios, the number of non-zero judgments for one emotion represents the actual frequency of this emotion. In the present study, a four-point intensity scale was used (0 = not at all, 1 = slightly, 2 = medium, 3 = high intensity).
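The dichotomization can be illustrated with a minimal sketch. The ratings are invented, and the function names `actual_frequency` and `mean_intensity` are hypothetical; only the scoring rule (zero = emotion absent, non-zero = emotion present) is taken from the description above.

```python
def actual_frequency(intensity_ratings):
    """Number of scenarios with a non-zero intensity rating."""
    return sum(1 for rating in intensity_ratings if rating > 0)

def mean_intensity(intensity_ratings):
    """Average intensity across those scenarios in which the emotion is present."""
    present = [rating for rating in intensity_ratings if rating > 0]
    return sum(present) / len(present) if present else None

# Invented anger ratings of one participant for ten scenarios (0-3 scale).
anger_ratings = [0, 2, 0, 1, 3, 0, 0, 1, 0, 2]

print(actual_frequency(anger_ratings))  # 5 scenarios with a non-zero rating
print(mean_intensity(anger_ratings))    # 1.8, averaged over those 5 scenarios
```

Scoring frequency and intensity from the same rating keeps the two measures separable: frequency counts only whether a response is non-zero, whereas intensity is averaged over the non-zero responses only.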

5.1.3 Procedure

The experiment comprised several tasks that were implemented in a single computer program. First, the participants judged how intensely they would experience a selection of 16 emotions in one of the four sets of 30 scenarios. Afterwards, they estimated how frequently they would have experienced each of the 32 emotions in the set of scenarios of the SRT. That is, they made frequency judgments for the 16 emotions included in their SRT, labeled salient emotions, plus the 16 emotions that were not presented in their SRT, labeled non-salient emotions. Subsequently, in a latency-of-retrieval task, the participants recalled, as quickly as possible, one scenario in which they would have experienced the emotion that was presented as a retrieval cue. To reduce the length of the already strenuous experiment, only the 12 target emotions (Table 14) were used as retrieval cues. Because each participant had six target emotions in his or her SRT, this guaranteed that each participant retrieved scenarios for six salient and six non-salient emotions.

5.1.3.1 Scenario Rating Task

In the SRT, the participants were asked to imagine being in the described situation and to indicate how they would feel in the situation. For each emotion the participants were asked to consider first whether they would feel this emotion or not. Only if they would feel the emotion were they to consider the intensity of the emotion. After having read the instructions, participants pressed the return key to start the scenario rating task. The scenarios were displayed in the upper half of the screen and could be studied by the participants as long as they wanted. When the participants had sufficiently studied the scenario, they pressed the return key to start the rating task. After pressing the return key, the rating scale was displayed below the scenario description, which remained on the screen. The rating scale was split into two parts, with the zero-category on the left and all remaining categories on the right side, to increase the salience of the difference between zero and non-zero responses. Between the scenario description and the rating scale, the sentence “In this situation I would have felt …” followed by each of the 16 emotion words was displayed. The participants indicated their likely emotional reaction by pressing the appropriate number on the keypad. If they made an error, they could repeat the last entry using a special correction key. After all 16 emotions had been rated, the next scenario was displayed. New random sequences were generated by the computer for the 30 scenarios for each participant and for the 16 emotions for each scenario. The computer also measured the judgment times from the display of each emotion word to the intensity rating.

5.1.3.2 Frequency Judgment Task

After the SRT was completed, the participants were surprised with additional instructions that they would now be asked several questions concerning the scenarios presented in the SRT. Their first task, after the SRT, would be to estimate the absolute frequency with which various emotions had occurred in the previous episodes. For example, if they had rated anger to be present in five of the scenarios (i.e., if they had made five non-zero ratings in the SRT), then five would be the correct answer. Participants were not informed that they had read 30 scenarios; therefore, it was possible for participants to make frequency judgments greater than 30. The participants were also informed that they should estimate the frequency of salient and non-salient emotions and that judgments of non-salient emotions (i.e., those not included in the previous SRT) are meaningful because they could have been elicited in the scenarios even though the participants did not have to rate their intensity. For example, although a participant might not have been asked to rate the presence and intensity of disgust in the SRT, the assignment of scenarios to the four sets of scenarios made it very likely that each participant would have reported experiencing disgust at least once (see Table 14). The participants were also informed that the frequency judgments had to be made within 10s, and that the next item would be presented automatically if they exceeded this time limit. A pilot study had shown that participants exceeded this time limit very rarely.

After reading the instructions, participants pressed the return key to start the computer-paced frequency judgment task. The 32 emotions were displayed in a different random sequence for each participant. With the display of each emotion word, a timer that counted the elapsed seconds also appeared on the screen. After 7s, the computer sounded a warning tone. After the participants had entered the first number, the computer recorded the time since the emotion word was presented. After entering the complete number, participants pressed the return key to continue with the task.

5.1.3.3 Latency-To-Retrieve Task

In the latency-to-retrieve task (LRT), the 12 target emotions appeared on the screen in a new random order for each participant. For each emotion, the participants were asked to recall, as quickly as possible, a scenario from the SRT. When they recalled one, they pressed the return key. Subsequently, they entered a keyword to describe the recalled scenario (e.g., apple for the example in 5.1.2.1). If a participant did not recall a scenario within 10s, the next emotion word was automatically presented on the screen. The computer recorded the retrieval latency from the presentation of an emotion word to the pressing of the return key.

5.2 Results

5.2.1 Preliminary Analyses

First, the actual frequencies of emotions in each of the four sets of scenarios were determined. To do so, the number of times a participant made a non-zero rating in one of his or her 30 scenarios was counted. Due to the experimental design, six participants rated the same emotion for the same set of scenarios. The frequencies of these six participants were averaged to determine the actual frequency of an emotion in each set of scenarios (see Appendix 8).
Correlations of the frequencies of emotions between the four sets of scenarios revealed that the attempted experimental manipulation of the frequencies was not very successful because the frequencies were highly intercorrelated (Table 15). That is, emotions that were frequent in one set of scenarios also tended to be frequent in the other sets of scenarios.
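
The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy ratings are hypothetical, and the example uses three raters and five scenarios instead of the six raters and 30 scenarios of the study.

```python
from statistics import mean

def participant_frequency(ratings):
    """Count the scenarios in which one participant gave a non-zero rating."""
    return sum(1 for r in ratings if r > 0)

def actual_frequency(ratings_by_participant):
    """Average the per-participant counts across the raters of one emotion
    in one set of scenarios."""
    return mean(participant_frequency(r) for r in ratings_by_participant)

# Hypothetical intensity ratings of one emotion (rows: raters, columns: scenarios).
anger = [
    [0, 2, 0, 1, 0],  # non-zero in 2 scenarios
    [0, 3, 1, 1, 0],  # non-zero in 3 scenarios
    [0, 1, 0, 0, 0],  # non-zero in 1 scenario
]
anger_frequency = actual_frequency(anger)
print(anger_frequency)
```

Note that the intensity values themselves play no role here; only the distinction between zero and non-zero ratings enters the count.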

5.2.2 Relative Accuracy

First, the number of times participants failed to make a frequency judgment within 10s was determined. This happened only 5 out of 1536 times. In these cases, the missing frequency judgment was set to zero and the missing judgment time was replaced by the maximum judgment time (10s).

As in Study 1, the relative accuracy of the absolute estimates was tested. Figure 15 shows the estimates of the 32 emotions plotted against the actual frequencies in each of the four scenario sets. As in Study 1, the regression slope of the estimates indicates that the actual frequencies were underestimated in all four sets of scenarios. The relative accuracy scores (estimated minus actual frequencies) for all four sets of scenarios are negative (Set 1 d = -6.29, Set 2 d = -2.41, Set 3 d = -4.20, Set 4 d = -3.96) and significantly different from zero (all Fs > 20.00, ps < .01), which shows the trend toward underestimation quantitatively.

As in Study 1, the actual frequencies of emotions were correlated with the relative accuracy score of each emotion. If frequent emotions are underestimated more strongly than infrequent ones, a negative correlation between the actual frequencies and the relative accuracy scores is expected. This prediction was confirmed in all four sets of scenarios (rs = -.88, -.61, -.72, -.80, all ps < .01). In sum, Study 3 replicated the finding in Study 1 that absolute estimates underestimate the actual frequencies of emotions and that they do so increasingly with increasing frequency of occurrence.
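
The two steps of this analysis can be illustrated with a small Python sketch. It assumes that the relative accuracy score is the estimated minus the actual frequency, which is the sign convention under which underestimation yields the negative values reported above. The function name and the four data points are hypothetical and are not taken from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical actual and estimated frequencies of four emotions.
actual = [2.0, 5.0, 10.0, 20.0]
estimated = [2.0, 4.0, 7.0, 12.0]

# Relative accuracy score per emotion (assumed: estimated minus actual).
scores = [e - a for e, a in zip(estimated, actual)]
mean_score = sum(scores) / len(scores)

# A negative r means that frequent emotions are underestimated more strongly.
r_actual_score = pearson_r(actual, scores)
print(mean_score, round(r_actual_score, 2))
```

In this toy example the mean score is negative (underestimation) and the scores become more negative as the actual frequency increases, which is the pattern described in the text.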

Because no participant rated the intensity of all 32 emotions in the SRT, actual frequencies of all emotions were not available at the individual level. Therefore, only analyses at the group level were possible. For these analyses, the frequency judgments of those 12 participants who rated the same set of scenarios were averaged. The interrater agreement between the 12 participants in each condition was determined, using Shrout and Fleiss’s (1979) intra-class coefficient (ICC[2,k]). The interrater agreement for the four sets ranged from ICC[2,12] = .60 to .74. Furthermore, the estimated frequencies of emotions were correlated between the four sets of scenarios (Table 16), which is expected because the actual frequencies of emotions in the four sets of scenarios were also correlated (Table 15).
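
For readers unfamiliar with the coefficient, the following Python sketch shows how ICC[2,k] can be obtained from the mean squares of a two-way layout (targets by judges), following the formula (BMS - EMS) / (BMS + (JMS - EMS) / n). In the present application the targets would be the emotions and the judges the participants. The function name and the small ratings matrix are illustrative assumptions, not data from this study.

```python
def icc_2k(x):
    """ICC[2,k]: reliability of the mean of k judges (two-way random effects).
    x[i][j] is the judgment of target i by judge j."""
    n, k = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_error = ss_total - ss_targets - ss_judges
    bms = ss_targets / (n - 1)            # between-targets mean square
    jms = ss_judges / (k - 1)             # between-judges mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (jms - ems) / n)

# Illustrative matrix: 6 targets (rows) judged by 4 judges (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_value = icc_2k(ratings)
print(round(icc_value, 2))
```

Because the coefficient refers to the mean of k judges, it increases with the number of judges, which matters when agreement is compared across analyses with 12, 48, or 61 participants.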

Table 17 shows the correlations between the actual and estimated emotion frequencies in the four sets of scenarios. First of all, the correlations are high, indicating general agreement between actual frequencies of emotions and frequency judgments. However, a stronger test of discriminative accuracy across emotions would require that the frequency judgments of participants who rated a particular set of scenarios are more highly correlated with the actual frequencies of this rather than a different set of scenarios. Table 17 shows that this was the case for all four sets of scenarios (note that this implies a comparison only along the rows in Table 17, but not necessarily also along the columns).
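
The row-wise comparison can be made explicit with a short Python sketch. The layout is an assumption made for illustration: each row holds the frequency judgments of the participants who rated one set, each column holds the actual frequencies of one set, and the correlation values are invented rather than taken from Table 17.

```python
def own_set_is_best(corr):
    """For each row, check whether the diagonal entry (own set) exceeds
    every other correlation in the same row."""
    return [
        all(row[i] > value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(corr)
    ]

# Hypothetical correlations (rows: judging group, columns: source of actual frequencies).
corr = [
    [0.90, 0.80, 0.70, 0.75],
    [0.78, 0.85, 0.72, 0.70],
    [0.70, 0.74, 0.88, 0.79],
    [0.76, 0.71, 0.80, 0.86],
]
row_checks = own_set_is_best(corr)
print(row_checks)
```

The check deliberately ignores the columns, because a set with generally easy-to-predict frequencies can produce high correlations for all judging groups.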

In sum, Study 3 replicates the finding of Study 1 that frequency judgments discriminate between actual frequencies of emotions. It was more difficult to demonstrate sensitivity to the particular frequencies in a specific set of scenarios. This was very likely due to the high correlations of the actual frequencies between sets of scenarios (Table 15) and the moderate interrater agreement of the frequency judgments (Table 16). A stronger test of sensitivity to experimentally manipulated frequencies of emotions seems desirable. This test was carried out in Study 4, which allowed a better manipulation of emotion frequencies on the basis of the SRT data obtained in this study.

5.2.4 Influence of the Salience of an Emotion on Frequency Estimates

As in Study 1, the effect of the salience of emotion concepts at the time of encoding on the subsequent frequency estimates was examined. In the present study, some emotions were salient because they were included in the SRT, whereas others were not salient because they were presented for the first time during the FJT. In the following analyses, the differences between scenario sets are ignored because the sets were varied orthogonally to the sets of emotion words. Therefore, differences in frequency judgments of salient and non-salient emotions cannot be attributed to the presentation of different scenarios. For each emotion, the frequency estimates of those 24 participants for whom the emotion was salient were compared to the frequency estimates of those 24 participants for whom the emotion was not salient (see Figure 14). In 31 of the 32 comparisons the frequency estimate was higher when the emotion was salient. In an analysis across emotions, the mean frequency estimate in the salient condition (M = 7.21, SD = 1.92) was significantly higher than the mean frequency estimate in the non-salient condition (M = 4.61, SD = 2.28), t(31) = 10.23, p < .01. This finding replicates the salience effect obtained in Study 1.
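
The analysis across emotions treats each emotion as one case with two mean estimates, one from each condition. A minimal Python sketch of this paired comparison is given below; the function name and the four pairs of values are hypothetical (the study used 32 emotions, which gives 31 degrees of freedom).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical mean frequency estimates per emotion in the two conditions.
salient = [8.0, 6.5, 9.0, 5.0]
non_salient = [5.0, 4.5, 6.0, 3.5]

# Number of emotions with a higher estimate in the salient condition.
n_higher = sum(s > n for s, n in zip(salient, non_salient))

t_value, df = paired_t(salient, non_salient)
print(n_higher, round(t_value, 2), df)
```

Pairing by emotion removes the large differences between emotions from the error term, which is why a moderate mean difference can yield a large t value.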

This effect can also be seen in Figure 16. This figure also shows that, in contrast to Study 1, the slope of the regression line was not steeper in the salient condition. This implies that the salient condition produced higher estimates in general, but not for the more frequent emotions in particular. Indeed, the actual frequencies were not significantly correlated with a difference score between frequency estimates in the salient and non-salient condition, r = -.24, p = .19. The divergent findings could be due to (a) the fact that Study 1 comprised more participants, (b) the use of a within-subject design in Study 1 and a between-subject design in Study 3, or (c) the different manipulations of salience. This inconsistency should, however, not obscure the main finding of this analysis that frequency judgments were generally higher in the salient than in the non-salient condition. This finding was predicted in Hypothesis 5 and is inconsistent with direct encoding models of frequency information.

5.2.5 Testing the Ease-of-Retrieval Model

It has been proposed that people rely on the retrieval of exemplars to estimate frequencies (Tversky & Kahneman, 1973). One version of the retrieval-based models, the ease-of-retrieval model, was explicitly tested in this study in several ways. First, the relation between frequency judgments and retrieval times in the LRT was examined (cf. MacLeod et al., 1994). Second, the relation between the size and the speed of the frequency judgments was explored. And finally, the speed of frequency judgments was compared with retrieval latencies in the LRT. In the following analyses, the data were averaged across all 48 participants to increase the reliability of the variables. This was justifiable because of the high correlations of the actual frequencies between scenario sets (Table 15). The frequency estimates showed a high internal consistency, ICC[2,48] = .92, whereas the internal consistency of the averaged judgment times was only moderate, ICC[2,48] = .42. Contrary to the prediction of the ease-of-retrieval model, the judgment times were not significantly correlated with the size of the frequency judgments (r = .22, p = .22). The correlation was even slightly in the direction opposite to the one predicted by the ease-of-retrieval model. The direction of the correlation is more consistent with the recall-estimate model (see Brown, 1995); however, as it is not significant, it does not support this alternative retrieval-based model either.

For the next analysis, the retrieval latencies of the 12 target emotions were averaged across participants. The interrater agreement for these latencies was ICC[2,48] = .66. The latencies were then correlated with the frequency estimates across the 12 target emotions. The correlation with the retrieval latencies failed to be significant (r = .43, p = .17) and was again in the wrong direction. The positive correlation does not support the recall-estimate model either, because in the latency-to-retrieve task participants were asked to retrieve only one exemplar. Therefore, the longer retrieval times do not indicate counting of several exemplars (see Brown, 1995). Third, the average retrieval latencies were compared with the average times needed for the frequency judgment; for this analysis only the judgment times of the 12 target emotions were used, because only target emotions were used in the LRT. If frequency estimates are based on information about the speed of retrieval, frequency estimates should take at least as long as the retrieval of a single scenario. However, the mean retrieval latency (M = 3.56s, SD = 0.57) was longer than the mean time needed to make a frequency judgment (M = 3.16s, SD = 0.18), t(11) = 2.91, p < .05. This finding is incompatible with any retrieval-based model, whether the ease-of-retrieval or the recall-estimate model. In sum, the analyses support Hypotheses 6 to 8: retrieval latencies and judgment times were independent of frequency judgments, and retrieval of a single exemplar required more time than the complete frequency judgment process.

5.3 Discussion

The results of Study 3 replicated several earlier findings obtained in field studies of real emotional experiences. In studies 1 and 3 the actual frequencies of emotions were underestimated, especially those of frequent emotions. Nevertheless, the frequency judgments in both studies revealed discriminative accuracy across emotions (Study 3 did not allow a test of discriminative accuracy across participants). Finally, both studies showed that the frequencies of salient emotions were estimated to be higher than the frequencies of non-salient emotions. The only difference was that in Study 3 frequent emotions did not benefit more from the salience manipulation than infrequent emotions, which was the case in Study 1.

Besides replicating the results of the field study in a more controlled setting, Study 3 also provided several new findings. First, frequency judgments were not related to the times needed to make these judgments. Furthermore, retrieval of a single scenario needed more time than the complete frequency judgment process. This finding contradicts retrieval-based frequency judgment models, both the recall-estimate theory (Brown, 1995; Meudall, 1971; Watkins & LeCompte, 1991) and the ease-of-retrieval model (Schwarz et al., 1991). Also contrary to the predictions of the ease-of-retrieval model was the finding that latencies in the LRT were unrelated to the frequency judgments. In sum, the direct encoding models cannot explain the salience effect, whereas the retrieval-based models cannot account for the speed of the frequency judgments. This leaves the familiarity model as the only model that is compatible with the present data.

Three possible objections can be raised against the present findings. First, the judgment and retrieval times might not have been accurately measured, which is suggested by the low consistency of the measures across participants. However, given that the retrieval times were measured by a computer to the nearest millisecond, it remains unclear how the participants themselves are able to distinguish differences in retrieval times more accurately than the computer, as such an ability is needed to use retrieval latencies as information about frequencies of emotions. Only if one notices that one retrieved a joy scenario faster than an envy scenario can one judge the frequency of joy to be higher than the frequency of envy. A second objection could be that the ease of retrieval is conceptually different from the speed of retrieval. That is, people do not base their judgment on the speed of retrieval but on a feeling of ease which is separate from and unrelated to the speed of retrieval. Although such a modified ease-of-retrieval model is logically possible, it would have to specify (a) how the feeling of ease is generated, and (b) why it is unrelated to the latency of retrieval. Straightforward answers to these questions are not in sight. Finally, one might object to the present findings that the manipulation of emotion frequencies was only partly successful and that the high correlation across scenario sets was due to the fact that emotions which are frequent in everyday life were also frequent in all four scenario sets. As a consequence, the participants may have relied on generalized beliefs about the frequencies of different emotions when making their frequency estimates. However, the analyses of different sets of scenarios suggested that participants were also sensitive to differences in the frequencies of emotions between sets. Nevertheless, a stronger demonstration of sensitivity to experimentally manipulated frequencies would be needed to rule out this hypothesis. This was attempted in Study 4, which also served to replicate the findings of Study 3.

6 STUDY 4

A main aim of Study 4 was to replicate the findings of Study 3. In addition, Study 4 aimed to examine how individual differences in a repressive way of coping influence the encoding and retrieval of emotion memories; the results bearing on this issue are reported elsewhere (Schimmack & Hartmann, in press; see 7.3.1 for a brief summary). As a consequence, all participants rated the same set of scenarios with regard to the same set of emotions. Furthermore, the selected scenarios elicited mainly unpleasant emotions. This had the advantage that the frequencies of emotions differed from the frequencies of emotions in everyday life and from the frequencies of emotions in Study 3. Hence, Study 4 provides a stronger test of participants’ sensitivity to experimentally manipulated frequencies of emotions.

6.1 Method

6.1.1 Participants

Sixty-one undergraduate psychology students (14 male, 47 female) at the Free University Berlin participated in the study for course credit.

6.1.2 Material and Procedure

The SRT included 25 negative and 5 positive scenarios. Sixteen emotion words (13 unpleasant and 3 pleasant) were selected for the rating task. The rating scale was changed to a 7-point scale so that individual differences in the intensity of emotional reactions could be detected more easily, which was important for different research questions (cf. Schimmack & Diener, in press; Schimmack & Hartmann, in press). The response categories were labeled “not”, “very slightly”, “slightly”, “medium”, “strongly”, “very strongly”, and “extremely strongly” and were scored from 0 to 6. As in Study 3, participants were instructed that only zero ratings imply the complete absence of an emotion, whereas all remaining response categories imply its presence, although with different degrees of intensity. The frequency judgment task was identical to the one in Study 3 and the same 32 emotions were used. The LRT was identical to that used in Study 3. However, a different set of 10 emotions was used as retrieval cues, including five salient and five non-salient emotions.

6.2 Results

6.2.1 Absolute Accuracy

In all following analyses the actual frequencies are based on the SRT ratings in Study 3. This had the advantage that Study 3 provided actual frequencies for all 32 emotions included in the frequency judgment task. Furthermore, the actual frequencies of salient and non-salient emotions are both based on ratings of a different group of participants. Absolute accuracy was determined separately for the salient and the non-salient emotions. A significant difference was obtained: frequency judgments of salient emotions (mean SD = 8.09) were more accurate than those of non-salient emotions (mean SD = 9.41), t(60) = 4.99, p < .01. This finding replicates Study 1.

6.2.2 Relative Accuracy

Figure 17 shows that participants in Study 4 again underestimated the actual frequencies of emotions. Across all emotions, the relative accuracy was d = -5.99, which is significantly different from zero, F(1,31) = 54.97, p < .01. Figure 17 also shows that the frequencies of frequent emotions were underestimated more strongly than those of infrequent emotions. This is also evident in the correlation between actual frequencies and the relative accuracy score, r = -.90, p < .01. In sum, Study 4 replicates the previous finding that people underestimate the frequency of emotions and that they do so especially for frequent emotions.

6.2.3 Discriminative Accuracy across Emotions

One aim of Study 4 was to demonstrate that participants are sensitive to experimentally manipulated frequencies of emotions. Therefore, it is important to demonstrate that the selection of scenarios in Study 4 yielded frequencies of emotions that are independent of emotion frequencies in real life. For the 29 overlapping emotions between Studies 2 and 4, the correlation between the actual frequencies of emotions was r = -.21, p = .29. As a consequence, discriminative accuracy across emotions in the present study cannot be attributed to generalized beliefs about the frequencies of emotions.

For analysis at the group level, the frequency judgments of all participants were averaged. The correlation between actual frequencies and frequency estimates was r = .66, p < .01. This correlation is rather low, compared to the values in the previous studies. One explanation could be that the present analysis included salient and non-salient emotions. Figure 17 already shows that the frequency estimates of non-salient emotions were lower than those of the salient emotions. Therefore, salience can attenuate the present correlation. As a consequence, separate correlations were computed across the 16 salient and the 16 non-salient emotions. The correlation for the salient emotions was indeed higher, r = .85, p < .01, but the correlation across the 16 non-salient emotions was not, r = .63, p < .01. The difference between the two correlations also suggests that salience increased the discriminative accuracy across emotions.

The analysis at the individual level was carried out separately for the 16 salient and the 16 non-salient emotions. For the salient emotions the discriminative accuracy across emotions (mean r = .45) was significantly higher than the one for the non-salient emotions (mean r = .36), F(1,60) = 5.53, p < .05. Because the actual frequencies are based on ratings of a different sample, this finding suggests that salience also increased the discriminative accuracy across emotions.

6.2.4 Influence of the Salience of an Emotion on Frequency Estimates

The following analysis attempts to replicate the finding of Studies 1 and 3 that salience at the time of encoding increases frequency judgments. Figure 17 already suggests that this was also true in Study 4. In the present study, all participants rated the same emotions in the SRT. Therefore, the analysis had to be carried out across emotions. To control for any differences in the actual frequencies between salient and non-salient emotions, the actual frequencies of the emotions were used as a covariate. The analysis of variance revealed a highly significant effect of salience, F(1,29) = 23.61, p < .01. A comparison of the predicted means shows that participants judged the frequency of non-salient emotions to be lower (M = 5.18) than the frequency of salient emotions (M = 8.28). It was also tested whether salience boosted especially the frequency judgments of frequent emotions. Figure 17 already suggests that this was not true, because the slopes of the regression lines for salient and non-salient emotions were similar. To test this hypothesis quantitatively, a median split of the actual emotion frequencies was carried out. Then, frequency judgments were used as a dependent variable in an ANOVA with the factors actual frequency (high vs. low frequency) and salience. If frequent emotions benefit more from the salience manipulation, the interaction should be significant, but the ANOVA does not confirm this prediction, F(1,28) < 1, p > .50. This finding is consistent with Study 3, where the same salience manipulation also failed to affect especially the frequent emotions, but it is inconsistent with Study 1, where participation in a diary study increased especially the frequencies of frequent emotions.

One explanation for this pattern of results is that the different salience manipulations influence different stages of the frequency judgment process (Brown, 1995). It might be that the participation in the diary study increased participants’ awareness of the absolute number of emotional experiences. Therefore, they converted the same familiarity signal into higher absolute frequencies than they did before the diary study. In contrast, rendering particular emotions salient does not influence the range of the absolute frequencies; it only boosts the familiarity signal of the salient emotions, which then receive higher absolute estimates for the same range of absolute frequencies as the non-salient emotions.

6.2.5 Testing the Ease-of-Retrieval Model

The interrater agreement for the frequency estimates was excellent, ICC[2,61] = .96. Because of the larger sample size, the judgment times were also more consistent across participants than in Study 3, ICC[2,61] = .61. Nevertheless, replicating the finding of Study 3, the size of the frequency judgments was unrelated to the speed of these judgments (r = -.19, p = .30).

The second test of the ease-of-retrieval hypothesis used the latencies in the LRT. As in Study 3, the retrieval latencies were averaged across participants. This time, the interrater agreement was excellent (ICC[2,61] = .95). The average latencies were then correlated with the frequency estimates across the 10 emotions included in the latency-to-retrieve task. The correlation with the frequency estimates was significant and consistent with predictions of the ease-of-retrieval model (r = -.76, p < .05). However, the next result indicates that this support is more apparent than real. As in Study 3, the average retrieval latencies were compared to the average speed of the frequency judgments. Again, the mean retrieval latency was significantly longer (M = 4.94, SD = 1.40) than the time needed to make a frequency judgment (M = 3.23, SD = 0.10), t(9) = 3.96, p < .01. Hence, it is not possible that the frequency estimates are based on retrieval processes. The finding that the speed of frequency judgments is unrelated to their size and that these judgments are made faster than the retrieval of scenarios contradicts the ease-of-retrieval hypothesis. It is instructive that the contradictory evidence was obtained concurrently with a high negative correlation between frequency judgments and latencies in the LRT. This finding demonstrates nicely that the significant negative correlations between frequency judgments and latencies in an LRT that were obtained in previous studies (MacLeod et al., 1994; Fitzgerald et al., 1988) do not indicate that the frequency judgments were based on the ease of retrieval.

It is also instructive to look at the speed of frequency judgments and retrieval times of the very rare emotion regret. On average, participants made a frequency judgment within 3.22s. In contrast, the average latency in the LRT was 8.13s. In addition, these long latencies are partly due to the responses of 42 participants, who were unable to retrieve a regret scenario within 10s (so that their retrieval latencies were set to 10s). These 42 participants were able to make a frequency judgment within the allotted time of 10s even though they could not recall a single scenario in this time limit. As a consequence, ease-of-retrieval cannot explain the frequency judgments of these participants, because the ease-of-retrieval hypothesis assumes that at least one exemplar was retrieved. One could try to rescue the ease-of-retrieval model and argue for a two-stage process. For example, a fast recognition process could inform the participant whether an emotion occurred at all. If so, an exemplar is actually retrieved from memory and the frequency is estimated following the ease-of-retrieval model. If the recognition signal suggests that no exemplar can be retrieved, the frequency judgment is zero. Such a two-stage model is not very parsimonious because Hintzman and Curran (1994) showed that recognition judgments are based on the same familiarity signal that is assumed to underlie frequency judgments. Therefore, the initial recognition process already supplies the frequency information that the additional ease-of-retrieval heuristic is supposed to provide. In sum, the analysis replicated two findings of Study 3. The judgment times of frequency judgments are not related to the size of the judgment, and frequency judgments are made faster than the time needed to retrieve a single scenario. These findings are damaging for retrieval-based frequency judgment models. The finding that latencies in the retrieval task were significantly related to the frequency judgments does not rescue the ease-of-retrieval model. Rather, it demonstrates that the same finding in other studies does not provide evidence for a causal role of ease-of-retrieval in the frequency judgment process.
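
The treatment of failed retrievals can be illustrated with a brief Python sketch. It assumes that a failed retrieval is stored as a missing value and scored at the 10s limit, as described for the regret latencies; the function name and the four latencies are hypothetical.

```python
from statistics import mean

TIME_LIMIT = 10.0  # seconds allowed in the latency-to-retrieve task

def censored_mean(latencies, limit=TIME_LIMIT):
    """Mean latency when failed retrievals (None) are scored at the time limit."""
    return mean(limit if t is None else t for t in latencies)

# Hypothetical latencies for one emotion; None marks a failed retrieval.
regret_latencies = [4.0, None, 6.0, None]
n_failed = sum(t is None for t in regret_latencies)
mean_latency = censored_mean(regret_latencies)
print(n_failed, mean_latency)
```

Because the true latency of a failed retrieval is at least 10s, a mean computed in this way can only understate the time needed for retrieval, which makes the comparison with the judgment times conservative.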

6.3 Discussion

Study 4 replicated many of the earlier findings. As in Studies 1 and 3, participants underestimated absolute frequencies of emotions, especially those of frequent emotions. Furthermore, increasing the salience of emotions at the time of encoding increased frequency judgments. As in Study 3, salience did not increase especially the frequency estimates of frequent emotions. Most importantly, Study 4 replicated the findings of Study 3 that (a) the times needed to make frequency judgments were unrelated to the magnitude of the judgments and (b) frequency judgments were made faster than the retrieval of a single scenario. Therefore, frequency judgments cannot be based on information about the retrieval of scenarios, although Study 4 found a significant negative correlation between frequency judgments and latencies in the LRT.

7 GENERAL DISCUSSION

The two main topics of the dissertation, namely (a) the accuracy of frequency judgments of emotions and (b) the cognitive processes underlying these judgments, are discussed separately.

7.1 The Accuracy of Frequency Judgments of Emotions

Four types of accuracy were differentiated: (a) absolute and (b) relative accuracy as well as discriminative accuracy (c) across emotions and (d) across participants. The results bearing on each of the types of accuracy are discussed next.

7.1.1 Absolute Accuracy

Absolute accuracy was only explored in studies 1 and 4, because the participants in Study 2 did not make absolute estimates, and Study 3 lacked an appropriate standard of comparison. However, studies 1 and 4 both showed that frequency judgments of emotions are not very accurate in an absolute sense. This finding is inconsistent with a direct encoding of frequencies of emotions. On the other hand, estimation errors can be expected when participants use heuristics to make the absolute estimates. The finding in studies 1 and 4 that absolute accuracy was higher for salient compared to non-salient emotions indicates that salience can increase the accuracy of frequency judgments of emotions. Increasing accuracy due to salience has also been observed in other studies of frequency judgments (Naveh-Benjamin & Jonides, 1986). Brown and Singer (1993) pointed out that absolute accuracy is sensitive to two types of estimation errors: (a) errors in the estimation of the distribution of the actual frequencies and (b) errors in the estimation of the level of the absolute frequencies. Therefore, the salience effect on absolute accuracy can be due to an influence on either (or both) of these error sources. These possibilities are explored in the next analyses.

7.1.2 Relative Accuracy

Relative accuracy refers to the question of how well the absolute level of frequency judgments reflects the absolute level of the actual frequencies, or in other words, whether people over- or underestimate the actual frequency of their emotions. Studies 1, 3, and 4 all showed the predicted effect that frequency estimates underestimated the actual frequencies of emotions (Study 2 did not allow this question to be addressed). Two objections might be raised against this finding. In the two field studies, actual frequencies were based on the sum of repeated frequency estimates, whereas the estimates are based on a single estimate for the whole time period. Fiedler and Armbruster (1994) demonstrated that splitting a single frequency judgment of one category into two frequency judgments of two sub-categories produced different frequency estimates: The sum of the two estimates was higher than the frequency judgment of the whole category. Therefore, one might argue that the repeated daily estimates might overestimate the actual frequencies. This explanation of the effect in Study 1 encounters several difficulties. First, underestimation was also found in studies 3 and 4, where actual frequencies were not based on split frequency estimates. Second, Fiedler and Armbruster’s results did not show overestimation for the split-category judgments; rather they showed that split-judgments prevented categories from being underestimated. Therefore, the sum of the split-estimates was more accurate than the single judgments of a whole category.

Hence, the category-split effect supports, rather than contradicts, the current interpretation that the frequency estimates for the whole diary period underestimate the actual frequencies of emotions. Finally, underestimation is also prevalent in experimental studies of frequency judgments (cf. Watkins & LeCompte, 1991; Williams & Durso, 1986) in which the actual frequencies were objectively determined.

A second objection could be that the participants changed the meaning of the emotion words from the daily estimates to the frequency estimates over three weeks (Schwarz, Strack, Müller, & Chassein, 1988). According to this objection, even very mild experiences of the emotion were counted for short time periods, whereas the participants considered only severe experiences of the emotion for longer time periods. Again, this objection cannot explain underestimation in studies 3 and 4, where participants were explicitly told that frequency judgments should reflect all scenarios in which the emotion was rated to be present, irrespective of its intensity. Nevertheless, the actual frequencies were underestimated in the frequency judgment task. In sum, the present results provide strong support for the hypothesis that people underestimate the frequency of their emotions.

A related expectation was that underestimation should increase with the actual frequency of an emotion, which is also a common finding in the frequency judgment literature (Mingay et al., 1994; Watkins & LeCompte, 1991). Again, all studies that allowed a test of this prediction confirmed it. This finding has important practical implications when frequency judgments of emotions are used to measure subjective well-being. Often researchers compute a difference score between the frequencies of pleasant and unpleasant emotions. This hedonic-balance score is then used as a measure of subjective well-being. The problem with this index is that it underestimates the well-being of those people who experience more pleasant than unpleasant emotions, because the more frequent pleasant emotions are underestimated more than the less frequent unpleasant emotions. Similarly, the index underestimates the unhappiness of those people who experience unpleasant emotions more frequently than pleasant emotions.
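The consequence for the hedonic-balance score can be made concrete with a small sketch. It assumes, purely for illustration, that judged frequencies are a linear function of actual frequencies with a slope below 1; the intercept, the slope, and the example frequencies are hypothetical values and not estimates from the present studies.

```python
def judged_frequency(actual, intercept=2.0, slope=0.5):
    """Hypothetical judgment function.

    With a slope below 1, the amount of underestimation grows with
    the actual frequency of the emotion.
    """
    return intercept + slope * actual


def hedonic_balance(pleasant, unpleasant):
    """Difference score between pleasant and unpleasant frequencies."""
    return pleasant - unpleasant


# A predominantly happy person: 40 pleasant and 10 unpleasant episodes.
true_happy = hedonic_balance(40, 10)
judged_happy = hedonic_balance(judged_frequency(40), judged_frequency(10))

# A predominantly unhappy person: 10 pleasant and 40 unpleasant episodes.
true_unhappy = hedonic_balance(10, 40)
judged_unhappy = hedonic_balance(judged_frequency(10), judged_frequency(40))

print(true_happy, judged_happy)      # 30 15.0
print(true_unhappy, judged_unhappy)  # -30 -15.0
```

Under this assumption the judged balance is the true balance multiplied by the slope, so both the happy and the unhappy person appear closer to a neutral balance than they actually are.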

Another noteworthy finding in studies 1, 3 and 4 was that making certain emotions salient increased the estimated frequencies of these emotions. This finding also has implications in many applied settings. For example, several psychotherapies are likely to increase the salience of emotional experiences. This could produce an expected increase in the reported frequencies of pleasant emotions, and an unexpected increase in the reported frequencies of unpleasant emotions, without any changes in the actual frequencies of emotional experiences. Because salience could affect the comparison of pre- and post-treatment measures, evaluation studies of treatment effects should include control groups in which emotional experiences are made salient.

Study 1 also indicated that salience especially increased the frequency estimates of the more frequent emotions. This was evident in a steeper slope of the regression line for post-diary estimates. In contrast, in studies 3 and 4 the regression line of salient emotions was elevated but not steeper, indicating no special influence of salience on frequent emotions. The discrepant findings may be due to the fact that the different salience manipulations influenced different stages of the frequency judgment process (Brown, 1995). To judge frequencies of emotions, participants first have to construct a range of plausible frequencies. Then, they can assign frequencies to emotions by mapping the strength of the familiarity signal onto the frequency scale. The pre-post design in Study 1 very likely influenced participants' beliefs about the plausible range of emotion frequencies. Making frequency judgments each day, they noticed that they experienced more emotions than they had believed before the diary study. Hence, they increased the upper limit of the frequency scale for the post-diary judgments. Similarly, Brown demonstrated that manipulations of participants' beliefs about the range of plausible frequencies changed the slope of the regression line.

This effect is different from the salience effect observed in studies 3 and 4, which showed that frequency judgments were higher for emotions that were made salient compared to emotions that were not made salient. It is plausible that this salience manipulation influenced the first stage of the frequency judgment process. Salient emotions appeared to be more familiar and therefore were rated to be more frequent than non-salient, unfamiliar emotions. Nevertheless, the familiarity feelings of salient and non-salient emotions were mapped onto the same range of absolute frequencies during the second stage of the judgment process, which leads to the observed differences in the level, but not in the slope, of the regression lines in studies 3 and 4. The distinction between two stages in the frequency judgment process (Brown, 1995) also has practical implications. The familiarity model predicts that people's feeling of familiarity has high discriminative accuracy across emotions. However, it does not allow straightforward predictions of relative accuracy and discriminative accuracy across participants, because these two types of accuracy also depend on the second stage, in which the feeling of familiarity is converted into an absolute estimate. A better understanding of this conversion process might help to improve the measurement of emotion frequencies. For example, one could try to assist respondents in their selection of an appropriate range of frequencies. Blair and Williamson (1994) discussed the merits and pitfalls of providing participants with population norms of frequencies (e.g., on average, people go to church once every three months) to increase the relative accuracy of frequency estimates. This procedure could improve the accuracy of frequency estimates if participants have information about their relative standing on the relevant dimension (e.g., I go to church much less frequently than the average person).
With regard to internal states such as emotions, it is unlikely that people have accurate knowledge how they compare to others in the frequency of emotional experiences. To conclude, the conversion process of frequency information (i.e. feeling of familiarity) into an absolute estimate is an important topic for future research, not only from a theoretical (Brown, 1995) but also from a practical point of view (Blair & Williamson, 1994).
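The difference between an effect on the familiarity signal and an effect on the range of plausible frequencies can be sketched numerically. The sketch below assumes a familiarity signal that is proportional to the actual frequency and a linear mapping of this signal onto a response range; the function names, the range limits, and the size of the salience boost are hypothetical and serve only to illustrate the distinction.

```python
def familiarity(actual, max_actual=50.0, boost=0.0):
    """Stage 1 (assumed): familiarity signal between 0 and 1.

    `boost` is a hypothetical constant added for salient emotions.
    """
    return actual / max_actual + boost


def to_estimate(signal, low=0.0, high=20.0):
    """Stage 2 (assumed): map the signal onto a range of plausible frequencies."""
    return low + signal * (high - low)


def slope_intercept(xs, ys):
    """Ordinary least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


actual = [5, 10, 20, 40]

# Baseline judgments.
baseline = [to_estimate(familiarity(a)) for a in actual]
# Wider range of plausible frequencies (as assumed for the post-diary judgments).
wider_range = [to_estimate(familiarity(a), high=30.0) for a in actual]
# Stronger familiarity signal for salient emotions, same response range.
salient = [to_estimate(familiarity(a, boost=0.1)) for a in actual]

for label, estimates in [("baseline", baseline),
                         ("wider range", wider_range),
                         ("salient", salient)]:
    print(label, slope_intercept(actual, estimates))
```

In this sketch, raising the upper limit of the range steepens the regression line, whereas a constant increase in familiarity only elevates it. This corresponds to the pattern described for Study 1 on the one hand and studies 3 and 4 on the other.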

7.1.3 Discriminative Accuracy Across Emotions

The present set of studies also explored the discriminative accuracy across emotions, that is, the question of how well frequency judgments discriminate the actual frequencies of different emotions. This question is relevant for some issues in research on emotions. First, hierarchical models of the structure of emotions (Oatley & Johnson-Laird, 1987; Shaver, Schwartz, Kirson & O’Connor, 1987) predict that emotions on higher levels of the hierarchy are experienced more frequently than emotions lower in the hierarchy. For example, sadness should be experienced more frequently than disappointment because disappointment is assumed to be a subtype of sadness. To test these predictions of structural models of emotions, one needs accurate information that discriminates between frequencies of different emotions. It also seems to be an interesting topic for future research on emotions, why certain emotions are experienced more frequently than others. For example, why is anxiety experienced in general more frequently than hate? Any emotion theory that explains how emotions are elicited should eventually explain differences between emotions in their frequencies of occurrence.

Discriminative accuracy across emotions was good in all four studies, and excellent when the data were first aggregated across participants. This finding is consistent with frequency judgments in other domains. Indeed, the claim that frequency judgments are very accurate, which has led some theorists to propose direct encoding models of frequency information (Hasher & Zacks, 1984), is predominantly based on findings of high discriminative accuracy across stimuli. However, even this type of accuracy was influenced by salience (studies 1 and 4), which contradicts the direct-encoding models. Furthermore, Study 1 also showed that the type of the response format influenced this type of accuracy. At the individual level, absolute estimates discriminated more accurately between the frequencies of emotions, presumably because vague quantifier ratings forced participants to assign the same frequency-category to several emotions, although they were able to discriminate between the frequencies of these emotions.

7.1.4 Discriminative Accuracy across Participants

For many applied settings, the last type of accuracy is the most important, namely discriminative accuracy across participants, that is, the question of how well retrospective frequency judgments reflect individual differences in the actual frequencies of emotions. The first two studies provide highly similar answers to this question, despite the fact that (a) the participants were from different nations and (b) the daily ratings and the frequency judgments were obtained with slightly different methods. In both studies, the discriminative accuracy was between r = .30 and r = .60. An exact estimate is difficult because frequency judgments after daily frequency ratings overestimate this type of accuracy, whereas judgments before the daily frequency ratings underestimate this type of accuracy.

The present set of studies also provided some valuable results that rule out artifact explanations. First, the frequency estimates of a single emotion were often correlated most highly with the actual frequency of this emotion and not with those of other emotions. Furthermore, frequency judgments made for two separate weeks were more highly correlated with the actual frequencies in the target week than with those in the alternative week. This pattern of results rules out a simple response set explanation of discriminative accuracy across participants. Furthermore, it contradicts the contention that frequency judgments are simply based on some generalized beliefs. In particular, generalized beliefs cannot explain the context sensitivity of the frequency judgments. Furthermore, studies 1 and 2 provided little support for the hypothesis that frequency judgments are systematically biased by the self-concept or current mood of the participants: Neither emotion-related personality traits nor the current mood at the time of the frequency judgments appeared to have a consistent effect on the frequency judgments. In a similar vein, Schimmack and Hartmann (in press; see also Cutler, Larsen, & Bunce, 1996) investigated whether people with a repressive coping style (i.e., people who are assumed to repress unpleasant feelings) underestimate the frequencies of their unpleasant emotions. Although so-called repressors reported experiencing unpleasant emotions less frequently when they were confronted with emotional scenarios (in a scenario rating task, see 5.1.3.1), their frequency estimates in the subsequent frequency judgment task were not biased. For unpleasant emotions, repressors’ lower frequency judgments correctly reflected the lower number of endorsements of unpleasant emotions in the scenario rating task. In sum, the search for personality dimensions that predict a systematic bias in frequency judgments of emotions has been unsuccessful.
Nevertheless, it is possible that frequency judgments of emotions are influenced by systematic biases that remained undetected in the previous studies.

7.2 The Cognitive Processes underlying Frequency Judgments of Emotions

How do people judge the frequency of emotions? In the present dissertation, four models were compared with each other: (a) the direct encoding models, (b) the recall-estimate model, (c) the ease-of-retrieval model, and (d) the familiarity model. Several results of the studies were incompatible with the direct encoding models. Maybe the most important result was that participants in Study 2 were able to estimate accurately the frequency of emotions in the first and second week of the diary period. This means that the frequencies were estimated at the time of retrieval. Additional evidence against the direct encoding models was that the salience of emotions at the time of encoding increased frequency judgments (studies 1, 3, and 4). According to the direct encoding models, emotional experiences should automatically activate emotion concepts and modify the frequency counter (Alba et al., 1980), independently of the salience of the concept. The same type of evidence has been used to challenge the direct encoding models in other domains (Greene, 1989). The present studies show that the direct encoding models cannot explain frequency judgments of emotions either.

The present dissertation also challenges the ease-of-retrieval model. If frequency judgments were actually based on information about the ease of retrieval, higher frequency judgments should be made faster. Contrary to this prediction, studies 3 and 4 did not find a relation between the speed and the size of frequency judgments. Furthermore, in Study 4 many participants were able to judge the frequency of a very rare emotion (regret), although they were unable to retrieve a single scenario in which this emotion occurred. In addition, latencies in a separate retrieval task were related to frequency judgments only in Study 4, but not in Study 3. Nevertheless, the frequency judgments in both studies possessed discriminative accuracy across emotions. Probably the most damaging finding was that in studies 3 and 4, the frequency judgments were made faster than the time needed to retrieve a single scenario. Therefore, retrieval of exemplars to a conscious level is simply too slow to explain the fast (and accurate) frequency judgments. The same line of reasoning has been used to dismiss retrieval-based models in related research on metamemory (Metcalfe, 1993; Reder, 1987). The last finding contradicts not only the ease-of-retrieval model, but also other retrieval-based models, such as the recall-estimate model.

The only model that is compatible with all the present findings is the familiarity model. With regard to emotions this model assumes that frequency questions activate multiple memory traces of previous emotional experiences simultaneously. As a consequence, the memory sends a direct signal, reflecting how many traces have been activated in memory. This signal is experienced as a sense of familiarity. Like the other indirect encoding models, the model predicts that participants can differentiate frequencies in different contexts, such as week 1 and 2 in the diary study, if the context variable is sufficiently encoded in memory (see Barsalou & Ross, 1986). Furthermore, it predicts that the salience of emotions at the time of encoding increases frequency judgments because salience strengthens memory traces, which results in a greater echo intensity (Hintzman, 1988). The familiarity model does not predict any relation between the size of a frequency judgment and its time and such a relation was not obtained. Therefore the lack of such relations does at least not contradict the model. To conclude, the familiarity model seems to be the best candidate for a theory of frequency judgments of emotions. This conclusion should not be generalized to frequency judgments in other domains. Retrieval-based estimation strategies can be used and apparently are used under certain conditions (Brown, 1995; Menon, 1994).
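The notion of echo intensity can be illustrated with a minimal sketch in the spirit of Hintzman's multiple-trace model, in which a probe activates all traces in parallel, the activation of a trace is the cube of its similarity to the probe, and echo intensity is the sum of the activations. The feature vectors below are invented, and weak encoding is represented, as a simplifying assumption, by traces in which half of the features were not stored. The sketch is not the model that was fitted in the present studies.

```python
def similarity(probe, trace):
    """Similarity of a probe to one trace.

    Features are coded +1, -1, or 0 (not stored). The dot product is
    divided by the number of features that are nonzero in the probe
    or in the trace.
    """
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo_intensity(probe, memory):
    """Sum of cubed similarities across all traces in memory."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


# Hypothetical feature vectors (assumptions for illustration only).
ANGER = (1, 1, 1, 1, -1, -1, 1, -1)
WEAK_ANGER = (1, 1, 1, 1, 0, 0, 0, 0)          # non-salient episode, half of the features lost
DISAPPOINTMENT = (1, 1, 1, 1, -1, -1, -1, 1)   # shares six of eight features with ANGER

print(echo_intensity(ANGER, [ANGER] * 2))                         # 2.0
print(echo_intensity(ANGER, [ANGER] * 5))                         # 5.0
print(echo_intensity(ANGER, [WEAK_ANGER] * 5))                    # 0.625
print(echo_intensity(ANGER, [ANGER] * 2 + [DISAPPOINTMENT] * 4))  # 2.5
```

In this sketch, echo intensity grows with the number of stored episodes and is much weaker for poorly encoded episodes. Traces of a similar emotion also contribute to the signal, although less than traces of the probed emotion.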

The superior performance of the familiarity model might appear to some readers due to the selection of paradigms, which tested and disconfirmed mainly predictions that are made by the competing models, but tested only few predictions that follow from the familiarity model. Although a disconfirmation of the familiarity model does not rescue the other models, such tests are an important topic of future research. One prediction that follows from the familiarity model is, for example, that the feeling of familiarity should be influenced by the presence of similar exemplars in memory (Hintzman, Curran, & Oppy, 1992; Jones & Heit, 1993). For example, frequency estimates of eating carp should be inflated by memories of eating trout. With regard to emotions, this would imply that the presence of memories in which a person experienced disappointment but not anger should nevertheless increase the echo intensity of anger because disappointment episodes share some features with anger episodes.

A second prediction based on the familiarity model is that frequency judgments of very similar events should be higher than those of the same number of dissimilar events (Hintzman & Stern, 1978), because similar memories produce a stronger feeling of familiarity (but see Brown, 1995, for an alternative explanation). Therefore, the experience of similar anger episodes (e.g., always directed at one’s romantic partner) should lead to higher frequency judgments than the experience of anger in different contexts (e.g., directed at boss, partner, friends, and strangers).

7.3 Influence of the Response Format on Frequency Judgments of Emotions

The present dissertation bridged two research traditions: studies on the validity of self-reports of emotional experiences and experimental studies of the cognitive processes underlying frequency judgments. The preferred response format in the former tradition is the vague quantifier rating, whereas the latter tradition has preferred absolute estimates. Furthermore, a survey study which included both response formats obtained divergent results in the comparison of the two response formats (Schaeffer, 1991). This stimulated the idea to use both response formats within the same study (Study 1). Highly surprising and interesting results were obtained. The most interesting effect was that vague quantifier ratings decreased from pre- to post-diary judgments, whereas at the same time the absolute estimates increased. The decrease in vague quantifier ratings was replicated in Study 2. Similar findings have been reported in the literature (Knowles et al., 1996), but the effect is still not understood, although it has important practical implications. For example, in therapy evaluation studies, patients often have to report the frequency of their emotions before and after treatment, commonly by means of vague quantifier ratings. The present study suggests that changes can be expected in these measures not only because of treatment effects but also due to changes in the use of the rating scale. Because the ratings tend to drop, a questionnaire assessing mainly unpleasant affect could indicate a positive treatment effect (i.e., a decrease in the frequency of unpleasant emotional experiences), even if the treatment did not influence the actual frequencies of emotional experiences.

A second important difference between the two response formats was that averaged judgments of pleasant emotions and averaged judgments of unpleasant emotions were positively correlated for the absolute estimates, but not for the vague quantifier ratings, which yielded sometimes negative, sometimes positive, and sometimes non-significant correlations close to zero. Whereas the practical implications are discussed in the next section, the cognitive processes underlying these effects of the response format are discussed now.

First, one might ask during which stage of the frequency judgment process the effects occur (Brown, 1995). That is, do the different response formats influence the generation of frequency information (e.g., a familiarity signal), or do the response formats influence the conversion of this information into a response? It is likely that the influence of the response formats occurs during the second stage. During this stage, the participants are faced with the task of determining a reasonable range of absolute frequencies onto which the feeling of familiarity can be mapped. This part of the estimation process is likely to be difficult and error prone. The vague quantifier ratings do not require that the participants derive an absolute standard. The participants can simply map the different degrees of familiarity onto the categories of the response scale. This seems to suggest that vague quantifier ratings should be preferred. However, if participants use vague quantifier ratings simply to indicate the relative strength of their feeling of familiarity, the ratings can no longer be compared across participants. A rating of the highest category by one participant might reflect a very different absolute frequency than the same rating made by another participant. Therefore, vague quantifier ratings also do not solve the problem of how frequency information can be converted into a response that is comparable across participants. Furthermore, range-frequency theory has shown that even the assignment of absolute numbers printed on a sheet of paper is influenced by context effects such as the distribution of the numbers (Parducci & Wedel, 1986). Similar effects were obtained in the present study for vague quantifier ratings of emotion frequencies, but not for the absolute estimates. This finding would favor the absolute estimates.
Finally, Study 1 demonstrated that absolute estimates and vague quantifier ratings possessed the same amount of discriminative accuracy across participants when the judgments were made after the diary study; but before the diary study, the vague quantifier ratings outperformed the absolute estimates. In sum, it is not possible to recommend one of the two response formats over the other. Future research on the judgment process might help to reduce judgment errors, and might ultimately allow a rational choice of the best response format. Until then, a viable research strategy is to use both response formats, because each one is associated with different errors. As a consequence, the combined application of two short questionnaires with both response formats would produce more valid results than a long questionnaire with only one of the two response formats (Green et al., 1993).

7.4 The Structure of Individual Differences in the Frequencies of Pleasant and Unpleasant Emotional Experiences

The structure of individual differences in the frequencies of pleasant and unpleasant emotional experiences was not a central issue of the present investigations. However, important results were obtained that challenge current models of the personality structure of emotions. Currently, researchers assume that the frequencies of pleasant and unpleasant experiences of affect are independent (cf. Bradburn, 1969) or negatively correlated (Green et al., 1993). Furthermore, influential personality theories predict the frequencies of pleasant and unpleasant emotions to be independent (Costa & McCrae, 1992; Meyer & Shack, 1989; Watson & Clark, 1992), presumably because pleasant and unpleasant emotions are generated in different areas of the brain.

Studies 1 and 2 replicated previous results in that low correlations were obtained with the traditional response format, namely vague quantifier ratings. However, high positive correlations were obtained for absolute frequency estimates. Furthermore, regression analyses suggested that vague quantifier ratings produce an artifact, because the respondents use them partly to judge percentages of their emotional experiences and only partly to judge absolute frequencies of experienced emotions. As a consequence, individual differences in the overall number of emotional experiences are obscured and the correlation between frequencies of pleasant and unpleasant emotions becomes negative. This invalidates the conclusion of previous studies that the frequencies of pleasant and unpleasant emotional experiences are independent. The present study suggests that a person who experiences pleasant emotions often also experiences unpleasant emotions often. This finding is consistent with a study by Schimmack and Diener (in press), which also demonstrated a positive correlation between frequencies of pleasant and unpleasant emotions, derived from repeated ratings of emotional events in everyday life. Furthermore, the positive correlation between pleasant and unpleasant emotions is consistent with the positive correlation obtained for the number of pleasant and unpleasant events that people encounter in their lives (Suh et al., 1996).
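The logic of this artifact can be illustrated with invented data for four persons. The sketch considers only the extreme case in which ratings reflect nothing but the percentage of pleasant and unpleasant experiences; the frequencies are hypothetical, and the present data suggest that respondents use vague quantifiers only partly in this way.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical absolute frequencies for four persons: people who
# experience many pleasant emotions also experience many unpleasant ones.
pleasant = [10, 20, 30, 40]
unpleasant = [5, 12, 14, 22]

# Pure percentage judgments: each frequency relative to the person's total.
pleasant_share = [p / (p + u) for p, u in zip(pleasant, unpleasant)]
unpleasant_share = [u / (p + u) for p, u in zip(pleasant, unpleasant)]

r_absolute = pearson(pleasant, unpleasant)
r_share = pearson(pleasant_share, unpleasant_share)

print(round(r_absolute, 2))  # strongly positive
print(round(r_share, 2))     # -1.0, because the two shares sum to 1
```

Because the two percentages of each person sum to one, information about the overall number of emotional experiences is lost and the correlation is forced to be negative, although the absolute frequencies are positively correlated.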

It should be noted, however, that this finding is limited to the frequency of emotions. It is likely that it does not hold for moods. Considering only the frequency with which a person is in a pleasant or unpleasant mood, it is very likely that the two frequencies are negatively correlated, because (a) a person is nearly always in a pleasant or unpleasant state, and (b) at any moment in time feelings of pleasure and displeasure rarely co-occur. As a consequence, it is a logical necessity that pleasure and displeasure are highly negatively correlated, a fact which is sometimes obscured by measurement error (Green et al., 1993).

In sum, evidence is growing that the frequencies of pleasant and unpleasant emotions are positively correlated, whereas the number of times a person is in a pleasant mood is inversely related to the number of times he or she is in an unpleasant mood. Although the results of the present study should not be regarded as a final answer to this question, the divergent findings for the two response formats underscore the need to understand the processes underlying frequency judgments of emotions before these measures can be used to answer fundamental questions about the causes and the structure of individual differences in the frequency with which people experience emotions.

8 OUTLOOK

In closing, I would like to discuss two important questions for future research, namely (a) individual differences in the accuracy of frequency judgments of emotions and (b) the question of how the frequency of emotions should be assessed from a normative point of view.

8.1 Individual Differences in the Accuracy of Frequency Judgments of Emotions

In the present studies, accuracy scores of individuals were averaged to estimate the general accuracy of frequency judgments of emotions. However, the accuracy scores varied across participants. An important topic of future research would be to explore (a) whether these differences in accuracy are systematic, (b) whether it will be possible to assess a person’s level of accuracy, and (c) whether a person’s accuracy is related to existing constructs in the emotion literature, such as affect intensity (Larsen & Diener, 1987; Schimmack & Diener, in press), emotional intelligence (Mayer, DiPaolo, & Salovey, 1990), or alexithymia (Taylor, 1984).

People who experience emotions more intensely than others might tend to overestimate the frequency of their emotions, or in the light of a consistent trend towards underestimation, at least underestimate the frequency of their emotions less, because intense emotional experiences are more memorable (Rapaport, 1942; Holmes, 1970). In contrast, people with high alexithymia scores might severely underestimate the frequency of their emotions, because they have problems labeling their emotional experiences, and the present studies found that labeling experiences increased frequency judgments when the same label was part of the frequency question. Finally, “emotionally intelligent” persons can be expected to be more accurate in their self-perceptions than others, because they pay more attention to their emotional experiences. However, it is also possible that traditional personality measures do not capture biases in the memory representation of emotions very well. Given the assumption that systematic individual differences in the accuracy of frequency judgments of emotions exist, this would require the development of new questionnaires that measure these differences.

8.2 Toward a Normative Assessment of the Frequency of Emotions

The present studies explored how people judge the frequency of emotions. An equally important question is how the frequency of emotions can be measured with the highest degree of accuracy by means of an economical research instrument at one moment in time. This question arises because an on-line recording over a long time period simply is not a viable option in most assessment situations, although it would be the best strategy from a normative perspective. The present studies suggest that people rely on a sense of familiarity when they judge the frequency of emotions and that they do not use the ease-of-retrieval or a recall-estimate heuristic. However, the fact that most people do not use these strategies, does not imply that they cannot be used. To the contrary, people can recall individual episodes (Fitzgerald et al., 1988) and people can judge the ease-of-retrieval (Schwarz et al., 1991). As a consequence, an important question of future research is whether these strategies would lead to better estimates of the frequency of emotions. For example, in two studies, Means, Swan, Jobe, and Esposito (1994) asked participants to record the number of smoked cigarettes for a period of five days. Afterwards, the participants were asked to estimate the number of smoked cigarettes on one of these five days. Furthermore, they were instructed to use one of four strategies, namely (a) to use any strategy that they wanted, (b) to provide a spontaneous estimate without thinking of particular instances of smoking, (c) to think of different contexts (in the office, after dinner) and then to sum these separate estimates, and (d) to try to recall as many instances as possible. In this study, the recall of exemplars appeared to be a better measure than the spontaneous estimation strategy, in which judgments were probably based on a familiarity signal. But different findings might be obtained for frequency judgments of emotions, especially when the time period is longer than one day. 
To address this question, one needs a measure of the actual frequencies of emotions as a validation criterion, which is not biased in favor of any of the estimation strategies under investigation. Both the diary and the scenario rating task could be used to assess actual frequencies. In the beginning, the SRT is preferable because it is more economical than a diary study. Ultimately, however, it is necessary to compare different estimation strategies against actual frequencies of emotions in real life.

9 REFERENCES

Alba, J. W., Chromiak, W., Hasher, L., & Attig, M. S. (1980). Automatic encoding of category size information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 6, 370-378.
Andreasen, N. C., & Black, D. W. (1991). Lehrbuch Psychiatrie [Textbook on Psychiatry]. Weinheim: Beltz.
Barsalou, L. W., & Ross, B. H. (1986). The roles of automatic and strategic processing in sensitivity to superordinate and property frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 116-134.
Blair, E., & Burton, S. (1987). Cognitive processes used by survey respondents to answer behavioral frequency questions. Journal of Consumer Research, 14, 280-288.
Blair, E., & Williamson, K. (1994). On providing population data to respondents. In N. Schwarz & S. Sudman (Eds.), Autobiographical memory and the validity of retrospective reports (pp. 173-186). New York: Springer.
Blaney, P. H. (1986). Affect and memory: A review. Psychological Bulletin, 99, 229-246.
Borkenau, P., & Ostendorf, F. (1991). Ein Fragebogen zur Erfassung fünf robuster Persönlichkeitsfaktoren [A questionnaire for the assessment of five robust personality factors]. Diagnostica, 37, 29-41.
Bower, G. H. (1981). Mood and memory. American Psychologist, 36, 129-148.
Bradburn, N. M. (1969). The structure of psychological well-being. Chicago: Aldine.
Brewin, C. R., Andrews, B., & Gotlib, I. H. (1993). Psychopathology and early experience: A reappraisal of retrospective reports. Psychological Bulletin, 113, 82-98.
Briggs, J. L. (1970). Never in anger. Cambridge, MA: Harvard University.
Briggs, J. L. (1987). In search of emotional meaning. Ethos, 15, 8-15.
Brown, N. R. (1995). Estimation strategies and the judgment of event frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1539-1553.
Brown, N. R., & Siegler, R. S. (1993). Metrics and mappings: A framework for understanding real-world quantitative estimation. Psychological Review, 100, 511-534.
Bruce, D., Hockley, W. E., & Craik, F. I. M. (1991). Availability and category-frequency estimation. Memory and Cognition, 19, 301-312.
Clore, G. L. (1994). Why emotions require cognition. In P. Ekman & R. J. Davidson (Eds.), The nature of emotion (pp. 181-191). New York: Oxford University Press.
Cohen, D. (1996). Law, social policy, and violence: The impact of regional cultures. Journal of Personality and Social Psychology, 70, 961-978.
Costa, P. T., & McCrae, R. R. (1980). Influence of extraversion and neuroticism on subjective well-being: Happy and unhappy people. Journal of Personality and Social Psychology, 38, 668-678.
Costa, P. T., & McCrae, R. R. (1992). The revised NEO personality inventory (NEO-PI-R) professional manual. Odessa, FL: Psychological Assessment Resources.
Cutler, S. E., Larsen, R. J., & Bunce, S. C. (1996). Repressive coping style and the experience and recall of emotion: A naturalistic study of daily affect. Journal of Personality, 64, 379-405.
Diener, E. (1984). Subjective well-being. Psychological Bulletin, 95, 542-575.
Diener, E., & Diener, M. (1995). Cross-cultural correlates of life satisfaction and self-esteem. Journal of Personality and Social Psychology, 68, 653-663.
Diener, E., Diener, M., & Diener, C. (1995). Factors predicting the subjective well-being of nations. Journal of Personality and Social Psychology, 69, 851-864.
Diener, E., & Iran-Nejad, A. (1986). The relationship in experience between different types of affect. Journal of Personality and Social Psychology, 50, 1031-1038.
Diener, E., & Larsen, R. J. (1984). Temporal stability and cross-situational consistency of affective, behavioral, and cognitive responses. Journal of Personality and Social Psychology, 47, 871-883.
Diener, E., Larsen, R. J., & Emmons, R. A. (1984). Bias in mood recall in happy and unhappy persons. Paper delivered at the 92nd Annual Meeting of the American Psychological Association, Toronto, August 1984.
Diener, E., Larsen, R. J., Levine, S., & Emmons, R. A. (1985). Intensity and frequency: The underlying dimensions of positive and negative affect. Journal of Personality and Social Psychology, 48, 1253-1265.
Diener, E., Sandvik, E., & Pavot, W. (1991). Happiness is the frequency, not the intensity, of positive versus negative affect. In F. Strack, M. Argyle, & N. Schwarz (Eds.), Subjective well-being (pp. 119-139). Oxford: Pergamon Press.
Diener, E., Smith, H., & Fujita, F. (1995). The personality structure of affect. Journal of Personality and Social Psychology, 69, 130-141.
Ekman, P., & Davidson, R. J. (Eds.). (1994). The nature of emotion. New York: Oxford University Press.
Emmons, R. A., & Diener, E. (1986a). A goal-affect analysis of everyday situational choices. Journal of Research in Personality, 20, 309-326.
Emmons, R. A., & Diener, E. (1986b). Influence of impulsivity and sociability on subjective well-being. Journal of Personality and Social Psychology, 50, 1211-1215.
Epstein, S. (1983). A research paradigm for the study of personality and emotions. In M. M. Page (Ed.), Personality – Current theory and research: 1982 Nebraska symposium on motivation (pp. 91-154). Lincoln: University of Nebraska Press.
Fehr, B., & Russell, J. A. (1984). Concept of emotion viewed from a prototype perspective. Journal of Experimental Psychology: General, 113, 464-486.
Feldman Barrett, L. (in press). The relationships among momentary emotional experiences, personality descriptions, and retrospective ratings of emotion. Personality and Social Psychology Bulletin.
Fiedler, K. (1991). The tricky nature of skewed frequency tables: An information loss account of distinctiveness-based illusory correlations. Journal of Personality and Social Psychology, 60, 24-36.
Fiedler, K., & Armbruster, T. (1994). Two halfs may be more than one whole: Category-split effects on frequency illusions. Journal of Personality and Social Psychology, 66, 633-645.
Fiske, S. T., & Taylor, S. E. (1984). Social cognition. Reading, MA: Addison-Wesley.
Fitzgerald, J. M., Slade, S., & Lawrence, R. (1988). Memory availability and judged frequency of affect. Cognitive Therapy and Research, 12, 379-390.
Frijda, N. H., Ortony, A., Sonnemans, J., & Clore, G. L. (1992). The complexity of intensity: Issues concerning the structure of emotion intensity. In M. S. Clark (Ed.), Review of personality and social psychology: Emotion (Vol. 13, pp. 60-89). Newbury Park, CA: Sage.
Gabrielcik, A., & Fazio, R. H. (1984). Priming and frequency estimation: A strict test of the availability heuristic. Personality and Social Psychology Bulletin, 10, 85-89.
Green, D. P., Goldman, S. L., & Salovey, P. (1993). Measurement error masks bipolarity in affect ratings. Journal of Personality and Social Psychology, 64, 1029-1041.
Greene, R. L. (1989). On the relationship between categorical frequency estimation and cued recall. Memory and Cognition, 17, 235-239.
Hanson, C., & Hirst, W. (1988). Frequency encoding of token and type information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 289-297.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108, 356-388.
Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information. American Psychologist, 39, 1372-1388.
Hastie, R., & Park, B. (1986). The relationship between memory and judgment depends on whether the judgment task is memory-based or on-line. Psychological Review, 93, 258-268.
Haubensak, G. (1994). Wie entsteht der Häufigkeitseffekt in absoluten Urteilen? [On the origin of the frequency effect in absolute judgments]. Zeitschrift für experimentelle und angewandte Psychologie, 16, 378-397.
Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple trace memory model. Psychological Review, 95, 528-551.
Hintzman, D. L., & Block, R. A. (1971). Repetition and memory: Evidence for a multiple-trace hypothesis. Journal of Experimental Psychology, 88, 297-306.
Hintzman, D. L., & Curran, T. (1994). Retrieval dynamics of recognition and frequency judgments: Evidence for separate processes of familiarity and recall. Journal of Memory and Language, 33, 1-18.
Hintzman, D. L., Curran, T., & Oppy, B. (1992). Effects of similarity and repetition on memory: Registration without learning? Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 667-680.
Hintzman, D. L., & Stern, L. D. (1978). Contextual variability and memory for frequency. Journal of Experimental Psychology: Human Learning and Memory, 4, 539-549.
Hofstede, G. (1980). Culture’s consequences. Beverly Hills, CA: Sage.
Holmes, D. S. (1970). Differential change in affective intensity and the forgetting of unpleasant personal experiences. Journal of Personality and Social Psychology, 15, 234-239.
Howell, W. C. (1973). Representation of frequency in memory. Psychological Bulletin, 80, 44-53.
Isen, A. M. (1985). Asymmetry of happiness and sadness effects on memory in normal college students: Comments on Hasher, Rose, Zacks, Sanft, and Doren. Journal of Experimental Psychology: General, 114, 388-391.
Izard, C. E., Libero, D. Z., Putnam, P., & Haynes, O. M. (1993). Stability of emotion experiences and their relations to traits of personality. Journal of Personality and Social Psychology, 64, 847-860.
James, W. (1884). What is emotion? Mind, 9, 188-205.
Jones, C. M., & Heit, E. (1993). An evaluation of the total similarity principle: Effects of similarity on frequency judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 799-812.
Jonides, J., & Jones, C. M. (1992). Direct coding for frequency of occurrence. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 368-378.
Keppel, G. (1991). Design and analysis: A researcher’s handbook (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Knowles, E. S., Coker, M. C., Scott, R. A., Cook, D. A., & Neville, J. W. (1996). Measurement-induced improvement in anxiety: Mean shifts with repeated assessment. Journal of Personality and Social Psychology, 71, 352-363.
Larsen, R. J. (1992). Neuroticism and selective encoding and recall of symptoms: Evidence from a combined concurrent-retrospective study. Journal of Personality and Social Psychology, 62, 480-488.
Larsen, R. J., & Diener, E. (1987). Affect intensity as an individual difference characteristic: A review. Journal of Research in Personality, 21, 1-39.
Larsen, R. J., & Diener, E. (1992). Promises and problems with the circumplex model of emotion. In M. S. Clark (Ed.), Review of personality and social psychology: Emotion (Vol. 13, pp. 25-59). Newbury Park, CA: Sage.
Lazarus, R. S. (1991). Emotion and adaptation. New York: Oxford University Press.
Lewinsohn, P. M., & Rosenbaum, M. (1987). Recall of parental behavior by acute depressives, remitted depressives and nondepressives. Journal of Personality and Social Psychology, 52, 611-619.
MacLeod, A. K., Andersen, A., & Davies, A. (1994). Self-ratings of positive and negative affect and retrieval of positive and negative affective memories. Cognition and Emotion, 8, 483-488.
Manis, M., Shedler, J., Jonides, J., & Nelson, T. E. (1993). Availability heuristic in judgments of set size and frequency of occurrence. Journal of Personality and Social Psychology, 65, 448-457.
Markus, H. R., & Kitayama, S. (1994). The cultural construction of self and emotion: Implications for social behavior. In S. Kitayama & H. R. Markus (Eds.), Emotion and culture (pp. 89-130). Washington, DC: APA.
Martin, M. (1985). Neuroticism as predisposition toward depression: A cognitive mechanism. Personality and Individual Differences, 6, 353-365.
Matthews, G., Jones, D. M., & Chamberlain, A. G. (1990). Refining the measurement of mood: The UWIST Mood Adjective Checklist. British Journal of Psychology, 81, 17-42.
Mayer, J. D., DiPaolo, M., & Salovey, P. (1990). Perceiving affective content in ambiguous visual stimuli: A component of emotional intelligence. Journal of Personality Assessment, 54, 772-781.
Means, B., Swan, G. E., Jobe, J. B., & Esposito, J. L. (1994). The effects of estimation strategies on the accuracy of respondents’ reports of cigarette smoking. In N. Schwarz & S. Sudman (Eds.), Autobiographical memory and the validity of retrospective reports (pp. 107-120). New York: Springer.
Menon, G. (1994). Judgments of behavioral frequencies: Memory search and retrieval strategies. In N. Schwarz & S. Sudman (Eds.), Autobiographical memory and the validity of retrospective reports (pp. 161-172). New York: Springer.
Mesquita, B., & Frijda, N. H. (1992). Cultural variations in emotions: A review. Psychological Bulletin, 112, 176-204.
Metcalfe, J. (1993). Novelty monitoring, metacognition and control in a composite holographic associative recall model: Implications for Korsakoff amnesia. Psychological Review, 100, 3-22.
Meudall, P. R. (1971). Retrieval and representations in long-term memory. Psychonomic Science, 23, 295-296.
Meyer, G. J., & Shack, J. R. (1989). Structural convergence of mood and personality: Evidence for old and new directions. Journal of Personality and Social Psychology, 57, 691-706.
Mingay, D. J., Shevell, K., Bradburn, N. M., & Ramirez, C. (1994). Self and proxy reports of everyday events. In N. Schwarz & S. Sudman (Eds.), Autobiographical memory and the validity of retrospective reports (pp. 235-250). New York: Springer.
Naveh-Benjamin, M., & Jonides, J. (1986). On the automaticity of frequency encoding: Effects of competing task load, encoding strategy, and intention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 378-386.
Nelson, T. O. (1988). Predictive accuracy of the feeling of knowing across different criterion tasks and across different subject populations and individuals. In M. M. Gruneberg, P. E. Morris, & R. N. Sykes (Eds.), Practical aspects of memory: Current research and issues (Vol. 1, pp. 190-196). New York: Wiley.
Oatley, K., & Johnson-Laird, P. N. (1987). Towards a cognitive theory of emotions. Cognition and Emotion, 1, 29-50.
Parducci, A. (1968). The relativism of absolute judgments. Scientific American, 219, 84-90.
Parducci, A., & Wedell, D. H. (1986). The category effect with rating scales: Number of categories, number of stimuli, and method of presentation. Journal of Experimental Psychology: Human Perception and Performance, 12, 496-516.
Parkinson, B., Briner, R. B., Reynolds, S., & Totterdell, P. (1995). Time frames of mood: Relations between momentary and generalized ratings of affect. Personality and Social Psychology Bulletin, 21, 331-339.
Parrott, W. G., & Sabini, J. (1990). Mood and memory under natural conditions: Evidence for mood incongruent recall. Journal of Personality and Social Psychology, 59, 321-326.
Pavot, W., & Diener, E. (1993). Review of the Satisfaction With Life Scale. Psychological Assessment, 5, 164-172.
Pavot, W., Diener, E., & Fujita, F. (1990). Extraversion and happiness. Personality and Individual Differences, 11, 1299-1306.
Pekrun, R., & Frese, M. (1992). Emotions in work and achievement. In C. L. Cooper & I. T. Robertson (Eds.), International Review of Industrial and Organizational Psychology (Vol. 7, pp. 153-200). New York: Wiley.
Pepper, S. (1981). Problems in the quantification of frequency expressions. In D. W. Fiske (Ed.), Problems with language imprecision (pp. 25-41). San Francisco: Jossey-Bass.
Rapaport, D. (1942). Emotions and memory. New York: International Universities Press.
Reder, L. M. (1987). Selection strategies in question answering. Cognitive Psychology, 19, 90-138.
Reisenzein, R. (1995). On Oatley and Johnson-Laird’s theory of emotion and hierarchical structures in the affective lexicon. Cognition and Emotion, 9, 383-416.
Reisenzein, R., & Hofmann, T. (1993). Discriminating emotions from appraisal-relevant situational information: Baseline data for structural models of cognitive appraisals. Cognition and Emotion, 7, 271-293.
Reisenzein, R., & Schimmack, U. (1996). Similarity and covariation of affects: Findings and implications. Manuscript submitted for publication.
Reisenzein, R., & Schönpflug, W. (1992). Stumpf’s cognitive-evaluative theory of emotion. American Psychologist, 47, 34-45.
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104, 192-223.
Schaeffer, N. C. (1991). Hardly ever or constantly? Group comparisons using vague quantifiers. Public Opinion Quarterly, 55, 395-423.
Scherer, K. R., Wallbott, H. G., & Summerfield, A. B. (Eds.). (1986). Experiencing emotion: A cross-cultural study. Cambridge: Cambridge University Press.
Schimmack, U. (1996). Resolving some controversies about the mood circumplex. Paper presented at the Annual Meetings of the Midwestern Psychological Association, Chicago, May, 1996.
Schimmack, U. (1996). The relation between extraversion/neuroticism and positive/negative affect: A meta-analysis. Manuscript in preparation.
Schimmack, U. (in press). Das Berliner-Alltagssprachliche-Stimmungsinventar (BASTI): Ein Vorschlag zur kontentvaliden Erfassung von Stimmungen [The Berlin Everyday Language Mood Inventory: Toward the content valid assessment of moods]. Diagnostica.
Schimmack, U., & Diener, E. (in press). Affect Intensity: Separating intensity and frequency in repeatedly measured affect. Journal of Personality and Social Psychology.
Schimmack, U., & Reisenzein, R. (1994). On the demarcation of the mood domain. Paper presented at the symposium, “Mood – Consensus and controversy” at the 102nd Annual Convention of the American Psychological Association, Los Angeles, CA.
Schimmack, U., & Reisenzein, R. (in press). Cognitive processes involved in similarity judgments of emotion concepts. Journal of Personality and Social Psychology.
Schimmack, U., & Siemer, M. (1995). e = m x c! Über Emotionen, Stimmungen und Kognitionen [On emotion, mood and cognition]. Positionsreferat gehalten auf der 37. TeaP in Bochum, 1995.
Schwarz, N. (1987). Stimmung als Information [Mood as information]. Heidelberg: Springer.
Schwarz, N. (1990). Assessing frequency reports of mundane behaviors: Contributions of cognitive psychology to questionnaire construction. In C. Hendrick & M. S. Clark (Eds.), Research methods in personality and social psychology (pp. 98-119). Beverly Hills, CA: Sage.
Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991). Ease of retrieval as information: Another look at the availability heuristic. Journal of Personality and Social Psychology, 61, 195-202.
Schwarz, N., & Clore, G. L. (1983). Mood, misattribution, and judgments of well-being: Informative and directive functions of affective states. Journal of Personality and Social Psychology, 45, 513-523.
Schwarz, N., Strack, F., Müller, G., & Chassein, B. (1988). The range of response alternatives may determine the meaning of the question: Further evidence on information functions of response alternatives. Social Cognition, 6, 107-117.
Shaver, P., Schwartz, J., Kirson, D., & O’Connor, C. (1987). Emotion knowledge: Further exploration of a prototype approach. Journal of Personality and Social Psychology, 52, 1061-1086.
Shrout, P. E., & Fleiss, J. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Smith, E. R. (1991). Illusory correlation in a simulated exemplar-based memory. Journal of Experimental Social Psychology, 27, 107-123.
Smith, E. R., & Zarate, M. A. (1992). Exemplar-based model of social judgment. Psychological Review, 99, 3-21.
Steyer, R., Schwenkmezger, P., Notz, P., & Eid, M. (1994). Testtheoretische Analysen des Mehrdimensionalen Befindlichkeitsfragebogens [Test theoretical analyses of the multidimensional state questionnaire]. Diagnostica, 40, 320-328.
Suh, E., Diener, E., & Fujita, F. (1996). Events and subjective well-being: Only recent events matter. Journal of Personality and Social Psychology, 70, 1091-1102.
Taylor, G. J. (1984). Alexithymia: Concept, measurement, and implications for treatment. American Journal of Psychiatry, 141, 725-732.
Temme, G., & Tränkle, U. (1996). Arbeitsemotionen: Ein vernachlässigter Aspekt in der Arbeitszufriedenheitsforschung [Emotions at the workplace: A neglected aspect in research on job satisfaction]. Arbeit, 5, 275-297.
Thomas, D. L., & Diener, E. (1990). Memory accuracy in the recall of emotions. Journal of Personality and Social Psychology, 59, 291-297.
Thompson, C. P., & Mingay, D. (1991). Estimating the frequency of everyday events. Applied Cognitive Psychology, 5, 497-510.
Triandis, H. C. (1994). Culture and social behavior. New York: McGraw-Hill.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.
Underwood, B. J. (1969). The attributes of memory. Psychological Review, 76, 559-573.
Watkins, M. J., & LeCompte, D. C. (1991). Inadequacy of recall as a basis for frequency knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 1161-1176.
Watson, D., & Clark, L. A. (1992). On traits and temperaments: General and specific factors of emotional experience and their relation to the five-factor model. Journal of Personality, 60, 441-476.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of Positive and Negative Affect: The PANAS Scales. Journal of Personality and Social Psychology, 54, 1063-1070.
Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.
Williams, K. W., & Durso, F. T. (1986). Judging category frequency: Automaticity or availability? Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 387-396.
Windle, C. (1955). Test-retest effect on personality questionnaires. Educational and Psychological Measurement, 15, 246-253.
Wright, D. B., Gaskell, G. D., & O’Muircheartaigh, C. A. (1994). How much is ‘Quite a bit’? Mapping between numerical values and vague quantifiers. Applied Cognitive Psychology, 8, 479-498.
Zuroff, D. C. (1989). Judgments of frequency of social stimuli: How schematic is person memory? Journal of Personality and Social Psychology, 56, 890-898.