Aber bitte ohne Sanna


Social psychologists have failed to clean up their act and their literature. Here I show unusually large effect sizes in non-retracted articles by Sanna, an author who has had several articles retracted for data fabrication. I point out that non-retraction does not equal credibility, and I show that co-authors like Norbert Schwarz lack any motivation to correct the published record. The inability of social psychologists to acknowledge and correct their mistakes renders social psychology a para-science that lacks credibility. Even meta-analyses cannot be trusted because they do not properly correct for the use of questionable research practices.


When I grew up, a popular German Schlager was the song “Aber bitte mit Sahne.” The song is about Germans’ love of desserts with whipped cream. So, when I saw articles by Sanna, I had to think of whipped cream, which is delicious. Unfortunately, articles by Sanna are the exact opposite. In the early 2010s, it became apparent that Sanna had fabricated data. However, unlike the thorough investigation of a similar case in the Netherlands, the extent of Sanna’s fraud remains unclear (Retraction Watch, 2012). The latest count of Sanna’s retracted articles was 8 (Retraction Watch, 2013).

WebOfScience shows 5 retraction notices for 67 articles, which means 62 articles have not been retracted. The question is whether these articles can be trusted to provide valid scientific information. The answer matters because Sanna’s articles are still being cited at a rate of over 100 citations per year.

Meta-Analysis of Ease of Retrieval

The data are also being used in meta-analyses (Weingarten & Hutchinson, 2018). Fraudulent data are particularly problematic for meta-analysis because fabricated results with large effect sizes inflate the meta-analytic effect size estimate. Here I report the results of my own investigation, which focuses on the ease-of-retrieval paradigm that was developed by Norbert Schwarz and colleagues (Schwarz et al., 1991).
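The mechanics of this inflation are easy to demonstrate with a toy fixed-effect (inverse-variance weighted) pooling. All numbers below are invented for illustration; they are not the actual meta-analytic data.

```python
import numpy as np

# Hypothetical illustration (numbers invented): a fixed-effect,
# inverse-variance weighted pooled estimate is pulled upward by a
# handful of inflated effect sizes.
d = np.array([0.35, 0.40, 0.45, 0.38, 0.42, 1.10, 1.20, 1.15])  # last 3 inflated
v = np.full_like(d, 0.04)   # equal sampling variances for simplicity
w = 1.0 / v                 # inverse-variance weights

print(np.average(d, weights=w))          # 0.68125 with the inflated studies
print(np.average(d[:5], weights=w[:5]))  # 0.4 without them
```

With equal variances the pooled estimate is just the mean, so three inflated studies are enough to shift the estimate from d = .4 to d = .68.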

The meta-analysis included 7 studies from 6 articles. Two of these studies produced independent effect size estimates for two conditions, for a total of 9 effect sizes.

Sanna, L. J., Schwarz, N., & Small, E. M. (2002). Accessibility experiences and the hindsight bias: I knew it all along versus it could never have happened. Memory & Cognition, 30(8), 1288–1296. https://doi.org/10.3758/BF03213410 [Study 1a, 1b]

Sanna, L. J., Schwarz, N., & Stocker, S. L. (2002). When debiasing backfires: Accessible content and accessibility experiences in debiasing hindsight. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(3), 497–502. https://doi.org/10.1037/0278-7393.28.3.497
[Study 1 & 2]

Sanna, L. J., & Schwarz, N. (2003). Debiasing the hindsight bias: The role of accessibility experiences and (mis)attributions. Journal of Experimental Social Psychology, 39(3), 287–295. https://doi.org/10.1016/S0022-1031(02)00528-0 [Study 1]

Sanna, L. J., Chang, E. C., & Carter, S. E. (2004). All Our Troubles Seem So Far Away: Temporal Pattern to Accessible Alternatives and Retrospective Team Appraisals. Personality and Social Psychology Bulletin, 30(10), 1359–1371. https://doi.org/10.1177/0146167204263784
[Study 3a]

Sanna, L. J., Parks, C. D., Chang, E. C., & Carter, S. E. (2005). The Hourglass Is Half Full or Half Empty: Temporal Framing and the Group Planning Fallacy. Group Dynamics: Theory, Research, and Practice, 9(3), 173–188. https://doi.org/10.1037/1089-2699.9.3.173 [Study 3a, 3b]

Carter, S. E., & Sanna, L. J. (2008). It’s not just what you say but when you say it: Self-presentation and temporal construal. Journal of Experimental Social Psychology, 44(5), 1339–1345. https://doi.org/10.1016/j.jesp.2008.03.017 [Study 2]

When I examined Sanna’s results, I found that all 9 effect sizes were extremely large, with estimates exceeding one standard deviation. A logistic regression analysis that predicted authorship (with Sanna vs. without Sanna) from effect size showed that the large effect sizes in Sanna’s articles were unlikely to be due to sampling error alone, b = 4.6, se = 1.1, t(184) = 4.1, p = .00004 (1 / 24,642).
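Because the underlying effect sizes are not reproduced here, the structure of this test can only be sketched. The snippet below fits the same kind of logistic model to simulated data; all numbers (group sizes, means, spreads) are hypothetical, chosen only to mimic the described pattern of many ordinary effect sizes and a few very large ones.

```python
import numpy as np

def logit_fit(X, y, iters=25):
    """Plain Newton-Raphson logistic regression; returns (beta, se)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        W = p * (1.0 - p)                     # IRLS variance weights
        H = X.T @ (X * W[:, None])            # observed information matrix
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))   # asymptotic standard errors
    return beta, se

rng = np.random.default_rng(0)
# Simulated effect sizes: 180 ordinary studies around d = .4,
# 9 "Sanna" studies around d = 1.0 (illustrative values only).
d = np.concatenate([rng.normal(0.4, 0.3, 180), rng.normal(1.0, 0.2, 9)])
author = np.concatenate([np.zeros(180), np.ones(9)])  # 1 = Sanna

X = np.column_stack([np.ones_like(d), d])  # intercept + effect size
beta, se = logit_fit(X, author)
print(f"slope b = {beta[1]:.2f}, se = {se[1]:.2f}")
```

A positive slope means larger effect sizes make Sanna authorship more likely; under a simulated difference this large, the slope is reliably positive, which is the pattern the reported analysis describes.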

These results show that Sanna’s effect sizes are not typical of the ease-of-retrieval literature. As one of his retracted articles used the ease-of-retrieval paradigm, it is possible that these articles are equally untrustworthy. As many other studies have investigated ease-of-retrieval effects, it seems prudent to exclude articles by Sanna from future meta-analyses.

These articles should also not be cited as evidence for specific claims about ease-of-retrieval effects under the specific conditions that were used in these studies. As the meta-analysis shows, there have been no credible replications of these studies, and it remains unknown how much of a role ease of retrieval plays under the conditions specified in Sanna’s articles.


This blog post is also a warning for young scientists and students of social psychology that they cannot trust researchers who became famous with the help of questionable research practices that produced too many significant results. As the reference list shows, several articles by Sanna were co-authored by Norbert Schwarz, the inventor of the ease-of-retrieval paradigm. It is most likely that he was unaware of Sanna’s fraudulent practices. However, he seemed to lack any concern that the results might be too good to be true. After all, he had encountered replication failures in his own lab.

“of course, we had studies that remained unpublished. Early on we experimented with different manipulations. The main lesson was: if you make the task too blatantly difficult, people correctly conclude the task is too difficult and draw no inference about themselves. We also had a couple of studies with unexpected gender differences” (Schwarz, email communication, 5/18/21).

So, why was he not suspicious when Sanna produced only successful results? I wondered whether Schwarz, aided by hindsight bias, had developed some doubts about these studies. After all, a decade or more later, we know that Sanna committed fraud for some articles on this topic, we know about replication failures in larger samples (Yeager et al., 2019), and we know that the true effect sizes are much smaller than Sanna’s reported effect sizes (Weingarten & Hutchinson, 2018).

Hi Norbert, 
   thank you for your response. I am doing my own meta-analysis of the literature as I have some issues with the published one by Evan. More about that later. For now, I have a question about some articles that I came across, specifically Sanna, Schwarz, and Small (2002). The results in this study are very strong (d ~ 1).  Do you think a replication study powered for 95% power with d = .4 (based on meta-analysis) would produce a significant result? Or do you have concerns about this particular paradigm and do not predict a replication failure?
Best, Uli (email
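As an aside, the sample size implied by the power target in that email can be checked directly: with d = .4, 95% power, and a two-sided α = .05, the normal approximation to the two-sample t-test gives roughly 163 participants per group. A stdlib-only Python sketch (my own illustration, not part of the exchange):

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.95, alpha=0.05):
    """Per-group n for a two-sample t-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(n_per_group(0.4))  # → 163 per group (326 total)
```

The point of the question now becomes concrete: a study of this size would almost certainly succeed if d = .4 were the true effect, so a failure would be informative.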

His response shows that he is unwilling or unable to even consider the possibility that Sanna used fraud to produce the results in this article that he co-authored.

Uli, that paper has 2 experiments, one with a few vs many manipulation and one with a facial manipulation.  I have no reason to assume that the patterns won’t replicate. They are consistent with numerous earlier few vs many studies and other facial manipulation studies (introduced by Stepper & Strack,  JPSP, 1993). The effect sizes always depend on idiosyncracies of topic, population, and context, which influence accessible content and accessibility experience. The theory does not make point predictions and the belief that effect sizes should be identical across decades and populations is silly — we’re dealing with judgments based on accessible content, not with immutable objects.  

This response is symptomatic of social psychologists’ response to decades of research that has produced questionable results that often fail to replicate (see Schimmack, 2020, for a review). Even when there is clear evidence of questionable practices, journals are reluctant to retract articles that make false claims based on invalid data (Kitayama, 2020). And social psychologist Daryl Bem would rather be remembered as a loony para-psychologist than as a real scientist (Bem, 2021).

The problem with these social psychologists is not that they made mistakes in the way they conducted their studies. The problem is their inability to acknowledge and correct their mistakes. While they cling to their CVs and H-indices to protect their self-esteem, they further erode trust in psychology as a science and force junior scientists who want to improve things out of academia (Hilgard, 2021). After all, the key feature that distinguishes science from ideology is the ability to correct itself. A science that shows no signs of self-correction is a para-science, not a real science. Thus, social psychology is currently a para-science (i.e., “Parascience is a broad category of academic disciplines that are outside the scope of scientific study,” Wikipedia).

The only hope for social psychology is that young researchers are unwilling to play by the old rules and start a credibility revolution. However, the incentives still favor conformists who suck up to the old guard. Thus, it is unclear if social psychology will ever become a real science. A first sign of improvement would be to retract articles that make false claims based on results that were produced with questionable research practices. Instead, social psychologists continue to write review articles that ignore the replication crisis (Schwarz & Strack, 2016) as if repression can bend reality.

Nobody should believe them.

6 thoughts on “Aber bitte ohne Sanna”

  1. Uli Schimmack’s post comments on two topics: the effect size of ease of retrieval effects and Lawrence Sanna. I address both in turn.

    The size of ease of retrieval effects

    Ease of retrieval effects are part of a larger research program that investigates the interplay of declarative information (what comes to mind) and experiential information, here in the form of the metacognitive experiences that accompany retrieval or thought generation (how easy or difficult it is to bring to mind). How easily information comes to mind is influenced by a wide range of variables, including the person’s knowledge about the topic, how much time has passed since the person last thought about the topic, whether contextual cues bring related information to mind, etc. Hence, effect sizes will vary by topic, population, context, and so on. The theoretical rationale of ease of retrieval effects predicts the pattern, but not the effect size in a specific case. My email to Uli Schimmack contained an attachment (linked below) that addresses these issues. He preferred not to engage with conceptual issues and instead proceeded as if there were plausible reasons to assume that the size of accessibility effects is independent of variables that influence accessibility. Why he would find that assumption plausible remains a mystery.

    The substantive comments I shared with Uli Schimmack, and which he chose not to address, are here:

    I’m also surprised by Schimmack’s interpretation of the part of my email that he did share. I responded to his question whether my lab had any unpublished studies (in a research program dating back to the mid 1980s). He cited:

    “of course, we had studies that remained unpublished. Early on we experimented with different manipulations. The main lesson was: if you make the task too blatantly difficult, people correctly conclude the task is too difficult and draw no inference about themselves. We also had a couple of studies with unexpected gender differences” (Schwarz, email communication, 5/18,21).

    The above FAQ link (http://tiny.cc/easeFAQ) provides a substantive discussion of these issues. Consider for example the issue of gender differences. We observed early on that women remembered more about their relationships than men did. Hence, reporting five examples of a partner’s behavior was easier for women than for men. That alerted us to the need for population and topic specific pilot work to ensure that tasks are experienced similarly. In this case, differential relationship knowledge would have resulted in differential metacognitive experiences of men and women, inviting the misleading conclusion that men and women differ in how they use metacognitive experiences when forming a judgment. This is an example of learning about one’s procedures and manipulations rather than rushing to publish things without considering if they bear on the theoretically relevant construct — that construct is the experience of ease of retrieval, not the amount of knowledge stored in memory. For Schimmack, this kind of conceptual and empirical carefulness is apparently a sign of bad practice and becomes a case of an unreported “nonreplication”.

    Throughout, Schimmack’s logic confuses the technical manipulation (the number of examples people are asked to generate) with the theoretically specified variable (the subjective experience of ease or difficulty). The goal of this line of research is to understand an underlying psychological process: What do people infer from the metacognitive experiences that accompany their own recall and thought process? We address this issue with different manipulations (which have converged on parallel results). Treating any one of these manipulations as if the manipulation itself were the psychological issue under study is a conceptual mistake.

    In the end, Uli Schimmack’s comments highlight that one cannot evaluate psychological research without thinking about the psychological processes that the research investigates. A mere focus on effect sizes (based on the assumption that they should be invariant across topics, populations, time, and contexts) does not advance our understanding of the interplay of feeling and thinking or any other process. Conceptual negligence of this type does not advance insight. One can only hope that the “young researchers” Schimmack tries to reach in his post will complement their stats expertise with comparable expertise in the substantive phenomena they address.

    Lawrence Sanna

    As noted in the email Schimmack cites, I have no reason to assume that the patterns of the Sanna et al hindsight experiments would not replicate. The patterns are consistent with a large body of other studies across many topics: After the fact, people think that they would (would not) have predicted an event when reasons for its occurrence are easy (difficult) to generate. For reasons discussed above, the theory does not predict effect sizes and the classic hindsight tasks used differ from most other ease of recall studies by pertaining to hypothetical events about which people have little knowledge.

    This said, I have not been present when the experiments were run, as is the case for most collaborative work. Lawrence Sanna spent a sabbatical at Michigan during the academic year 1999/2000, when he was investigating perceptions of time and change. I was at Michigan at the time and we discussed the hindsight bias and planned several experiments, which he ran after his return to the University of North Carolina. A decade later, he was hired by Michigan as part of a spousal hire in another department, but resigned abruptly during his first semester. His resignation is assumed to have been in response to concerns about his work on the influence of physical elevation on thought, but this has not been confirmed.

    1. Norbert, let me be more direct (German).
      1. I don’t care how many studies you did where you tried out manipulations. I care about selective publishing of results that support the ease-of-retrieval hypothesis and not publishing results when studies do not find the desired results (e.g., Bares, 2007, unpublished dissertation, supervised by you). We all know that this is how the game was (and still is) played. Are you saying you didn’t play by those rules and you do not have a file drawer of studies that you would have published if they had turned out p < .05, but they didn't - even after trying several ways? Your z-curve says otherwise. https://replicationindex.com/2019/01/11/replicability-audit-of-norbert-schwarz/

      2. You do not comment at all about the strange finding that Sanna was able to produce effect sizes that are much larger than other researchers consistently across 9 studies. If you are such an expert on theory, can you provide a theoretical explanation for this? If so, you should be willing to bet on a successful replication even with the same sample size. Want to make it interesting? $1000?

      1. Uli, for the few who may ever read this, I copy the abstract and link of the Bares (2007) dissertation. The alleged “nonreplication” is that the effect did not emerge with KINDERGARTNERS! You tell me what you conclude from that for adult judgment. Well, you won’t, because you don’t engage with substantive questions. But you could at least be transparent in your claims, even if it hurts the intended innuendo. — Norbert

        The development and influence of subjective experiences on children’s thinking: Implications for cognitive -behavioral therapy.

        Bares, Cristina B.

        Abstract: Cognitive-behavioral therapies are based on the assumption that how one thinks about life experiences has significance for one’s well-being and that changing faulty thoughts can be done by assigning new interpretations to these life experiences. Although the focus of cognitive-behavioral therapy is on changing the content of faulty thinking, an emerging literature proposes that the experience of bringing that content to mind is also important for judgments of well-being. When the task of restructuring thoughts about life events is experienced as difficult, the new interpretation given to the life event might be contrary to what was hoped for in therapy. Currently, a number of studies suggest that cognitive-behavioral therapies can be helpful in treating some childhood mental health problems, but, there is little research exploring how children’s experiences during thinking influence the interpretations they make during cognitive-behavioral activities. This dissertation is aimed at exploring whether children experience difficulty in recalling many instances from memory and whether the subjective experience of this difficulty influences children’s judgments on topics similar to topics raised in cognitive-behavioral therapy. In a between-subjects design, one hundred ninety-eight Kindergarten and second-grade children recalled either few or many instances of social (times I was shy, times I was friendly, times others were nice to me, times others were mean to me) and non-social (books) behaviors and subsequently made judgments of difficulty, liking, frequency, and interest. Children were asked to recall a few (easy task) or many (difficult task) examples of each behavior, with the specific numbers of examples differing by age. Thus, Kindergarten children were asked to recall 1 (easy) or 5 (difficult) examples while second-grade children were asked to recall 3 (easy) or 8 (difficult) examples. 
Confirming the hypothesis that recalling many examples would be a difficult task, Kindergarten boys reported greater difficulty in recalling 5 examples of times that others had been mean to them than a few examples and greater difficulty in recalling many examples of times they were friendly than a few examples. As posited by Schwarz’s fluency model, their subsequent judgments were influenced by this experienced difficulty rather than the content brought to mind in the examples. Kindergarten boys who recalled many times they were friendly reported that they liked being friendly less than boys in the few condition. These findings did not replicate in second-grade; second-grade children did not consistently report greater difficulty in recalling many (8) examples over a few (3), or in using the experience of difficulty in their subsequent judgments. It is possible that the examples or cut points were not developmentally appropriate. Further work remains to be done on exploring the developmental aspects of when children experience difficulty in recalling examples and when they are influenced by subjective experiences. This study suggests that by Kindergarten children have the capacity to use the subjective experience of thinking when constructing judgments. There are some relevant implications for cognitive-behavioral therapy. Practitioners need to know that the cognitive process of restructuring faulty interpretations of one’s life experiences can have significant consequences for therapeutic progress. Even young children can use the subjective experience of ease and difficulty in their judgments and this may influence how events are restructured. Specifically, subjective experience of recalling many instances from memory may influence children’s judgments in ways that may lead children to infer the opposite of what is intended in therapy.



    My last comment lost text while I was copying the abstract. It should have read:

    “The alleged “nonreplication” is that the effect did not consistently emerge with 2ND GRADERS and KINDERGARTNERS!”

    1. And why is it not worth publishing this finding as evidence that the (alleged) effect is limited to adults? Selective publishing implies that (a) you know the reason why a study failed to replicate and (b) it is not worth sharing this information with the scientific community. That is not how science works.

      In fact, the meta-analysis shows that age is not a moderator. The effect is as elusive with children as it is with adults.

      1. Uli, a single-study paper with an inconsistent age effect requires a follow-up. Had it been published, you’d complain about that as well. Enough said.
