Blogging about statistical power, replicability, and the credibility of statistical results in psychology journals since 2014. Home of z-curve, a method to examine the credibility of published statistical results.
Show your support for open, independent, and trustworthy examination of psychological science by getting a free subscription.
“For generalization, psychologists must finally rely, as has been done in all the older sciences, on replication” (Cohen, 1994).
DEFINITION OF REPLICABILITY: In empirical studies with sampling error, replicability refers to the probability that a study with a significant result would produce a significant result again in an exact replication of the first study using the same sample size and significance criterion (Schimmack, 2017).
See Reference List at the end for peer-reviewed publications.
Mission Statement
The purpose of the R-Index blog is to increase the replicability of published results in psychological science and to alert consumers of psychological research about problems in published articles.
To evaluate the credibility or “incredibility” of published research, my colleagues and I developed several statistical tools, such as the Incredibility Test (Schimmack, 2012), the Test of Insufficient Variance (Schimmack, 2014), and z-curve (Version 1.0: Brunner & Schimmack, 2020; Version 2.0: Bartos & Schimmack, 2021).
I have used these tools to demonstrate that several claims in psychological articles are incredible (a.k.a., untrustworthy), starting with Bem’s (2011) outlandish claims of time-reversed causal pre-cognition (Schimmack, 2012). Bem’s article triggered a crisis of confidence in the credibility of psychology as a science.
Over the past decade it has become clear that many other seemingly robust findings are also highly questionable. For example, I showed that many claims in Nobel Laureate Daniel Kahneman’s book “Thinking, Fast and Slow” are based on shaky foundations (Schimmack, 2020). An entire book on unconscious priming effects, by John Bargh, also ignores replication failures and lacks credible evidence (Schimmack, 2017). The hypothesis that willpower is fueled by blood glucose and easily depleted is also not supported by empirical evidence (Schimmack, 2016). In general, many claims in social psychology are questionable and require new evidence to be considered scientific (Schimmack, 2020).
Each year I post new information about the replicability of research in 120 psychology journals (Schimmack, 2021). I also started providing information about the replicability of individual researchers, along with guidelines on how to evaluate their published findings (Schimmack, 2021).
Replication is essential for an empirical science, but it is not sufficient. Psychology also has a validation crisis (Schimmack, 2021). That is, measures are often used before it has been demonstrated how well they measure what they are supposed to measure. For example, psychologists have claimed that they can measure individuals’ unconscious evaluations, but there is no evidence that unconscious evaluations even exist (Schimmack, 2021a, 2021b).
If you are interested in the story of how I ended up becoming a meta-critic of psychological science, you can read it here (my journey).
References
Brunner, J., & Schimmack, U. (2020). Estimating population mean power under conditions of heterogeneity and selection for significance. Meta-Psychology, 4, MP.2018.874, 1–22. https://doi.org/10.15626/MP.2018.874
Schimmack, U. (2012). The ironic effect of significant results on the credibility of multiple-study articles. Psychological Methods, 17, 551–566. https://doi.org/10.1037/a0029487
Schimmack, U. (2020). A meta-psychological perspective on the decade of replication failures in social psychology. Canadian Psychology/Psychologie canadienne, 61(4), 364–376. https://doi.org/10.1037/cap0000246
What is open science? Isn’t “open science” a tautology, like “new innovation”? If there is open science, what is closed science? The need for open science arises from the fact that many academic practices are unscientific. They benefit academics without advancing science, or even while hurting it. For example, conducting experiments and not reporting the results when they do not show a favorable outcome is a common academic practice that many people would recognize as undermining science. In psychology, this practice is widespread and explains why psychology journals have success rates over 90% (Sterling et al., 1995). Aside from simply not publishing unfavorable results, academics also use a number of questionable statistical practices to turn failures into successes (John et al., 2012). All of these practices are well known and tolerated among academics, who understand the pressure to publish, while the general public sees only the published outcomes and not the incentives facing individual researchers.
Open science is, at its core, the vision of a utopia in which academic work produces scientific progress and incentive structures reward honest attempts to advance science, rather than invalid indicators like publication and citation counts, which can be gamed and can waste millions of dollars without any real progress.
In psychology, Brian Nosek spearheaded the Open Science movement and co-founded the Center for Open Science, which hosts the Open Science Framework (OSF). He also wrote several influential articles to promote Open Science practices in psychology (e.g., Nosek & Bar-Anan, 2012; Nosek, Spies, & Motyl, 2012).
These articles laid out a comprehensive vision to reform unscientific and counterproductive practices and incentive structures in psychology. Key elements focused on (a) aligning incentives so truth-seeking wins over career advancement, (b) restructuring the unit of research itself from small teams to distributed collaborations, and (c) promoting a culture of transparency, openness to criticism, and willingness to find out you were wrong.
The Open Science movement has changed psychology in ways that nobody in 2010 could have imagined. Helped by empirical evidence that many results in Brian Nosek’s field of social psychology could not be replicated (a replication rate of 25% for social psychology in the Reproducibility Project; Open Science Collaboration, 2015), journals now often demand assurances that results are reported honestly and reward practices that limit researchers’ abilities to change hypotheses or results when the original results are disappointing.
However, in other ways, progress has been limited. The main problem is that open admission of mistakes is still rare and researchers fear that any admission of mistakes harms their reputation. Thus, the incentive structure continues to reward promoting false claims. This problem is exacerbated by psychological mechanisms that have been documented in psychological research for decades and are highly robust. Motivated biases make it easier for people to see mistakes in others’ work than in their own work. The Bible calls this “seeing a splinter in others’ eyes, but missing the beam in one’s own eye.” The Nobel Laureate Feynman warned fellow scientists, “The first principle is that you must not fool yourself — and you are the easiest person to fool.”
Motivated Blindness
Ironically, Brian Nosek’s work on the IAT provides an example of motivated blindness. The knowledge and intelligence that helped him spot problems in colleagues’ small-sample studies that fail to replicate do not help him see the problems in his own work on implicit biases. The Implicit Association Test (IAT) was originally invented by Anthony Greenwald, and Brian Nosek helped to promote it as a measure of associations that are sometimes called implicit, automatic, or unconscious. The IAT is a reaction time task, but modern technology made it possible to administer it on a website, hosted by Project Implicit and backed by Harvard University.
The IAT was never validated to the psychometric standards required for individual assessment. In practice, it functions like a distorting mirror — reflecting back what people largely already know about their attitudes, buried under substantial measurement error. If it were presented that way, no one would object, and no one would need a warning. But Project Implicit does not present it that way. Instead, visitors are warned that the test may reveal something undesirable about themselves. That warning only makes sense if the results are trustworthy. A distorting mirror does not come with a warning — it comes with a laugh. By framing the IAT as capable of revealing uncomfortable truths, Project Implicit treats an unvalidated research tool as a diagnostic instrument.
The problem is that even in 2024, Brian Nosek is still unable to openly admit that the IAT does not measure implicit biases (reference) and that his own studies, which convinced him the IAT is valid, were flawed. For example, in one study he claimed that a weak correlation of r = .2 between racial bias on the IAT and self-reported racial attitudes demonstrated convergent validity (reference). This is false. A correlation of r = .5 between self-reported height and self-reported weight does not validate either measure — it simply shows that two different constructs share a common method. Convergent validity requires measuring the same construct with different methods, not different constructs with the same method. When the IAT is compared to other implicit measures, the correlations are equally weak and, more importantly, no higher than the correlations with self-report measures (Schimmack, 2021). The IAT therefore provides no evidence that it reveals something about individuals that they do not already know. If somebody is biased against a particular group, they know it. The IAT does not uncover hidden biases — it merely repackages what people can already report about themselves.
While Brian Nosek is no longer actively involved in IAT research, he is still associated with Project Implicit and has made no attempt to correct the misinformation about the IAT given to visitors of the website that even administers mental health IATs without proven validity. Moreover, his students continue to publish misleading articles that make false claims about the IAT. These articles are published in journals that claim to promote open science, but do not allow for open criticism of statistical errors in their publications.
The article “On the Relationship Between Indirect Measures of Black Versus White Racial Attitudes and Discriminatory Outcomes: An Adversarial Collaboration Using a Sample of White Americans” by Axt et al. (2026) seems to meet the latest standards of open science. The research team is diverse, with different opinions about the IAT. Hypotheses are preregistered with a clear criterion to claim validity. Brian Nosek was not a collaborator, but strongly endorsed this article on social media as a poster child of open science practices.
Yet the paper had a major limitation. It ignored the criticism of earlier structural equation modeling studies that failed to take shared method variance into account (Schimmack, 2021) and made the same mistake again. By including two IATs, the published model treated all shared variance between the two IATs as valid variance, ignoring the well-known evidence that IAT scores are also influenced by factors unrelated to the associations being measured. The authors could have avoided this mistake because they inspected modification indices, which show problems with a theoretically specified model. They used these modification indices to adjust the measurement model for self-ratings, but not for the two IATs.
This mistake itself is not the main problem. Even a large team of scientists can make mistakes, especially if they are not trained in psychometrics and are working with measurement models. The real problem is that the editor of the journal that published the article is unwilling to correct it (Schimmack, 2026). This decision does not meet Open Science standards of open admission of mistakes, or even of engagement with criticism. Open science requires open discussion and responding to scientific criticism. I emailed Dr. Axt on December 2nd about my concerns and my reanalysis of his data, but did not receive a response. This reaction highlights how far we still are from Brian Nosek’s utopia of open criticism and open admission of mistakes. Marketing the IAT as a “window into the unconscious” (Banaji & Greenwald’s words, not mine; 2013) was a mistake, but Greenwald, Banaji, and Nosek have yet to admit so openly. Instead, Project Implicit continues to give people invalid feedback and Harvard does not care. This is not Open Science. This is naked self-interest to preserve a reputation that was earned with the false promise of addressing racial bias in the United States of America.
Why Do I Care?
After cognitive performance tests, the IAT is arguably the most influential psychological test. Implicit bias was a major topic during the 2016 presidential campaign. Hillary Clinton made implicit bias a campaign issue, claiming that many Americans still harbor implicit racial biases. Asked for comment, Greenwald relied on IAT results for the two candidates to “go out on a limb to predict that Clinton’s vote margin on November 8 will exceed the prediction of the final pre-election polls.” The opposite happened. Trump became president and created a new culture that made open expression of racial bias “great again.”
Greenwald’s trust in the IAT was not justified. The IAT had already failed to predict racial bias in voting in the 2008 election, which Barack Obama won despite widespread racial prejudice. Self-reports, in contrast, showed that some people openly admitted to biases that predicted their voting intentions over and above party affiliation (Greenwald et al., 2009).
Hillary Clinton’s endorsement of implicit bias may have cost her votes. The notion of implicit bias is that white people no longer endorse racist ideology, are motivated to avoid racial biases, but are still unconsciously influenced by them. That narrative has not aged well. A decade later, a presidential candidate can stand on a debate stage and say “they’re eating the cats and dogs” to applause, and win. The problem America faces is not hidden bias operating below the threshold of awareness. It is open prejudice, stated plainly, rewarded electorally, and entirely accessible to self-report.
The implicit bias framework misjudged the landscape. It assumed that the social norm against racism was strong and stable, and that the remaining work was to address what operated beneath it. Instead, the norm itself collapsed. Many white Americans are fully aware of their racial biases, are not motivated to change them, and are willing to vote for a candidate who hesitated to distance himself from the KKK. These voters were probably more offended by the suggestion that they are motivated to be unbiased than by the accusation that they have racial biases. Implicit bias training — which cost organizations millions — failed to address the real problem because it was designed for a world in which people wanted to be fair but couldn’t help themselves. That is not the world we live in.
Conclusion
Open science promises to align academic structures, incentives, and practices with the scientific aim of discovering the truth. To do so, science needs to check itself, notice mistakes, and correct them. However, the incentive structure continues to work against this goal. It is telling that Brian Nosek, the most visible proponent of open science in psychology, is unable to follow his own open science principles and admit that his work on the IAT did not produce a valid measure of implicit biases.
One might think that Nosek is in an enviable position to admit past mistakes given his achievements in making psychology more open. He is the Executive Director of the Center for Open Science and has a legacy that does not depend on the IAT. Other psychologists, like John Bargh, built their careers on a single line of research. When social priming failed to replicate, there was little else to fall back on. Walking away from the IAT should be easier by comparison. The fact that Nosek is unable to acknowledge the problems of the IAT only underscores the power of motivated blindness. It also highlights the most important change that is needed to make psychology a science. We need to normalize failure and see it as the inevitable outcome of exploration. Every failure that is openly acknowledged is a learning opportunity that makes success more likely the next time. Daniel Kahneman is a rare example of a psychologist who admitted mistakes in public and gained recognition as a result. Maybe we should give Brian Nosek a Nobel Prize for his open science work so that he can admit his mistakes about the IAT.
References
Axt, J. R., Connor, P., Hoogeveen, S., Clark, C. J., Vianello, M., Lahey, J. N., Hahn, A., To, J., Petty, R. E., Costello, T. H., Mitchell, G., Tetlock, P. E., & Uhlmann, E. L. (2026). On the relationship between indirect measures of Black versus White racial attitudes and discriminatory outcomes: An adversarial collaboration using a sample of White Americans. Journal of Personality and Social Psychology. Advance online publication. https://dx.doi.org/10.1037/pspa0000480
Greenwald, A. G., Smith, C. T., Sriram, N., Bar-Anan, Y., & Nosek, B. A. (2009). Implicit race attitudes predicted vote in the 2008 U.S. presidential election. Analyses of Social Issues and Public Policy, 9(1), 241–253.
Nosek, B. A., & Bar-Anan, Y. (2012). Scientific Utopia I: Opening scientific communication. Psychological Inquiry, 23(3), 217–243. https://doi.org/10.1080/1047840X.2012.692215
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia II: Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615–631. https://doi.org/10.1177/1745691612459058
Nosek, B. A. (2024, November 8). Highs and lows on the road out of the replication crisis [Interview]. Clearer Thinking with Spencer Greenberg, Episode 235.
Schimmack, U. (2021). The Implicit Association Test: A method in search of a construct. Perspectives on Psychological Science, 16(2), 396–414. https://doi.org/10.1177/1745691619863798
Schimmack, U. (2021). Invalid claims about the validity of Implicit Association Tests by prisoners of the implicit social-cognition paradigm. Perspectives on Psychological Science, 16(2), 435–442. https://doi.org/10.1177/1745691621991860
Statistical power is widely known as a tool for planning sample sizes before studies are conducted. Less well known is the use of statistical power after publication to evaluate the credibility of published results in sets of studies, such as meta-analyses.
The basic idea is simple. Statistical power is the probability of obtaining a statistically significant result, typically p < .05. When the null hypothesis is true, this probability equals the alpha criterion, usually 5%. When the null hypothesis is false, power depends on the true population effect size, the sample size, and the significance criterion.
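This logic can be written down directly. As a minimal sketch (mine, not from the blog; it uses a normal approximation to the t-distribution), power is simply the probability that a test statistic falls beyond the critical value, and when the null hypothesis is true it reduces to alpha:

```python
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d: true standardized effect size (Cohen's d).
    n_per_group: participants per group.
    """
    z_crit = norm.ppf(1 - alpha / 2)      # two-sided critical value (1.96 for alpha = .05)
    ncp = d * (n_per_group / 2) ** 0.5    # expected value of the test statistic
    # probability that the observed statistic lands beyond either critical value
    return norm.sf(z_crit - ncp) + norm.cdf(-z_crit - ncp)

print(power_two_sample(0.0, 50))  # null hypothesis true: power equals alpha (.05)
print(power_two_sample(0.5, 50))  # medium effect, n = 50 per group: roughly .70
```

The second call shows why the replication crisis was predictable: even a medium effect with 50 participants per group yields significance only about 70% of the time, yet journals report success rates above 90%.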
Several publication-bias tests estimate the average power of completed studies and compare it to the actual number of significant results in those studies (Ioannidis & Trikalinos, 2007; Schimmack, 2012). If the observed frequency of significant results is greater than the expected frequency based on average power, this suggests that non-significant results are missing from the published record. This reduces the credibility of the published results. The published literature is less robust than it appears, effect sizes are inflated, and the false-positive risk is higher than the observed success rate suggests.
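A small simulation makes the inflation mechanism concrete (my illustration, not from the cited papers; the true effect size, sample size, and study count are assumptions). When only significant results are published, the published record is 100% significant by construction, even though most attempts failed, and the published effect sizes are badly inflated:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2014)
true_d, n, attempts = 0.2, 50, 2000   # assumed true effect, per-group n, attempted studies

published_d = []
for _ in range(attempts):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    t, p = ttest_ind(treatment, control)
    if p < .05 and t > 0:             # selection: only significant "successes" get published
        published_d.append(treatment.mean() - control.mean())  # population SD is 1 by design

print(f"share of attempts that were significant: {len(published_d) / attempts:.0%}")
print(f"true d = {true_d}, mean published d = {np.mean(published_d):.2f}")
```

The share of significant attempts hovers near the studies’ actual power, while the mean published effect is far larger than the true effect, because only samples with large chance fluctuations clear the significance threshold.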
The negative effects of publication bias on the credibility of published findings are well known and not controversial. Although publication bias is common, there is broad agreement that it is a problem. Publication-bias tests make it possible to detect this problem empirically.
The main challenge for power-based bias tests is estimating the true average power of completed studies. Developing and comparing different estimation methods is an active area of research, but these methods rely on the same basic principle: a set of studies cannot honestly produce substantially more significant results than its average power predicts. With reported success rates above 90% in many psychology journals, power-based tests typically show clear evidence of selection bias.
A few critical articles and blog posts have raised concerns about estimating average true power. However, these criticisms often do not engage with the actual goal of methods that use average power to detect publication bias. For example, McShane et al. acknowledge that average power says “something about replicability if we were able to replicate in the purely hypothetical repeated sampling sense and if we defined success in terms of statistical significance.” McShane objects that this is not useful because new replication studies can differ from original studies. But the purpose of computing average power in meta-analyses of completed studies is not to plan future studies or predict their exact outcomes. The purpose is to examine the credibility of the completed studies.
If a published literature reports 90% significant results but the completed studies had only 20% average power, the published record does not provide credible evidence for the claim, even if it contains hundreds of significant results. Critics of average-statistical power calculations often ignore this important information. Average power is not merely a planning tool. It is also a diagnostic tool for evaluating whether a published body of evidence is too successful to be credible.
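The diagnostic logic can be quantified with a hedged sketch (the numbers are hypothetical, and treating average power as a single common success probability is a simplification of the actual bias tests):

```python
from scipy.stats import binomtest

k_significant = 90   # published significant results (hypothetical)
n_studies = 100      # completed studies (hypothetical)
avg_power = 0.20     # estimated average power of the completed studies

# If results were reported honestly, the success count should look like a
# binomial draw with the average power as the success probability.
result = binomtest(k_significant, n_studies, avg_power, alternative="greater")
print(f"probability of 90+ successes given 20% power: {result.pvalue:.1e}")
```

The resulting probability is vanishingly small: a record this successful is essentially impossible without selective reporting, which is the precise sense in which such a literature is too successful to be credible.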
An Average Power Primer: Clarifying Misconceptions about Average Power and Replicability
Cohen (1988) introduced power analysis as a tool for planning studies to reduce false-negative (Type II) error rates in psychological science. Since the replication crisis, a priori power analysis has received increasing attention. However, the estimation of the actual power of completed studies remains neglected. This article clarifies important differences between power analyses with hypothetical effect sizes, which are used to plan studies, and power analyses of studies that have already been completed. Knowing the actual power of completed studies is important because it can be used to assess publication bias. Sets of studies that have high success rates but low power do not provide credible evidence for a hypothesis.
Chapter 8 examines whether major life events produce lasting changes in subjective well-being. It begins with adaptation theory, especially the “hedonic treadmill” idea, which claims that people quickly return to their baseline level of happiness after good or bad events. The chapter argues that this view is too pessimistic. People do adapt to some changes, but not all. Life circumstances can have lasting effects, especially when they affect important goals, daily experiences, income, status, relationships, or health.
The chapter distinguishes two mechanisms that can make gains fade over time. First, aspirations can rise. As people get better housing, higher income, or newer products, their standards also increase, so satisfaction may not rise much. Second, emotional reactions are often strongest when circumstances change. A new house or improved condition may feel exciting at first, but the emotional boost fades as the new situation becomes normal. These mechanisms differ across life domains. They may be strong for income or housing, but weaker for close relationships, where ongoing engagement continues to matter.
The chapter then reviews evidence on unemployment. Unemployment is one of the clearest examples of a life event with a strong and persistent negative effect on well-being. It reduces income, status, structure, purpose, and social contact. Panel studies show that people do not simply adapt to long-term unemployment. Their well-being remains lower while they are unemployed and improves when they find new work. Much of the effect appears to operate through income and financial satisfaction, but unemployment also affects status and purpose.
Housing shows a different pattern. Moving to a better home increases housing satisfaction, and this improvement can last. However, global life satisfaction often changes little. This does not mean housing is unimportant. Rather, housing may fade into the background of daily life and may be underweighted when people make global life evaluations. Domain-specific measures show that housing conditions matter, especially when they affect daily life through noise, crowding, poor physical conditions, safety, or comfort. The chapter uses housing to show why domain satisfaction is essential for understanding well-being.
Disability provides a more complex case. Early claims that people adapt almost completely to disability were based on weak evidence. Better panel studies show that acquired disability often produces lasting declines in life satisfaction, especially when it involves broader health deterioration. However, people born with disabilities often report higher well-being than those who acquire disabilities later. This supports the ideal-based framework: people born with a disability form their goals and identity around that condition, whereas people who acquire a disability must revise previously formed ideals. Adaptation depends less on time alone than on whether people can build new goals compatible with their changed circumstances.
The chapter gives special attention to relationships. Cross-sectional studies show that partnered people are generally happier than singles, but earlier research underestimated the effect because it focused on marriage rather than partnership. Weddings may produce only temporary increases in well-being, but having a stable partner appears to have a lasting positive effect for most people. Cohabitation and committed partnership matter more than legal marital status. Most people want a partner, and those without one tend to report lower well-being. Happy lifelong singles exist, but they appear to be the exception rather than the rule.
Partnership improves well-being partly through material advantages, because couples often share income and expenses. However, income explains only a small part of the partnership effect. Family satisfaction and relationship quality explain more. Partnership provides emotional support, shared life management, intimacy, and companionship. Sexual satisfaction contributes somewhat, but relationship satisfaction is much more important. Thus, the benefits of partnership are not reducible to money or sex.
The chapter also discusses spousal similarity in well-being. Spouses are more similar in well-being than would be expected from genetics alone, and their well-being tends to change in the same direction over time. This suggests that shared environments, such as household income, housing, relationship quality, and common life events, influence both partners. Some similarity may reflect assortative mating or stable shared conditions, but the evidence points strongly to environmental influences within couples.
The conclusion is that adaptation is real but not automatic. Some changes, such as improvements in housing, may produce lasting domain-specific satisfaction without strongly affecting global life satisfaction. Other events, such as unemployment, divorce, and disability, can reduce well-being until circumstances or goals change. Pursuing happiness through life changes is not futile, but people need to consider how changes will affect everyday life, goal progress, and long-term priorities. Novelty can be exciting, but lasting well-being depends more on stable fit between actual life, personal ideals, and daily experience.
Chapter 5 examines subjective well-being around the world. It argues that the most informative comparisons are not small differences among the happiest countries, such as Finland, Denmark, or other Scandinavian nations, but the large differences between countries near the top and bottom of the global distribution. These cross-national differences allow researchers to test whether subjective well-being is shaped by material living conditions, social institutions, culture, and historical change.
The chapter begins with the history of cross-national comparisons. Cantril’s ladder was first used in the 1960s to compare life evaluations across nations, and the Gallup World Poll has used the same basic measure since 2008 in more than 140 countries. Comparing countries measured in both periods suggests that average life evaluations have increased over time. This challenges strong claims that happiness is purely relative or that modern life has made people less happy than in the past. At the same time, changes vary across countries, showing that national well-being is not fixed and can shift with social, economic, and political conditions.
World maps of subjective well-being show clear geographic patterns. Scandinavia, Western Europe, Australia, and other wealthy countries tend to score high, whereas many African countries score low. These patterns make some simple explanations unlikely. Climate cannot explain high Scandinavian well-being because other high-ranking countries have very different climates. Romantic ideas that Eastern societies are generally happier than Western societies are also not supported by the data.
The strongest predictor of national differences in subjective well-being is purchasing power. Median income adjusted for purchasing power predicts average life evaluations very strongly, especially when income is analyzed on a logarithmic scale. The relationship is strongest at low income levels, where money helps meet basic needs such as food, shelter, health, and safety. However, the relationship does not disappear in affluent countries. Additional income still predicts higher life evaluation, although with diminishing returns. This directly challenges simple claims that money does not buy happiness.
At the same time, income does not explain everything. Some regions are happier or less happy than their income levels predict. South America and Scandinavia score higher than expected, whereas Arab countries, East Asia, and Eastern Europe score lower. These deviations suggest that culture, institutions, social relationships, response styles, and political conditions may also matter, although their effects are harder to isolate than the effect of income.
The chapter discusses East Asia as one example. East Asian countries often report lower life satisfaction than expected from their purchasing power. Some of this may reflect response styles, because East Asian respondents are more likely to choose moderate response options and less likely to use extreme ratings. Cultural norms about modesty, realism, and self-enhancement may also influence self-reported well-being. However, it remains unclear whether these patterns reflect reporting differences, real differences in experienced well-being, or both.
Latin America shows the opposite pattern: subjective well-being is often higher than income would predict. Some of this may also reflect response style, especially the tendency to use the top category on life-satisfaction scales. But measurement artifacts do not fully explain the pattern. Social support appears to be the most plausible substantive explanation. Latin American cultures may place especially strong emphasis on close relationships, family support, and social integration. Unpaid family work and remittances may also make material living conditions better than GDP alone suggests.
Scandinavian countries consistently rank near the top, but the difference between Scandinavia and other affluent Anglo countries is small. The chapter argues against overinterpreting a Scandinavian “secret.” Much of the small advantage appears related to higher financial satisfaction, possibly because of lower inequality, stronger welfare systems, and lower material aspirations. The key point is that Scandinavia scores high largely because it combines high purchasing power with strong social and institutional supports.
Arab countries report lower subjective well-being than expected from income. Financial dissatisfaction explains part of the gap, and lower perceived freedom explains a smaller part, but a substantial difference remains. Religion does not explain the lower scores; if anything, religiosity has a small positive association with well-being. The chapter also notes that life circumstances may have different implications in different cultures. For example, marriage appears more strongly related to well-being in Anglo countries than in Arab countries.
The chapter then turns to migration as stronger evidence for the importance of living conditions. Immigrants’ well-being tends to move closer to the average well-being of the country they move to than to that of their country of origin. Immigrants from poorer countries often show large gains after moving to countries such as Canada. This supports the conclusion that national differences in well-being are not just cultural or personality differences; living conditions matter. At the same time, some cultural patterns remain, because immigrants from Latin America and East Asia show some of the same relative patterns observed in their regions of origin.
Migration studies also show that integration matters. Immigrants who identify with Canada, either while maintaining their original identity or through assimilation, report higher well-being than those who remain separated from Canadian identity or feel marginalized. This suggests that migration improves well-being most when people gain access to better living conditions and also develop a sense of belonging in the new society.
The final sections argue that subjective well-being is not the only criterion for evaluating societies. Life expectancy also matters. A country where people are moderately happy for many decades may be preferable to one where people are very happy for a short life. The concept of happy life-years combines average well-being with life expectancy. Wealthier nations often do better on both dimensions because economic resources support health care, safety, and longer lives.
The chapter ends with sustainability. Modern high well-being often depends on resource-intensive lifestyles that may harm future generations. Subjective well-being research cannot solve this moral and political problem, but it can identify societies that achieve high well-being, long lives, and more sustainable living. Scandinavian countries currently do well on these dimensions, and Costa Rica offers a warmer example of relatively high well-being with lower resource use. The broader conclusion is that money matters greatly for well-being, especially through basic needs, but the best societies must also consider longevity, social conditions, and sustainability.