All posts by Dr. R

About Dr. R

Since Cohen (1962) published his famous article on statistical power in psychological journals, statistical power has not increased. The R-Index makes it possible to distinguish studies with high power (good science) from studies with low power (bad science). Protect yourself from bad science and check the R-Index before you believe statistical results.

Children and Covid-19: Evidence from New York City

Despite tremendous efforts by scientists to aid in the fight against Covid-19, many important questions lack a clear scientific answer. It is known that children can be infected by the virus and that in some rare cases the virus can produce severe symptoms or even death. However, it is unclear whether children have a lower risk of getting infected and whether they are less likely to infect others. On May 7, Nature wrote “scientists are still trying to understand what the deal is with kids and COVID-19”.

A major problem is that most countries responded to the Covid-19 pandemic by closing schools and minimizing children’s contact. Thus, the lower number of children among people testing positive for Covid-19 might simply reflect less exposure to the virus. A notable exception is Sweden, where schools were not closed. Unfortunately, Sweden did not test children, teachers, or parents to examine whether children transmitted the virus (Science, May 22).

A German study of viral load in children and adults suggested that children are no different from adults, but this study has been criticized on methodological grounds (Science Media Centre). A key problem with this claim is that it is impossible to prove the lack of a difference. It is only possible to quantify the size of a difference and to note that the observed difference is not statistically different from zero. The key problem with this study is that the sample size is small and there is ample statistical uncertainty. Thus, no firm conclusions can be drawn from this study alone.

My colleagues (Shigehiro Oishi, Youngjae Cha, and Bansi Javiya) and I have been analyzing the open data about Covid-19 deaths and cases in New York City and have used US census data to predict variation in positive cases and deaths across New York City. While we are still working on this project, we would like to share some interesting results about children and Covid-19 that emerged in our analyses. Before we do so, we want to make clear that ZIP codes are not the best data to examine this question, that our results are preliminary, that even if our results hold up they do not provide conclusive evidence, and that the results cannot and should not be used to make policy recommendations about opening schools or not. The main purpose of this blog post is to share information with scientists who are interested in this question and to add a tiny piece of information to the big puzzle.

Occupants and Covid-19

While the role of children in the transmission of Covid-19 is still unclear, the evidence for transmission at home is much stronger. It makes logical sense that the infection rate is greater if more people share the same living space. In addition, crowded living conditions have been linked to higher rates of Covid-19. For this reason, we looked at several predictors from the US census that reflect crowding. The best predictor was the percentage of residences with more than 1 occupant per room.

We used several measures of Covid-19 prevalence. All of them showed a positive correlation with occupants, but the correlations were stronger for the positive rate (positives / tests) than for the positives per capita (positives / population * 100,000). One possible explanation for this is that testing varies as a function of other factors. It made little difference whether we used the raw numbers or residuals that controlled for differences between boroughs, so we used the raw scores. The correlation with the positive rate was r = .54 and the correlation with deaths was r = .20.
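To make the two prevalence measures concrete, here is a minimal sketch of how they are computed. The two ZIP codes and all of their counts are made up for illustration; they are not values from the NYC data. The sketch only shows that two areas with identical positives per capita can have very different positive rates when testing volume differs.

```python
def positive_rate(positives: int, tests: int) -> float:
    """Share of administered tests that came back positive."""
    return positives / tests


def positives_per_100k(positives: int, population: int) -> float:
    """Confirmed positives per 100,000 residents."""
    return positives / population * 100_000


# Hypothetical ZIP codes: same population and same number of positives,
# but ZIP B ran 2.5 times as many tests as ZIP A.
zip_a = {"positives": 500, "tests": 1_000, "population": 50_000}
zip_b = {"positives": 500, "tests": 2_500, "population": 50_000}

for name, z in (("A", zip_a), ("B", zip_b)):
    rate = positive_rate(z["positives"], z["tests"])
    per_capita = positives_per_100k(z["positives"], z["population"])
    print(f"ZIP {name}: positive rate = {rate:.2f}, per 100k = {per_capita:.0f}")
```

In this toy case the per-capita measure cannot tell the two areas apart, while the positive rate can. Which measure is closer to the true infection rate depends on how testing was allocated, which the sketch does not model.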

This simple correlation does not prove causality. There are a host of other variables that may explain this relationship. We tried to address this issue by including other variables in a causal model.

Children and Covid-19

The number of children living in an area defined by a NYC ZIP code was also a predictor of positive cases, r = .53, and deaths, r = .39. It is important to realize that this is an analysis of ZIP codes and not of individuals. It is well-known that children are at a much lower risk of dying from Covid-19. Thus, there is actually a negative link for individuals, and the positive correlation must reflect some other causal mechanism. One possibility (out of many) would be that children can infect older people in the same household who are at a high risk of dying when they get infected. This ‘theory’ implies transmission of the virus from young children to old people who live together.

When we fitted this model to the data, it indeed showed a causal path from children to residents to positive cases to deaths. This path implied that for every 10 percentage point increase in the proportion of children, an additional 2 people per 100,000 would die. The average is 17 deaths per 100,000 inhabitants. So, an increase by 2 people is roughly a 12% increase.
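The percentage follows directly from the two numbers reported above; a short check, using only those two values:

```python
baseline_deaths_per_100k = 17  # average deaths per 100,000 inhabitants
extra_deaths_per_100k = 2      # implied by a 10 percentage point increase in children

relative_increase = extra_deaths_per_100k / baseline_deaths_per_100k
print(f"Relative increase: {relative_increase:.1%}")
```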

The model also shows that there is still an unexplained positive relationship between children and risk of infection. This path would contribute another 4 deaths per 100,000.

These results show that ZIP codes with more children have more deaths, and that this relationship is partially explained by children adding to the number of people in a residence. However, once more these results have to be interpreted with caution because important predictor variables are missing from the model.


One potential confounding factor is income. Several analyses of the NYC data have shown that Covid-19 is more prevalent and deadly in ZIP codes with lower income. Thus, we added income to the model.

ZIP codes with higher income had fewer children, fewer occupants, a lower positive rate, and fewer deaths. Although the direct effect of income on deaths was not statistically reliable, income had a clear indirect effect on deaths by lowering the risk of getting infected. These results show that the effect of children in the previous model was inflated by ignoring the confounding effect of income. In this model, the effect of children on deaths was 2.1 deaths for every 10 percentage point increase in the proportion of children; 0.7 of this was explained by the effect on residents, and 1.4 was still a direct effect.


Numerous articles found ethnic disparities in Covid-19 deaths. Thus, we added ethnicity as a predictor variable to the model. We used the percentage of White residents as the comparison group and the percentages of Asian, Black, and Hispanic residents as predictors. It is difficult to visualize the complex relationships of this model. Thus, we merely report the key finding about children and Covid-19 here.

Including ethnicity as a predictor further reduced the ‘effect’ of children on Covid-19 deaths to 1.8 deaths (0.3 indirect via occupants and 1.5 direct on the positive rate, PR).
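To keep track of how the ‘effect’ of children shrinks as predictors are added, the indirect and direct components reported so far can be tabulated. The sketch below uses only the numbers given in the text (deaths per 10 percentage point increase in the proportion of children); the model labels are shorthand I chose for this illustration.

```python
# (indirect via occupants, direct) as reported for each model in the text
models = {
    "children and occupants only": (2.0, 4.0),
    "plus income": (0.7, 1.4),
    "plus ethnicity": (0.3, 1.5),
}

totals = {}
for label, (indirect, direct) in models.items():
    total = round(indirect + direct, 1)  # round to avoid floating-point noise
    totals[label] = total
    share = indirect / total
    print(f"{label}: total = {total}, indirect share = {share:.0%}")
```

The total drops sharply once income is controlled, while the split between the indirect and the direct path also shifts toward the direct path.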


Just like there can be confounding factors that inflate relationships, some confounding variables can suppress a relationship. We found that this was the case for our measure of the percentage of residents over 65. As expected, ZIP codes with a higher proportion of older residents had more Covid-19 deaths. We also found a negative relationship between the proportion of older residents and occupants, r = -.35. Thus, the fact that high-occupancy ZIP codes tend to be younger reduced the effect of occupants on deaths. In this model, children increased deaths by 2.6 deaths per 100,000. This relationship is highly statistically significant and very unlikely to be just a random fluke. However, it is still possible that other variables that are missing from the model explain this relationship. The multiple pathways are weaker, and it is difficult to say how much they contribute to the relationship.


The key finding of our analyses of the NYC Covid-19 data is that ZIP codes with more children have more Covid-19 as reflected in a higher positive rate (positives / tests) and deaths. This relationship remains after controlling for income, ethnicity, and the proportion of senior residents. The final model suggests that some of the effect is explained by crowded living conditions. These results suggest that children could be transmitting the virus as much as other occupants. However, many other explanations are possible.

The most important limitation of our work is that it relies on ZIP codes, while the actual causal process is person-to-person transmission. We think that it would be valuable to follow up on this work with studies that examine the social networks of NYC residents who contracted the virus and of those who did not. Antibody tests would be particularly useful to examine the spreading of Covid-19 within households. Relevant data may already exist from contact-tracing of infected individuals. We believe that tracing infections and deaths in NYC provides useful information about children’s risk of contracting and spreading the virus.

Please feel free to contact us with related information or questions.

Why Are Red States “Immune” to Covid-19?

Joey loves crowds. He is boisterous, speaks with a loud and booming voice, and is always ready to high-five everybody. No, I am not describing a super-spreader of Covid-19. It is a textbook description or caricature of an extrovert or, as personality psychologists say, an extravert.

Personality psychologists have studied extraversion and introversion for nearly one hundred years, although most of the research emerged in the past 40 years. We know that extraversion is a heritable trait that runs in families. We know that it remains fairly stable throughout adulthood, and we know that it influences behavior. There also has been research on regional variation in extraversion across the world and across US states (Elleman, Condon, Russin, & Revelle, 2018). I used their data to create the map of extraversion for US states. The map shows the highest level of extraversion in Illinois and the lowest level of extraversion in Wyoming followed by Idaho and Utah. While Illinois has fairly high rates of Covid-19, especially in Chicagoland, Wyoming and Idaho have relatively low levels of positive cases. They are also solid “red” states that voted for Trump in the 2016 election with 67% and 59%. It is therefore possible that extraversion partially explains why Covid-19 is more prevalent in “blue” (liberal) states. Residents in blue states may be more extraverted and may have a harder time following social distancing rules.

Of course, extraversion would only be one of several factors that play a role. Another obvious factor is that urban areas are more strongly affected by Covid-19 than rural areas, and rural voters are more likely to vote for Trump. There are many other possible differences among the US states that might play a role, but preliminary analysis suggests that they do not predict Covid-19 to a substantial degree. So, to keep things short, I will focus on the two factors that I found to be significant predictors of the spread of Covid-19: urbanization and extraversion.

To examine whether this relationship is stable over time, I used confirmed positive cases reported on the Covid-Tracking website and created indicators for three three-week periods: March 23 to April 12, April 13 to May 3, and May 4 to May 24. Predictor variables were (a) the percentage of votes for Trump in the 2016 election, (b) extraversion scores from the supplement to Elleman et al.’s article (Table 8), and (c) urbanization scores (Wikipedia).

The data were analyzed using structural equation modeling to examine the relationship among the six variables. [I also examined more complex models that included deaths. The effects of the predictor variables on death were mostly mediated by confirmed positives, with the exception of a unique, negative relationship between Trump support and deaths at time 1 only.] Model fit was excellent, CFI = 1.00, RMSEA = .000. This does not mean that the model reveals the truth, but it does show that the model is consistent with the data. Thus, the model tells one possible story about the negative relationship between Trump support and Covid-19 deaths across the states.

The numbers show that urbanity is a much stronger negative predictor of Trump support than extraversion. The effect of extraversion is small, and not statistically significant by conventional standards, but there are only 49 states (I excluded the island state Hawaii), making it hard to get statistical significance. The effects of urbanization and extraversion are more equal, and together they explain a notable amount of variation in Covid-19 cases across states. The numbers also show that the effect is not getting weaker over time. It may actually become stronger. This means that both urbanization and extraversion are stable predictors of the Covid-19 pandemic in the USA. Even in the past three weeks, after several states with Republican governors eased restrictions, there is no evidence that cases are increasing notably in red states.

It is not difficult to find newspaper articles that talk about a second wave and spikes in new cases in Texas or other red states. These stories are based on the idea that red states are ignoring the danger of Covid-19, but so far this idea lacks empirical support. For every story of a pastor who died of Covid-19 after defying stay-at-home orders, there are thousands of churches that are holding services online, and hundreds of people flouting social-distancing norms in Central Park, NY. Don’t get me wrong. Trump’s disregard of science and ramblings about light therapy are a disgrace, but this does not mean that 40% of the US population follows the covidiot in the White House and drinks bleach. At least forty percent of US voters are likely to vote for him again. Don’t ask me how anybody can vote for him again. That is not the point of this blog post. The blog post is about the empirical fact that so far Covid-19 has predominantly hit states that did not vote for Trump. I suggest that this seemingly paradoxical finding is not paradoxical at all. Joey, the extraverted bachelor who lives in an apartment in New York City and who voted for Hillary, is much more likely to get infected than Joyce, who lives with her family on a farm in Wyoming. Painting all Trump voters as covidiots would be a mistake similar to Hillary Clinton calling Trump supporters a “basket of deplorables.” If all Trump supporters were covidiots, we should have seen a positive relationship between Trump support and Covid-19 cases, especially after controlling for confounding variables like urbanization and extraversion. The fact that this positive relationship stubbornly refuses to emerge may suggest that Republican governors and residents in red states are not as stupid as their leader.

The Covid-Statistic Wars are Heating Up

After a general consensus or willingness to accept social distancing measures imposed by politicians (often referred to as lock-downs), societies are polarizing. Some citizens want to open stores, bars, and restaurants (and get a haircut). Others want to keep social distancing measures in place. Some people on both sides are not interested in scientific arguments for or against their position. Others like to find scientific evidence that seemingly supports their viewpoint. This abuse of science is becoming more common in a polarized world. As a scientist, I am concerned about the weaponizing of science because it undermines the ability of science to inform decisions and to correct false beliefs. Psychological research has shown how easily we assimilate information that matches our beliefs and treat disconfirming evidence like a virus. These motivated biases in human reasoning are very powerful and even scientists themselves are not immune to them.

Some economists appear to be afflicted by a bias to focus on the economic consequences of lock-downs and to downplay the effects of the virus itself on human lives and the economy. The idea is that lock-downs did little to save lives while imposing immense costs on the economy. I am not denying the severe consequences of unemployment (I actually co-authored an article on unemployment and well-being), but I am shocked by a tweet claiming that social distancing laws are ineffective, which has been retweeted 3,500 times, and by blog posts that make similar claims accompanied by scatterplots that give the claims the appearance of scientific credibility.

There is nothing wrong with these graphs. I have examined the relationship between policies and Covid-19 deaths across US states and across countries, and I have also not found a significant correlation. The question is what this finding means. Does it imply that lock-down measures were unnecessary and have produced huge economic costs without any benefits? As some responses on Twitter indicated, interpreting correlational data is not easy because many confounding factors influence the correlation between two variables.

Social distancing is unnecessary if nobody is infected

Let’s go back in time and impose social distancing policies across the world in May 2019 randomly in some countries and not in others. We observe that nobody is dying of Covid-19 in countries with and without ‘lock-down’. In addition, countries with lock-down suffer high rates of unemployment. Clearly, locking countries down without a deadly virus spreading is not a good idea. Even in 2020 some countries were able to contain relatively small outbreaks and are now mostly Covid-free. This is more or less true of countries like Taiwan, Australia, and New Zealand. However, these countries impose severe restrictions on travel to ensure that no new infections are brought into the country. When I tried to book a flight from Toronto to Sydney, I was not able to do so. So, the entire country is pretty much in lock-down to ensure that people in Australia cannot be infected by visitors from countries that have the virus. Would economists argue that these country-wide lock-downs are unnecessary and only hurt the tourist industry?


The fact that Covid-19 spread unevenly across countries also creates a problem for the correlation between social-distancing policies and Covid-19 deaths across countries. The more countries are actively trying to stem the spread of the virus, the more severe social-distancing measures will be, while countries without the virus are able to relax social distancing measures. Not surprisingly, some of the most severe restrictions were imposed at the peak of the epidemics in Italy and Spain. This produces a positive correlation between severity of lock-downs and spread of Covid-19, which could be falsely interpreted as evidence that lock-downs even increase the spread of Covid-19. A simple correlation between lock-down measures and Covid-19 deaths across countries is simply unable to tell us something about the effects of lock-down measures on deaths within countries.
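This reverse-causation problem can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: the 200 simulated countries, the uniform distribution of outbreak sizes, the rule that severity tracks outbreak size, and the built-in benefit that the strictest lock-down halves deaths. None of these numbers are estimates from real data.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(1)

# Hypothetical outbreak size each country would face without any measures.
outbreak = [random.uniform(0, 100) for _ in range(200)]

# Countries with bigger outbreaks respond with stricter measures (0 to 1).
severity = [min(1.0, max(0.0, o / 100 + random.gauss(0, 0.1))) for o in outbreak]

# By construction, measures HELP: full severity cuts deaths in half.
deaths = [o * (1 - 0.5 * s) for o, s in zip(outbreak, severity)]

r = pearson(severity, deaths)
print(f"Correlation between lock-down severity and deaths: r = {r:.2f}")
```

Even though stricter measures reduce deaths in every simulated country, the cross-country correlation between severity and deaths comes out positive, because both are driven by the size of the outbreak.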

Social Distancing Effects are Invisible if there is no Variation in Social Distancing Across Countries

To examine the effectiveness of social-distancing measures, we need to consider timing. First, social distancing measures may be introduced in response to a pandemic. Later on, we might see that countries or US states that imposed more severe restrictions were able to slow down the spread of the virus more. However, now we encounter a new problem. Most countries and states responded to the declaration of Covid-19 as a pandemic by the WHO on March 11 with very similar policies (school closures). This makes it difficult to see the effects of social distancing measures because we have little variation in the predictor variable. We simply do not have a large group of countries with a Covid-19 epidemic that did nothing. This means we lack a proper control group to see whether spread in these countries would be bigger than in countries with severe lock-downs. Even countries like the UK closed schools and bars in mid-March.
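Why little variation in the predictor makes effects hard to see can be shown with the textbook formula for the standard error of a regression slope, SE(b) = residual SD / sqrt(sum of squared deviations of x). The sketch below applies it to two made-up sets of policy strictness scores; the scores and the residual SD are hypothetical and only serve to compare a wide and a narrow spread.

```python
import math


def slope_standard_error(xs, residual_sd):
    """Standard error of a simple regression slope for a given predictor spread."""
    mean_x = sum(xs) / len(xs)
    sum_sq_dev = sum((x - mean_x) ** 2 for x in xs)
    return residual_sd / math.sqrt(sum_sq_dev)


# Hypothetical strictness scores (0 = no measures, 4 = strictest) for 50 units.
wide_variation = [0, 1, 2, 3, 4] * 10    # policies spread over the whole scale
narrow_variation = [3, 3, 4, 4, 4] * 10  # almost everybody imposed strict measures

residual_sd = 5.0  # assumed unexplained variation in the outcome

se_wide = slope_standard_error(wide_variation, residual_sd)
se_narrow = slope_standard_error(narrow_variation, residual_sd)
print(f"SE with wide variation:   {se_wide:.2f}")
print(f"SE with narrow variation: {se_narrow:.2f}")
```

With the same sample size and the same noise, the slope is estimated much less precisely when nearly all units chose similar policies, so even a real effect can fail to reach statistical significance.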

Sweden is often used as the example of a country that did not impose severe restrictions on citizens and kept schools open. It is difficult to evaluate the outcome of this political decision. Proportional to the population, Sweden ranks number 6 in the world in terms of Covid-19 deaths, but what is a proper comparison standard? Italy and Spain had more severe restrictions and more deaths, but their epidemics started earlier than in Sweden. Other Nordic countries like Norway, Denmark, and Finland have much lower fatalities than Sweden. This suggests that social distancing is effective in reducing the spread, but we do not have enough data for rigorous statistical analysis.

Social Distancing Policies Explain Trajectories of Covid-19 spread in hot-spots.

One advantage of epidemics is that it is possible to foresee the future because exponential growth produces a very notable trajectory over time that is hard to miss in statistical analyses. If every individual infects two or three other people, the number of cases will grow exponentially until a fairly large share of the population is infected. This is not what happened in Covid-19 hot spots. Let’s examine New York as an example. In mid-March, the number of detected cases and deaths increased exponentially, with numbers doubling every three days.
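To see what a three-day doubling time implies if nothing changes, here is a minimal projection. The starting value of 100 cases is an arbitrary placeholder, not the actual New York count; the only number taken from the text is the doubling time.

```python
def projected_cases(initial_cases: float, days: float, doubling_time_days: float = 3.0) -> float:
    """Cases after `days` of uninterrupted exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)


daily_growth_factor = 2 ** (1 / 3)  # growth per day implied by doubling every 3 days
print(f"Daily growth factor: {daily_growth_factor:.2f}")

for day in (0, 3, 9, 15, 30):
    print(f"day {day:2d}: {projected_cases(100, day):,.0f} cases")
```

Thirty days of unchecked growth at this pace means ten doublings, or roughly a thousandfold increase, which is why a flattening curve is so easy to distinguish from the exponential trajectory.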

The number of new cases peaked in the beginning of April and has been decreasing until now. One possible explanation for this pattern is that social-distancing policies that were mandated in mid-March were effective in slowing down the spread of the virus. Anybody who claims that lock-downs are ineffective needs to provide an alternative explanation for the trajectory of Covid-19 cases and deaths over time.

Once more it is difficult to show empirically what would have happened without “lock-downs”. The reason is that even in countries that did not impose strict rules, people changed their behaviors. Once more we can use Sweden as a country without ‘lock-down’ laws. As in New York, we see that rapid exponential growth was slowed down. This did not happen while people were living their lives as they did in January 2020. It happened because many Swedes changed their behaviors.

The main conclusion is that the time period from March to May makes it very difficult to examine scientifically what measures were effective in preventing the spread of the virus and what measures were unnecessary. How much does wearing masks help? How many lives are saved by school closures? The best answer to these important questions is that we do not have clear answers to these questions because there was insufficient variation in the response to the pandemic across nations or across US states. Most of the variation in Covid-19 deaths is explained by the connectedness of countries or states to the world.

Easing Restrictions and Covid-19 Cases

The coming months provide a much better opportunity to examine the influence of social distancing policies on the pandemic. Unlike New Zealand and a few other countries, most countries do have community transmission of Covid-19. The United States provides a naturalistic experiment because (a) the country has a large population and therefore many new cases each day and (b) social distancing policies are made at the level of the 50 states.

Currently, there are still 20,000 new confirmed (!) positive cases per day in the United States. There are also still over 1,000 deaths per day.

There is also some variation across states in the speed and extent to which states ease restrictions on public life (NYT.05.20). Importantly, there is no state where residents are just going back to life as it was in January of 2020. Even states like Georgia that have been criticized for opening early are by no means back to business as usual.

So, the question remains whether there is sufficient variance in opening measures to see potential effects in case-numbers across states.

Another problem is that it is tricky to measure changes in case-numbers or deaths when states have different starting levels. For example, in the past week New York still recorded 41 deaths per 1 million inhabitants, while Nebraska recorded only 13 deaths per 1 million inhabitants. However, in terms of percentages, cumulative deaths in New York increased by only 3%, whereas the increase in Nebraska was 23%. While a strong ‘first wave’ accounts for the high absolute number in New York, it also accounts for the low percentage value. A better outcome measure may be whether weekly numbers are increasing or decreasing.
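The two ways of looking at the same weekly numbers can be reconciled with a back-of-the-envelope calculation. The sketch below backs out the cumulative death toll that each percentage implies; because the weekly counts and percentages in the text are rounded, the results are rough approximations, not reported figures.

```python
def implied_prior_cumulative(weekly_deaths_per_million: float, pct_increase: float) -> float:
    """Cumulative deaths per million before the week, implied by the weekly count and % increase."""
    return weekly_deaths_per_million / pct_increase


ny_prior = implied_prior_cumulative(41, 0.03)  # New York: 41 per million, +3%
ne_prior = implied_prior_cumulative(13, 0.23)  # Nebraska: 13 per million, +23%

print(f"New York, implied prior cumulative deaths per million: {ny_prior:,.0f}")
print(f"Nebraska, implied prior cumulative deaths per million: {ne_prior:,.0f}")
```

The same weekly count looks small against a large cumulative total and large against a small one, so percentage increases mostly reflect the size of the first wave rather than the current trend.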

Figure 1 shows the increase in Covid-19 deaths in the past 7-days (May 14 – May 20) compared to the 7 days after some states officially eased restrictions (May 2 – May 8).

It is clearly visible that states that are still seeing high numbers of deaths are not easing restrictions (CT, NJ, MA, RI, PA, NY, DE, IL, MD, LA). It is more interesting to compare states that did not see a big first wave that vary in their social distancing policies. For this analysis, I limited the analysis to the remaining states.

States below the regression line are showing faster decreases than other states, whereas states above the regression line show slower decreases or increases. When the opening policies on May 1 (NYT) are used as predictors of deaths in the recent week with deaths two weeks before as covariate, a positive relationship emerges, but it is not statistically significant. It is a statistical fallacy to infer from this finding that policies have no influence on the pandemic.

More important is the effect size, which is likely to be somewhere between -2 and +4 deaths per million. This may seem a small difference, but we have to keep in mind that there is little variation in the predictor variable. Remember, even in Georgia where restaurants are open, the number of diners is only 15% of the normal number. The hypothetical question is how much bigger the number of Covid-19 cases would be if restaurants were filled at capacity and all other activities were back to normal. It is unlikely that citizens of open states are willing to participate in this experiment. Thus, data alone simply cannot answer this question.


Empirical science relies on data and data analysis. However, data are only necessary and not sufficient to turn a graph into science. Science also requires proper interpretation of the results and honest discussion of their limitations. It is true that New York has more Covid-19 deaths than South Dakota. It is also true that some states like South Dakota never imposed severe restrictions. This does not imply that stay-at-home orders in New York caused more Covid-19 deaths. Similarly, the lack of a correlation between Covid-19 policies and Covid-19 cases or deaths across US states does not imply that these policies have no effect. Another explanation is that there are no states that had many deaths and did not impose stay-at-home orders. For this reason, experts have relied on models of epidemics to simulate scenarios of what would have happened if New York City had not closed schools, bars, and night clubs. These simulations suggest that the death toll would have been even greater. The same simulations also suggest that many more lives could have been saved if New York City had been closed down just one week earlier (NPR). Models may sound less scientific than hard data, but data are useless and can be misleading when the necessary information is missing. The social-distancing measures that were imposed world-wide did reduce the death toll, but it is not clear which measures reduced it by how much. The coming months may provide some answers to these questions. S. Korea quickly closed bars after one super spreader infected 40 people in one night (businessinsider). What will happen in Oklahoma where bars and nightclubs are reopening? Personally, I think the political conflict about lock-downs is unproductive. The energy may be better spent on learning from countries that have been successful in controlling Covid-19 and that are able to ease restrictions.

Reply to Vianello, Bar-Anan, Kurdi, Ratliff, and Cunningham

I published a critique of the Implicit Association Test. Using structural equation modeling of multi-method studies, I find low convergent validity among implicit measures of attitudes (prejudice, self-esteem, political orientation) and low discriminant validity between explicit and implicit measures. The latter finding is reflected in high correlations between factors that reflect the shared variance among explicit measures and the shared variance among implicit measures. Using factor loadings to quantify validity, I found that the controversial race IAT has at most 20% valid variance in capturing racial attitudes. Most if not all of this variance is shared with explicit measures. Thus, there is no evidence that IAT scores reflect a distinct form of implicit prejudice that may operate outside of conscious awareness.
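For readers less familiar with factor models, the link between a standardized factor loading and “valid variance” is simply that the squared loading gives the proportion of variance in a measure that is explained by the latent variable. The sketch below shows the conversion in both directions; the function names are mine, and the only number taken from the text is the 20% upper bound for the race IAT.

```python
import math


def valid_variance(standardized_loading: float) -> float:
    """Proportion of variance in a measure explained by the latent variable."""
    return standardized_loading ** 2


def loading_from_variance(proportion_valid_variance: float) -> float:
    """Standardized loading implied by a given proportion of valid variance."""
    return math.sqrt(proportion_valid_variance)


upper_bound_loading = loading_from_variance(0.20)
print(f"20% valid variance corresponds to a loading of about {upper_bound_loading:.2f}")
print(f"Share of variance that is NOT valid variance: {1 - valid_variance(upper_bound_loading):.0%}")
```

Put differently, a ceiling of 20% valid variance means that at least 80% of the variance in race IAT scores reflects something other than the attitude the test is meant to measure.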

This article elicited a commentary by Vianello and Bar-Anan (ref.) and by Kurdi, Ratliff, and Cunningham (pdf). Here is a draft of my response to their commentaries. As you will see, there is little common ground; even the term “validity” is not clearly defined, making any discussion about the validity of the IAT meaningless. To make progress as a science (or to become a science), psychologists need to have a common understanding of psychological measurement and methods that can be used to evaluate the validity of measures quantitatively.


Just like pre-publication peer-reviews, the two post-publication commentaries have remarkably little overlap. While Vianello and Bar-Anan (VBA) question my statistical analyses, Kurdi, Ratliff, and Cunningham (KRC) accept my statistical results, but argue that these results do not challenge the validity of the IAT.

VBA’s critique is clearer and therefore easier to refute by means of objective model comparisons. The key difference between VBA’s model and my model is the modelling of method variance. VBA’s model assumes that all implicit measures of different constructs are influenced by a single method factor. In contrast, my model assumes that implicit measures of prejudice (e.g., the standard race IAT and the Brief Implicit Association Test with the same racial stimuli) share additional method variance. As these hypotheses are nested models, it is possible to test these competing models directly against each other. The results show that a model with content-specific method variance fits the data better (Schimmack, 2020a). The standard inference from a model comparison test is that the model with the worse fit is not an adequate model of the data, but VBA ignored the poorer fit of their model and presented a revised model that does not model method variance properly and therefore produces misleading results. Thus, VBA’s commentary is just another demonstration of the power of motivated reasoning that undermines the idealistic notion of a self-correcting science.
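One common way to compare nested structural equation models is a chi-square difference test, sketched below. This is a generic illustration and not necessarily the exact test reported in Schimmack (2020a). The fit statistics are hypothetical placeholders, and the sketch assumes that the two models differ by exactly one free parameter, so that the difference has one degree of freedom.

```python
import math


def chi_square_difference_p(chi2_restricted: float, chi2_general: float) -> float:
    """p-value of a chi-square difference test with 1 degree of freedom.

    For one degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    """
    delta = chi2_restricted - chi2_general
    if delta < 0:
        raise ValueError("The restricted model cannot fit better than the general model.")
    return math.erfc(math.sqrt(delta / 2))


# Hypothetical fit statistics (NOT taken from the article):
chi2_single_method_factor = 130.0     # restricted model: one method factor only
chi2_content_specific_method = 110.0  # general model: adds content-specific method variance

p_value = chi_square_difference_p(chi2_single_method_factor, chi2_content_specific_method)
print(f"Chi-square difference = 20.0, df = 1, p = {p_value:.6f}")
```

A small p-value means that the extra restriction makes the model fit significantly worse, which is the standard reason to reject the more restricted model.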

KRC ask whether my results imply that the IAT cannot be a valid measure of automatic cognition. To provide a meaningful answer to this question, it is important to define the terms valid, measure, automatic, and cognition. The main problem with KRC’s comment is that these terms remain undefined. Without precise definitions, it is impossible to make scientific progress. This is even true for the concept of validity, which has no clear meaning in psychological measurement (Schimmack, 2020c). KRC ignore that I clearly define validity as the correlation between IAT scores and a latent variable that represents the actual variation in constructs such as attitudes towards race, political parties, and the self. My main finding was that IAT scores have only modest validity (i.e., low correlations with the latent variable or low factor loadings) as measures of racial preferences, no validity as a measure of self-esteem, and no proven validity as measures of some implicit constructs that are distinct from attitudes that are reflected in self-report measures. Instead, KRC consistently mischaracterize my findings when they write that “the reanalyses reported by Schimmack find high correlations between relatively indirect (automatic) measures of mental content, as indexed by the IAT, and relatively direct (controlled) measures of mental content.” This statement is simply false and confuses correlations of measures with correlations of latent variables. The high correlations between latent factors that represent shared variance among explicit measures and implicit measures provide evidence of low discriminant validity, not evidence of high validity. Moreover, the modest loadings of the race IAT on the implicit race factor show low validity of the IAT as a measure of racial attitudes.

After mischaracterizing my results, KRC go on to claim that my results do “not cast any doubt on the ability of IATs to index attitudes or to do so in an automatic fashion” (p. 5).  However, the low convergent validity among implicit measures remains a problem for any claims that the IAT and other implicit measures measure a common construct with good validity. KRC simply ignore this key finding even though factor loadings provide objective and quantitative information about the construct validity of IAT scores.

The IAT is not the only research instrument with questionable construct validity. However, the IAT is unique because it became a popular measure of individual differences without critical evaluation of its psychometric properties. This is particularly problematic when people are given feedback with IATs on the Project Implicit website, especially for IATs that have demonstrably no validity like the self-esteem IAT. The developers of the IAT and KRC defend this practice by arguing that taking an IAT can be educational: “At this stage in its development, it is preferable to use the IAT mainly as an educational tool to develop awareness of implicit preferences and stereotypes.” However, it is not clear how a test with invalid results can be educational. How educational would it be to provide individuals with randomly generated feedback about their intelligence? If this sounds unethical, it is not clear why it is acceptable to provide individuals with misleading feedback about their racial attitudes or self-esteem. As a community, psychologists should take a closer look at the practice of providing online feedback with tests that have low validity because this practice may undermine trust in psychological science.

KRC’s commentary also fails to address important questions about the sources of stability and change in IAT scores over time. KRC suggest that “the jury is still out on whether variation in responding on the IAT mostly reflects individual differences or mostly reflects the effects of the situation” (p. 4). The reason why two decades of research have failed to answer this important question is that social cognition researchers focus on brief laboratory experiments that have little ecological validity and that are unable to demonstrate stability of individual differences over time. However, two longitudinal studies suggest that IAT scores measure stable attitudes rather than context-dependent automatic cognitions. Wil Cunningham, one of the commentators, provided the first evidence that variance in IAT scores reflects mostly random measurement error and stable trait variance, with no evidence of situation-specific state variance (Cunningham et al., 2001). Interestingly, KRC ignore the implications of this study. This year, an impressive study examined this question with repeated measures over a six-year period (Onyeador et al., 2020; Schimmack, 2020b). The results confirmed that even over this long time period, variance in IAT scores mostly reflects measurement error and a stable trait without notable variance due to changes in situations.

Another important topic that I could only mention briefly in my original article is incremental predictive validity. KRC mention Kurdi et al.’s (2019) meta-analysis as evidence that the IAT and self-report measures tap different constructs. They fail to mention that the conclusions of this meta-analysis are undermined by the lack of credible, high-powered studies that can demonstrate incremental predictive validity. To quote Kurdi et al.’s abstract, “most studies were vastly underpowered” (p. 569). The authors conducted tests of publication bias, but did not find evidence for it. The reason could be that they used tests that have low power to detect publication bias. Some studies included in the meta-analysis are likely to have reported inflated effect sizes due to selection for significance, especially costly fMRI studies with tiny sample sizes. For example, Phelps et al. (2000) report a correlation of r(12) = .58 between scores on the race IAT and differences in amygdala activation in response to Black and White faces. Even if we assume that 20% of the variance in the IAT is valid, the validity-corrected correlation would be r = 1.30, which exceeds the maximum possible value of 1. In other words, this correlation is implausible given the low validity of race IAT scores. The correlation is also much stronger than the predictive validity of the IAT in Kurdi et al.’s meta-analysis. The most plausible explanation for this result is that researchers’ degrees of freedom in fMRI studies inflated this correlation (Vul et al., 2009). Consistent with this argument, effect sizes in studies with larger sample sizes are much smaller and evidence of incremental predictive validity can be elusive, as in Greenwald et al.’s study of the 2018 election. At present, there is no pre-registered, high-powered study that provides clear evidence of incremental predictive validity. Thus, IAT proponents have failed to respond to Blanton et al.’s (2009) critique of the IAT.
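
The arithmetic behind the corrected correlation can be sketched in a few lines. This is only an illustration, assuming the standard correction for attenuation applied to the IAT alone (the amygdala measure is treated as perfectly valid); the function name is hypothetical and not taken from any cited work.

```python
import math

def validity_corrected_r(observed_r, valid_variance):
    """Divide an observed correlation by the validity coefficient.

    valid_variance is the assumed proportion of valid variance in the
    measure; its square root is the validity coefficient (factor loading).
    """
    return observed_r / math.sqrt(valid_variance)

# Values from the text: r(12) = .58 and an assumed 20% valid variance.
corrected = validity_corrected_r(0.58, 0.20)
print(round(corrected, 2))  # 1.3, larger than the maximum possible value of 1
```

A corrected value above 1 signals that the observed correlation is larger than the assumed validity of the measure allows.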
Responses to my renewed criticism suggest that IAT researchers are unable or unwilling to respond to valid scientific criticism of the IAT with active coping. Instead, they prefer to engage in emotion-focused, repressive coping that makes IAT researchers feel better without addressing substantive measurement problems.

In conclusion, my critique of the IAT literature and the response by IAT researchers shows a wider problem in psychology that I have called the validation crisis (Schimmack 2020c). Although measurement is at the core of any empirical science, many psychologists lack formal training in psychological measurement. As a result, they create and use measures of unknown validity. This is particularly true for social psychologists because social psychologists in the 1970s and 1980s actively rejected the idea that characteristics within individuals are important for the understanding of human behavior (“the power of the situation”). However, when the cognitive revolution started, the focus shifted from observable situations and behaviors to mental states and processes. To study these phenomena that are not directly observable requires valid measures, just like telescopes need to be validated to observe planets in distant galaxies. The problem is that social cognition researchers developed methods like the IAT to make claims about cognitive processes that are not observable to outside observers or by means of introspection without taking the time to validate these measures. To make progress, the next generation of social psychologists needs to distinguish clearly between constructs and measures and between random and systematic measurement error. As all measures are contaminated by both sources of measurement error, constructs need to be measured with multiple, independent methods that show convergent validity (Campbell & Fiske, 1959; Cronbach & Meehl, 1955).  Psychology also needs to move from empty qualitative statements like “the IAT can be valid” to empirically-based statements about the amount of validity of a specific IAT in specific populations in clearly defined situations. This requires a new program of research with larger samples, ecologically valid situations, and meaningful criterion variables.


Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. E. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94, 567–582. doi:10.1037/a0014665

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. doi:10.1037/h0040957

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001). Implicit attitude measures: Consistency, stability, and convergent validity. Psychological Science, 12(2), 163–170.

Onyeador, I. N., Wittlin, N. M., Burke, S. E., Dovidio, J. F., Perry, S. P., Hardeman, R. R., … van Ryn, M. (2020). The Value of Interracial Contact for Reducing Anti-Black Bias Among Non-Black Physicians: A Cognitive Habits and Growth Evaluation (CHANGE) Study Report. Psychological Science, 31(1), 18–30.

Schimmack, U. (2020a). Open Communication about the invalidity of the race IAT.

Schimmack, U. (2020b). Racial bias as a trait. (retrieved 4/21/20)

Schimmack, U. (2020c). The validation crisis. Meta-Psychology (blog)

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4(3), 274–290.

Covid-19 behaves like tourists

Many people are wondering about variation in the Covid-19 pandemic across countries. Why (the North of) Italy and not Portugal? How was South Korea able to contain the virus, while other countries did not even though they had less time? The New York Times published a long article that examined this question, but nobody really knows.

Some of the speculations focus on biological factors that may protect individuals or may make them more vulnerable. However, so far these factors explain a small portion of the variation in death rates. The biggest predictor is the number of people who are infected by the virus. Australia and New Zealand have few deaths because Covid-19 did not spread widely among their populations.

One possible explanation could be the response of countries to the pandemic. Countries like the UK and Sweden may have more deaths because they did not lock down their countries. The problem with these speculations is that many factors are likely to contribute to the variation and it is difficult to spot these factors without statistical analyses.

The NYT article mentions that hundreds of studies are underway to look for predictors of variation across nations, but no results are being mentioned. Maybe researchers are cautious.

“Doctors who study infectious diseases around the world say they do not have enough data yet to get a full epidemiological picture, and that gaps in information in many countries make it dangerous to draw conclusions”

Drawing conclusions is different from exploring data. There is nothing dangerous about exploring patterns in data. Clearly many people are curious and statistical analysis can provide more valuable information than armchair speculations about climate or culture.

As a cross-cultural psychologist, I am familiar with many variables that distinguish nations from each other. The most prominent dimension is individualism. Western cultures tend to be more individualistic than Asian cultures. This might suggest that culture plays a role because Asian cultures have had fewer Covid-19 deaths. However, individualism as measured by Hofstede’s dimension is a weak predictor and did not survive statistical controls. Other dimensions that were less plausible also did not predict variation in Covid-19 deaths.

However, one variable that was a predictor was the number of tourists that travel to a country (tourism data).

Tourism reflects how connected a country is with the rest of the world. Australia and New Zealand are not only islands, they are also geographically isolated, which explains why relatively few people visit these otherwise attractive locations. Covid-19 has also spared much of Eastern Europe, and many Eastern European countries rank low on the tourism index.

Additional analyses show that tourism is becoming a weaker predictor over time. The reason is the recent rise of cases and deaths in Latin America. Latin America was relatively unaffected in April, but lately Ecuador and Brazil have seen alarming increases in cases.

The graph also shows that tourism does not explain all of the differences between countries. For example, the UK has far more cases than predicted by the regression line. This may reflect the slow response to the Covid-19 crisis in the UK. Sweden is also above the regression line, possibly due to the policy to keep schools and businesses open. Switzerland is a direct neighbor of the North of Italy, where the epidemic in Europe started. Canada is above the regression line, but was on the regression line on April 15. The reason is that Canada acted quickly in the beginning, but is now seeing a late increase in deaths in care homes.

In conclusion, these results suggest that timing is a big factor in the current differences across countries. Countries with high death tolls were simply unlucky to be at the center of the pandemic or well connected to it. As the pandemic progresses, this factor will become less important. Some countries, like Austria and (the South of) Germany that were hit early have been able to contain the spread of Covid-19. In other countries, numbers are increasing, but no country is seeing increases as dramatic as in Italy (or New York) where Covid-19 spread before social distancing measures were in place. New factors may predict what will happen in the times of the “new normal” when countries are trying to come out of lock-downs.

I don’t think that publishing these results is dangerous. The results are what they are. It is just important to realize that they do not prove that tourism is the real causal factor. It is also possible that tourism is correlated with some other variables that reflect the real cause. To demonstrate this, we need to find measures of these causal factors and demonstrate that they predict variation in death tolls of nations better than tourism and statistically remove the relationship of tourism with Covid-19 deaths. So, this blog post should be seen as a piece of a puzzle rather than the ultimate answer to a riddle.

Politics vs. Science: What Drives Opening Decisions in the United States?

The New York Times published a map of the United States that shows which states are opening up today on May 1.

I coded these political decisions on a 1 = shut down or restricted to 3 = partial reopening scale and examined numerous predictor variables that might drive the decision to ease restrictions.

Some predictor variables reflect scientific recommendations such as the rate of testing or the number of deaths or urbanization. Others reflect political and economic factors such as the percentage of Trump supporters in the 2016 election.

The two significant predictors were the number of deaths adjusted for population (on a log-scale) and support for Trump in the 2016 election. The amount of testing that is being carried out in different states was not a predictor.

Another model showed that states that have not been affected by Covid-19 are more likely to open. These are states where the population is more religious, White, and rural.

It was not possible to decide which of these variables are driving the effect because predictor variables are too highly correlated. This simply shows the big divide between “red”, rural, religious states and “blue,” agnostic, and urban states.

A bigger problem than differences between states is probably the difference within states between urban centers and rural areas, where a single state-wide policy is unlikely to fit the needs of both urban and rural populations. A big concern remains that decisions about opening are not related to testing, suggesting that some states that are opening do not have sufficient testing to detect new cases that may start a new epidemic.

Covid-19 in Quebec versus Ontario: Beware of Statistical Models

I have been tracking the Covid-19 statistics of Canadian provinces for several weeks (from March 16 to be precise). Initially, Ontario and Quebec were doing relatively well and had similar statistics. However, over time case numbers increased, deaths rose, especially in care homes, and the numbers for the two provinces diverged. The situation in Quebec was getting worse and recently the number of deaths relative to the population was higher than in the United States. Like many others, I was surprised and concerned when the Premier of Quebec announced plans to open businesses and schools sooner rather than later.

I was even more surprised when I read an article on the CTV website that reported new research that claims the situation in Quebec and Ontario is similar after taking differences in testing into account.

The researchers base this claim on a statistical model that aims to correct for testing bias and that is able to estimate the true number of infections on the basis of positive test results. To do so without a representative sample of tests seems rather dubious to most scientists. So, it would be helpful if the researchers could provide some evidence that validates their estimates. A simple validation criterion is the number of deaths. Regions that have more Covid-19 infections should also have more deaths, everything else being equal. Of course, differences in age structures or infections in care homes can create additional differences in deaths (i.e., the case-fatality rates can differ), but there are no big differences between Quebec and Ontario in this regard as far as I know. So, is it plausible to assume that Quebec and Ontario have the same number of infections? I don’t think so.

To adjust for the difference in population size, all Covid-19 statistics are reported relative to population. The table shows that Ontario has 1,234 confirmed positive cases per 1 million inhabitants while Quebec has 3,373 confirmed positive cases per 1 million residents. This is not a trivial difference. There is also no evidence that the higher number in Quebec is due to more testing. While Ontario has increased testing lately, testing remains a problem in Quebec. Currently, Ontario has conducted more tests (21,865 tests per million) than Quebec (19,471 tests per million). This also means that the positive rate (percentage of positive tests; positives/tests*100) is much higher in Quebec than in Ontario. Most important, there are 741 deaths per 10 million residents in Ontario and 2,157 deaths per 10 million residents in Quebec. That means there are 2.91 times more deaths in Quebec than in Ontario. This matches the difference in cases, where Quebec has 2.73 times more cases than Ontario. It follows that Ontario and Quebec also have similar case-fatality rates of 6.00% and 6.39%. That is, out of 100 people who test positive, about 6 die of Covid-19.
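
For readers who want to check these ratios, the calculation can be sketched as follows. The figures are the ones quoted above; the dictionary keys and function names are hypothetical labels chosen for this illustration.

```python
# Figures quoted in the text: cases and tests per 1 million residents,
# deaths per 10 million residents.
ontario = {"cases_per_m": 1234, "tests_per_m": 21865, "deaths_per_10m": 741}
quebec = {"cases_per_m": 3373, "tests_per_m": 19471, "deaths_per_10m": 2157}

def positive_rate(region):
    """Percentage of tests with a positive result."""
    return region["cases_per_m"] / region["tests_per_m"] * 100

def case_fatality_rate(region):
    """Deaths per 100 confirmed cases (deaths rescaled to per 1 million)."""
    deaths_per_m = region["deaths_per_10m"] / 10
    return deaths_per_m / region["cases_per_m"] * 100

death_ratio = quebec["deaths_per_10m"] / ontario["deaths_per_10m"]
case_ratio = quebec["cases_per_m"] / ontario["cases_per_m"]

print(f"Deaths, Quebec relative to Ontario: {death_ratio:.2f}")
print(f"Cases, Quebec relative to Ontario: {case_ratio:.2f}")
for name, region in (("Ontario", ontario), ("Quebec", quebec)):
    print(f"{name}: positive rate {positive_rate(region):.2f}%, "
          f"case-fatality rate {case_fatality_rate(region):.2f}%")
```

The death ratio and the case ratio come out close to each other, which is why the two case-fatality rates are nearly the same.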

In conclusion, there is absolutely no evidence for the claim that the Covid-19 pandemic has affected Ontario and Quebec to the same extent and that differences in testing produce misleading statistics. Rather, case numbers and deaths consistently show that Quebec is affected about three times worse than Ontario. As the false claim is based on the Montreal authors’ statistical model, we can only conclude that their model makes unrealistic assumptions. It should not be used to make claims about the severity of Covid-19 in Ontario, Quebec, or anywhere else.

Counting Covid-19 deaths is not as easy as it sounds.

Every scientific discipline is unique and requires special training. However, empirical sciences also have many things in common. One major problem in any empirical science is measurement. To make objective, scientific claims about the world, scientists need reliable and valid measures of the world. These days, we are used to measuring time, distance, and temperature with high precision for the range of values that we encounter in our everyday world. However, even these basic attributes are difficult to measure when we consider the speed of light or the length of a chromosome. Similarly, social scientists have concepts that are easy to understand. Did you have sex yesterday? Is a person dead or alive? However, when we want to measure how often US Americans or Canadians have sex or how many people died in the past month, measurement is a lot more difficult.

Death is a big deal and considerable effort and resources are invested in recording the time and the probable cause of death. This information is collected by statisticians and made publicly available. When we see these numbers, we forget about measurement error. If a newspaper article or a website shows us that there are 63,801 deaths, we think that there are 63,801 deaths. However, it is nearly certain that this number is false and that the true number is greater or smaller than 63,801 deaths. The real number is unlikely to be 10 times larger or 10 times smaller, but we should not assume that deaths can be counted like coins in our piggy bank.

While counting deaths is difficult, it is even more difficult to specify the cause of death. This makes it even more difficult to say how many US Americans have died so far of Covid-19. The task of quantifying Covid-19 deaths is more like estimating or guessing than counting. At best, we can try to reduce the uncertainty in our estimates.

One solution to the problem of estimation is to use different estimation strategies. If these different strategies produce convergent results, it is likely that the estimates reflect the truth. However, if two estimation strategies diverge, at least one of them has to be false – more false than the other. Ideally, scientists might work together and try to come up with a consensus based on different estimation methods. However, often this does not happen because scientists get rewarded for presenting their method as superior to other methods. In psychology and in some other sciences, researchers that try to make sense of conflicting claims by original scientists are called meta-analysts or meta-scientists. In this blog post, I take a meta-analytic perspective on different measures of Covid-19 deaths.

I focus on US states because the US has provided

1. Confirmed Covid-19 Deaths

The most widely used estimate of Covid-19 deaths is based on testing of deceased people. Assuming a fairly low amount of measurement error in Covid-19 tests, this method seems to be close to perfect. However, there are problems. First, not every coroner has the resources to test for Covid-19. Second, some patients had pre-existing conditions and it is not clear whether Covid-19 was the actual cause of death. Given these problems, statistics are often reported with a disclaimer that the numbers may be inaccurate.

Aside from uncertainty, it is also likely that the numbers underestimate the actual number of Covid-19 deaths. The reason is that a death caused by Covid-19 is more likely to be attributed to some other disease than a death caused by another disease is to be attributed to Covid-19. This is certainly true if the cause of death is determined by a test for Covid-19. However, sometimes Covid-19 is considered a probable cause without a diagnostic test. The problem is that it is unclear how big the discrepancy between confirmed and actual Covid-19 deaths is. If there are 100 confirmed Covid-19 deaths, is the real number 101, 120, 200, or 1000?

2. Excess All-Cause Mortality

An alternative way to estimate Covid-19 deaths relies on the count of all deaths. The number of people that die every year in a specific week or month of the year is fairly stable from year to year. The main difference between previous years and the year 2020 is that a new virus started killing people. Assuming everything else remains the same (a big assumption), people who died of Covid-19 would be additional deaths, except for a small number of people who got infected and would have died anyway. This makes it possible to estimate the number of Covid-19 deaths without a diagnostic test. If normally 500 people die in the week from April 1 to April 7, and 550 people died in the same week in 2020, this method suggests that 50 people died of Covid-19. The problem with this method is that it makes several assumptions and the estimate will be inaccurate if these assumptions are wrong. For example, the number of deaths varies as a function of the severity of the flu season. If the flu was very mild, there could have been only 450 deaths without Covid-19 in 2020, and the true number of Covid-19 deaths would be 100. Or the flu was more severe and fully accounted for all 550 deaths, and there were zero Covid-19 deaths.
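
The logic of this example can be written down as a small sketch. The numbers are the hypothetical ones from the paragraph above; the function name and scenario labels are made up for illustration.

```python
def excess_deaths(observed, expected_baseline):
    """Excess deaths: observed deaths minus the deaths expected without Covid-19."""
    return observed - expected_baseline

observed_2020 = 550

# Three assumptions about how many deaths would have occurred without Covid-19.
baselines = {
    "typical year": 500,
    "mild flu season": 450,
    "severe flu season": 550,
}

estimates = {label: excess_deaths(observed_2020, baseline)
             for label, baseline in baselines.items()}

for label, estimate in estimates.items():
    print(f"{label}: {estimate} estimated Covid-19 deaths")
```

The same observed count yields estimates of 50, 100, or 0 depending only on the assumed baseline, which is the weakness of the method.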

Another problem is that deaths are not reported in real time to the Centers for Disease Control and Prevention. The numbers for the critical weeks of March and April are still being revised and updated. This means that current numbers are likely to underestimate deaths in 2020 and thereby Covid-19 deaths. Statisticians have a way to deal with this problem and can use information about the typical delay in reporting to correct for it. The problem is that this means more assumptions and more room for error.

Despite these problems, several media reports have used estimates based on excess all-cause mortality to make strong claims about the “true” number of Covid-19 deaths.

One problem with these claims is that official Covid-19 death numbers have been revised to incorporate presumed cases without a Covid-19 test. This has led to some dramatic increases in the numbers for severely affected states like New York (National Post).

The question is whether there are still tens of thousands of Covid-19 deaths that are missing even after this revision of the numbers.

Data about all-cause mortality and excess all-cause mortality are provided by the CDC.

The dark blue bars show the recorded deaths. These numbers do not show a notable increase in deaths in March and April. The reason is that there is a lag in reporting. This means that the final all-cause mortality numbers for March and April 2020 are still unknown. We know that the numbers are going to go up, but we do not know by how much. The light blue bars are predictions about the final numbers based on information about the typical reporting lags. The red stars show weeks with predicted excess deaths. These predictions show excess deaths for the time period from March 29 to April 18. For the most recent week from April 19 to April 25, the reported numbers are so low that even the predicted values are well below the typical level. It is unlikely that this is a real finding given the recording of thousands of Covid-19 deaths in this time period.

The CDC datafile records excess deaths for the critical weeks. The sum of these values is 19,086 excess deaths that may be attributed to Covid-19.

In contrast, the Worldometer statistics show 39,331 Covid-19 deaths in the United States on April 18. This is more than double the number that the excess-deaths method suggests. Even the more conservative estimate by the Covid-Tracking team lists 34,557 deaths, which is still considerably more than the estimate based on excess all-cause mortality. Thus, there is no evidence that these estimates severely underestimate the number of Covid-19 deaths in the United States.


Even though death can be determined objectively, the actual number of Covid-19 deaths is unknown. Scientists can only provide reasonable estimates and different estimation methods produce different estimates. Counting only confirmed cases that were tested for Covid-19 underestimates the actual number because some people died of Covid-19 and were not tested. Including presumed cases addresses this problem, but it creates uncertainty. There may be a bias to attribute deaths to Covid-19 or a bias in the opposite direction. Examining excess all-cause mortality avoids the problem of diagnoses but creates many new problems. One problem is that the final number of deaths in April is still unknown because it takes time to collect this information. Given all of these problems, there is a wide range of estimates. It could be as few as 19,000 and it could be more than 40,000. However, there is no scientific evidence to claim that tens of thousands more than the official 40,000 deaths were caused by Covid-19. Given the information provided here, it would be very surprising if new data revise the number of deaths by April 18 to over 60,000 deaths.

Personally, I don’t believe that the actual number is very important. What is more important is that the number of deaths would have been dramatically higher without actions to stop the spread of the virus. Whether we compare 20,000, 40,000, or 60,000 deaths to 500,000 deaths seems irrelevant. No country, including Sweden, decided to let the virus just spread because the consequences would have been horrible. The actual number is also not important for the timing and planning of the future. How and when to ease restrictions depends on the current number of new cases rather than the cumulative number of deaths. For example, Italy has a large number of deaths, but was able to control the spread of the virus and can now start thinking about easing the lock-down.

Unfortunately, the number of Covid-19 deaths seems to be most important for politics. A higher death toll means that political leaders are responsible for a bigger disaster. At the beginning of the Covid-19 crisis, Trump supporters pointed out that 12,000 US Americans died of H1N1 under the Obama administration.

Greg Musselwhite (R) FL20 on Twitter: "… "

Now that the death toll from Covid-19 is much higher than the death toll from H1N1, the liberal media may be inflating the number of Covid-19 deaths to make Trump look as bad as possible. The fact that the true number is unknown and can only be estimated provides ample room for political bias in the presentation of the results.

Scientists should not be part of these partisan number-games. The best way to avoid this is to provide information in terms of ranges that express uncertainty about the true number. Ranges also make it easier for scientists to agree with each other and to speak with a common voice that reflects the objective data. Based on the information that I presented here, I would say that between 20,000 and 60,000 US Americans died of Covid-19 by April 18. As more data become available, uncertainty in this estimate will shrink.

The more important question remains how much US Americans and the rest of the world are able to control the spread of the virus in the future to prevent as many deaths as possible in the months to come and to find ways to live with the virus until a vaccine is available.


Red states have fewer Covid-19 deaths because they are less urban, more religious, and less Black

Covid-19 does not care about politics, but the response to Covid-19 in the United States has divided the country just like many other issues are now perceived differently by liberal and conservative Americans. At least initially, Trump and some other conservatives did not take the Covid-19 pandemic very seriously. It is possible that this response influenced conservatives’ response to the pandemic (Beauchamp, 2020).

However, at the state level, red states that voted for Trump have fewer fatalities than blue states that voted for Hillary Clinton in the 2016 election.

This relationship even holds after taking population size into account.

The negative correlation of r = -.53 is, statistically speaking, a strong correlation. It is also very unlikely to be a chance finding, p < .001.
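
A rough check of this p-value is possible with the Fisher z approximation. This is a sketch under stated assumptions: the sample size of n = 50 states is an assumption (the post does not state n), and the Fisher z test is a stand-in approximation, not necessarily the test used for the reported result.

```python
import math

def fisher_z_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# r is taken from the text; n = 50 is an assumed number of states.
p = fisher_z_p_value(-0.53, 50)
print(p < 0.001)  # True
```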

A correlation does not mean that the relationship is causal. It is possible that some other variables in the background predict voting for Trump in the 2016 election and Covid-19 deaths. If other variables are causing this relationship they should (a) be stronger predictors of Covid-19 deaths and (b) weaken the relationship between voting for Trump and Covid-19 deaths. They should also have a plausible causal mechanism.

I was able to identify three variables that fulfill these criteria. First, states that voted for Trump are less urban and a virus spreads more easily in urban areas. Second, states that voted for Trump tend to be more religious. While media focus on a few churches that disobeyed stay-at-home orders, Christians in general are more likely to follow orders and probably also find it easier to spend time at home. Third, race has emerged as another predictor of Covid-19 deaths. There are numerous factors that may contribute to this relationship. As Black voters are less likely to vote for Trump, this is another potential variable that contributes to the negative correlation between Trump support and Covid-19 deaths.

Regression analyses confirm that the three variables predict unique variance in Covid-19 deaths between states and weaken the relationship between Trump support and Covid-19 deaths. The relationship of -.11 in Figure 3 is not statistically significant, although it is still negative.
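The logic of a background variable weakening a correlation can be illustrated with a first-order partial correlation. This is a simpler stand-in for the multiple regression with three predictors reported here, not the analysis itself. Only the -.53 comes from the text; the other two correlations are made-up values chosen for illustration, and `partial_corr` is a hypothetical helper name.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


r_trump_deaths = -0.53   # reported in the text
r_trump_urban = -0.60    # HYPOTHETICAL: Trump support vs. urbanization
r_urban_deaths = 0.70    # HYPOTHETICAL: urbanization vs. Covid-19 deaths

adjusted = partial_corr(r_trump_deaths, r_trump_urban, r_urban_deaths)
print(f"raw r = {r_trump_deaths:.2f}, partial r = {adjusted:.2f}")
```

With these illustrative inputs the adjusted correlation is much closer to zero than the raw one, which is the pattern a confounding variable is expected to produce.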

It is noteworthy that the relationship between Trump support and deaths in Figure 3 could have become positive. This would be expected if Trump supporters did not engage in the same amount of social distancing and other measures that prevent the spread of the virus. The fact that there is no significant positive path in Figure 3 suggests either that I failed to identify additional variables that produce a negative relationship between Trump support and Covid-19 deaths or that support for Trump in 2016 is not a strong predictor of responses to the Covid-19 pandemic.


[Figure: Covid-19 deaths (April 19) and Black population]

National Differences in Severity of Covid-19 Pandemic on April 19

Since mid-March, many countries have been in some form of lock-down or have instituted other measures to slow the spread of Covid-19. However, for many countries, mid-March was too late to prevent widespread community transmission of the virus. Just as countries differed in their response to Covid-19, we are going to see national differences in the way countries try to come out of lock-downs and restart their economies. Rankings of nations in terms of case numbers or deaths inevitably raise questions about the rank ordering of nations. There are just as many debates about the accuracy of the data. For example, relative to its population, Belgium has the highest death toll, but some Belgians argue that other countries are not counting deaths properly. The comparison of case numbers is severely distorted by different amounts of testing. To move beyond armchair epidemiology, we need to take a more careful look at the numbers.

The number of deaths relative to the population is a reasonable indicator of severity. The main problem with this indicator is that some countries may not be counting all deaths. Another indicator is the number of people who test positive. This is not an ideal indicator for several reasons. First, some positive cases may only have mild symptoms. Second, countries that test little will severely underestimate the number of real cases. However, we would expect that countries that have a bigger problem have more deaths and also more positive tests. To adjust for national differences in testing, we can compute the positive rate; that is, the percentage of tests with a positive result. For example, if one country conducts 1000 tests and has 20 positives, the positive rate is only 2%. In contrast, a country that has 100 tests and 10 positives has fewer positives, but a higher positive rate of 10%.
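The positive rate is simple arithmetic, and a short sketch using the counts from the example above may help. The function name `positive_rate` is an illustrative choice, not something taken from the data sources.

```python
def positive_rate(tests: int, positives: int) -> float:
    """Return the percentage of tests that came back positive."""
    if tests <= 0:
        raise ValueError("tests must be greater than zero")
    if not 0 <= positives <= tests:
        raise ValueError("positives must be between 0 and tests")
    return 100.0 * positives / tests


# The two hypothetical countries from the example in the text.
country_a = positive_rate(tests=1000, positives=20)
country_b = positive_rate(tests=100, positives=10)
print(f"Country A: {country_a:.1f}%, Country B: {country_b:.1f}%")
```

The country with fewer positives in absolute terms ends up with the higher positive rate, which is why the rate rather than the raw count is used to adjust for differences in testing.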

Neither indicator is perfect and both indicators are likely to have some measurement error. Ideally, these errors are relatively independent of each other and cancel each other out. In that case, countries with a high death rate and a high positive rate are likely to be more severely affected than countries with a low death rate and a low positive rate.

To examine how well the two indicators are related, I used the death rates reported on Worldometers and computed the positive rate based on reported tests and positives on Wikipedia. Death numbers were for April 19. The testing data were obtained on April 21 for the last available day of testing. Testing is not always reported daily, but positive rates do not change dramatically from day to day. Only countries with information about testing could be used.

Deaths were log-transformed to take exponential growth into account. The Pearson correlation between the two indicators was r = .68. The rank correlation was similar, r = .64, indicating that both indicators track severity of the Covid-19 pandemic.
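As a rough sketch of this step, the code below log-transforms deaths and computes both a Pearson and a rank (Spearman) correlation with the positive rate. The six data points are made up for illustration and are not the Worldometers or Wikipedia figures; the helper names are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# HYPOTHETICAL data for six countries, for illustration only.
deaths_per_million = [2.0, 15.0, 40.0, 120.0, 300.0, 480.0]
positive_rate_pct = [1.5, 4.0, 3.0, 12.0, 18.0, 25.0]

log_deaths = [math.log(d) for d in deaths_per_million]
print(f"Pearson r (log deaths):  {pearson(log_deaths, positive_rate_pct):.2f}")
print(f"Rank correlation:        {spearman(log_deaths, positive_rate_pct):.2f}")
```

Because the logarithm preserves order, the rank correlation is the same whether or not deaths are log-transformed; only the Pearson correlation depends on the transformation.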

Figure 1 also shows the location of individual countries. This makes it possible to examine potential factors that may explain discrepancies between the two indicators. For example, Belgium does indeed have more deaths than the regression of deaths on the positive rate predicts. However, the difference is relatively small, and the positive rate also identifies Belgium as a country that has been severely affected by Covid-19. A notable outlier is Ecuador, and it is possible that death counts severely underestimate the Covid-19 spread in Ecuador. The same is probably true for Iran.

The most interesting question is whether the reliable variation across nations is simply random or whether there are systematic factors that explain why some countries were affected more than others. This blog post merely shows that death rates and positive rates are related and can be used to examine this question.