
Reply to Vianello, Bar-Anan, Kurdi, Ratliff, and Cunningham

I published a critique of the Implicit Association Test. Using structural equation modeling of multi-method studies, I found low convergent validity among implicit measures of attitudes (prejudice, self-esteem, political orientation) and low discriminant validity between explicit and implicit measures. The latter finding is reflected in high correlations between factors that reflect the shared variance among explicit measures and the shared variance among implicit measures. Using factor loadings to quantify validity, I found that the controversial race IAT has at most 20% valid variance in capturing racial attitudes. Most, if not all, of this variance is shared with explicit measures. Thus, there is no evidence that IAT scores reflect a distinct form of implicit prejudice that may operate outside of conscious awareness.
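The link between factor loadings and valid variance can be made concrete with a quick calculation (the loading value below is illustrative): if a standardized loading expresses the correlation of a measure with its latent factor, squaring it gives the proportion of valid variance.

```python
# Valid variance equals the squared standardized factor loading.
# A loading of about .45 of the race IAT on the latent attitude
# factor corresponds to roughly 20% valid variance.
loading = 0.45  # illustrative standardized loading
valid_variance = loading ** 2
print(round(valid_variance, 2))  # -> 0.2
```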

This article elicited a commentary by Vianello and Bar-Anan (ref.) and by Kurdi, Ratliff, and Cunningham (pdf). Here is a draft of my response to their commentaries. As you will see, there is little common ground; even the term “validity” is not clearly defined, making any discussion about the validity of the IAT meaningless. To make progress as a science (or to become a science), psychologists need to have a common understanding of psychological measurement and methods that can be used to evaluate the validity of measures quantitatively.


Just like pre-publication peer reviews, the two post-publication commentaries have remarkably little overlap. While Vianello and Bar-Anan (VBA) question my statistical analyses, Kurdi, Ratliff, and Cunningham (KRC) accept my statistical results but argue that these results do not challenge the validity of the IAT.

VBA’s critique is clearer and therefore easier to refute by means of objective model comparisons. The key difference between VBA’s model and my model is the modeling of method variance. VBA’s model assumes that all implicit measures of different constructs are influenced by a single method factor. In contrast, my model assumes that implicit measures of prejudice (e.g., the standard race IAT and the Brief Implicit Association Test with the same racial stimuli) share additional method variance. As these hypotheses are nested models, it is possible to test the competing models directly against each other. The results show that a model with content-specific method variance fits the data better (Schimmack, 2020a). The standard inference from a model comparison test is that the model with the worse fit is not an adequate model of the data, but VBA ignored the poorer fit of their model and presented a revised model that does not model method variance properly and therefore produces misleading results. Thus, VBA’s commentary is just another demonstration of the power of motivated reasoning that undermines the idealistic notion of a self-correcting science.

KRC ask whether my results imply that the IAT cannot be a valid measure of automatic cognition. To provide a meaningful answer to this question, it is important to define the terms valid, measure, automatic, and cognition. The main problem with KRC’s comment is that these terms remain undefined. Without precise definitions, it is impossible to make scientific progress. This is even true for the concept of validity, which has no clear meaning in psychological measurement (Schimmack, 2020c). KRC ignore that I clearly define validity as the correlation between IAT scores and a latent variable that represents the actual variation in constructs such as attitudes towards race, political parties, and the self. My main finding was that IAT scores have only modest validity (i.e., low correlations with the latent variable or low factor loadings) as measures of racial preferences, no validity as a measure of self-esteem, and no proven validity as measures of implicit constructs that are distinct from the attitudes reflected in self-report measures. Instead, KRC consistently mischaracterize my findings when they write that “the reanalyses reported by Schimmack find high correlations between relatively indirect (automatic) measures of mental content, as indexed by the IAT, and relatively direct (controlled) measures of mental content.” This statement is simply false and confuses correlations of measures with correlations of latent variables. The high correlations between latent factors that represent shared variance among explicit measures and implicit measures provide evidence of low discriminant validity, not evidence of high validity. Moreover, the modest loadings of the race IAT on the implicit race factor show low validity of the IAT as a measure of racial attitudes.

After mischaracterizing my results, KRC go on to claim that my results do “not cast any doubt on the ability of IATs to index attitudes or to do so in an automatic fashion” (p. 5). However, the low convergent validity among implicit measures remains a problem for any claim that the IAT and other implicit measures capture a common construct with good validity. KRC simply ignore this key finding even though factor loadings provide objective and quantitative information about the construct validity of IAT scores.

The IAT is not the only research instrument with questionable construct validity. However, the IAT is unique because it became a popular measure of individual differences without critical evaluation of its psychometric properties. This is particularly problematic when people are given feedback with IATs on the Project Implicit website, especially for IATs that have demonstrably no validity, like the self-esteem IAT. The developers of the IAT and KRC defend this practice by arguing that taking an IAT can be educational: “At this stage in its development, it is preferable to use the IAT mainly as an educational tool to develop awareness of implicit preferences and stereotypes.” However, it is not clear how a test with invalid results can be educational. How educational would it be to provide individuals with randomly generated feedback about their intelligence? If this sounds unethical, it is not clear why it is acceptable to provide individuals with misleading feedback about their racial attitudes or self-esteem. As a community, psychologists should take a closer look at the practice of providing online feedback with tests that have low validity because this practice may undermine trust in psychological science.

KRC’s commentary also fails to address important questions about the sources of stability and change in IAT scores over time. KRC suggest that “the jury is still out on whether variation in responding on the IAT mostly reflects individual differences or mostly reflects the effects of the situation” (p. 4). The reason why two decades of research have failed to answer this important question is that social cognition researchers focus on brief laboratory experiments that have little ecological validity and that are unable to demonstrate stability of individual differences over time. However, two longitudinal studies suggest that IAT scores measure stable attitudes rather than context-dependent automatic cognitions. Wil Cunningham, one of the commentators, provided the first evidence that variance in IAT scores reflects mostly random measurement error and stable trait variance, with no evidence of situation-specific state variance (Cunningham et al., 2001). Interestingly, KRC ignore the implications of this study. This year, an impressive study examined this question with repeated measures over a six-year period (Onyeador et al., 2020; Schimmack, 2020b). The results confirmed that even over this long time period, variance in IAT scores mostly reflects measurement error and a stable trait without notable variance due to changes in situations.

Another important topic that I could only mention briefly in my original article is incremental predictive validity. KRC mention Kurdi et al.’s (2019) meta-analysis as evidence that the IAT and self-report measures tap different constructs. They fail to mention that the conclusions of this meta-analysis are undermined by the lack of credible, high-powered studies that can demonstrate incremental predictive validity. To quote Kurdi et al.’s abstract: “most studies were vastly underpowered” (p. 569). The authors conducted tests of publication bias but did not find evidence for it. The reason could be that they used tests that have low power to detect publication bias. Some studies included in the meta-analysis are likely to have reported inflated effect sizes due to selection for significance, especially costly fMRI studies with tiny sample sizes. For example, Phelps et al. (2000) report a correlation of r(12) = .58 between scores on the race IAT and differences in amygdala activation in response to Black and White faces. Even if we assume that 20% of the variance in the IAT is valid, the validity-corrected correlation would be r = 1.30. In other words, this correlation is implausible given the low validity of race IAT scores. The correlation is also much stronger than the predictive validity of the IAT in Kurdi et al.’s meta-analysis. The most plausible explanation for this result is that researchers’ degrees of freedom in fMRI studies inflated this correlation (Vul et al., 2009). Consistent with this argument, effect sizes in studies with larger sample sizes are much smaller, and evidence of incremental predictive validity can be elusive, as in Greenwald et al.’s study of the 2018 election. At present, there is no pre-registered, high-powered study that provides clear evidence of incremental predictive validity. Thus, IAT proponents have failed to respond to Blanton et al.’s (2009) critique of the IAT.
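The implausibility of the Phelps et al. correlation follows from the standard correction for attenuation: dividing the observed correlation by the square root of the proportion of valid variance in the IAT. A minimal sketch of this arithmetic:

```python
import math

# Correct the observed brain-IAT correlation for low IAT validity.
# If at most 20% of IAT variance is valid, the implied true
# correlation exceeds 1, which is impossible.
r_observed = 0.58       # Phelps et al. (2000), r(12) = .58
valid_variance = 0.20   # upper bound on valid race IAT variance
r_corrected = r_observed / math.sqrt(valid_variance)
print(round(r_corrected, 2))  # -> 1.3
```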
Responses to my renewed criticism suggest that IAT researchers are unable or unwilling to respond to valid scientific criticism of the IAT with active coping. Instead, they prefer to engage in emotion-focused, repressive coping that makes IAT researchers feel better without addressing substantive measurement problems.

In conclusion, my critique of the IAT literature and the response by IAT researchers show a wider problem in psychology that I have called the validation crisis (Schimmack, 2020c). Although measurement is at the core of any empirical science, many psychologists lack formal training in psychological measurement. As a result, they create and use measures of unknown validity. This is particularly true for social psychologists because social psychologists in the 1970s and 1980s actively rejected the idea that characteristics within individuals are important for the understanding of human behavior (“the power of the situation”). However, when the cognitive revolution started, the focus shifted from observable situations and behaviors to mental states and processes. Studying phenomena that are not directly observable requires valid measures, just as telescopes need to be validated before they can be used to observe distant planets. The problem is that social cognition researchers developed methods like the IAT to make claims about cognitive processes that are not observable to outside observers or by means of introspection, without taking the time to validate these measures. To make progress, the next generation of social psychologists needs to distinguish clearly between constructs and measures and between random and systematic measurement error. As all measures are contaminated by both sources of measurement error, constructs need to be measured with multiple, independent methods that show convergent validity (Campbell & Fiske, 1959; Cronbach & Meehl, 1955). Psychology also needs to move from empty qualitative statements like “the IAT can be valid” to empirically based statements about the amount of validity of a specific IAT in specific populations in clearly defined situations. This requires a new program of research with larger samples, ecologically valid situations, and meaningful criterion variables.


Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. E. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94, 567–582. doi:10.1037/a0014665

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. doi:10.1037/h0040957

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001). Implicit attitude measures: Consistency, stability, and convergent validity. Psychological Science, 12(2), 163–170.

Onyeador, I. N., Wittlin, N. M., Burke, S. E., Dovidio, J. F., Perry, S. P., Hardeman, R. R., … van Ryn, M. (2020). The Value of Interracial Contact for Reducing Anti-Black Bias Among Non-Black Physicians: A Cognitive Habits and Growth Evaluation (CHANGE) Study Report. Psychological Science, 31(1), 18–30.

Schimmack, U. (2020a). Open Communication about the invalidity of the race IAT.

Schimmack, U. (2020b). Racial bias as a trait. (retrieved 4/21/20)

Schimmack, U. (2020c). The validation crisis. Meta-Psychology (blog)

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4(3), 274–290.

Covid-19 behaves like tourists

Many people are wondering about variation in the Covid-19 pandemic across countries. Why (the North of) Italy and not Portugal? How was South Korea able to contain the virus when other countries with more time to prepare did not? The New York Times published a long article that examined this question, but nobody really knows.

Some of the speculations focus on biological factors that may protect individuals or may make them more vulnerable. However, so far these factors explain only a small portion of the variation in death rates. The biggest predictor is the number of people who are infected by the virus. Australia and New Zealand have few deaths because Covid-19 did not spread widely among their populations.

One possible explanation could be countries’ responses to the pandemic. Countries like the UK and Sweden may have more deaths because they did not lock down. The problem with these speculations is that many factors are likely to contribute to the variation, and it is difficult to spot these factors without statistical analyses.

The NYT article mentions that hundreds of studies are underway to look for predictors of variation across nations, but no results are mentioned. Maybe researchers are cautious.

“Doctors who study infectious diseases around the world say they do not have enough data yet to get a full epidemiological picture, and that gaps in information in many countries make it dangerous to draw conclusions.”

Drawing conclusions is different from exploring data. There is nothing dangerous about exploring patterns in data. Clearly many people are curious and statistical analysis can provide more valuable information than armchair speculations about climate or culture.

As a cross-cultural psychologist, I am familiar with many variables that distinguish nations from each other. The most prominent dimension is individualism. Western cultures tend to be more individualistic than Asian cultures. This might suggest that culture plays a role because Asian cultures have had fewer Covid-19 deaths. However, individualism as measured by Hofstede’s dimension is a weak predictor and did not survive statistical controls. Other, less plausible dimensions also did not predict variation in Covid-19 deaths.

However, one variable that was a predictor was the number of tourists that travel to a country (tourism data).

Tourism reflects how connected a country is with the rest of the world. Australia and New Zealand are not only islands; they are also geographically isolated, which explains why relatively few people visit these otherwise attractive locations. Covid-19 has also spared much of Eastern Europe, and many Eastern European countries rank low on the tourism index.

Additional analyses show that tourism is becoming a weaker predictor over time. The reason is the recent rise of cases and deaths in Latin America. Latin America was relatively unaffected in April, but lately Ecuador and Brazil have seen alarming increases in cases.

The graph also shows that tourism does not explain all of the differences between countries. For example, the UK has far more cases than predicted by the regression line. This may reflect the slow response to the Covid-19 crisis in the UK. Sweden is also above the regression line, possibly due to its policy of keeping schools and businesses open. Switzerland, a direct neighbor of the North of Italy where the epidemic in Europe started, is above the line as well. Canada is above the regression line but was on the regression line on April 15. The reason is that Canada acted quickly in the beginning but is now seeing a late increase in deaths in care homes.

In conclusion, these results suggest that timing is a big factor in the current differences across countries. Countries with high death tolls were simply unlucky to be at the center of the pandemic or well connected to it. As the pandemic progresses, this factor will become less important. Some countries, like Austria and (the South of) Germany that were hit early have been able to contain the spread of Covid-19. In other countries, numbers are increasing, but no country is seeing increases as dramatic as in Italy (or New York) where Covid-19 spread before social distancing measures were in place. New factors may predict what will happen in the times of the “new normal” when countries are trying to come out of lock-downs.

I don’t think that publishing these results is dangerous. The results are what they are. It is just important to realize that they do not prove that tourism is the real causal factor. It is also possible that tourism is correlated with some other variables that reflect the real cause. To demonstrate this, we need to find measures of these causal factors and demonstrate that they predict variation in death tolls of nations better than tourism and statistically remove the relationship of tourism with Covid-19 deaths. So, this blog post should be seen as a piece of a puzzle rather than the ultimate answer to a riddle.

Counting Covid-19 deaths is not as easy as it sounds.

Every scientific discipline is unique and requires special training. However, empirical sciences also have many things in common. One major problem in any empirical science is measurement. To make objective, scientific claims about the world, scientists need reliable and valid measures of the world. These days, we are used to measuring time, distance, and temperature with high precision for the range of values that we encounter in our everyday world. However, even these basic attributes are difficult to measure when we consider the speed of light or the length of a chromosome. Similarly, social scientists have concepts that are easy to understand. Did you have sex yesterday? Is a person dead or alive? However, when we want to measure how often US Americans or Canadians have sex or how many people died in the past month, measurement is a lot more difficult.

Death is a big deal, and considerable effort and resources are invested in recording the time and the probable cause of death. This information is collected by statisticians and made publicly available. When we see these numbers, we forget about measurement error. If a newspaper article or a website shows us that there are 63,801 deaths, we think that there are 63,801 deaths. However, it is nearly certain that this number is false and that the true number is greater or smaller than 63,801 deaths. The real number is unlikely to be 10 times larger or 10 times smaller, but we should not assume that deaths can be counted like coins in our piggy bank.

While counting deaths is difficult, it is even more difficult to specify the cause of death. This makes it even more difficult to say how many US Americans have died so far of Covid-19. The task of quantifying Covid-19 deaths is more like estimating or guessing than counting. At best, we can try to reduce the uncertainty in our estimates.

One solution to the problem of estimation is to use different estimation strategies. If these different strategies produce convergent results, it is likely that the estimates reflect the truth. However, if two estimation strategies diverge, at least one of them has to be false – more false than the other. Ideally, scientists might work together and try to come up with a consensus based on different estimation methods. However, often this does not happen because scientists get rewarded for presenting their method as superior to other methods. In psychology and in some other sciences, researchers that try to make sense of conflicting claims by original scientists are called meta-analysts or meta-scientists. In this blog post, I take a meta-analytic perspective on different measures of Covid-19 deaths.

I focus on US states because the US has provided data for both of the estimation approaches discussed below.

1. Confirmed Covid-19 Deaths

The most widely used estimate of Covid-19 deaths is based on testing of deceased people. Assuming a fairly low amount of measurement error in Covid-19 tests, this method seems to be close to perfect. However, there are problems. First, not every coroner has the resources to test for Covid-19. Second, some patients had pre-existing conditions and it is not clear whether Covid-19 was the actual cause of death. Given these problems, statistics are often reported with a disclaimer that the numbers may be inaccurate.

Aside from uncertainty, it is also likely that the numbers underestimate the actual number of Covid-19 deaths. The reason is that a death caused by Covid-19 is more likely to be attributed to another disease than a death caused by another disease is to be attributed to Covid-19. This is certainly true if the cause of death is determined by a test for Covid-19. However, sometimes Covid-19 is considered a probable cause without a diagnostic test. The problem is that it is unclear how big the discrepancy between confirmed and actual Covid-19 deaths is. If there are 100 confirmed Covid-19 deaths, is the real number 101, 120, 200, or 1000?

2. Excess All-Cause Mortality

An alternative way to estimate Covid-19 deaths relies on the count of all deaths. The number of people who die in a specific week or month of the year is fairly stable from year to year. The main difference between previous years and the year 2020 is that a new virus started killing people. Assuming everything else remains the same (a big assumption), people who died of Covid-19 would be additional deaths, except for a small number of people who got infected and would have died anyway. This makes it possible to estimate the number of Covid-19 deaths without a diagnostic test. If normally 500 people die in the week from April 1 to April 7, and 550 people died in the same week in 2020, this method suggests that 50 people died of Covid-19. The problem with this method is that it makes several assumptions, and the estimate will be inaccurate if these assumptions are wrong. For example, the number of deaths varies as a function of the severity of the flu season. If the flu was very mild, there could have been only 450 deaths without Covid-19 in 2020, and the true number of Covid-19 deaths would be 100. Or the flu was more severe and fully accounted for all 550 deaths, and there were zero Covid-19 deaths.
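The excess-mortality logic can be sketched in a few lines; the numbers below are the hypothetical ones from the example, not actual CDC data.

```python
# Excess deaths = observed deaths in a week minus a baseline
# estimated from the same calendar week in previous years.
baseline_weeks = [500, 500, 500]  # hypothetical prior-year counts
observed_2020 = 550               # hypothetical 2020 count

baseline = sum(baseline_weeks) / len(baseline_weeks)
excess_deaths = observed_2020 - baseline
print(round(excess_deaths))  # -> 50
```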

Another problem is that deaths are not reported in real time to the Centers for Disease Control and Prevention. The numbers for the critical weeks of March and April are still being revised and updated. This means that current numbers are likely to underestimate deaths in 2020 and thereby Covid-19 deaths. Statisticians can deal with this problem by using information about the typical delay in reporting to correct for it. The problem is that this means more assumptions and more room for error.
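The lag correction works by scaling preliminary counts up by how complete reporting typically is at a given delay; a minimal sketch with hypothetical values:

```python
# If, historically, only 60% of a week's deaths have been reported
# two weeks after the fact, the preliminary count is scaled up to
# predict the final number. Both numbers here are hypothetical.
reported_so_far = 300
historical_completeness = 0.60  # fraction typically reported by now

predicted_final = reported_so_far / historical_completeness
print(round(predicted_final))  # -> 500
```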

Despite these problems, several media-reports have used estimates based on excess-all-cause mortality to make strong claims about the “true” number of Covid-19 deaths.

One problem with these claims is that official Covid-19 death numbers have been revised to incorporate presumed cases without a Covid-19 test. This has led to some dramatic increases in the numbers for severely affected states like New York (National Post).

The question is whether there are still tens of thousands of Covid-19 deaths that are missing even after this revision of the numbers.

Data about all-cause mortality and excess all-cause mortality are provided by the CDC.

The dark blue bars show the recorded deaths. These numbers do not show a notable increase in deaths in March and April. The reason is that there is a lag in reporting. This means that the final all-cause mortality numbers for March and April 2020 are still unknown. We know that the numbers are going to go up, but we do not know by how much. The light blue bars are predictions of the final numbers based on information about the typical reporting lags. The red stars show weeks with predicted excess deaths. These predictions show excess deaths for the time period from March 29 to April 18. For the most recent week, from April 19 to April 25, the reported numbers are so low that even the predicted values are well below the typical level. It is unlikely that this is a real finding, given the recording of thousands of Covid-19 deaths in this time period.

The CDC datafile records excess deaths for the critical weeks. The sum of these values is 19,086 excess deaths that may be attributed to Covid-19.

In contrast, the Worldometer statistics show 39,331 Covid-19 deaths in the United States on April 18. This is more than double the number that the excess-deaths method suggests. Even the more conservative estimate by the Covid Tracking Project lists 34,557 deaths, which is still considerably more than the estimate based on excess all-cause mortality. Thus, there is no evidence that the official counts severely underestimate the number of Covid-19 deaths in the United States.


Even though death can be determined objectively, the actual number of Covid-19 deaths is unknown. Scientists can only provide reasonable estimates, and different estimation methods produce different estimates. Counting only confirmed cases that were tested for Covid-19 underestimates the actual number because some people died of Covid-19 without being tested. Including presumed cases addresses this problem, but it creates uncertainty. There may be a bias to attribute deaths to Covid-19 or a bias in the opposite direction. Examining excess all-cause mortality avoids the problem of diagnoses but creates many new problems. One problem is that the final number of deaths in April is still unknown because it takes time to collect this information. Given all of these problems, there is a wide range of estimates. It could be as few as 19,000, and it could be more than 40,000. However, there is no scientific evidence to claim that tens of thousands more than the official 40,000 deaths were caused by Covid-19. Given the information provided here, it would be very surprising if new data revised the number of deaths by April 18 to over 60,000.

Personally, I don’t believe that the actual number is very important. What is more important is that the number of deaths would have been dramatically higher without actions to stop the spread of the virus. Whether we compare 20,000, 40,000, or 60,000 deaths to 500,000 deaths seems irrelevant. No country, including Sweden, decided to let the virus just spread, because the consequences would have been horrible. The actual number is also important for the timing and planning of the future. How and when to ease restrictions depends on the current number of new cases rather than the cumulative number of deaths. For example, Italy has a large number of deaths but was able to control the spread of the virus and can now start thinking about easing the lock-down.

Unfortunately, the number of Covid-19 deaths seems to matter most for politics. A higher death toll means that political leaders are responsible for a bigger disaster. At the beginning of the Covid-19 crisis, Trump supporters pointed out that 12,000 US Americans died of H1N1 under the Obama administration.


Now that the death toll from Covid-19 is much higher than the death toll from H1N1, some suspect that the liberal media are inflating the number of Covid-19 deaths to make Trump look as bad as possible. The fact that the true number is unknown and can only be estimated provides ample room for political bias in the presentation of the results.

Scientists should not be part of these partisan number-games. The best way to avoid this is to provide information in terms of ranges that express uncertainty about the true number. Ranges also make it easier for scientists to agree with each other and to speak with a common voice that reflects the objective data. Based on the information that I presented here, I would say that between 20,000 and 60,000 US Americans died of Covid-19 by April 18. As more data become available, uncertainty in this estimate will shrink.

The more important question remains how much US Americans and the rest of the world are able to control the spread of the virus in the future to prevent as many deaths as possible in the months to come and to find ways to live with the virus until a vaccine is available.


Red states have fewer Covid-19 deaths because they are less urban, more religious, and less Black

Covid-19 does not care about politics, but the response to Covid-19 in the United States has divided the country, just as many other issues are now perceived differently by liberal and conservative Americans. At least initially, Trump and some other conservatives did not take the Covid-19 pandemic very seriously. It is possible that this stance influenced how conservatives responded to the pandemic (Beauchamp, 2020).

However, at the state level, red states that voted for Trump have lower fatalities than blue states that voted for Hillary Clinton in the 2016 election.

This relationship even holds after taking population size into account.

The negative correlation of r = -.53 is, statistically speaking, a strong correlation. It is also very unlikely to be a chance finding, p < .001.

A correlation does not mean that the relationship is causal. It is possible that some other variables in the background predict both voting for Trump in the 2016 election and Covid-19 deaths. If other variables are causing this relationship, they should (a) be stronger predictors of Covid-19 deaths and (b) weaken the relationship between voting for Trump and Covid-19 deaths. They should also have a plausible causal mechanism.

I was able to identify three variables that fulfill these criteria. First, states that voted for Trump are less urban and a virus spreads more easily in urban areas. Second, states that voted for Trump tend to be more religious. While media focus on a few churches that disobeyed stay-at-home orders, Christians in general are more likely to follow orders and probably also find it easier to spend time at home. Race has emerged as another predictor of Covid-19 deaths. There are numerous factors that may contribute to this relationship. As Black voters are less likely to vote for Trump, this is another potential variable that contributes to the negative correlation between Trump support and Covid-19 deaths.

Regression analyses confirm that the three variables predict unique variance in Covid-19 deaths between states and weaken the relationship between Trump support and Covid-19 deaths. The relationship of -.11 in Figure 3 is not statistically significant, although it is still negative.

It is noteworthy that the relationship between Trump support and deaths in Figure 3 could have become positive. This would be expected if Trump supporters did not engage in the same amount of social distancing and other measures that prevent the spread of the virus. The fact that there is no significant positive path in Figure 3 suggests that I failed to identify additional variables that produce a negative relationship between Trump support and Covid-19 deaths, or that support for Trump in 2016 is not a strong predictor of responses to the Covid-19 pandemic.


[Figure 3: path model predicting Covid-19 deaths (April 19) from Trump support, urbanization, religiosity, and Black population]

National Differences in Severity of Covid-19 Pandemic on April 19

Since mid-March, many countries have been in some form of lock-down or have instituted other measures to slow the spread of Covid-19. However, for many countries, mid-March was too late to prevent widespread community transmission of the virus. Just as countries differed in their responses to Covid-19, we are going to see national differences in how countries try to come out of lock-downs and restart their economies. Rankings of nations in terms of case numbers or deaths inevitably raise questions about the rank ordering, and there are equally heated debates about the accuracy of the data. For example, relative to its population, Belgium has the highest death toll, but some Belgians argue that other countries are not counting deaths properly. The comparison of case numbers is severely distorted by different amounts of testing. To move beyond armchair epidemiology, we need to take a more careful look at the numbers.

The number of deaths relative to the population is a reasonable indicator of severity. The main problem with this indicator is that some countries may not be counting all deaths. Another indicator is the number of people who test positive. This is not an ideal indicator for several reasons. First, some positive cases may only have mild symptoms. Second, countries that test little will severely underestimate the number of real cases. However, we would expect that countries with a bigger problem have more deaths and also more positive tests. To adjust for national differences in testing, we can compute the positive rate; that is, the percentage of tests with a positive result. For example, if one country conducts 1,000 tests and has 20 positives, the positive rate is only 2%. In contrast, a country that has 100 tests and 10 positives has fewer positives, but a higher positive rate of 10%.
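The adjustment is simple arithmetic; a minimal sketch using the two hypothetical countries from the example:

```python
def positive_rate(positives, tests):
    """Share of tests that come back positive."""
    return positives / tests

# Country A: many tests, few positives -> low positive rate
rate_a = positive_rate(20, 1000)   # 0.02
# Country B: fewer tests and fewer positives, but a higher positive rate
rate_b = positive_rate(10, 100)    # 0.1
```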

Neither indicator is perfect and both indicators are likely to have some measurement error. Ideally, these errors are relatively independent of each other and cancel each other out. In that case, countries with a high death rate and a high positive rate are likely to be more severely affected than countries with a low death rate and a low positive rate.

To examine how well the two indicators are related, I used the death rates reported on Worldometers and computed the positive rate based on reported tests and positives on Wikipedia. Death counts were for April 19. The testing data were obtained on April 21 for the last available day of testing. Testing is not always reported daily, but positive rates do not change dramatically from day to day. Only countries with information about testing could be used.

Deaths were log-transformed to take exponential growth into account. The Pearson correlation between the two indicators was r = .68. The rank correlation was similar, r = .64, indicating that both indicators track severity of the Covid-19 pandemic.
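The two coefficients can be reproduced along these lines; the country values below are made up for illustration, not the actual Worldometers/Wikipedia data:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()
    return float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))

def ranks(a):
    """1-based ranks, assuming no ties."""
    order = np.argsort(a)
    r = np.empty(len(a))
    r[order] = np.arange(1, len(a) + 1)
    return r

# Hypothetical countries: deaths per million and positive test rate.
deaths_pm = np.array([5., 20., 80., 300., 600.])
pos_rate  = np.array([0.02, 0.05, 0.10, 0.20, 0.25])

r_pearson = pearson(np.log(deaths_pm), pos_rate)        # log tames exponential growth
r_rank    = pearson(ranks(deaths_pm), ranks(pos_rate))  # Spearman rank correlation
```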

Figure 1 also shows the location of individual countries. This makes it possible to examine potential factors that may explain discrepancies between the two indicators. For example, Belgium does indeed have more deaths than the regression of deaths on the positive rate predicts. However, the difference is relatively small, and the positive rate also identifies Belgium as a country that has been severely affected by Covid-19. A notable outlier is Ecuador, and it is possible that death counts severely underestimate the Covid-19 spread in Ecuador. The same is probably true for Iran.

The most interesting question is whether the reliable variation across nations is simply random or whether there are systematic factors that explain why some countries were affected more than others. This blog post merely shows that deaths rates and positive rates are related and can be used to examine this question.

Covid-19 in New York

I live in the middle of a large North American city: Toronto. I also lived through the SARS crisis in 2003 that killed several Torontonians before a major outbreak was prevented. So I was concerned when Covid-19 cases were rising in Toronto and when numbers were exploding in New York City. Would we be next?

As it turns out, Toronto and many other big cities all over the world did not have a major disaster like New York. This made me wonder why the coronavirus pandemic affected New York City more than other urban centers.

Using newspaper reports, I constructed a timeline of events that helps to understand what happened. The key factor that explains New York City’s fate is that health officials did not have tests to detect the spread of the virus that occurred in the early weeks of March and possibly even in February. However, the lack of testing does not explain why other cities that were testing did not find more cases at the same time. Thus, the disaster in NY is the result of a combination of two factors. The virus arrived in NY earlier than in other cities, and it was able to spread undetected because there were insufficient tests. While it is unclear why Covid-19 hit NYC earlier than other cities, the lack of tests can be explained by two problems. The US Centers for Disease Control and Prevention was slow in developing and producing Covid-19 tests, and the Food and Drug Administration did not allow private companies to carry out tests while the virus was already spreading in NYC. If tests had been available two weeks earlier, social distancing measures could have been introduced earlier to reduce the death toll considerably.

March 1

Covid-19 arrived early in the United States. The first outbreak with several deaths occurred in Washington State. The virus spread undetected for several weeks. Seattle is far away from NY city, but it takes only a few hours to fly from one city to the other. Flights also arrived daily from many international locations with confirmed Covid-19 cases. It was only a matter of time until the first Covid-19 case arrived in New York.

The NYT reported the first official case on March 1 (NYT.03.01). The article shows that officials believed they were in control of the situation, although it is likely that the virus was already spreading undetected in the city.

New York City’s Health Commissioner, Oxiris Barbot, also assured New Yorkers that they had nothing to worry about.

Mayor de Blasio also reassured NY residents that the situation was under control.

Similar reassurances were printed in other news outlets.

New York Post

It is not clear whether officials were simply trying to be optimistic or to prevent a panic, but they had no way of knowing how bad the situation was because they had conducted only nine tests for the virus at that point.

Statisticians are trained to make a clear distinction between the absence of evidence and the evidence of absence, but it is a human fallacy to mistake the two. Apparently, officials in NY were also treating the absence of confirmed cases as evidence that Covid-19 was not in NYC, although it is obvious that no cases can be confirmed without testing. The need for positive evidence that was unattainable delayed school closures and other measures that could have slowed the spread of the virus in early March.

March 4

On March 4, the New York Times reports that Seattle introduced several measures to deal with the Covid-19 outbreak. The same day, NY records 9 positive cases (NYT.3.4).

Probably because tests were limited, health officials advised people not to get tested for Covid-19 even if they had symptoms.

The illusion was that officials still had control over the virus and could trace infections, while the virus was spreading undetected among NY residents.

March 7

More testing produced more positive results. The NYT reports 89 positive cases, including an Uber driver in Queens (NYT.3.7). Governor Cuomo declares a state of emergency to prepare a response to the spread of the virus. At the same time, he aims to calm New Yorkers.

Another article also still sounds optimistic about the fate of NYC. “New York City, of course, has not faced the cataclysmic impact of the virus that has been visited upon areas of China or Iran or Italy, and government officials are scrambling to ensure it remains that way” (NYT.3.7.b).

Another article mentions that school closures are a last resort (NYT.3.7.c). The article also points to systematic problems that may explain why NYC was hit harder by Covid-19 than other big cities in the United States or in other countries.

In hindsight it becomes apparent that the attempt to protect vulnerable students endangered them and many more vulnerable New Yorkers even more.

False information may have further led to the fatal decision to keep things running. “The federal Centers for Disease Control and Prevention have advised that, so far, children have been less likely than adults to become infected.” Instead, they may have been asymptomatic carriers of the virus and passed it on to older family members. It is not clear whether the misinformation was on the linked website or a misunderstanding by the reporter.

And today, after everybody has learned in the most painful way possible about exponential growth, the comment by a math teacher seems incredibly naive.

March 8

Cuomo publicly criticizes the Centers for Disease Control for preventing private labs from testing (NYT.3.8). The lack of testing is of course a main reason why the virus could spread unnoticed in NYC. However, even with the limited amount of testing, it was apparent that community transmission was happening.

On March 8, the testing situation seemed to be at least temporarily resolved because NY received a shipment of 23,000 tests.

Although aware of a crisis, officials seemed to be unable to foresee the exponential increase in cases, expecting only several hundred cases over a time span of several weeks.

It is not clear whether officials did not have access to models or whether models did not predict a dramatic increase in cases. It is clear, however, that they underestimated the severity of the situation on March 7 and did not enact drastic social distancing measures to slow the spread of the virus.

March 10

On March 10, the biggest news was that prisoners would produce hand sanitizer. The mayor of NYC, Mr. de Blasio, announced that there were still no plans to close schools.

March 11

On March 11, the World Health Organization (WHO) declares Covid-19 a pandemic (NYT.3.11).

That day, the New York Times reported 328 confirmed cases for NY state and 96 for NYC. These numbers may still seem reassuringly low, but are meaningless because testing was still very limited.

In response to the evolving situation, Governor Cuomo announced a ban of gatherings of more than 500 people. In hindsight, we may say that this was too little too late to prevent spreading of the virus in NYC.

Equally useless was Trump’s travel ban for flights from Europe. This might have worked several weeks earlier, but on March 11 the virus was already in North America. Banning flights out of NY to other parts of the US might have been a more rational response to the actual pandemic.

March 13

March 13 is the first day on the covid-tracking website that lists more than 1,000 tests for NY state. Out of 2,779 tests, 421 results were positive. This means 15% produced a positive result. In comparison, South Korea and other countries that were able to get ahead of the virus, like Australia or New Zealand, have positive rates around 2%. Even today, several US states in lock-down have lower positive rates. If NY needed positive evidence that the virus was out of control, here it was.

But March 13 was a Friday and not much happened in response to this information.

March 15

The Covid-Tracker recorded 4,543 tests and 729 positive results. While testing increased by 63%, positive results increased by 73%. The positive rate rose to 16%. The Covid-Tracking project also lists 3 deaths for this day, while a New York Times article reports 5 deaths (NYT.3.15c). As it takes several weeks from infection to death, these deaths indicate that the virus was already spreading in NY at the beginning of the month.
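These percentage changes follow directly from the reported counts:

```python
# Counts reported by the Covid-Tracker for March 13 and March 15.
tests_mar13, pos_mar13 = 2779, 421
tests_mar15, pos_mar15 = 4543, 729

test_growth = tests_mar15 / tests_mar13 - 1   # ~0.63: 63% more tests
pos_growth  = pos_mar15 / pos_mar13 - 1       # ~0.73: 73% more positives
pos_rate    = pos_mar15 / tests_mar15         # ~0.16: 16% positive rate
```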

New York’s mayor announced the closure of bars and restaurants (NYT.3.15).

The announcement was made at the end of the March 14/15 weekend. It is not clear how many people got infected because bars and restaurants were not closed earlier on Friday to prevent spreading of the virus over the weekend.

The NYT article also implies that this decision was not made on the basis of an assessment of the local situation in NY, but rather in response to the Centers for Disease Control’s general guidelines for the whole United States.

The article further suggests that de Blasio and Cuomo reacted to social pressure rather than in response to a situation that was already out of control.

The same day, it was announced that schools would be closed. Again, this decision was made in response to pressure from parents and teachers and was not a proactive decision by the mayor (NYT.3.15b).

March 17

The New York Times conducted a very informative interview with Governor Cuomo (NYT.3.17).

I find this answer a bit difficult to interpret. One thousand cases is a lot, especially because there were only about 300 cases on March 11, implying that cases tripled in one week. However, this increase is partly due to more testing. According to the Covid-Tracking numbers, testing also increased from a few hundred to over 5,000 tests. Thus, the number of cases alone is a meaningless (symbolic) number, not an assessment of the situation in New York that explains a sudden shift in policies, the closing of schools, and the closing of restaurants. In the background are models, and something must have changed in the modeling predictions.

It is not clear what the models’ predictions were on March 1 or on March 8 or whether the model was unable to make predictions because there were no testing data. What is clear is that the model predicted the disaster in NY when it was too late. As deaths follow infections with a lag, most of the 5,000 deaths by April 7 can be attributed to infections that occurred before March 17.

Despite the severity of the model predictions, Cuomo did not issue a stay-at-home order. The New York Times reports about disagreement between New York’s mayor and Governor Cuomo, with Cuomo explicitly opposing more drastic measures (NYT.3.17.b).

March 18

The most important news was that positive results skyrocketed by more than 1,000 cases in one day, which implies that cases nearly doubled in one day (NYT.3.18).

However, the article implies that the governor did not think that this was an alarming finding.

However, even if there were 5,000 tests, finding 1,000 positives is not a good outcome. It implies a positive rate of 20%. If we assume that only people with flu-like symptoms were tested, it means that 1 out of 5 patients with flu-like symptoms was Covid-19 positive. Such high levels imply that there are many more people, with and without symptoms, who are infected with the virus. The alarm bells should have been ringing. Instead, Cuomo seems to make light of the result as if it were just a statistical blip.

Governor Cuomo introduced further measures, but did not issue a full stay-at-home order.

A New York Times article reports that parks and playgrounds remain open (NYT.3.18.b).

March 20

On March 20, Governor Cuomo did what he said he would never do; he told New Yorkers to stay at home and closed non-essential businesses (NYT.3.20).

Cuomo explained that the new measures were put in place because the numbers were still going up. However, nobody could have expected the numbers to stop increasing within a couple of days or even a week of the earlier measures. The numbers would be going up for a long time. The only question was how quickly, and by issuing a stay-at-home order sooner rather than later, they would have increased more slowly.
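A toy calculation illustrates the exponential arithmetic; the starting value and doubling time are made up for illustration, not estimates for NY:

```python
def cases_after(days, start_cases=100, doubling_time=3.0):
    """Cases after `days` days of unchecked exponential growth."""
    return start_cases * 2 ** (days / doubling_time)

# With a 3-day doubling time, acting six days earlier means facing
# one quarter of the cases at the moment growth finally slows.
late  = cases_after(21)   # 12800.0
early = cases_after(15)   # 3200.0
```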

March 23

With 20,875 cases and a positive rate of 36%, the number of cases in NYC is exploding. While nobody had seemed to believe that Covid-19 was in NYC a couple of weeks earlier, the positive evidence now made it seem logical and inevitable that Covid-19 had been spreading in early March when no actions were taken (NYT.3.23).

One can only ask why New York did not enact measures earlier, even without testing, if it was clear that an outbreak in New York was inevitable and that the virus would spread more quickly than in Seattle.

March 24

At this point, governor Cuomo had become a household name all over the United States (NYT.3.24).

He is able to get things done. He obtained the right to conduct testing in the state, and New York became the leader in testing. As a result, he had a clearer picture of the situation than anybody else in the United States. And yet, despite this information, he still underestimated the severity of the crisis.

Cuomo’s strength is apparent in the handling of the disaster, but questions can be raised about his ability to foresee the scale of the disaster and to avoid or at least mitigate it early on.

March 26

On March 26, the situation in NY was dire. The number of deaths had nearly doubled from 3,800 to 6,800 in one day. In contrast, numbers in other states were low and some states had not recorded a single Covid-19 death. Was New York simply the canary in the coalmine (after Washington) or was New York a unique hot-spot in North America? Experts were not sure (NYT.3.26).

One expert suggests that the virus arrived in NY as early as January and was able to multiply undetected.

If the virus arrived this early, there would have been Covid-19 related deaths in February, but no such deaths were recorded. Maybe the lack of testing made it impossible to identify Covid-19 as the cause of death.

Urban density is a salient feature of NY, but NY is not the only dense city in the world. Yet other dense cities, especially in Asia, did not become hot-spots like New York.

Maybe Asian cities were spared because Asian countries were very good at keeping Covid-19 out or, in South Korea’s case, at controlling the outbreak before it reached Seoul. However, other big cities in North America, like Chicago or Toronto, have also not seen the same level of cases or deaths as New York.

At least on March 26, there were no clear answers.

However, the question is clear. Why did New York City have more cases than other big cities in the early weeks of March or even February? Undetected spread is not the answer because other big cities in the US also had no tests. For example, the entire state of Illinois had conducted less than 500 tests by March 13. Thus, there was ample time and opportunity for the virus to spread in Chicago, but it didn’t. Was New York just unlucky?

April 15

On April 15, Cuomo announces an executive order to wear a face mask in public when it is impossible to maintain social distancing (NYT.4.15). This order follows a major reversal in the recommendations given by the World Health Organization. Why people were not advised earlier to cover their mouths in crowded places like subways is another important question. Maybe the wearing of face masks can explain how densely populated cities in Asia were able to avoid the spread of the virus. It is possible that an executive order to wear masks in early March could have saved a lot of lives, but at this point there is insufficient information about the transmission of the virus in public places to be sure. The fact that other cities with heavy use of public transport, like Chicago or Berlin, did not suffer the same fate as New York suggests that public transportation alone is not a key factor in the spread of the virus.

April 16

There is light at the end of the tunnel (NYT.4.16). The worst is over and the question is how and when NY will start coming back to life. The situation is now much more predictable and controllable, especially because testing makes it possible to track and trace cases. The lack of tests is clearly a big factor that contributed to the disaster. How much the death toll could have been reduced by earlier, more decisive actions in the beginning of March may be an academic question that will never get a satisfactory answer. The coming months will provide new data that may help to explain why New York lost so many more lives than other North American cities. Hopefully, the whole world will learn from mistakes that were made to avoid repeating them when the next virus comes.

Fake Facts in Covid-19 Science: Kentucky vs. Tennessee

An idealistic picture of science views science as an orderly process in which disinterested, objective experts look at crystal-clear evidence to find the truth. In reality, science is a lot messier. Scientists are human, and their values and preexisting theories influence which information they seek out or neglect. Evidence is often inconclusive. As a result, the truth is often elusive. This is especially true in new sciences that lack well-established foundations. Covid-19 science is the newest science of all, less than a year old. On top of this, the global, interdisciplinary effort to make sense of Covid-19 data is happening at lightning speed, and many participants in this collective effort are not trained epidemiologists. There are simply not enough epidemiologists, and many questions that arise are not epidemiological questions. For example, racial disparities in Covid-19 fatalities are extremely important, but it is unlikely that race is a parameter in epidemiological models of a pandemic. The main problem of a new, fast, interdisciplinary science is that a lot of errors will be made. Errors are part of the scientific process, but a crisis in psychology has demonstrated that it can be difficult to correct them. Covid-19 science cannot afford costly errors. One reason for high death rates in the United States is that the initial tests did not work well and the virus could spread undetected in NY. Once testing was available, it was too late and infections grew exponentially.

One important question for all countries is how effective social distancing is and how important it is to shut schools, close non-essential businesses, and to enforce quarantine and social distancing with laws. In the United States this question is filtered through a political lens with implications for the upcoming 2020 election. Ironically, liberals favor strict laws, whereas conservatives favor less restrictive policies. These preexisting tendencies make it difficult to look at the scientific evidence objectively. I present a graph that compares Kentucky and Tennessee as an example.

Kentucky versus Tennessee

As explained in a newspaper article, this graph was made by Stephanie Jolly from Kentucky and shared with friends on social media (Vergano, 2020). The problem is not Jolly’s graph, but the uncritical sharing of the graph as if it presented hard evidence that imposing stricter social-distancing measures earlier has a strong influence on the spread of Covid-19.

First, we can fact-check the graph using the numbers on the Covid-19 tracking website. The website shows 3,633 cases for Tennessee and 917 cases for Kentucky. Thus, there is no mistake in the data, but this does not mean that the Covid-19 pandemic is more severe in Tennessee than in Kentucky. This conclusion requires ruling out alternative explanations for the difference.

The first question is whether Tennessee simply has more inhabitants than Kentucky, because states with a larger population will have more cases even if Covid-19 is equally common in both states. Tennessee has 6.8 million inhabitants, while Kentucky has 4.5 million inhabitants. Adjusted for population, Tennessee still has more than twice as many cases as Kentucky (roughly 534 vs. 204 per 1 million inhabitants).

The next problem is that the number of cases depends on the amount of testing. More testing is considered proactive and important in fighting Covid-19, but states that test more will look worse in a comparison of confirmed cases because they will find more cases. In contrast, a state that doesn’t test at all would look golden because it reports zero Covid-19 cases. As it turns out, Tennessee was actually testing more than Kentucky. On April 5, Tennessee recorded 45,300 tests versus 16,663 tests in Kentucky. The graph below shows the numbers adjusted for population size.

The graph shows that the difference in positives (dark blue vs. dark green) is matched by differences in the amount of testing. To get a better picture of the prevalence of Covid-19 in the two states, it is better to compute the rate of positive test results over the number of tests that were conducted.
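Using the counts quoted above (the April 5 snapshot of the tracking site), the positive rates can be computed directly; a minimal sketch:

```python
# Cases and tests reported for April 5 (see above).
tn_positives, tn_tests = 3633, 45300
ky_positives, ky_tests = 917, 16663

rate_tn = tn_positives / tn_tests   # roughly 0.080
rate_ky = ky_positives / ky_tests   # roughly 0.055
```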

Now the two states look more similar. However, there still seems to be a difference after April 3, just in time for different policies to show an effect. This still does not mean that there are real differences between the two states because positive rates are also influenced by the criteria that are used for testing.

One way around the problem with testing is to use deaths as an alternative indicator. If Tennessee has a bigger Covid-19 problem with more cases, we would also expect to see more deaths in Tennessee than in Kentucky. Once more, it is important to take differences in population size into account. When this is done, Tennessee and Kentucky are similar. This is more consistent with the hypothesis that Covid-19 is equally prevalent in the two states and that differences in positive cases are due to different amounts of testing.

How is it possible that two states with different policies have similar numbers of Covid-19 cases and deaths? Does this mean social distancing is not effective? Not necessarily. It is also possible that citizens in the two states behaved similarly despite differences in official policies. Evidence supports the latter explanation (Vergano, 2020).

Smartphone data show that citizens in Kentucky and Tennessee responded similarly to the Covid-19 pandemic.


Although Tennessee reported more positive tests for Covid-19, there is no evidence that the virus is more prevalent in Tennessee than in Kentucky. The differences are mostly due to differences in testing. Thus, there is no evidence that differences in government rules influenced the spread of Covid-19. As a result, the comparison of these two states cannot be used as a fact to support claims that states that issued stay-at-home orders later or not at all have a higher prevalence of Covid-19 (cf. B. L. Jewell & N. P. Jewell, 2020).

"There is no need to rely on the hypothetical calculations that we have described. The recent divergence of epidemics in Kentucky and Tennessee shows that even a few days’ difference in action can have a big effect. Kentucky’s social distancing measure was issued March 26; Tennessee waited until the last minute of March 31. As Kentucky moved to full statewide measures in reducing infection growth, Tennessee was usually less than a week behind. But as of Friday, the result was stark: Kentucky had 1,693 confirmed cases (379 per million population); Tennessee had 4,862 (712 per million)."

This does not mean that earlier social-distancing measures could not have prevented more deaths. As Dr. Anthony Fauci, the director of the National Institute of Allergy and Infectious Diseases, observed, it follows logically that more lives are saved by reducing the spread of a deadly virus earlier.

“Obviously, you could logically say that if you had a process that was ongoing and you started mitigation earlier, you could have saved lives”

This is especially true for places like NY state, where the virus had spread undetected until the WHO declared a pandemic on March 11. Kentucky and Tennessee were relatively lucky that measures taken after March 11 were able to reduce the spread of the virus and to prevent a major disaster. A scientific study of the effects of different policies in different US states on the spread of Covid-19 is difficult and may never produce conclusive results because there are so many confounding factors.

This is arguably the main contribution that psychological science can make to the study of Covid-19. Science is a lot harder than collecting some data, making a graph, and telling a story about it. A big part of science is to check facts and to correct claims that turn out to be wrong. A key characteristic of science is that it is self-correcting, but often the process of self-correction is painfully slow. With Covid-19, we don’t have time. So we need to speed up the self-correction process just as much as we need to speed up the process of hypothesis generation and testing. The best evidence that social distancing is working – if anybody is doubting this – is that the spread of Covid-19 has slowed down all over the world, including the United States and disaster areas like New York, New Jersey, Michigan, and Louisiana. The reason Tennessee didn’t see more cases is that individuals enacted social distancing earlier than it was mandated by the state government. More data on individuals’ responses to Covid-19 will help to see how important governments’ responses to Covid-19 are. Unfortunately, this will take time, and we may have to make decisions without conclusive scientific facts; but at least we can avoid making decisions on the basis of false facts.

How Valid are Comparisons of Cases and Deaths Across US States and Canadian Provinces?

We are bombarded with statistical information about Covid-19. There is also a lot of uncertainty about the reliability of the data. Case counts are biased by differences in testing. Death rates are biased because coroners may not be able to confirm Covid-19 as the cause of death when tests are prioritized for the living.

As psychologists, we are used to measurement error; lots of it. We deal with this problem by examining whether different measures that are supposed to measure the same thing (construct) are correlated with each other. The more two measures of the spread of the virus are correlated with each other, the more we trust these measures to reflect differences in the spread of the virus across states or provinces. We call this convergent validity.

To examine the validity (a.k.a., trustworthiness) of different Covid-19 statistics, I examined how well these measures correlate across 45 US states and 4 Canadian provinces. Some smaller states/provinces with less reliable data were excluded.

1. One statistic is the simple count of people who tested positive. Of course, this number is influenced by the population of a state or province. To adjust for this, I computed the number of positive cases for every 1 million inhabitants.

2. Another statistic is testing. Testing is important for two reasons. First, more testing is considered a weapon in the fight against Covid 19. Thus, a higher number is desirable. Second, testing can bias the comparison of cases. The more tests are conducted, the more positive cases will be detected. Thus, differences between states and provinces in the number of tests conducted can be a source of measurement error in comparisons of case counts.

3. The positive rate is the percentage of positive tests; that is, positives divided by tests. This measure takes differences in testing into account and can show that a state has a low count of cases simply because it doesn’t test a lot. Thus, I would expect this to be a more valid measure of the severity of the pandemic in a state.

4. Fatalities. The number of deaths is arguably the most important measure of the pandemic. If nobody would die from COVID-19, we would not have to close schools and shutter non-essential businesses. As for case counts, it is important to divide the absolute number of deaths by the population to make meaningful comparisons of states and provinces. I used the number of deaths for every 10 million inhabitants as a measure.

5. The case-fatality-rate is the proportion of people who test positive and die. Case-fatality rates are important to estimate how deadly COVID-19 is. However, they are strongly influenced by the criteria that are used for testing and the amount of testing that is done. Case-fatality rates can also differ because states differ in terms of medical equipment (ventilators), age of population, or differences in preexisting health conditions.
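These five statistics reduce to simple ratios of raw counts. A minimal sketch in Python; the counts for the hypothetical region below are made up for illustration:

```python
# Compute the five COVID-19 statistics described above from raw counts.
# All figures for this hypothetical region are invented for demonstration.
population = 14_500_000      # inhabitants
positives  = 1_450           # people who tested positive
tests      = 48_500          # tests conducted
deaths     = 29              # COVID-19 fatalities

cases_per_million  = positives / population * 1_000_000   # statistic 1
tests_per_million  = tests / population * 1_000_000       # statistic 2
positive_rate      = positives / tests                    # statistic 3
deaths_per_10m     = deaths / population * 10_000_000     # statistic 4
case_fatality_rate = deaths / positives                   # statistic 5

print(round(cases_per_million), round(tests_per_million),
      f"{positive_rate:.1%}", round(deaths_per_10m), f"{case_fatality_rate:.1%}")
```

Each statistic is just a ratio; the only design choice is the denominator (population, tests, or positives) and the scaling factor that makes the numbers readable.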

Table 1 shows the Pearson correlations below the diagonal and rank-order correlations above the diagonal.
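A correlation table of this kind (Pearson below the diagonal, rank-order above) can be assembled as follows. This is a sketch with randomly generated placeholder data, not the actual state/province data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(49, 5))   # five made-up indicators across 49 regions

# Pearson correlations of the raw values
pearson = np.corrcoef(data, rowvar=False)
# Spearman rank-order correlations: Pearson correlations of the ranks
# (double argsort gives each column's ranks; fine here since there are no ties)
ranks = data.argsort(axis=0).argsort(axis=0)
spearman = np.corrcoef(ranks, rowvar=False)

k = data.shape[1]
table = np.eye(k)
lower = np.tril_indices(k, -1)
upper = np.triu_indices(k, 1)
table[lower] = pearson[lower]     # Pearson below the diagonal
table[upper] = spearman[upper]    # rank-order above the diagonal
print(np.round(table, 2))
```

Showing both coefficients in one matrix is useful because rank-order correlations are less sensitive to outliers such as New York's extreme case counts.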

The results show that both cases and deaths are highly correlated with each other, indicating that both measures are valid indicators of the severity of the COVID-19 pandemic across states and provinces.

Not surprisingly, the number of cases is also correlated with the number of tests that are conducted. This suggests that correcting for tests might improve validity. Surprisingly, simple case counts are better predictors of deaths than the positive rate (cases / tests). One possible explanation is that the quality of the testing data is poor, which biases the testing numbers and the positive rates. The data on the Covid tracking site have indeed shown some dramatic changes when reporting of negative results improved.

Another surprising finding is that case-fatality rates are virtually unrelated to the amount of testing. This is surprising because wider testing should identify more individuals with mild symptoms who are more likely to survive. This has been used as an explanation for low case-fatality rates in Germany, which tested more than other European countries with higher case fatality rates. The results here suggest that differences in testing may play a smaller role than is commonly assumed. Once more, unreliable testing data may contribute to this finding. Another factor is that testing is related to deaths. In states with a big outbreak like NY, testing has increased a lot more than in states with low fatality rates.

In conclusion, the main finding is that simple case counts are, surprisingly (at least to me), valid indicators of the severity of COVID-19 across US states and Canadian provinces, despite big differences in the amount of testing carried out. Thus, the number of positive cases and the number of deaths are both valid and important indicators that can be used to compare states and provinces after adjusting for population size.

Covid-19 in Canada: What Do the Numbers Mean?

The Covid-19 pandemic is unique because the virus emerged in a globally connected world. This enabled the virus to spread quickly across the globe. At the same time, the ability to fight the virus has never been better. Chinese scientists quickly developed a test to identify infected individuals. This made it possible to isolate infected individuals and prevent the spread of the virus. It also provided an unprecedented amount of data that are widely shared on websites and in the news about the number of COVID-19 positive cases in different countries and in the Canadian provinces. For example, the government of Canada keeps Canadian citizens informed on a website that tracks cases and COVID-19 fatalities.

As a psychologist, I wonder what Canadians are learning from these numbers. When I teach psychology, I spend a lot of time explaining what numbers actually mean. My concern is that Canadians are bombarded with COVID-19 numbers with little information about what these numbers actually mean. Every Canadian with some knowledge about Canada will notice that the numbers are bigger for provinces with a larger population, but few Canadians know the exact population of each province or are willing to compute fatality rates that take population size into account. So, it remains unclear whether the situation is better or worse in Manitoba than in Alberta. A simple solution to this problem would be to report the number of cases for every 100,000 people.

Taking population size into account is a step in the right direction, but there is another problem that makes comparisons of provinces difficult. The number of cases that are detected also depends on the number of tests that are done. Alberta rightfully points out that they are world-leaders in the use of testing to fight the spread of COVID-19. While massive testing is positive, it has the negative effect that Alberta is also likely to find more cases than Ontario, where capacity to test the large population of Ontario is more limited. A solution to this problem is to compute the positive rate; that is the number of positive tests over the number of tests conducted. This also makes it unnecessary to take population size into account. Provinces with a larger population are likely to conduct more tests, but what matters is how many tests are being done, not how many people live in a province. A province with a large population could have a low number of cases, if there is very little testing.

Fortunately, Canadian labs report both the number of positive cases and the number of negative cases. This makes it possible to compute the positive rate as a meaningful statistic to examine the spread of the COVID-19 virus across Canadian provinces. The positive rate can also be used to compare Canadian provinces to states in the United States. Overall, the United States has responded differently to COVID-19 than Canada; in the beginning, there was a lot less testing than in Canada. Therefore, a simple comparison of the number of positive cases can be very misleading.

For example, on March 16, New York reported 49 positive cases for every 1 million inhabitants, while Ontario reported 23. Taken at face value, this suggests that New York already had about twice as many cases. However, NY had only carried out 282 tests per 1 million inhabitants, while Ontario had already carried out 1,044 tests per 1 million inhabitants. This means Ontario had a positive rate of 2%, when NY already had a positive rate of 17%. Thus, things were about 8 times worse in NY than in Ontario, well before cases and fatalities in NY exploded. As of March 28, the numbers in Ontario haven't changed much. There are now 99 positives for every 1 million inhabitants, but there are also 3,120 tests per 1 million inhabitants; a positive rate of 3%. In comparison, NY now has 2,689 positives for 8,016 tests per 1 million inhabitants, a positive rate of 34%. This means things are now more than 10 times worse in NY than in Ontario. Thus, the positive rate reflects that the situation in Ontario is much better than in NY.

To summarize, the positive rate controls for differences between provinces and states in population size and in the amount of testing that is done. This makes it possible to compare Canadian provinces to each other and with US states. The table below ranks provinces and states according to the positive rate.
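The NY–Ontario comparison boils down to a one-line calculation. A sketch using the March 16 per-million figures quoted above:

```python
# Positive rate = positives / tests, using the March 16 figures from the text
# (both expressed per 1 million inhabitants).
ny_positives, ny_tests = 49, 282
on_positives, on_tests = 23, 1044

ny_rate = ny_positives / ny_tests
on_rate = on_positives / on_tests
print(f"NY: {ny_rate:.0%}, Ontario: {on_rate:.0%}, "
      f"ratio: {ny_rate / on_rate:.1f}")
```

Because the positive rate divides positives by tests, the population normalization cancels out; the same ratio results whether raw counts or per-million figures are used.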
The good news for Canadians is that Canada currently has low positive rates, suggesting that early testing, school closures, and social distancing measures have helped to keep COVID-19 in Canada under control. Of course, these numbers are about the present and the future is unknown. However, it is noteworthy that NY already had a positive rate of 17% on March 16, when the situation seemed under control, while COVID-19 was spreading undetected in the community. The good news is that positive rates in Canada are well below this number.


It is worthwhile to discuss Quebec. On March 22, Quebec reported 221 cases with 10,005 tests (positive rate = 2%). On March 23, Quebec reported 628 cases, an increase of 184% from the day before. On March 24, numbers increased dramatically again, from 628 cases to 1,013 cases. This suggested a big surge in positive cases. However, the next day the number of tests doubled from 13,215 to 27,973, while the number of positive cases increased by only about 300. This suggests that accounting problems produced an artificial surge in cases. Once the increase in tests was taken into account, the positive rate on March 25 was 5%, and it has remained at this level through March 28. Thus, there is no rampant increase in COVID-19 cases in Quebec; the spike was a statistical artifact.

The positive rate can also be used to compare Canada to other countries. The Table also includes fatality rates (death / population). The results show that Canada is not only doing better than the United States, but also better than many European countries and the UK, while Australia, New Zealand, and Asian countries are doing better.

It is too early to examine more carefully the reasons for national differences in COVID- 19 cases and fatalities, but the data are encouraging that Canada’s response to the global pandemic has been reasonable and effective. It is also important to avoid politicizing the issue. The liberal federal government and conservative provincial governments have worked together to respond to the crisis. We only have to look South of the border to see how politics can make a global pandemic worse.

Like all statistics, the positive rate is not a perfect measure of the actual spread of COVID-19. One important limitation of the numbers here is that they are based on all cases so far. This means, things might be changing for the better (Norway) or getting worse (UK), and these trends are not reflected in the numbers. Another problem is that positive rates depend on the criteria that are used for testing individuals. However, a low positive rate means that testing is widespread and not just limited to cases in hospitals. Thus, a low positive rate suggests that the virus is not spreading quickly. In contrast, a high positive rate could mean that testing is extremely limited or that there are a lot of cases in the community. However, limited testing is a problem in itself because there is no information about spread in the community. Thus, high numbers are a problem even if a 20% number does not mean that 20% of the population is COVID-19 positive. An alternative measure is the fatality rate. However, once fatality rates are high, the battle against COVID-19 is already lost. The advantage of the positive rate is that it provides information before individuals are dying in large numbers.

Hopefully, the school and business closures that were implemented two weeks ago will show some improvement in the numbers in the coming week. Once more, case numbers are the wrong numbers to show this. Case numbers will go up, especially if testing capacities increase. Thus, what we really want to see is a decrease in the percentage of positive tests. Personally, my analyses of the data give me some hope that we can manage this crisis.

Comforting Case Counts Can be Misleading

Disclaimer: I tried my best to use the best available data, but I may have made some mistakes. The general points about tracking the spread of COVID-19 remain valid even if there are mistakes in the actual numbers, though the specific statistics may then be misleading. Of course, even the best available data can be distorted by errors. This is a best effort, and the results remain guesstimates of the current situation.

COVID-19 is a pandemic like and unlike previous pandemics. It is spreading faster than any previous pandemic because the world is more interconnected than it ever was. The virus spread early from China to neighboring countries like South Korea, to Iran, and to Italy. However, a virus does not know or respect borders; it is spreading around the world. COVID-19 is also unprecedented because modern technology made it possible to develop tests that can identify infected individuals within days. Testing makes it possible to track the spread of COVID-19 before people are dying in large numbers. Websites and newspapers report the number of confirmed COVID-19 cases as quickly as these numbers become available. Most of the time, the key statistic is the absolute number of COVID-19 cases.

For example, a popular COVID-19 tracking site posts a ranking of countries on the basis of case counts.

There are three problems with using case counts as a statistic that reflects the severity of the problem. First, countries with a larger population are bound to have a higher number of cases. Switzerland has a population of 8.5 million people, while China has a population of 1.4 billion. Scaling Switzerland's case count up to China's population implies 1,722,164 cases (1.7 million), far more than China has actually recorded. Thus, relative to its population, the COVID-19 pandemic is a much bigger problem in Switzerland than in China. A solution to this problem is to report cases relative to the size of the population. This is done too rarely. The German news channel ARD provided a map of cases for every 100,000 inhabitants of each state. The map shows that small city-states like Berlin and Hamburg have a relatively high rate of infections compared to larger neighboring states. Thus, it is much more informative to report information relative to population size than to compare absolute case numbers.


Another factor that influences case counts is time. High case counts are observed in places where the pandemic is out of control and widespread community transmission is taking place. At this point, statistics only document the pandemic; they cannot inform policies or actions because it is too late. At this stage in the pandemic, case counts are also relatively unimportant because deaths are a more tragic and objective indicator of the severity of the crisis. For example, on March 16, the state of New York reported only 669 cases with a population of nearly 20 million people, which translates into 3 cases for every 100,000 inhabitants. Nine days later, this had increased exponentially to 91,270 cases, which translates into 456 cases for every 100,000. More importantly, the number of deaths increased from fewer than 10 to 200. Thus, within 10 days a region can go from a low case count to crisis.

A better statistic is to track the increase in the number of cases over time. A case count of 10 cases per 100,000 inhabitants may not sound alarming, but this ignores that the number can explode if cases are increasing rapidly. The Financial Times provides informative graphs that take growth rates into account.

[Figure: Financial Times graph of cumulative COVID-19 cases by country over time]

The graph shows that Italy and China had similar growth rates in the beginning, but then the growth rate in China slowed down more than the growth rate in Italy. It also shows, that Singapore was able to slow down the growth rate early on and that South Korea was able to slow the growth rate even after they had a substantial number of cases and the same growth rate as China and Italy (about 33% increase a day). It also shows that many countries are on the same trajectory as Italy and China.
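The growth rates in such graphs can be estimated from a cumulative case series. A minimal sketch; the series below is made up, chosen to match the roughly 33%-per-day growth mentioned above:

```python
import math

# Made-up cumulative case counts on six consecutive days
cases = [100, 133, 177, 236, 314, 418]

# Average daily growth rate over the window (geometric mean of daily ratios)
days = len(cases) - 1
growth = (cases[-1] / cases[0]) ** (1 / days) - 1

# Doubling time implied by that growth rate
doubling_time = math.log(2) / math.log(1 + growth)

print(f"daily growth ≈ {growth:.0%}, doubling time ≈ {doubling_time:.1f} days")
```

At a 33% daily growth rate, cases double roughly every two and a half days, which is why log-scale plots like the FT's are needed to compare trajectories at all.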

Although the FT graphic is an improvement over simple count data, it has some problems and can easily be misinterpreted. First, it still relies on absolute numbers that are distorted by population size. The graph suggests that problems are worse in the United States than in the Netherlands. However, fatality rates tell a different story. The Netherlands has already seen 276 deaths with a population of 17 million, whereas the USA has seen 784 deaths with a population of about 330 million, nearly 20 times that of the Netherlands. If the Netherlands had the same population as the United States, it would have about 5,520 deaths, 7 times more than the United States. Thus, the situation in the Netherlands is considerably worse than in the United States.

A third factor that distorts comparisons is that regions differ considerably in the amount of testing that is done. The more tests are conducted, the more positive cases will be found. Thus, low case counts may provide a false sense of security and comfort if they are caused by low rates of testing. Testing can also distort comparisons over time: absolute numbers might jump simply because the number of tests has increased dramatically. For example, in NY the number of positive cases jumped from 669 on March 16 to 2,601 on March 17, an increase of 289%; that is, case numbers nearly quadrupled in one day. This jump does not mean that actual infections quadrupled in one day; rather, it reflects a dramatic increase in testing. Another problem is that data may not be updated regularly. For example, Germany announced a surprising flattening of the curve last weekend, only to be surprised again when cases increased more than usual on Monday. The reason was that some states did not report updated numbers on Sunday.

A solution to this problem is to compute the proportion of positive tests relative to the number of tests that were conducted. Countries with bigger populations will conduct more tests, but that does not affect the proportion of positive tests. Similarly, the number of tests can vary from day to day, but this does not influence the proportion of positive tests that are recorded. For most researchers, this is obvious. We may report that a study had 132 female and 68 male participants, but we really only care that the gender composition was 66% female and 34% male. However, when it comes to COVID-19 data, this statistic is rarely reported. To make matters worse, in some countries it is impossible to compute because only positive tests are recorded and reported. Nobody knows how many tests are conducted in Germany (dear Robert Koch Institute, please improve the recording and reporting of COVID-19 testing!).
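Computing the positive rate is trivial when negative results are reported, and impossible when they are not. A small sketch of why both counts are needed (the 132/68 split mirrors the study analogy above):

```python
def positive_rate(positives, negatives):
    """Share of tests that came back positive.

    Undefined when negative results are not reported, because the
    total number of tests (the denominator) is then unknown.
    """
    if negatives is None:
        raise ValueError("negative test counts not reported")
    return positives / (positives + negatives)

print(f"{positive_rate(132, 68):.0%}")  # 66%, as in the 132/68 example above
```

The `ValueError` branch is the whole point: a jurisdiction that reports only positives makes this statistic impossible to compute, no matter how carefully the positives are counted.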

Most states in the USA and the Canadian provinces report the number of tests that are being conducted. This makes it possible to examine the spread of the COVID-19 pandemic in the USA and Canada. For the US states, I used data from the covidtracking website. I was not able to compute statistics for all states because some states do not report negative results. Without negative results, it is unclear how widespread COVID-19 is in these states. The best estimates may be based on results from neighboring states that report credible testing rates.

With every new statistic, we have to examine whether it produces useful, valid information. One indication that this statistic provides useful information is the ranking of New York as the state with the biggest problem. Not surprisingly, neighboring New Jersey also ranks highly. It is harder to validate the statistic at the lower end because it is not clear whether places with low case numbers have few cases of COVID-19 or simply low testing rates. However, Alberta has the highest testing rate of all states and provinces, and only 1% of its tests come back positive. This suggests that COVID-19 is not widespread in Alberta.

The placement of Quebec is interesting because Quebec has been testing at a fairly high rate with relatively low case numbers, but it recorded big increases over the past two days. This spike made Quebec move to rank #7 and there is concern about widespread community transmission.

Thus, some results suggest that the positive rate is a useful indicator of the spread of COVID-19. However, there are some problems with this statistic as well. Testing is not random, and in places where tests are limited, testing will focus on cases in hospitals, which have a high positive rate. In places with more capacity, the general public has more opportunity to get tested and the positive rate will be lower. Interestingly, I found no correlation between the amount of testing (tests conducted as of March 24 / population) and the positive rates, Pearson r = .03, and only a small negative rank correlation, rank r = -.27. Thus, a considerable amount of the variation in positive rates may reflect the actual spread of COVID-19 across states. This would suggest that COVID-19 has already spread in the East of the United States, while prevalence in the Great Plains is still low. It would also suggest that the Southern states may be more affected than other statistics indicate because testing rates in these states are low. There is still a chance that some Western states can prevent a large outbreak by implementing the right measures before it is too late. In Canada, Ontario and Alberta seem to be doing well so far and have implemented measures to prevent community transmission, but the spike in Quebec shows that case numbers and positive rates can increase quickly.

In sum, statistics can provide valuable information, but they can also be misleading if numbers are not presented in a way that is informative. Absolute case counts are relatively easy to compute and to report, but absolutely uninformative. To provide the general public with valuable information, everybody needs to do better. Labs and governments need to record and report the number of negative tests, and media need to report numbers that actually reflect the risk to a particular population. Widespread community testing is needed to detect new outbreaks of the pandemic. This has worked in South Korea, the only country that has been able to stop the pandemic. Japan and Singapore were able to prevent it from spreading widely in the first place. In some places, it is not too late to follow these examples.