Every scientific discipline is unique and requires special training. However, empirical sciences also have many things in common. One major problem in any empirical science is measurement. To make objective, scientific claims about the world, scientists need reliable and valid measures of the world. These days, we are used to measuring time, distance, and temperature with high precision for the range of values that we encounter in our everyday world. However, even these basic attributes are difficult to measure when we consider the speed of light or the length of a chromosome. Similarly, social scientists have concepts that are easy to understand. Did you have sex yesterday? Is a person dead or alive? However, when we want to measure how often US Americans or Canadians have sex or how many people died in the past month, measurement is a lot more difficult.
Death is a big deal and considerable effort and resources are invested in recording the time and the probable cause of death. This information is collected by statisticians and made publicly available. When we see these numbers, we forget about measurement error. If a newspaper article or a website shows us that there are 63,801 deaths, we think that there are 63,801 deaths. However, it is nearly certain that this number is false and that the true number is greater or smaller than 63,801 deaths. The real number is unlikely to be 10 times larger or 10 times smaller, but we should not assume that deaths can be counted like coins in our piggy bank.
While counting deaths is difficult, it is even more difficult to specify the cause of death. This makes it even more difficult to say how many US Americans have died so far of Covid-19. The task of quantifying Covid-19 deaths is more like estimating or guessing than counting. At best, we can try to reduce the uncertainty in our estimates.
One solution to the problem of estimation is to use different estimation strategies. If these different strategies produce convergent results, it is likely that the estimates reflect the truth. However, if two estimation strategies diverge, at least one of them has to be wrong, or at least more wrong than the other. Ideally, scientists would work together and try to come up with a consensus based on different estimation methods. However, often this does not happen because scientists get rewarded for presenting their method as superior to other methods. In psychology and in some other sciences, researchers who try to make sense of conflicting claims by original scientists are called meta-analysts or meta-scientists. In this blog post, I take a meta-analytic perspective on different measures of Covid-19 deaths.
I focus on US states because the US has provided
1. Confirmed Covid-19 Deaths
The most widely used estimate of Covid-19 deaths is based on testing of deceased people. Assuming a fairly low amount of measurement error in Covid-19 tests, this method seems to be close to perfect. However, there are problems. First, not every coroner has the resources to test for Covid-19. Second, some patients had pre-existing conditions and it is not clear whether Covid-19 was the actual cause of death. Given these problems, statistics are often reported with a disclaimer that the numbers may be inaccurate.
Aside from uncertainty, it is also likely that the numbers underestimate the actual number of Covid-19 deaths. The reason is that it is more likely that a death caused by Covid-19 is attributed to some other disease than that a death caused by another disease is attributed to Covid-19. This is certainly true if the cause of death is determined by a test for Covid-19. However, sometimes Covid-19 is considered a probable cause without a diagnostic test. The problem is that it is unclear how big the discrepancy between confirmed and actual Covid-19 deaths is. If there are 100 confirmed Covid-19 deaths, is the real number 101, 120, 200, or 1000?
2. Excess All-Cause Mortality
An alternative way to estimate Covid-19 deaths relies on the count of all deaths. The number of people who die in a specific week or month of the year is fairly stable from year to year. The main difference between previous years and the year 2020 is that a new virus started killing people. Assuming everything else remains the same (a big assumption), people who died of Covid-19 would be additional deaths, except for a small number of people who got infected and would have died anyway. This makes it possible to estimate the number of Covid-19 deaths without a diagnostic test. If normally 500 people die in the week from April 1 to April 7, and 550 people died in the same week in 2020, this method suggests that 50 people died of Covid-19. The problem with this method is that it makes several assumptions and the estimate will be inaccurate if these assumptions are wrong. For example, the number of deaths varies as a function of the severity of the flu season. If the flu was very mild, there could have been only 450 deaths without Covid-19 in 2020, and the true number of Covid-19 deaths would be 100. Or the flu was more severe and fully accounted for all 550 deaths, and there were zero Covid-19 deaths.
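The arithmetic of this example can be sketched in a few lines of Python. The function name `excess_deaths` and the scenario labels are illustrative assumptions; the numbers are the hypothetical ones from the paragraph above, not real data.

```python
def excess_deaths(observed, expected_baseline):
    """Excess deaths: observed deaths minus the deaths expected without Covid-19."""
    return observed - expected_baseline


# Hypothetical deaths observed in the week from April 1 to April 7, 2020.
observed_2020 = 550

# Three assumptions about how many people would have died without Covid-19.
baseline_scenarios = {
    "mild flu season": 450,
    "typical year": 500,
    "severe flu season": 550,
}

estimates = {
    name: excess_deaths(observed_2020, baseline)
    for name, baseline in baseline_scenarios.items()
}

for name, estimate in estimates.items():
    print(f"{name}: {estimate} estimated Covid-19 deaths")
```

The observed count is the same in all three scenarios; only the assumed baseline changes, and the estimate moves from 0 to 100 deaths accordingly.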
Another problem is that deaths are not reported in real time to the Centers for Disease Control and Prevention (CDC). The numbers for the critical weeks of March and April are still being revised and updated. This means that current numbers are likely to underestimate deaths in 2020 and thus Covid-19 deaths. Statisticians can deal with this problem by using information about the typical reporting delay to correct the provisional counts. The problem is that this means more assumptions and more room for error.
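To illustrate why a lag correction adds uncertainty, here is a minimal sketch that scales a provisional count by an assumed completeness fraction. This is a simplified stand-in and not the CDC's actual procedure; the function name `adjust_for_reporting_lag`, the provisional count of 400, and the completeness values are all made up for illustration.

```python
def adjust_for_reporting_lag(reported, completeness):
    """Scale a provisional count up by the fraction of deaths assumed to be reported so far."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be greater than 0 and at most 1")
    return reported / completeness


# Hypothetical provisional count for a recent week.
reported_so_far = 400

# Made-up assumptions about how complete the reporting is at this point.
for completeness in (0.6, 0.75, 0.9):
    predicted = adjust_for_reporting_lag(reported_so_far, completeness)
    print(f"assumed completeness {completeness:.0%}: predicted final count {predicted:.0f}")
```

The same provisional count leads to quite different predicted totals depending on the assumed completeness, which is why predictions for the most recent weeks should be treated with caution.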
Despite these problems, several media reports have used estimates based on excess all-cause mortality to make strong claims about the “true” number of Covid-19 deaths.
One problem with these claims is that official Covid-19 death numbers have been revised to incorporate presumed cases without a Covid-19 test. This has led to some dramatic increases in the numbers for severely affected states like New York (National Post).
The question is whether there are still tens of thousands of Covid-19 deaths that are missing even after this revision of the numbers.
Data about all-cause mortality and excess all-cause mortality are provided by the CDC (https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm).
The dark blue bars show the recorded deaths. These numbers do not show a notable increase in deaths in March and April. The reason is that there is a lag in reporting. This means that the final all-cause mortality numbers for March and April 2020 are still unknown. We know that the numbers are going to go up, but we do not know by how much. The light blue bars are predictions about the final numbers based on information about the typical reporting lags. The red stars show weeks with predicted excess deaths. These predictions show excess deaths for the time period from March 29 to April 18. For the most recent week from April 19 to April 25, the reported numbers are so low that even the predicted values are well below the typical level. It is unlikely that this is a real finding given the recording of thousands of Covid-19 deaths in this time period (https://www.worldometers.info/coronavirus/country/us/).
The CDC datafile records excess deaths for the critical weeks. The sum of these values is 19,086 excess deaths that may be attributed to Covid-19.
In contrast, the Worldometer statistics show 39,331 Covid-19 deaths in the United States on April 18. This is more than double the number that the excess-deaths method suggests. Even the more conservative estimate by the Covid-Tracking team lists 34,557 deaths, which is still considerably more than the estimate based on excess all-cause mortality. Thus, there is no evidence that the official counts severely underestimate the number of Covid-19 deaths in the United States.
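The comparison can be checked with a short Python sketch. The three counts are the ones quoted in this post; the variable names are my own and purely illustrative.

```python
# Counts quoted in this post (deaths in the United States up to April 18).
excess_mortality_estimate = 19_086  # sum of CDC excess deaths for the critical weeks
official_counts = {
    "Worldometer": 39_331,
    "Covid-Tracking": 34_557,
}

# Ratio of each official count to the excess-mortality estimate.
ratios = {
    source: count / excess_mortality_estimate
    for source, count in official_counts.items()
}

for source, ratio in ratios.items():
    print(f"{source}: {official_counts[source]:,} deaths, "
          f"{ratio:.2f} times the excess-mortality estimate")
```

Both ratios are well above 1, so on these numbers the official counts exceed the excess-mortality estimate rather than falling short of it.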
Even though death can be determined objectively, the actual number of Covid-19 deaths is unknown. Scientists can only provide reasonable estimates, and different estimation methods produce different estimates. Counting only confirmed cases that were tested for Covid-19 underestimates the actual number because some people died of Covid-19 and were not tested. Including presumed cases addresses this problem, but it creates uncertainty. There may be a bias to attribute deaths to Covid-19 or a bias in the opposite direction. Examining excess all-cause mortality avoids the problem of diagnoses but creates many new problems. One problem is that the final number of deaths in April is still unknown because it takes time to collect this information. Given all of these problems, there is a wide range of estimates. It could be as few as 19,000 and it could be more than 40,000. However, there is no scientific evidence to claim that tens of thousands more than the official 40,000 deaths were caused by Covid-19. Given the information provided here, it would be very surprising if new data revised the number of deaths by April 18 to over 60,000.
Personally, I don’t believe that the actual number is very important. What is more important is that the number of deaths would have been dramatically higher without actions to stop the spread of the virus. Whether we compare 20,000, 40,000, or 60,000 deaths to 500,000 deaths seems irrelevant. No country, not even Sweden, decided to simply let the virus spread, because the consequences would have been horrible. The actual number is also not important for the timing and planning of the future. How and when to ease restrictions depends on the current number of new cases rather than the cumulative number of deaths. For example, Italy has a large number of deaths, but was able to control the spread of the virus and can now start thinking about easing the lock-down.
Unfortunately, the number of Covid-19 deaths seems to be most important for politics. A higher death toll means that political leaders are responsible for a bigger disaster. At the beginning of the Covid-19 crisis, Trump supporters pointed out that 12,000 US Americans died of H1N1 under the Obama administration.
Now that the death toll from Covid-19 is much higher than the death toll from H1N1, the liberal media may be inflating the number of Covid-19 deaths to make Trump look as bad as possible. The fact that the true number is unknown and can only be estimated provides ample room for political bias in the presentation of the results.
Scientists should not be part of these partisan number-games. The best way to avoid this is to provide information in terms of ranges that express uncertainty about the true number. Ranges also make it easier for scientists to agree with each other and to speak with a common voice that reflects the objective data. Based on the information that I presented here, I would say that between 20,000 and 60,000 US Americans died of Covid-19 by April 18. As more data become available, uncertainty in this estimate will shrink.
The more important question remains how much US Americans and the rest of the world are able to control the spread of the virus in the future to prevent as many deaths as possible in the months to come and to find ways to live with the virus until a vaccine is available.