No Justice, No Peace: A History of Slavery Predicts Violence Today

Some human behaviors attract more attention than others. Homicides are rare, but very salient human behaviors. Governments investigate and keep records of homicides and social scientists have developed theories of homicides.

In the 1960s, social scientists suggested that inequality can lead to more violence. One simple reason is that the rewards for poor people to commit violent crimes increase with greater inequality in wealth (Becker, 1968).

Cross-national studies confirm that societies with more income inequality have higher homicide rates (Avison & Loring, 1986; Blau & Blau, 1982; Chamlin & Cochran, 2006; Corcoran & Stark, 2020; Fajnzylber, Lederman & Loayza, 2002; Krahn et al., 1986; Pratt & Godsey, 2003; Pridemore, 2008).

A recent article in Psychological Science replicated this finding (Clark, Winegard, Beardslee, Baumeister, & Shariff, 2020). However, the main focus of the article was on personality attributes as a predictor of violence. The authors’ main claim was that religious people are less likely to commit crimes and that, among non-religious individuals, those with lower intelligence are more likely to commit homicides.

A fundamental problem with this article is that the authors relied on an article by a known White-supremacist, Richard Lynn, to measure national differences in intelligence (Lynn & Meisenberg, 2010). This article with the title “National IQs calculated and validated for 108 nations” claims that the values used by Clark et al. (2020) do reflect actual differences in intelligence. The problem is that the article contains no evidence to support this claim. In fact, the authors reveal their racist ideology when they claim that a correlation between their scores and skin color of r = -.9 validates their measure as a measure of intelligence. This is not how scientific validation works. This is how racists abuse science to justify their racist ideology.

The article also makes the common mistake of imposing a preferred causal interpretation on a correlation. Lynn and Meisenberg (2010) find that their scores correlate nearly perfectly with educational attainment. They interpret this as evidence that intelligence causes educational attainment and totally ignore the plausible alternative explanation that education influences performance on logical problems. This has important implications for Clark et al.’s (2020) article because the authors buy into Lynn and Meisenberg’s racist interpretation of the correlation between performance on logic problems and educational attainment. An alternative interpretation of their finding would be that religion interacts with education. In nations with low levels of formal education, religion provides a moral code that prevents homicides. In countries with more education, other forms of ethics can take the place of religion. High levels of homicides would be observed in countries where neither religion nor education teaches a moral code.

Aside from this fundamental flaw in Clark et al.’s (2020) article, closer inspection of their data shows that they overlooked confounding factors and that their critical interaction is no longer significant when these factors are included in the regression model. In fact, financial and racial inequality are much better predictors of national differences in violence than religion and the questionable measure of intelligence. Below I present the statistical results that support this conclusion and invalidate Clark et al.’s (2020) racist conclusions.

Statistical Analysis

Distribution Problems

Not long ago, religion was a part of life in most countries. Only over the past century have some countries become more secular. Even today, most countries are very religious. Figure 1 shows the distribution of religiosity based on the Relig_ARDA variable in Clark et al.’s dataset. This skewed distribution can create problems when a variable is used in a regression model, especially if the variable is multiplied by another variable to test interaction effects.

It is common practice to transform variables to create a more desirable distribution for the purpose of statistical analysis. To do so, I reversed the item to measure atheism and then log-transformed the variable. To include countries that scored 100% on religiosity, I added 0.001 to all atheism scores before I carried out the log transformation. The distribution of log-atheism is less skewed.
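The transformation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I assume that religiosity is stored as a percentage between 0 and 100 and that atheism is expressed as a proportion before taking the log. The function name and the example values are hypothetical and are not taken from Clark et al.’s dataset.

```python
import math

OFFSET = 0.001  # constant from the text; keeps log() defined when atheism is 0


def log_atheism(religiosity_pct: float) -> float:
    """Reverse a religiosity percentage into an atheism proportion and log it.

    Assumption: religiosity is a percentage in [0, 100].
    """
    if not 0.0 <= religiosity_pct <= 100.0:
        raise ValueError("religiosity must be a percentage between 0 and 100")
    atheism = 1.0 - religiosity_pct / 100.0  # reverse-score the item
    return math.log(atheism + OFFSET)


# Hypothetical religiosity values, not real country data.
for pct in (100.0, 90.0, 50.0, 0.0):
    print(f"religiosity={pct:5.1f}%  log-atheism={log_atheism(pct):.3f}")
```

Without the small offset, a country with 100% religiosity would have an atheism score of 0, and the logarithm of 0 is undefined.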

The distribution of homicides (rates per 100,000 inhabitants) is also skewed.

Because homicide rates are right-skewed, a direct log-transformation can be applied to get a more desirable distribution. To include nations with a value of 0, I added a value of 1 before the log-transformation. The resulting distribution for log-homicides is more desirable.
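As a rough illustration of why the log-transformation helps, the following sketch compares the skewness of raw and log-transformed rates. The rates are made-up values chosen only to mimic a right-skewed distribution; they are not the homicide rates from the dataset, and the skewness helper is my own hypothetical function.

```python
import math


def skewness(values):
    """Population skewness: the third standardized moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical homicide rates per 100,000 inhabitants (illustration only).
rates = [0.0, 0.5, 0.8, 1.0, 1.2, 2.0, 3.5, 5.0, 12.0, 40.0]

# log1p(x) computes log(1 + x), so a rate of 0 maps to 0 instead of -infinity.
log_rates = [math.log1p(r) for r in rates]

print(f"skewness of raw rates: {skewness(rates):.2f}")
print(f"skewness of log rates: {skewness(log_rates):.2f}")
```

A skewness closer to 0 indicates a more symmetric distribution, which is what the transformation is meant to achieve.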

The controversial IQ variable did not require a transformation.

Bivariate Relationships

The next figure shows a plot of homicides as a function of the questionable intelligence measure (QIM). There is a visible negative correlation. However, the plot also highlights countries in Latin America and the United States. These countries have in common that they were established by decimating the indigenous population and bringing slaves from Africa to work for the European colonialists. It is notable that nations with a history of slavery have higher homicide rates than other nations. Thus, aside from economic inequality, racial inequality may be another factor that contributes to violence: slavery ended over 100 years ago, but racial inequality persists to this day. Former slave countries also tend to score lower on the QIM measure. Thus, slavery may partially account for the correlation between QIM and homicide rates.

The next plot shows homicide rates as a function of atheism. A value of 0 would mean the country is totally atheistic, while more negative values show increasing levels of religion. There is no strong relationship between religion and homicide rates. This replicates the results in the original article by Clark et al. Remember that their key finding was an interaction between QIM and religion. However, the plot also shows a clear distinction between less religious countries. Former slave countries are low in religion and have high homicide rates, while other countries (mainly in Europe) are low in religion and have low homicide rates.

Regression Models

To examine the unique contribution of different variables to the prediction of homicide rates, I conducted several regression analyses. I started with the QIM x religion interaction to see whether the interaction is robust to transformations of the predictor variables. The results clearly show the interaction and main effects for QIM and religion (t-values > 2 are significant at p < .05).
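For readers who want to see what such a model looks like, here is a minimal sketch of an ordinary least squares regression with a product term, written in plain Python. It is not the code used for the analyses reported here. The data are invented, the helper names (`invert`, `ols`) are hypothetical, and in practice one would use a statistics package rather than hand-written matrix code.

```python
import math


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Return OLS coefficients and their t-values."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t = [b / math.sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t


# Invented values for ten hypothetical nations (illustration only).
qim = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, -1.2, 0.3, 1.1]
log_ath = [-4.0, -1.0, -3.0, -0.5, -2.0, -0.2, -1.5, -2.5, -3.5, -0.8]
log_hom = [2.9, 1.8, 2.4, 1.2, 1.6, 0.5, 0.9, 2.3, 1.9, 0.6]

# Columns: intercept, QIM, log-atheism, and the QIM x atheism product term.
design = [[1.0, q, a, q * a] for q, a in zip(qim, log_ath)]
betas, t_values = ols(design, log_hom)

for name, b, t in zip(["intercept", "QIM", "atheism", "QIM x atheism"],
                      betas, t_values):
    flag = "significant" if abs(t) > 2 else "not significant"
    print(f"{name:14s} b={b:7.3f}  t={t:6.2f}  ({flag} by the |t| > 2 rule)")
```

The interaction is tested by the coefficient of the product term: if its t-value exceeds 2 in absolute value, the effect of one predictor depends on the level of the other.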

Next I added slavery as a predictor variable.

The interaction is no longer significant. This shows that the interaction emerged because former slave countries tend to score low on QIM and religion.

I then added the GINI coefficient, the most widely used measure of income inequality, to the model. Income inequality was an additional predictor. The QIM x religion interaction remained non-significant.

I then added GDP to the model. A country’s wealth is strongly related to many positive indicators. Given the skewed distribution, I used log-GDP as a predictor, which is also the most common way economists use GDP.

GDP is another significant predictor, while the QIM x religion interaction remains non-significant. Meanwhile, the strong relationship between QIM and homicide rates has decreased from b = -.71 without controls to b = -.25 with controls. However, it is still significant. As noted earlier, QIM may reflect education and Clark et al. (2020) included a measure of educational attainment in their dataset. It correlates r = .68 with QIM. I therefore substituted QIM with education.

However, education did not predict homicide rates. Thus, QIM scores capture something about nations that the education measure does not capture.

We can compare the social justice variables (slavery, GDP, GINI) with the personal-attribute (atheist, QIM) variables. A model with the social justice variables explains 62% of the variation in homicide rates across nations.

The personal-attribute model explains only 40% of the variance.

As these predictors overlap, the personal-attributes add only 3% additional variance to the variance that is explained by slavery, income inequality, and wealth.
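The three percentages above imply a simple partition of the explained variance. The sketch below only does the arithmetic on the rounded values reported in the text. The R² of the combined model and the shared portion are derived quantities that are not reported above, so they should be read as approximations.

```python
def incremental_r2(r2_full: float, r2_reduced: float) -> float:
    """Variance explained by added predictors beyond the reduced model."""
    return r2_full - r2_reduced


r2_social = 0.62          # slavery, GINI, log-GDP (from the text)
r2_personal = 0.40        # atheism, QIM (from the text)
added_by_personal = 0.03  # unique contribution of atheism and QIM (from the text)

# Implied R² of the model that contains all predictors.
r2_full = r2_social + added_by_personal

# Variance explained only by the social justice variables.
unique_social = incremental_r2(r2_full, r2_personal)

# Variance that either set of predictors could account for.
shared = r2_full - unique_social - added_by_personal

print(f"combined model R²: {r2_full:.2f}")
print(f"unique to social justice variables: {unique_social:.2f}")
print(f"unique to personal attributes: {added_by_personal:.2f}")
print(f"shared between both sets: {shared:.2f}")
```

Read this way, most of the variance that the personal-attribute model explains is variance that the social justice variables explain as well.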

Replicating Slavery’s Effect in the United States

The United States provides another opportunity to test the hypothesis that a legacy of slavery and racial inequality is associated with higher levels of homicides. I downloaded statistics about homicides (homicide stats). In addition, I used a measure of urbanization to predict homicides (urbanization). I also added a measure of income inequality (GINI). I classified states that fought for the Confederacy as slave states (civil war facts). Results were similar for different years in which homicide rates were available from 1996 to 2018. So, I used the latest data.

In a model with all predictor variables, slavery was the only significant predictor. Income inequality showed a trend, and urbanization was not a unique predictor. When urbanization was removed from the model, the effect of income inequality was a bit stronger.

Overall, these results are consistent with the cross-national data and suggest that a history of slavery and persistent racial inequality create social conditions that lead to more violence and homicides. These results are consistent with recent concerns that systemic racism contributes to the killing of civilians by other civilians and by police officers, who historically had the role of enforcing racial inequality.

Meta-Science Reflections

Clark et al.’s (2020) article is flawed in numerous ways. Ideally, the authors would have the decency to retract it. The main flaw is the use of a measure with questionable validity without ever questioning that validity. This flaw is not unique to this article. It is a fundamental flaw that has also led to a large literature on implicit bias based on an invalid measure. The uncritical use of measures has to stop. A science without valid measures is not a science and statistical results that are obtained with invalid measures are not scientific results.

A second flaw stems from the way psychologists are trained: they learn to conduct randomized laboratory experiments. Random assignment makes it easy to interpret statistically significant results. Unless something went really wrong or sampling error produced a false result, a statistically significant result means that the experimental manipulation influenced the dependent variable. Causality is built into the design. However, things are very different when we look at naturally occurring covariation because everything is correlated with everything. Observed relationships may not be causal and they can be produced by variables that were not measured. The only way to deal with this uncertainty is to carefully test competing theories. It is also necessary to be careful in the interpretation of results. Clark et al. (2020) failed to do so and made overly strong statements based on their correlational findings.
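A toy example may help to show how an unmeasured variable can produce a correlation. In the sketch below, two made-up variables are uncorrelated within each of two groups, yet strongly correlated when the groups are pooled, simply because the groups differ in their means. All values and names are hypothetical and serve only to illustrate the logic.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Group membership plays the role of the variable that was not measured.
group_a_x, group_a_y = [1.0, 2.0, 3.0], [2.0, 1.0, 2.0]
group_b_x, group_b_y = [6.0, 7.0, 8.0], [7.0, 6.0, 7.0]

r_within_a = pearson(group_a_x, group_a_y)
r_within_b = pearson(group_b_x, group_b_y)
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"within group A: r = {r_within_a:.2f}")
print(f"within group B: r = {r_within_b:.2f}")
print(f"pooled sample:  r = {r_pooled:.2f}")
```

A researcher who ignores group membership would conclude that x predicts y, although there is no relationship between them within either group.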

“Many scholars have argued that religion reduces violent behavior within human social groups. Here, we tested whether intelligence moderates this relationship. We hypothesized that religion would have greater utility for regulating violent behavior among societies with relatively lower average IQs than among societies with relatively more cognitively gifted citizens. Two studies supported this hypothesis”

This statement would be fine if they had conducted an experiment, but of course, it is impossible to conduct an experiment to examine this question. This also means it is no longer possible to use evidence as support for a hypothesis. Correlational evidence simply cannot verify a hypothesis. It can only falsify wrong theories. Clark et al. (2020) failed to acknowledge competing theories of homicides and to test their theory against competing theories.

The last meta-scientific observation is that all conclusions in science rest on a combination of data and assumptions. When the same data lead to different conclusions, like they did here, we get insights into researchers’ assumptions. Clark et al.’s (2020) assumptions were (a) there are notable differences in intelligence between nations, (b) these differences are measured with high validity by Lynn and Meisenberg’s (2010) questionable IQ scores, and (c) homicides are caused by internal dispositions like being an atheist with low intelligence. Given Lynn and Meisenberg’s finding that their questionable measure correlates highly with skin tone, they also implicitly share the racist assumption that dark skinned people are more violent because they are less intelligent. The present blog post shows that an entirely different story fits the data. Homicides are caused by injustice such as unfair distributions of wealth and discrimination and prejudice based on skin color. I am not saying that my interpretation of the data is correct because I am aware that alternative explanations are possible. However, I would rather have a liberal/egalitarian bias than a racist bias.

When Right-Wing News Write About Race and Police

The right-wing magazine Quillette just published an article titled “Racist Police Violence Reconsidered” by John McWhorter, an associate professor in the linguistics department at Columbia University. Given his training in writing, he knows how to draw his readers in with an emotional story about a White victim of lethal use of force to make the point that police sometimes kill White people, too. This is followed by the statement that “plenty of evidence indicates, however, that racism is less important to understanding police behavior than is commonly supposed”.

In a scientific article, this would be the time to mention the scientific evidence that is supposed to support this claim. But McWhorter is no scientist. He is a writer and cannot be held to the scientific standards of criminologists and other social scientists. With one sentence, a fact has been created. The idea that the police are racially biased and disproportionately kill African Americans is wrong. But why does everybody believe it to be true? McWhorter has a ready explanation for this. The biased liberal, reverse-racist media cover police brutality only when the officer is White and the victim is Black. “Had Tony Timpa been black, we would all likely know his name by now. Had George Floyd been white, his name would likely be a footnote, briefly reported in Minneapolis local news and quickly forgotten.”

Well trained in propaganda, McWhorter then presents more cases of White victims in equal numbers to Black victims. For every Black victim, there is a White victim in his narrative that is based on his personal selection of cases. After creating the illusion that there is a White victim for every Black victim, he is ready to repeat his claim that we have been manipulated by the liberal media. “So, the perception that the police regularly kill black people under circumstances in which white people would be merely disciplined is in fact a misperception.”

But is it a misperception? That would require actual scientific information about the influence of race on lethal use of force by police officers in the US. This evidence is reviewed after the conclusion has already been offered that the common assumption of racial bias against African Americans is a misperception.

McWhorter next explains correctly that African Americans are a minority in the United States. If police were unbiased in the lethal use of force, we would expect a lot more victims to be White than Black. He then correctly states that “it remains true that black people are killed at a rate disproportionate to their percentage of the population.”

So, it is NOT a misperception that police kill disproportionately more African Americans. There is racial disparity in the use of force. This invalidates the claim that we all believe that racial bias exists because we have been manipulated by the liberal media.

McWhorter then makes a distinction between racial disparity and racial bias. “However, these figures are not necessarily evidence of police racism. According to the Washington Post’s database, over 95 percent of the people fatally shot by police officers in 2019 were male, and no serious-minded person argues that this is evidence of systemic misandry. So what, then, accounts for the disproportionate representation of black men among those killed by cops?”

This is a controversial topic that has been examined in numerous articles by social scientists in scientific journals. But McWhorter does not cite these studies, presumably because he lacks the training to understand the sometimes complicated statistical methods that have been used in these articles.

Like a novelist, he creates facts with the stroke of a pen. “The socioeconomic gap between blacks and whites is doubtless an important contributing factor.” and “This disparity in poverty rates means black people are also disproportionately presented in violent rates of violent crime” Here we go again. The police are not racially biased. The real reason why they kill more Black people is that Black people are more criminal. Blame the victim. To give this biased narrative some credibility, McWhorter cites only one scientific article that supports his story. “Contrary to his expectations, Harvard economist Roland Fryer has found that while white men are actually more likely to be killed by cops,” as if an economist is more credible than criminologists or other scientists because he is at Harvard. That is not how science works. You also have to cite evidence that contradicts your claims (Young unarmed nonsuicidal male victims of fatal use of force are 13 times more likely to be Black than White).

In the end McWhorter softens his stance a bit. “This disparity cannot explain every fatal police shooting,” “This is not to say that race has nothing to do with policing issues in America.” But these sentences are mere rhetorical devices to signal that the author is balanced and reasonable, when the truth is that the author is ignorant about the science on racial bias in policing, including use of force.

I have no reason to believe that McWhorter wrote this terrible article because he is motivated by racism, but it is not clear to me why McWhorter wrote such a biased article that is so hurtful to many African Americans who are traumatized by the callous killing of innocent African Americans. All I can say is that McWhorter lacks the competence to write about this topic because he is either too lazy or not trained to follow the actual science on this topic. In Germany we say “Schuster, bleib bei deinen Leisten” (“Every man to his trade.”) Please follow this good advice, Dr. McWhorter.

Can We Measure Racism? Yes We Can

A famous quote states that something that cannot be measured does not exist. This is of course not true, but if we want to move from anecdotal evidence to scientific evidence and theories of racism, we need valid measures of racism.

Social psychology has a long history of developing measures of racism and today there are dozens of different measures of racism. Unfortunately, psychologists are better at developing new measures than at validating existing ones. This makes research on racism vulnerable to criticism that racism measures are invalid or biased (Feldman & Huddy, 2005; Zigerell, 2015).

Take the item “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without special favors” as an example. The item is one of several items that are used to measure a form of racism called symbolic racism.

Feldman and Huddy (2005) argue that items like this one have two components: a purely racist component, in which White people do not see Black people as equal citizens, and a purely ideological component that opposes policies that favor any particular group, even if this group is disadvantaged by a history of racism. Whether the latter component is itself racist or not is not the topic of this blog post. My focus is rather on the separation of the two components. How can we separate agreement with the item that is based on racism from endorsement of the item for purely political reasons?

One solution to this problem is to see how endorsement of items with political content is related to items that have no political content. Using a statistical method called factor analysis it is then possible to separate the racial and the ideological component and to examine how much political orientation is related to the two components.

Indirect Measures of Racism

The problem with direct measures of racism is that open admission of racial bias has become less acceptable over time. This makes it harder to measure racism with items like “What about having a close relative marry a Black person? Would you be very in favor of it happening, somewhat in favor, neither in favor nor opposed to it happening, somewhat opposed, or very opposed to it happening?” Respondents may be unwilling to report their true feelings about this issue, especially when the interviewer is African American (Schaeffer, 1980; Schimmack, 2020).

Modern psychological testing with computers has made it possible to avoid these problems by measuring racism with computerized tasks that rely on participants’ behavior in response to racial stimuli. There are several tasks such as the evaluative priming task, the affective misattribution task and the popular Implicit Association Test (IAT). Unfortunately, the IAT has become known as a measure of implicit bias or implicit racism that is distinct from racism that can be measured with self-report measures. I have argued that there is no evidence that people can hide their feelings towards African Americans from themselves. It is more useful to see these tasks as alternative measures of racism that are less susceptible to fake responding. This does not mean that these tasks are perfect measures of racism because the use of computerized tasks creates new problems. Thus, there is no perfect measure of racism, but all valid measures of racism should be positively correlated with each other and the shared variance among these measures is likely to reflect variation in racism. The interesting question is whether political orientation is related to the shared variance among a variety of direct and indirect racism measures.


The data come from a study by Bar-Anan and Vianello (2018). The data have also been used in my critique of the IAT as a measure of implicit bias (Schimmack, 2020). The study assessed political orientation and racism with multiple measures. Political orientation was also measured with the standard and the brief IAT. In addition, participants reported whether they voted Republican or Democrat. Only White participants who reported voting were included in the analysis.

Racism was measured with the standard IAT, the brief IAT, the evaluative priming task, the Affective Misattribution Task, a direct rating of preference for White or Black people, and the Modern Racism Scale. Like other measures that have been criticized, the Modern Racism Scale mixes racism and political ideology.

The interesting question is how much political orientation is related to the unique variance in the Modern Racism Scale that is not shared with other racism measures and how much it is related to the shared variance with other racism measures.


The results show two clearly identified factors. The strong relationship between voting and the Republican factor (rep) shows that political orientation can be measured well with a direct question. In contrast, racism is more difficult to measure. The best measure in this study would be the direct preference rating (r_att), which correlates .6 with the pro-White factor. But even this relationship implies that only about a third of the variance in the actual ratings reflects racism. The rest of the variance is measurement error. So, there is no gold-standard or perfect way to measure racism. There are only multiple imperfect ways. The results also show that the controversial Modern Racism Scale (mrs) reflects both racism (.444) and political orientation (.329). This shows that Republicans score high on Modern Racism in part because they reject social policies that favor minority groups independent of their attitudes towards Black Americans. However, the figure also shows that Republicans are more racist, as reflected in the relationship between the Republican and Racism factors (.416).
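The step from a factor loading to a proportion of variance can be made explicit. The following sketch assumes that the reported coefficients are fully standardized and that the Modern Racism Scale has no predictors other than the two factors. The function names are hypothetical, and the formula for two correlated factors is the standard expression for the variance of a weighted sum, not output from the model itself.

```python
def variance_from_one_factor(loading: float) -> float:
    """Share of an indicator's variance explained by a single factor."""
    return loading ** 2


def variance_from_two_factors(loading_1: float, loading_2: float,
                              factor_corr: float) -> float:
    """Share of variance explained by two correlated factors.

    Assumes standardized loadings; the last term reflects factor overlap.
    """
    return loading_1 ** 2 + loading_2 ** 2 + 2 * loading_1 * loading_2 * factor_corr


# Direct preference rating: loading of .6 on the pro-White factor.
print(f"r_att: {variance_from_one_factor(0.6):.2f}")

# Modern Racism Scale: .444 on racism, .329 on Republican, factors correlate .416.
print(f"mrs:   {variance_from_two_factors(0.444, 0.329, 0.416):.2f}")
```

A loading of .6 corresponds to 36% explained variance, which is the basis for the statement that only about a third of the variance in the preference ratings reflects racism.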

It is important to note that these results cannot be used to identify individuals or to claim that a particular Republican is a racist. The results do show, however, that people who vote Republican are more likely to score higher on a broad range of racism measures whether they mention a political agenda or not.


Critics of racism research by social psychologists have argued that the research is biased because many social psychologists are liberal. The accusation is that social psychologists have created biased measures that conflate liberal policies with bigotry. Here I have shown that these critics have a valid point and that high scores on scales like the symbolic racism scale and the Modern Racism Scale are influenced by attitudes towards egalitarian policies. However, I have also shown that Republicans are more racist when racism is measured with a broad range of measures that have only racism as a common element.

Conservatives may be displeased by this finding, but recent events in 2020 have made it rather obvious that some Americans are openly racist and that these Americans are also openly supporting Trump. The real question for Republicans who oppose racism is how they can get rid of racism in their party.

Racism decreased in the US, but not for Conservative Republicans

Are Republicans more racist? I am not the only one asking this question. In 2014, Nate Silver and Allison McCann examined this question using data from the General Social Survey (538.2014). The political landscape has changed since Trump was elected in 2016 and it is interesting to reexamine this question with the latest data.

Like Silver and McCann, I limit the analysis to White respondents because Black voters are much more likely to support the Democratic party. Thus, a trivial reason why Republicans are more likely to be racist is that they are more likely to be White. The more interesting question is whether there is an additional difference between White Republicans and White Democrats.

Political Orientation

The General Social Survey (GSS) asks two questions about political orientation. One question asks about party identification. Responses range from strong democrat to strong republican. To identify conservative Republicans, I classified respondents who responded with “independent – close to Republican”, “not very strong Republican”, and “strong republican” as Republicans.

Another question asks respondents to locate their political orientation on a liberal versus conservative dimension on a 7-point scale ranging from extremely liberal to extremely conservative. To identify conservatives, I combined the three response categories slightly conservative, conservative, and extremely conservative.

Figure 1 shows that responses to these two questions are not perfectly aligned. There are conservative Democrats and liberal Republicans. There are several ways to deal with this inconsistency. I chose to contrast conservative Republicans with the other three groups (liberal Republicans, conservative Democrats, & liberal Democrats).
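The grouping rule can be written down as a small function. This is a sketch under assumptions: I use the response labels quoted above for party identification and assume that the ideology item is coded from 1 (extremely liberal) to 7 (extremely conservative). The actual GSS variable names and codes may differ, and the function name is hypothetical.

```python
# Response labels quoted in the text, lower-cased with a plain hyphen.
REPUBLICAN_LABELS = {
    "independent - close to republican",
    "not very strong republican",
    "strong republican",
}

# Assumption: 1 = extremely liberal ... 7 = extremely conservative.
CONSERVATIVE_CODES = {5, 6, 7}


def is_conservative_republican(party_id: str, ideology: int) -> bool:
    """True only if a respondent is both Republican and conservative."""
    # Normalize case, whitespace, and the en dash used in the quoted label.
    normalized = party_id.strip().lower().replace("\u2013", "-")
    return normalized in REPUBLICAN_LABELS and ideology in CONSERVATIVE_CODES


# Hypothetical respondents, one for each of the four groups.
print(is_conservative_republican("Strong Republican", 6))      # conservative Republican
print(is_conservative_republican("Strong Republican", 2))      # liberal Republican
print(is_conservative_republican("Strong Democrat", 6))        # conservative Democrat
print(is_conservative_republican("Strong Democrat", 2))        # liberal Democrat
```

Only the first respondent falls into the target group; the other three combinations are pooled into the comparison group.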

The GSS also asked about voting behavior in past elections. In the 2018 survey, most conservative Republicans voted for Trump.


Racism is a loaded word with no clear meaning. While the majority of Democrats consider Donald Trump to be a racist, most Republicans disagree. Yet, we need a term to describe people who do not see or treat Black and White people as equal. Given the history of the United States, it is difficult for a White American to live up to the ideal of racial equality. However, there is variation in the degree to which White Americans treat Black and White people as equal or not. Thus, the question is not whether somebody is a racist or not, but the degree to which Whites treat people equally or unequally.

The GSS includes several questions that measure racism. Some of these questions were discontinued and others were added in later years. I found five questions that were asked in later years that measure racism.

1. Feeling Thermometer.
Feeling thermometers are a widely used measure of racism. One question is about feelings towards Black people. Another question is about feelings towards White people. The difference between these two ratings indicates how much people have more positive feelings towards White people than towards Black people. Feelings are assessed on a 9-point scale ranging from 1 = not at all close to 9 = very close. The difference score can range from -8 to + 8.

The Figure shows that most respondents indicate that they make no difference in their feelings based on race, while some participants report feeling closer to White people.

2. Interracial Marriages
The United States has a long history of racism that is reflected in laws against interracial marriages (Loving v. Virginia). Even today, some Americans do not approve of interracial marriages. The GSS asks respondents “What about having a close relative marry a Black person? Would you be very in favor of it happening, somewhat in favor, neither in favor nor opposed to it happening, somewhat opposed, or very opposed to it happening?” The Figure shows the distribution of the responses before and after 2016. The results show a decrease in opposition to mixed marriages, but even today many Americans are against somebody in their family marrying an African American.

3. Like other Immigrants
The third item asks respondents how much they agree with an item that ignores the special history of slavery, discrimination, and police brutality that African Americans are subjected to. “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without special favors.” Responses are made on a 5-point scale ranging from 1=strongly disagree to 5=strongly agree. The Figure shows that even today a majority of White Americans agree with this statement.

4. Opposing Affirmative Action
One question in the GSS asks about respondents’ attitudes towards affirmative action.

“Some people say that because of past discrimination, blacks should be given preference in hiring and promotion. Others say that such preference in hiring and promotion of blacks is wrong because it discriminates against whites. What about your opinion — are you for or against preferential hiring and promotion of blacks?” Responses are favor strongly, favor, oppose, and oppose strongly.

It is important to point out that opposition to affirmative action may be based on considerations other than racism. Nevertheless, the item can be used to measure racism because racists are much more likely to oppose affirmative action than respondents who strive for racial equality.

The advantage of this item for measuring racism is that it provides an opportunity for plausible deniability. It is possible to endorse the item without revealing that the response is motivated by racism. This may explain the high proportion of Americans who express strong opposition to affirmative action.

5. Spending on Improving the conditions of Blacks
The fifth question is part of a series of questions about public spending. One question asks whether respondents think the government is spending too little or too much on “improving the conditions of Blacks.” As with the previous item, respondents may answer “too much” for several reasons. For example, some may oppose taxation and government spending on anything. Thus, responding “too much” does not directly imply that the response is racially motivated, but individuals who are more racist are more likely to answer “too much.”

A Measurement Model of Racism

Responses to each of the five items can be influenced by many factors (causes) other than racism. Thus, we cannot simply look at individual items to determine how racist somebody is. Psychologists have developed measurement models to examine whether responses to items are driven by a common cause. Finding evidence for a common cause does not conclusively show that the cause is racism, but it makes it more likely.

The Figure shows the measurement model of racism. In this model racism is an attribute (trait) that varies across individuals. Individuals with high racism are more likely to respond to the five items in ways that express a dislike of African Americans. The numbers next to the arrows show how much a specific item reflects racism. Interestingly, the feeling thermometer is a relatively weak measure of racism. One concern could be that the racism factor is dominated by issues related to money, but the opposition to a black family member shows that racism is not limited to equity issues.

Another concern is that responses to a survey are different from actual traits. Some respondents may downplay their racism because they are aware that racism is offensive. The GSS makes it possible to test this hypothesis because respondents were sometimes interviewed by Black interviewers. The next figure shows that White respondents respond to the question in a less racist manner when they are interviewed by a black interviewer. This effect is general across all five items, but especially strong for the feeling thermometer and the spending question.

In short, measuring racism is difficult and no single question provides a clear and unambiguous measure of racism. However, there is clear evidence that people respond to the five questions in a consistent manner that is more or less racist, suggesting that the factor underlying their responses can be used as a measure of racism.

Political Orientation and Racism

The next analysis examined differences between conservative republicans and other Americans. The results show that conservative republicans are more racist (.769 standard deviations). In addition, they oppose affirmative action and spending on Black issues. This shows that politics also influences responses to the items independent of racism, but even when we allow for such extra relationships, there is a strong relationship between identifying as a conservative republican and racism. In psychology a difference of .8 standard deviations is considered a large effect. In intelligence testing, a standard deviation is 15 points, so a difference of .769 standard deviations would correspond to a difference of about 12 IQ points.
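The conversion from standard deviations to IQ points is simple arithmetic. Here is a minimal sketch in Python; the helper name `sd_to_iq_points` is my own label for illustration, and the only inputs are the numbers quoted above.

```python
IQ_SD = 15  # one standard deviation on the IQ scale, as stated in the text


def sd_to_iq_points(effect_in_sd: float) -> float:
    """Express a standardized difference on the IQ metric (hypothetical helper)."""
    return effect_in_sd * IQ_SD


difference = sd_to_iq_points(0.769)
# Prints the value to one decimal and rounded to whole IQ points.
print(round(difference, 1), round(difference))
```

The analogy only translates the size of the gap into a more familiar metric; it says nothing about intelligence itself.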

Political Orientation and Racism Over Time

The first figures showed that racism was lower in 2016 and 2018 than in the years before. To examine whether changes in racism differ as a function of political orientation, I added time to the model and examined different time trends for conservative republicans and others. I used presidents to measure time, using the 1990s as the reference period, and counted 2016 as a Trump year, although he was only elected at the end of that year, in order to have two years of observations (2016, 2018). The results of the model showed a significant interaction between political orientation and time. While racism levels remained nearly unchanged for conservative republicans, they decreased for other Americans.

The figure shows a dramatic difference in racism during the Trump years. The difference of 1.4 standard deviations corresponds to a difference of 21 points in IQ testing. Interestingly, the difference is not due to increasing racism among conservative republicans, but rather to decreasing racism among other Americans.

It is tempting to suggest that this large difference in racism explains the much stronger response to George Floyd’s murder by a police officer, while similar events in previous years did not spark the widespread protests the US has seen in the last weeks. The data suggest that America as a whole is moving towards more racial equality and that many White Americans empathize with African Americans. However, conservative republicans are an exception. Many of them are opposed to racial equality.

Complicit Racism

Psychological research over the past two decades has focused a lot on implicit racism. The idea is that individuals can hide their racism from themselves. The notion of unconscious bias or racism is controversial and there is no solid scientific evidence for implicit racism. Most people know whether they would welcome a Black family member or not. A more common form of racism is a conscious choice to ignore the racism of others, whether it is a colleague, a friend, a family member or a party member. You know they are racist, but you don’t say anything about it. Most people, including conservative republicans, are not surprised by the finding that conservative republicans are more likely to be racist. We all know examples of racist conservative republicans and the Republican party carefully avoids alienating racist voters. The current Republican party suffers from complicit racism. Some party strategists even admit that complicit racism was used to win majorities. However, most Americans are increasingly appalled by the complicit racism that is inconsistent with modern American values. Just like the civil war and the civil rights movement, America today is still struggling with the “birth defect of slavery” (Condoleezza Rice).

Political Science and Political Bias

It is not hard to figure out my personal political views and I have no reason to hide them. I am not a fan of conservative republicans and their current leader. This may have influenced the way I analyzed and presented the results. Fortunately, the data are openly available and conservatives are welcome to analyze the data and present their findings. In this spirit, I consider this blog post as a thesis that is awaiting a critical examination and response. However, is it really in doubt that respondents who voted for Trump are more likely to oppose affirmative action and to object to a Black family member? Many probably think that these results merely show the obvious.

Covid-19 responses: USA versus Europe

The Economist published an article about the response to Covid-19 in the USA titled “America’s covid-19 experience is tragic but not that exceptional.” The article compares the United States to Europe, which are both doing more poorly than Asia, Africa, and at this moment also South America, although the situation in several Latin American countries is worsening.

The article lists several factors that might contribute to a worse outcome in the United States:
– a profit-oriented health care system
– many people without insurance who may delay treatment
– more people with medical conditions like diabetes

Other factors may work in the US’s favor.
– a younger population
– lower population density

The main reason why it is interesting to compare the US and Europe is the US president Donald Trump who has failed to organize a federal response to the Covid-19 crisis in the United States. Many believe that his idiotic response to the pandemic may have made things worse, especially among his supporters.

However, the Economist claims that so far the US is doing better than Europe.

In typical style, The Economist then finds additional factors that may distort this comparison. Importantly, European countries like Italy were hit by Covid-19 earlier than the United States. In fact, some virological studies suggest that the virus mutated and became more infectious in Europe and that most infections in the United States came from Europe. So, it is possible that Europe was affected worse in the beginning, but responded better to the crisis than the United States, which is still seeing over 20,000 new infections and over 1,000 deaths a day in late May.

To examine the time course of the Covid-19 epidemics in Europe and the United States, I downloaded the US data from Covid-Tracking and the data for European countries from Our World In Data. To compare the response to Covid-19, I computed deaths per 10 million inhabitants for 49 US states (excluding the island state Hawaii) and 35 European countries: Albania, Austria, Belarus, Belgium, Bulgaria, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Latvia, Lithuania, Moldova, Montenegro, Netherlands, Norway, Poland, Portugal, Romania, Russia, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, and the UK.

I then averaged these values. These averages are not weighted by population. Thus, small US states like Wyoming and small European countries like Estonia are weighted equally to big US states like Texas and big European countries like the UK, respectively.
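To make the difference between unweighted and population-weighted averages concrete, here is a small Python sketch. The two regions and their numbers are invented for illustration and are not taken from the Covid-Tracking or Our World In Data files; the function name is also my own.

```python
from statistics import mean

# Hypothetical illustration values, NOT the actual data used in this post.
regions = {
    "Small region": {"deaths": 10, "population": 500_000},
    "Large region": {"deaths": 3_000, "population": 30_000_000},
}


def deaths_per_10_million(deaths: int, population: int) -> float:
    """Scale a death count to a rate per 10 million inhabitants."""
    return deaths / population * 10_000_000


rates = {name: deaths_per_10_million(**row) for name, row in regions.items()}

# Unweighted average: every region counts equally, regardless of population.
unweighted = mean(rates.values())

# Population-weighted alternative, shown only for contrast.
total_deaths = sum(row["deaths"] for row in regions.values())
total_population = sum(row["population"] for row in regions.values())
weighted = deaths_per_10_million(total_deaths, total_population)

print(rates)
print(unweighted, weighted)
```

In this toy example the small region pulls the unweighted average well below the weighted one, which is why the choice of averaging matters when states and countries differ so much in size.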

The results show, as expected, that the epidemic in the US lagged the epidemic in Europe. However, around April 20 the US had caught up with Europe and since then the US has recorded more deaths than Europe. Since April 20, average deaths per 10 million have nearly tripled from around 800 deaths to around 2,400 deaths. In Europe, deaths doubled from around 800 to 1,600. The curves also show a much flatter trajectory in Europe than in the US. In Europe most countries have controlled the spread of the virus, while several US states are still showing increasing numbers.

These results contradict the claim in The Economist that the US has performed equally well to Europe. Of course, the results vary dramatically within the US and across European countries, but as a whole the US has performed more poorly than Europe. The poor performance is especially noteworthy after the first wave surprised Europe and North America.

A major factor that contributed to the disaster in New York was the lack of testing in the early weeks of March, which resulted in widespread undetected community transmission of the virus. However, New York state quickly ramped up testing and deaths are now decreasing. Other US states had more time to prepare, but are seeing increasing rather than decreasing numbers. Thus, especially for the recent weeks in May it is clear that the US is not handling the crisis as well as Europe.

How much of this poor performance can be attributed to Trump is a political question that may be difficult to answer scientifically. However, even if he doesn’t personally take responsibility, one can only wonder how much better the US response would have been with a president Obama.

Children and Covid-19: Evidence from New York City

Despite tremendous efforts by scientists to aid in the fight against Covid-19, many important questions lack a clear scientific answer. It is known that children can be infected by the virus and in some rare cases the virus can produce severe symptoms or even death. However, it is unclear whether children have a lower risk of getting infected and whether they are less likely to infect others. On May 7, Nature wrote “scientists are still trying to understand what the deal is with kids and COVID-19”.

A major problem is that most countries responded to the Covid-19 pandemic by closing schools and minimizing children’s contact. Thus, a lower number of children among people testing positive for Covid-19 might simply reflect less exposure to the virus. A notable exception is Sweden, where schools were not closed. Unfortunately, Sweden did not test children, teachers, or parents to examine whether children transmitted the virus (Science, May 22).

A German study of viral load in children and adults suggested that children are no different from adults, but this study has been criticized on methodological grounds (Science Media Centre). A key problem with this claim is that it is impossible to prove the lack of a difference. It is only possible to quantify the amount of a difference and notice that the observed difference is not statistically different from zero. The key problem with this study is that the sample size is small and there is ample statistical uncertainty. Thus, no firm conclusions can be drawn from this study alone.

My colleagues (Shigehiro Oishi, Youngjae Cha, and Bansi Javiya) and I have been analyzing the open data about Covid-19 deaths and cases in New York City and used the US census data to predict variation in positive cases and deaths across New York City. While we are still working on this project, we would like to share some interesting results about children and Covid-19 that emerged in our analyses. Before we do so, we want to make clear that ZIP code data are not the best data to examine this question, that our results are preliminary, that even if our results hold up they do not provide conclusive evidence, and that the results cannot and should not be used to make policy recommendations about opening schools or not. The main purpose of this blog post is to share information with scientists who are interested in this question and to add a tiny piece of information to the big puzzle.

Occupants and Covid-19

While the role of children in the transmission of Covid-19 is still unclear, the evidence for transmission at home is much stronger. It makes logical sense that the infection rate is greater if more people share the same living space. In addition, crowded living conditions have been linked to higher rates of Covid-19. For this reason, we looked at several predictors from the US census that reflect crowding. The best predictor was the percentage of residences with more than 1 occupant per room.

We used several measures of Covid-19 prevalence. All of them showed a positive correlation with occupants, but the correlations were stronger for the positive rate (positives / tests) than for the positives per capita (positives / population * 100,000). One possible explanation for this is that testing varies as a function of other factors. It made little difference whether we used the raw numbers or residuals that controlled for differences between Boroughs, so we used the raw scores. The correlation with the positive rate was r = .54 and the correlation with deaths was r = .20.
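The two prevalence measures and the correlation with crowding can be sketched in a few lines of Python. The four rows below are invented for illustration and are not the NYC ZIP code data; the field names (`crowded_pct` and so on) are my own labels.

```python
from math import sqrt

# Hypothetical ZIP-code rows for illustration only; NOT the NYC data.
zip_rows = [
    {"positives": 300, "tests": 1000, "population": 40_000, "crowded_pct": 4.0},
    {"positives": 450, "tests": 1100, "population": 50_000, "crowded_pct": 9.0},
    {"positives": 700, "tests": 1400, "population": 60_000, "crowded_pct": 15.0},
    {"positives": 200, "tests": 900, "population": 35_000, "crowded_pct": 2.0},
]


def positive_rate(row):
    """Share of tests that came back positive (positives / tests)."""
    return row["positives"] / row["tests"]


def positives_per_100k(row):
    """Positives per 100,000 residents (positives / population * 100,000)."""
    return row["positives"] / row["population"] * 100_000


def pearson_r(x, y):
    """Plain Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


crowding = [row["crowded_pct"] for row in zip_rows]
print(pearson_r(crowding, [positive_rate(row) for row in zip_rows]))
print(pearson_r(crowding, [positives_per_100k(row) for row in zip_rows]))
```

The positive rate divides by the number of tests, so it is less sensitive to how much testing was done in a ZIP code than the per capita measure.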

This simple correlation does not prove causality. There are a host of other variables that may explain this relationship. We tried to address this issue by including other variables in a causal model.

Children and Covid-19

The number of children living in an area defined by a NYC ZIP code was also a predictor of positive cases, r = .53, and deaths, r = .39. It is important to realize that this is an analysis of ZIP codes and not of individuals. It is well-known that children are at a much lower risk of dying from Covid-19. Thus, there is actually a negative link for individuals, and the positive correlation must reflect some other causal mechanism. One possibility (out of many) would be that children can infect older people in the same household who are at a high risk of dying when they get infected. This ‘theory’ implies transmission of the virus from young children to old people who live together.

When we fitted this model to the data, it indeed showed a causal path from children to residents to positive cases to deaths. This path implied that for every 10 percentage point increase in the proportion of children, an additional 2 people per 100,000 would die. The average is 17 deaths per 100,000 inhabitants. So, an increase by 2 people is a 12% increase.

The model also shows that there is still an unexplained positive relationship between children and risk of infection. This path would contribute another 4 deaths per 100,000.

These results show that ZIP codes with more children have more deaths, and that this relationship is partially explained by children adding to the number of people in a residence. However, once more these results have to be interpreted with caution because important predictor variables are missing from the model.


One potential confounding factor is income. Several analyses of the NYC data have shown that Covid-19 is more prevalent and deadly in ZIP codes with lower income. Thus, we added income to the model.

ZIP code income predicted that ZIP codes had fewer children, fewer occupants, a lower positive rate, and fewer deaths. Although the direct effect on deaths was not statistically reliable, income had a clear indirect effect on deaths by lowering the risk of getting infected. These results show that the effect of children in the previous model was inflated by ignoring the confounding effect of income. In this model, the effect of children on death was 2.1 deaths for every 10 percentage point increase in the proportion of children and 0.7 was explained by the effect on residents and 1.4 was still a direct effect.
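The split of an effect into a part that runs through occupants and a part that remains direct can be illustrated with a much simpler model than the one we fitted. The sketch below uses ordinary least squares with a single mediator and invented numbers; it is not our structural equation model and not the NYC data, and all variable and function names are hypothetical.

```python
def cross_product(x, y):
    """Sum of centered cross-products of two equally long lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


def decompose_effect(x, m, y):
    """Split the total OLS effect of x on y into direct and indirect (via m) parts."""
    s_xx, s_mm = cross_product(x, x), cross_product(m, m)
    s_xm, s_xy, s_my = cross_product(x, m), cross_product(x, y), cross_product(m, y)
    det = s_xx * s_mm - s_xm ** 2
    total = s_xy / s_xx                          # slope of y on x alone
    a = s_xm / s_xx                              # slope of m on x
    direct = (s_mm * s_xy - s_xm * s_my) / det   # slope of x in y ~ x + m
    b = (s_xx * s_my - s_xm * s_xy) / det        # slope of m in y ~ x + m
    return {"total": total, "direct": direct, "indirect": a * b}


# Hypothetical ZIP-code values for illustration only.
children_pct = [10, 15, 20, 25, 30, 18]
crowded_pct = [3, 5, 9, 10, 14, 6]
deaths_per_100k = [12, 15, 18, 22, 25, 16]

parts = decompose_effect(children_pct, crowded_pct, deaths_per_100k)
print(parts)
```

In this simple setup the direct and indirect parts add up to the total effect, which is the same logic behind reporting 0.7 and 1.4 as the two components of 2.1.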


Numerous articles found ethnic disparities in Covid-19 deaths. Thus, we added ethnicity as a predictor variable to the model. We used the percentage of White residents as the comparison group and the percentages of Asian, Black, and Hispanic residents as predictors. It is difficult to visualize the complex relationships of this model. Thus, we merely report the key finding about children and Covid-19 here.

Including ethnicity as a predictor further reduced the ‘effect’ of children on Covid-19 deaths to 1.8 deaths (0.3 indirect via occupants and 1.5 direct on the positive rate).


Just like there can be confounding factors that inflate relationships, some confounding variables can suppress a relationship. We found that this was the case for our measure of the percentage of residents over 65. As expected, ZIP codes with a higher proportion of older residents had more Covid-19 deaths. We also found a negative relationship between the proportion of older residents and occupants, r = -.35. Thus, the fact that high-occupancy ZIP codes tend to be younger reduced the effect of occupants on deaths. In this model, children increased deaths by 2.6 deaths per 100,000. This relationship is highly statistically significant and very unlikely to be just a random fluke. However, it is still possible that other variables that are missing from the model explain this relationship. The multiple pathways are weaker, and it is difficult to say how much they contribute to the relationship.


The key finding of our analyses of the NYC Covid-19 data is that ZIP codes with more children have more Covid-19 as reflected in a higher positive rate (positives / tests) and deaths. This relationship remains after controlling for income, ethnicity, and the proportion of senior residents. The final model suggests that some of the effect is explained by crowded living conditions. These results suggest that children could be transmitting the virus as much as other occupants. However, many other explanations are possible.

The most important limitation of our work is that it relies on ZIP codes, while the actual causal process is person to person transmission. We think that it would be valuable to follow up on this work with studies that examine the social networks of NYC residents who contracted the virus and who did not. Anti-body tests would be particularly useful to examine the spreading of Covid-19 within households. Relevant data may already exist from contact-tracing of infected individuals. We believe that tracing infections and deaths in NYC provides useful information about children’s risk to contract and spread the virus.

Please feel free to contact us with related information or questions.

Why Are Red States “Immune” to Covid-19?

Joey loves crowds. He is boisterous, speaks with a loud and booming voice, and is always ready to high-five everybody. No, I am not describing a super-spreader of Covid-19. It is a textbook description or caricature of an extrovert, or as personality psychologists say extravert.

Personality psychologists have studied extraversion and introversion for nearly one hundred years, although most of the research emerged in the past 40 years. We know that extraversion is a heritable trait that runs in families. We know that it remains fairly stable throughout adulthood, and we know that it influences behavior. There also has been research on regional variation in extraversion across the world and across US states (Elleman, Condon, Russin, & Revelle, 2018). I used their data to create the map of extraversion for US states. The map shows the highest level of extraversion in Illinois and the lowest level of extraversion in Wyoming followed by Idaho and Utah. While Illinois has fairly high rates of Covid-19, especially in Chicagoland, Wyoming and Idaho have relatively low levels of positive cases. They are also solid “red” states that voted for Trump in the 2016 election with 67% and 59%. It is therefore possible that extraversion partially explains why Covid-19 is more prevalent in “blue” (liberal) states. Residents in blue states may be more extraverted and may have a harder time following social distancing rules.

Of course, extraversion would only be one of several factors that play a role. Another obvious factor is that urban areas are more strongly affected by Covid-19 than rural areas, and rural voters are more likely to vote for Trump. There are many other possible differences among the US states that might play a role, but preliminary analysis suggests that they do not predict Covid-19 to a substantial degree. So, to keep things short, I will focus on the two factors that I found to be significant predictors of the spread of Covid-19: urbanization and extraversion.

To examine whether this relationship is stable over time, I used confirmed positive cases reported on the Covid-Tracking website and created indicators for three three-week periods: March 23 to April 12, April 13 to May 3, and May 4 to May 24. Predictor variables were (a) the percentage of votes for Trump in the 2016 election, (b) extraversion scores from the supplement to Elleman et al.’s article (Table 8), and (c) urbanization scores (Wikipedia).
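One plausible way to build such period indicators, assuming the source provides cumulative confirmed positives per day, is sketched below in Python. The function name, the period labels, and the toy series are my own illustration and not the actual Covid-Tracking data or processing.

```python
from datetime import date, timedelta

# The three three-week periods named in the text (inclusive dates).
PERIODS = {
    "T1": (date(2020, 3, 23), date(2020, 4, 12)),
    "T2": (date(2020, 4, 13), date(2020, 5, 3)),
    "T3": (date(2020, 5, 4), date(2020, 5, 24)),
}


def new_cases_in_period(cumulative, start, end):
    """New positives from start to end (inclusive), given cumulative counts by date."""
    day_before = start - timedelta(days=1)
    return cumulative[end] - cumulative[day_before]


# Hypothetical cumulative series for one state: 100 new cases every day.
first_day = date(2020, 3, 22)
cumulative = {first_day + timedelta(days=i): 100 * i for i in range(64)}

indicators = {
    label: new_cases_in_period(cumulative, start, end)
    for label, (start, end) in PERIODS.items()
}
print(indicators)
```

Each period covers exactly 21 days, so the three indicators are directly comparable across time.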

The data were analyzed using structural equation modeling to examine the relationship among the six variables. [I also examined more complex models that included deaths. The effects of the predictor variables on death were mostly mediated by confirmed positives, with the exception of a unique, negative relationship between Trump support and deaths at time 1 only.] Model fit was excellent, CFI = 1.00, RMSEA = .000. This does not mean that the model reveals the truth, but it does show that the model is consistent with the data. Thus, the model tells one possible story about the negative relationship between Trump support and Covid-19 deaths across the states.

The numbers show that urbanity is a much stronger negative predictor of Trump support than extraversion. The effect of extraversion is small, and not statistically significant by conventional standards, but there are only 49 states (I excluded the island state Hawaii), making it hard to get statistical significance. The effects of urbanization and extraversion on Covid-19 cases are more equal and explain a notable amount of the variation across states. The numbers also show that the effect is not getting weaker over time. It may actually become stronger. This means that both urbanization and extraversion are stable predictors of the Covid-19 pandemic in the USA. Even in the past three weeks, after several states with Republican governors eased restrictions, there is no evidence that cases are increasing notably in red states.

It is not difficult to find newspaper articles that talk about a second wave and spikes in new cases in Texas or other red states. These stories are based on the idea that red states are ignoring the danger of Covid-19, but so far this idea lacks empirical support. For every story of a pastor who died of Covid-19 after defying stay-at-home orders, there are thousands of churches that are holding services online, and hundreds of people flouting social-distancing norms in Central Park, NY. Don’t get me wrong. Trump’s disregard of science and ramblings about light therapy are a disgrace, but this does not mean that 40% of the US population follows the covidiot in the White House and drinks bleach. At least forty percent of US voters are likely to vote for him again. Don’t ask me how anybody can vote for him again. That is not the point of this blog post. The blog post is about the empirical fact that so far Covid-19 has predominantly hit states that did not vote for Trump. I suggest that this seemingly paradoxical finding is not paradoxical at all. Joey, the extraverted bachelor who lives in an apartment in New York City and voted for Hillary, is much more likely to get infected than Joyce, who lives with her family on a farm in Wyoming. Painting all Trump-voters as covidiots would be a mistake similar to Hillary Clinton calling Trump-supporters a “basket of deplorables.” If all Trump-supporters were covid-idiots, we should have seen a positive relationship between Trump-support and Covid-19 cases, especially after controlling for confounding variables like urbanization and extraversion. The fact that this positive relationship stubbornly refuses to emerge may suggest that Republican governors and residents in red states are not as stupid as their leader.

The Covid-Statistic Wars are Heating Up

After a general consensus or willingness to accept social distancing measures imposed by politicians (often referred to as lock-downs), societies are polarizing. Some citizens want to open stores, bars, and restaurants (and get a hair cut). Others want to keep social distancing measures in place. Some people on both sides are not interested in scientific arguments for or against their position. Others like to find scientific evidence that seemingly supports their viewpoint. This abuse of science is becoming more common in a polarized world. As a scientist, I am concerned about the weaponizing of science because it undermines the ability of science to inform decisions and to correct false beliefs. Psychological research has shown how easily we assimilate information that matches our beliefs and treat disconfirming evidence like a virus. These motivated biases in human reasoning are very powerful and even scientists themselves are not immune to these biases.

Some economists appear to be afflicted by a bias to focus on the economic consequences of lock-downs and to downplay the effects of the virus itself on human lives and the economy. The idea is that lock-downs did little to save lives while imposing immense costs on the economy. I am not denying the severe consequences of unemployment (I actually co-authored an article on unemployment and well-being), but I am shocked by a tweet claiming that social distancing laws are ineffective, which has been retweeted 3,500 times, and by blog posts that make similar claims accompanied by scatterplots that give the claims the appearance of scientific credibility.

There is nothing wrong with these graphs. I have examined the relationship between policies and Covid-19 deaths across US states and across countries, and I have also not found a significant correlation. The question is what this finding means. Does it imply that lock-down measures were unnecessary and have produced huge economic costs without any benefits? As some responses on Twitter indicated, interpreting correlational data is not easy because many confounding factors influence the correlation between two variables.

Social distancing is unnecessary if nobody is infected

Let’s go back in time and impose social distancing policies across the world in May 2019 randomly in some countries and not in others. We observe that nobody is dying of Covid-19 in countries with and without ‘lock-down’. In addition, countries with lock-down suffer high rates of unemployment. Clearly, locking countries down without a deadly virus spreading is not a good idea. Even in 2020 some countries were able to contain relatively small outbreaks and are now mostly Covid-free. This is more or less true of countries like Taiwan, Australia, and New Zealand. However, these countries impose severe restrictions on travel to ensure that no new infections are brought into the country. When I tried to book a flight from Toronto to Sydney, I was not able to do so. So, the entire country is pretty much in lock-down to ensure that people in Australia cannot be infected by visitors from countries that have the virus. Would economists argue that these country-wide lock-downs are unnecessary and only hurt the tourist industry?

The fact that Covid-19 spread unevenly across countries also creates a problem for the correlation between social-distancing policies and Covid-19 deaths across countries. The more countries are actively trying to stem the spread of the virus, the more severe social-distancing measures will be, while countries without the virus are able to relax social distancing measures. Not surprisingly, some of the most severe restrictions were imposed at the peak of the epidemics in Italy and Spain. This produces a positive correlation between severity of lock-downs and spread of Covid-19, which could be falsely interpreted as evidence that lock-downs even increase the spread of Covid-19. A simple correlation between lock-down measures and Covid-19 deaths across countries is simply unable to tell us something about the effects of lock-down measures on deaths within countries.

Social Distancing Effects are Invisible if there is no Variation in Social Distancing Across Countries

To examine the effectiveness of social-distancing measures, we need to consider timing. First, social distancing measures may be introduced in response to a pandemic. Later on, we might see that countries or US states that imposed more severe restrictions were able to slow down the spread of the virus more. However, now we encounter a new problem. Most countries and states responded to the declaration of Covid-19 as a pandemic by the WHO on March 11 with very similar policies (school closures). This makes it difficult to see the effects of social distancing measures because we have little variation in the predictor variable. We simply do not have a large group of countries with a Covid-19 epidemic that did nothing. This means, we lack a proper control group to see whether spread in these countries would be bigger than in countries with severe lock-downs. Even countries like the UK closed schools and bars in mid March.

Sweden is often used as the example of a country that did not impose severe restrictions on citizens and kept schools open. It is difficult to evaluate the outcome of this political decision. Proportional to the population, Sweden ranks number 6 in the world in terms of Covid-19 deaths, but what is a proper comparison standard? Italy and Spain had more severe restrictions and more deaths, but their epidemics started earlier than in Sweden. Other Nordic countries like Norway, Denmark, and Finland have much lower fatalities than Sweden. This suggests that social distancing is effective in reducing the spread, but we do not have enough data for rigorous statistical analysis.

Social Distancing Policies Explain Trajectories of Covid-19 spread in hot-spots.

One advantage of epidemics is that it is possible to foresee the future because exponential growth produces a very notable trajectory over time that is hard to miss in statistical analyses. If every individual infects two or three other people, the number of cases will grow exponentially until a fairly large proportion of the population is infected. This is not what happened in Covid-19 hot spots. Let’s examine New York as an example. In mid March, the number of detected cases and deaths increased exponentially, with numbers doubling every three days.
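To see how quickly numbers grow when they double every three days, here is a short Python sketch. The starting value of 100 cases is an arbitrary assumption for illustration, and the function names are my own.

```python
def cases_after(days: float, initial_cases: float, doubling_time_days: float = 3.0) -> float:
    """Projected cases under unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)


def daily_growth_rate(doubling_time_days: float = 3.0) -> float:
    """Day-over-day growth implied by a given doubling time."""
    return 2 ** (1 / doubling_time_days) - 1


# Hypothetical starting point of 100 cases.
for day in (0, 3, 15, 30):
    print(day, round(cases_after(day, 100)))

print(round(daily_growth_rate() * 100, 1), "percent growth per day")
```

Thirty days contain ten doublings, so unchecked growth multiplies the starting number by 2 to the power of 10, roughly a thousand-fold. A curve that bends away from this path is therefore easy to spot.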

The number of new cases peaked in the beginning of April and has been decreasing until now. One possible explanation for this pattern is that social-distancing policies that were mandated in mid-March were effective in slowing down the spread of the virus. Anybody who claims that lock-downs are ineffective needs to provide an alternative explanation for the trajectory of Covid-19 cases and deaths over time.

Once more, it is difficult to show empirically what would have happened without “lock-downs”. The reason is that even in countries that did not impose strict rules, people changed their behaviors. Once more, we can use Sweden as an example of a country without ‘lock-down’ laws. As in New York, we see that rapid exponential growth was slowed down. This did not happen while people were living their lives as they did in January 2020. It happened because many Swedes changed their behaviors.

The main conclusion is that the time period from March to May makes it very difficult to examine scientifically what measures were effective in preventing the spread of the virus and what measures were unnecessary. How much does wearing masks help? How many lives are saved by school closures? The honest answer to these important questions is that we do not know yet, because there was insufficient variation in the response to the pandemic across nations or across US states. Most of the variation in Covid-19 deaths is explained by the connectedness of countries or states to the world.

Easing Restrictions and Covid-19 Cases

The coming months provide a much better opportunity to examine the influence of social distancing policies on the pandemic. Unlike New Zealand and a few other countries, most countries do have community transmission of Covid-19. The United States provides a naturalistic experiment because (a) the country has a large population and therefore many new cases each day and (b) social distancing policies are made at the level of the 50 states.

Currently, there are still 20,000 new confirmed (!) positive cases per day in the United States. There are also still over 1,000 deaths per day.

There is also some variation across states in the speed and extent to which states ease restrictions on public life (NYT.05.20). Importantly, there is no state where residents are just going back to life as it was in January of 2020. Even states like Georgia that have been criticized for opening early are by no means back to business as usual.

So, the question remains whether there is sufficient variance in opening measures to see potential effects in case-numbers across states.

Another problem is that it is tricky to measure changes in case-numbers or deaths when states have different starting levels. For example, in the past week New York still recorded 41 deaths per 1 million inhabitants, while Nebraska recorded only 13 deaths per 1 million inhabitants. However, in terms of percentages, cumulative deaths in New York increased by only 3%, whereas the increase in Nebraska was 23%. While a strong ‘first wave’ accounts for the high absolute number in New York, it also accounts for the low percentage value. A better outcome measure may be whether weekly numbers are increasing or decreasing.
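
The arithmetic behind this comparison can be sketched in a few lines of Python. The function below is a hypothetical helper that back-calculates the cumulative death toll implied by the weekly count and the percentage increase quoted above. Because those percentages are rounded, the implied bases are rough approximations, not reported figures.

```python
# Back-of-the-envelope check using only the numbers quoted in the text.
# The implied bases are approximate because the percentages are rounded.

def implied_prior_cumulative(weekly_per_million: float, pct_increase: float) -> float:
    """Cumulative deaths per million before the week, implied by the weekly
    count and the percentage increase that this count represents."""
    return weekly_per_million / (pct_increase / 100)


# (weekly deaths per million, percentage increase in cumulative deaths)
states = {"New York": (41, 3), "Nebraska": (13, 23)}

for name, (weekly, pct) in states.items():
    base = implied_prior_cumulative(weekly, pct)
    print(f"{name}: {weekly} per million this week on a base of about {base:.0f} per million")
```

The same week therefore looks bad for New York in absolute terms and bad for Nebraska in relative terms, simply because New York's earlier base is more than twenty times larger.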

Figure 1 shows the increase in Covid-19 deaths in the past 7-days (May 14 – May 20) compared to the 7 days after some states officially eased restrictions (May 2 – May 8).

It is clearly visible that states that are still seeing high numbers of deaths are not easing restrictions (CT, NJ, MA, RI, PA, NY, DE, IL, MD, LA). It is more interesting to compare states that did not see a big first wave and that vary in their social distancing policies. I therefore limited the analysis to the remaining states.

States below the regression line are showing faster decreases than other states, whereas states above the regression line show slower decreases or increases. When the opening policies on May 1 (NYT) are used as predictors of deaths in the recent week with deaths two weeks before as a covariate, a positive relationship emerges, but it is not statistically significant. It is a statistical fallacy to infer from this finding that policies have no influence on the pandemic.
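
For readers who want to see what "predictor with a covariate" means in practice, here is a minimal Python sketch. It is not the analysis behind the figure: the data are made up, the variable names are assumptions, and the sketch returns only a point estimate without a significance test. It estimates the slope for an opening-policy score while holding earlier deaths constant by residualizing both variables on the covariate, which yields the same slope as a multiple regression.

```python
# Sketch with made-up numbers; not the data or software used for the figure.

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Part of y that is not linearly predicted by x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def partial_slope(predictor, covariate, outcome):
    """Slope of outcome on predictor with the covariate held constant."""
    return simple_ols(residuals(covariate, predictor), residuals(covariate, outcome))[1]


# Hypothetical states: deaths per million two weeks ago, an opening-policy
# score, and deaths per million in the recent week (constructed so that the
# policy effect is exactly 2 deaths per million per policy step).
earlier_deaths = [2, 4, 6, 8, 10, 12]
opening_score = [0, 1, 0, 2, 1, 2]
recent_deaths = [1 + 0.5 * e + 2 * o for e, o in zip(earlier_deaths, opening_score)]

print(partial_slope(opening_score, earlier_deaths, recent_deaths))
```

With real data the estimate comes with a confidence interval, and a wide interval that includes zero means the data are uninformative, not that the effect is zero.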

More important is the effect size, which is likely to be somewhere between -2 and +4 deaths per million. This may seem a small difference, but we have to keep in mind that there is little variation in the predictor variable. Remember, even in Georgia where restaurants are open, the number of diners is only 15% of the normal number. The hypothetical question is how much bigger the number of Covid-19 cases would be if restaurants were filled to capacity and all other activities were back to normal. It is unlikely that citizens of open states are willing to participate in this experiment. Thus, data alone simply cannot answer this question.


Empirical science relies on data and data analysis. However, data are necessary but not sufficient to turn a graph into science. Science also requires proper interpretation of the results and honest discussion of their limitations. It is true that New York has more Covid-19 deaths than South Dakota. It is also true that some states like South Dakota never imposed severe restrictions. This does not imply that stay-at-home orders in New York caused more Covid-19 deaths. Similarly, the lack of a correlation between Covid-19 policies and Covid-19 cases or deaths across US states does not imply that these policies have no effect. Another explanation is that there are no states that had many deaths and did not impose stay-at-home orders.

For this reason, experts have relied on models of epidemics to simulate what would have happened if New York City had not closed schools, bars, and night clubs. These simulations suggest that the death toll would have been even greater. The same simulations also suggest that many more lives could have been saved if New York City had been closed down just one week earlier (NPR). Models may sound less scientific than hard data, but data are useless and can be misleading when the necessary information is missing. The social-distancing measures that were imposed world-wide did reduce the death toll, but it is not clear which measures reduced it by how much. The coming months may provide some answers to these questions. S. Korea quickly closed bars after one super spreader infected 40 people in one night (businessinsider). What will happen in Oklahoma where bars and nightclubs are reopening?

Personally, I think the political conflict about lock-downs is unproductive. The energy may be better spent on learning from countries that have been successful in controlling Covid-19 and that are able to ease restrictions.

Reply to Vianello, Bar-Anan, Kurdi, Ratliff, and Cunningham

I published a critique of the Implicit Association Test. Using structural equation modeling of multi-method studies, I find low convergent validity among implicit measures of attitudes (prejudice, self-esteem, political orientation) and low discriminant validity between explicit and implicit measures. The latter finding is reflected in high correlations between factors that reflect the shared variance among explicit measures and the shared variance among implicit measures. Using factor loadings to quantify validity, I found that the controversial race IAT has at most 20% valid variance in capturing racial attitudes. Most if not all of this variance is shared with explicit measures. Thus, there is no evidence that IAT scores reflect a distinct form of implicit prejudice that may operate outside of conscious awareness.

This article elicited a commentary by Vianello and Bar-Anan (ref.) and by Kurdi, Ratliff, and Cunningham (pdf). Here is a draft of my response to their commentaries. As you will see, there is little common ground; even the term “validity” is not clearly defined, which makes any discussion about the validity of the IAT meaningless. To make progress as a science (or to become a science), psychologists need to have a common understanding of psychological measurement and methods that can be used to evaluate the validity of measures quantitatively.


Just like pre-publication peer-reviews, the two post-publication commentaries have remarkably little overlap. While Vianello and Bar-Anan (VBA) question my statistical analyses, Kurdi, Ratliff, and Cunningham (KRC) accept my statistical results, but argue that these results do not challenge the validity of the IAT.

VBA’s critique is clearer and therefore easier to refute by means of objective model comparisons. The key difference between VBA’s model and my model is the modelling of method variance. VBA’s model assumes that all implicit measures of different constructs are influenced by a single method factor. In contrast, my model assumes that implicit measures of prejudice (e.g., the standard race IAT and the Brief Implicit Association Test with the same racial stimuli) share additional method variance. As these hypotheses correspond to nested models, it is possible to test the competing models directly against each other. The results show that a model with content-specific method variance fits the data better (Schimmack, 2020a). The standard inference from a model comparison test is that the model with the worse fit is not an adequate model of the data, but VBA ignored the poorer fit of their model and presented a revised model that does not model method variance properly and therefore produces misleading results. Thus, VBA’s commentary is just another demonstration of the power of motivated reasoning that undermines the idealistic notion of a self-correcting science.

KRC ask whether my results imply that the IAT cannot be a valid measure of automatic cognition. To provide a meaningful answer to this question, it is important to define the terms valid, measure, automatic, and cognition. The main problem with KRC’s comment is that these terms remain undefined. Without precise definitions, it is impossible to make scientific progress. This is even true for the concept of validity, which has no clear meaning in psychological measurement (Schimmack, 2020c). KRC ignore that I clearly define validity as the correlation between IAT scores and a latent variable that represents the actual variation in constructs such as attitudes towards race, political parties, and the self. My main finding was that IAT scores have only modest validity (i.e., low correlations with the latent variable or low factor loadings) as measures of racial preferences, no validity as a measure of self-esteem, and no proven validity as measures of some implicit constructs that are distinct from attitudes that are reflected in self-report measures. Instead, KRC consistently mischaracterize my findings when they write that “the reanalyses reported by Schimmack find high correlations between relatively indirect (automatic) measures of mental content, as indexed by the IAT, and relatively direct (controlled) measures of mental content.” This statement is simply false and confuses correlations of measures with correlations of latent variables. The high correlations between latent factors that represent shared variance among explicit measures and implicit measures provide evidence of low discriminant validity, not evidence of high validity. Moreover, the modest loadings of the race IAT on the implicit race factor show low validity of the IAT as a measure of racial attitudes.

After mischaracterizing my results, KRC go on to claim that my results do “not cast any doubt on the ability of IATs to index attitudes or to do so in an automatic fashion” (p. 5).  However, the low convergent validity among implicit measures remains a problem for any claims that the IAT and other implicit measures measure a common construct with good validity. KRC simply ignore this key finding even though factor loadings provide objective and quantitative information about the construct validity of IAT scores.

The IAT is not the only research instrument with questionable construct validity. However, the IAT is unique because it became a popular measure of individual differences without critical evaluation of its psychometric properties. This is particularly problematic when people are given feedback with IATs on the Project Implicit website, especially for IATs that have demonstrably no validity like the self-esteem IAT. The developers of the IAT and KRC defend this practice by arguing that taking an IAT can be educational. “At this stage in its development, it is preferable to use the IAT mainly as an educational tool to develop awareness of implicit preferences and stereotypes.” However, it is not clear how a test with invalid results can be educational. How educational would it be to provide individuals with randomly generated feedback about their intelligence? If this sounds unethical, it is not clear why it is acceptable to provide individuals with misleading feedback about their racial attitudes or self-esteem. As a community, psychologists should take a closer look at the practice of providing online feedback with tests that have low validity because this practice may undermine trust in psychological science.

KRC’s commentary also fails to address important questions about the sources of stability and change in IAT scores over time. KRC suggest that “the jury is still out on whether variation in responding on the IAT mostly reflects individual differences or mostly reflects the effects of the situation” (p. 4). The reason why two decades of research have failed to answer this important question is that social cognition researchers focus on brief laboratory experiments that have little ecological validity and that are unable to demonstrate stability of individual differences over time. However, two longitudinal studies suggest that IAT scores measure stable attitudes rather than context-dependent automatic cognitions. Wil Cunningham, one of the commentators, provided the first evidence that variance in IAT scores reflects mostly random measurement error and stable trait variance, with no evidence of situation-specific state variance (Cunningham et al., 2001). Interestingly, KRC ignore the implications of this study. This year, an impressive study examined this question with repeated measures over a six-year period (Onyeador et al., 2020; Schimmack, 2020b). The results confirmed that even over this long time period, variance in IAT scores mostly reflects measurement error and a stable trait without notable variance due to changes in situations.

Another important topic that I could only mention briefly in my original article is incremental predictive validity. KRC mention Kurdi et al.’s (2019) meta-analysis as evidence that the IAT and self-report measures tap different constructs. They fail to mention that the conclusions of this meta-analysis are undermined by the lack of credible, high-powered studies that can demonstrate incremental predictive validity. To quote Kurdi et al.’s abstract, “most studies were vastly underpowered” (p. 569). The authors conducted tests of publication bias, but did not find evidence for it. The reason could be that they used tests that have low power to detect publication bias. Some studies included in the meta-analysis are likely to have reported inflated effect sizes due to selection for significance, especially costly fMRI studies with tiny sample sizes. For example, Phelps et al. (2000) report a correlation of r(12) = .58 between scores on the race IAT and differences in amygdala activation in response to Black and White faces. Even if we assume that 20% of the variance in the IAT is valid, the validity-corrected correlation would be r = 1.30 (.58 divided by the square root of .20), which exceeds the maximum possible correlation of 1. In other words, this correlation is implausible given the low validity of race IAT scores. The correlation is also much stronger than the predictive validity of the IAT in Kurdi et al.’s meta-analysis. The most plausible explanation for this result is that researchers’ degrees of freedom in fMRI studies inflated this correlation (Vul et al., 2009). Consistent with this argument, effect sizes in studies with larger sample sizes are much smaller and evidence of incremental predictive validity can be elusive, as in Greenwald et al.’s study of the 2008 election. At present, there is no pre-registered, high-powered study that provides clear evidence of incremental predictive validity. Thus, IAT proponents have failed to respond to Blanton et al.’s (2009) critique of the IAT.
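
The corrected correlation of r = 1.30 can be reproduced with a short Python sketch. The function name is hypothetical, and the calculation rests on the same simplifying assumption as the text: only the IAT is treated as an imperfect measure, while the amygdala measure is treated as perfectly valid.

```python
import math


def validity_corrected_r(observed_r: float, valid_variance: float) -> float:
    """Divide an observed correlation by the validity coefficient, that is,
    the square root of the proportion of valid variance in the measure."""
    return observed_r / math.sqrt(valid_variance)


# Phelps et al. (2000): r = .58; assumed valid variance in the race IAT: 20%.
print(round(validity_corrected_r(0.58, 0.20), 2))  # 1.3
```

Any assumed amount of valid variance below about 34% (.58 squared) pushes the corrected value above 1, which is why the observed correlation is hard to reconcile with the factor loadings reported above.
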
Responses to my renewed criticism suggest that IAT researchers are unable or unwilling to respond to valid scientific criticism of the IAT with active coping. Instead, they prefer to engage in emotion-focused, repressive coping that makes IAT researchers feel better without addressing substantive measurement problems.

In conclusion, my critique of the IAT literature and the response by IAT researchers show a wider problem in psychology that I have called the validation crisis (Schimmack, 2020c). Although measurement is at the core of any empirical science, many psychologists lack formal training in psychological measurement. As a result, they create and use measures of unknown validity. This is particularly true for social psychologists because social psychologists in the 1970s and 1980s actively rejected the idea that characteristics within individuals are important for the understanding of human behavior (“the power of the situation”). However, when the cognitive revolution started, the focus shifted from observable situations and behaviors to mental states and processes. Studying these phenomena, which are not directly observable, requires valid measures, just like telescopes need to be validated to observe planets in distant galaxies. The problem is that social cognition researchers developed methods like the IAT to make claims about cognitive processes that are not observable to outside observers or by means of introspection without taking the time to validate these measures. To make progress, the next generation of social psychologists needs to distinguish clearly between constructs and measures and between random and systematic measurement error. As all measures are contaminated by both sources of measurement error, constructs need to be measured with multiple, independent methods that show convergent validity (Campbell & Fiske, 1959; Cronbach & Meehl, 1955). Psychology also needs to move from empty qualitative statements like “the IAT can be valid” to empirically-based statements about the amount of validity of a specific IAT in specific populations in clearly defined situations. This requires a new program of research with larger samples, ecologically valid situations, and meaningful criterion variables.


Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. E. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94, 567–582. doi:10.1037/a0014665

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. doi:10.1037/h0040957

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001). Implicit attitude measures: Consistency, stability, and convergent validity. Psychological Science, 12(2), 163–170.

Onyeador, I. N., Wittlin, N. M., Burke, S. E., Dovidio, J. F., Perry, S. P., Hardeman, R. R., … van Ryn, M. (2020). The Value of Interracial Contact for Reducing Anti-Black Bias Among Non-Black Physicians: A Cognitive Habits and Growth Evaluation (CHANGE) Study Report. Psychological Science, 31(1), 18–30.

Schimmack, U. (2020a). Open Communication about the invalidity of the race IAT.

Schimmack, U. (2020b). Racial bias as a trait. (retrieved 4/21/20)

Schimmack, U. (2020c). The validation crisis. Meta-Psychology (blog)

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4(3), 274–290.

Covid-19 behaves like tourists

Many people are wondering about variation in the Covid-19 pandemic across countries. Why (the North of) Italy and not Portugal? How was South Korea able to contain the virus with little time to prepare, while other countries that had more time were not? The New York Times published a long article that examined this question, but nobody really knows.

Some of the speculations focus on biological factors that may protect individuals or may make them more vulnerable. However, so far these factors explain a small portion of the variation in death rates. The biggest predictor is the number of people who are infected by the virus. Australia and New Zealand have few deaths because Covid-19 did not spread widely among their populations.

One possible explanation could be the response of countries to the pandemic. Countries like the UK and Sweden may have more deaths because they did not lock down their countries. The problem with these speculations is that many factors are likely to contribute to the variation and it is difficult to spot these factors without statistical analyses.

The NYT article mentions that hundreds of studies are underway to look for predictors of variation across nations, but no results are being mentioned. Maybe researchers are cautious.

“Doctors who study infectious diseases around the world say they do not have enough data yet to get a full epidemiological picture, and that gaps in information in many countries make it dangerous to draw conclusions”

Drawing conclusions is different from exploring data. There is nothing dangerous about exploring patterns in data. Clearly many people are curious and statistical analysis can provide more valuable information than armchair speculations about climate or culture.

As a cross-cultural psychologist, I am familiar with many variables that distinguish nations from each other. The most prominent dimension is individualism. Western cultures tend to be more individualistic than Asian cultures. This might suggest that culture plays a role because Asian cultures have had fewer Covid-19 deaths. However, individualism as measured by Hofstede’s dimension is a weak predictor and did not survive statistical controls. Other dimensions that were less plausible also did not predict variation in Covid-19 deaths.

However, one variable that was a predictor was the number of tourists that travel to a country (tourism data).

Tourism reflects how connected a country is with the rest of the world. Australia and New Zealand are not only islands, they are also geographically isolated, which explains why relatively few people visit these otherwise attractive locations. Covid-19 has also spared much of Eastern Europe, and many Eastern European countries rank low on the tourism index.

Additional analyses show that tourism is becoming a weaker predictor over time. The reason is the recent rise of cases and deaths in Latin America. Latin America was relatively unaffected in April, but lately Ecuador and Brazil have seen alarming increases in cases.

The graph also shows that tourism does not explain all of the differences between countries. For example, the UK has far more cases than predicted by the regression line. This may reflect the slow response to the Covid-19 crisis in the UK. Sweden is also above the regression line, possibly due to the policy to keep schools and businesses open. Switzerland is a direct neighbor of the North of Italy, where the epidemic in Europe started. Canada is above the regression line, but was on the regression line on April 15. The reason is that Canada acted quickly in the beginning, but is now seeing a late increase in deaths in care homes.

In conclusion, these results suggest that timing is a big factor in the current differences across countries. Countries with high death tolls were simply unlucky to be at the center of the pandemic or well connected to it. As the pandemic progresses, this factor will become less important. Some countries that were hit early, like Austria and (the South of) Germany, have been able to contain the spread of Covid-19. In other countries, numbers are increasing, but no country is seeing increases as dramatic as in Italy (or New York), where Covid-19 spread before social distancing measures were in place. New factors may predict what will happen in the times of the “new normal” when countries are trying to come out of lock-downs.

I don’t think that publishing these results is dangerous. The results are what they are. It is just important to realize that they do not prove that tourism is the real causal factor. It is also possible that tourism is correlated with some other variables that reflect the real cause. To demonstrate this, we need to find measures of these causal factors and demonstrate that they predict variation in death tolls of nations better than tourism and statistically remove the relationship of tourism with Covid-19 deaths. So, this blog post should be seen as a piece of a puzzle rather than the ultimate answer to a riddle.
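
What it would mean to "statistically remove" the relationship can be illustrated with a partial correlation. The Python sketch below uses made-up numbers, not the tourism or death data from this post, and the third variable is purely hypothetical. The data are constructed so that tourism and deaths both depend on the hypothetical cause and are otherwise unrelated.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing what each shares with z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up values for six imaginary countries.
hypothetical_cause = [1, 2, 3, 4, 5, 6]
tourism = [2, 1, 3, 4, 4, 7]    # the cause plus unrelated noise
deaths = [3, 5, 4, 6, 11, 13]   # twice the cause plus unrelated noise

print(round(pearson_r(tourism, deaths), 2))                      # raw correlation
print(round(partial_r(tourism, deaths, hypothetical_cause), 2))  # after control
```

In this constructed example the raw correlation is strong, but it drops to zero once the hypothetical cause is controlled. If real data showed the same pattern for a measured variable, tourism would be a proxy rather than the cause.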