Category Archives: Uncategorized

Open Letter about Systemic Racism to the Editor of SPPS

Dear Margo Monteith,

it is very disappointing that you are not willing to retract an openly racist article that was published in your journal Social Psychological and Personality Science (SPPS) when Simine Varzire was editor of the journal and Lee Jussim was the action editor of the article in question (Cesario, Johnson, & Terrill, 2019). I have repeatedly pleaded with you to retract the article that draws conclusions on the basis of false assumptions. I am even more stunned by your decision because you rejected my commentary on this racist article with the justification that a better criticism was submitted. This criticism was just published (Ross et al., 2020). It makes the same observation that I made in my critique; that is, the conclusion that there is no racial bias in policing and the use of force rests entirely on an invalid assumption. The original authors simply assume that police officers only encounter violent criminals or that they only encounter violent criminals when they use deadly force.

Maybe you are not watching the news, but the Black Lives Matter movement started because police often use deadly force against non-violent African Americans. In some cases, this is even documented on video. Please watch the murder of Tamir Rice, George Floyd, Philando Castile, and Eric Garner and then tell their families and friends that police only kills violent criminals. That is what SPPS is telling everybody with the mantel of scientific truth, but is a blatantly false claim based on racists assumptions. So, why are you not retracting this offensive article?

Philando Castile:

Tamir Rice:

Eric Garner:

George Floyd:

So, why are you not retracting an article that makes an obviously false and offensive assumption? Do you think that a retraction would look badly on the reputation of your journal? In that case, you are mistaken. Research shows that journals that retract articles with false conclusions have higher impact factors and are more prestigious than journals that try to maintain a flawless image by avoiding retractions of bad science (Nature). So, your actions are not only offensive, but also hurt the reputation of SPPS and ultimately our science.

Your justification for not retracting the article is unconvincing.

Just how to analyze data such as this is debated, mostly in criminology journals. (One can wonder what psychology was present in Cesario et al.’s study that led to publication in SPPS, but that’s another matter.) Cesario et al. made the important point that benchmarking with population data is problematic. Their methodology was imperfect. Ross et al. made important improvements. If one is interested in this question of police bias with benchmarking, the papers bring successive advances. ”

Your response implies that you did not fully understand Ross et al.’s criticism of the offensive article. The whole approach of “benchmarking” is flawed. So, publishing an article that introduces a flawed statistical approach from criminology to psychology is dangerous. What if we would start using this approach to study other disparities? Ross et al. show that this would be extremely harmful to psychological science. It is important to retract an article that introduces this flawed statistical approach to psychologists. As an editor it is your responsibility to ensure that this does not happen.

It is particular shocking and beyond comprehension that you resist retraction at the very same time many universities and academics are keenly aware of the systemic racism in academia. This article about an issue that affects every African American was based on research funding to White academics, reviewed by White academics, approved by White academics, and now defended and not retracted by a White academic. How does your action promote diversity and inclusion? It is even more surprising that you seem to be blind to this systemic racism in the publication of this racist article given your research on prejudice and the funding you received to study these issues (CV). Can you at least acknowledge that it is very offensive to Black people to attribute their losses of lives entirely to violent crime?

Ulrich Schimmack

Systemic Racism at Michigan State University

This is how three professors at MSU talk about innocent Black people being killed by police (podcast transcript at 25minuts and 40seconds into the clip).

Their discussion of tragic deaths suggests that Black lives don’t matter to Joseph Cesario (MSU), Steve Hsu (MSU), and Corey Washington (MSU)

Here is what those rare events look like. I dear everybody to watch them and then reflect on the words of these privileged professors.

Philando Castile:

Tamir Rice:

Eric Garner:

George Floyd:

And yes, it doesn’t only happen to Black people, but contrary to the statistically flawed work by Cesario, Young Black unarmed men are more often the target of police brutality and the victims of lethal force errors (

See also:

When Right-Wing News Write About Race and Police

The right-wing magazine Quilette just published an article by John McWhorter, who is an associate professor in the linguistics department at Columbia University with the title “Racist Police Violence Reconsidered” Given his training in writing, he knows how to draw his readers in with an emotional story about a White victim of lethal use of force to make the point that police sometimes kill White people, too. This is followed by the statement that “plenty of evidence indicates, however, that racism is less important to understanding police behavior than is commonly supposed”.

In a scientific article, this would be the time to mention the scientific evidence that is supposed to support this claim. But McWhorter is no scientist. He is a writer and cannot be held to the scientific standards of criminologists and other social scientists. With one sentence, a fact has been created. The idea that police is racially biased and kills disproportionately African Americans is wrong. But why does everybody believe it to be true? McWhorter has a ready explanation for this. The biased liberal, reverse-racist media cover police brutality only when the officer is White and the victim is Black. “Had Tony Timpa been black, we would all likely know his name by now. Had George Floyd been white, his name would likely be a footnote, briefly reported in Minneapolis local news and quickly forgotten.”

Well trained in propaganda, McWhorter then presented more cases of White victims in equal numbers to Black people. For every Black victim, there is a White victim in his narrative that is based on his personal selection of cases. After creating the illusion that there is a White victim for every Black victim, he is ready to repeat his claim that we have been manipulated by the liberal media. “So, the perception that the police regularly kill black people under circumstances in which white people would be merely disciplined is in fact a misperception.”

But is it a misperception? That would require actual scientific information about the influence of race on lethal use of force by police officers in the US. This evidence is reviewed after the conclusion has already been offered that the common assumption of racial bias against African Americans is a misperception.

McWhorter next explains correctly that African Americans are a minority in the United States. If police were unbiased in the lethal use of force, we would expect a lot more victims to be White than Black. He then correctly states that ” it remains true that black people are killed at a rate disproportionate to their percentage of the population.”

So, it is NOT a misperception that police kill disproportionately more African Americans. There is racial disparity in the use of force. This invalidates the claim that we all believe that racial bias exists because we have been manipulated by the liberal media.

McWhorter then makes a distinction between racial disparity and racial bias. “However, these figures are not necessarily evidence of police racism. According to the Washington Post‘s database, over 95 percent of the people fatally shot by police officers in 2019 were male, and no serious-minded person argues that this is evidence of systemic misandry. So what, then, accounts for the disproportionate representation of black men among those killed by cops?”

This is a controversial topic that has been examined in numerous articles by social scientists in scientific journals. But McWhorter does not cite these studies, presumably because he lacks the training to understand the sometimes complicated statistical methods that have been used in these articles.

Like a novelist, he creates facts with the stroke of a pen. “The socioeconomic gap between blacks and whites is doubtless an important contributing factor.” and “This disparity in poverty rates means black people are also disproportionately presented in violent rates of violent crime” Here we go again. Police is not racially biased. The real reason why they kill more Black people is because Black people are more criminal. Blame the victim. To give this biased narrative some credibility, McWhorter cites only one scientific article that supports his story. “Contrary to his expectations, Harvard economist Roland Fryer has found that while white men are actually more likely to be killed by cops,” as if an economist is more credible than criminologists or other scientists because he is at Harvard. That is not how science works. You also have to cite evidence that contradicts your claims (Young unarmed nonsuicidal male victims of fatal use of force are 13 times more likely to be Black than White).

In the end McWhorter softens his stance a bit. “This disparity cannot explain every fatal police shooting,” “This is not to say that race has nothing to do with policing issues in America.”   But these sentences are mere rhetorical devices to signal that the author is balanced and reasonable, when the truth is that the author is ignorant about the science on racial bias in policing, including use of force.

I have no reason to believe that McWhorter wrote this terrible article because he is motivated by racism, but it is not clear to me why McWhorter wrote such a biased article that is so hurtful to many African Americans who are traumatized by the callus killing of innocent African Americans. All I can say is that McWhorter lacks the competence to write about this topic because he is either too lazy or not trained to follow the actual science on this topic. In Germany we say “Schuster blieb bei Deinen Leisten” (“Every man to his trade.”) Please follow this good advice, Dr. McWhorter.

Can We Measure Racism? Yes We Can

A famous quote states that something that cannot be measured does not exist. This is of course not true, but if we want to move from anecdotal evidence to scientific evidence and theories of racism, we need valid measures of racism.

Social psychology has a long history of developing measures of racism and today there are dozens of different measures of racism. Unfortunately, psychologists are better at developing new measures than at validating existing ones. This makes research on racism vulnerable to criticism that racism measures are invalid or biased (Feldman & Huddy, 2005; Zigerell, 2015).

Take the item “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without special favors” as an example. The item is one of several items that is used to measure a form of racism called symbolic racism.

Feldman and Huddy (2005) argue that items like this one have two components. A purely racist component where White people do not see Black people as equal citizens and a purely ideological component that opposes policies that favor any particular group, even if this group is disadvantaged by a history of racism. Whether the latter component is itself racist or not is not the topic of this blog post. My focus is rather on the separation of the two components. How can we separate agreement to the item that is based on racism from endorsement of the item for purely political reasons?

One solution to this problem is to see how endorsement of items with political content is related to items that have no political content. Using a statistical method called factor analysis it is then possible to separate the racial and the ideological component and to examine how much political orientation is related to the two components.

Indirect Measures of Racism

The problem with direct measures of racism is that open admission of racial bias has become less acceptable over time. This makes it harder to measure racism with items like “What about having a close relative marry a Black person? Would you be very in favor of it happening, somewhat in favor, neither in favor nor opposed to it happening, somewhat opposed, or very opposed to it happening?” Respondents may be unwilling to report their true feelings about this issue, especially when the interviewer is African American (Schaeffer, 1980; Schimmack, 2020).

Modern psychological testing with computers has made it possible to avoid these problems by measuring racism with computerized tasks that rely on participants’ behavior in response to racial stimuli. There are several tasks such as the evaluative priming task, the affective misattribution task and the popular Implicit Association Task (IAT). Unfortunately, the IAT has become known as a measure of implicit bias or implicit racism that is distinct from racism that can be measured with self-report measures. I have argued that there is no evidence that people can hide their feelings towards African Americans from themselves. It is more useful to see these tasks as alternative measures of racism that are less susceptible to fake responding. This does not mean that these tasks are perfect measures of racism because the use of computerized tasks creates new problems. Thus, there is no perfect measure of racism, but all valid measures of racism should be positively correlated with each other and the shared variance among these measures is likely to reflect variation in racism. The interesting question is whether political orientation is related to the shared variance among a variety of direct and indirect racism measures.


The data come from a study by Bar-Anan and Vianello (2018). The data have also been used in my critique of the IAT as a measure of implicit bias (Schimmack, 2020). The study assessed political orientation and racism with multiple measures. Political orientation was also measured with the standard and the brief IAT. In addition, participants reported whether they voted Republican or Democrat. Only White participants who reported voting were included in the analysis.

Racism was measured with the standard IAT, the brief IAT, the evaluative priming task, the Affective Missattribution Task, a direct rating of preference for White or Black people, and the Modern Racism Scale. Like other measures that have been criticized, the Modern Racism scale mixes racism and political ideology.

The interesting question is how much political orientation is related to the unique variance in the modern racism scale that is not shared with other racism measures and how much it is related to the shard variance with other racism measures.


The results show two clearly identified factors. The strong relationship between voting and the Republican factor (rep) shows that political orientation can be measured well with a direct question. In contrast, racism is more difficult to measure. The best measure in this study would be the direct preference rating (r_att) that is related .6 with the pro-White factor. But even this relationship implies that only about a third of the variance in the actual ratings reflects racism. The rest of the variance is measurement error. So, there is no gold-standard or perfect way to measure racism. There are only multiple imperfect ways. The results also show that the controversial Modern Racism Scale (mrs) reflects both racism (.444) and political orientation (.329). This shows that Republicans score high on Modern Racism in part because they reject social policies that favor minority groups independent of their attitudes towards Black Americans. However, the figure also shows that Republicans are more racist, as reflected in the relationship between the Republican and Racism factors (.416).

It is important that these results cannot be used to identify individuals or to claim that a particular Republican is a racist. The results do show however, that people who vote Republican are more likely to score higher on a broad range of racism measures whether they mention a political agenda or not.


Critics of racism research by social psychologists have argued that the research is biased because many social psychologists are liberal. The accusation is that social psychologists have created biased measures that conflate liberal policies with bigotry. Here I show that these critics have a valid point and that high scores on scales like the symbolic racism scale and the modern racism scale are influenced by attitudes towards egalitarian policies. However, I also showed that Republicans are more racist when racism is measured with a broad range of measures that have only racism as a common element.

Conservatives may be displeased by this finding, but recent events in 2020 have made it rather obvious that some Americans are openly racist and that these Americans are also openly supporting Trump. The real question for Republicans who oppose racism is how they can get rid of racism in their party.

Racism decreased in the US, but not for Conservative Republicans

Are Republicans more racist? I am not the only one who is asking themselves this question. In 2014, Nate Silver and Allison McCann examined this question using data from the General Social Survey (538.2014). The political landscape has changed since Trump was elected in 2016 and it is interesting to reexamine this question with the latest data.

Like Silver and McCann, I limit the analysis to White respondents because Black voters are much more likely to support the Democratic party. Thus, a trivial reason why Republicans are more likely to be racist is that they are more likely to be White. The more interesting question is whether there is an additional difference between White Republicans and White Democrats.

Political Orientation

The General Social Survey (GSS) asks two questions about political orientation. One question asks about party identification. Responses range from strong democrat to strong republican. To identify conservative Republicans, I classified respondents who responded with “independent – close to Republican”, “not very strong Republican”, and “strong republican” as Republicans.

Another question asks respondents to locate their political orientation on a liberal versus conservative dimension on a 7-point scale ranging from extremely liberal to extremely conservative. To identify conservatives, I combined the three response categories slightly conservative, conservative, and extremely conservative.

Figure 1 shows that responses to these two questions are not perfectly aligned. There are conservative Democrats and liberal Republicans. There are several ways to deal with this inconsistency. I chose to contrast conservative Republicans with the other three groups (liberal Republicans, conservative Democrats, & liberal Democrats).

The GSS also asked about voting behavior in past elections. In the 2018 survey, most conservative Republicans voted for Trump.


Racism is a loaded word with no clear meaning. While the majority of Democrats considers Donald Trump to be a racist, most Republicans disagree. Yet, we need a term to describe people who do not see or treat Black and White people as equal. Given the history of the United States, it is difficult for a White American to live up to the ideal of racial equality. However, there is variation in the degree to which White Americans treat Black and White people as equal or not. Thus, the question is not whether somebody is a racist or not, but the degree to which White’s treat people equally or unequally.

The GSS includes several questions that measure racism. Some of these questions were discontinued and others were added in later years. I found five questions that were asked in later years that measure racism.

1. Feeling Thermometer.
Feeling thermometers are a widely used measure of racism. One question is about feelings towards Black people. Another question is about feelings towards White people. The difference between these two ratings indicates how much people have more positive feelings towards White people than towards Black people. Feelings are assessed on a 9-point scale ranging from 1 = not at all close to 9 = very close. The difference score can range from -8 to + 8.

The Figure shows that most respondents indicate that they make no difference in their feelings based on race, while some participants report feeling closer to White people.

2. Interracial Marriages
The United States have a long history of racism that is reflected in laws against interracial marriages (Loving vs. Virginia). Even today, some Americans do not approve of interracial marriages. The GSS asks respondents “What about having a close relative marry a Black person? Would you be very in favor of it happening, somewhat in favor, neither in favor nor opposed to it happening, somewhat opposed, or very opposed to it happening?” The Figure shows the distribution of the responses before and after 2016. The results show a decrease in opposition to mixed marriages, but even today many US American are against somebody in their family marrying an African American.

3. Like other Immigrants
The third item asks respondents how much they agree with an item that ignores the special history of slavery, discrimination, and police brutality that African Americans are subjected to. “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without special favors.” Responses are made on a 5-point scale ranging from 1=strongly disagree to 5=strongly agree. The Figure shows that even today a majority of White Americans agree with this statement.

4. Opposing Affirmative Action
One question in the GSS asks about respondents’ attitudes towards affirmative actions.

“Some people say that because of past discrimination, blacks should be given preference in hiring and promotion. Others say that such preference in hiring and promotion of blacks is wrong because it discriminates against whites. What about your opinion — are you for or against preferential hiring and promotion of blacks?” Responses are favor strongly, favor, oppose, and oppose strongly.

It is important to point out that opposition to affirmative actions may be based on other consideration than racism. Nevertheless, the item can be used to measure racism because racists are much more likely to agree with the item than respondents who strive towards racial equality.

The advantage of this item for measuring racism is that it provides opportunity for plausible deniability. It is possible to endorse the item without revealing that the response is motivated by racism. This may explain the high proportion of Americans who express strong opposition to affirmative actions.

5. Spending on Improving the conditions of Blacks
The fifth question is part of a series of questions about public spending. One question asks whether respondents think the government is spending too little or too much on “improving the conditions of Blacks.” As with the previous item, respondents may answer “too much” for several reasons. For example, some may oppose taxation and government spending on anything. Thus, responding “too much” does not directly imply that the response is racially motivated, but individuals who are more racists are more likely to answer “too much.”

A Measurement Model of Racism

Responses to each of the five items can be influenced by many factors (causes) other than racism. Thus, we cannot simply look at individual items to determine how racist somebody is. Psychologists have developed measurement models to examine whether responses to items are driven by a common cause. Finding evidence for a common cause does not conclusively show that the cause is racism, but it makes it more likely.

The Figure shows the measurement model of racism. In this model racism is an attribute (trait) that varies across individuals. Individuals with high racism are more likely to respond to the five items in ways that express a dislike of African Americans. The numbers next to the arrows show how much a specific item reflects racism. Interestingly, the feeling thermometer is a relatively weak measure of racism. One concern could be that the racism factor is dominated by issues related to money, but the opposition to a black family member shows that that racism is not limited to equity issues.

Another concern is that responses to a survey are different from actual traits. Some respondents may downplay their racism because they are aware that racism is offensive. The GSS makes it possible to test this hypothesis because respondents were sometimes interviewed by Black interviewers. The next figure shows that White respondents respond to the question in a less racist manner when they are interviewed by a black interviewer. This effect is general across all five items, but especially strong for the feeling thermometer and the spending question.

In short, measuring racism is difficult and no single question provides a clear and unambiguous measure of racism. However, there is clear evidence that people respond to the five questions in a consistent manner that is more or less racist, suggesting that the factor underlying their responses can be used as a measure of racism.

Political Orientation and Racism

The next analysis examined differences between conservative republicans and other Americans. The results show that conservative republicans are more racist (.769 standard deviations). In addition, they oppose affirmative actions and spending on Black issues. This shows that politics also influence responses to the items independent of racism, but even when we allow for such extra relationships, there is a strong relationship between identifying as a conservative republican and racism. In psychology a difference of .8 standard deviations is considered a large effect. In intelligence testing, a standard deviation is 15, and a difference of .769 would be a difference by 12 IQ points.

Political Orientation and Racism Over Time

The first figures showed that racism was lower in 2016 and 2018 than in the years before. To examine whether changes in racism differ as a function of political orientation, I added time to the model and examined different time trends for conservative republicans and others. I used presidents to measure time using the 1990s as the reference period and counted 2016 as a Trump year, although he was only elected at the end of the year to have two years of observations (2016, 2018). The results of the model showed a significant interaction between political orientation and time. While racism levels remained nearly unchanged for conservative republicans, they decreased for other Americans.

The figure shows a dramatic difference in racism during the Trump years. The difference of 1.4 standard deviations corresponds to a difference of 21 points in IQ testing. Interestingly, the difference is not due to increasing racism among conservative republicans, but rather decreasing racism among Americans.

It is tempting to suggest that this large difference in racism explains the much stronger response to George Floyd’s murder by a police officer, while similar events in previous years did not spark the widespread protests the US have seen in the last weeks. The data suggest that America as a whole is moving towards more racial equality and that many White Americans empathize with African Americans. However, conservative republicans are an exception. Many of them are opposed to racial equality.

Complicit Racism

Psychological research over the past two decades has focused a lot on implicit racism. The idea is that individuals can hide their racism from themselves. The notion of unconscious bias or racism is controversial and there is no solid scientific evidence for implicit racism. Most people know whether they would welcome a Black family member or not. A more common form of racism is a conscious choice to ignore the racism of others, whether it is a colleague, a friend, a family member or a party member. You know they are racist, but you don’t say anything about it. Most people, including conservative republicans, are not surprised by the finding that conservative republicans are more likely to be racist. We all know examples of racist conservative republicans and the Republican party carefully avoids alienating racist voters. The current Republican party suffers from complicit racism. Some party strategists even admit that complicit racism was used to win majorities. However, most Americans are increasingly appalled by the complicit racism that is inconsistent with modern American values. Just like the civil war and the civil rights movement, America today is still struggling with the “birth defect of slavery” (Condoleezza Rice).

Political Science and Political Bias

It is not hard to figure out my personal political views and I have no reason to hide them. I am not a fan of conservative republicans and their current leader. This may have influenced the way I analyzed and presented the results. Fortunately, the data are openly available and conservatives are welcome to analyze the data and present their findings. In this spirit, I consider this blog post as a thesis that is awaiting a critical examination and response. However, is it really in doubt that respondents who voted for Trump are more likely to oppose affirmative action and to object to a Black family member? Many probably think that these results merely show the obvious.

Covid-19 responses: USA versus Europe

The Economist published an article about the response to Covid-19 in the USA titled “America’s covid-19 experience is tragic but not that exceptional.” The article compares the United States to Europe, which are both doing more poorly than Asia, African, and at this moment also South America, although the situation in several Latin American countries is worsening.

The articles lists several factors that might contribute to a worse outcome in the United States:
– a profit-oriented health care system
– many people without insurance who may delay treatment
– more people with medical conditions like diabetes

Other factors may work in the US’s favor.
– a younger population
– lower population density

The main reason why it is interesting to compare the US and Europe is the US president Donald Trump who has failed to organize a federal response to the Covid-19 crisis in the United States. Many believe that his idiotic response to the pandemic may have made things worse, especially among his supporters.

However, the Economist claims that so far the US is doing better than Europe.

In typical style, The Economist then finds additional factors that may distort this comparison. Importantly, European countries like Italy were hit by Covid-19 earlier than the United States. In fact, some virological studies suggest that the virus mutated and became more infectious in Europe and that most infections in the United States came from Europe. So, it is possible that Europe was affected worse in the beginning, but responded better to the crisis than the United States, which is still seeing over 20,000 new infections and over 1,000 deaths a day in late May.

To examine the time course of the Covid-19 epidemics in Europe and the United States, I downloaded the US data from Covid-Tracking and the data for European countries from Our World In Data. To compare the response to Covid-19, I computed deaths per 10 Million inhabitants for 49 US states (excluding the island state Hawaii) and 35 European countries: Albania, Austria, Belarus, Belgium, Bulgaria, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Latvia, Lithuania, Moldova, Montenegro, Netherlands, Norway, Poland, Portugal, Romania, Russia, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, UK

I then averaged these values. These averages are not weighted by population. Thus, small US states like Wyoming and small European countries like Estonia are weighted equally to big US states like Texas and big European countries like the UK, respectively.

The results show, as expected, that the epidemic in the US lagged the epidemic in Europe. However, around April 20 the US had caught up with Europe and since then the US has recorded more deaths than Europe. Since April 20, average deaths per 10 Million have nearly tripled from around 800 deaths to around 2400 deaths. In, Europe deaths doubled from around 800 to 1,600. The curves also show a much flatter trajectory in Europe than in the US. In Europe most countries have controlled the spread of the virus, while several US states are still showing increasing numbers.

This results contradict the claim in The Economist that the US has performed equally well to Europe. Of course, the results vary dramatically with the US and across European countries, but as a whole the US has performed more poorly than the US. The poor performance is especially noteworthy after the first wave surprised Europe and North America.

A major factor that contributed to the disaster in New York was the lack of testing in the early weeks of March, which resulted in widespread undetected community transmission of the virus. However, New York state quickly ramped up testing and deaths are now decreasing. Other US states had more time to prepare, but are seeing increasing rather than decreasing numbers. Thus, especially for the recent weeks in May it is clear that the US is not handling the crisis as well as Europe.

How much of this poor performance can be attributed to Trump is a political question that may be difficult to answer scientifically. However, even if he doesn’t personally take responsibility, one can only wonder how much better the US response would have been with a president Obama.

Children and Covid-19: Evidence from New York City

Despite tremendous efforts by scientists to aid in the fight against Covid-19, many important questions lack a clear scientific answer. It is known that children can be infected by the virus and in some rare cases the virus can produce severe symptoms or even death. However, it is unclear whether children have a lower risk to get infected and whether they are less likely to infect others. On May 7, Nature wrote ” scientists are still trying to understand what the deal is with kids and COVID-19″.

A major problem is that most countries responded to the Covid-19 pandemic by closing schools and minimizing children’s contact. Thus, lower number of children among people testing positive for Covid-19 might simply reflect less exposure to the virus. A notable exception is Sweden, where schools were not closed. Unfortunately, Sweden did not test children, teachers, or parents to examine whether children transmitted the virus (Science, May 22).

A German study of viral load in children and adults suggested that children are no different from adults, but this study has been criticized on methodological grounds (Science Media Centre). A key problem with this claim is that it is impossible to proof the lack of a difference. It is only possible to quantify the amount of a difference and notice that the observed difference is not statistically different from zero. The key problem with this study is that the sample size is small and there is ample statistical uncertainty. Thus, no firm conclusions can be drawn from this study alone.

My colleagues (Shigehiro Oishi, Youngjae Cha, and Bansi Javiya) have been analyzing the open data about Covid-19 deaths and cases in New York City and used the US census data to predict variation in positive cases and deaths across New York City. While we are still working on this project, we would like to share some interesting results about children and Covid-19 that emerged in our analyses. Before we do so, we want to make clear that ZIP code that are not the best data to examine this question, our results are preliminary, and even if our results hold up they do not provide conclusive evidence, and the results cannot and should not be used to make policy recommendations about opening schools or not. The main purpose of this blog post is to share information with scientists who are interested in this question and to add a tiny piece of information to the big puzzle.

Occupants and Covid-19

While the role of children in the transmission of Covid-19 is still unclear, the evidence for transmission at home is much stronger. It makes logical sense that the infection rate is greater if more people share the same living space. In addition, crowded living conditions have been linked to higher rates of Covid-19. For this reason, we looked at several predictors from the US census that reflect crowding. The best predictor was the percentage of residence with more than 1 occupant per room.

We used several measures of Covid-19 prevalence. All of them showed a positive correlation with occupants, but the correlations were stronger for the positive rate (positives / tests) than for the positives per capita (positives / population * 100,000). One possible explanation for this is that testing varies as a function of other factors. It made little difference whether we used the raw numbers or residuals that controlled for differences between Boroughs, so we did use the raw scores. The correlation with the positive rate was r = .54 and the correlation with deaths was r = .20.

This simple correlation does not prove causality. There are a host of other variables that may explain this relationship. We tried to address this issue by including other variables in a causal model.

Children and Covid-19

The number of children living in an area defined by a NYC ZIP code was also a predictor of positive cases, r = .53, and deaths, r = .39. It is important to realize that this is an analysis of ZIP codes and not of individuals. It is well-known that children are at a much lower risk to die from Covid-19. Thus, there is actually a negative link for individuals. Thus, the positive correlation must reflect some other causal mechanism. One possibility (out of many) would be that children can infect older people in the same household that are at a high risk of dying when they get infected. This ‘theory’ implies transmission of the virus from young children to old people who live together.

When we fitted this model to the data, it showed indeed a causal path from children to residents to positive cases to deaths. This path implied that for every 10 percentage point increase in the proportion of children, an additional 2 people per 100,000 would die. The average is 17 death per 100,000 inhabitants. So, an increase by 2 people is a 12% increase.

The model also shows that there is still an unexplained positive relationship between children and risk of infection. This path would contribute another 4 deaths per 100,000.

These results show that ZIP codes with more children have more deaths, and that this relationship is partially explained by children adding to the number of people in a residence. However, once more these results have to be interpreted with caution because important predictor variables are missing from the model.


One potential confounding factor is income. Several analyses of the NYC data have shown that Covid-19 is more prevalent and deadly in ZIP codes with lower income. Thus, we added income to the model.

ZIP code income predicted that ZIP codes had fewer children, fewer occupants, a lower positive rate, and fewer deaths. Although the direct effect on deaths was not statistically reliable, income had a clear indirect effect on deaths by lowering the risk of getting infected. These results show that the effect of children in the previous model was inflated by ignoring the confounding effect of income. In this model, the effect of children on death was 2.1 deaths for every 10 percentage point increase in the proportion of children and 0.7 was explained by the effect on residents and 1.4 was still a direct effect.


Numerous articles found ethnic disparities in Covid-19 deaths. Thus, we added ethnicity as a predictor variable to the model. We used the percentage of White residents as the comparison group and the percentages of Asian, Black, and Hispanic residents as predictors. It is difficult to visualize the complex relationships of this model. Thus, we merely report the key finding about children and Covid-19 here.

Including ethnicity as a predictor further reduced the ‘effect’ of children on Covid-19-deaths to 1.8 deaths (0.3 indirect via occupants and 1.5 direct on PR).


Just like there can be confounding factors that inflate relationships, some confounding variables can suppress a relationship. We found that was the case for your measure of the percentage of residents over 65. As expected, ZIP codes with a higher proportion of older residents had more Covid-19 deaths. We also found a negative relationship between the proportion of older residents and occupants, r = -.35. Thus, the fact that high-occupancy ZIP codes tend to be younger reduced the effect of occupants on deaths. In this model, children increased deaths by 2.6 deaths per 100,000. This relationship is highly statistically significant and very unlikely to be just a random fluke. However, it is still possible that other variables that are missing from the model explain this relationship. The multiple pathways are weaker, and it is difficult to say how much they contribute to the relationship.


The key finding of our analyses of the NYC Covid-19 data is that ZIP codes with more children have more Covid-19 as reflected in a higher positive rate (positives / tests) and deaths. This relationship remains after controlling for income, ethnicity, and the proportion of senior residents. The final model suggests that some of the effect is explained by crowded living conditions. These results suggest that children could be transmitting the virus as much as other occupants. However, many other explanations are possible.

The most important limitation of our work is that it relies on ZIP codes, while the actual causal process is person to person transmission. We think that it would be valuable to follow up on this work with studies that examine the social networks of NYC residents who contracted the virus and who did not. Anti-body tests would be particularly useful to examine the spreading of Covid-19 within households. Relevant data may already exist from contact-tracing of infected individuals. We believe that tracing infections and deaths in NYC provides useful information about children’s risk to contract and spread the virus.

Please feel free to contact us with related information or questions.

Why Are Red States “Immune” to Covid-19?

Joey loves crowds. He is boisterous, speaks with a loud and booming voice, and is always ready to high-five everybody. No, I am not describing a super-spreader of Covid-19. It is a textbook description or caricature of an extrovert, or as personality psychologists say extravert.

Personality psychologists have studied extraversion and introversion for nearly one-hundred years, although most of the research emerged in the past 40 years. We know that extraversion is a heritable trait that runs in families. We know that it remains fairly stable throughout adulthood, and we know that it influences behavior. There also has been research on regional variation in extraversion across the world and across US states (Elleman, Condon, Russin, & Revelle, 2018). I used their data to create the map of extraversion for US states. The map shows the highest level of extraversion in Illinois and the lowest level of extraversion in Wyoming followed by Idaho and Utah. While Illinois has fairly high rates of Covid-19, especially in Chicagoland, Wyoming and Idaho have relatively low levels of positive cases. They are also solid “red” states who voted for Trump in the 2016 election with 67% and 59%. It is therefore possible that extraversion partially explains why Covid-19 is more prevalent in “blue” (liberal) states. Residents in blue states may be more extraverted and may have a harder time to follow social distancing rules.

Of course, extraversion would only be one of several factors that play a role. Another obvious factor is that urban areas are more strongly affected by Covid-19 than rural areas, and rural voters are more likely to vote for Trump. There are many other possible differences among the US states that might play a role, but preliminary analysis suggests that they do not predict Covid-19 to a substantial degree. So, to keep things short, I will focus on the two factors that I found to be significant predictors of the spread of Covid-19: urbanization and extraversion.

To examine whether this relationship is stable over time, I used confirmed positive cases reported on the Covid-Tracking website and created indicators for three, three-week periods: March 23 to April 12, April 13 to May 3, and May 4 to May 24. Predictor variables were (a) the percentage of votes for Trump in the 2016 election, (b) extraversion scores from the supplement to Elleman et al.’s article (Table 8), and (c) urbanization scores (wikipedia).

The data were analyzed using structural equation modeling to examine the relationship among the six variables. [I also examined more complex models that included deaths. The effects of the predictor variables on death were mostly mediated by confirmed positives, with the exception of a unique, negative relationship between Trump support and deaths at time 1 only.] Model fit was excellent, CFI = 1.00, RMSEA = .000. This does not mean that the model reveals the truth, but it does show that the model is consistent with the data. Thus, the model tells one possible story about the negative relationship between Trump support and Covid-19 deaths across the states.

The numbers show that urbanity is much stronger negative predictor of Trump support than extraversion. The effect of extraversion is small, and not statistically significant by conventional standards, but there are only 49 states (I excluded the island state Hawaii), making it hard to get statistical significance. The effect of urbanization and extraversion are more equal and explain a notable amount of variation in Covid-19 cases across states. The numbers also show that the effect is not getting weaker over time. It may actually become stronger. This means that both urbanization and extraversion are stable predictors of the Covid-19 pandemic in the USA. Even in the past three weeks, after several states with Republican governors eased restrictions, there is no evidence that cases are increasing notably in red states.

It is not difficult to find newspaper articles that talk about a second wave and spikes in new cases in Texas or other red states. These stories are based on the idea that red states are ignoring the danger of Covid-19, but so far this idea lacks empirical support. For every story of a pastor who died of Covid-19 after defying stay-at-home orders, there are thousands of churches who are holding services online, and hundreds of people flouting social-distancing norms in Central Park, NY. Don’t get me wrong. Trump’s disregard of science and ramblings about light therapy are a disgrace, but this does not mean that 40% of the US population follows the covidiot in the White House and drinks bleach. At least forty-percent of US voters are likely to vote for him again. Don’t ask me how anybody can vote for him again. That is not the point of this blog post. The blog post is about the empirical fact that so far Covid-19 has predominantly hit states that did not vote for Trump. I suggest that this seemingly paradox finding is not paradox at all. Joey, the extraverted bachelor who lives in an apartment in New York City, who voted for Hillary is much more likely to get infected than Joyce who lives with her family on a farm in Wyoming. Painting all Trump-voters as covidiots would be a similar mistake as Hillary Clinton calling Trump-supporters a “basket of deplorables.” If all Trump-supporters were covid-idiots, we should have seen a positive relationship between Trump-support and Covid-19 cases, especially after controlling for confounding variables like urbanization and extraversion. The fact that this positive relationship stubbornly refuses to emerge may suggest that Republican governors and residents in red states are not as stupid as their leader.

Reply to Vianello, Bar-Anan, Kurdi, Ratliff, and Cunningham

I published a critique of the Implicit Association Test. Using structural equation modeling of multi-method studies, I find low convergent validity among implicit measures of attitudes (prejudice, self-esteem, political orientation) and low discriminant validity between explicit and implicit measures. The latter finding is reflected in high correlations between factors that reflect the shared variance among explicit measures and the shared variance among implicit measures. Using factor loadings to quantify validity, I found that the controversial race IAT has at most 20% valid variance in capturing racial attitudes. Most if not all of this variance is shared with explicit measures. Thus, there is no evidence that IAT scores reflect a distinct form of implicit prejudice that may operate outside of conscious awareness.

This article elicited a commentary by Vianello and Bar-Anan (ref.) and by Kurdi, Ratliff, and Cunningham (pdf). Here is a draft of my response to their commentaries. As you will see, there is little common ground; even the term “validity” is not clearly defined making any discussion about the validity of the IAT meaningless. To make progress as a science (or to become a science), psychologists need to have a common understanding of psychological measurement and methods that can be used to evaluate the validity of measures quantitatively.


Just like pre-publication peer-reviews, the two post-publication commentaries have remarkably little overlap. While Vianello and Bar-Anan (VBA) question my statistical analyses, Kurdi, Ratcliff, and Cunningham accept my statistical results, but argue that these results do not challenge the validity of the IAT.

VBA’s critique is clearer and therefore easier to refute by means of objective model comparisons. The key difference between VBA’s model and my model is the modelling of method variance. VBA’s model assume that all implicit measures of different constructs are influenced by a single method factor. In contrast, my model assumes that implicit measures of prejudice (e.g., the standard race IAT and the Brief Implicit Association Test with the same racial stimuli) share additional method variance. As these hypotheses are nested models, it is possible to test these competing models directly against each other. The results show that a model with content-specific method variance fits the data better (Schimmack, 2020a). The standard inference from a model comparison test is that the model with the worse fit is not an adequate model of the data, but VBA ignored the poorer fit of their model and present a revised model that does not model method variance properly and therefore produces misleading results. Thus, VBA’s commentary is just another demonstration of the power of motivated reasoning that undermines the idealistic notion of a self-correcting science.

KRC ask whether my results imply that the IAT cannot be a valid measure of automatic cognition?  To provide a meaningful answer to this question, it is important to define the terms valid, measure, automatic, and cognition.  The main problem with KRC’s comment is that these terms remain undefined. Without precise definitions, it is impossible to make scientific progress. This is even true for the concept of validity that has no clear meaning in psychological measurement (Schimmack, 2020c). KRC ignore that I clearly define validity as the correlation between IAT scores and a latent variable that represents the actual variation in constructs such as attitudes towards race, political parties, and the self.  My main finding was that IAT scores have only modest validity (i.e., low correlations with the latent variable or low factor loadings) as measures of racial preferences, no validity as a measure of self-esteem, and no proven validity as measures of some implicit constructs that are distinct from attitudes that are reflected in self-report measures. Instead, KRC consistently mischaracterize my findings when they write that “the reanalyses reported by Schimmack find high correlations between relatively indirect (automatic) measures of mental content, as indexed by the IAT, and relatively direct (controlled) measures of mental content.” This statement is simply false and confuses correlations of measures with correlations of latent variables.  The high correlations between latent factors that represent shared variance among explicit measures and implicit measures provide evidence of low discriminant validity, not evidence of high validity. Moreover, the modest loadings of the race IAT on the implicit race factor show low validity of the IAT as a measure of racial attitudes.

After mischaracterizing my results, KRC go on to claim that my results do “not cast any doubt on the ability of IATs to index attitudes or to do so in an automatic fashion” (p. 5).  However, the low convergent validity among implicit measures remains a problem for any claims that the IAT and other implicit measures measure a common construct with good validity. KRC simply ignore this key finding even though factor loadings provide objective and quantitative information about the construct validity of IAT scores.

The IAT is not the only research instrument with questionable construct validity.  However, the IAT is unique because it became a popular measure of individual differences without critical evaluation of its psychometric properties. This is particularly problematic when people are given feedback with IATs on the Project Implicit website, especially for IATs that have demonstrably no validity like the self-esteem IAT.  The developers of the IAT and KRC defend this practice by arguing that taking an IAT can be educational.  “At this stage in its development, it is preferable to use the IAT mainly as an educational tool to develop awareness of implicit preferences and stereotypes” However, it is not clear how a test with invalid results can be educational. How educational would it be to provide individuals with randomly generated feedback about their intelligence?  If this sounds unethical, it is not clear why it is acceptable to provide individuals with misleading feedback about their racial attitudes or self-esteem. As a community, psychologists should take a closer look at the practice of  providing online feedback with tests that have low validity because this practice may undermine trust in psychological science.

KRC’s commentary also fails to address important questions about the sources of stability and change in IAT scores over time. KRC suggest that “the jury is still out on whether variation in responding on the IAT mostly reflects individual differences or mostly reflects the effects of the situation” (p. 4). The reason why two decades of research have failed to answer this important question is that social cognition researchers focus on brief laboratory experiments that have little ecological validity and that are unable to demonstrate stability of individual differences over time. However, two longitudinal studies suggest that IAT scores measure stable attitudes rather than context-dependent automatic cognitions. Wil Cunningham, one of the commentators, provided first evidence that variance in IAT scores reflects mostly random measurement error and stable trait variance, with no evidence of situation-specific state variance (Cunningham et al., 2001). Interestingly, KRC ignore the implications of this study. This year, an impressive study examined this question with repeated measures of a six-year period (Onyeador et al., 2020; Schimmack, 2020). The results confirmed that even over this long time-period, variance in IAT scores mostly reflects measurement error and a stable trait without notable variance due to changes in situations.

Another important topic that I could only mention briefly in my original article is incremental predictive validity. KRC mention Kurdi et al.’s (2019) meta-analysis as evidence that the IAT and self-report measures tap different constructs. They fail to mention that the conclusions of this meta-analysis are undermined by the lack of credible, high-powered studies that can demonstrate incremental predictive validity. To quote Kurdi et al.’s abstract “most studies were vastly underpowered” (p. 569).  The authors conducted tests of publication bias, but did not find evidence for it.  The reason could be that they used tests that have low power to detect publication bias. Some studies included in the meta-analysis are likely to have reported inflated effect sizes due to selection for significance, especially costly fMRI studies with tiny sample sizes. For example, Phelps et al. (2000) report a correlation of r(12) = .58 between scores on the race IAT and differences in amygdala activation in response to Black and White faces.  Even if we assume that 20% of the variance in the IAT is valid, the validation corrected correlation would be r = 1.30. In other words, this correlation is implausible given the low validity of race IAT scores.  The correlation is also much stronger than the predictive validity of the IAT in Kurdi et al.’s meta-analysis. The most plausible explanation for this result is that researchers’ degrees of freedom in fMRI studies inflated this correlation (Vul et al., 2009). Consistent with this argument, effect sizes in studies with larger sample sizes are much smaller and evidence of incremental predictive validity can be elusive, as in Greenwald et al.’s study of the 2018 election.  At present, there is no pre-registered, high-powered study that provides clear evidence of incremental predictive validity. Thus, IAT proponents have failed to respond to Blanton et al.’s (2009) critique of the IAT. Responses to my renewed criticism suggest that IAT researchers are unable or unwilling to respond to valid scientific criticism of the IAT with active coping. Instead, they prefer to engage in emotion-focused, repressive coping that makes IAT researchers feel better without addressing substantive measurement problems.

In conclusion, my critique of the IAT literature and the response by IAT researchers shows a wider problem in psychology that I have called the validation crisis (Schimmack 2020c). Although measurement is at the core of any empirical science, many psychologists lack formal training in psychological measurement. As a result, they create and use measures of unknown validity. This is particularly true for social psychologists because social psychologists in the 1970s and 1980s actively rejected the idea that characteristics within individuals are important for the understanding of human behavior (“the power of the situation”). However, when the cognitive revolution started, the focus shifted from observable situations and behaviors to mental states and processes. To study these phenomena that are not directly observable requires valid measures, just like telescopes need to be validated to observe planets in distant galaxies. The problem is that social cognition researchers developed methods like the IAT to make claims about cognitive processes that are not observable to outside observers or by means of introspection without taking the time to validate these measures. To make progress, the next generation of social psychologists needs to distinguish clearly between constructs and measures and between random and systematic measurement error. As all measures are contaminated by both sources of measurement error, constructs need to be measured with multiple, independent methods that show convergent validity (Campbell & Fiske, 1959; Cronbach & Meehl, 1955).  Psychology also needs to move from empty qualitative statements like “the IAT can be valid” to empirically-based statements about the amount of validity of a specific IAT in specific populations in clearly defined situations. This requires a new program of research with larger samples, ecologically valid situations, and meaningful criterion variables.


Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. E. (2009). Strong claims and weak evidence: Reassessing the predictive validity of the IAT. Journal of Applied Psychology, 94, 567–582. doi:10.1037/a0014665

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105. doi:10.1037/h0046016

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. doi:10.1037/h0040957

Cunningham, W. A., Preacher, K. J., & Banaji, M. R. (2001). Implicit attitude measures: Consistency, stability, and convergent validity. Psychological Science, 12(2), 163–170.

Onyeador, I. N., Wittlin, N. M., Burke, S. E., Dovidio, J. F., Perry, S. P., Hardeman, R. R., … van Ryn, M. (2020). The Value of Interracial Contact for Reducing Anti-Black Bias Among Non-Black Physicians: A Cognitive Habits and Growth Evaluation (CHANGE) Study Report. Psychological Science, 31(1), 18–30.

Schimmack, U. (2020a). Open Communication about the invalidity of the race IAT.

Schimmack, U. (2020b). Racial bias as a trait. (retrieved 4/21/20)

Schimmack, U. (2020c). The validation crisis. Meta-Psychology (blog)

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4(3), 274–290.

Covid-19 behaves like tourists

Many people are wondering about variation in the Covid-19 pandemic across countries. Why (the North of) Italy and not Portugal. How was South Korea able to contain the virus, but other countries did not even though they had less time. The New York Times published a long article that examined this question, but nobody really knows.

Some of the speculations focus on biological factors that may protect individuals or may make them more vulnerable. However, so far these factors explain a small portion of the variation in death rates. The biggest predictor is the number of people who are infected by the virus. Australia and New Zealand have few deaths because Covid-19 did not spread widely among their populations.

One possible explanation could be the response of countries to the pandemic. Countries like the UK and Sweden may have more deaths because they did not lock down their countries. The problem with these speculations is that many factors are likely to contribute to the variation and it is difficult to spot these factors without statistical analyses.

The NYT article mentions that hundreds of studies are underway to look for predictors of variation across nations, but no results are being mentioned. Maybe researchers are cautious.

“Doctors who study infectious diseases around the world say they do not have enough data yet to get a full epidemiological picture, and that gaps in information in many countries make it dangerous to draw conclusions”

Drawing conclusions is different from exploring data. There is nothing dangerous about exploring patterns in data. Clearly many people are curious and statistical analysis can provide more valuable information than armchair speculations about climate or culture.

As a cross-cultural psychologists, I am familiar with many variables that distinguish nations from each other. The most prominent dimension is individualism. Western cultures tend to be more individualistic than Asian cultures. This might suggest that culture plays a role because Asian cultures have had fewer Covid-19 deaths. However, individualism as measured by Hofstede’s dimension is a weak predictor and did not survive statistical controls. Other dimensions that were less plausible also did not predict variation in Covid-19 deaths.

However, one variable that was a predictor was the number of tourists that travel to a country (tourism data).

Tourism reflects how connected a country is with the rest of the world. Australia and New Zealand are not only islands, they are also geographically isolated which explains why relatively few people visit these otherwise attractive locations. Covid-19 also has speared much of Eastern Europe and many Eastern European countries rank low on the tourism index.

Additional analysis show that tourism is becoming a weaker predictor over time. The reason is the recent rise of cases and deaths in Latin America. Latin America was relatively unaffected in April, but lately Ecuador and Brazil have seen alarming increases in cases.

The graph also shows that tourism does not explain all of the differences between countries. For example, the UK has way more cases than predicted by the regression line. This may reflect the slow response to the Covid-19 crises in the UK. Sweden is also above the regression line, possibly due to the policy to keep schools and businesses. Switzerland is a direct neighbor of the North of Italy, where the epidemic in Europe started. Canada is above the regression line, but was on the regression line on April 15. The reason is that Canada acted quickly in the beginning, but is now seeing a late increase in death in care homes.

In conclusion, these results suggest that timing is a big factor in the current differences across countries. Countries with high death tolls were simply unlucky to be at the center of the pandemic or well connected to it. As the pandemic progresses, this factor will become less important. Some countries, like Austria and (the South of) Germany that were hit early have been able to contain the spread of Covid-19. In other countries, numbers are increasing, but no country is seeing increases as dramatic as in Italy (or New York) where Covid-19 spread before social distancing measures were in place. New factors may predict what will happen in the times of the “new normal” when countries are trying to come out of lock-downs.

I don’t think that publishing these results is dangerous. The results are what they are. It is just important to realize that they do not prove that tourism is the real causal factor. It is also possible that tourism is correlated with some other variables that reflect the real cause. To demonstrate this, we need to find measures of these causal factors and demonstrate that they predict variation in death tolls of nations better than tourism and statistically remove the relationship of tourism with Covid-19 deaths. So, this blog post should be seen as a piece of a puzzle rather than the ultimate answer to a riddle.