
Prejudice is in the Brain: So What?

May 4, 2021 | Prejudice, Racism | Ulrich Schimmack

Introduction

Social psychology aims to study real-world problems with the tools of experimental psychology. The classic obedience studies by Milgram aimed to provide insights into the Holocaust by examining participants’ reactions to a sadistic experimenter. In this tradition, social psychologists have studied prejudice against African Americans since the beginning of experimental social psychology.

As Amodio and Cikara (2021) note, these studies aim to answer questions such as “How do humans learn to favor some groups over others?” and “Why does merely knowing a person’s ethnicity or nationality affect how we see them, the emotions we feel toward them, and the way we treat them?”

In their chapter, the authors review the social neuroscience of prejudice. Looking for prejudice in the brain makes a lot of sense. A person without a brain is not prejudiced, so we know that prejudice is somewhere in the brain. The problem for neuroscience is that the brain is complex and it is not easy to figure out how it does what it does. Another problem is that prejudice is most relevant when individuals act on their prejudices. For example, it is possible that prejudice contributes to the high prevalence of police shootings that involve Black civilians (Schimmack & Carlson, 2020). However, the measurement of brain activity often requires repeated measurements of many trials under constrained laboratory conditions.

In short, using neuroscience to understand prejudice faces some practical challenges. I was therefore skeptical that this research has produced much useful information about prejudice. When I voiced my skepticism on Twitter, Amodio called me a bully. I therefore took a closer look at the chapter to see whether my skepticism was reasonable or whether I was uninformed and prejudiced against social neuroscientists.

Face Processing

To act on prejudice, the brain has to detect a difference between White and Black faces. Research shows that people do indeed notice these differences, especially in faces that were selected to be clear examples of the two categories. If asked to indicate whether a face is White or Black, people can respond within a few hundred milliseconds.

The authors acknowledge that we do not need social neuroscience to know this: “behavioral studies suggest that social categorization occurs quickly (Macrae & Bodenhausen, 2000)” (p. 3). However, they suggest that neuroscience produces additional information. Unfortunately, the meaning of early brain signals in the EEG is often unclear. Thus, the main conclusion that the authors can draw from these findings is that they provide “support for the early detection and categorization of race.” In other words, when the brain sees a Black person, we usually notice that the person is Black. It is not clear, however, that any of these early brain measures reflect the evaluation of the person, which is really the core of prejudice. Categorization is necessary, but not sufficient, for prejudice. Thus, this research does not really help us to understand why noticing that somebody is Black leads to negative evaluations of this person.

Another problem with this research is the artificial nature of the task. Rather than seeing a heterogeneous set of faces that is representative of their social environment, participants see 50% White and 50% Black faces. Every White person who has suddenly been in a situation where 50% or more of the people are Black may notice that they respond differently to this situation. The brain may look very different in these situations than in situations where race is not salient. In addition, the faces belong to strangers. Thus, these studies have no relevance for prejudice in work settings with colleagues or in classrooms where teachers know their students. This lack of ecological validity is of course not unique to brain studies of prejudice. It also applies to behavioral experiments.

The only interesting and surprising claim in this section is that Black participants respond to White faces just like White participants respond to Black faces. “Research with Black participants, in addition to White participants, has replicated this pattern and clarified that it is typically larger to racial outgroup faces rather than Black faces per se” (p. 5). The statement is a bit muddled because the out-group for Black participants is White.

Looking up the results from Dickter and Bartholow (2007) shows a clear participant x picture interaction effect (i.e., responses on opposite-race trials differ from responses on same-race trials). While the effect for White participants is clearly significant, the effect for Black participants is not, F(1,13) = 4.46, p = .054, but it was misreported as significant, p < .05. The second study did not examine Black participants or faces. It also did not include White participants. It showed that Asian participants responded more strongly to the outgroup (White) than to the in-group (Asian), F(1, 19) = 17.06, p = .0006. The lack of a White group of participants is puzzling. The third study had the largest sample of Black and White participants (Volpert-Esmond & Bartholow, 2019), but did not replicate Dickter and Bartholow’s findings: “the predicted Target race × Participant race interaction was not significant, b=−0.09, t(61.3)=−1.47, p=.146.” I have seen shady citations and failure to cite disconfirming evidence, but it is pretty rare for authors to simply list a disconfirming study as if it produced consistent evidence. In conclusion, there is no clear evidence about how minority groups respond to faces of different groups because most of the research is done by White researchers at White universities with White students.
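Readers who want to check the reported p-values themselves can do so with a few lines of code. The following is a minimal sketch, not a replacement for a statistics package: it assumes that each test has one numerator degree of freedom, so that F(1, df) equals the square of a t-value with df degrees of freedom, and it obtains the p-value by numerically integrating the t density. The function names are illustrative and do not come from any of the cited articles.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_from_t(t, df, steps=10_000):
    """Two-sided p-value; integrates the density over [0, |t|] with Simpson's rule."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area

def p_from_f(f, df_error):
    """p-value for F(1, df_error). Assumption: numerator df is exactly 1."""
    return two_sided_p_from_t(math.sqrt(f), df_error)

# Values taken from the text above.
p_black = p_from_f(4.46, 13)   # Black participants, Dickter and Bartholow (2007)
p_asian = p_from_f(17.06, 19)  # Asian participants, second study

print(f"F(1,13) = 4.46  -> p = {p_black:.3f}")
print(f"F(1,19) = 17.06 -> p = {p_asian:.4f}")
```

The first value lands just above .05, which is why the result should not have been reported as p < .05. The second falls below .001.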

The details are of course not important because the authors’ main goal is to sell social neuroscience. “Social neuroscience research has significantly advanced our understanding of the social categorization process” (p. 11). A close reading shows that this is not the case and that it is unclear what early brain signals mean and how they are modulated by the context, race of participants, and race of faces.

How is prejudice learned, represented, and activated?

Studying learning is a challenging task in an experimental context. To measure learning, some form of memory task must be administered. Moreover, this assessment has to be preceded by a learning task. To make learning experiments more realistic, it is ideal to have a retention interval between the learning and the memory task. However, most studies in psychology are one-shot laboratory studies. Thus, the ecological validity of learning studies is low. Not surprisingly, the chapter contains no studies that examine neurological responses during learning or memory tasks related to prejudice.

Instead, the chapter reviews circumstantial evidence that may be related to prejudice. First, the authors review the general literature on Pavlovian aversive conditioning. However, they provide no evidence that prejudice is rooted in fear conditioning. In fact, many White Americans in White parts of the country are prejudiced without ever having had threatening interactions with Black Americans. Not surprisingly, even the authors note that fear conditioning is not the most plausible root of prejudice.

“Some research has attempted to demonstrate a Pavlovian basis of prejudice using prepared fear or reversal learning paradigms (Dunsmoor et al., 2016; Olsson et al., 2005), but these results have been inconclusive regarding a prepared fear to Black faces (among White Participants) or have failed to replicate (Mallan et al., 2009; Molapour et al., 2015; Navarrete et al., 2009; Navarette et al., 2012). To our knowledge, research has not yet directly tested the hypothesis that social prejudice can be formed through Pavlovian aversive conditioning” (p. 14)

As processing of feared objects often involves the amygdala, one would expect White brains to show an amygdala response to Black faces. Contrary to this prediction, “most fMRI studies of race perception have not observed a difference in amygdala response to viewing racial outgroup compared with ingroup members (e.g., Beer et al., 2008; Gilbert et al., 2012; Golby et al., 2001; Knutson et al., 2007; Mattan et al., 2018; Phelps et al., 2000; Richeson et al., 2003; Ronquillo et al., 2005; Stanley et al., 2012; Telzer et al., 2013; Van Bavel et al., 2008, 2011).” (p. 15). The large number of studies shows how many resources were wasted on a hypothesis that is not grounded in an understanding of racism in the United States.

The chapter then reviews research on stereotypes. The main insight provided here is that “while the neural basis of stereotyping remains understudied, existing research consistently identifies the ATL (anterior temporal lobe) as supporting the representation of social stereotypes” (p. 17). However, it remains unclear what we learn about prejudice from this finding. If stereotypes were supported by some other brain area, would this change prejudice in some important way?

The authors next examine the involvement of instrumental learning in prejudice. “Although social psychologists have long hypothesized a role for instrumental learning in attitudes and social behavior (e.g., Breckler, 1984), this idea has only recently been tested using contemporary reinforcement learning paradigms and computational modeling (Behrens et al., 2009; Hackel & Amodio, 2018).” (p. 19). Checking Hackel and Amodio (2018) shows that this review article does not mention prejudice. Other statements have nothing to do with prejudice, but rather explain why prejudice may not influence responses to all group members. “Behavioral studies confirm that people incrementally update their attitudes about both persons (Hackel et al., 2019)” (p. 19). The authors want (us) to believe that “a model of instrumental prejudice may help to understand aspects of implicit prejudice” (p. 20), but they fail to make clear how instrumental learning is related to prejudice, let alone implicit prejudice.

The section on prejudice as habits starts with a false premise. “Habits: A basis for automatic prejudice? Automatic prejudices are often likened to habits; they appear to emerge from repeated negative experiences with outgroup members, unfold without intention, and resist change (Devine, 1989).” Devine’s (1989) classic subliminal priming study has not been replicated, and the robustness of subliminal priming findings in general has been questioned. Moreover, the study has been questioned on methodological grounds, and it has been shown that classifying an individual as Black does not automatically trigger negative responses. The main reason why prejudice is not a habit is that forming a habit often requires many repeated instances, and many White individuals have too little contact with Black individuals to form prejudice habits. The whole section is irrelevant because the authors note that “social neuroscience has yet to investigate the role of habit in prejudice” (p. 21). We can only hope that funding agencies are smart enough not to waste money on this kind of research.

This whole section ends with the following summary.

“A major contribution of social neuroscience research on prejudice has been to link different aspects of prejudice—stereotypes, affective bias, and discriminatory actions—to neurocognitive models of learning and memory. It reveals that intergroup bias, and implicit bias in particular, is not one phenomenon, but a set of different processes that may be formed, represented in the mind, expressed in behavior, and potentially changed via distinct interventions.” In short, social neuroscience has not told us anything about prejudice that we did not already know.

Effects of prejudice on perception

The first topic is face perception. Behavioral studies show that individuals tend to be better able to discriminate between faces of their own group than faces of another group. Faces are processed in a brain area called the fusiform gyrus. A study by Golby et al. (2001) with 10 White and 10 Black participants confirmed the behavioral finding for White Americans, t(8) = 2.10, p = .03, but not for African Americans, t(9) = 0.63. Given the small sample size, the interaction is not significant in this study. The more important finding was that the fusiform gyrus showed more activation to same-race faces, t(18) = 2.58, p = .02. Inconsistent with the behavioral data, African American participants showed as much additional activation of the fusiform face area to same-race faces as White participants. Over the past two decades, this preliminary study has been cited over 300 times. We would expect a review in 2021 to include follow-up and replication studies, but the preliminary results of this seminal study are offered as evidence as if they were conclusive. Yet, in 2021 it is clear that results with just significant p-values, p > .005, often do not replicate. The authors seem to be blissfully unaware of the replication crisis. I was able to find a recent study that examined own-group bias for White participants only with three age groups. The study replicated the finding that White participants show more activation to White faces than to Black faces, especially for adolescents and adults. The study also linked this finding to prejudice, but I will discuss these results later because it was not the focus of the review article.

In short, behavioral studies have demonstrated that White Americans have difficulties in distinguishing Black faces. This has led to false convictions based on misidentification by eye-witnesses. Expert testimony by psychologists has helped to draw awareness to this problem. Social neuroscience shows that this problem is correlated with activity in the fusiform gyrus. It is not clear, however, how knowledge about the localization of face processing in the brain provides a deeper understanding of the problem.

The authors suggest, however, that face processing may directly lead to discriminatory behavior based on an article by Krosch and Amodio (2019). In a pair of experiments, White participants were given a small or large amount of money and then had to allocate it to White or Black recipients based on some superficial impression of deservingness. In Study 1 (N = 81, 10 excluded), EEG responses to the faces showed a greater N170 response to Black faces, but only when resources were scarce, two-way interaction F(1, 69) = 4.97, p = .029. Furthermore, the results showed a significant mediation effect on resource allocation, b = .14, se = .09, p = .039. Study 2 used fMRI (N = 35, 5 excluded). This study showed the race effect on the fusiform gyrus, but only in the scarcity condition, F(1, 28) = 7.16, p = .012. Despite the smaller sample size, the mediation analysis was also significant, b = .43, se = .17, t = 2.64, p = .014. While the conceptual replication of the finding across two different studies with different brain measures makes these results look credible, the fact that all critical tests produced just significant results, p > .01, undermines the credibility of these findings (Schimmack, 2012). The most powerful test of credibility for a small set of tests is the Test of Insufficient Variance (Schimmack, 2014; Renkewitz & Keiner, 2019). The test first converts the p-values into z-scores. It then compares the observed variance of these z-scores to the expected variance of 1. The observed variance for these four p-values is much smaller, V = .05. A chi-square test shows that the probability of this outcome by chance is p = .013. Thus, it is unlikely that sampling error alone produced this restricted amount of variation. A more likely explanation is that the authors used questionable research practices to produce a perfect picture of significant results when the actual studies had insufficient power to produce significant results even if the main hypotheses are true.
The main problem is the mediation analyses, which rely on correlations in small samples. It has been shown that many mediation analyses cannot be trusted because they are biased by questionable research practices.
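The Test of Insufficient Variance for the four p-values from Krosch and Amodio (2019) can be reproduced with a short script. This is a minimal sketch under stated assumptions, not the reference implementation: it assumes the reported p-values are two-sided, uses the sample variance (dividing by k - 1), and takes the left tail of a chi-square distribution with k - 1 degrees of freedom. The helper functions are illustrative, and the chi-square probability comes from a hand-written series for the incomplete gamma function so that the example needs only the standard library.

```python
import math
from statistics import NormalDist, variance

def regularized_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), computed by its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x, df):
    """Left-tail probability of a chi-square distribution with df degrees of freedom."""
    return regularized_lower_gamma(df / 2, x / 2)

def tiva(p_values):
    """Return (variance of z-scores, left-tail p-value) for two-sided p-values."""
    z_scores = [NormalDist().inv_cdf(1 - p / 2) for p in p_values]
    var_z = variance(z_scores)  # sample variance, divides by k - 1
    df = len(z_scores) - 1
    # With an expected variance of 1, var_z * df follows chi-square(df).
    p_left = chi2_cdf(var_z * df, df)
    return var_z, p_left

# The four critical p-values quoted above.
var_z, p_tiva = tiva([0.029, 0.039, 0.012, 0.014])
print(f"variance of z-scores = {var_z:.2f}, TIVA p = {p_tiva:.3f}")
```

A small left-tail probability means that the z-scores vary less than sampling error alone would produce, which is the pattern expected when non-significant results are missing.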

Effects of prejudice on emotion

Emotion is the most important topic for understanding prejudice. Whereas attitudes are broad dispositions to evaluate members of a specific group positively or negatively, emotions are the actual, momentary affective reactions to members of these groups. Ideally, neuroscience would be able to provide objective measures of emotions. These measures would reveal whether a White person responds with negative feelings in an interaction with a Black person. Obtaining objective, physiological indicators of emotions has been the holy grail of emotion research. First attempts to locate emotions in the body failed. Facial expressions (smiles and frowns) can provide valid information, but facial expressions can be controlled and do not always occur in response to emotional stimuli. Thus, the ability to measure brain activity seemed to open the door for objective measures of emotions. However, attempts to find signals of emotional valence in the EEG have failed. fMRI research focused on amygdala activity as a signal of fear, but later research showed that the amygdala also responds to some positive stimuli, specifically erotic stimuli. Given this disappointing history, I was curious to see what the latest social neuroscience research on emotion has uncovered.

As it turns out, this section provides no new insights into emotional responses to members of an outgroup. The main focus is on empathy in the context of taking the perspective of an in-group or out-group member and guilt. The main reason why fear or hate are not explored is probably that there are no known neural correlates of these emotions and that research with undergraduate students in response to pictures of Black and White faces is unlikely to elicit strong emotions.

In short, the main topic where neuroscience could make a contribution suffers from a lack of valid measures of emotions in the brain.

Effects of prejudice on decision making

Emotional responses would be less of a problem if individuals did not act on their emotions. Most adult individuals learn to regulate their emotions and to inhibit undesirable behaviors. The reason prejudice is a problem for minority groups is that some White individuals do not feel a need to regulate their negative emotions towards African Americans or that they lack the ability to do so in some situations, which is often called implicit bias. Thus, understanding how the brain is involved in actual behaviors is even more important than understanding its contribution to emotions. Although of prime importance, this section is short and contains few citations. One reference is to the resource allocation study by Krosch and Amodio that I reviewed in detail earlier. Blissfully unaware of the questions raised about oxytocin research, the authors also cite a study with oxytocin administration (Marsh et al., 2017). Thus, there is no research reviewed here that illuminates what the brain is doing when White individuals discriminate against African Americans. This does not stop the authors from making a big summary statement that “social neuroscience research has refined our understanding of how prejudice influences the visual processing of faces, intergroup emotion, and decision-making processes, particularly as each type of response pertains to behavior” (p. 34).

Self-regulation of Prejudice

This section starts off with a study by Amodio et al. (2004) and the claim that the results of this study have been replicated in numerous studies (Amodio et al., 2006; 2008; Amodio & Swencionis, 2018; Bartholow et al., 2006; Beer et al., 2008; Correll et al., 2006; Hughes et al., 2017). The main claim based on these studies is that self-regulation of prejudice relies on “detection of bias and initiation of control, in dACC—a process that can operate rapidly and in the absence of deliberation, and which can explain individual differences in prejudice control failures” (p. 39).

Amodio et al.’s (2004) study used the weapons identification task. This is an artificial task that puts participants in the position of a police officer who has to make a split-second decision whether a civilian is holding a gun or some other object (cell phone). Participants have to indicate as quickly as possible whether the object is a gun or not. The race of the civilians is manipulated to examine racial biases. A robust finding is that White participants are faster to identify guns after seeing a Black face than a White face and slower to identify a tool after seeing a Black face than a White face. On some trials, participants also make mistakes. When the brain of participants notices that a mistake was made, EEG shows a distinct signal that is called the error-related negativity (ERN). The key finding in this article is that the ERN was more pronounced when participants identified a tool as a gun in trials with Black faces than in trials with White faces, t(33) = 2.94, p = .006. Correlational analysis suggested that participants with larger ERNs after mistakes with Black faces learned from their mistakes and reduced their errors, r(32) = -.50, p = .004. These results show that at least some individuals are aware when prejudice influences their behaviors and control their behaviors to avoid acting on their prejudice. It is difficult to generalize from this study to regulation of prejudice in real life because the task is artificial and most situations provide only ambiguous feedback about the appropriateness of actions. Even the behavior here is a mere identification rather than an actual behavior such as a shoot or no-shoot decision, which might produce more careful responses and fewer errors, especially in more realistic training scenarios (Andersen zzz).

Another limitation of these studies is the reliance on a few pictures to represent the large diversity of Black and White people.

All replication studies seem to have used the same faces. Therefore, it is unclear how generalizable these results are and how much contextual factors (e.g., gender, age, clothing, location, etc.) might moderate the effect.

Some limitations of the generalizability were reported by Amodio and Swencionis (2018). The racial bias effect was eliminated (no longer statistically significant) when 80% of trials showed Black faces with tools rather than guns. This finding is not predicted by models that assume racial bias often has an implicit (automatic and uncontrollable) effect on behavior. Here it seems that simple knowledge about the low frequency of Black people with guns was sufficient to block the behavioral expression of prejudice. Study 4 measured EEG, but did not report ERN results.

The summary of this section concludes that “social neuroscience research on prejudice control has significantly expanded psychological theory by identifying and distinguishing multiple mechanisms of control” (p. 39). I would disagree. The main finding appears to be that the brain sometimes fails to notice that it made an error and that lack of awareness of these errors prevents their correction. However, the studies are designed to produce errors in the first place to be able to measure the ERN. Without time pressure, few errors would be made, and Amodio and Swencionis (2018) show that racial bias depends on a specific context. That being said, lack of awareness may cause sustained prejudice in the real world. One important role of diversity training is to make majority members aware of behaviors that hurt minority members. Awareness of the consequences should reduce the frequency of these behaviors because they are controllable, as the reviewed research suggests.

The conclusion section repeats the claim that the review highlights “major theoretical advances produced by this literature to date” (p. 42). However, this claim rings hollow in comparison to the dearth of findings that inform our understanding of prejudice. The main problem for social neuroscience of prejudice is that the core component of prejudice, negative affect, has no clear neural correlates in EEG or fMRI measures of the brain, and that experimental designs suitable for neuroscience have low ecological validity. The authors suggest that this may change in the future. They provide a study with Black and White South Africans as an example. The study measured fMRI while participants viewed short video clips of Black and White individuals in distress. The videos were taken from the South African Truth and Reconciliation Commission. The key finding was that brain signals related to empathy showed an in-group bias. Both groups responded more to distress by members of their own group. The fact that this study is offered as an example of greater ecological validity shows how difficult it is for social neuroscience to study prejudice in realistic settings where one individual responds to another individual and their behavior is influenced by prejudice. The authors also point to technological advances as a way to increase ecological validity. Wearable neuroimaging makes it possible to measure the brain in naturalistic settings, but it is not clear what brain signals would produce valuable information about prejudice.

My main concern is that social neuroscience research on prejudice takes away resources from other, in my opinion more important, prejudice research that focuses on actual behaviors in the real world. I am not the only one who has observed that the focus on cognition and the brain has crowded out research on actual behaviors (Baumeister, Vohs, & Funder, 2007; Cesario, 2021). If a funding agency can spend a million dollars either on a grant to study the brains of undergraduate students while they look at Black and White faces or on a grant to study the shooting errors of police officers in realistic simulations, I would give the money to the study of actual behavior. There is also a dearth of research on prejudice from the perspective of the victims. They know best what prejudice is and how it affects them. There needs to be more diversity in research, and White researchers should collaborate with Black researchers who can draw on personal experiences and deep cultural knowledge that White researchers lack or fail to use in their research. Finally, the incentive structure needs to change. Prejudice researchers are rewarded like all other researchers for publishing in prestigious journals that are controlled by White researchers. Even journals dedicated to social issues have this systemic bias. Prejudice research more than any other field needs to ensure equity, diversity, and inclusion at all levels. Moving social neuroscience of prejudice out of White social cognition research into a diverse and interdisciplinary field might help to ensure that these studies actually inform our understanding of prejudice. Thus, a reallocation of funding is needed to ensure that funding for prejudice research benefits African Americans and other minority groups.

Statues are Falling, but Intelligence Researchers Cling to Their Racist Past

July 29, 2020 | Haier, https://doi.org/10.1016/j.intell.2010.04.007, https://www.journals.elsevier.com/intelligence/editorial-board, Intelligence, IQ Tests, Lynn, Meisenberg, National IQs calculated and validated for 108 nations, Racism, Validity | Ulrich Schimmack

Psychology wants to be a science. Unfortunately, respect and reputations need to be earned. Just putting the word science in your department name or in the title of your journals doesn’t make you a science. A decade ago, social psychologists were shocked to find out that for years one of their colleagues had just made up data and nobody had noticed it. Then, another social psychologist proved physics wrong and claimed to have evidence of time-reversed causality in a study with erotic pictures and undergraduate students. This also turned out to be a hoax. Over the past decade, psychology has tried to gain respect by doing more replication studies of classic findings (that often fail), starting to preregister studies (which medicine implemented years ago), and in general analyzing and reporting results more honestly. However, another crisis in psychology is that most measures in psychology are used without evidence that they measure what they are supposed to measure. Imagine a real science where scientists first ensure that their measurement instruments work and then use them to study distant planets or microorganisms. Not so psychology. Psychologists have found a way around proper measurement called operationalism. Rather than trying to find measures for constructs, constructs are defined by the measures. What is happiness? While philosophers have tried hard to answer this question, psychologists cannot be bothered to spend time thinking about it. Happiness is whatever your rating on a happiness self-report measure measures.

The same cheap trick has been used by intelligence researchers to make claims about human intelligence. They developed a series of tasks and performance on these tasks is used to create a score. These scores could be given a name like “score that reflects performance on a series of tasks some White men (yes, I am a White male myself) find interesting,” but then nobody would care about these scores. So, they decided to call it intelligence. If pressed to define intelligence, they usually do not have a good answer to this question, but they also don’t feel the need to give an answer because intelligence is just a term for the test. However, the choice of the term is not an accident. It is supposed to sound as if the test measures something that corresponds to the everyday term intelligence to make the test more interesting. However, it is possible that the test is not the best measure of what we normally mean by intelligence. For example, performance on intelligence tests correlates only about r = .3 with self-ratings or ratings by close friends and family members of intelligence. While there can be measurement error in self-ratings, there can also be measurement error in intelligence tests. Although intelligence researchers are considered to be intelligent, they rarely consider this possibility. After all, their main objective is to use these tests and to see how they relate to other measures.

Confusing labels for tests are annoying, but hardly worth a long blog post. However, some racist intelligence researchers use the label to make claims about intelligence and skin color (Lynn & Meisenberg, 2010). Moreover, the authors even use their racist preconception that dark-skinned people are less intelligent to claim that intelligence tests measure intelligence BECAUSE performance on these tests correlates with skin color.

You don’t have to be a rocket scientist to realize that this is a circular argument. Intelligence tests are valid because they confirm a racist stereotype. This is not how real science works, but this doesn’t bother intelligence researchers. The questionable article has been cited 80 times.

I only came across this nonsense because a recent article used national IQ scores to make an argument about intelligence and homicides. After concerns about the science were raised, the authors retracted their article pointing to problems in the measurement of national differences in IQ. The editor of this journal, Psychological Science, wrote an editorial with “A Call for Greater Sensitivity in the Wake of a Publication Controversy.”

Greater sensitivity also means cleaning the journals of unscientific and hurtful claims that serve no scientific purpose. In this spirit, I asked the current editor of Intelligence in an email on June 15th to retract Lynn and Meisenberg’s offensive article. Today, I received the response that the journal is not going to retract the article.

Richard Haier (Emeritus, Editor in Chief) Decision Letter

This decision just shows the unwillingness among psychologists to take responsibility for a lot of bad science that is published in their journals. This is unfortunate because it shows the low motivation to change and improve psychology. It is often said that science is the superior method to gain knowledge because science is self-correcting. However, scientists often stand in the way of correction, and the process of self-correction is best measured in decades or centuries. Max Planck famously observed that scientific self-correction often requires the demise of the old guard. However, it is also important not to hire new scientists who continue to abuse the freedom and resources awarded to scientists to spread racist ideology. Meanwhile, it is best to be careful and to distrust any claims about group differences in intelligence because intelligence researchers are not willing to clean up their act.

No Justice, No Peace: A History of Slavery Predicts Violence Today

June 15, 2020 | Homicide, Racism, Slavery | Ulrich Schimmack

Some human behaviors attract more attention than others. Homicides are rare, but very salient human behaviors. Governments investigate and keep records of homicides and social scientists have developed theories of homicides.

In the 1960s, social scientists suggested that inequality can lead to more violence. One simple reason is that the rewards for poor people to commit violent crimes increase with greater inequality in wealth (Becker, 1968).

Cross-national studies confirm that societies with more income inequality have higher homicide rates (Avison & Loring, 1986; Blau & Blau, 1982; Chamlin & Cochran, 2006; Corcoran & Stark, 2020; Fajnzylber, Lederman & Loayza, 2002; Krahn et al., 1986; Pratt & Godsey, 2003; Pridemore, 2008).

A recent article in Psychological Science replicated this finding (Clark, Winegard, Beardslee, Baumeister, & Shariff, 2020). However, the main focus of the article was on personality attributes as a predictor of violence. The authors’ main claim was that religious people are less likely to commit crimes and that among non-religious individuals those with lower intelligence would be more likely to commit homicides.

A fundamental problem with this article is that the authors relied on an article by a known White-supremacist, Richard Lynn, to measure national differences in intelligence (Lynn & Meisenberg, 2010). This article with the title “National IQs calculated and validated for 108 nations” claims that the values used by Clark et al. (2020) do reflect actual differences in intelligence. The problem is that the article contains no evidence to support this claim. In fact, the authors reveal their racist ideology when they claim that a correlation between their scores and skin color of r = -.9 validates their measure as a measure of intelligence. This is not how scientific validation works. This is how racists abuse science to justify their racist ideology.

The article also makes the common mistake of imposing a preferred causal interpretation on a correlation. Lynn and Meisenberg (2010) find that their scores correlate nearly perfectly with educational attainment. They interpret this as evidence that intelligence causes educational attainment and totally ignore the plausible alternative explanation that education influences performance on logical problems. This has important implications for Clark et al.’s (2020) article because the authors buy into Lynn and Meisenberg’s racist interpretation of the correlation between performance on logic problems and educational attainment. An alternative interpretation of their finding would be that religion interacts with education. In nations with low levels of formal education, religion provides a moral code that prevents homicides. In countries with more education, other forms of ethics can take the place of religion. High levels of homicides would be observed in countries where neither religion nor education teaches a moral code.

Aside from this fundamental flaw in Clark et al.’s (2020) article, closer inspection of their data shows that they overlooked confounding factors and that their critical interaction is no longer significant when these factors are included in the regression model. In fact, financial and racial inequality are much better predictors of national differences in violence than religion and the questionable measure of intelligence. Below I present the statistical results that support this conclusion and invalidate Clark et al.’s (2020) racist conclusions.

Statistical Analysis

Distribution Problems

Not long ago, religion was a part of life in most countries. Only over the past century have some countries become more secular. Even today, most countries are very religious. Figure 1 shows the distribution of religiosity based on the Relig_ARDA variable in Clark et al.’s dataset. This skewed distribution can create problems when a variable is used in a regression model, especially if the variable is multiplied by another variable to test interaction effects.

It is common practice to transform variables to create a more desirable distribution for the purpose of statistical analysis. To do so, I reversed the item to measure atheism and then log-transformed the variable. To include countries that scored 100% on religiosity, I added 0.001 to all atheism scores before I carried out the log transformation. The distribution of log-atheism is less skewed.

The distribution of homicides (rates per 100,000 inhabitants) is also skewed.

Because homicide rates are right-skewed, a direct log-transformation can be applied to get a more desirable distribution. To include nations with a value of 0, I added a value of 1 before the log-transformation. The resulting distribution for log-homicides is more desirable.
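
The two transformations described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis. It assumes that religiosity is stored as a proportion between 0 and 1 and that the natural logarithm was used; neither detail is stated in the post, and the function names are made up for this sketch.

```python
import math

# Assumptions (not stated in the post): religiosity is a proportion in
# [0, 1] and the natural log is used.
ATHEISM_OFFSET = 0.001   # offset named in the post
HOMICIDE_OFFSET = 1.0    # offset named in the post


def log_atheism(religiosity):
    """Reverse religiosity into atheism, then log-transform it."""
    atheism = 1.0 - religiosity
    return math.log(atheism + ATHEISM_OFFSET)


def log_homicide(rate_per_100k):
    """Log-transform a homicide rate after adding 1."""
    return math.log(rate_per_100k + HOMICIDE_OFFSET)


print(log_atheism(1.0))   # fully religious country: log(0.001), still finite
print(log_atheism(0.0))   # fully atheistic country: log(1.001), close to 0
print(log_homicide(0.0))  # a rate of 0 maps to log(1) = 0
```

The offsets matter only at the boundaries: without them, a fully religious country or a country with a homicide rate of 0 would have an undefined logarithm and drop out of the analysis.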

The controversial IQ variable did not require a transformation.

Bivariate Relationships

The next figure shows a plot of homicides as a function of the questionable intelligence measure (QIM). There is a visible negative correlation. However, the plot also highlights countries in Latin America and the United States. These countries have in common that they were established by decimating the indigenous population and bringing slaves from Africa to work for the European colonialists. It is notable that nations with a history of slavery have higher homicide rates than other nations. Thus, aside from economic inequality, racial inequality may be another factor that contributes to violence; slavery ended over 100 years ago, but racial inequality persists to this day. Former slave countries also tend to score lower on the QIM measure. Thus, slavery may partially account for the correlation between QIM and homicide rates.

The next plot shows homicide rates as a function of atheism. A value of 0 would mean the country is totally atheistic, while more negative values show increasing levels of religion. There is no strong relationship between religion and homicide rates. This replicates the results in the original article by Clark et al. Remember that their key finding was an interaction between QIM and religion. However, the plot also shows a clear distinction among less religious countries. Former slave countries are low in religion and have high homicide rates, while other countries (mainly in Europe) are low in religion and have low homicide rates.

Regression Models

To examine the unique contribution of different variables to the prediction of homicide rates, I conducted several regression analyses. I started with the QIM x religion interaction to see whether the interaction is robust to transformations of the predictor variables. The results clearly show the interaction and main effects for QIM and religion (t-values > 2 are significant at p < .05).

Next I added slavery as a predictor variable.

The interaction is no longer significant. This shows that the interaction emerged because former slave countries tend to score low on QIM and religion.
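
To illustrate how an interaction can vanish once a confounder is added, here is a minimal sketch of the two-step sequence (interaction model first, then the same model plus slavery). It uses made-up, noise-free toy numbers, not Clark et al.’s dataset, and a small hand-written least-squares routine (`ols`, a name invented for this sketch). It compares coefficients only and does not compute the t-values reported in the post.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up toy data (NOT the real dataset): (qim, log_atheism, slavery),
# with the two continuous predictors coded -1/+1 for simplicity.
rows = [
    (-1, -1, 0), (-1, 1, 1), (1, -1, 0), (1, 1, 0),
    (-1, -1, 0), (-1, 1, 1), (1, -1, 0), (1, 1, 1),
]
# The outcome has a slavery effect but, by construction, no interaction.
y = [2.0 - 0.5 * q + 0.1 * a + 1.0 * s for q, a, s in rows]

# Model 1: intercept, QIM, atheism, QIM x atheism.
X_reduced = [[1.0, q, a, q * a] for q, a, s in rows]
# Model 2: the same terms plus the slavery indicator.
X_full = [[1.0, q, a, q * a, s] for q, a, s in rows]

b_reduced = ols(X_reduced, y)
b_full = ols(X_full, y)

print("interaction without slavery:", round(b_reduced[3], 3))
print("interaction with slavery:   ", round(b_full[3], 3))
print("slavery coefficient:        ", round(b_full[4], 3))
```

In the toy data the slavery indicator is concentrated among cases that are low on QIM and low on religion. The model without slavery therefore assigns part of the slavery effect to the product term, and the apparent interaction disappears once slavery enters the model.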

I then added the GINI coefficient, the most widely used measure of income inequality, to the model. Income inequality was an additional predictor. The QIM x religion interaction remained non-significant.

I then added GDP to the model. A country’s wealth is strongly related to many positive indicators. Given the skewed distribution, I used log-GDP as a predictor, which is also the most common way economists use GDP.

GDP is another significant predictor, while the QIM x religion interaction remains non-significant. Meanwhile, the strong relationship between QIM and homicide rates has decreased from b = -.71 without controls to b = -.25 with controls. However, it is still significant. As noted earlier, QIM may reflect education and Clark et al. (2020) included a measure of educational attainment in their dataset. It correlates r = .68 with QIM. I therefore substituted QIM with education.

However, education did not predict homicide rates. Thus, QIM scores capture something about nations that the education measure does not capture.

We can compare the social justice variables (slavery, GDP, GINI) with the personal-attribute (atheist, QIM) variables. A model with the social justice variables explains 62% of the variation in homicide rates across nations.

The personal-attribute model explains only 40% of the variance.

As these predictors overlap, the personal-attributes add only 3% additional variance to the variance that is explained by slavery, income inequality, and wealth.
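
The three percentages reported above imply a simple variance decomposition. The sketch below only re-arranges the numbers from the text; the variable names are made up, and the implied values (65%, 25%, 37%) are derived here rather than reported in the post.

```python
# Values reported in the text.
r2_social = 0.62          # slavery, GDP, GINI
r2_personal = 0.40        # atheism, QIM
added_by_personal = 0.03  # gain when personal attributes join the social model

# Implied by the three numbers above (derived, not reported in the post).
r2_full = r2_social + added_by_personal           # all predictors together
unique_social = r2_full - r2_personal             # gain from adding social variables
shared = r2_full - unique_social - added_by_personal  # overlap of the two sets

print(f"full model:          {r2_full:.2f}")
print(f"unique to social:    {unique_social:.2f}")
print(f"unique to personal:  {added_by_personal:.2f}")
print(f"shared by both sets: {shared:.2f}")
```

Read this way, most of the variance that the personal-attribute model explains is variance that the social justice variables explain as well.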

Replicating Slavery’s Effect in the United States

The United States provides another opportunity to test the hypothesis that a legacy of slavery and racial inequality is associated with higher levels of homicides. I downloaded statistics about homicides (homicide stats). In addition, I used a measure of urbanization to predict homicides (urbanization). I also added a measure of income inequality (GINI). I classified states that fought for the Confederacy as slave states (civil war facts). Results were similar for different years in which homicide rates were available from 1996 to 2018. So, I used the latest data.

In a model with all predictor variables, slavery was the only significant predictor. Income inequality showed a trend, and urbanization was not a unique predictor. When urbanization was removed from the model, the effect of income inequality was a bit stronger.

Overall, these results are consistent with the cross-national data and suggest that a history of slavery and persistent racial inequality create social conditions that lead to more violence and homicides. These results are consistent with recent concerns that systemic racism contributes to the killing of civilians by other civilians and by police officers, who historically had the role of enforcing racial inequality.

Meta-Science Reflections

Clark et al.’s (2020) article is flawed in numerous ways. Ideally, the authors would have the decency to retract it. The main flaw is that the authors use a measure with questionable validity and never question the validity of that measure. This flaw is not unique to this article. It is a fundamental flaw that has also led to a large literature on implicit bias based on an invalid measure. The uncritical use of measures has to stop. A science without valid measures is not a science and statistical results that are obtained with invalid measures are not scientific results.

A second flaw of the article reflects the fact that psychologists are trained to conduct randomized laboratory experiments. Random assignment makes it easy to interpret statistically significant results. Unless something went really wrong or sampling error produced a false result, a statistically significant result means that the experimental manipulation influenced the dependent variable. Causality is built into the design. However, things are very different when we look at naturally occurring covariation because everything is correlated with everything. Observed relationships may not be causal and they can be produced by variables that were not measured. The only way to deal with this uncertainty is to carefully test competing theories. It is also necessary to be careful in the interpretation of results. Clark et al. (2020) failed to do so and made overly strong statements based on their correlational findings.

“Many scholars have argued that religion reduces violent behavior within human social groups. Here, we tested whether intelligence moderates this relationship. We hypothesized that religion would have greater utility for regulating violent behavior among societies with relatively lower average IQs than among societies with relatively more cognitively gifted citizens. Two studies supported this hypothesis”

This statement would be fine if they had conducted an experiment, but of course, it is impossible to conduct an experiment to examine this question. This also means it is no longer possible to use evidence as support for a hypothesis. Correlational evidence simply cannot verify a hypothesis. It can only falsify wrong theories. Clark et al. (2020) failed to acknowledge competing theories of homicides and to test their theory against competing theories.

The last meta-scientific observation is that all conclusions in science rest on a combination of data and assumptions. When the same data lead to different conclusions, like they did here, we get insights into researchers’ assumptions. Clark et al.’s (2020) assumptions were that (a) there are notable differences in intelligence between nations, (b) these differences are measured with high validity by Lynn and Meisenberg’s (2010) questionable IQ scores, and (c) homicides are caused by internal dispositions like being an atheist with low intelligence. Given Lynn and Meisenberg’s finding that their questionable measure correlates highly with skin tone, they also implicitly share the racist assumption that dark-skinned people are more violent because they are less intelligent. The present blog post shows that an entirely different story fits the data. Homicides are caused by injustice such as unfair distributions of wealth and discrimination and prejudice based on skin color. I am not saying that my interpretation of the data is correct because I am aware that alternative explanations are possible. However, I would rather have a liberal/egalitarian bias than a racist bias.

Racism decreased in the US, but not for Conservative Republicans

June 9, 2020 | Conservatism, General Social Survey, Historic Trend, Political Orientation, Racism, Time Trends, Trump | Ulrich Schimmack

Are Republicans more racist? I am not the only one who is asking themselves this question. In 2014, Nate Silver and Allison McCann examined this question using data from the General Social Survey (538.2014). The political landscape has changed since Trump was elected in 2016 and it is interesting to reexamine this question with the latest data.

Like Silver and McCann, I limit the analysis to White respondents because Black voters are much more likely to support the Democratic party. Thus, a trivial reason why Republicans are more likely to be racist is that they are more likely to be White. The more interesting question is whether there is an additional difference between White Republicans and White Democrats.

Political Orientation

The General Social Survey (GSS) asks two questions about political orientation. One question asks about party identification. Responses range from strong democrat to strong republican. To identify conservative Republicans, I classified respondents who responded with “independent – close to Republican”, “not very strong Republican”, and “strong republican” as Republicans.

Another question asks respondents to locate their political orientation on a liberal versus conservative dimension on a 7-point scale ranging from extremely liberal to extremely conservative. To identify conservatives, I combined the three response categories slightly conservative, conservative, and extremely conservative.

Figure 1 shows that responses to these two questions are not perfectly aligned. There are conservative Democrats and liberal Republicans. There are several ways to deal with this inconsistency. I chose to contrast conservative Republicans with the other three groups (liberal Republicans, conservative Democrats, & liberal Democrats).
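
The grouping rule described above can be written as a small classifier. This is a sketch under assumptions: the response labels are copied from this post, and the labels or numeric codes in the actual GSS data file may differ. The function and constant names are made up for illustration.

```python
# Response labels as quoted in the post; the actual GSS codebook
# labels and numeric codes may differ (assumption).
REPUBLICAN_IDS = {
    "independent – close to republican",
    "not very strong republican",
    "strong republican",
}
CONSERVATIVE_VIEWS = {
    "slightly conservative",
    "conservative",
    "extremely conservative",
}


def is_conservative_republican(party_id, political_view):
    """True only if both the party answer and the ideology answer qualify."""
    republican = party_id.strip().lower() in REPUBLICAN_IDS
    conservative = political_view.strip().lower() in CONSERVATIVE_VIEWS
    return republican and conservative


print(is_conservative_republican("strong republican", "conservative"))
print(is_conservative_republican("strong republican", "liberal"))
```

Everybody for whom the function returns False (liberal Republicans, conservative Democrats, and liberal Democrats) falls into the comparison group.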

The GSS also asked about voting behavior in past elections. In the 2018 survey, most conservative Republicans voted for Trump.

Racism

Racism is a loaded word with no clear meaning. While the majority of Democrats consider Donald Trump to be a racist, most Republicans disagree. Yet, we need a term to describe people who do not see or treat Black and White people as equal. Given the history of the United States, it is difficult for a White American to live up to the ideal of racial equality. However, there is variation in the degree to which White Americans treat Black and White people as equal or not. Thus, the question is not whether somebody is a racist or not, but the degree to which Whites treat people equally or unequally.

The GSS includes several questions that measure racism. Some of these questions were discontinued and others were added in later years. I found five questions that were asked in later years that measure racism.

1. Feeling Thermometer.
Feeling thermometers are a widely used measure of racism. One question is about feelings towards Black people. Another question is about feelings towards White people. The difference between these two ratings indicates how much people have more positive feelings towards White people than towards Black people. Feelings are assessed on a 9-point scale ranging from 1 = not at all close to 9 = very close. The difference score can range from -8 to + 8.

The Figure shows that most respondents indicate that they make no difference in their feelings based on race, while some participants report feeling closer to White people.

2. Interracial Marriages
The United States has a long history of racism that is reflected in laws against interracial marriages (Loving v. Virginia). Even today, some Americans do not approve of interracial marriages. The GSS asks respondents “What about having a close relative marry a Black person? Would you be very in favor of it happening, somewhat in favor, neither in favor nor opposed to it happening, somewhat opposed, or very opposed to it happening?” The Figure shows the distribution of the responses before and after 2016. The results show a decrease in opposition to mixed marriages, but even today many Americans are against somebody in their family marrying an African American.

3. Like other Immigrants
The third item asks respondents how much they agree with an item that ignores the special history of slavery, discrimination, and police brutality that African Americans are subjected to. “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without special favors.” Responses are made on a 5-point scale ranging from 1=strongly disagree to 5=strongly agree. The Figure shows that even today a majority of White Americans agree with this statement.

4. Opposing Affirmative Action
One question in the GSS asks about respondents’ attitudes towards affirmative action.

“Some people say that because of past discrimination, blacks should be given preference in hiring and promotion. Others say that such preference in hiring and promotion of blacks is wrong because it discriminates against whites. What about your opinion — are you for or against preferential hiring and promotion of blacks?” Responses are favor strongly, favor, oppose, and oppose strongly.

It is important to point out that opposition to affirmative action may be based on considerations other than racism. Nevertheless, the item can be used to measure racism because racists are much more likely to agree with the item than respondents who strive towards racial equality.

The advantage of this item for measuring racism is that it provides an opportunity for plausible deniability. It is possible to endorse the item without revealing that the response is motivated by racism. This may explain the high proportion of Americans who express strong opposition to affirmative action.

5. Spending on Improving the conditions of Blacks
The fifth question is part of a series of questions about public spending. One question asks whether respondents think the government is spending too little or too much on “improving the conditions of Blacks.” As with the previous item, respondents may answer “too much” for several reasons. For example, some may oppose taxation and government spending on anything. Thus, responding “too much” does not directly imply that the response is racially motivated, but individuals who are more racist are more likely to answer “too much.”
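
Of the five items, only the feeling thermometer (item 1) requires a computation before it can be used: the difference between two closeness ratings. A minimal sketch, with a made-up function name and a range check based on the 1 to 9 scale described above:

```python
def closeness_gap(close_to_white, close_to_black):
    """Difference score: positive values mean feeling closer to White people."""
    for rating in (close_to_white, close_to_black):
        if not 1 <= rating <= 9:
            raise ValueError("ratings must be on the 1-9 closeness scale")
    return close_to_white - close_to_black


print(closeness_gap(5, 5))  # no difference in closeness
print(closeness_gap(9, 1))  # largest possible pro-White gap
print(closeness_gap(1, 9))  # largest possible pro-Black gap
```

The other four items are single responses and need no preprocessing beyond making sure that higher values consistently indicate the less egalitarian answer.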

A Measurement Model of Racism

Responses to each of the five items can be influenced by many factors (causes) other than racism. Thus, we cannot simply look at individual items to determine how racist somebody is. Psychologists have developed measurement models to examine whether responses to items are driven by a common cause. Finding evidence for a common cause does not conclusively show that the cause is racism, but it makes it more likely.

The Figure shows the measurement model of racism. In this model racism is an attribute (trait) that varies across individuals. Individuals with high racism are more likely to respond to the five items in ways that express a dislike of African Americans. The numbers next to the arrows show how much a specific item reflects racism. Interestingly, the feeling thermometer is a relatively weak measure of racism. One concern could be that the racism factor is dominated by issues related to money, but the opposition to a Black family member shows that racism is not limited to equity issues.

Another concern is that responses to a survey are different from actual traits. Some respondents may downplay their racism because they are aware that racism is offensive. The GSS makes it possible to test this hypothesis because respondents were sometimes interviewed by Black interviewers. The next figure shows that White respondents respond to the question in a less racist manner when they are interviewed by a black interviewer. This effect is general across all five items, but especially strong for the feeling thermometer and the spending question.

In short, measuring racism is difficult and no single question provides a clear and unambiguous measure of racism. However, there is clear evidence that people respond to the five questions in a consistent manner that is more or less racist, suggesting that the factor underlying their responses can be used as a measure of racism.

Political Orientation and Racism

The next analysis examined differences between conservative republicans and other Americans. The results show that conservative republicans are more racist (.769 standard deviations). In addition, they oppose affirmative action and spending on Black issues. This shows that politics also influences responses to the items independent of racism, but even when we allow for such extra relationships, there is a strong relationship between identifying as a conservative republican and racism. In psychology a difference of .8 standard deviations is considered a large effect. In intelligence testing, a standard deviation is 15 points, and a difference of .769 standard deviations would be a difference of about 12 IQ points.

Political Orientation and Racism Over Time

The first figures showed that racism was lower in 2016 and 2018 than in the years before. To examine whether changes in racism differ as a function of political orientation, I added time to the model and examined different time trends for conservative republicans and others. I used presidents to measure time, with the 1990s as the reference period. I counted 2016 as a Trump year, although he was only elected at the end of that year, so that the Trump period has two years of observations (2016, 2018). The results of the model showed a significant interaction between political orientation and time. While racism levels remained nearly unchanged for conservative republicans, they decreased for other Americans.
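
Coding time by presidential era amounts to dummy coding with a reference category. The sketch below shows one way to do this. Only two details come from the text: the 1990s serve as the reference period and 2016 counts as a Trump year. The cut-off years for the Bush and Obama eras, and all names in the code, are assumptions made for illustration.

```python
# Non-reference eras; the reference period ("1990s") gets all zeros.
ERAS = ("bush", "obama", "trump")


def era_of(year):
    """Map a survey year to an era (Bush/Obama cut-offs are assumptions)."""
    if year >= 2016:   # from the post: 2016 is counted as a Trump year
        return "trump"
    if year >= 2009:   # assumption
        return "obama"
    if year >= 2001:   # assumption
        return "bush"
    return "1990s"     # reference period


def era_dummies(year):
    """One 0/1 indicator per non-reference era."""
    era = era_of(year)
    return {name: int(name == era) for name in ERAS}


def interaction_terms(year, conservative_republican):
    """Era x group products that let time trends differ between groups."""
    group = int(conservative_republican)
    return {f"{name}_x_consrep": value * group
            for name, value in era_dummies(year).items()}


print(era_dummies(1996))              # reference period: all zeros
print(era_dummies(2018))              # Trump era indicator set to 1
print(interaction_terms(2018, True))  # product term is 1 only for this group
```

With this coding, each era coefficient is the change relative to the 1990s for other Americans, and each product term is the additional change for conservative republicans.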

The figure shows a dramatic difference in racism during the Trump years. The difference of 1.4 standard deviations corresponds to a difference of 21 points in IQ testing. Interestingly, the difference is not due to increasing racism among conservative republicans, but rather decreasing racism among other Americans.
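
The translation of standardized differences into the IQ metric is a single multiplication by the standard deviation of 15. A small sketch (the function name is made up) for the two effect sizes reported in this post:

```python
IQ_SD = 15  # standard deviation of the IQ scale


def sd_units_to_iq_points(d):
    """Express a difference in standard deviations on the IQ metric."""
    return d * IQ_SD


for d in (0.769, 1.4):
    points = sd_units_to_iq_points(d)
    print(f"d = {d}: {points:.1f} IQ points, rounded {round(points)}")
```

The first value is 11.5 before rounding, which is why it is reported as roughly 12 points.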

It is tempting to suggest that this large difference in racism explains the much stronger response to George Floyd’s murder by a police officer, while similar events in previous years did not spark the widespread protests the US has seen in recent weeks. The data suggest that America as a whole is moving towards more racial equality and that many White Americans empathize with African Americans. However, conservative republicans are an exception. Many of them are opposed to racial equality.

Complicit Racism

Psychological research over the past two decades has focused a lot on implicit racism. The idea is that individuals can hide their racism from themselves. The notion of unconscious bias or racism is controversial and there is no solid scientific evidence for implicit racism. Most people know whether they would welcome a Black family member or not. A more common form of racism is a conscious choice to ignore the racism of others, whether it is a colleague, a friend, a family member or a party member. You know they are racist, but you don’t say anything about it. Most people, including conservative republicans, are not surprised by the finding that conservative republicans are more likely to be racist. We all know examples of racist conservative republicans and the Republican party carefully avoids alienating racist voters. The current Republican party suffers from complicit racism. Some party strategists even admit that complicit racism was used to win majorities. However, most Americans are increasingly appalled by the complicit racism that is inconsistent with modern American values. Just like the civil war and the civil rights movement, America today is still struggling with the “birth defect of slavery” (Condoleezza Rice).

Political Science and Political Bias

It is not hard to figure out my personal political views and I have no reason to hide them. I am not a fan of conservative republicans and their current leader. This may have influenced the way I analyzed and presented the results. Fortunately, the data are openly available and conservatives are welcome to analyze the data and present their findings. In this spirit, I consider this blog post as a thesis that is awaiting a critical examination and response. However, is it really in doubt that respondents who voted for Trump are more likely to oppose affirmative action and to object to a Black family member? Many probably think that these results merely show the obvious.

Implicit Racism, Starbucks, and the Failure of Experimental Social Psychology

April 19, 2018 | Implicit Bias, Prejudice, Racism | Starbucks | Ulrich Schimmack

Implicit racism is in the news again (CNN). A manager of a Starbucks in Philadelphia called 911 to ask police to remove two Black men from the coffee store because they had not purchased anything. The problem is that many White customers frequent Starbucks without purchasing things and the police are not called. The incident caused widespread protests and Starbucks announced that it would close all of its stores for “implicit bias training.”

NAACP president and CEO Derrick Johnson explains the need for training in this quote.

“The Starbucks situation provides dangerous insight regarding the failure of our nation to take implicit bias seriously,” said the group’s president and CEO Derrick Johnson in a statement. “We refuse to believe that our unconscious bias –the racism we are often unaware of—can and does make its way into our actions and policies.”

But was it implicit bias? It does not matter. CEO Derrick Johnson could have talked about racism without changing what happened or the need for training.

“The Starbucks situation provides dangerous insight regarding the failure of our nation to take racism seriously,” said the group’s president and CEO Derrick Johnson in a statement. “We refuse to believe that we are racists and that racism can and does make its way into our actions and policies.”

We have not heard from the store manager why she called the police. This post is not about a single incident at Starbucks because psychological science can rarely provide satisfactory answers to single events. However, the call for training of thousands of Starbucks’ employees is not a single event. It implies that social psychologists have developed scientific ways to measure “implicit bias” and developed ways to change it. This is the topic of this post.

What is implicit bias and what can be done to reduce it?

The term “implicit” has a long history in psychology, but it rose to prominence in the early 1990s when computers became more widely used in psychological research. Computers made it possible to present stimuli on screens rather than on paper and to measure reaction times rather than self-ratings. Computerized tasks were first used in cognitive psychology to demonstrate that people have associations that can influence their behaviors. For example, participants are faster to determine that “doctor” is a word if the word is presented after a related word like “hospital” or “nurse.”

The term implicit is used for effects like this because the effect occurs without participants’ intention, conscious reflection, or deliberation. They do not want to respond this way, but they do, whether they want to or not. Implicit effects can occur with or without awareness, but they are generally uncontrollable.

After a while, social psychologists started to use computerized tasks that were developed by cognitive psychologists to study social topics like prejudice. Most studies used White participants to demonstrate prejudice with implicit tasks. For example, the association task described above can be easily modified by showing traditionally White or Black names (in the beginning computers could not present pictures) or faces.

Given the widespread prevalence of stereotypes about African Americans, many of these studies demonstrated that White participants respond differently to Black or White stimuli. Nobody doubts these effects. However, there remain two unanswered questions about these effects.

What (the fuck) is Implicit Racial Bias?

First, do responses in this implicit task with racial stimuli measure a specific form of prejudice? That is, do implicit tasks measure plain old prejudice with a new measure or do they actually measure a new form of prejudice? The main problem is that psychologists are not very good at distinguishing constructs and measures. This goes back to the days when psychologists equated measures and constructs. For example, to answer the difficult question whether IQ tests measure intelligence, it was simply postulated that intelligence is what IQ tests measure. Similarly, there is no clear definition of implicit racial bias. In social psychology implicit racism is essentially whatever leads to different responses to Black and White stimuli in an implicit task.

The main problem with this definition is that different implicit tasks show low convergent validity. Somebody can take two different “implicit tests” (the popular Implicit Association Test, IAT, or the Affect Misattribution Procedure, AMP) and get different results. The correlations between two different tests range from 0 to .3, which means that the tests disagree with each other more than they agree.
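
To make these correlations concrete, it helps to square them: the squared correlation is the proportion of variance that two measures share. The sketch below is a generic illustration with a made-up function name, not an analysis of actual test data.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation."""
    return r ** 2


for r in (0.0, 0.1, 0.2, 0.3):
    print(f"r = {r:.1f}: {shared_variance(r):.0%} shared variance")
```

Even at the upper end of the reported range, two implicit tests would share less than a tenth of their variance.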

Twenty years after the first implicit tasks were used to study prejudice we still do not know whether implicit bias even exists or how it could be measured, despite the fact that these tests are made available to the public to “test their racial bias.” These tests do not meet the standards of real psychological tests and nobody should take their test scores too seriously. A brief moment of self-reflection is likely to provide better evidence about your own feelings towards different social groups. How would you feel if somebody from this group moved in next door? How would you feel if somebody from this group married your son or daughter? Responses to questions like this have been used for over 100 years and they still show that most people have a preference for their own group over most other groups. The main concern is that respondents may not answer these survey questions honestly. But if you do so in private for yourself and you are honest with yourself, you will know better how prejudiced you are towards different groups than by taking an implicit test.

What was the Starbucks’ manager thinking or feeling when she called 911? The answer to this question would be more informative than giving her an implicit bias test.

Is It Possible to Reduce Implicit Bias?

Any scientific answer to this question requires measuring implicit bias. The ideal study to examine the effectiveness of any intervention is a randomized controlled trial. In this case it is easy to do so because many White Americans who are prejudiced do not want to be prejudiced. They learned to be prejudiced through parents, friends, school, or media. Racism has been part of American culture for a long time and even individuals who do not want to be prejudiced respond differently to White and African Americans. So, there is no ethical problem in subjecting participants to an anti-racism training program. It is like asking smokers who want to quit smoking to participate in a test of a new treatment for nicotine addiction.

Unfortunately, social psychologists are not trained in running well-controlled intervention studies. They are mainly trained to do experiments that examine the immediate effects of an experimental manipulation on some measure of interest. Another problem is that published articles typically report only successful experiments. This publication bias leads to the wrong impression that it may be easy to change implicit bias.

For example, one of the leading social psychologists on implicit bias published an article with the title “On the Malleability of Automatic Attitudes: Combating Automatic Prejudice With Images of Admired and Disliked Individuals” (Dasgupta & Greenwald, 2001). The title makes two (implicit) claims: implicit attitudes can change (they are malleable), and the article introduces a method that successfully reduced prejudice (combating it). This article was published 17 years ago and it has been cited 537 times so far.

Study 1

The first experiment relied on a small sample of university students (N = 48). The study had three experimental conditions with n = 18, 15, and 15 for each condition. It is now recognized that studies with fewer than n = 20 participants per condition are questionable (Simmons et al., 2011).

The key finding in this study was that scores on the Implicit Association Test (IAT) were lower when participants were exposed to positive examples of African Americans (e.g., Denzel Washington) and negative examples of European Americans (e.g., Jeffrey Dahmer, a serial killer) than in the control condition, F(1, 31) = 5.23, p = .023.

The observed mean difference is d = .80. This is considered a large effect. For an intervention to increase IQ it would imply an increase by 80% of a standard deviation or 12 IQ points. However, in small samples, these estimates of effect size vary a lot. To get an impression of the range of variability it is useful to compute the 95%CI around the observed effect size. It ranges from d = .10 to 1.49. This means that the actual effect size could be just 10% of a standard deviation, which in the IQ analogy would imply an increase by just 1.5 points. Essentially, the results merely suggest that there is a positive effect, but they do not provide any information about the size of the effect. It could be very small or it could be very large.
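The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that the two conditions being compared had n = 18 and n = 15 (which matches the 31 degrees of freedom), and it uses a common large-sample approximation for the confidence interval rather than whatever method produced the interval reported above. The function names are made up for this example.

```python
import math

IQ_SD = 15  # conventional standard deviation of IQ scores


def d_from_f(f_value, n1, n2):
    """Recover Cohen's d for a two-group comparison from F(1, n1 + n2 - 2)."""
    t = math.sqrt(f_value)  # with one numerator df, F equals t squared
    return t * math.sqrt(1 / n1 + 1 / n2)


def approx_ci(d, n1, n2, z_crit=1.96):
    """Large-sample 95% CI for d (an approximation, assumed for this sketch)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z_crit * se, d + z_crit * se


def to_iq_points(d):
    """Express a standardized effect size in the IQ analogy."""
    return d * IQ_SD


# Assumed group sizes: 18 and 15 (the two conditions behind F(1, 31)).
d = d_from_f(5.23, 18, 15)
low, high = approx_ci(d, 18, 15)
print(f"d = {d:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
print(f"IQ analogy: {to_iq_points(d):.1f} points "
      f"(from {to_iq_points(low):.1f} to {to_iq_points(high):.1f})")
```

The approximation lands close to, but not exactly on, the interval reported above, which is what one would expect when an exact method is replaced with a large-sample formula in a sample this small.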

Unusual for social psychology experiments, the authors brought participants back 24 hours after the manipulation to see whether the brief exposure to positive examples had a lasting effect on IAT scores. As the results were published, we already know that it did. The only question is how strong the evidence was.

The result remained just significant, F(1, 31) = 4.16, p = .04999. A p-value greater than .05 would be non-significant, meaning the study provided insufficient evidence for a lasting change. More troublesome is that the 95%CI around the observed mean difference of d = .73 ranged from d = .01 to 1.45. This means it is possible that the actual effect size is just 1% of a standard deviation or 0.15 IQ points. The small sample size simply makes it impossible to say how large the effect really is.

Study 2

Study 1 provided encouraging results in a small sample. A logical extension for Study 2 would be to replicate the results of Study 1 with a larger sample in order to get a better sense of the size of the effect. Another possible extension could be to see whether repeated presentations of positive examples over a longer time period can have effects that last longer than 24 hours. However, multiple-study articles in social psychology are rarely programmatic in this way (Schimmack, 2012). Instead, they are more a colorful mosaic of studies that were selected to support a good story like “it is possible to combat implicit bias.”

The sample size in Study 2 was reduced from 48 to 26 participants. This is a terrible decision because the results in Study 1 were barely significant and reducing sample sizes increases the risk of a false negative result (the intervention actually works, but the study fails to show it).
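To see why a smaller sample raises the risk of a false negative, here is a rough power calculation. It is a sketch, not a reanalysis: it uses a normal approximation to the two-group test, the true effect sizes are hypothetical inputs, and the 13/13 split for Study 2 is an assumption because the article's group sizes are not given here.

```python
import math
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided, two-group comparison."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected test statistic if the true standardized effect is d.
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    return (1 - norm.cdf(z_crit - ncp)) + norm.cdf(-z_crit - ncp)


# Hypothetical true effect sizes; the group sizes for N = 26 are assumed.
for true_d in (0.8, 0.5, 0.2):
    larger = approx_power(true_d, 18, 15)   # comparison used in Study 1
    smaller = approx_power(true_d, 13, 13)  # assumed even split of 26
    print(f"true d = {true_d:.1f}: power {larger:.2f} (n = 18/15) "
          f"vs. {smaller:.2f} (n = 13/13)")
```

Whatever the true effect is, the smaller design has less power, so a real effect is more likely to be missed.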

The purpose of Study 2 was to generalize the results of racial bias to aging bias. Instead of African and European Americans, participants were exposed to positive and negative examples of young and old people and performed an age-IAT (old vs. young).

The statistical analysis showed again a significant mean difference, F(1, 24) = 5.13, p = .033. However, the 95%CI again showed a wide range of possible effect sizes from d = .11 to 1.74. Thus, the study provides no reliable information about the size of the effect.

Moreover, it has to be noted that Study 2 did not report whether a 24-hour follow-up was conducted or not. Thus, there is no replication of the finding in Study 1 that a small intervention can have an effect that lasts 24 hours.

Publication Bias: Another Form of Implicit Bias [the bias researchers do not want to talk about in public]

Significance tests are only valid if the data are based on a representative sample of possible observations. However, it is well known that most journals, including social psychology journals, publish only successful studies (p < .05) and that researchers use questionable research practices to meet this requirement. Even two studies are sufficient to examine whether the results are representative or not.

The Test of Insufficient Variance examines whether reported p-values are more similar than we would expect based on a representative sample of data. Selection for significance reduces variability in p-values because p-values greater than .05 are missing.

This article reported a p-value of .023 in Study 1 and .033 in Study 2. These p-values were converted into z-values: 2.27 and 2.13, respectively. The variance for these two z-scores is 0.01. Given the small sample sizes, it was necessary to run simulations to estimate the expected variance for two independent p-values in studies with 24 and 31 degrees of freedom. The expected variance is 0.875. The probability of observing a variance of 0.01 or less with an expected variance of 0.875 is p = .085. This finding raises concerns about the assumption that the reported results were based on a representative sample of observations.
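The steps of this calculation can be sketched as follows. The code takes the simulated expected variance of 0.875 as a given input (the simulation itself is not reproduced), and it assumes that the ratio of observed to expected variance for two z-scores can be referred to a chi-square distribution with one degree of freedom. The function names are invented for this illustration.

```python
import math
from statistics import NormalDist, variance


def p_to_z(p_two_sided):
    """Convert a two-sided p-value into the corresponding z-score."""
    return NormalDist().inv_cdf(1 - p_two_sided / 2)


def tiva_two_studies(p_values, expected_variance):
    """Left-tail probability of the observed variance of two z-scores."""
    if len(p_values) != 2:
        raise ValueError("this sketch only handles exactly two p-values")
    z_scores = [p_to_z(p) for p in p_values]
    observed = variance(z_scores)  # sample variance (k - 1 in the denominator)
    # Assumption: (k - 1) * observed / expected follows chi-square(k - 1);
    # with k = 2 the ratio itself is referred to chi-square with 1 df.
    ratio = observed / expected_variance
    prob = math.erf(math.sqrt(ratio / 2))  # chi-square(1) CDF at ratio
    return z_scores, observed, prob


z_scores, observed, prob = tiva_two_studies([0.023, 0.033], 0.875)
print("z-scores:", [round(z, 2) for z in z_scores])
print(f"observed variance = {observed:.3f}, left-tail p = {prob:.3f}")
```

With more than two studies the same logic applies, but the chi-square distribution has more degrees of freedom and the closed-form shortcut used here no longer works.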

In conclusion, the widely cited article, whose title promises that scores on implicit bias measures are malleable and that implicit bias can be combated, provided only very preliminary results. They are by no means conclusive evidence that merely presenting a few positive examples of African Americans reduces prejudice.

A Large-Scale Replication Study

Nine years later, Joy-Gaba and Nosek (2010) examined whether the results reported by Dasgupta and Greenwald could be replicated. The title of the article “The Surprisingly Limited Malleability of Implicit Racial Evaluations” foreshadows the results.

Abstract
“Implicit preferences for Whites compared to Blacks can be reduced via exposure to admired Black and disliked White individuals (Dasgupta & Greenwald, 2001). In four studies (total N = 4,628), while attempting to clarify the mechanism, we found that implicit preferences for Whites were weaker in the “positive Blacks” exposure condition compared to a control condition (weighted average d = .08). This effect was substantially smaller than the original demonstration (Dasgupta & Greenwald, 2001; d = .82).”

On the one hand, the results can be interpreted as a successful replication because the study with 4,628 participants again rejected the null-hypothesis that the intervention has absolutely no effect. However, the mean difference in the replication study is only d = .08, which corresponds to an effect size estimate of 1.2 IQ points if the study had tried to raise IQ. Moreover, it is clear that the original study was only able to report a significant result because the observed mean difference in this study was inflated roughly tenfold (d = .82 versus d = .08).

Study 1

Participants in Study 1 were Canadian students (N = 1,403). The study differed in that it separated exposure to positive Black examples and negative White examples. Ideally, real-world training programs would aim to increase liking of African Americans rather than make people think about White people as serial killers. So, the use of only positive examples of African Americans makes an additional contribution by examining a positive intervention without negative examples of Whites. The study also included age to replicate Study 2.

Like US Americans, Canadian students also showed a preference for Whites over Blacks on the Implicit Association Test. So failures to replicate the intervention effect are not due to a lack of racism in Canada.

A focused analysis of the race condition showed no effect of exposure to positive Black examples, t(670) = .09, p = .93. The 95%CI of the mean difference in this study ranged from -.15 to .16. This means that with a maximum error probability of 5%, it is possible to rule out effect sizes greater than .16. This finding is not entirely inconsistent with the original article because the original study was inconclusive about effect sizes.

The replication study is able to provide a more precise estimate of the effect size and the results show that the effect size could be 0, but it could not be d = .2, which is typically used as a reference point for a small effect.

Study 2a

Study 2a reintroduced the original manipulation that exposed participants to positive examples of African Americans and negative examples of European Americans. This study showed a significant difference between the intervention condition and a control condition that exposed participants to flowers and insects, t(589) = 2.08, p = .038. The 95%CI for the effect size estimate ranged from d = .02 to .35.

It is difficult to interpret this result in combination with the result from Study 1. First, the results of the two studies are not significantly different from each other. It is therefore not possible to conclude that manipulations with negative examples of Whites are more effective than those that just show positive examples of Blacks. In combination, the results of Study 1 and 2a are not significant, meaning it is not clear whether the intervention has any effect at all. Nevertheless, the significant result in Study 2a suggests that presenting negative examples of Whites may influence responses on the race IAT.
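Both claims can be checked with a rough calculation. The post does not say which method was used, so the sketch below is an assumption: it treats the two t-values as z-scores (reasonable with several hundred degrees of freedom) and combines them with equal weights, in the style of Stouffer's method.

```python
import math
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def combine_and_compare(z1, z2):
    """Equal-weight combination and difference test for two z-scores."""
    combined = (z1 + z2) / math.sqrt(2)    # is the pooled effect non-zero?
    difference = (z2 - z1) / math.sqrt(2)  # do the two studies differ?
    return two_sided_p(combined), two_sided_p(difference)


# t(670) = .09 in Study 1 and t(589) = 2.08 in Study 2a, treated as z-scores.
p_combined, p_difference = combine_and_compare(0.09, 2.08)
print(f"combined effect: p = {p_combined:.2f}")
print(f"difference between studies: p = {p_difference:.2f}")
```

Under these assumptions neither the combined effect nor the difference between the two studies reaches p < .05, in line with the interpretation above.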

Study 2b

Study 2b is an exact replication of Study 2a. It also replicated a significant mean difference between participants exposed to positive Black and negative White examples and the control condition, t(788) = 1.99, p = .047 (reported as p = .05). The 95%CI ranges from d = .002 to d = .28.

The problem is that now three studies produced significant results with exposure to positive Black and negative White examples (Original Study 1; replication Study 2a & 2b) and all three studies had just significant p-values (p = .023, p = .038, p = .047). This is unlikely without selection of data to attain significance.

Study 3

The main purpose of Study 3 was to compare an online sample, an online student sample, and a lab student sample. None of the three samples showed a significant mean difference.

Online sample: t(999) = .96, p = .34

Online student sample: t(93) = 0.51, p = .61

Lab student sample: t(75) = 0.70, p = .48

The non-significant results for the student samples are not surprising because sample sizes are too small to detect small effects. The non-significant result for the large online sample is more interesting. It confirms that the two p-values in Studies 2a and 2b were too similar. Study 3 produces the greater variability in p-values that is expected without selection. Given the small effect size, the variability was increased by a non-significant result rather than a highly significant one.

Conclusion

In conclusion, there is no reliable evidence that merely presenting a few positive Black examples alters responses on the Implicit Association Test. There is some suggestive evidence that presenting negative White examples may reduce prejudice, presumably by decreasing favorable responses to Whites, but even this effect is very weak and may not last more than a few minutes or hours.

The large replication study shows that the highly cited original article provided misleading evidence that responses on implicit bias measures can be easily and dramatically changed by presenting positive examples of African Americans. If it were this easy to reduce prejudice, racism wouldn’t be the problem that it still is.

Newest Evidence

In a major effort, Lai et al. (2016) examined several interventions that might be used to combat racism. The first problem with the article is that the literature review fails to mention Joy-Gaba and Nosek’s finding that interventions were rather ineffective or evidence that implicit racism measures show little natural variation over time (Cunningham et al., 2001). Instead they suggest that the “dominant view has changed over the past 15 years to one of implicit malleability” [what they mean is malleability of responses on implicit tasks with racial stimuli]. While this may accurately reflect changes in social psychologists’ opinions, it ignores that there is no credible evidence to suggest that implicit attitude measures are malleable.

More important, the study also failed to find evidence that a brief manipulation could change performance on the IAT a day or more later, despite a sample large enough to detect even small lasting effects. However, some manipulations produced immediate effects on IAT scores. The strongest effect was observed for a manipulation that required vivid imagination.

Vivid counterstereotypic scenario.

Participants in this intervention read a vivid second-person story in which they are the protagonist. The participant imagines walking down a street late at night after drinking at a bar. Suddenly, a White man in his forties assaults the participant, throws him/her into the trunk of his car, and drives away. After some time, the White man opens the trunk and assaults the participant again. A young Black man notices the second assault and knocks out the White assailant, saving the day. After reading the story, participants are told the next task (i.e., the race IAT) was supposed to affirm the associations: White = Bad, Black = Good. Participants were instructed to keep the story in mind during the IAT.

When given this instruction, the pro-White bias in the IAT was reduced. However, one day later (Study 2) or two or three days later (Study 1) IAT performance was not significantly different from a control condition.

In conclusion, social psychologists have found out something that most people already know. Attitudes, including prejudice, are stable and hard to change, even when participants want to change them. A simple, 5-minute manipulation is not an intervention and it will not produce lasting changes in attitudes.

General Discussion

Social psychology has failed Black people who would like to be treated with the same respect as White people and White people who do not want to be racist.

Since Martin Luther King gave his dream speech, America has made progress towards a goal of racial equality without the help of social psychologists. Nevertheless, racial bias remains a problem, but social psychologists are too busy with sterile experiments that have no application to the real world (No! Starbucks’ employees should not imagine being abducted by White sociopaths to avoid calling 911 on Black patrons of their stores). Performance on an implicit bias test is only relevant if it predicts behavior, and it doesn’t do that very well.

The whole notion of implicit bias is a creation by social psychologists without scientific foundations, but 911 calls that kill Black people are real. Maybe Starbucks could fund some real racism research at Howard University because the mostly White professors at elite universities seem to be unable to develop and test real interventions that can influence real behavior.

And last but not least, don’t listen to self-proclaimed White experts.

Social psychologists who have failed to validate measures and failed to conduct real intervention studies that might actually work are not experts. It doesn’t take a Ph.D. to figure out some simple things that can be taught in a one-day workshop for Starbucks’ employees. After all, the goal is just to get employees to treat all customers equally, which doesn’t even require a change in attitudes.

Here is one simple rule. If you are ready to call 911 to remove somebody from your coffee shop and the person is Black, ask yourself before you dial whether you would do the same if the person were White and looked like you or your brother or sister. If so, go ahead. If not, don’t touch that dial. Let them sit at a table like you let dozens of other people sit at their table because you make most of your money from people on the go anyways. Or buy them a coffee, or do something, but think twice or three times before you call the police.

And so what if it is just a PR campaign. It is a good one. I am sure there are a few people who would celebrate a nation-wide racism training day for police (maybe without shutting down all police stations).

Real change comes from real people who protest. Don’t wait for the academics to figure out how to combat automatic prejudice. They are more interested in citations and further research than in providing real solutions to real problems. Trust me, I know. I am (was?) a White social psychologist myself.
