The prompt for this essay is my personal experience with accusations of racism in response to my collaboration with my colleague Judith Andersen and her research team, who investigated the influence of race on shooting errors in police officers’ annual certification (Andersen, Di Nota, Boychuk, Schimmack, & Collins, 2023a). Our article was heavily criticized as racially insensitive and racially biased (Williams et al., 2023). We responded to the specific criticisms of our article (Andersen, Di Nota, Boychuk, Schimmack, & Collins, 2023b). This essay takes a broader perspective on the study of race-related topics in psychological science. It is also entirely based on my own experiences and views, and I do not speak for my colleagues.
Science
The term science is used to distinguish claims that are backed up by scientific facts from claims that are based on other evidence or belief systems. For people who believe in science, these claims have a stronger influence on their personal belief systems than other claims. Take “flat-earth theorists” as an example. Most educated people these days believe that the Earth is round and point to modern astronomy as a science that supports this claim. However, some people seriously maintain the belief that the Earth is flat (https://en.wikipedia.org/wiki/Behind_the_Curve). Debates between individuals or groups who “follow the science” and those who do not are futile. In this regard, believing in science is like a religion. This article is addressed to readers who “believe in science.”
What does it mean to believe in science? A fundamental criterion that distinguishes science from other belief systems is falsifiability. At some point, empirical evidence has to be able to correct pre-existing beliefs. For this to happen, the evidence has to be strong. For example, there should be little doubt about the validity of the measures (e.g., thermometers are good measures of temperature) and the replicability of the results (different research teams obtain the same results). When these preconditions are fulfilled, scientific discoveries are made and knowledge is gained (e.g., better telescopes produce new discoveries in astronomy, microscopes showed the influence of bacteria on diseases, etc.). The success of Covid-19 vaccines (if you believe in science) was possible due to advances in microbiology. The modern world we live in would not exist without actions by individuals who believe in science.
Psychological Science
Psychological science emerged in the late 19th century as an attempt to use the scientific method to study human experiences and behavior. The biggest success stories in psychological science can be found in areas that make it possible to conduct tightly controlled laboratory studies. For example, asking people to name the ink color of color words that are printed in a matching or a mismatching color shows a robust effect: it is harder to name the color when the word does not match (e.g., saying “green” when the word purple is printed in green).

Psychological science of basic phenomena like perception and learning has produced many robust scientific findings. Many of these findings are so robust because they are universal; that is, shared by all humans. This is consistent with other evidence that humans are more alike than different from each other and that peripheral differences like height, hair texture, and pigmentation are superficial differences and not symptoms of clearly distinguishable groups of humans (different races).
Social Psychology
Social psychology emerged as a sub-discipline in psychological science in the 1950s. A major goal of social psychology was to use the methods of psychological science to study social behaviors with bigger social implications than the naming of colors. The most famous studies from the 1950s and 1960s tried to explain the behavior of Germans who were involved in the Holocaust during World War II. The famous Milgram experiments, for example, showed that social pressure can have a strong influence on behavior. Asch showed that conformity pressure can make people say things that are objectively false. These studies are still powerful today because they used actual behaviors as the outcome. In Milgram’s studies, participants were led to believe that they gave electric shocks to another person who screamed in pain.
From the beginning, social psychologists were also interested in prejudice (Allport, 1954), at a time when the United States was segregated and blatantly racist. White Americans’ racial attitudes were easy to study because White Americans openly admitted that they did not consider White and Black Americans to be equal. For example, in the 1950s, nearly 100% of Americans disapproved of interracial marriages, which were also illegal in some states at that time.
It was more difficult to study the influence of racism on behavior. To ensure that behavior is influenced by an individual’s race and not some other factor (psychology jargon for cause), it is necessary to keep all other causes constant and then randomly assign participants to the two conditions and show a difference in outcome. My search for studies of this type revealed only a handful of studies with small student samples that showed no evidence of prejudice (e.g., Genthner & Taylor, 1973). There are many reasons why these studies may have failed to produce evidence of prejudice. For example, participants knew that they were in a study and that their behaviors were observed, which may have influenced how they behaved. Most important is the fact that the influence of prejudice on behavior was not a salient topic in social psychology.
This changed in the late 1980s (at a time when I became a student of psychology), when social psychologists became interested in unconscious processes that were called implicit processes (Devine, 1989). The novel idea was that racial biases can influence behavior outside of conscious awareness. Thus, some individuals might claim that they have no prejudices, but their behaviors show otherwise. Twenty years later, this work led to the claim that most White people have racial biases that influence their behavior even if they do not want them to (Banaji & Greenwald, 2013).
Notably, in the late 1980s, 40% of US Americans still opposed interracial marriages, showing that consciously accessible, old-fashioned racism was still prevalent in the United States. However, the primary focus of social psychologists was not the study of prejudice, but the study of unconscious/implicit processes. Implicit prejudice was just one of many implicit topics under investigation.
While the implicit revolution led to hundreds of studies that examined White people’s behaviors in response to Black and White persons, the field also made an important methodological change. Rather than studying real behaviors toward real people, most studies examined how fast participants can press a button in response to a stimulus (e.g., a name, a face, or simply the words Black/White) on a computer screen. The key problem with this research is that button presses on computer screens are not the same as button presses on dating profiles or pressing the trigger on a gun during a use of force situation.
This does not mean that these studies are useless, but it is evident that they cannot produce scientific evidence about the influence of race on behavior in the real world. In the jargon of psychological science, these studies lack external validity (i.e., the results cannot be generalized from button presses in computer tasks to real world behaviors).
Psychological Science Lacks Credibility
Psychology faces many challenges to be recognized as a science equal to physics, chemistry, or biology. One major challenge is that the behaviors of humans vary a lot more than the behaviors of electrons, atoms, or cells. As a result, many findings in social psychology are general trends that explain only a small portion of the variability in behavior (e.g., some White people are in interracial relationships). To deal with this large amount of variability (noise, randomness), psychologists rely on statistical methods that aim to detect small effects on the variability in behavior. Since the beginning of psychological science, the method used to find these effects has been null-hypothesis significance testing, or simply significance testing (Is p < .05?). Although this method has been criticized for decades, it continues to be taught to undergraduate students and is used to make substantive claims in research articles.
The problem with significance testing is that it is designed to confirm researchers’ hypotheses, but it cannot falsify them. Thus, the statistical tool cannot serve the key function of science to inform researchers that their ideas are wrong. As researchers are human and humans already have a bias to find evidence that supports their beliefs, significance testing is an ideal tool for scientists to delude themselves that their claims are supported by scientific evidence (p < .05), when their beliefs are wrong.
Awareness of this problem increased after a famous social psychologist, Daryl Bem, used significance testing to convince readers that humans have extrasensory perception and can foresee future events (Bem, 2011). Attesting to the power of confirmation bias, Bem still believes in ESP, but the broader community has realized that the statistical practices in social psychology are unscientific and that decades of published research lack scientific credibility. It did not help that a replication project found that only 25% of published results in the most prestigious journals of social psychology could be replicated.
Despite growing awareness about the lack of credible scientific evidence, claims about prejudice and racism in textbooks, popular books, and media articles continue to draw on this literature because there is no better evidence (yet). The general public and undergraduate students make the false assumption that social psychologists are like astronomers who are interpreting the latest pictures from the new space telescope. Social psychologists are mainly presenting their own views as if they were based on scientific evidence, when there is no scientific evidence to support these claims. This explains why social psychologists often vehemently disagree about important issues. There is simply no shared empirical evidence that resolves these conflicts.
Thus, the disappointing and honest answer is that social psychology simply cannot provide scientific answers to real world questions about racial biases in behavior. Few studies actually examined real behavior, studies of button presses on computers have little ecological validity, and published results are often not replicable.
The Politicization of Psychological Science
In the absence of strong and unambiguous scientific evidence, scientists are no different from other humans and confirmation biases will influence scientists’ beliefs. The problem is that the general public confuses scientists’ status as university professors and researchers with expertise that is based on superior knowledge. As a result, claims by professors and researchers in journal articles or in books, talks, or newspaper interviews are treated as if they deserve more weight than other views. Moreover, other people may refer to the views of professors or their work to claim that their own views are scientific because they echo those printed in scientific articles. When these claims are not backed by strong scientific evidence, scientific articles become weaponized in political conflicts.
A scientific article on racial biases in use of force errors provides an instructive example. In 2019, social psychologist Joseph Cesario and four graduate students published an article on racial disparities in use of force errors by police (a.k.a., unnecessary killings of US civilians). The article passed peer-review at a prestigious scientific journal, the Proceedings of the National Academy of Sciences (PNAS). Like many journals these days, PNAS asks authors to provide a Public Significance Statement.

The key claim in the significance statement is that the authors found “no evidence of anti-Black or anti-Hispanic disparities across shootings.” Scientists may look at this statement and realize that it is not equivalent to the claim that “there is no racial bias in use of force errors.” First of all, the authors clearly say that they did not find evidence. This leaves the possibility that other people looking at the same data might have found evidence. Among scientists it is well known that different analyses can produce different results. Scientists also know the important distinction between the absence of evidence and evidence of the absence of an effect. The significance statement does not say that the results show that there are no racial biases, only that the authors did not find evidence for biases. However, significance statements are not written for scientists and it is easy to see how these statements could be (unintentionally or intentionally) misinterpreted as saying that science shows that there are no racial biases in police killings of innocent civilians.
And this is exactly what happened. The Black-Lives-Anti-Matter commentator Heather Mac Donald used this research as “scientific evidence” to support the claim that the liberal left is fighting an unjustified “War on Cops.” Her bio on Wikipedia shows that she received degrees in English, without any indication that she has a background in science. Yet, the Wall Street Journal allowed her to summarize the evidence in an opinion article with the title “The myth of systemic police racism.” Thus, a racially biased and politically motivated non-scientist was able to elevate her opinion by pointing to the PNAS article as evidence that her opinion is the truth.
In this particular case, the journal was forced to retract the article after post-publication peer review revealed statistical errors in the paper and it became clear that the significance statement was misleading. An editorial reviewed this case study of politicized science in great detail (Massey & Waters, 2020).


Although this editorial makes it clear that mistakes were made, it doesn’t go far enough in admitting the mistakes that were made by the journal editors. Most important, even if the authors had not made mistakes, it would be wrong to allow for any generalized conclusions in a significance statement. The clearest significance statement would be that “This is only one study of the issue with limitations and the evidence is insufficient to draw conclusions based on this study alone.” But journals are also motivated to exaggerate the importance of articles to increase their prestige.
The editorial also fails to acknowledge that the authors, reviewers, and editor were White and that it is unlikely that the article would have made misleading statements if African American researchers had been involved in the research, peer review, or the editorial decision process. To African Americans the conclusion that there is no racial bias in policing is preposterous, while it seemed plausible to the White researchers who gave the work the stamp of approval. Thus, this case study also illustrates a problem of systemic racism in psychology: African Americans are underrepresented and often not involved in research that directly affects them and their community.
My Colleague’s Research with Police Officers
My colleague, Judith Andersen, is a trained health psychologist, with a focus on stress and health. One area of her research is how police officers cope with stress and traumatic experiences they encounter in their work. This research put her in a unique position to study racial biases in the use of force with actual police officers (in contrast, many social psychologists have studied shooting games with undergraduate students). Getting the cooperation of police departments and individual officers to study such a highly politicized topic is not easy, and without cooperation there are no participants, no data, and no scientific evidence. A radical response to this reality would be to reject any data that require police officers’ consent. That is a principled response, but it is not a criticism of researchers who conduct studies, note the requirement of consent as a potential limitation, and refrain from making bold statements that their data settle a political issue.
The actual study is seemingly simple. Officers have to pass a use of force test for certification to keep their service weapon on duty. To do so, officers go through a series of three realistic scenarios with their actual service weapon and do not know whether “shoot” or “don’t shoot” is the right response. Thus, they may fail the test if they fail to shoot in scenarios where shooting is the right response. The novel part of the study was to create two matched scenarios with a White or Black suspect and to randomly assign participating officers to these scenarios. Holding all other possible causes constant makes it possible to see whether shooting errors are influenced by the race of a suspect.
After several journals, including PNAS, showed no interest in this work, it was eventually accepted for publication by the editor of The Canadian Journal of Behavioural Science. The journal also requires a Significance statement and we provided one.

Scientists might notice that our significance statement is essentially identical to Johnson et al.’s fateful significance statement. In plain English, we did not find evidence of racial biases in shooting errors. The problem is that significance testing often leads to the confusion of lack of evidence with evidence of no bias. To avoid this misinterpretation, we made it clear that our results cannot be interpreted as evidence that there are no biases. To do so, we emphasized that the shooting errors in the sample did show a racial bias. However, we could not rule out that this bias was unique to this sample and that the next sample might show no bias or even the opposite bias. We also point out that the bias in this sample might be smaller than the actual bias and that the actual bias might fully account for the real world disparities. In short, our significance statement is an elaborate, jargony way of saying “our results are inconclusive and have no real-world significance.”
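To make the difference between "no evidence of bias" and "evidence of no bias" concrete, here is a minimal sketch in Python. The numbers are invented for illustration and are not the data from our study; the function name `diff_ci` and the choice of a simple Wald interval are likewise assumptions made for this example. The point is only that a small sample can show a bias in the observed error rates while the confidence interval around that bias still includes zero as well as much larger values.

```python
import math

def diff_ci(errors_a, n_a, errors_b, n_b, z=1.96):
    """Wald 95% confidence interval for the difference between two
    error rates (group a minus group b). Hypothetical helper."""
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Invented numbers: 75 officers per scenario,
# 12 shooting errors in one scenario and 8 in the other.
diff, lower, upper = diff_ci(12, 75, 8, 75)

print(f"observed difference in error rates: {diff:.3f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")

# The interval includes zero (no bias) and also values larger than the
# observed difference, so the data cannot distinguish these possibilities.
print("interval includes zero:", lower < 0 < upper)
```

With these invented numbers the sample shows more errors in one condition, yet the interval is too wide to rule out either no bias or a substantially larger bias. That is what "inconclusive" means here.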
It is remarkable that the editor published our article because 95% of articles in psychology present a statistically significant result that justifies a conclusion. This high rate of successful studies, however, is a problem because selective publishing of only significant results undermines the credibility of published results. Even crazy claims like mental time travel are supported by statistically significant results. Only the publication of studies that failed to replicate these results helps us to see that the original results were wrong. It follows that journals have to publish articles with inconclusive results to be credible and researchers have to be allowed to present inconclusive results to ensure that conclusive results are trustworthy. It also follows that not all scientific articles are in need of media attention and publicity. The primary aim of scientific journals is communication among scientists and to maintain a record of scientific results. Even Nadal or Federer did not win every tournament. So, scientists should be allowed to publish articles that are not winners and nobody should trust scientists who only publish articles that confirm their predictions.
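The effect of publishing only significant results can be sketched with a small simulation. This is an illustration under stated assumptions, not an analysis of any real literature: the function names (`two_sided_p`, `run_study`, `simulate`), the sample size of 20 per group, and the number of studies are all made up, and the true effect is set to exactly zero so that every significant result is a false positive.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def run_study(n, rng):
    """One simulated two-group study in which the true effect is zero."""
    group_a = [rng.gauss(0, 1) for _ in range(n)]
    group_b = [rng.gauss(0, 1) for _ in range(n)]
    mean_diff = sum(group_a) / n - sum(group_b) / n
    se = math.sqrt(2 / n)  # population variance is known to be 1 here
    return two_sided_p(mean_diff / se)

def simulate(num_studies=2000, n=20, alpha=0.05, seed=1):
    """Run many null studies and 'publish' only the significant ones."""
    rng = random.Random(seed)
    p_values = [run_study(n, rng) for _ in range(num_studies)]
    published = [p for p in p_values if p < alpha]
    return len(p_values), len(published)

total, published = simulate()
print(f"studies run: {total}")
print(f"studies with p < .05: {published} ({published / total:.1%})")
print("share of significant results among published studies: 100%")
```

Roughly one in twenty of these studies reaches p < .05 by chance alone. If only those are published, the journal shows a 100% success rate for an effect that does not exist, which is why a literature without non-significant results cannot be trusted.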
It is also noteworthy that our results were inconclusive because the sample size was too small to draw stronger conclusions. However, it was the first study of its kind and it was already a lot of effort to get even these data. The primary purpose of publishing a study like this is to stimulate interest and to provide an example for future studies. Eventually, the evidence base grows and more conclusive results could be obtained. Ultimately it is up to the general public and policy makers to fund this research and to require participation of police departments in studies of racial bias. It would be foolish to criticize our study because it didn’t produce conclusive results in the first investigation. Even if the study had produced statistically significant results, replication studies would be needed before any conclusions can be drawn.
Social Activism in Science
Williams et al. (2023) wrote a critical commentary of our article with the title “Performative Shooting Exercises Do Not Predict Real-World Racial Bias in Police Officers.” We were rather surprised by this criticism because our main finding was basically a non-significant, inconclusive result. Apparently, this was not the result that we were supposed to get, or we should not have reported results that contradict Williams et al.’s beliefs. Williams et al. start with the strong belief that any well-designed scientific study must find evidence for racial biases in shooting errors; otherwise there must be a methodological flaw. They are not shy about communicating this belief in their title. Shooting errors during certification are called performative and they “do not predict real world racial biases in police officers.” The question is how Williams et al. (2023) know the real-world racial biases of police officers to make this claim.
The answer is that they do not know anything more than anybody else about the real racial biases of police officers (You are invited to read the commentary and see whether I missed that crucial piece of information). Their main criticism is that we made unjustified assumptions about the external validity of the certification task. “The principal flaw with Andersen et al.’s (2023) paper is unscientific assumptions around the validity of the recertification shooting test.” That is, the bias that we observed in the certification task is taken at face value as information about the bias in real-world shooting situations.
The main problem with this criticism is that we never made the claim that biases in the certification task can be used to draw firm conclusions about biases in the real world. We even pointed out that we observed biases and that our results are consistent with the assumption that all of the racial disparities in real-world shootings are caused by racial biases in the immediate shooting decisions. As it turns out, Williams et al.’s critique is unscientific because it makes unscientific claims in the title and misrepresents our work. Our real sin was to be scientific and to publish inconclusive results that do not fit into the narrative of anti-police left-wing propaganda.
It is not clear why the authors were allowed to make many false and hurtful statements in their commentary, but personally I think it is better to have this example of politicization in the open to show that left-wing and right-wing political activists are trying to weaponize science to elevate their beliefs to the status of truth and knowledge.
Blatant examples of innocent African Americans killed by police officers (Wikipedia) are a reason to conduct scientific studies, but these incidents cannot be used to evaluate the scientific evidence. And without good science, resources might be wasted on performative implicit bias training sessions that only benefit the trainers and do not protect the African American community.

Conclusion
The simple truth remains that psychological science has done little to answer real-world questions around race. Although social psychology has topics like prejudice and intergroup relationships as core topics, the research is often too removed from the real world to be meaningful. Unfortunately, incentives reward professors for using the inconclusive evidence selectively to confirm their own beliefs and then presenting these beliefs as scientific claims. These pseudo-scientific claims are then weaponized by like-minded ideologues. This creates the illusion that we have scientific evidence, which is contradicted by the fact that opposing camps both cite science to believe they are right, just like opposing armies can pray to the same God for victory.
To change this, stakeholders in science, like government funding organizations, need to change the way money is allocated. Rather than giving grants to White researchers at elite universities to do basic (a.k.a., irrelevant) research on button-presses of undergraduate students, money should be given to diverse research teams with a mandate to answer practical, real-world questions. The reward structure at universities also has to change. To collect real world data from 150 police officers is 100 times more difficult than collecting 20 brain measures from undergraduate students. Yet, a publication in a neuroscience journal is seen as more scientific and prestigious than an article in a journal that addresses real-world problems that are by nature of interest to smaller communities.
Finally, it is important to recognize that a single study cannot produce conclusive answers to important and complex questions. All the major modern discoveries in the natural (real) sciences are made by teams. Funders need to provide money for teams that work together on a single important question rather than funding separate labs that work against each other. This is not new and has been said many times before, but so far there is little evidence of change. As a result, we have more information about galaxies millions of years ago than about our own behaviors and the persistent problem of racism in modern society. Don’t look to the scientists to provide a solution. Real progress has come and will come from social activists and political engagement. And with every generation, more old racists will be replaced by a more open new generation. This is the way.