“Remedies include tracking one’s own questionable research practices” (Susan T. Fiske)
In 1959, Sterling observed that the results sections of psychological articles provide no information because studies nearly always reject the null hypothesis. As a result, it is not necessary to read the results section; it is sufficient to read the predictions made in the introduction, because the outcome of the empirical test is a foregone conclusion.
Cohen (1962) found that studies published in the Journal of Abnormal and Social Psychology (since split into the Journal of Abnormal Psychology and the Journal of Personality and Social Psychology) had only modest statistical power to produce significant results. Nearly three decades later, Sedlmeier and Gigerenzer (1989) replicated this finding.
Thus, for nearly 60 years social psychologists have been publishing many more significant results than they actually obtain in their laboratories, making the results sections of their articles essentially meaningless. Every claim in social psychology, including crazy findings that nobody believes (Bem, 2011), is supported by significant results.
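How big this disconnect is can be illustrated with a small simulation (my own sketch, not taken from any of the cited articles; the effect size, sample size, and significance criterion are illustrative assumptions). With roughly 50% power, only about half of the studies that are run "work," yet a published record that admits only significant results shows a success rate of 100%.

# Illustrative sketch (not from the cited articles): low power plus selective
# publication yields a published success rate of nearly 100%.
# Assumed, illustrative numbers: d = 0.5, n = 30 per group, alpha = .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_per_group, d = 10_000, 30, 0.5

p_values = np.empty(n_studies)
for i in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(d, 1.0, n_per_group)
    p_values[i] = stats.ttest_ind(treatment, control).pvalue

significant = p_values < .05
published = p_values[significant]  # journals accept only p < .05
print(f"Success rate in the lab (power): {significant.mean():.2f}")             # roughly 0.47
print(f"Success rate in the published record: {(published < .05).mean():.2f}")  # 1.00 by construction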
In recent years, some social psychologists have rebelled against the status quo in their field. To show that published significant results cannot be taken at face value, they have conducted replication studies and reported the results even when they were not significant. Suddenly, the success rate of nearly 100% dropped to 25%. Faced with this dismal result, which reveals the extent of questionable practices in the field, some social psychologists have tried to downplay the significance of replication failures. A leader of this disinformation movement is Susan Fiske, who was invited to comment on a special issue on the replication crisis in the Journal of Experimental Social Psychology (Fiske, 2016). Her article “How to publish rigorous experiments in the 21st century” is an interesting example of deceptive publishing that avoids dealing with the real issue.
First, it is always informative to examine the reference list for bias in the literature review. Fiske does not mention Bem’s embarrassing article that started the crisis, John et al.’s article on the use of questionable research practices, or Francis and Schimmack’s work on bias detection, although these articles are cited in several of the articles she comments on. For example, Hales (2016) writes:
“In fact, in some cases failed replications have been foreshadowed by analyses showing that the evidence reported in support of a finding can be implausibly positive. For example, multiple analyses have questioned whether findings in support of precognition (Bem, 2011) are too good to be obtained without using questionable research practices (Francis, 2012; Schimmack, 2012). In line with these analyses, researchers who have replicated Bem’s procedures have not replicated his results (Galak, LeBoeuf, Nelson, & Simmons, 2012; Ritchie, Wiseman, & French, 2012; Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012).”
Fiske concludes that the replication crisis is an opportunity to improve research practices. She writes: “Constructive advice for 21st century publication standards includes appropriate theory, internal validity, and external validity.” Again, it is interesting what she is not saying. If appropriate theory, internal validity, and external validity are advice for social psychologists in the 21st century, the implication is that 20th century social psychologists did not have good theories and ran studies that lacked internal and external validity. After all, we do not give advice when things are going well.
Fiske (2016) discusses the replication crisis under the heading of internal validity:
“Hales (2016) points out that, in the effort to report effects that are both significant and interesting, researchers may go beyond what the data allow. Over-claiming takes forms beyond the familiar Type I (false positive) and Type II (false negative) errors. A proposed Type III error describes reaching an accurate conclusion but by flawed methods (e.g., confirmation bias, hypothesizing after results are known, discarding data). A proposed Type IV error describes reaching an accurate conclusion based on faulty evidence (insufficient power, invalid measures). Remedies include tracking one’s own questionable research practices (e.g., ad hoc stopping, non-disclosure of failed replications, exploration reported as confirmation) or calculating the plausibility of one’s data (e.g., checking for experimenter bias during analysis). Pre-registration and transparency are encouraged.”
This is as close as Fiske comes to talking about the fundamental problem in social psychology, but Type-III errors are not just a hypothetical possibility; they are the norm. Type-III errors explain how social psychologists can be successful most of the time even though their studies have a low probability of producing significant results.
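One of the practices Hales lists, ad hoc stopping, shows how such “successes” are manufactured. The following simulation is my own hedged illustration (not code from Fiske, Hales, or any cited source): a researcher who peeks at the data after every ten participants per group and stops as soon as p < .05 obtains “significant” results far more often than the nominal 5%, even when there is no effect at all.

# Illustrative sketch of ad hoc (optional) stopping under the null hypothesis.
# Assumed, illustrative numbers: peek after every 10 per group, up to 100 per group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def stops_with_significance(max_n=100, batch=10, alpha=.05):
    control, treatment = [], []
    while len(control) < max_n:
        control.extend(rng.normal(0.0, 1.0, batch))
        treatment.extend(rng.normal(0.0, 1.0, batch))  # null is true: no effect
        if stats.ttest_ind(treatment, control).pvalue < alpha:
            return True                                # stop and report a "success"
    return False

n_sim = 5_000
hits = sum(stops_with_significance() for _ in range(n_sim))
print(f"False-positive rate with ad hoc stopping: {hits / n_sim:.2f}")  # well above the nominal .05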
Fiske’s recommendations for improvement are obscure. What does it mean for researchers to “track their own questionable practices”? Is there an acceptable quota for using these practices? What should researchers do when they find that they are using them? How would researchers calculate the plausibility of their data, and why is pre-registration useful? Fiske does not elaborate because she is not really interested in improving practices. At least she makes it very clear what she does not want to happen: she opposes a clear code of research ethics that specifies which practices violate research integrity.
“Norms about acceptable research methods change by social influence, not by regulation. As social psychology tells us, people internalize change when they trust and respect the source. A punishing, feared source elicits at best compliance and at worst reactance, not to mention the source’s own reputational damage.”
This naive claim ignores that many human behaviors are regulated by social norms that are enforced with rules and laws. Even scientists have social norms about fraud, and Stapel was fired for fabricating data. Clearly, academic freedom has limits. If fabricating data is unethical, it is not clear why hiding disconfirming evidence should be a matter of personal choice.
Fiske also expresses her dislike of blog posts and so-called vigilantes.
“For the most part, the proposals in this special issue are persuasive communications, not threats. And all are peer-reviewed, not mere blog posts. And they are mostly reasoned advisory proposals, not targeted bullying. As such, they appropriately treat other researchers as colleagues, not miscreants. This respectful discourse moves the field forward better than vigilantism.”
Maybe, as a social psychologist, she should be aware that disobedience and protest have always been part of social change, especially when powerful leaders opposed it. Arguments that results sections in social psychology are meaningless have been made by eminent researchers in peer-reviewed publications (e.g., Cohen, 1994; Schimmack, 2012) and in blog posts (e.g., the R-Index blog). The validity of the argument does not depend on the medium or on peer review, but on the internal and external validity of the evidence, and for sixty years the evidence has shown that social psychologists inflate their success rate.
There is also no evidence that social psychologists follow Fiske’s advice to track their own questionable research practices or to avoid using them. This is not surprising. There is no real incentive to change, and behavior does not change when the reinforcement schedule does not change. As long as p < .05 is rewarded and p > .05 is punished, psychologists will continue to publish meaningless p-values (Sterling, 1959). History has shown again and again that powerful elites do not change for the greater good. Real change will come from public pressure (e.g., from undergraduate students and funders) demanding honest reporting of results.