“Trust is good, but control is better”
Information about the replicability of published results is important because empirical results can only be used as evidence if the results can be replicated. However, the replicability of published results in social psychology is doubtful. Brunner and Schimmack (2018) developed a statistical method called z-curve to estimate how replicable a set of significant results is if the studies were replicated exactly. In a replicability audit, I apply z-curve to the most cited articles of psychologists to estimate the replicability of their studies.
Susan T. Fiske
Susan T. Fiske is an eminent social psychologist (H-Index in WebofScience = 66). She is also a prominent figure in meta-psychology. Her most important contribution to meta-psychology was a guest column in the APS Observer (Fiske, 2016), titled “A Call to Change Science’s Culture of Shaming.” There she writes: “Our field has always encouraged — required, really — peer critiques. But the new media (e.g., blogs, Twitter, Facebook) can encourage a certain amount of uncurated, unfiltered denigration. In the most extreme examples, individuals are finding their research programs, their careers, and their personal integrity under attack.”
In her article, she refers to researchers who examine the replicability of published results as “self-appointed data police,” which is relatively mild in comparison to the term “method terrorist” that she used in a leaked draft of her article.
She accuses meta-psychologists of speculating about the motives of researchers who use questionable research practices, but she never examines the motives of meta-psychologists. Why are they devoting their time and resources to meta-psychology and publishing their results on social media rather than advancing their careers by publishing original research in peer-reviewed journals? One possible reason is that meta-psychologists recognize deep and fundamental problems in the way social psychologists conduct research and are trying to improve it.
Instead, Fiske denies that psychological science has a problem and claims that the Association for Psychological Science (APS) is a leader in promoting good scientific practices.
“What’s more, APS has been a leader in encouraging robust methods: transparency, replication, power analysis, effect-size reporting, and data access.”
She also dismisses meta-psychological criticism of social psychology as unfounded.
“But some critics do engage in public shaming and blaming, often implying dishonesty on the part of the target and other innuendo based on unchecked assumptions.”
In this blog post, I am applying z-curve to Susan T. Fiske’s results to examine whether she used questionable research practices to report mostly significant results that support her predictions, and to examine how replicable her published results are. The scientific method, z-curve, makes assumptions that have been validated in simulation studies (Brunner & Schimmack, 2018).
I used WebofScience to identify the most cited articles by Susan T. Fiske (datafile). I then selected empirical articles until the number of coded articles matched the number of citations, resulting in 41 empirical articles (H-Index = 41). The 41 articles reported 76 studies (average 1.9 studies per article). The total number of participants was 21,298 with a median of 54 participants per study. For each study, I identified the most focal hypothesis test (MFHT). The result of the test was converted into an exact p-value and the p-value was then converted into a z-score. The z-scores were submitted to a z-curve analysis to estimate mean power of the 65 results that were significant at p < .05 (two-tailed). Six studies did not test a hypothesis or predicted a non-significant result. The remaining 5 results were interpreted as evidence with lower standards of significance. Thus, the success rate for 70 reported hypothesis tests was 100%.
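The conversion step described above (exact two-tailed p-value to absolute z-score) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the z-curve code itself; the function names and the example p-values are hypothetical and are not taken from the audit data.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def p_to_z(p_two_tailed: float) -> float:
    """Convert an exact two-tailed p-value into an absolute z-score."""
    if not 0.0 < p_two_tailed < 1.0:
        raise ValueError("p-value must lie strictly between 0 and 1")
    # Half of the p-value sits in the lower tail; flip the sign to get |z|.
    return -STD_NORMAL.inv_cdf(p_two_tailed / 2.0)


def is_significant(z: float, alpha: float = 0.05) -> bool:
    """True if |z| exceeds the two-tailed critical value for alpha."""
    return abs(z) > p_to_z(alpha)


# Hypothetical p-values for illustration only (not from the audit).
example_p_values = [0.049, 0.012, 0.0004]
example_z_scores = [p_to_z(p) for p in example_p_values]
n_significant = sum(is_significant(z) for z in example_z_scores)
```

With alpha = .05 (two-tailed) the critical value is a z-score of about 1.96, which is the significance criterion referred to in the text.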
The z-curve estimate of replicability is 59% with a 95%CI ranging from 42% to 77%. The complementary interpretation of this result is that the actual type-II error rate is 41% compared to the 0% of non-significant results reported in the articles.
The histogram of z-values shows the distribution of observed z-scores (blue line) and the predicted density distribution (grey line). The predicted density distribution is also projected into the range of non-significant results. The area under the grey curve in that range is an estimate of the file drawer of studies that would need to be conducted to achieve 100% successes with 59% average power. The ratio of the area of non-significant results to the area of all significant results (including z-scores greater than 6) is called the File Drawer Ratio. Although this is just a projection, and other questionable practices may have been used, the file drawer ratio of 1.63 and the figure make it clear that the reported results were selected to support theoretical predictions.
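To see how a file drawer ratio relates to average power, the following sketch works through the arithmetic for a hypothetical mixture of studies with known true power. It is an illustration under stated assumptions, not the z-curve estimation procedure: z-curve estimates these quantities from the observed z-scores, whereas here the power values and weights are simply made up and the function names are my own.

```python
def discovery_rate(powers, weights):
    """Expected share of all conducted studies that reach significance."""
    total = sum(weights)
    return sum(w * p for p, w in zip(powers, weights)) / total


def file_drawer_ratio(powers, weights):
    """Expected non-significant results per significant result."""
    rate = discovery_rate(powers, weights)
    return (1.0 - rate) / rate


def replicability_after_selection(powers, weights):
    """Mean power of the studies that produced a significant result.

    High-powered studies are over-represented among significant results,
    so each power value is weighted by its own chance of being selected.
    """
    selected = sum(w * p for p, w in zip(powers, weights))
    return sum(w * p * p for p, w in zip(powers, weights)) / selected


# Hypothetical mixture: half the studies have 20% power, half have 80%.
powers = [0.2, 0.8]
weights = [1.0, 1.0]
ratio = file_drawer_ratio(powers, weights)
replicability = replicability_after_selection(powers, weights)
```

In this made-up example the file drawer ratio is 1 (one unreported study per reported one), while the mean power of the significant studies is 68%. Because power varies across studies, the mean power after selection is higher than the discovery rate, which is why a file drawer ratio above 1 can coexist with a replicability estimate above 50%.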
Z-curve is under development and offers information beyond the replicability of significant results. One new feature is an estimate of the maximum number of false positive results. The maximum percentage of false positive results is estimated to be 30% (95%CI = 10% to 60%). This estimate means that a z-curve with a fixed percentage of 30% false positives fits the data nearly as well as a z-curve without restrictions on the percentage of false positives. Given the relatively small number of studies, the estimate is not very precise and the upper limit goes as high as 60%. It is unlikely that there are 60% false positives, but the point of empirical research is to reduce the risk of false positives to an acceptable level of 5%. Thus, the actual risk is unacceptably high.
A 59% replicability estimate is actually very high for a social psychologist. However, it would be wrong to apply this estimate to all studies. The estimate is an average, and replicability varies as a function of the strength of evidence against the null-hypothesis (the magnitude of a z-score). This is shown with the replicability estimates for segments of z-scores below the x-axis. For just significant results with z-scores from 2 to 2.5 (~ p < .05 & p > .01), replicability is only 33%. These results are therefore less likely to replicate, and actual replication studies show very low success rates for studies with just significant results. Without selection bias, significant results have an average replicability greater than 50%. However, with selection for significance, this is no longer the case. For Susan T. Fiske’s data, the criterion value to achieve 50% average replicability is a z-score greater than 3 (as opposed to 1.96 without selection). 56 reported results meet this criterion. This is a high percentage of credible results for a social psychologist (see links to other replicability audits at the end of this post).
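The correspondence between the z-score thresholds mentioned above and two-tailed p-values can be checked with a short sketch. It uses only the Python standard library; the function name is my own and is not part of the z-curve code.

```python
from statistics import NormalDist


def z_to_p(z: float) -> float:
    """Two-tailed p-value for an absolute z-score under the standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Thresholds mentioned in the text: 1.96, 2, 2.5, and 3.
thresholds = [1.96, 2.0, 2.5, 3.0]
p_values = {z: z_to_p(z) for z in thresholds}
```

A z-score of 2 corresponds to p of roughly .046 and a z-score of 2.5 to roughly .012, so the 2 to 2.5 segment only approximately matches the range from p = .05 to p = .01. The criterion of z > 3 corresponds to p below roughly .003.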
Although Susan T. Fiske’s work has not been the target of criticism by meta-psychologists, she has been a vocal critic of meta-psychologists. This audit shows that her work is more replicable than the work of other empirical social psychologists. One explanation for Fiske’s defense of social psychology could be the false consensus effect, which is a replicable social psychological phenomenon. In the absence of hard evidence, humans tend to believe that others are more similar to them than they actually are. Maybe Susan Fiske assumed that social psychologists who have been criticized for their research practices were conducting research like herself. A comparison of different audits (see below) shows that this is not the case. I wonder what Fiske thinks about the research practices of her colleagues that produce replicability estimates well below 50%. I believe that a key contributor to the conflict between experimental social psychologists and meta-psychologists is the lack of credible information about the extent of the crisis. Actual replication studies and replicability reports provide much needed objective facts. The question is whether social psychologists like Susan Fiske are willing to engage in a scientific discussion about these facts or whether they continue to ignore these facts to maintain the positive illusion that social psychological results can be trusted.
It is nearly certain that I made some mistakes in the coding of Susan T. Fiske’s articles. However, it is important to distinguish consequential and inconsequential mistakes. I am confident that I did not make consequential errors that would alter the main conclusions of this audit. However, control is better than trust and everybody can audit this audit. The data are openly available and the z-curve code is also openly available. Thus, this replicability audit is fully transparent and open to revision.