Category Archives: Social Psychology

The Prevalence of Questionable Research Practices in Social Psychology

Introduction

A naive model of science assumes that scientists are objective. That is, they derive hypotheses from theories, collect data to test these theories, and then report the results. In reality, scientists are passionate about theories and often want to confirm that their own theories are right. This leads to confirmation bias and the use of questionable research practices (QRPs; John et al., 2012; Schimmack, 2015). QRPs are defined as practices that increase the chances of the desired outcome (typically a statistically significant result) while at the same time inflating the risk of a false positive discovery. A simple QRP is to conduct multiple studies and to report only the results that support the theory.

The use of QRPs explains the astonishingly high rate of statistically significant results in psychology journals, which exceeds 90% (Sterling, 1959; Sterling et al., 1995). While it is clear that this rate of significant results is too high, it is unclear how much it is inflated by QRPs. Given the lack of quantitative information about the extent of QRPs, motivated biases also produce divergent opinions about the use of QRPs by social psychologists. John et al. (2012) conducted a survey and concluded that QRPs are widespread. Fiedler and Schwarz (2016) criticized the methodology, and their own survey of German psychologists suggested that QRPs are not used frequently. Neither of these studies is ideal because both relied on self-report data. Scientists who heavily use QRPs may simply not participate in such surveys or may underreport their use of QRPs. It has also been suggested that many QRPs happen automatically and are not accessible to self-report. Thus, it is necessary to study the use of QRPs with objective methods that reflect the actual behavior of scientists. One approach is to compare dissertations with the published articles based on them (Cairo et al., 2020). This method provided clear evidence for the use of QRPs, even though a published document could reveal their use. This approach may nevertheless underestimate the use of QRPs because even the dissertation results could be influenced by QRPs, and the supervision of dissertations by outsiders may reduce the use of QRPs.

With my colleagues, I developed a statistical method that can detect and quantify the use of QRPs (Bartos & Schimmack, 2020; Brunner & Schimmack, 2020). Z-curve uses the distribution of statistically significant p-values to estimate the mean power of studies before selection for significance. This estimate predicts how many non-significant results were obtained in the search for the significant ones, which makes it possible to compute the estimated discovery rate (EDR). The EDR can then be compared to the observed discovery rate (ODR), which is simply the percentage of published results that are statistically significant. The bigger the difference between the ODR and the EDR, the more questionable research practices were used (see Schimmack, 2021, for a more detailed introduction).
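To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in R with made-up counts (not data from any analysis below): the EDR implies how many tests must have been run to obtain the published significant results, and the difference between the ODR and the EDR is the inflation attributable to QRPs.

```r
# Back-of-the-envelope illustration with made-up counts (not real data)
n_published <- 100                    # published hypothesis tests
n_sig       <- 78                     # of these, 78 are significant
odr <- n_sig / n_published            # observed discovery rate = .78

edr <- .35                            # hypothetical z-curve estimate from the significant p-values
implied_total   <- n_sig / edr        # ~223 tests implied to have been conducted
implied_missing <- implied_total - n_published  # ~123 non-significant results that were never reported

inflation <- odr - edr                # .43, i.e., 43 percentage points of inflation
```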

I focus on social psychology because (a) I am a social/personality psychologist who is interested in the credibility of results in my own field, and (b) social psychology has produced a large number of replication failures (Schimmack, 2020). Similar analyses are planned for other areas of psychology and other disciplines. I also focus on social psychology more than personality psychology because personality psychology is often more exploratory than confirmatory.

Method

I illustrate the use of z-curve to quantify the use of QRPs with the most extreme examples in the credibility rankings of social/personality psychologists (Schimmack, 2021). Figure 1 shows the z-value plot (ZVP) of David Matsumoto. To generate this plot, the test statistics from t-tests and F-tests were converted into exact two-sided p-values and then into the corresponding values of the standard normal distribution (z-scores). Because two-sided p-values are used, all z-scores are positive. However, because the curve is centered over the z-score that corresponds to the median power before selection for significance (rather than zero, as it would be if the null-hypothesis were true), the distribution can look relatively normal. The variance of the distribution will be greater than 1 when studies vary in statistical power.
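For readers who want to follow this conversion step, a minimal sketch in R is shown below; the test statistics and degrees of freedom are hypothetical placeholders for the values extracted from articles.

```r
# Convert reported t- and F-statistics into two-sided p-values and absolute z-scores
# (statistics and degrees of freedom are hypothetical placeholders)
t_vals <- c(2.31, 3.05); t_df  <- c(38, 122)
f_vals <- c(4.20, 7.85); f_df1 <- c(1, 2); f_df2 <- c(56, 210)

p_t <- 2 * pt(abs(t_vals), df = t_df, lower.tail = FALSE)        # two-sided p from t
p_f <- pf(f_vals, df1 = f_df1, df2 = f_df2, lower.tail = FALSE)  # p from F

p <- c(p_t, p_f)
z <- qnorm(p / 2, lower.tail = FALSE)  # corresponding z-scores; positive by construction
```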

The grey curve in Figure 1 shows the predicted distribution based on the observed distribution of z-scores that are significant (z > 1.96). In this case, the observed number of non-significant results is similar to the predicted number of non-significant results. As a result, the ODR of 78% closely matches the EDR of 79%.

Figure 2 shows the results for Shelly Chaiken. The first notable observation is that Chaiken's ODR of 75% is very similar to Matsumoto's ODR of 78%. Thus, if we simply count the number of significant and non-significant p-values, there is no difference between these two researchers. However, the z-value plot (ZVP) shows a dramatically different picture. The peak density is 0.3 for Matsumoto and 1.0 for Chaiken. As the maximum density of the standard normal distribution is .4, it is clear that the results in Chaiken's articles do not come from an actual sampling distribution. In other words, QRPs must have been used to produce too many just-significant results with p-values just below .05.

The comparison of the ODR and EDR shows a large discrepancy of 64 percentage points too many significant results (ODR = 75% minus EDR = 11%). This is clearly not a chance finding because the ODR falls well outside the 95% confidence interval of the EDR, 5% to 21%.

To examine the use of QRPs in social psychology, I computed the EDR and ODR for over 200 social/personality psychologists. Personality psychologists were excluded if they reported too few t-values and F-values. The actual values and additional statistics can be found in the credibility rankings (Schimmack, 2021). Here I use these data to examine the use of QRPs in social psychology.

Average Use of QRPs

The average ODR is 73.48%, with a 95% confidence interval ranging from 72.67 to 74.29. The average EDR is 35.28%, with a 95% confidence interval ranging from 33.14 to 37.43. The inflation due to QRPs is 38.20 percentage points, 95%CI = 36.10 to 40.30. This difference is highly significant, t(221) = 35.89; the p-value is so small that R does not report an exact value.

It is of course not surprising that QRPs have been used. More important is the effect size estimate. The results suggest that QRPs inflate the discovery rate by over 100%. This explains why unbiased replication studies in social psychology have only a 25% chance of producing a significant result (Open Science Collaboration, 2015). In fact, the EDR can be used as a conservative predictor of replication outcomes (Bartos & Schimmack, 2020). While the EDR of 35% is somewhat higher than the actual replication rate, this may be due to the inclusion of non-focal hypothesis tests in these analyses; z-curve analyses of focal hypothesis tests typically produce lower EDRs. Fiedler and Schwarz, in contrast, failed to comment on the low replicability of social psychology. If social psychologists had not used QRPs, it would remain a mystery why their results are so hard to replicate.

In sum, the present results confirm that, on average, social psychologists heavily used QRPs to produce significant results that support their predictions. However, these averages mask differences between researchers like Matsumoto and Chaiken. The next analyses explore these individual differences between researchers.

Cohort Effects

I had no predictions about the effect of cohort on the use of QRPs. I conducted a Twitter poll that suggested a general intuition that the use of QRPs may not have changed over time, but there was a lot of uncertainty in these answers. Similar results were obtained in a Facebook poll in the Psychological Methods Discussion Group. Thus, the a priori hypothesis is a vague prior of no change.

The dataset includes different generations of researchers. I used the first publication listed in Web of Science to date researchers' cohorts. The earliest date was 1964 (Robert S. Wyer). The latest date was 2012 (Kurt Gray). The histogram shows that researchers from the 1970s to the 2000s are well represented in the dataset.

There was a significant negative correlation between the ODR and cohort, r(N = 222) = -.25, 95%CI = -.37 to -.12, t(220) = 3.83, p = .0002. This finding suggests that the proportion of non-significant results increased over time. For researchers with their first publication in the 1970s, the average ODR was 76%, whereas it was 72% for researchers with their first publication in the 2000s. This is a modest trend, and there are various explanations for it.
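The statistics reported here (r, its 95% confidence interval, and the t-test with 220 degrees of freedom) follow the standard Pearson correlation test; a sketch with simulated placeholder data is shown below.

```r
# Sketch of the correlational test reported above; the vectors are simulated
# placeholders, not the actual ranking data
set.seed(1)
cohort <- sample(1964:2012, 222, replace = TRUE)  # first year of publication
odr    <- rnorm(222, mean = 73, sd = 8)           # observed discovery rates in %

cor.test(odr, cohort)  # Pearson r, 95% CI, and t(220), as in the reported analyses
```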

One possibility is that power decreased as researchers started looking for weaker effects. In this case, the EDR should also show a decrease. However, the EDR showed no relationship with cohort, r(N = 222) = -.03, 95%CI = -.16 to .10, t(220) = 0.48, p = .63. Thus, lower power does not seem to explain the decrease in the ODR. At the same time, the finding that the EDR shows no notable relationship with cohort, abs(r) < .2, suggests that power has remained roughly constant over time. This is consistent with previous examinations of statistical power in social psychology (Sedlmeier & Gigerenzer, 1989).

Although the ODR decreased significantly and the EDR did not, bias (ODR – EDR) did not show a significant relationship with cohort, r(N = 222) = -.06, 95%CI = -.19 to .07, t(220) = -0.94, p = .35. However, the 95%CI allows for a slight decrease in bias, which would be consistent with the significant decrease in the ODR.

In conclusion, there is a small, statistically significant decrease in the ODR, but the effect over these decades is too small to have practical significance. The EDR and bias are not even statistically significantly related to cohort. These results suggest that research practices, and the use of questionable ones, have not changed notably since the beginning of empirical social psychology (Cohen, 1961; Sterling, 1959).

Achievement Motivation

Another possibility is that, in each generation, QRPs are used more by researchers who are more achievement motivated (Janke et al., 2019). After all, the reward structure in science is based on the number of publications, and significant results are often needed to publish. In social psychology it is also necessary to present a package of significant results across multiple studies, which is nearly impossible without the use of QRPs (Schimmack, 2012). To examine this hypothesis, I correlated the EDR with researchers' H-Index (as of 2/1/2021). The correlation was small, r(N = 222) = .10, 95%CI = -.03 to .23, and not significant, t(220) = 1.44, p = .15. This finding is only seemingly inconsistent with Janke et al.'s (2019) finding that self-reported QRPs were significantly correlated with self-reported ambition, r(217) = .20, p = .014. Both correlations are small and positive, suggesting that achievement-motivated researchers may be slightly more likely to use QRPs. However, the evidence is by no means conclusive, and the actual relationship is weak. Thus, there is no evidence that highly productive researchers with impressive H-indices achieved their success by using QRPs more than other researchers. Rather, they became successful in a field where QRPs are the norm. If the norms were different, they would have become successful following those other norms.

Impact

A common saying in science is that "extraordinary claims require extraordinary evidence." Thus, we might expect stronger evidence for claims of time-reversed feelings (Bem, 2011) than for evidence that individuals from different cultures regulate their emotions differently (Matsumoto et al., 2008). However, psychologists have relied on statistical significance with alpha = .05 as a simple rule to claim discoveries. This is a problem because statistical significance is meaningless when results are selected for significance and replication failures with non-significant results remain unpublished (Sterling, 1959). Thus, psychologists have trusted an invalid criterion that does not distinguish between true and false discoveries. It is, however, possible that social psychologists used other information (e.g., gossip about replication failures at conferences) to focus on credible results and to ignore incredible ones. To examine this question, I correlated authors' EDR with their number of citations in 2019. I used citation counts for 2019 because citation counts for 2020 are not yet final (the results will be updated with the 2020 counts). Using 2019 counts increases the chances of finding a significant relationship because replication failures over the past decade could have already produced changes in citation rates.

The correlation between EDR and number of citations was statistically significant, r(N = 222) = .16, 95%CI = .03 to .28, t(220) = 2.39, p = .018. However, the lower limit of the 95% confidence interval is close to zero, so the real relationship may be too small to matter. Moreover, the non-parametric correlation with Kendall's tau was not significant, tau = .085, z = 1.88, p = .06. Thus, there is currently insufficient evidence to suggest that citation counts take the credibility of significant results into account. At present, p-values less than .05 are treated as equally credible no matter how they were produced.

Conclusion

There is general agreement that questionable research practices have been used to produce an unrealistic success rate of 90% or more in psychology journals (Sterling, 1959). However, there is less agreement about the extent to which QRPs are being used and the implications for the credibility of significant results in psychology journals (John et al., 2012; Fiedler & Schwarz, 2016). The problem is that self-reports may be biased because researchers are unable or unwilling to report the use of QRPs (Nisbett & Wilson, 1977). Thus, it is necessary to examine this question with alternative methods. The present study used a statistical method to compare the observed discovery rate with a statistically estimated discovery rate based on the distribution of significant p-values. The results showed that, on average, social psychologists have made extensive use of QRPs to inflate an expected discovery rate of around 35% to an observed discovery rate of over 70%. Moreover, the estimated discovery rate of 35% is likely to be an inflated estimate of the discovery rate for focal hypothesis tests because the present analysis is based on focal and non-focal tests. This would explain why the actual success rate in replication studies is even lower than the estimated discovery rate of 35% (Open Science Collaboration, 2015).

The main novel contribution of this study was to examine individual differences in the use of QRPs. While the ODR was fairly consistent across researchers, the EDR varied considerably. However, this variation showed only a very small relationship with researchers' cohort (first year of publication). This finding suggests that the use of QRPs varies more across research fields and other factors than over time. Additional analyses should explore predictors of this variation across researchers.

Another finding was that citations of authors' work do not take the credibility of their p-values into account. Citations are influenced by the popularity of topics and other factors and do not reflect the strength of evidence. One reason for this might be that social psychologists often publish multiple internal replications within a single article. This creates the illusion that results are robust and credible because it is very unlikely to replicate type-I errors. However, Bem's (2011) article with 9 internal replications of time-reversed feelings showed that QRPs are also used to produce consistent results within a single article (Francis, 2012; Schimmack, 2012). Thus, the number of significant results within an article or across articles is also an invalid criterion for evaluating the robustness of results.

In conclusion, social psychologists have conducted studies with low statistical power since the beginning of empirical social psychology. The main reason for this is the preference for between-subject designs, which have low statistical power with small sample sizes of N = 40 participants and small to moderate effect sizes. Despite repeated warnings about the problems of selection for significance (Sterling, 1959) and the problems of small sample sizes (Cohen, 1961; Sedlmeier & Gigerenzer, 1989; Tversky & Kahneman, 1971), these practices have not changed since Festinger conducted his seminal study on dissonance with n = 20 per group. Over the past decades, social psychology journals have reported thousands of statistically significant results that are used in review articles, meta-analyses, textbooks, and popular books as evidence to support claims about human behavior. The problem is that it is unclear which of these significant results are true positives and which are false positives, especially if false positives are not just strictly nil-results but also results with tiny effect sizes that have no practical significance. Without other reliable information, even social psychologists do not know which of their colleagues' results are credible and which are not. Over the past decade, the inability to distinguish credible and incredible information has produced heated debates and a lack of confidence in published results. The present study shows that the general research practices of a researcher provide valuable information about credibility. For example, a p-value of .01 from a researcher with an EDR of 70% is more credible than a p-value of .01 from a researcher with an EDR of 15%. Thus, rather than stereotyping social psychologists based on the low replication rate in the Open Science Collaboration project, social psychologists should be evaluated based on their own research practices.

References

Cairo, A. H., Green, J. D., Forsyth, D. R., Behler, A. M. C., & Raldiris, T. L. (2020). Gray (literature) matters: Evidence of selective hypothesis reporting in social psychological research. Personality and Social Psychology Bulletin, 46(9), 1344–1362. https://doi.org/10.1177/0146167220903896

Janke, S., Daumiller, M., & Rudert, S. C. (2019). Dark pathways to achievement in science: Researchers' achievement goals predict engagement in questionable research practices. Social Psychological and Personality Science, 10(6), 783–791. https://doi.org/10.1177/1948550618790227

Personalized P-Values for Social/Personality Psychologists

Last update 8/25/2021
(expanded to 410 social/personality psychologists; included Dan Ariely)

Introduction

Since Fisher invented null-hypothesis significance testing, researchers have used p < .05 as a statistical criterion to interpret results as discoveries worthy of discussion (i.e., the null-hypothesis is false). Once published, these results are often treated as real findings even though alpha does not control the risk of false discoveries.

Statisticians have warned against the exclusive reliance on p < .05, but nearly 100 years after Fisher popularized this approach, it is still the most common way to interpret data. The main reason is that many attempts to improve on this practice have failed. The main problem is that a single statistical result is difficult to interpret. However, when individual results are interpreted in the context of other results, they become more informative. Based on the distribution of p-values, it is possible to estimate the maximum false discovery rate (Bartos & Schimmack, 2020; Jager & Leek, 2014). This approach can be applied to the p-values published by individual authors to adjust the significance criterion (alpha) so that the risk of false discoveries stays at a reasonable level, FDR < .05.

Researchers who mainly test true hypotheses with high power have a high discovery rate (many p-values below .05) and a low false discovery rate (FDR < .05). Figure 1 shows an example of a researcher who followed this strategy (for a detailed description of z-curve plots, see Schimmack, 2021).

We see that out of the 317 test statistics retrieved from his articles, 246 were significant with alpha = .05. This is an observed discovery rate of 78%. We also see that this discovery rate closely matches the estimated discovery rate based on the distribution of the significant p-values, p < .05. The EDR is 79%. With an EDR of 79%, the maximum false discovery rate is only 1%. However, the 95%CI is wide, and the lower bound of the CI for the EDR, 27%, allows for 14% false discoveries.

When the ODR matches the EDR, there is no evidence of publication bias. In this case, we can improve the estimates by fitting all p-values, including the non-significant ones. With a tighter CI for the EDR, the 95%CI for the maximum FDR ranges from 1% to 3%. Thus, we can be confident that no more than 5% of the significant results with alpha = .05 are false discoveries. Readers can therefore continue to use alpha = .05 to look for interesting discoveries in Matsumoto's articles.

Figure 3 shows the results for a different type of researcher, one who took risks and studied weak effect sizes with small samples. This produces many non-significant results that are often not published. The selection for significance inflates the observed discovery rate, but the z-curve plot and the comparison with the EDR reveal the influence of publication bias. Here the ODR is similar to the one in Figure 1, but the EDR is only 11%. An EDR of 11% translates into a large maximum false discovery rate of 41%. In addition, the 95%CI of the EDR includes 5%, which means the risk of false positives could be as high as 100%. In this case, using alpha = .05 to interpret results as discoveries is very risky. Clearly, p < .05 means something very different when reading an article by David Matsumoto or Shelly Chaiken.
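The maximum FDR values reported here appear to follow Soric's (1989) upper bound, which can be computed from the EDR and the significance criterion; the sketch below reproduces the values for both examples and should be read as an illustration of the bound, not as the exact z-curve implementation. Note that when alpha is lowered, the relevant discovery rate changes as well, so the adjusted values discussed next presumably come from re-estimation at the stricter threshold rather than from plugging a smaller alpha into this formula with an unchanged EDR.

```r
# Soric's upper bound on the false discovery rate, given the estimated discovery
# rate (EDR) and the significance criterion (illustration, not the exact z-curve code)
max_fdr <- function(edr, alpha = .05) (1 / edr - 1) * alpha / (1 - alpha)

max_fdr(.79)  # ~.01  Matsumoto, EDR point estimate
max_fdr(.27)  # ~.14  Matsumoto, lower bound of the EDR's 95% CI
max_fdr(.11)  # ~.43  Chaiken, EDR point estimate (reported as 41% above)
max_fdr(.05)  # ~1.0  Chaiken, lower bound of the EDR's 95% CI
```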

Rather than dismissing all of Chaiken’s results, we can try to lower alpha to reduce the false discovery rate. If we set alpha = .01, the FDR is 15%. If we set alpha = .005, the FDR is 8%. To get the FDR below 5%, we need to set alpha to .001.

A uniform criterion of FDR < 5% is applied to all researchers in the rankings below. For some this means no adjustment to the traditional criterion. For others, alpha is lowered to .01, and for a few even lower than that.
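The selection rule itself is simple: among the candidate thresholds (.05, .01, .005, .001), pick the most lenient one whose estimated maximum FDR stays at or below 5%. A minimal sketch follows; the FDR values per alpha are hypothetical stand-ins for the re-estimated values.

```r
# Pick the most lenient alpha whose maximum FDR stays at or below the target
# (the FDR-per-alpha values are hypothetical stand-ins for re-estimated values)
choose_alpha <- function(fdr_by_alpha, target = .05) {
  ok <- fdr_by_alpha[fdr_by_alpha <= target]
  if (length(ok) == 0) return(NA_real_)
  max(as.numeric(names(ok)))
}

# Chaiken-like pattern from the text: 41% at .05, 15% at .01, 8% at .005, below 5% at .001
choose_alpha(c("0.05" = .41, "0.01" = .15, "0.005" = .08, "0.001" = .04))
#> 0.001
```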

The rankings below are based on automatically extracted test statistics from 40 journals (List of journals). The results should be interpreted with caution and treated as preliminary. They depend on the specific set of journals that were searched, the way results are reported, and many other factors. The data are available (data.drop), and researchers can exclude or add articles and run their own analyses using the z-curve package in R (https://replicationindex.com/2020/01/10/z-curve-2-0/).
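For readers who want to re-run an author-level analysis, a minimal sketch using the zcurve package is shown below; the file and column names are hypothetical, and the call assumes that the package's zcurve() function accepts a vector of z-scores, as described in its documentation.

```r
# Minimal sketch of re-running an author-level analysis with the zcurve package
# (file and column names are hypothetical; adapt them to the shared data file)
# install.packages("zcurve")
library(zcurve)

dat <- read.csv("author_tests.csv")  # hypothetical export of extracted test statistics
fit <- zcurve(z = dat$z)             # fit z-curve to the z-scores
summary(fit)                         # EDR, ERR, and their confidence intervals
plot(fit)                            # z-value plot like the figures shown above
```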

I am also happy to receive feedback about coding errors. I also recommend hand-coding articles so that the alpha adjustment is based on focal hypothesis tests. This typically lowers the EDR and increases the FDR. For example, the automated method produced an EDR of 31 for Bargh, whereas hand-coding of focal tests produced an EDR of 12 (Bargh-Audit).

And here are the rankings. The results are fully automated and I was not able to cover up the fact that I placed only #188 out of 400 in the rankings. In another post, I will explain how researchers can move up in the rankings. Of course, one way to move up in the rankings is to increase statistical power in future studies. The rankings will be updated again when the 2021 data are available.

Despite their preliminary nature, I am confident that the results provide valuable information. Until now, all p-values below .05 have been treated as if they were equally informative. The rankings here show that this is not the case. While p = .02 can be informative for one researcher, p = .002 may still entail a high false discovery risk for another researcher.

Good science requires not only open and objective reporting of new data; it also requires unbiased review of the literature. However, there are no rules and regulations regarding citations, and many authors cherry-pick citations that are consistent with their claims. Even when studies have failed to replicate, original studies are cited without citing the replication failures. In some cases, authors even cite original articles that have been retracted. Fortunately, it is easy to spot these acts of unscientific behavior. Here I am starting a project to list examples of bad scientific behaviors. Hopefully, more scientists will take the time to hold their colleagues accountable for ethical behavior in citations. They can even do so by posting anonymously on the PubPeer comment site.

Rank | Name | Tests | ODR | EDR | ERR | FDR | Alpha
1Robert A. Emmons538789901.05
2Allison L. Skinner2295981851.05
3David Matsumoto3788379851.05
4Linda J. Skitka5326875822.05
5Todd K. Shackelford3057775822.05
6Jonathan B. Freeman2745975812.05
7Virgil Zeigler-Hill5157274812.05
8Arthur A. Stone3107573812.05
9David P. Schmitt2077871772.05
10Emily A. Impett5497770762.05
11Paula Bressan628270762.05
12Kurt Gray4877969812.05
13Michael E. McCullough3346969782.05
14Kipling D. Williams8437569772.05
15John M. Zelenski1567169762.05
16Amy J. C. Cuddy2128368782.05
17Elke U. Weber3126968770.05
18Hilary B. Bergsieker4396768742.05
19Cameron Anderson6527167743.05
20Rachael E. Jack2497066803.05
21Jamil Zaki4307866763.05
22A. Janet Tomiyama767865763.05
23Benjamin R. Karney3925665733.05
24Phoebe C. Ellsworth6057465723.05
25Jim Sidanius4876965723.05
26Amelie Mummendey4617065723.05
27Carol D. Ryff2808464763.05
28Juliane Degner4356364713.05
29Steven J. Heine5977863773.05
30David M. Amodio5846663703.05
31Thomas N Bradbury3986163693.05
32Elaine Fox4727962783.05
33Miles Hewstone14277062733.05
34Linda R. Tropp3446561803.05
35Rainer Greifeneder9447561773.05
36Klaus Fiedler19507761743.05
37Jesse Graham3777060763.05
38Richard W. Robins2707660704.05
39Simine Vazire1376660644.05
40On Amir2676759884.05
41Edward P. Lemay2898759814.05
42William B. Swann Jr.10707859804.05
43Margaret S. Clark5057559774.05
44Bernhard Leidner7246459654.05
45B. Keith Payne8797158764.05
46Ximena B. Arriaga2846658694.05
47Joris Lammers7286958694.05
48Patricia G. Devine6067158674.05
49Rainer Reisenzein2016557694.05
50Barbara A. Mellers2878056784.05
51Joris Lammers7056956694.05
52Jean M. Twenge3817256594.05
53Nicholas Epley15047455724.05
54Kaiping Peng5667754754.05
55Krishna Savani6387153695.05
56Leslie Ashburn-Nardo1098052835.05
57Lee Jussim2268052715.05
58Richard M. Ryan9987852695.05
59Ethan Kross6146652675.05
60Edward L. Deci2847952635.05
61Roger Giner-Sorolla6638151805.05
62Bertram F. Malle4227351755.05
63George A. Bonanno4797251705.05
64Jens B. Asendorpf2537451695.05
65Samuel D. Gosling1085851625.05
66Tessa V. West6917151595.05
67Paul Rozin4497850845.05
68Joachim I. Krueger4367850815.05
69Sheena S. Iyengar2076350805.05
70James J. Gross11047250775.05
71Mark Rubin3066850755.05
72Pieter Van Dessel5787050755.05
73Shinobu Kitayama9837650715.05
74Matthew J. Hornsey16567450715.05
75Janice R. Kelly3667550705.05
76Antonio L. Freitas2477950645.05
77Paul K. Piff1667750635.05
78Mina Cikara3927149805.05
79Beate Seibt3797249626.01
80Ludwin E. Molina1636949615.05
81Bertram Gawronski18037248766.01
82Penelope Lockwood4587148706.01
83Edward R. Hirt10428148656.01
84Matthew D. Lieberman3987247806.01
85John T. Cacioppo4387647696.01
86Agneta H. Fischer9527547696.01
87Leaf van Boven7117247676.01
88Stephanie A. Fryberg2486247666.01
89Daniel M. Wegner6027647656.01
90Anne E. Wilson7857147646.01
91Rainer Banse4027846726.01
92Alice H. Eagly3307546716.01
93Jeanne L. Tsai12417346676.01
94Jennifer S. Lerner1818046616.01
95Andrea L. Meltzer5495245726.01
96R. Chris Fraley6427045727.01
97Constantine Sedikides25667145706.01
98Paul Slovic3777445706.01
99Dacher Keltner12337245646.01
100Brian A. Nosek8166844817.01
101George Loewenstein7527144727.01
102Ursula Hess7747844717.01
103Jason P. Mitchell6007343737.01
104Jessica L. Tracy6327443717.01
105Charles M. Judd10547643687.01
106S. Alexander Haslam11987243647.01
107Mark Schaller5657343617.01
108Susan T. Fiske9117842747.01
109Lisa Feldman Barrett6446942707.01
110Jolanda Jetten19567342677.01
111Mario Mikulincer9018942647.01
112Bernadette Park9737742647.01
113Paul A. M. Van Lange10927042637.01
114Wendi L. Gardner7986742637.01
115Will M. Gervais1106942597.01
116Jordan B. Peterson2666041797.01
117Philip E. Tetlock5497941737.01
118Amanda B. Diekman4388341707.01
119Daniel H. J. Wigboldus4927641678.01
120Michael Inzlicht6866641638.01
121Naomi Ellemers23887441638.01
122Phillip Atiba Goff2996841627.01
123Stacey Sinclair3277041578.01
124Francesca Gino25217540698.01
125Michael I. Norton11367140698.01
126David J. Hauser1567440688.01
127Elizabeth Page-Gould4115740668.01
128Tiffany A. Ito3498040648.01
129Richard E. Petty27716940648.01
130Tim Wildschut13747340648.01
131Norbert Schwarz13377240638.01
132Veronika Job3627040638.01
133Wendy Wood4627540628.01
134Minah H. Jung1568339838.01
135Marcel Zeelenberg8687639798.01
136Tobias Greitemeyer17377239678.01
137Jason E. Plaks5827039678.01
138Carol S. Dweck10287039638.01
139Christian S. Crandall3627539598.01
140Harry T. Reis9986938749.01
141Vanessa K. Bohns4207738748.01
142Jerry Suls4137138688.01
143Eric D. Knowles3846838648.01
144C. Nathan DeWall13367338639.01
145Clayton R. Critcher6978238639.01
146John F. Dovidio20196938629.01
147Joshua Correll5496138629.01
148Abigail A. Scholer5565838629.01
149Chris Janiszewski1078138589.01
150Herbert Bless5867338579.01
151Mahzarin R. Banaji8807337789.01
152Rolf Reber2806437729.01
153Kevin N. Ochsner4067937709.01
154Mark J. Brandt2777037709.01
155Geoff MacDonald4066737679.01
156Mara Mather10387837679.01
157Antony S. R. Manstead16567237629.01
158Lorne Campbell4336737619.01
159Sanford E. DeVoe2367137619.01
160Ayelet Fishbach14167837599.01
161Fritz Strack6077537569.01
162Jeff T. Larsen18174366710.01
163Nyla R. Branscombe12767036659.01
164Yaacov Schul4116136649.01
165D. S. Moskowitz34187436639.01
166Pablo Brinol13566736629.01
167Todd B. Kashdan3777336619.01
168Barbara L. Fredrickson2877236619.01
169Duane T. Wegener9807736609.01
170Joanne V. Wood10937436609.01
171Daniel A. Effron4846636609.01
172Niall Bolger3766736589.01
173Craig A. Anderson4677636559.01
174Michael Harris Bond37873358410.01
175Glenn Adams27071357310.01
176Daniel M. Bernstein40473357010.01
177C. Miguel Brendl12176356810.01
178Azim F. Sharif18374356810.01
179Emily Balcetis59969356810.01
180Eva Walther49382356610.01
181Michael D. Robinson138878356610.01
182Igor Grossmann20364356610.01
183Diana I. Tamir15662356210.01
184Samuel L. Gaertner32175356110.01
185John T. Jost79470356110.01
186Eric L. Uhlmann45767356110.01
187Nalini Ambady125662355610.01
188Daphna Oyserman44655355410.01
189Victoria M. Esses29575355310.01
190Linda J. Levine49574347810.01
191Wiebke Bleidorn9963347410.01
192Thomas Gilovich119380346910.01
193Alexander J. Rothman13369346510.01
194Francis J. Flynn37872346310.01
195Paula M. Niedenthal52269346110.01
196Ozlem Ayduk54962345910.01
197Paul Ekman8870345510.01
198Alison Ledgerwood21475345410.01
199Christopher R. Agnew32575337610.01
200Michelle N. Shiota24260336311.01
201Malte Friese50161335711.01
202Kerry Kawakami48768335610.01
203Danu Anthony Stinson49477335411.01
204Jennifer A. Richeson83167335211.01
205Margo J. Monteith77376327711.01
206Ulrich Schimmack31875326311.01
207Mark Snyder56272326311.01
208Michele J. Gelfand36576326311.01
209Russell H. Fazio109469326111.01
210Eric van Dijk23867326011.01
211Tom Meyvis37777326011.01
212Eli J. Finkel139262325711.01
213Robert B. Cialdini37972325611.01
214Jonathan W. Kunstman43066325311.01
215Delroy L. Paulhus12177318212.01
216Yuen J. Huo13274318011.01
217Gerd Bohner51371317011.01
218Christopher K. Hsee68975316311.01
219Vivian Zayas25171316012.01
220John A. Bargh65172315512.01
221Tom Pyszczynski94869315412.01
222Roy F. Baumeister244269315212.01
223E. Ashby Plant83177315111.01
224Kathleen D. Vohs94468315112.01
225Jamie Arndt131869315012.01
226Anthony G. Greenwald35772308312.01
227Nicholas O. Rule129468307513.01
228Lauren J. Human44759307012.01
229Jennifer Crocker51568306712.01
230Dale T. Miller52171306412.01
231Thomas W. Schubert35370306012.01
232Joseph A. Vandello49473306012.01
233W. Keith Campbell52870305812.01
234Arthur Aron30765305612.01
235Pamela K. Smith14966305212.01
236Aaron C. Kay132070305112.01
237Steven W. Gangestad19863304113.005
238Eliot R. Smith44579297313.01
239Nir Halevy26268297213.01
240E. Allan Lind37082297213.01
241Richard E. Nisbett31973296913.01
242Hazel Rose Markus67476296813.01
243Emanuele Castano44569296513.01
244Dirk Wentura83065296413.01
245Boris Egloff27481295813.01
246Monica Biernat81377295713.01
247Gordon B. Moskowitz37472295713.01
248Russell Spears228673295513.01
249Jeff Greenberg135877295413.01
250Caryl E. Rusbult21860295413.01
251Naomi I. Eisenberger17974287914.01
252Brent W. Roberts56272287714.01
253Yoav Bar-Anan52575287613.01
254Eddie Harmon-Jones73873287014.01
255Matthew Feinberg29577286914.01
256Roland Neumann25877286713.01
257Eugene M. Caruso82275286413.01
258Ulrich Kuehnen82275286413.01
259Elizabeth W. Dunn39575286414.01
260Jeffry A. Simpson69774285513.01
261Sander L. Koole76765285214.01
262Richard J. Davidson38064285114.01
263Shelly L. Gable36464285014.01
264Adam D. Galinsky215470284913.01
265Grainne M. Fitzsimons58568284914.01
266Geoffrey J. Leonardelli29068284814.005
267Joshua Aronson18385284614.005
268Henk Aarts100367284514.005
269Vanessa K. Bohns42276277415.01
270Jan De Houwer197270277214.01
271Dan Ariely60070276914.01
272Charles Stangor18581276815.01
273Karl Christoph Klauer80167276514.01
274Mario Gollwitzer50058276214.01
275Jennifer S. Beer8056275414.01
276Eldar Shafir10778275114.01
277Guido H. E. Gendolla42276274714.005
278Klaus R. Scherer46783267815.01
279William G. Graziano53271266615.01
280Galen V. Bodenhausen58574266115.01
281Sonja Lyubomirsky53071265915.01
282Kai Sassenberg87271265615.01
283Kristin Laurin64863265115.01
284Claude M. Steele43473264215.005
285David G. Rand39270258115.01
286Paul Bloom50272257916.01
287Kerri L. Johnson53276257615.01
288Batja Mesquita41671257316.01
289Rebecca J. Schlegel26167257115.01
290Phillip R. Shaver56681257116.01
291David Dunning81874257016.01
292Laurie A. Rudman48272256816.01
293David A. Lishner10565256316.01
294Mark J. Landau95078254516.005
295Ronald S. Friedman18379254416.005
296Joel Cooper25772253916.005
297Alison L. Chasteen22368246916.01
298Jeff Galak31373246817.01
299Steven J. Sherman88874246216.01
300Shigehiro Oishi110964246117.01
301Thomas Mussweiler60470244317.005
302Mark W. Baldwin24772244117.005
303Evan P. Apfelbaum25662244117.005
304Nurit Shnabel56476237818.01
305Klaus Rothermund73871237618.01
306Felicia Pratto41073237518.01
307Jonathan Haidt36876237317.01
308Roland Imhoff36574237318.01
309Jeffrey W Sherman99268237117.01
310Jennifer L. Eberhardt20271236218.005
311Bernard A. Nijstad69371235218.005
312Brandon J. Schmeichel65266234517.005
313Sam J. Maglio32572234217.005
314David M. Buss46182228019.01
315Yoel Inbar28067227119.01
316Serena Chen86572226719.005
317Spike W. S. Lee14568226419.005
318Marilynn B. Brewer31475226218.005
319Michael Ross116470226218.005
320Dieter Frey153868225818.005
321G. Daniel Lassiter18982225519.01
322Sean M. McCrea58473225419.005
323Wendy Berry Mendes96568224419.005
324Paul W. Eastwick58365216919.005
325Kees van den Bos115084216920.005
326Maya Tamir134280216419.005
327Joseph P. Forgas88883215919.005
328Michaela Wanke36274215919.005
329Dolores Albarracin54066215620.005
330Elizabeth Levy Paluck3184215520.005
331Vanessa LoBue29968207621.01
332Christopher J. Armitage16062207321.005
333Elizabeth A. Phelps68678207221.005
334Jay J. van Bavel43764207121.005
335David A. Pizarro22771206921.005
336Andrew J. Elliot101881206721.005
337William A. Cunningham23876206422.005
338Laura D. Scherer21269206421.01
339Kentaro Fujita45869206221.005
340Geoffrey L. Cohen159068205021.005
341Ana Guinote37876204721.005
342Tanya L. Chartrand42467203321.001
343Selin Kesebir32866197322.005
344Vincent Y. Yzerbyt141273197322.01
345James K. McNulty104756196523.005
346Robert S. Wyer87182196322.005
347Travis Proulx17463196222.005
348Peter M. Gollwitzer130364195822.005
349Nilanjana Dasgupta38376195222.005
350Jamie L. Goldenberg56877195022.01
351Richard P. Eibach75369194723.001
352Gerald L. Clore45674194522.001
353James M. Tyler13087187424.005
354Roland Deutsch36578187124.005
355Ed Diener49864186824.005
356Kennon M. Sheldon69874186623.005
357Wilhelm Hofmann62467186623.005
358Laura L. Carstensen72377186424.005
359Toni Schmader54669186124.005
360Frank D. Fincham73469185924.005
361David K. Sherman112861185724.005
362Lisa K. Libby41865185424.005
363Chen-Bo Zhong32768184925.005
364Stefan C. Schmukle11462177126.005
365Michel Tuan Pham24686176825.005
366Leandre R. Fabrigar63270176726.005
367Neal J. Roese36864176525.005
368Carey K. Morewedge63376176526.005
369Timothy D. Wilson79865176326.005
370Brad J. Bushman89774176225.005
371Ara Norenzayan22572176125.005
372Benoit Monin63565175625.005
373Michael W. Kraus61772175526.005
374Ad van Knippenberg68372175526.001
375E. Tory. Higgins186868175425.001
376Ap Dijksterhuis75068175426.005
377Joseph Cesario14662174526.001
378Simone Schnall27062173126.001
379Joshua M. Ackerman38053167013.01
380Melissa J. Ferguson116372166927.005
381Laura A. King39176166829.005
382Daniel T. Gilbert72465166527.005
383Charles S. Carver15482166428.005
384Leif D. Nelson40974166428.005
385David DeSteno20183165728.005
386Sandra L. Murray69760165528.001
387Heejung S. Kim85859165529.001
388Mark P. Zanna65964164828.001
389Nira Liberman130475156531.005
390Gun R. Semin15979156429.005
391Tal Eyal43962156229.005
392Nathaniel M Lambert45666155930.001
393Angela L. Duckworth12261155530.005
394Dana R. Carney20060155330.001
395Garriy Shteynberg16854153130.005
396Lee Ross34977146331.001
397Arie W. Kruglanski122878145833.001
398Ziva Kunda21767145631.001
399Shelley E. Taylor42769145231.001
400Jon K. Maner104065145232.001
401Gabriele Oettingen104761144933.001
402Nicole L. Mead24070144633.01
403Gregory M. Walton58769144433.001
404Michael A. Olson34665136335.001
405Fiona Lee22167135834.001
406Melody M. Chao23757135836.001
407Adam L. Alter31478135436.001
408Sarah E. Hill50978135234.001
409Jaime L. Kurtz9155133837.001
410Michael A. Zarate12052133136.001
411Jennifer K. Bosson65976126440.001
412Daniel M. Oppenheimer19880126037.001
413Deborah A. Prentice8980125738.001
414Yaacov Trope127773125738.001
415Oscar Ybarra30563125540.001
416William von Hippel39865124840.001
417Steven J. Spencer54167124438.001
418Martie G. Haselton18673115443.001
419Shelly Chaiken36074115244.001
420Susan M. Andersen36174114843.001
421Dov Cohen64168114441.001
422Mark Muraven49652114441.001
423Ian McGregor40966114041.001
424Hans Ijzerman2145694651.001
425Linda M. Isbell1156494150.001
426Cheryl J. Wakslak2787383559.001

The Diminishing Utility of Replication Studies In Social Psychology

Dorothy Bishop writes on her blog:

“As was evident from my questions after the talk, I was less enthused by the idea of doing a large, replication of Darryl Bem’s studies on extra-sensory perception. Zoltán Kekecs and his team have put in a huge amount of work to ensure that this study meets the highest standards of rigour, and it is a model of collaborative planning, ensuring input into the research questions and design from those with very different prior beliefs. I just wondered what the point was. If you want to put in all that time, money and effort, wouldn’t it be better to investigate a hypothesis about something that doesn’t contradict the laws of physics?”


I think she makes a valid and important point. Bem’s (2011) article highlighted everything that was wrong with the research practices in social psychology. Other articles in JPSP are equally incredible, but this was ignored because naive readers found the claims more plausible (e.g., blood glucose is the energy for will power). We know now that none of these published results provide empirical evidence because the results were obtained with questionable research practices (Schimmack, 2014; Schimmack, 2018). It is also clear that these were not isolated incidents, but that hiding results that do not support a theory was (and still is) a common practice in social psychology (John et al., 2012; Schimmack, 2019).

A large attempt at estimating the replicability of social psychology revealed that only 25% of published significant results could be replicated (OSC). The rate for between-subject experiments was even lower. Thus, the a-priori probability (base rate) that a randomly drawn study from social psychology will produce a significant result in a replication attempt is well below 50%. In other words, a replication failure is the more likely outcome.

The low success rate of these replication studies was a shock. However, it is sometimes falsely implied that the low replicability of results in social psychology was not recognized earlier because nobody conducted replication studies. This is simply wrong. In fact, social psychology is one of the disciplines in psychology that required researchers to conduct multiple studies that showed the same effect to ensure that a result was not a false positive result. Bem had to present 9 studies with significant results to publish his crazy claims about extrasensory perception (Schimmack, 2012). Most of the studies that failed to replicate in the OSC replication project were taken from multiple-study articles that reported several successful demonstrations of an effect. Thus, the problem in social psychology was not that nobody conducted replication studies. The problem was that social psychologists only reported replication studies that were successful.

A proper analysis of the problem also suggests a different solution. If we pretend that nobody did replication studies, it may seem useful to start doing replication studies. However, if social psychologists conducted replication studies but did not report replication failures, the solution is simply to demand that social psychologists report all of their results honestly. This demand is so obvious that undergraduate students are surprised when I tell them that this is not the way social psychologists conduct their research.

In sum, it has become apparent that questionable research practices undermine the credibility of the empirical results in social psychology journals, and that the majority of published results cannot be replicated. Thus, social psychology lacks a solid empirical foundation.

What Next?

In information-theoretic terms, little is gained by conducting actual replication studies in social psychology because a failure to replicate the original result is the likely outcome and therefore uninformative. In fact, social psychologists have responded to replication failures by claiming that these studies were poorly conducted and do not invalidate the original claims. Thus, replication studies are costly and have not advanced theory development in social psychology. More replication studies are unlikely to change this.

A better solution to the replication crisis in social psychology is to characterize research in social psychology from Festinger's classic small-sample, between-subject study in 1957 to research in 2017 as exploratory, hypothesis-generating research. As Bem suggested to his colleagues, this was a period of adventure and exploration where it was ok to "err on the side of discovery" (i.e., publish false positive results, like Bem's precognition for erotica). Lots of interesting discoveries were made during this period; it is just not clear which of these findings can be replicated and what they tell us about social behavior.

Thus, new studies in social psychology should not try to replicate old studies. For example, nobody should try to replicate Devine's subliminal priming study with racial primes using computers and software from the 1980s (Devine, 1989). Instead, prominent theoretical predictions should be tested with the best research methods that are currently available. The way forward is not to do more replication studies, but to practice open science (a.k.a. honest science) that subjects theories to empirical tests which may also falsify them (e.g., subliminal racial stimuli have no influence on behavior). The main shift that is required is to get away from research that can only confirm theories and to allow empirical data to falsify theories.

This was exactly the intent of Danny Kahneman's letter, when he challenged social priming researchers to respond to criticism of their work by going into their labs and demonstrating that these effects can be replicated across many labs.

Kahneman makes it clear that the onus of replication is on the original researchers who want others to believe their claims. The response to this letter speaks volumes. Not only did social psychologists fail to provide new and credible evidence that their results can be replicated, they also demonstrated defiant denial in the face of replication failures by others. The defiant denial by prominent social psychologists (e.g., Baumeister, 2019) makes it clear that they will not be convinced by empirical evidence, while others who can look at the evidence objectively do not need more evidence to realize that the social psychological literature is a train wreck (Schimmack, 2017; Kahneman, 2017). Thus, I suggest that young social psychologists search the train wreck for survivors, but do not waste their time and resources on replication studies that are likely to fail.

A simple guide through the wreckage of social psychology is to distrust any significant result with a p-value greater than .01 (Schimmack, 2019). Prediction markets also suggest that readers are able to distinguish credible and incredible results (Atlantic). Thus, I recommend building on studies that are credible and steering clear of sexy findings that are unlikely to replicate. As Danny Kahneman pointed out, young social psychologists who work in questionable areas face a dilemma: either they use the questionable methods that produced the original results, which is increasingly considered unethical, or they end up with results that are not very informative. On the positive side, the replication crisis implies that there are many important topics in social psychology that still need to be studied properly with the scientific method. Addressing these important questions may be the best way to rescue social psychology.