
Estimating the False Discovery Risk of Psychology Science

January 3, 2022 | False Discovery Rate, False Discovery Risk, False Positives, Science-Wise False Discovery Rate | Ulrich Schimmack

Abstract

Since 2011, the credibility of psychological science has been in doubt. A major concern is that questionable research practices could have produced many false positive results, and it has been suggested that most published results are false. Here we present an empirical estimate of the false discovery risk using a z-curve analysis of randomly selected p-values from a broad range of journals that span most disciplines in psychology. The results suggest that no more than a quarter of published results could be false positives. We also show that the false positive risk can be reduced to less than 5% by using alpha = .01 as the criterion for statistical significance. This remedy can restore confidence in the direction of published effects. However, published effect sizes cannot be trusted because the z-curve analysis shows clear evidence of selection for significance that inflates effect size estimates.

Introduction

Several events in the early 2010s led to a credibility crisis in psychology. As journals selectively publish only statistically significant results, statistical significance loses its, well, significance. Every published focal hypothesis will be statistically significant, and it is unclear which of these results are true positives and which are false positives.

A key article that contributed to the credibility crisis was Simmons, Nelson, and Simonsohn’s (2011) article “False Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.”

The title made the bold statement that it is easy to obtain statistically significant results even when the null-hypothesis is true. This led to concerns that many, if not most, published results are indeed false positive results. Many meta-psychological articles quoted Simmons et al.’s (2011) article to suggest that there is a high risk or even a high rate of false positive results in the psychological literature, including my own 2012 article.

“Researchers can use questionable research practices (e.g., snooping, not reporting failed studies, dropping dependent variables, etc.; Simmons et al., 2011; Strube, 2006) to dramatically increase the chances of obtaining a false-positive result” (Schimmack, 2012, p. 552, 248 citations)

The Appendix lists citations from influential meta-psychological articles that imply a high false positive risk in the psychological literature. Only one article suggested that fears about high false positive rates may be unwarranted (Stroebe & Strack, 2014). In contrast, other articles have suggested that false positive rates might be as high as 50% or more (Szucs & Ioannidis, 2017).

There have been two noteworthy attempts at estimating the false discovery rate in psychology. Szucs and Ioannidis (2017) automatically extracted p-values from five psychology journals and estimated the average power of extracted t-tests. They then used this power estimate in combination with the assumption that psychologists discover one true, non-zero, effect for every 13 true null-hypotheses to suggest that the false discovery rate in psychology exceeds 50%. The problem with this estimate is that it relies on the questionable assumption that psychologists test a very small percentage of true hypotheses.

The other article tried to estimate the false positive rate based on 70 of the 100 studies that were replicated in the Open Science Collaboration project (Open Science Collaboration, 2015). The statistical model estimated that psychologists test 93 true null-hypotheses for every 7 true effects (true positives), and that true effects are tested with 75% power (Johnson et al., 2017). This yields a false positive rate of about 50%. The main problem with this study is the reliance on a small, unrepresentative sample of studies that focused heavily on experimental social psychology, a field that triggered concerns about the credibility of psychology in general (Schimmack, 2020). Another problem is that point estimates based on a small sample are unreliable.
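The arithmetic behind such estimates can be checked with a minimal sketch. It assumes the standard definition of the false discovery rate (false positives divided by all significant results) and alpha = .05; the function and variable names are hypothetical and not taken from Johnson et al. (2017).

```python
def false_discovery_rate(n_null, n_true, power, alpha=0.05):
    """Share of significant results that are false positives.

    n_null: number of tested hypotheses where the null is true
    n_true: number of tested hypotheses with a true effect
    power:  probability of a significant result when the effect is real
    alpha:  probability of a significant result when the null is true
    """
    false_positives = n_null * alpha
    true_positives = n_true * power
    return false_positives / (false_positives + true_positives)


# Scenario described above: 93 true nulls for every 7 true effects, 75% power
fdr = false_discovery_rate(n_null=93, n_true=7, power=0.75)
print(f"False discovery rate: {fdr:.0%}")
```

With these inputs the result is roughly 47%, which matches the "about 50%" figure. The estimate is driven almost entirely by the assumed ratio of true nulls to true effects, which is why the representativeness of the sample matters so much.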

To provide new and better information about the false positive risk in psychology, we conducted a new investigation that addresses three limitations of the previous studies. First, we used hand-coding of focal hypothesis tests, rather than automatic extraction of all test-statistics. Second, we sampled from a broad range of journals that cover all areas of psychology rather than focusing narrowly on experimental psychology. Third, we used a validated method to estimate the false discovery risk based on an estimate of the expected discovery rate (Bartos & Schimmack, 2021). In short, the false discovery risk decreases as a monotonic function of the number of discoveries (i.e., p-values below .05) (Soric, 1989).
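The relationship between the discovery rate and the maximum false discovery risk can be sketched as follows. The formula is Soric's (1989) upper bound, FDR_max = (1/DR - 1) * alpha/(1 - alpha); the function name is hypothetical, and the cap at 100% is an added convenience for discovery rates at or below alpha.

```python
def soric_max_fdr(discovery_rate, alpha=0.05):
    """Soric's (1989) upper bound on the false discovery rate.

    discovery_rate: proportion of all tests that are significant (0 < rate <= 1)
    alpha:          significance criterion
    """
    if not 0 < discovery_rate <= 1:
        raise ValueError("discovery_rate must be in (0, 1]")
    bound = (1 / discovery_rate - 1) * alpha / (1 - alpha)
    # A discovery rate at or below alpha is compatible with 100% false positives
    return min(bound, 1.0)


for rate in (0.05, 0.10, 0.20, 0.50, 0.80):
    print(f"discovery rate {rate:.0%} -> max FDR {soric_max_fdr(rate):.0%}")
```

The bound falls quickly as the discovery rate rises: a discovery rate equal to alpha allows for 100% false positives, whereas a discovery rate of 20% already limits the risk to about 21%.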

Z-curve relies on the observation that false positives and true positives produce different distributions of p-values. To fit a model to distributions of significant p-values, z-curve transforms p-values into absolute z-scores. We illustrate z-curve with two simulation studies. The first simulation is based on Simmons et al.’s (2011) scenario in which the combination of four questionable research practices inflates the false positive risk from 5% to 60%. In our simulation, we assumed an equal number of true null-hypotheses (effect size d = 0) and true hypotheses with small to moderate effect sizes (d = .2 to .5). The use of questionable research practices also increases the chances of getting a significant result for true hypotheses. In our simulation, the probability of obtaining a significant result with a true H0 was 58%, whereas the probability of obtaining a significant result with a true H1 was 93%. Given the 1:1 ratio of H0 and H1 that were tested, this yields a false discovery rate of 39%.
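The transformation of p-values into absolute z-scores can be sketched with Python's standard library. This is only the conversion step for two-sided p-values, not the z-curve model itself, and the function name is hypothetical.

```python
from statistics import NormalDist


def p_to_abs_z(p):
    """Convert a two-sided p-value into an absolute z-score."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # The lower-tail quantile of p/2 is negative; flipping the sign gives |z|.
    # Working in the lower tail avoids rounding 1 - p/2 to 1 for tiny p-values.
    return -NormalDist().inv_cdf(p / 2)


for p in (0.05, 0.01, 0.001):
    print(f"p = {p} -> |z| = {p_to_abs_z(p):.2f}")
```

A p-value of .05 corresponds to |z| = 1.96, so all significant results at alpha = .05 lie to the right of 1.96 in a z-curve plot.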

Figure 1 shows that questionable research practices produce a steeply declining z-curve. Based on this shape, z-curve estimates a discovery rate of 5%, with a 95%CI ranging from 5% to 10%. This translates into estimates of the false discovery risk of 100% with a 95%CI ranging from 46% to 100% (Soric, 1989). The reason why z-curve provides a conservative estimate of the false discovery risk is that p-hacking changes the shape of the distribution in a way that produces even more z-values just above 1.96 than mere selection for significance would produce. In other words, p-hacking destroys evidential value when true hypotheses are being tested. It is not necessary to simulate scenarios in which even more true null-hypotheses are being tested because this would make the z-curve even steeper. Thus, Figure 1 provides a prediction for our z-curve analyses based on actual data, if psychologists heavily rely on Simmons et al.’s recipe to produce significant results.

Figure 2 is based on a simulation of Johnson et al.’s (2017) scenario with a 9% discovery rate (9 true hypotheses for every 100 hypothesis tests), a false discovery rate of 50%, and power to detect true effects of 75%. Johnson et al. did not assume or model p-hacking.

The z-curve for this scenario also shows a steep decline that can be attributed to the high percentage of false positive results. However, there is also a notable tail with z-values greater than 3 that reflects the influence of true hypotheses with adequate power. In this scenario, the expected discovery rate is higher, with a 95%CI ranging from 7% to 20%. This translates into a 95%CI for the false discovery risk ranging from 21% to 71% (Soric, 1989). This interval contains the true value of 50%, although the point estimate, 34%, underestimates the true value. Thus, we recommend using the upper limit of the 95%CI as an estimate of the maximum false discovery rate that is consistent with the data.

We now turn to real data. Figure 3 shows a z-curve analysis of Kühberger, Fritz, and Scherndl’s (2014) data. The authors conducted an audit of psychological research by randomly sampling 1,000 English language articles published in the year 2007 that were listed in PsycINFO. This audit produced 344 significant p-values that could be subjected to a z-curve analysis. The results differ notably from the simulation results. The expected discovery rate is higher and implies a much smaller false discovery risk of only 9%. However, due to the small set of studies, the confidence interval is wide and allows for nearly 50% false positive results.

To produce a larger set of test-statistics, my students and I have hand-coded over 1,000 randomly selected articles from a broad range of journals (Schimmack, 2021). These data were combined with Motyl et al.’s (2017) coding of social psychology journals. The time period spans the years 2008 to 2014, with a focus on the years 2009 and 2010. This dataset produced 1,715 significant p-values. The estimated false discovery risk is similar to the estimate for Kühberger et al.’s (2014) studies. Although the point estimate for the false discovery risk is a bit higher, 12%, the upper bound of the 95%CI is lower because the confidence interval is tighter.

Given the similarity of the results, we combined the two datasets to obtain an even more precise estimate of the false discovery risk based on 2,059 significant p-values. However, the upper limit of the 95%CI decreased only slightly from 30% to 26%.

The most important conclusion from these findings is that concerns about false positive results rest on exaggerated assumptions about their prevalence in psychology journals. The present results suggest that at most a quarter of published results are false positives and that actual z-curves are very different from those implied by the influential simulation studies of Simmons et al. (2011). Our empirical results show no evidence that massive p-hacking is a common practice.

However, a false positive rate of 25% is still unacceptably high. Fortunately, there is an easy solution to this problem because the false discovery rate depends on the significance threshold. Based on their pessimistic estimates, Johnson et al. (2015) suggested lowering alpha to .005 or even .001. However, these stringent criteria would render most published results statistically non-significant. We suggest lowering alpha to .01. Figure 6 shows the rationale for this recommendation by fitting z-curve with alpha = .01 (i.e., the red vertical line that represents the significance criterion is moved from 1.96 to 2.58).

Lowering alpha to .01 lowers the percentage of significant results from 83% (not counting marginally significant, p < .1, results) to 53%. Thus, the expected discovery rate decreases, but the more stringent criterion for significance lowers the false discovery risk to 4%, and even the upper limit of the 95%CI is just 4%.
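The drop in the percentage of significant results is a simple recount of the same p-values against a stricter criterion. The sketch below uses a short list of made-up p-values purely for illustration; they are not the coded data, and the function name is hypothetical.

```python
def observed_discovery_rate(p_values, alpha):
    """Proportion of p-values that fall below the significance criterion."""
    significant = sum(p < alpha for p in p_values)
    return significant / len(p_values)


# Hypothetical p-values, invented for illustration only
p_values = [0.001, 0.004, 0.008, 0.012, 0.020, 0.031, 0.044, 0.049, 0.070, 0.300]

odr_05 = observed_discovery_rate(p_values, alpha=0.05)
odr_01 = observed_discovery_rate(p_values, alpha=0.01)
print(f"ODR at alpha = .05: {odr_05:.0%}")
print(f"ODR at alpha = .01: {odr_01:.0%}")
```

Results with p-values between .01 and .05 are the ones that lose their significant status. The false discovery risk itself is not obtained from this recount; it requires refitting z-curve with the new criterion.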

It is likely that discovery rates vary across journals and disciplines (Schimmack, 2021). In the future, it may be possible to make more specific recommendations for different disciplines or journals based on their discovery rates. Journals that publish riskier hypothesis tests or studies with modest power would need a more stringent significance criterion to maintain an acceptable false discovery risk.

An alpha level of .01 is also supported by Simmons et al.’s (2011) simulation studies of p-hacking. Massive p-hacking that inflates the false positive risk from 5% to 61% produces only 22% false positives with alpha = .01. Milder forms of p-hacking yield a probability of only 8% of obtaining a p-value below .01. Ideally, open science practices like pre-registration will curb the use of questionable practices in the future. Increasing sample sizes will also help to lower the false positive risk. A z-curve analysis of new studies can be used to estimate the current false discovery risk and may suggest that even the traditional alpha level of .05 is able to maintain a false discovery risk below 5%.

While the present results may be considered good news relative to the scenario that most published results cannot be trusted, the results do not change the fact that some areas of psychology have a replication crisis (Open Science Collaboration, 2015). The z-curve results show clear evidence of selection for significance, which leads to inflated effect size estimates. Studies suggest that effect sizes are often inflated by more than 100% (Open Science Collaboration, 2015). Thus, published effect size estimates cannot be trusted even if p-values below .01 show the correct sign of an effect. The present results also imply that effect size meta-analyses that did not correct for publication bias produce inflated effect size estimates. For these reasons, many meta-analyses have to be reexamined with statistical tools that correct for publication bias.

Appendix

“Given that these publishing biases are pervasive across scientific practice, it is possible that false positives heavily contaminate the neuroscience literature as well, and this problem may affect at least as much, if not even more so, the most prominent journals” (Button et al., 2013; 3,316 citations).

“In a theoretical analysis, Ioannidis estimated that publishing and analytic practices make it likely that more than half of research results are false and therefore irreproducible” (Open Science Collaboration, 2015, aac4716-1)

“There is increasing concern that most current published research findings are false. (Ioannidis, 2005, abstract)” (Cumming, 2014, p. 7, 1,633 citations).

“In a recent article, Simmons, Nelson, and Simonsohn (2011) showed how, due to the misuse of statistical tools, significant results could easily turn out to be false positives (i.e., effects considered significant whereas the null hypothesis is actually true).” (Leys et al., 2013, p. 765, 1,406 citations)

“During data analysis it can be difficult for researchers to recognize P-hacking or data dredging because confirmation and hindsight biases can encourage the acceptance of outcomes that fit expectations or desires as appropriate, and the rejection of outcomes that do not as the result of suboptimal designs or analyses. Hypotheses may emerge that fit the data and are then reported without indication or recognition of their post hoc origin. This, unfortunately, is not scientific discovery, but self-deception. Uncontrolled, it can dramatically increase the false discovery rate” (Munafò et al., 2017, p. 2, 1,010 citations)

“Just how dramatic these effects can be was demonstrated by Simmons, Nelson, and Simonsohn (2011) in a series of experiments and simulations that showed how greatly QRPs increase the likelihood of finding support for a false hypothesis.” (John et al., 2012, p. 524, 877 citations).

“Simonsohn’s simulations have shown that changes in a few data-analysis decisions can increase the false-positive rate in a single study to 60%” (Nuzzo, 2014, 799 citations).

“the publication of an important article in Psychological Science showing how easily researchers can, in the absence of any real effects, nonetheless obtain statistically significant differences through various questionable research practices (QRPs) such as exploring multiple dependent variables or covariates and only reporting these when they yield significant results (Simmons, Nelson, & Simonsohn, 2011)” (Pashler & Wagenmakers, 2012, p. 528, 736 citations)

“Even seemingly conservative levels of p-hacking make it easy for researchers to find statistically significant support for nonexistent effects. Indeed, p-hacking can allow researchers to get most studies to reveal significant relationships between truly unrelated variables (Simmons et al., 2011).” (Simonsohn, Nelson, & Simmons, 2014, p. 534, 656 citations)

“Recent years have seen intense interest in the reproducibility of scientific results and the degree to which some problematic, but common, research practices may be responsible for high rates of false findings in the scientific literature, particularly within psychology but also more generally” (Poldrack et al., 2017, p. 115, 475 citations)

“especially in an environment in which multiple comparisons or researcher dfs (Simmons, Nelson, & Simonsohn, 2011) make it easy for researchers to find large and statistically significant effects that could arise from noise alone” (Gelman & Carlin,

“In an influential recent study, Simmons and colleagues demonstrated that even a moderate amount of flexibility in analysis choice—for example, selecting from among two DVs or optionally including covariates in a regression analysis— could easily produce false-positive rates in excess of 60%, a figure they convincingly argue is probably a conservative estimate (Simmons et al., 2011).” (Yarkoni & Westfall, 2017, p. 1103, 457 citations)

“In the face of human biases and the vested interest of the experimenter, such freedom of analysis provides access to a Pandora’s box of tricks that can be used to achieve any desired result (e.g., John et al., 2012; Simmons, Nelson, & Simonsohn, 2011)” (Wagenmakers et al., 2012, p. 633, 425 citations)

“Simmons et al. (2011) illustrated how easy it is to inflate Type I error rates when researchers employ hidden degrees of freedom in their analyses and design of studies (e.g., selecting the most desirable outcomes, letting the sample size depend on results of significance tests).” (Bakker et al., 2012, p. 545, 394 citations).

“Psychologists have recently become increasingly concerned about the likely overabundance of false positive results in the scientific literature. For example, Simmons, Nelson, and Simonsohn (2011) state that “In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not” (p. 1359)” (Maxwell, Lau, & Howard, 2015, p. 487,

“More-over, the highest impact journals famously tend to favor highly surprising results; this makes it easy to see how the proportion of false positive findings could be even higher in such journals.” (Pashler & Harris, 2012, p. 532, 373 citations)

“There is increasing concern that many published results are false positives [1,2] (but see [3]).” (Head et al., 2015, p. 1, 356 citations)

“Quantifying p-hacking is important because publication of false positives hinders scientific progress” (Head et al., 2015, p. 2, 356 citations).

“To be sure, methodological discussions are important for any discipline, and both fraud and dubious research procedures are damaging to the image of any field and potentially undermine confidence in the validity of social psychological research findings. Thus far, however, no solid data exist on the prevalence of such research practices in either social or any other area of psychology.” (Stroebe & Strack, 2014, p. 60, 291 citations)

“Assuming a realistic range of prior probabilities for null hypotheses, false report probability is likely to exceed 50% for the whole literature” (Szucs & Ioannidis, 2017, p. 1, 269 citations)

“Notably, if we consider the recent estimate of 13:1 H0:H1 odds [30], then FRP exceeds 50% even in the absence of bias” (Szucs & Ioannidis, 2017, p. 12, 269 citations)

“In all, the combination of low power, selective reporting, and other biases and errors that have been well documented suggest that high FRP can be expected in cognitive neuroscience and psychology. For example, if we consider the recent estimate of 13:1 H0:H1 odds [30], then FRP exceeds 50% even in the absence of bias.” (Szucs & Ioannidis, 2017, p. 15, 269 citations)

“Many prominent researchers believe that as much as half of the scientific literature—not only in medicine, by also in psychology and other fields—may be wrong [11,13–15]” (Smaldino & McElreath, 2016, p. 2, 251 citations).

“A more recent article compellingly demonstrated how flexibility in data collection, analysis, and reporting can dramatically increase false-positive rates (Simmons, Nelson, & Simonsohn, 2011).” (Dick et al., 2015, p. 43, 208 citations)

“In 2011, we wrote “False-Positive Psychology” (Simmons et al. 2011), an article reporting the surprisingly severe consequences of selectively reporting data and analyses, a practice that we later called p-hacking. In that article, we showed that conducting multiple analyses on the same data set and then reporting only the one(s) that obtained statistical significance (e.g., analyzing multiple measures but reporting only one) can dramatically increase the likelihood of publishing a false-positive finding. Independently and nearly simultaneously, John et al. (2012) documented that a large fraction of psychological researchers admitted engaging in precisely the forms of p-hacking that we had considered. Identifying these realities—that researchers engage in p-hacking and that p-hacking makes it trivially easy to accumulate significant evidence for a false hypothesis—opened psychologists’ eyes to the fact that many published findings, and even whole literatures, could be false positive.” (Nelson, Simmons, & Simonsohn, 2018, 204 citations).

“As Simmons et al. (2011) concluded—reflecting broadly on the state of the discipline—“it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis” (p. 1359)” (Earp & Trafimow, 2015, p. 4, 200 citations)

“The second, related set of events was the publication of articles by a series of authors (Ioannidis 2005, Kerr 1998, Simmons et al. 2011, Vul et al. 2009) criticizing questionable research practices (QRPs) that result in grossly inflated false positive error rates in the psychological literature” (Shrout & Rodgers, 2018, p. 489, 195 citations).

“Let us add a new dimension, which was brought up in a seminal publication of Simmons, Nelson & Simonsohn (2011). They stated that researchers actually have so much flexibility in deciding how to analyse their data that this flexibility allows them to coax statistically significant results from nearly any data set” (Forstmeier, Wagenmakers, & Parker, 2017, p. 1945, 173 citations)

“Publication bias (Ioannidis, 2005) and flexibility during data analyses (Simmons, Nelson, & Simonsohn, 2011) create a situation in which false positives are easy to publish, whereas contradictory null findings do not reach scientific journals (but see Nosek & Lakens, in press)” (Lakens & Evers, 2014, p. 278, 139 citations)

“Recent reports hold that allegedly common research practices allow psychologists to support just about any conclusion (Ioannidis, 2005; Simmons, Nelson, & Simonsohn, 2011).” (Koole & Lakens, 2012, p. 608, 139 citations)

“Researchers then may be tempted to write up and concoct papers around the significant results and send them to journals for publication. This outcome selection seems to be widespread practice in psychology [12], which implies a lot of false positive results in the literature and a massive overestimation of ES, especially in meta-analyses” (

“Researcher df, or researchers’ behavior directed at obtaining statistically significant results (Simonsohn, Nelson, & Simmons, 2013), which is also known as p-hacking or questionable research practices in the context of null hypothesis significance testing (e.g., O’Boyle, Banks, & Gonzalez-Mulé, 2014), results in a higher frequency of studies with false positives (Simmons et al., 2011) and inflates genuine effects (Bakker et al., 2012).” (van Assen, van Aert, & Wicherts, p. 294, 133 citations)

“The scientific community has witnessed growing concern about the high rate of false positives and unreliable results within the psychological literature, but the harmful impact of false negatives has been largely ignored” (Vadillo, Konstantinidis, & Shanks, p. 87, 131 citations)

“Much of the debate has concerned habits (such as “phacking” and the filedrawer effect) which can boost the prevalence of false positives in the published literature (Ioannidis, Munafò, Fusar-Poli, Nosek, & David, 2014; Simmons, Nelson, & Simonsohn, 2011).” (Vadillo, Konstantinidis, & Shanks, p. 87, 131 citations)

“Simmons, Nelson, and Simonsohn (2011) showed that researchers without scruples can nearly always find a p < .05 in a data set if they set their minds to it.” (Crandall & Sherman, 2014, p. 96, 114 citations)

Personalized P-Values for Social/Personality Psychologists

January 19, 2021 | Alpha Wars, False Discovery Rate, Replicability, Replicability Ranking, Science-Wise False Discovery Rate, Social Psychology | Ulrich Schimmack

Last update 8/25/2021
(expanded to 410 social/personality psychologists; included Dan Ariely)

Introduction

Since Fisher invented null-hypothesis significance testing, researchers have used p < .05 as a statistical criterion to interpret results as discoveries worthy of discussion (i.e., the null-hypothesis is false). Once published, these results are often treated as real findings even though alpha does not control the risk of false discoveries.

Statisticians have warned against the exclusive reliance on p < .05, but nearly 100 years after Fisher popularized this approach, it is still the most common way to interpret data. The main reason is that many attempts to improve on this practice have failed. The main problem is that a single statistical result is difficult to interpret. However, when individual results are interpreted in the context of other results, they become more informative. Based on the distribution of p-values it is possible to estimate the maximum false discovery rate (Bartos & Schimmack, 2020; Jager & Leek, 2014). This approach can be applied to the p-values published by individual authors to adjust p-values to keep the risk of false discoveries at a reasonable level, FDR < .05.

Researchers who mainly test true hypotheses with high power have a high discovery rate (many p-values below .05) and a low false discovery rate (FDR < .05). Figure 1 shows an example of a researcher who followed this strategy (for a detailed description of z-curve plots, see Schimmack, 2021).

We see that out of the 317 test-statistics retrieved from his articles, 246 were significant with alpha = .05. This is an observed discovery rate of 78%. We also see that this discovery rate closely matches the estimated discovery rate based on the distribution of the significant p-values, p < .05. The EDR is 79%. With an EDR of 79%, the maximum false discovery rate is only 1%. However, the 95%CI is wide and the lower bound of the CI for the EDR, 27%, allows for 14% false discoveries.

When the ODR matches the EDR, there is no evidence of publication bias. In this case, we can improve the estimates by fitting all p-values, including the non-significant ones. With a tighter CI for the EDR, we see that the 95%CI for the maximum FDR ranges from 1% to 3%. Thus, we can be confident that no more than 5% of the significant results with alpha = .05 are false discoveries. Readers can therefore continue to use alpha = .05 to look for interesting discoveries in Matsumoto’s articles.

Figure 3 shows the results for a different type of researcher who took a risk and studied weak effect sizes with small samples. This produces many non-significant results that are often not published. The selection for significance inflates the observed discovery rate, but the z-curve plot and the comparison with the EDR show the influence of publication bias. Here the ODR is similar to Figure 1, but the EDR is only 11%. An EDR of 11% translates into a large maximum false discovery rate of 41%. In addition, the 95%CI of the EDR includes 5%, which means the risk of false positives could be as high as 100%. In this case, using alpha = .05 to interpret results as discoveries is very risky. Clearly, p < .05 means something very different when reading an article by David Matsumoto than when reading one by Shelly Chaiken.

Rather than dismissing all of Chaiken’s results, we can try to lower alpha to reduce the false discovery rate. If we set alpha = .01, the FDR is 15%. If we set alpha = .005, the FDR is 8%. To get the FDR below 5%, we need to set alpha to .001.

A uniform criterion of FDR < 5% is applied to all researchers in the rankings below. For some this means no adjustment to the traditional criterion. For others, alpha is lowered to .01, and for a few even lower than that.
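To see why some researchers need no adjustment, one can ask how high the discovery rate must be for the maximum FDR to stay at 5%. The sketch below assumes that the maximum FDR is Soric's (1989) bound, (1/EDR - 1) * alpha/(1 - alpha), and solves it for the EDR; the function name is hypothetical.

```python
def min_edr_for_fdr(max_fdr=0.05, alpha=0.05):
    """Smallest EDR at which Soric's bound does not exceed max_fdr.

    Solves (1/EDR - 1) * alpha/(1 - alpha) = max_fdr for EDR.
    """
    odds = alpha / (1 - alpha)
    return odds / (max_fdr + odds)


for alpha in (0.05, 0.01, 0.005, 0.001):
    print(f"alpha = {alpha}: EDR must be at least {min_edr_for_fdr(0.05, alpha):.1%}")
```

With alpha = .05, the EDR has to be roughly 51% or higher; with alpha = .01, an EDR of about 17% is enough. Note that the EDR is itself re-estimated when alpha changes, so these thresholds are only a guide to how the criterion works, not a shortcut around refitting z-curve.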

The rankings below are based on automatically extracted test-statistics from 40 journals (List of journals). The results should be interpreted with caution and treated as preliminary. They depend on the specific set of journals that were searched, the way results are being reported, and many other factors. The data are available (data.drop) and researchers can exclude articles or add articles and run their own analyses using the z-curve package in R (https://replicationindex.com/2020/01/10/z-curve-2-0/).

I am also happy to receive feedback about coding errors. I also recommend hand-coding articles to adjust alpha for focal hypothesis tests. This typically lowers the EDR and increases the FDR. For example, the automated method produced an EDR of 31% for Bargh, whereas hand-coding of focal tests produced an EDR of 12% (Bargh-Audit).

And here are the rankings. The results are fully automated and I was not able to cover up the fact that I placed only #188 out of 400 in the rankings. In another post, I will explain how researchers can move up in the rankings. Of course, one way to move up in the rankings is to increase statistical power in future studies. The rankings will be updated again when the 2021 data are available.

Despite the preliminary nature, I am confident that the results provide valuable information. Until now, all p-values below .05 have been treated as if they are equally informative. The rankings here show that this is not the case. While p = .02 can be informative for one researcher, p = .002 may still entail a high false discovery risk for another researcher.

Good science requires not only open and objective reporting of new data; it also requires unbiased review of the literature. However, there are no rules and regulations regarding citations, and many authors cherry-pick citations that are consistent with their claims. Even when studies have failed to replicate, original studies are cited without citing the replication failures. In some cases, authors even cite original articles that have been retracted. Fortunately, it is easy to spot these acts of unscientific behavior. Here I am starting a project to list examples of bad scientific behaviors. Hopefully, more scientists will take the time to hold their colleagues accountable for ethical behavior in citations. They can even do so by posting anonymously on the PubPeer comment site.

Rank	Name	Tests	ODR	EDR	ERR	FDR	Alpha
1	Robert A. Emmons	53	87	89	90	1	.05
2	Allison L. Skinner	229	59	81	85	1	.05
3	David Matsumoto	378	83	79	85	1	.05
4	Linda J. Skitka	532	68	75	82	2	.05
5	Todd K. Shackelford	305	77	75	82	2	.05
6	Jonathan B. Freeman	274	59	75	81	2	.05
7	Virgil Zeigler-Hill	515	72	74	81	2	.05
8	Arthur A. Stone	310	75	73	81	2	.05
9	David P. Schmitt	207	78	71	77	2	.05
10	Emily A. Impett	549	77	70	76	2	.05
11	Paula Bressan	62	82	70	76	2	.05
12	Kurt Gray	487	79	69	81	2	.05
13	Michael E. McCullough	334	69	69	78	2	.05
14	Kipling D. Williams	843	75	69	77	2	.05
15	John M. Zelenski	156	71	69	76	2	.05
16	Amy J. C. Cuddy	212	83	68	78	2	.05
17	Elke U. Weber	312	69	68	77	0	.05
18	Hilary B. Bergsieker	439	67	68	74	2	.05
19	Cameron Anderson	652	71	67	74	3	.05
20	Rachael E. Jack	249	70	66	80	3	.05
21	Jamil Zaki	430	78	66	76	3	.05
22	A. Janet Tomiyama	76	78	65	76	3	.05
23	Benjamin R. Karney	392	56	65	73	3	.05
24	Phoebe C. Ellsworth	605	74	65	72	3	.05
25	Jim Sidanius	487	69	65	72	3	.05
26	Amelie Mummendey	461	70	65	72	3	.05
27	Carol D. Ryff	280	84	64	76	3	.05
28	Juliane Degner	435	63	64	71	3	.05
29	Steven J. Heine	597	78	63	77	3	.05
30	David M. Amodio	584	66	63	70	3	.05
31	Thomas N Bradbury	398	61	63	69	3	.05
32	Elaine Fox	472	79	62	78	3	.05
33	Miles Hewstone	1427	70	62	73	3	.05
34	Linda R. Tropp	344	65	61	80	3	.05
35	Rainer Greifeneder	944	75	61	77	3	.05
36	Klaus Fiedler	1950	77	61	74	3	.05
37	Jesse Graham	377	70	60	76	3	.05
38	Richard W. Robins	270	76	60	70	4	.05
39	Simine Vazire	137	66	60	64	4	.05
40	On Amir	267	67	59	88	4	.05
41	Edward P. Lemay	289	87	59	81	4	.05
42	William B. Swann Jr.	1070	78	59	80	4	.05
43	Margaret S. Clark	505	75	59	77	4	.05
44	Bernhard Leidner	724	64	59	65	4	.05
45	B. Keith Payne	879	71	58	76	4	.05
46	Ximena B. Arriaga	284	66	58	69	4	.05
47	Joris Lammers	728	69	58	69	4	.05
48	Patricia G. Devine	606	71	58	67	4	.05
49	Rainer Reisenzein	201	65	57	69	4	.05
50	Barbara A. Mellers	287	80	56	78	4	.05
51	Joris Lammers	705	69	56	69	4	.05
52	Jean M. Twenge	381	72	56	59	4	.05
53	Nicholas Epley	1504	74	55	72	4	.05
54	Kaiping Peng	566	77	54	75	4	.05
55	Krishna Savani	638	71	53	69	5	.05
56	Leslie Ashburn-Nardo	109	80	52	83	5	.05
57	Lee Jussim	226	80	52	71	5	.05
58	Richard M. Ryan	998	78	52	69	5	.05
59	Ethan Kross	614	66	52	67	5	.05
60	Edward L. Deci	284	79	52	63	5	.05
61	Roger Giner-Sorolla	663	81	51	80	5	.05
62	Bertram F. Malle	422	73	51	75	5	.05
63	George A. Bonanno	479	72	51	70	5	.05
64	Jens B. Asendorpf	253	74	51	69	5	.05
65	Samuel D. Gosling	108	58	51	62	5	.05
66	Tessa V. West	691	71	51	59	5	.05
67	Paul Rozin	449	78	50	84	5	.05
68	Joachim I. Krueger	436	78	50	81	5	.05
69	Sheena S. Iyengar	207	63	50	80	5	.05
70	James J. Gross	1104	72	50	77	5	.05
71	Mark Rubin	306	68	50	75	5	.05
72	Pieter Van Dessel	578	70	50	75	5	.05
73	Shinobu Kitayama	983	76	50	71	5	.05
74	Matthew J. Hornsey	1656	74	50	71	5	.05
75	Janice R. Kelly	366	75	50	70	5	.05
76	Antonio L. Freitas	247	79	50	64	5	.05
77	Paul K. Piff	166	77	50	63	5	.05
78	Mina Cikara	392	71	49	80	5	.05
79	Beate Seibt	379	72	49	62	6	.01
80	Ludwin E. Molina	163	69	49	61	5	.05
81	Bertram Gawronski	1803	72	48	76	6	.01
82	Penelope Lockwood	458	71	48	70	6	.01
83	Edward R. Hirt	1042	81	48	65	6	.01
84	Matthew D. Lieberman	398	72	47	80	6	.01
85	John T. Cacioppo	438	76	47	69	6	.01
86	Agneta H. Fischer	952	75	47	69	6	.01
87	Leaf van Boven	711	72	47	67	6	.01
88	Stephanie A. Fryberg	248	62	47	66	6	.01
89	Daniel M. Wegner	602	76	47	65	6	.01
90	Anne E. Wilson	785	71	47	64	6	.01
91	Rainer Banse	402	78	46	72	6	.01
92	Alice H. Eagly	330	75	46	71	6	.01
93	Jeanne L. Tsai	1241	73	46	67	6	.01
94	Jennifer S. Lerner	181	80	46	61	6	.01
95	Andrea L. Meltzer	549	52	45	72	6	.01
96	R. Chris Fraley	642	70	45	72	7	.01
97	Constantine Sedikides	2566	71	45	70	6	.01
98	Paul Slovic	377	74	45	70	6	.01
99	Dacher Keltner	1233	72	45	64	6	.01
100	Brian A. Nosek	816	68	44	81	7	.01
101	George Loewenstein	752	71	44	72	7	.01
102	Ursula Hess	774	78	44	71	7	.01
103	Jason P. Mitchell	600	73	43	73	7	.01
104	Jessica L. Tracy	632	74	43	71	7	.01
105	Charles M. Judd	1054	76	43	68	7	.01
106	S. Alexander Haslam	1198	72	43	64	7	.01
107	Mark Schaller	565	73	43	61	7	.01
108	Susan T. Fiske	911	78	42	74	7	.01
109	Lisa Feldman Barrett	644	69	42	70	7	.01
110	Jolanda Jetten	1956	73	42	67	7	.01
111	Mario Mikulincer	901	89	42	64	7	.01
112	Bernadette Park	973	77	42	64	7	.01
113	Paul A. M. Van Lange	1092	70	42	63	7	.01
114	Wendi L. Gardner	798	67	42	63	7	.01
115	Will M. Gervais	110	69	42	59	7	.01
116	Jordan B. Peterson	266	60	41	79	7	.01
117	Philip E. Tetlock	549	79	41	73	7	.01
118	Amanda B. Diekman	438	83	41	70	7	.01
119	Daniel H. J. Wigboldus	492	76	41	67	8	.01
120	Michael Inzlicht	686	66	41	63	8	.01
121	Naomi Ellemers	2388	74	41	63	8	.01
122	Phillip Atiba Goff	299	68	41	62	7	.01
123	Stacey Sinclair	327	70	41	57	8	.01
124	Francesca Gino	2521	75	40	69	8	.01
125	Michael I. Norton	1136	71	40	69	8	.01
126	David J. Hauser	156	74	40	68	8	.01
127	Elizabeth Page-Gould	411	57	40	66	8	.01
128	Tiffany A. Ito	349	80	40	64	8	.01
129	Richard E. Petty	2771	69	40	64	8	.01
130	Tim Wildschut	1374	73	40	64	8	.01
131	Norbert Schwarz	1337	72	40	63	8	.01
132	Veronika Job	362	70	40	63	8	.01
133	Wendy Wood	462	75	40	62	8	.01
134	Minah H. Jung	156	83	39	83	8	.01
135	Marcel Zeelenberg	868	76	39	79	8	.01
136	Tobias Greitemeyer	1737	72	39	67	8	.01
137	Jason E. Plaks	582	70	39	67	8	.01
138	Carol S. Dweck	1028	70	39	63	8	.01
139	Christian S. Crandall	362	75	39	59	8	.01
140	Harry T. Reis	998	69	38	74	9	.01
141	Vanessa K. Bohns	420	77	38	74	8	.01
142	Jerry Suls	413	71	38	68	8	.01
143	Eric D. Knowles	384	68	38	64	8	.01
144	C. Nathan DeWall	1336	73	38	63	9	.01
145	Clayton R. Critcher	697	82	38	63	9	.01
146	John F. Dovidio	2019	69	38	62	9	.01
147	Joshua Correll	549	61	38	62	9	.01
148	Abigail A. Scholer	556	58	38	62	9	.01
149	Chris Janiszewski	107	81	38	58	9	.01
150	Herbert Bless	586	73	38	57	9	.01
151	Mahzarin R. Banaji	880	73	37	78	9	.01
152	Rolf Reber	280	64	37	72	9	.01
153	Kevin N. Ochsner	406	79	37	70	9	.01
154	Mark J. Brandt	277	70	37	70	9	.01
155	Geoff MacDonald	406	67	37	67	9	.01
156	Mara Mather	1038	78	37	67	9	.01
157	Antony S. R. Manstead	1656	72	37	62	9	.01
158	Lorne Campbell	433	67	37	61	9	.01
159	Sanford E. DeVoe	236	71	37	61	9	.01
160	Ayelet Fishbach	1416	78	37	59	9	.01
161	Fritz Strack	607	75	37	56	9	.01
162	Jeff T. Larsen	181	74	36	67	10	.01
163	Nyla R. Branscombe	1276	70	36	65	9	.01
164	Yaacov Schul	411	61	36	64	9	.01
165	D. S. Moskowitz	3418	74	36	63	9	.01
166	Pablo Brinol	1356	67	36	62	9	.01
167	Todd B. Kashdan	377	73	36	61	9	.01
168	Barbara L. Fredrickson	287	72	36	61	9	.01
169	Duane T. Wegener	980	77	36	60	9	.01
170	Joanne V. Wood	1093	74	36	60	9	.01
171	Daniel A. Effron	484	66	36	60	9	.01
172	Niall Bolger	376	67	36	58	9	.01
173	Craig A. Anderson	467	76	36	55	9	.01
174	Michael Harris Bond	378	73	35	84	10	.01
175	Glenn Adams	270	71	35	73	10	.01
176	Daniel M. Bernstein	404	73	35	70	10	.01
177	C. Miguel Brendl	121	76	35	68	10	.01
178	Azim F. Sharif	183	74	35	68	10	.01
179	Emily Balcetis	599	69	35	68	10	.01
180	Eva Walther	493	82	35	66	10	.01
181	Michael D. Robinson	1388	78	35	66	10	.01
182	Igor Grossmann	203	64	35	66	10	.01
183	Diana I. Tamir	156	62	35	62	10	.01
184	Samuel L. Gaertner	321	75	35	61	10	.01
185	John T. Jost	794	70	35	61	10	.01
186	Eric L. Uhlmann	457	67	35	61	10	.01
187	Nalini Ambady	1256	62	35	56	10	.01
188	Daphna Oyserman	446	55	35	54	10	.01
189	Victoria M. Esses	295	75	35	53	10	.01
190	Linda J. Levine	495	74	34	78	10	.01
191	Wiebke Bleidorn	99	63	34	74	10	.01
192	Thomas Gilovich	1193	80	34	69	10	.01
193	Alexander J. Rothman	133	69	34	65	10	.01
194	Francis J. Flynn	378	72	34	63	10	.01
195	Paula M. Niedenthal	522	69	34	61	10	.01
196	Ozlem Ayduk	549	62	34	59	10	.01
197	Paul Ekman	88	70	34	55	10	.01
198	Alison Ledgerwood	214	75	34	54	10	.01
199	Christopher R. Agnew	325	75	33	76	10	.01
200	Michelle N. Shiota	242	60	33	63	11	.01
201	Malte Friese	501	61	33	57	11	.01
202	Kerry Kawakami	487	68	33	56	10	.01
203	Danu Anthony Stinson	494	77	33	54	11	.01
204	Jennifer A. Richeson	831	67	33	52	11	.01
205	Margo J. Monteith	773	76	32	77	11	.01
206	Ulrich Schimmack	318	75	32	63	11	.01
207	Mark Snyder	562	72	32	63	11	.01
208	Michele J. Gelfand	365	76	32	63	11	.01
209	Russell H. Fazio	1094	69	32	61	11	.01
210	Eric van Dijk	238	67	32	60	11	.01
211	Tom Meyvis	377	77	32	60	11	.01
212	Eli J. Finkel	1392	62	32	57	11	.01
213	Robert B. Cialdini	379	72	32	56	11	.01
214	Jonathan W. Kunstman	430	66	32	53	11	.01
215	Delroy L. Paulhus	121	77	31	82	12	.01
216	Yuen J. Huo	132	74	31	80	11	.01
217	Gerd Bohner	513	71	31	70	11	.01
218	Christopher K. Hsee	689	75	31	63	11	.01
219	Vivian Zayas	251	71	31	60	12	.01
220	John A. Bargh	651	72	31	55	12	.01
221	Tom Pyszczynski	948	69	31	54	12	.01
222	Roy F. Baumeister	2442	69	31	52	12	.01
223	E. Ashby Plant	831	77	31	51	11	.01
224	Kathleen D. Vohs	944	68	31	51	12	.01
225	Jamie Arndt	1318	69	31	50	12	.01
226	Anthony G. Greenwald	357	72	30	83	12	.01
227	Nicholas O. Rule	1294	68	30	75	13	.01
228	Lauren J. Human	447	59	30	70	12	.01
229	Jennifer Crocker	515	68	30	67	12	.01
230	Dale T. Miller	521	71	30	64	12	.01
231	Thomas W. Schubert	353	70	30	60	12	.01
232	Joseph A. Vandello	494	73	30	60	12	.01
233	W. Keith Campbell	528	70	30	58	12	.01
234	Arthur Aron	307	65	30	56	12	.01
235	Pamela K. Smith	149	66	30	52	12	.01
236	Aaron C. Kay	1320	70	30	51	12	.01
237	Steven W. Gangestad	198	63	30	41	13	.005
238	Eliot R. Smith	445	79	29	73	13	.01
239	Nir Halevy	262	68	29	72	13	.01
240	E. Allan Lind	370	82	29	72	13	.01
241	Richard E. Nisbett	319	73	29	69	13	.01
242	Hazel Rose Markus	674	76	29	68	13	.01
243	Emanuele Castano	445	69	29	65	13	.01
244	Dirk Wentura	830	65	29	64	13	.01
245	Boris Egloff	274	81	29	58	13	.01
246	Monica Biernat	813	77	29	57	13	.01
247	Gordon B. Moskowitz	374	72	29	57	13	.01
248	Russell Spears	2286	73	29	55	13	.01
249	Jeff Greenberg	1358	77	29	54	13	.01
250	Caryl E. Rusbult	218	60	29	54	13	.01
251	Naomi I. Eisenberger	179	74	28	79	14	.01
252	Brent W. Roberts	562	72	28	77	14	.01
253	Yoav Bar-Anan	525	75	28	76	13	.01
254	Eddie Harmon-Jones	738	73	28	70	14	.01
255	Matthew Feinberg	295	77	28	69	14	.01
256	Roland Neumann	258	77	28	67	13	.01
257	Eugene M. Caruso	822	75	28	64	13	.01
258	Ulrich Kuehnen	822	75	28	64	13	.01
259	Elizabeth W. Dunn	395	75	28	64	14	.01
260	Jeffry A. Simpson	697	74	28	55	13	.01
261	Sander L. Koole	767	65	28	52	14	.01
262	Richard J. Davidson	380	64	28	51	14	.01
263	Shelly L. Gable	364	64	28	50	14	.01
264	Adam D. Galinsky	2154	70	28	49	13	.01
265	Grainne M. Fitzsimons	585	68	28	49	14	.01
266	Geoffrey J. Leonardelli	290	68	28	48	14	.005
267	Joshua Aronson	183	85	28	46	14	.005
268	Henk Aarts	1003	67	28	45	14	.005
269	Vanessa K. Bohns	422	76	27	74	15	.01
270	Jan De Houwer	1972	70	27	72	14	.01
271	Dan Ariely	600	70	27	69	14	.01
272	Charles Stangor	185	81	27	68	15	.01
273	Karl Christoph Klauer	801	67	27	65	14	.01
274	Mario Gollwitzer	500	58	27	62	14	.01
275	Jennifer S. Beer	80	56	27	54	14	.01
276	Eldar Shafir	107	78	27	51	14	.01
277	Guido H. E. Gendolla	422	76	27	47	14	.005
278	Klaus R. Scherer	467	83	26	78	15	.01
279	William G. Graziano	532	71	26	66	15	.01
280	Galen V. Bodenhausen	585	74	26	61	15	.01
281	Sonja Lyubomirsky	530	71	26	59	15	.01
282	Kai Sassenberg	872	71	26	56	15	.01
283	Kristin Laurin	648	63	26	51	15	.01
284	Claude M. Steele	434	73	26	42	15	.005
285	David G. Rand	392	70	25	81	15	.01
286	Paul Bloom	502	72	25	79	16	.01
287	Kerri L. Johnson	532	76	25	76	15	.01
288	Batja Mesquita	416	71	25	73	16	.01
289	Rebecca J. Schlegel	261	67	25	71	15	.01
290	Phillip R. Shaver	566	81	25	71	16	.01
291	David Dunning	818	74	25	70	16	.01
292	Laurie A. Rudman	482	72	25	68	16	.01
293	David A. Lishner	105	65	25	63	16	.01
294	Mark J. Landau	950	78	25	45	16	.005
295	Ronald S. Friedman	183	79	25	44	16	.005
296	Joel Cooper	257	72	25	39	16	.005
297	Alison L. Chasteen	223	68	24	69	16	.01
298	Jeff Galak	313	73	24	68	17	.01
299	Steven J. Sherman	888	74	24	62	16	.01
300	Shigehiro Oishi	1109	64	24	61	17	.01
301	Thomas Mussweiler	604	70	24	43	17	.005
302	Mark W. Baldwin	247	72	24	41	17	.005
303	Evan P. Apfelbaum	256	62	24	41	17	.005
304	Nurit Shnabel	564	76	23	78	18	.01
305	Klaus Rothermund	738	71	23	76	18	.01
306	Felicia Pratto	410	73	23	75	18	.01
307	Jonathan Haidt	368	76	23	73	17	.01
308	Roland Imhoff	365	74	23	73	18	.01
309	Jeffrey W Sherman	992	68	23	71	17	.01
310	Jennifer L. Eberhardt	202	71	23	62	18	.005
311	Bernard A. Nijstad	693	71	23	52	18	.005
312	Brandon J. Schmeichel	652	66	23	45	17	.005
313	Sam J. Maglio	325	72	23	42	17	.005
314	David M. Buss	461	82	22	80	19	.01
315	Yoel Inbar	280	67	22	71	19	.01
316	Serena Chen	865	72	22	67	19	.005
317	Spike W. S. Lee	145	68	22	64	19	.005
318	Marilynn B. Brewer	314	75	22	62	18	.005
319	Michael Ross	1164	70	22	62	18	.005
320	Dieter Frey	1538	68	22	58	18	.005
321	G. Daniel Lassiter	189	82	22	55	19	.01
322	Sean M. McCrea	584	73	22	54	19	.005
323	Wendy Berry Mendes	965	68	22	44	19	.005
324	Paul W. Eastwick	583	65	21	69	19	.005
325	Kees van den Bos	1150	84	21	69	20	.005
326	Maya Tamir	1342	80	21	64	19	.005
327	Joseph P. Forgas	888	83	21	59	19	.005
328	Michaela Wanke	362	74	21	59	19	.005
329	Dolores Albarracin	540	66	21	56	20	.005
330	Elizabeth Levy Paluck	31	84	21	55	20	.005
331	Vanessa LoBue	299	68	20	76	21	.01
332	Christopher J. Armitage	160	62	20	73	21	.005
333	Elizabeth A. Phelps	686	78	20	72	21	.005
334	Jay J. van Bavel	437	64	20	71	21	.005
335	David A. Pizarro	227	71	20	69	21	.005
336	Andrew J. Elliot	1018	81	20	67	21	.005
337	William A. Cunningham	238	76	20	64	22	.005
338	Laura D. Scherer	212	69	20	64	21	.01
339	Kentaro Fujita	458	69	20	62	21	.005
340	Geoffrey L. Cohen	1590	68	20	50	21	.005
341	Ana Guinote	378	76	20	47	21	.005
342	Tanya L. Chartrand	424	67	20	33	21	.001
343	Selin Kesebir	328	66	19	73	22	.005
344	Vincent Y. Yzerbyt	1412	73	19	73	22	.01
345	James K. McNulty	1047	56	19	65	23	.005
346	Robert S. Wyer	871	82	19	63	22	.005
347	Travis Proulx	174	63	19	62	22	.005
348	Peter M. Gollwitzer	1303	64	19	58	22	.005
349	Nilanjana Dasgupta	383	76	19	52	22	.005
350	Jamie L. Goldenberg	568	77	19	50	22	.01
351	Richard P. Eibach	753	69	19	47	23	.001
352	Gerald L. Clore	456	74	19	45	22	.001
353	James M. Tyler	130	87	18	74	24	.005
354	Roland Deutsch	365	78	18	71	24	.005
355	Ed Diener	498	64	18	68	24	.005
356	Kennon M. Sheldon	698	74	18	66	23	.005
357	Wilhelm Hofmann	624	67	18	66	23	.005
358	Laura L. Carstensen	723	77	18	64	24	.005
359	Toni Schmader	546	69	18	61	24	.005
360	Frank D. Fincham	734	69	18	59	24	.005
361	David K. Sherman	1128	61	18	57	24	.005
362	Lisa K. Libby	418	65	18	54	24	.005
363	Chen-Bo Zhong	327	68	18	49	25	.005
364	Stefan C. Schmukle	114	62	17	71	26	.005
365	Michel Tuan Pham	246	86	17	68	25	.005
366	Leandre R. Fabrigar	632	70	17	67	26	.005
367	Neal J. Roese	368	64	17	65	25	.005
368	Carey K. Morewedge	633	76	17	65	26	.005
369	Timothy D. Wilson	798	65	17	63	26	.005
370	Brad J. Bushman	897	74	17	62	25	.005
371	Ara Norenzayan	225	72	17	61	25	.005
372	Benoit Monin	635	65	17	56	25	.005
373	Michael W. Kraus	617	72	17	55	26	.005
374	Ad van Knippenberg	683	72	17	55	26	.001
375	E. Tory. Higgins	1868	68	17	54	25	.001
376	Ap Dijksterhuis	750	68	17	54	26	.005
377	Joseph Cesario	146	62	17	45	26	.001
378	Simone Schnall	270	62	17	31	26	.001
379	Joshua M. Ackerman	380	53	16	70	13	.01
380	Melissa J. Ferguson	1163	72	16	69	27	.005
381	Laura A. King	391	76	16	68	29	.005
382	Daniel T. Gilbert	724	65	16	65	27	.005
383	Charles S. Carver	154	82	16	64	28	.005
384	Leif D. Nelson	409	74	16	64	28	.005
385	David DeSteno	201	83	16	57	28	.005
386	Sandra L. Murray	697	60	16	55	28	.001
387	Heejung S. Kim	858	59	16	55	29	.001
388	Mark P. Zanna	659	64	16	48	28	.001
389	Nira Liberman	1304	75	15	65	31	.005
390	Gun R. Semin	159	79	15	64	29	.005
391	Tal Eyal	439	62	15	62	29	.005
392	Nathaniel M Lambert	456	66	15	59	30	.001
393	Angela L. Duckworth	122	61	15	55	30	.005
394	Dana R. Carney	200	60	15	53	30	.001
395	Garriy Shteynberg	168	54	15	31	30	.005
396	Lee Ross	349	77	14	63	31	.001
397	Arie W. Kruglanski	1228	78	14	58	33	.001
398	Ziva Kunda	217	67	14	56	31	.001
399	Shelley E. Taylor	427	69	14	52	31	.001
400	Jon K. Maner	1040	65	14	52	32	.001
401	Gabriele Oettingen	1047	61	14	49	33	.001
402	Nicole L. Mead	240	70	14	46	33	.01
403	Gregory M. Walton	587	69	14	44	33	.001
404	Michael A. Olson	346	65	13	63	35	.001
405	Fiona Lee	221	67	13	58	34	.001
406	Melody M. Chao	237	57	13	58	36	.001
407	Adam L. Alter	314	78	13	54	36	.001
408	Sarah E. Hill	509	78	13	52	34	.001
409	Jaime L. Kurtz	91	55	13	38	37	.001
410	Michael A. Zarate	120	52	13	31	36	.001
411	Jennifer K. Bosson	659	76	12	64	40	.001
412	Daniel M. Oppenheimer	198	80	12	60	37	.001
413	Deborah A. Prentice	89	80	12	57	38	.001
414	Yaacov Trope	1277	73	12	57	38	.001
415	Oscar Ybarra	305	63	12	55	40	.001
416	William von Hippel	398	65	12	48	40	.001
417	Steven J. Spencer	541	67	12	44	38	.001
418	Martie G. Haselton	186	73	11	54	43	.001
419	Shelly Chaiken	360	74	11	52	44	.001
420	Susan M. Andersen	361	74	11	48	43	.001
421	Dov Cohen	641	68	11	44	41	.001
422	Mark Muraven	496	52	11	44	41	.001
423	Ian McGregor	409	66	11	40	41	.001
424	Hans Ijzerman	214	56	9	46	51	.001
425	Linda M. Isbell	115	64	9	41	50	.001
426	Cheryl J. Wakslak	278	73	8	35	59	.001

Ioannidis is Wrong Most of the Time

December 24, 2020 | False Discovery Rate, Ioannidis, Most Published Research Findings are False, Science-Wise False Discovery Rate | Ulrich Schimmack

John P. A. Ioannidis is a rock star in the world of science (wikipedia).

By traditional standards of science, he is one of the most prolific and influential scientists alive. He has published over 1,000 articles that have been cited over 100,000 times.

He is best known for the title of his article “Why most published research findings are false” that has been cited nearly 5,000 times. The irony of this title is that it may also apply to Ioannidis, especially because there is a trade-off between quality and quantity in publishing.

Fact Checking Ioannidis

The title of Ioannidis’s article implies a factual statement: “Most published results ARE false.” However, the actual article does not contain empirical data to support this claim. Rather, Ioannidis presents some hypothetical scenarios that show under what conditions published results MAY BE false.

To produce mostly false findings, a literature has to meet two conditions.

First, it has to test mostly false hypotheses.
Second, it has to test hypotheses in studies with low statistical power, that is, a low probability of producing true positive results.

To give a simple example, imagine a field that tests 100 hypotheses, of which only 10% are true, with just 20% power. As power predicts the percentage of true discoveries, only 2 of the 10 true hypotheses will be significant. Meanwhile, the alpha criterion of 5% implies that 5% of the false hypotheses will also produce a significant result. Thus, 4.5 of the 90 false hypotheses will produce a significant result. As a result, there will be more than twice as many false positives (4.5 over 100) as true positives (2 over 100).
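The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch; the function name and its parameters are made up for this example and do not come from any published software.

```python
def false_discovery_rate(prop_true, power, alpha=0.05, n_tests=100):
    """Return (true_positives, false_positives, fdr) for a hypothetical field."""
    n_true = n_tests * prop_true    # hypotheses where an effect exists
    n_false = n_tests - n_true      # hypotheses where the null-hypothesis is true
    true_pos = n_true * power       # significant results for real effects
    false_pos = n_false * alpha     # significant results by chance alone
    fdr = false_pos / (true_pos + false_pos)
    return true_pos, false_pos, fdr


tp, fp, fdr = false_discovery_rate(prop_true=0.10, power=0.20)
print(f"true positives: {tp:.1f}, false positives: {fp:.1f}, FDR: {fdr:.0%}")
```

With these inputs, 4.5 of the 6.5 significant results are false positives, which is roughly 69% of all discoveries.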

These relatively simple calculations were well known by 2005 (Soric, 1989). Thus, why did Ioannidis’s article have such a big impact? The answer is that Ioannidis convinced many people that his hypothetical examples are realistic and describe most areas in science.

2020 has shown that Ioannidis’s claim does not apply to all areas of science. With amazing speed, bio-tech companies were able to make not just one but several successful vaccines with high effectiveness. Clearly some sciences are making real progress. On the other hand, other areas of science suggest that Ioannidis’s claims were accurate. For example, the whole literature on single-gene variations as predictors of human behavior has produced mostly false claims. Social psychology has a replication crisis where only 25% of published results could be replicated (OSC, 2015).

Aside from this sporadic and anecdotal evidence, it remains unclear how many false results are published in science as a whole. The reason is that it is impossible to quantify the number of false positive results in science. Fortunately, it is not necessary to know the actual rate of false positives to test Ioannidis’s prediction that most published results are false positives. All we need to know is the discovery rate of a field (Soric, 1989). The discovery rate makes it possible to quantify the maximum percentage of false positive discoveries. If the maximum false discovery rate is well below 50%, we can reject Ioannidis’s hypothesis that most published results are false.

The empirical problem is that the observed discovery rate in a field may be inflated by publication bias. It is therefore necessary to estimate the amount of publication bias and, if it is present, to correct the discovery rate.

Ioannidis and Trikalinos (2005) developed their own test for publication bias, but this test had a number of shortcomings. First, it could be biased in heterogeneous literatures. Second, it required effect sizes to compute power. Third, it only provided information about the presence of publication bias and did not quantify it. Fourth, it did not provide bias-corrected estimates of the true discovery rate.

When the replication crisis became apparent in psychology, I started to develop new bias tests that address these limitations (Bartos & Schimmack, 2020; Brunner & Schimmack, 2020; Schimmack, 2012). The newest tool, called z-curve.2.0 (and yes, there is an app for that), overcomes all of the limitations of Ioannidis’s approach. Most important, it makes it possible to compute a bias-corrected discovery rate that is called the expected discovery rate. The expected discovery rate can be used to examine and quantify publication bias by comparing it to the observed discovery rate. Moreover, the expected discovery rate can be used to compute the maximum false discovery rate.

The Data

The data were compiled by Simon Schwab from the Cochrane database (https://www.cochrane.org/) that covers results from thousands of clinical trials. The data are publicly available (https://osf.io/xjv9g/) under a CC-By Attribution 4.0 International license (“Re-estimating 400,000 treatment effects from intervention studies in the Cochrane Database of Systematic Reviews”; see also van Zwet, Schwab, & Senn, 2020).

Studies often report results for several outcomes. I selected only results for the primary outcome. It is often suggested that researchers switch outcomes to produce significant results. Thus, primary outcomes are the most likely to show evidence of publication bias, while secondary outcomes might even be biased to show more negative results for the same reason. The choice of primary outcomes also ensures that the test statistics are statistically independent because they are based on independent samples.

Results

I first fitted the default model to the data. The default model assumes that publication bias is present and only uses statistically significant results to fit the model. Z-curve.2.0 uses a finite mixture model to approximate the observed distribution of z-scores with a limited number of non-centrality parameters. After finding optimal weights for the components, power can be computed as the weighted average of the implied power of the components (Bartos & Schimmack, 2020). Bootstrapping is used to compute 95% confidence intervals that have shown to have good coverage in simulation studies (Bartos & Schimmack, 2020).
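To make the idea of "power as a weighted average over mixture components" concrete, here is a minimal sketch. It does not fit anything and is not the z-curve estimation algorithm; the component means and weights below are hypothetical values chosen only for illustration, and the weights are assumed to describe all studies before selection for significance.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(1 - 0.05 / 2)  # about 1.96 for two-sided alpha = .05


def power(noncentrality, z_crit=Z_CRIT):
    """Probability that |z| exceeds z_crit when z ~ Normal(noncentrality, 1)."""
    upper = 1 - STD_NORMAL.cdf(z_crit - noncentrality)
    lower = STD_NORMAL.cdf(-z_crit - noncentrality)
    return upper + lower


# Hypothetical mixture: component means and their weights before selection.
means = [0.0, 2.0, 4.0]
weights = [0.5, 0.3, 0.2]

component_power = [power(m) for m in means]

# Mean power of all studies: plain weighted average of component power.
mean_power_all = sum(w * p for w, p in zip(weights, component_power))

# Mean power of significant studies: components with more power contribute
# more significant results, so each weight is multiplied by power once more.
mean_power_sig = (
    sum(w * p * p for w, p in zip(weights, component_power)) / mean_power_all
)

print(f"mean power, all studies:         {mean_power_all:.2f}")
print(f"mean power, significant studies: {mean_power_sig:.2f}")
```

In this sketch, the first quantity plays the role of a discovery rate and the second the role of a replication rate. Whenever power differs across components, the second value is larger than the first.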

The main finding with the default model is that the model (grey curve) fits the observed distribution of z-scores very well in the range of significant results. However, z-curve has problems extrapolating from significant results to the distribution of non-significant results. In this case, the model (grey curve) underestimates the number of non-significant results. Publication bias would make the observed discovery rate higher than the expected discovery rate, but here the observed discovery rate of 26% is lower than the expected discovery rate of 38%. Thus, there is no evidence of publication bias.

When there is no evidence of publication bias, there is no reason to fit the model only to the significant results. Rather, the model can be fitted to the full distribution of all test statistics. The results are shown in Figure 2.

The key finding for this blog post is that the estimated discovery rate of 27% closely matches the observed discovery rate of 26%. Thus, there is no evidence of publication bias. In this case, simply counting the percentage of significant results provides a valid estimate of the discovery rate in clinical trials. Roughly one-quarter of trials end up with a positive result. The new question is how many of these results might be false positives.

To maximize the rate of false positives, we have to assume that true positives were obtained with maximum power (Soric, 1989). In this scenario, we could get as many as 14% (4 over 27) false positive results.
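The bound used here follows from assuming 100% power for true hypotheses, which gives a maximum false discovery rate of (1/DR - 1) * alpha / (1 - alpha), where DR is the discovery rate. The following sketch applies that formula to the 27% discovery rate; the function name is chosen for illustration only.

```python
def soric_max_fdr(discovery_rate, alpha=0.05):
    """Upper bound on the false discovery rate for a given discovery rate."""
    return (1 / discovery_rate - 1) * (alpha / (1 - alpha))


max_fdr = soric_max_fdr(0.27)
print(f"maximum FDR: {max_fdr:.0%}")
print(f"false positives per 100 tests: {100 * 0.27 * max_fdr:.1f}")
```

For a discovery rate of 27%, this gives a bound of about 14%, or close to 4 false positives among the 27 significant results per 100 tests.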

Even if we use the upper limit of the 95% confidence interval, we only get 19% false positives. Moreover, it is clear that Soric’s (1989) scenario overestimates the false discovery rate because it is unlikely that all tests of true hypotheses have 100% power.

In short, an empirical test of Ioannidis’s hypothesis that most published results in science are false shows that this claim is at best a wild overgeneralization. It is not true for clinical trials in medicine. In fact, the real problem is that many clinical trials may be underpowered to detect clinically relevant effects. This can be seen in the estimated replication rate of 61%, which is the mean power of studies with significant results. This estimate of power includes false positives with 5% power. If we assume that 14% of the significant results are false positives, the conditional power based on a true discovery is estimated to be 70% (.14 * .05 + .86 * .70 = .61).
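The 70% figure comes from solving the mixture equation for the unknown power of true discoveries. A small sketch, with a function name invented for illustration:

```python
def power_of_true_positives(mean_power_sig, fdr, alpha=0.05):
    """Solve mean_power_sig = fdr * alpha + (1 - fdr) * power for power."""
    return (mean_power_sig - fdr * alpha) / (1 - fdr)


print(f"{power_of_true_positives(0.61, 0.14):.2f}")
```

The inputs are the replication rate of 61% and the assumed 14% share of false positives among significant results.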

With information about power, we can modify Soric’s worst case scenario and change power from 100% to 70%. This has only a small influence on the false discovery rate, which decreases to 11% (3 over 27). However, the rate of false negatives increases from 0 to 14% (10 over 74). This also means that there are now three times more false negatives than false positives (10 over 3).
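The modified scenario can be reconstructed by solving for the number of true hypotheses that yields a 27% discovery rate when power is 70%. The sketch below is one way to do this; the function and its dictionary keys are illustrative names, and it works with fractional counts rather than the rounded whole numbers used in the text.

```python
def scenario(discovery_rate, power, alpha=0.05, n_tests=100):
    """Split n_tests into outcome counts for a given discovery rate and power."""
    # Solve power * n_true + alpha * (n_tests - n_true) = discoveries for n_true.
    discoveries = discovery_rate * n_tests
    n_true = (discoveries - alpha * n_tests) / (power - alpha)
    n_false = n_tests - n_true
    return {
        "true_positives": power * n_true,
        "false_positives": alpha * n_false,
        "false_negatives": (1 - power) * n_true,
        "true_negatives": (1 - alpha) * n_false,
    }


counts = scenario(discovery_rate=0.27, power=0.70)
for name, value in counts.items():
    print(f"{name}: {value:.1f}")
```

Without rounding, the false positives come to a little over 3 and the false negatives to a little over 10 per 100 tests, so the whole-number figures in the text are approximations.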

Even this scenario overestimates the power of studies that produced false negative results because the power of studies with significant results is higher than the power of studies that produced non-significant results when power is heterogeneous (Brunner & Schimmack, 2020). In the worst case scenario, the null-hypothesis may rarely be true and power of studies with non-significant results could be as low as 14.5%. To explain, if we redo all of the studies, we expect that 61% of the significant studies produce a significant result again, producing 16.5% significant results. We also expect that the discovery rate will be 27% again. Thus, the remaining 73% of studies have to make up the difference between 27% and 16.5%, which is 10.5%. For 73 studies to produce 10.5 significant results, the studies have to have 14.5% power (27 = 27 * .61 + 73 * .145).
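The same balance equation can be solved directly for the power of the non-significant studies. This is a sketch with an invented function name, using the 27% discovery rate and 61% replication rate from the text.

```python
def power_of_nonsignificant(discovery_rate, replication_rate):
    """Solve DR = DR * ERR + (1 - DR) * x for x, the mean power of
    studies that produced non-significant results."""
    return discovery_rate * (1 - replication_rate) / (1 - discovery_rate)


print(f"{power_of_nonsignificant(0.27, 0.61):.3f}")
```

Computed without rounding the intermediate steps, the result is a little over 14%, in line with the approximate 14.5% obtained from the rounded values above.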

In short, while Ioannidis predicted that most published results are false positives, it is much more likely that most published results are false negatives. This problem is of course not new. To make conclusions about effectiveness of treatments, medical researchers usually do not rely on a single clinical trial. Rather, results of several studies are combined in a meta-analysis. As long as there is no publication bias, meta-analyses of original studies can boost power and reduce the risk of false negative results. It is therefore encouraging that the present results suggest that there is relatively little publication bias in these studies. Additional analyses for subgroups of studies can be conducted, but are beyond the main point of this blog post.

Conclusion

Ioannidis wrote an influential article that used hypothetical scenarios to make the prediction that most published results are false positives. Although this article is often cited as if it contained evidence to support this claim, the article contained no empirical evidence. Surprisingly, there also have been few attempts to test Ioannidis’s claim empirically. Probably the main reason is that nobody knew how to test it. Here I showed a way to test Ioannidis’s claim and I presented clear empirical evidence that contradicts this claim in Ioannidis’s own field of science, namely medicine.

The main feature that distinguishes science and fiction is not that science is always right. Rather, science is superior because proper use of the scientific method allows for science to correct itself, when better data become available. In 2005, Ioannidis had no data and no statistical method to prove his claim. Fifteen years later, we have good data and a scientific method to test his claim. It is time for science to correct itself and to stop making unfounded claims that science is more often wrong than right.

The danger of not trusting science has been on display this year, where millions of Americans ignored good scientific evidence, leading to the unnecessary death of many US Americans. So far, 330,000 US Americans are estimated to have died of Covid-19. In a similar country like Canada, 14,000 Canadians have died so far. To adjust for population, we can compare the number of deaths per million, which is 1000 in the USA and 400 in Canada. The unscientific approach to the pandemic in the US may explain some of this discrepancy. Along with the development of vaccines, it is clear that science is not always wrong and can save lives. Ioannidis (2005) made unfounded claims that success stories are the exception rather than the norm. At least in medicine, intervention studies show real successes more often than false ones.

The Covid-19 pandemic also provides another example where Ioannidis used off-the-cuff calculations to make big claims without any evidence. In a popular article titled “A fiasco in the making” he speculated that the Covid-19 virus might be less deadly than the flu and suggested that policies to curb the spread of the virus were irrational.

As the evidence accumulated, it became clear that the Covid-19 virus is claiming many more lives than the flu, despite policies that Ioannidis considered to be irrational. Scientific estimates suggest that Covid-19 is 5 to 10 times more deadly than the flu (BNN), not less deadly as Ioannidis implied. Once more, Ioannidis’s quick, unempirical claims were contradicted by hard evidence. It is not clear how many of his other 1,000 plus articles are equally questionable.

To conclude, Ioannidis should be the last one to be surprised that several of his claims are wrong. Why should he be better than other scientists? The question is only how he deals with this information. However, for science it is not important whether scientists correct themselves. Science corrects itself by replacing old, false information with better information. One question is what science does with false and misleading information that is highly cited.

If YouTube can remove a video with Ioannidis’s false claims about Covid-19 (WP), maybe PLOS Medicine can retract an article with the false claim that “most published results in science are false”.

The attention-grabbing title is simply misleading because nothing in the article supports the claim. Moreover, actual empirical data contradict the claim at least in some domains. Most claims in science are not false, and in a world with growing science skepticism, spreading false claims about science may be just as deadly as spreading false claims about Covid-19.

If we learned anything from 2020, it is that science and democracy are not perfect, but a lot better than superstition and demagogy.

I wish you all a happier 2021.

Soric’s Maximum False Discovery Rate

January 31, 2020 | Expected Discovery Rate, False Discovery Rate, Ioannidis, Most Published Research Findings are False, Observed Discovery Rate, Publication Bias, Science-Wise False Discovery Rate, Soric | Ulrich Schimmack

Originally published January 31, 2020
Revised December 27, 2020

Psychologists, social scientists, and medical researchers often conduct empirical studies with the goal of demonstrating an effect (e.g., a drug is effective). They do so by rejecting the null-hypothesis that there is no effect when a test statistic falls into a region of improbable test statistics, p < .05. This is called null-hypothesis significance testing (NHST).

The utility of NHST has been a topic of debate. One of the oldest criticisms of NHST is that the null-hypothesis is likely to be false most of the time (Lykken, 1968). As a result, demonstrating a significant result adds little information, while failing to do so because studies have low power creates false information and confusion.

This changed in the 2000s, when the opinion emerged that most published significant results are false (Ioannidis, 2005; Simmons, Nelson, & Simonsohn, 2011). In response, there have been some attempts to estimate the actual number of false positive results (Jager & Leek, 2013). However, there has been surprisingly little progress towards this goal.

One problem for empirical tests of the false discovery rate is that the null-hypothesis is an abstraction. Just like it is impossible to say the number of points that make up the letter X, it is impossible to count null-hypotheses because the true population effect size is always unknown (Zhao, 2011, JASA).

An article by Soric (1989, JASA) provides a simple solution to this problem. Although this article was influential in stimulating methods for genome-wide association studies (Benjamini & Hochberg, 1995, with over 40,000 citations), the article itself has garnered fewer than 100 citations. Yet, it provides a simple and attractive way to examine how often researchers may be obtaining significant results when the null-hypothesis is true. Rather than trying to estimate the actual false discovery rate, the method estimates the maximum false discovery rate. If a literature has a low maximum false discovery rate, readers can be assured that most significant results are true positives.

The method is simple because researchers do not have to determine whether a specific finding was a true or false positive result. Rather, the maximum false discovery rate can be computed from the actual discovery rate (i.e., the percentage of significant results for all tests).

The logic of Soric’s (1989) approach is illustrated in Table 1.

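Hypothesis	NS	SIG	Total
TRUE	0	60	60
FALSE	760	40	800
Total	760	100	860

Table 1. Non-significant (NS) and significant (SIG) results for tests of true and false hypotheses.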

To maximize the false discovery rate, we make the simplifying assumption that all tests of true hypotheses (i.e., the null-hypothesis is false) are conducted with 100% power (i.e., all tests of true hypotheses produce a significant result). In Table 1, this leads to 60 significant results for 60 true hypotheses. The percentage of significant results for false hypotheses (i.e., the null-hypothesis is true) is given by the significance criterion, which is set at the typical level of 5%. This means that for every 20 tests, there are 19 non-significant results and one false positive result. In Table 1 this leads to 40 false positive results for 800 tests.

In this example, the discovery rate is (40 + 60)/860 = 11.6%. Out of these 100 discoveries, 60 are true discoveries and 40 are false discoveries. Thus, the false discovery rate is 40/100 = 40%.

Soric’s (1989) insight makes it easy to examine empirically whether a literature tests many false hypotheses, using a simple formula to compute the maximum false discovery rate from the observed discovery rate; that is, the percentage of significant results. All we need to do is count and use simple math to obtain valuable information about the false discovery rate.
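The formula can be written down in a few lines. The sketch below is only an illustration of the logic described above, not code from Soric (1989); the function name `soric_max_fdr` and the cap at 100% are my own choices. With 100% power, all non-significant results come from tests of true null-hypotheses, so the maximum false discovery rate is (1/DR - 1) * alpha/(1 - alpha), where DR is the discovery rate.

```python
def soric_max_fdr(discovery_rate: float, alpha: float = 0.05) -> float:
    """Maximum false discovery rate implied by a discovery rate.

    Assumes every test of a true hypothesis is significant (100% power),
    so all non-significant results stem from true null-hypotheses.
    """
    if not 0 < discovery_rate <= 1:
        raise ValueError("discovery_rate must be in (0, 1]")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    max_fdr = (1 / discovery_rate - 1) * alpha / (1 - alpha)
    # Assumption of this sketch: cap at 1, because a rate cannot exceed 100%
    # (the raw formula does so when the discovery rate is below alpha).
    return min(max_fdr, 1.0)


# Table 1: 100 significant results out of 860 tests.
table1_rate = 100 / 860
print(f"discovery rate = {table1_rate:.1%}")                 # 11.6%
print(f"maximum FDR    = {soric_max_fdr(table1_rate):.1%}")  # 40.0%
```

Applied to Table 1, the function returns the 40% false discovery rate that was obtained above by counting cells.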

However, a major problem with Soric’s approach is that the observed discovery rate in a literature may be misleading because journals are more likely to publish significant results than non-significant results. This is known as publication bias or the file-drawer problem (Rosenthal, 1979). In some sciences, publication bias is a big problem. Sterling (1959; also Sterling et al., 1995) found that the observed discovery rate in psychology is over 90%. This does not suggest that psychologists never test false hypotheses; rather, it suggests that publication bias is particularly strong in psychology (Fanelli, 2010). Using these inflated discovery rates to estimate the maximum FDR would severely underestimate the actual risk of false positive results.

Recently, Bartoš and Schimmack (2020) developed a statistical model that can correct for publication bias and produce a bias-corrected estimate of the discovery rate. This is called the expected discovery rate. A comparison of the observed discovery rate (ODR) and the expected discovery rate (EDR) can be used to assess the presence and extent of publication bias. In addition, the EDR can be used to compute Soric’s maximum false discovery rate when publication bias is present and inflates the ODR.

To demonstrate this approach, I use test statistics from the journal Psychonomic Bulletin and Review. The choice of this journal is motivated by prior meta-psychological investigations of results published in this journal. Gronau, Duizer, Bakker, and Wagenmakers (2017) used a Bayesian Mixture Model to estimate that about 40% of results published in this journal are false positive results. Using Soric’s formula in reverse shows that this estimate implies that cognitive psychologists test fewer than 10% true hypotheses (Table 3; 100/1540 = 6.5% true hypotheses, 72/172 = 42% false discoveries). This is close to Dreber, Pfeiffer, Almenberg, Isaksson, Wilson, Chen, Nosek, and Johannesson’s (2015) estimate of only 9% true hypotheses in cognitive psychology.

Hypothesis	NS	SIG	Total
TRUE	0	100	100
FALSE	1368	72	1440
Total	1368	172	1540

Table 3
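Using the formula “in reverse” means solving the same scenario (alpha = .05, 100% power for true hypotheses) for the discovery rate and for the share of true hypotheses that a given false discovery rate would require. The sketch below is my own illustration of that algebra; the function names are hypothetical and not part of any published code.

```python
def implied_discovery_rate(fdr: float, alpha: float = 0.05) -> float:
    """Discovery rate at which Soric's maximum FDR equals `fdr`."""
    return 1 / (1 + fdr * (1 - alpha) / alpha)


def implied_share_true(fdr: float, alpha: float = 0.05) -> float:
    """Share of tested hypotheses that are true, assuming 100% power.

    With share p of true hypotheses, true positives are p and false
    positives are (1 - p) * alpha; solving FDR = FP / (FP + TP) for p
    gives the expression below.
    """
    return alpha * (1 - fdr) / (alpha * (1 - fdr) + fdr)


for label, fdr in [("FDR = 40%", 0.40), ("FDR = 72/172", 72 / 172)]:
    print(
        f"{label:<13} discovery rate = {implied_discovery_rate(fdr):.1%}, "
        f"true hypotheses = {implied_share_true(fdr):.1%}"
    )
```

Under these assumptions, a false discovery rate of 40% requires that only about 7% of tested hypotheses are true, and the 72/172 ratio in Table 3 corresponds to 100 true hypotheses among 1540 tests (6.5%).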

These estimates are implausible because rather different results are obtained when Soric’s method is applied to the results from the Open Science Collaboration (2015) project, which conducted actual replication studies and found that 50% of published significant results in cognitive psychology could be replicated; that is, they produced a significant result again in the replication study. As there was no publication bias in the replication studies, the ODR of 50% can be used to compute the maximum false discovery rate, which is only 5%. This is much lower than the estimate obtained with Gronau et al.’s (2017) mixture model.

I used an R-script to automatically extract test-statistics from articles that were published in Psychonomic Bulletin and Review from 2000 to 2010. I limited the analysis to this period because concerns about replicability and false positives might have changed research practices after 2010. The program extracted 13,571 test statistics.

Figure 1 shows clear evidence of selection bias. The observed discovery rate of 70% is much higher than the expected discovery rate of 35%, and the 95% CI of the EDR, 25% to 53%, does not include the ODR. As a result, the ODR produces an inflated estimate of the actual discovery rate and cannot be used to compute the maximum false discovery rate.

However, even with a much lower expected discovery rate of 35%, the maximum false discovery rate is only 10%. Even with the lower bound of the confidence interval for the EDR of 25%, the maximum FDR is only 16%.

Figure 2 shows the results for a replication with test statistics from 2011 to 2019. Although changes in research practices could have produced different results, the results are essentially unchanged. The ODR is 69% vs. 70%, the EDR is 38% vs. 35%, and the point estimate of the maximum FDR is 9% vs. 10%. This close replication also suggests that research practices in cognitive psychology have not changed over the past decade.

The maximum FDR estimate of 10% confirms the results based on the replication rate in a small set of actual replication studies (OSC, 2015) with a much larger sample of test statistics. The results also show that Gronau et al.’s mixture model produces dramatically inflated estimates of the false discovery rate (see also Brunner & Schimmack, 2019, for a detailed discussion of their flawed model).

In contrast to cognitive psychology, social psychology has seen more replication failures. The OSC project estimated a discovery rate of only 25%. Even this low rate would imply that a maximum of 16% of discoveries in social psychology are false positives. A z-curve analysis of a representative sample of 678 focal tests in social psychology produced an estimated discovery rate of 19% with a 95%CI ranging from 6% to 36% (Schimmack, 2020). The point estimate implies a maximum FDR of 22%, but the lower limit of the confidence interval allows for a maximum FDR of 82%. Thus, social psychology may be a literature where most published results are false. However, the replication crisis in social psychology should not be generalized to other disciplines.
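The maximum FDR values quoted in this post all follow from the same arithmetic. As a rough check, the sketch below recomputes them from the discovery rates reported in the text, assuming alpha = .05 throughout; the helper name and the labels are mine and serve only as an illustration.

```python
def soric_max_fdr(discovery_rate: float, alpha: float = 0.05) -> float:
    """Soric's (1989) maximum false discovery rate, capped at 100%."""
    return min((1 / discovery_rate - 1) * alpha / (1 - alpha), 1.0)


# Discovery rates as reported in the text (labels are illustrative).
reported = [
    ("OSC (2015) replications, cognitive", 0.50),
    ("PB&R 2000-2010, EDR", 0.35),
    ("PB&R 2000-2010, EDR lower CI bound", 0.25),
    ("PB&R 2011-2019, EDR", 0.38),
    ("OSC (2015) replications, social", 0.25),
    ("Social psychology, EDR", 0.19),
    ("Social psychology, EDR lower CI bound", 0.06),
]

results = {label: soric_max_fdr(rate) for label, rate in reported}

for label, rate in reported:
    print(f"{label:<40} DR = {rate:>4.0%}   max FDR = {results[label]:>4.0%}")
```

Rounded to whole percentages, the output is 5%, 10%, 16%, 9%, 16%, 22%, and 82%, which matches the values given in the text.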

Conclusion

Numerous articles have made claims that false discoveries are rampant (Dreber et al., 2015; Gronau et al., 2017; Ioannidis, 2005; Simmons et al., 2011). However, these articles did not provide empirical data to support their claim. In contrast, empirical studies of the false discovery risk usually show much lower rates of false discoveries (Jager & Leek, 2014), but this finding has been dismissed (Ioannidis, 2014) or ignored (Gronau et al., 2017). Here I used a simpler approach to estimate the maximum false discovery rate and showed that most significant results in cognitive psychology are true discoveries. I hope that this demonstration revives attempts to estimate the science-wise false discovery rate (Jager & Leek, 2014) rather than relying on hypothetical scenarios or models that reflect researchers’ prior beliefs that may not match actual data (Gronau et al., 2017; Ioannidis, 2005).

References

Bartoš, F., & Schimmack, U. (2020, January 10). Z-Curve.2.0: Estimating Replication Rates and Discovery Rates. https://doi.org/10.31234/osf.io/urgtn

Dreber, A., Pfeiffer, T., Almenberg, J., Isaksson, S., Wilson, B., Chen, Y., Nosek, B. A., & Johannesson, M. (2015). Using prediction markets to estimate the reproducibility of scientific research. Proceedings of the National Academy of Sciences, 112(50), 15343–15347. https://doi.org/10.1073/pnas.1516179112

Fanelli, D. (2010). “Positive” results increase down the hierarchy of the sciences. PLOS ONE, 5(4), e10068. https://doi.org/10.1371/journal.pone.0010068

Gronau, Q. F., Duizer, M., Bakker, M., & Wagenmakers, E.-J. (2017). Bayesian mixture modeling of significant p values: A meta-analytic method to estimate the degree of contamination from H₀. Journal of Experimental Psychology: General, 146(9), 1223–1233. https://doi.org/10.1037/xge0000324

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124

Ioannidis, J. P. A. (2014). Why “An estimate of the science-wise false discovery rate and application to the top medical literature” is false. Biostatistics, 15(1), 28–36. https://doi.org/10.1093/biostatistics/kxt036

Jager, L. R., & Leek, J. T. (2014). An estimate of the science-wise false discovery rate and application to the top medical literature. Biostatistics, 15(1), 1–12. https://doi.org/10.1093/biostatistics/kxt007

Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3, Pt.1), 151–159. https://doi.org/10.1037/h0026141

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), 1–8.

Schimmack, U. (2019). The Bayesian Mixture Model is fundamentally flawed. https://replicationindex.com/2019/04/01/the-bayesian-mixture-model-is-fundamentally-flawed/

Schimmack, U. (2020). A meta-psychological perspective on the decade of replication failures in social psychology. Canadian Psychology/Psychologie canadienne, 61(4), 364–376. https://doi.org/10.1037/cap0000246

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632

Soric, B. (1989). Statistical “discoveries” and effect-size estimation. Journal of the American Statistical Association, 84(406), 608–610. https://doi.org/10.2307/2289950

Zhao, Y. (2011). Posterior probability of discovery and expected rate of discovery for multiple hypothesis testing and high throughput assays. Journal of the American Statistical Association, 106, 984–996. https://doi.org/10.1198/jasa.2011.tm09737
