Preliminary 2017 Replicability Rankings of 104 Psychology Journals

Please see the latest 2018 rankings. (LINK)

The table shows the preliminary 2017 rankings of 104 psychology journals.  A description of the methodology and analyses by discipline and over time are reported below the table.

Rank   Journal 2017 2016 2015 2014 2013 2012 2011 2010
1 European Journal of Developmental Psychology 93 88 67 83 74 71 79 65
2 Journal of Nonverbal Behavior 93 72 66 74 81 73 64 70
3 Behavioral Neuroscience 86 67 71 70 69 71 68 73
4 Sex Roles 83 83 75 71 73 78 77 74
5 Epilepsy & Behavior 82 82 82 85 85 81 87 77
6 Journal of Anxiety Disorders 82 77 73 77 76 80 75 77
7 Attention, Perception and Psychophysics 81 71 73 77 78 80 75 73
8 Cognitive Development 81 73 82 73 69 73 67 65
9 Judgment and Decision Making 81 79 78 78 67 75 70 74
10 Psychology of Music 81 80 72 73 77 72 81 86
11 Animal Behavior 80 74 71 72 72 71 70 78
12 Early Human Development 80 92 86 83 79 70 64 81
13 Journal of Experimental Psychology – Learning, Memory & Cognition 80 80 79 80 77 77 71 81
14 Journal of Memory and Language 80 84 81 74 77 73 80 76
15 Memory and Cognition 80 75 79 76 77 78 76 76
16 Social Psychological and Personality Science 80 67 61 65 61 58 63 55
17 Journal of Positive Psychology 80 70 72 72 64 64 73 81
18 Archives of Sexual Behavior 79 79 81 80 83 79 78 87
19 Consciousness and Cognition 79 71 69 73 67 70 73 74
20 Journal of Applied Psychology 79 80 74 76 69 74 72 73
21 Journal of Experimental Psychology – Applied 79 67 68 75 68 74 74 72
22 Journal of Experimental Psychology – General 79 75 73 73 76 69 74 69
23 Journal of Experimental Psychology – Human Perception and Performance 79 78 76 77 76 78 78 75
24 Journal of Personality 79 75 72 68 72 75 73 82
25 JPSP-Attitudes & Social Cognition 79 57 75 69 50 62 61 61
26 Personality and Individual Differences 79 79 79 78 78 76 74 73
27 Social Development 79 78 66 75 73 72 73 75
28 Appetite 78 74 69 66 75 72 74 77
29 Cognitive Behavioral Therapy 78 82 76 65 72 82 71 62
30 Journal of Comparative Psychology 78 77 76 83 83 75 69 64
31 Journal of Consulting and Clinical Psychology 78 71 68 65 66 66 69 68
32 Neurobiology of Learning and Memory 78 72 75 72 71 70 75 73
33 Psychonomic Bulletin and Review 78 79 82 79 82 72 71 78
34 Acta Psychologica 78 75 73 78 76 75 77 75
35 Behavior Therapy 77 74 71 75 76 78 64 76
36 Journal of Affective Disorders 77 85 84 77 83 82 76 76
37 Journal of Child and Family Studies 77 76 69 71 76 71 76 77
38 Journal of Vocational Behavior 77 85 84 69 82 79 86 74
39 Motivation and Emotion 77 64 67 66 67 65 79 68
40 Psychology and Aging 77 79 78 80 74 78 78 74
41 Psychophysiology 77 77 70 69 68 70 80 78
42 British Journal of Social Psychology 76 65 66 62 64 60 72 63
43 Cognition 76 74 75 75 77 76 73 73
44 Cognitive Psychology 76 80 74 76 79 72 82 75
45 Developmental Psychology 76 77 77 75 71 68 70 70
46 Emotion 76 72 69 69 72 70 70 73
47 Frontiers in Behavioral Neuroscience 76 70 71 68 71 72 73 70
48 Frontiers in Psychology 76 75 73 73 72 72 70 82
49 Journal of Autism and Developmental Disorders 76 77 73 67 73 70 70 72
50 Journal of Social and Personal Relationships 76 82 60 63 69 67 79 83
51 Journal of Youth and Adolescence 76 88 81 82 79 76 79 74
52 Cognitive Therapy and Research 75 71 72 62 77 75 70 66
53 Depression & Anxiety 75 78 73 76 82 79 82 84
54 Journal of Child Psychology and Psychiatry and Allied Disciplines 75 63 66 66 72 76 58 66
55 Journal of Occupational and Organizational Psychology 75 85 84 71 77 77 74 67
56 Journal of Social Psychology 75 75 74 67 65 80 71 75
57 Political Psychology 75 81 75 72 75 74 51 70
58 Social Cognition 75 68 68 73 62 78 71 60
59 British Journal of Developmental Psychology 74 77 74 63 61 85 77 79
60 Evolution & Human Behavior 74 81 75 79 67 77 78 68
61 Journal of Research in Personality 74 77 82 80 79 73 74 71
62 Memory 74 79 66 83 73 71 76 78
63 Psychological Medicine 74 83 71 79 79 68 79 75
64 Psychopharmacology 74 75 73 73 71 73 73 71
65 Psychological Science 74 69 70 64 65 64 62 63
66 Behavioural Brain Research 73 69 75 69 71 72 73 74
67 Behaviour Research and Therapy 73 74 76 77 74 77 68 71
68 Journal of Cross-Cultural Psychology 73 75 80 78 78 71 76 76
69 Journal of Experimental Child Psychology 73 73 78 74 74 72 72 76
70 Personality and Social Psychology Bulletin 73 71 65 65 61 61 62 61
71 Social Psychology 73 75 72 74 69 64 75 74
72 Developmental Science 72 68 68 66 71 68 68 66
73 Journal of Cognition and Development 72 78 68 64 69 62 66 70
74 Law and Human Behavior 72 76 76 61 76 76 84 72
75 Perception 72 78 79 74 78 85 94 91
76 Journal of Applied Social Psychology 71 81 69 72 71 80 74 75
77 Journal of Experimental Social Psychology 71 68 63 61 58 56 58 57
78 Annals of Behavioral Medicine 70 70 62 71 71 77 75 71
79 Frontiers in Human Neuroscience 70 74 73 74 75 75 75 72
80 Health Psychology 70 63 68 69 68 63 70 72
81 Journal of Abnormal Child Psychology 70 74 70 74 78 78 68 78
82 Journal of Counseling Psychology 70 69 74 75 76 78 67 80
83 Journal of Educational Psychology 70 74 73 76 76 78 78 84
84 Journal of Family Psychology 70 68 75 71 73 66 68 69
85 JPSP-Interpersonal Relationships and Group Processes 70 74 64 62 66 58 60 56
86 Child Development 69 72 72 71 69 75 72 75
87 European Journal of Social Psychology 69 76 64 72 67 59 69 66
88 Group Processes & Intergroup Relations 69 67 73 68 70 66 68 61
89 Organizational Behavior and Human Decision Processes 69 73 70 70 72 70 71 65
90 Personal Relationships 69 72 71 70 68 74 60 69
91 Journal of Pain 69 79 71 81 73 78 74 72
92 Journal of Research on Adolescence 68 78 69 68 75 76 84 77
93 Self and Identity 66 70 56 73 71 72 70 73
94 Developmental Psychobiology 65 69 67 69 70 69 71 66
95 Infancy 65 61 57 65 70 67 73 57
96 Hormones & Behavior 64 68 66 66 67 64 68 67
97 Journal of Abnormal Psychology 64 67 71 64 71 67 73 70
98 JPSP-Personality Processes and Individual Differences 64 74 70 70 72 71 71 64
99 Psychoneuroendocrinology 64 68 66 65 65 62 66 63
100 Cognition and Emotion 63 69 75 72 76 76 76 76
101 European Journal of Personality 62 78 66 81 70 74 74 78
102 Biological Psychology 61 68 70 66 65 62 70 70
103 Journal of Happiness Studies 60 78 79 72 81 78 80 83
104 Journal of Consumer Psychology 58 56 69 66 61 62 61 66


Download PDF of this ggplot representation of the table courtesy of David Lovis-McMahon.

Introduction

I define replicability as the probability of obtaining a significant result in an exact replication of a study that produced a significant result.  In the past five years, there have been concerns about a replication crisis in psychology.  Even results that are replicated internally by the same author multiple times fail to replicate in independent replication attempts (Bem, 2011).  The key reason for the replication crisis is selective publishing of significant results (publication bias). While journals report over 95% significant results (Sterling, 1959; Sterling et al., 1995), a 2015 article estimated that less than 50% of these results can be replicated  (OSC, 2015).

The OSC reproducibility project made an important contribution by demonstrating that published results in psychology have low replicability.  However, the reliance on actual replication studies has a number of limitations.  First, actual replication studies are expensive, time-consuming, and sometimes impossible (e.g., a longitudinal study spanning 20 years).  This makes it difficult to rely on actual replication studies to assess the replicability of psychological results, to produce replicability rankings of journals, and to track replicability over time.

Schimmack and Brunner (2016) developed a statistical method (z-curve) that makes it possible to estimate the average replicability of a set of published results based on the test statistics reported in published articles.  This statistical approach to the estimation of replicability has several advantages over the use of actual replication studies: (a) replicability can be assessed in real time, (b) it can be estimated for all published results rather than a small sample of studies, and (c) it can be applied to studies that are impossible to reproduce.  Finally, actual replication studies can be criticized for deviating from the original studies (Gilbert, King, Pettigrew, & Wilson, 2016); estimates of replicability based on original studies do not have this problem because they are based on the results reported in the original articles.

Z-curve has been validated with simulation studies and can be used with heterogeneous sets of studies that vary across statistical methods, sample sizes, and effect sizes  (Brunner & Schimmack, 2016).  I have applied this method to articles published in psychology journals to create replicability rankings of psychology journals in 2015 and 2016.  This blog post presents preliminary rankings for 2017 based on articles that have been published so far. The rankings will be updated in 2018, when all 2017 articles are available.

For the 2016 rankings, I used z-curve to obtain annual replicability estimates for 103 journals from 2010 to 2016.  Analyses of time trends showed no changes from 2010 to 2015. However, in 2016 there were first signs of an increase in replicability.  Additional analyses suggested that social psychology journals contributed most to this trend.  The preliminary 2017 rankings provide an opportunity to examine whether there is a reliable increase in replicability in psychology and whether such a trend is limited to social psychology.

Journals

Journals were selected mainly based on impact factor.  The preliminary 2017 rankings are based on 104 journals. Several new journals were added to increase the number of journals specializing in five disciplines: social (24), cognitive (13), developmental (15), clinical/medical (18), and biological (13).  The remaining journals were broad journals (e.g., Psychological Science) or journals from other disciplines.  More journals will be added to the final rankings for 2017.

Data Preparation

All PDF versions of published articles were downloaded and converted into text files using the conversion program pdfzilla.  Text files were searched for reports of statistical results using a custom R program. Only F-tests, t-tests, and z-tests were used for the rankings because they can be reliably extracted from diverse journals. t-values that were reported without degrees of freedom were treated as z-values, which leads to a slight inflation of replicability estimates. However, the bulk of the test statistics were F-values and t-values with degrees of freedom. Test statistics were converted into exact p-values, and exact p-values were converted into absolute z-scores as a measure of the strength of evidence against the null hypothesis.
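To make this conversion step concrete, the following R sketch shows how a reported F-value or t-value can be turned into an absolute z-score; it is only an illustration with made-up example statistics, not the actual extraction program.

```r
# Illustrative sketch of the conversion step (not the actual extraction code).
f_to_z <- function(f, df1, df2) {
  p <- pf(f, df1, df2, lower.tail = FALSE)      # p-value of the F-test
  qnorm(p / 2, lower.tail = FALSE)              # absolute z-score
}
t_to_z <- function(t, df) {
  p <- 2 * pt(abs(t), df, lower.tail = FALSE)   # two-tailed p-value
  qnorm(p / 2, lower.tail = FALSE)
}
f_to_z(9.42, 1, 58)   # made-up example: F(1, 58) = 9.42 -> z of about 2.9
t_to_z(2.50, 40)      # made-up example: t(40) = 2.50  -> z of about 2.4
```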

Data Analysis

The data for each year were analyzed using z-curve (Schimmack & Brunner, 2016). Z-curve provides a replicability estimate. In addition, it generates a Powergraph, which is essentially a histogram of absolute z-scores. Visual inspection of Powergraphs can be used to examine publication bias. A drop in the frequency of z-values just below the significance criterion (p < .05, two-tailed, z = 1.96) shows that non-significant results are underrepresented. A further drop may be visible at z = 1.65 because values between z = 1.65 and z = 1.96 are sometimes reported as marginally significant support for a hypothesis.  The critical values z = 1.65 and z = 1.96 are marked by vertical red lines in the Powergraphs.
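The basic idea of such a plot is easy to approximate. The snippet below is only a schematic illustration of a Powergraph-style histogram with the two critical values marked, using simulated z-scores rather than real data.

```r
# Schematic Powergraph-style plot with simulated (not real) z-scores.
set.seed(1)
z <- abs(rnorm(1000, mean = 2.5, sd = 1))   # hypothetical absolute z-scores
hist(z, breaks = seq(0, ceiling(max(z)), by = 0.2),
     main = "Distribution of absolute z-scores", xlab = "absolute z-score")
abline(v = 1.65, col = "red", lty = 2)      # marginal significance (p < .10)
abline(v = 1.96, col = "red")               # significance criterion (p < .05)
```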

Replicability rankings rely only on statistically significant results (z > 1.96).  The aim of z-curve is to estimate the average probability that an exact replication of a study that produced a significant result produces a significant result again.  As replicability estimates rely only on significant results, journals are not punished for publishing non-significant results.  The key criterion is how strong the evidence against the null hypothesis is when an article publishes results that lead to the rejection of the null hypothesis.

Statistically, replicability is the average statistical power of the set of studies that produced significant results.  As power is the probability of obtaining a significant result, the average power of the original studies is equivalent to the average power of a set of exact replication studies. Thus, the average power of the original studies is an estimate of replicability.
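This equivalence can be illustrated with a small simulation; the numbers below are arbitrary and serve only to make the logic concrete.

```r
# Simulation sketch: for studies selected for significance, the success rate
# of exact replications equals the average power of those original studies.
set.seed(123)
n_studies   <- 100000
true_power  <- runif(n_studies, 0.05, 0.95)       # heterogeneous true power
original    <- rbinom(n_studies, 1, true_power)   # 1 = significant original result
replication <- rbinom(n_studies, 1, true_power)   # exact replication with the same power
mean(true_power[original == 1])                   # average power of significant originals
mean(replication[original == 1])                  # replication success rate (about the same)
```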

Links to powergraphs for all journals and years are provided in the ranking table.  These powergraphs provide additional information that is not used for the rankings. The only information that is being used is the replicability estimate based on the distribution of significant z-scores.

Results

The replicability estimates for each journal and year (104 * 8 = 832 data points) served as the raw data for the following statistical analyses.  I fitted growth models to examine time trends and variability across journals and disciplines using MPLUS 7.4.

I compared several models. Model 1 assumed no mean-level changes and stable variability across journals (significant variance in the intercept/trait factor). Model 2 assumed no change from 2010 to 2015 and allowed for mean-level changes in 2016 and 2017 as well as stable differences between journals. Model 3 was identical to Model 2 but additionally allowed for random variability in the slope factor.
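As a rough illustration, Model 2 could be specified as follows with the lavaan R package; the analyses reported here used MPLUS 7.4, so this sketch, including the data frame journal_estimates and the column names y2010-y2017, is purely hypothetical.

```r
# Hypothetical lavaan sketch of Model 2 (the reported analyses used MPLUS 7.4).
library(lavaan)
model2 <- '
  # stable differences between journals: intercept factor
  i =~ 1*y2010 + 1*y2011 + 1*y2012 + 1*y2013 + 1*y2014 + 1*y2015 + 1*y2016 + 1*y2017
  # step change: slope factor loads only on 2016 and 2017
  s =~ 0*y2010 + 0*y2011 + 0*y2012 + 0*y2013 + 0*y2014 + 0*y2015 + 1*y2016 + 1*y2017
  s ~~ 0*s   # Model 2: mean-level change without random slope variance
  i ~~ 0*s
'
# fit2 <- growth(model2, data = journal_estimates)  # assumed 104 x 8 data frame
# fitMeasures(fit2, c("rmsea", "bic"))
```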

Model 1 did not have acceptable fit (RMSEA = .109, BIC = 5198). Model 2 improved fit (RMSEA = .063, BIC = 5176).  Model 3 did not improve model fit further (RMSEA = .063, BIC = 5180), the variance of the slope factor was not significant, and BIC favored the more parsimonious Model 2.  The parameter estimates suggested that replicability estimates increased by 2 percentage points, from 72 in the years 2010 to 2015 to 74 (z = 3.70, p < .001).

The standardized loadings of individual years on the latent intercept factor ranged from .57 to .61.  Squaring these loadings implies that about one-third of the variance (.57² ≈ .32 to .61² ≈ .37) is stable, while the remaining two-thirds of the variance is due to fluctuations in the estimates from year to year.

The average of 72% replicability is notably higher than the estimate of 62% reported in the 2016 rankings.  The difference is due to a computational error in the 2016 rankings that affected mainly the absolute values, but not the relative ranking of journals. The r-code for the 2016 rankings miscalculated the percentage of extreme z-scores (z > 6), which is used to adjust the z-curve estimate that is based on z-scores between 1.96 and 6, because all z-scores greater than 6 essentially have 100% power.  For the 2016 rankings, I erroneously computed the percentage of extreme z-scores out of all z-scores rather than out of the set of statistically significant results. This error became apparent when new simulation studies produced wrong estimates.
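In code, the difference between the two computations is small but consequential; the snippet below is a hypothetical reconstruction with made-up numbers, not the actual r-code.

```r
# Hypothetical reconstruction of the error (made-up z-scores, not the actual r-code).
z <- c(0.8, 1.2, 2.1, 2.8, 3.4, 4.5, 6.5, 7.2)
mean(z > 6)             # wrong: proportion of extreme z-scores among ALL z-scores
mean(z[z > 1.96] > 6)   # correct: proportion among significant z-scores only
```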

Although the previous analysis failed to find significant variability in the slope (change) factor, this could be due to the low power of this statistical test.  The next models included disciplines as predictors of the intercept (Model 4) or of the intercept and slope (Model 5).  Model 4 had acceptable fit (RMSEA = .059, BIC = 5175). Model 5 improved fit (RMSEA = .036), although BIC favored the more parsimonious model (BIC = 5178); because the Bayesian Information Criterion rewards parsimony, its preference for Model 4 cannot be interpreted as evidence for the absence of an effect.  Model 5 showed two significant (p < .05) effects, for social and developmental psychology.  In Model 6, I included only social and developmental psychology as predictors of the slope factor.  BIC favored this model over the other models (RMSEA = .029, BIC = 5164).  The model results showed improvements for social psychology (an increase of 4.48 percentage points, z = 3.46, p = .001) and developmental psychology (an increase of 3.25 percentage points, z = 2.65, p = .008).  Whereas the improvement for social psychology was expected based on the 2016 results, the increase for developmental psychology was unexpected and requires replication in the 2018 rankings.

The only significant predictors of the intercept were social psychology (-4.92 percentage points, z = 4.12, p < .001) and cognitive psychology (+2.91, z = 2.15, p = .032).  The strong negative effect (standardized effect size d = 1.14) for social psychology confirms earlier findings that social psychology was most strongly affected by the replication crisis (OSC, 2015). It is encouraging to see that social psychology is also the discipline with the strongest evidence of improvement in response to the replication crisis.  With an increase of 4.48 points, the replicability of social psychology is now at the same level as the other disciplines, with the exception of cognitive psychology, which is still somewhat more replicable than all other disciplines.

In conclusion, the results confirm that social psychology had lower replicability than other disciplines, but they also show that social psychology has significantly improved in replicability over the past couple of years.

Analysis of Individual Journals

The next analysis examined changes in replicability at the level of individual journals. Replicability estimates were regressed on a dummy variable that contrasted 2010-2015 (0) with 2016-2017 (1). This analysis produced 10 significant increases at p < .01 (one-tailed), when only about 1 out of 100 would be expected by chance.
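A minimal sketch of this journal-level test is shown below; the data frame rankings and its columns journal, year, and estimate are assumed for illustration and this is not the actual analysis code.

```r
# Minimal sketch of the journal-level change test (assumed data frame `rankings`
# with columns journal, year, estimate; not the actual analysis code).
rankings$period <- ifelse(rankings$year >= 2016, 1, 0)   # 0 = 2010-2015, 1 = 2016-2017
results <- lapply(split(rankings, rankings$journal), function(d) {
  fit <- summary(lm(estimate ~ period, data = d))$coefficients
  c(increase     = fit["period", "Estimate"],
    p_two_tailed = fit["period", "Pr(>|t|)"])   # halve for a one-tailed test of an increase
})
head(do.call(rbind, results))
```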

Five of the 10 journals (50% vs. 20% in the total set of journals) were from social psychology (SPPS + 13, JESP + 11, JPSP-IRGP + 11, PSPB + 10, Sex Roles + 8).  The remaining journals were from developmental psychology (European J. Dev. Psy + 17, J Cog. Dev. + 9), clinical psychology (J. Cons. & Clinical Psy + 8, J. Autism and Dev. Disorders + 6), and the Journal of Applied Psychology (+7).  The high proportion of social psychology journals provides further evidence that social psychology has responded most strongly to the replication crisis.

Limitations

Although z-curve provides very good absolute estimates of replicability in simulation studies, the absolute values in the rankings have to be interpreted with a big grain of salt for several reasons.  Most important, the rankings are based on all test statistics that were reported in an article.  Only a few of these statistics test theoretically important hypotheses; others may be manipulation checks or other incidental analyses.  For the OSC (2015) studies, the replicability estimate was 69%, whereas the actual success rate was only 37%.  Moreover, comparisons of the automated extraction method used for the rankings and hand-coding of focal hypotheses in the same articles also show a difference of about 20 percentage points.  Thus, a reported replicability of 70% may imply only 50% replicability for a critical hypothesis test.  Second, the estimates are based on the ideal assumptions underlying statistical test distributions; violations of these assumptions (e.g., outliers) are likely to reduce actual replicability.  Third, actual replication studies are never exact replication studies, and minor differences between the studies are also likely to reduce replicability.  There are currently not enough actual replication studies to correct for these factors, but the average is likely to be less than 72%. It is also likely to be higher than 37%, because that estimate is heavily influenced by social psychology, whereas cognitive psychology had a success rate of 50%.  Thus, a plausible range for the typical replicability of psychology is somewhere between 40% and 60%.  We might say the glass is half full and half empty, while there is systematic variation around this average across journals.

Conclusion

It has been 55 years since Cohen (1962) pointed out that psychologists conduct many underpowered studies that produce non-significant results (type-II errors).  For decades there was no sign of improvement.  The preliminary 2017 rankings provide the first empirical evidence that psychologists are waking up to the replication crisis caused by selective reporting of significant results from underpowered studies.  Right now, social psychologists appear to respond most strongly to concerns about replicability.  However, it is possible that other disciplines will follow as the open science movement gains momentum.  Hopefully, replicability rankings can provide an incentive to consider replicability as one of several criteria for publication.   A study with z = 2.20 and another study with z = 3.85 are both significant (z > 1.96), but the study with z = 3.85 has a higher chance of being replicated. Everything else being equal, editors should favor studies with stronger evidence, that is, higher z-scores (i.e., lower p-values).  By taking the strength of evidence into account, psychologists can move away from treating all significant results (p < .05) as equal and take type-II errors and power into account.
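To make the z = 2.20 versus z = 3.85 comparison concrete: if the observed z-score happened to equal the true population z-score (a simplifying assumption that ignores selection bias), the probability of obtaining a significant result again can be approximated with the standard normal distribution.

```r
# Rough approximation of replication probability, assuming the observed z-score
# equals the true population z-score (ignores selection bias and the negligible
# chance of significance in the opposite direction).
pnorm(2.20 - 1.96)   # about .59 for z = 2.20
pnorm(3.85 - 1.96)   # about .97 for z = 3.85
```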

11 thoughts on “Preliminary 2017 Replicability Rankings of 104 Psychology Journals”

  1. Why not give standard errors on the estimates? Or at least the number of tests you base them on. Come on, try a little harder to be scientific about this. I agree with your mission, but this makes it really easy to dismiss you.

    1. The number of cases is included in the powergraphs when you click on the journal titles. Bootstrapped 95% CIs will be added to the powergraphs when the final data are available.
