Preliminary 2017 Replicability Rankings of 104 Psychology Journals

Please see the latest 2018 rankings. (LINK)

The table shows the preliminary 2017 rankings of 104 psychology journals. A description of the methodology and analyses by discipline and time are reported below the table.

Rank	Journal	2017	2016	2015	2014	2013	2012	2011	2010
1	European Journal of Developmental Psychology	93	88	67	83	74	71	79	65
2	Journal of Nonverbal Behavior	93	72	66	74	81	73	64	70
3	Behavioral Neuroscience	86	67	71	70	69	71	68	73
4	Sex Roles	83	83	75	71	73	78	77	74
5	Epilepsy & Behavior	82	82	82	85	85	81	87	77
6	Journal of Anxiety Disorders	82	77	73	77	76	80	75	77
7	Attention, Perception and Psychophysics	81	71	73	77	78	80	75	73
8	Cognitive Development	81	73	82	73	69	73	67	65
9	Judgment and Decision Making	81	79	78	78	67	75	70	74
10	Psychology of Music	81	80	72	73	77	72	81	86
11	Animal Behavior	80	74	71	72	72	71	70	78
12	Early Human Development	80	92	86	83	79	70	64	81
13	Journal of Experimental Psychology – Learning, Memory & Cognition	80	80	79	80	77	77	71	81
14	Journal of Memory and Language	80	84	81	74	77	73	80	76
15	Memory and Cognition	80	75	79	76	77	78	76	76
16	Social Psychological and Personality Science	80	67	61	65	61	58	63	55
17	Journal of Positive Psychology	80	70	72	72	64	64	73	81
18	Archives of Sexual Behavior	79	79	81	80	83	79	78	87
19	Consciousness and Cognition	79	71	69	73	67	70	73	74
20	Journal of Applied Psychology	79	80	74	76	69	74	72	73
21	Journal of Experimental Psychology – Applied	79	67	68	75	68	74	74	72
22	Journal of Experimental Psychology – General	79	75	73	73	76	69	74	69
23	Journal of Experimental Psychology – Human Perception and Performance	79	78	76	77	76	78	78	75
24	Journal of Personality	79	75	72	68	72	75	73	82
25	JPSP-Attitudes & Social Cognition	79	57	75	69	50	62	61	61
26	Personality and Individual Differences	79	79	79	78	78	76	74	73
27	Social Development	79	78	66	75	73	72	73	75
28	Appetite	78	74	69	66	75	72	74	77
29	Cognitive Behavioral Therapy	78	82	76	65	72	82	71	62
30	Journal of Comparative Psychology	78	77	76	83	83	75	69	64
31	Journal of Consulting and Clinical Psychology	78	71	68	65	66	66	69	68
32	Neurobiology of Learning and Memory	78	72	75	72	71	70	75	73
33	Psychonomic Bulletin and Review	78	79	82	79	82	72	71	78
34	Acta Psychologica	78	75	73	78	76	75	77	75
35	Behavior Therapy	77	74	71	75	76	78	64	76
36	Journal of Affective Disorders	77	85	84	77	83	82	76	76
37	Journal of Child and Family Studies	77	76	69	71	76	71	76	77
38	Journal of Vocational Behavior	77	85	84	69	82	79	86	74
39	Motivation and Emotion	77	64	67	66	67	65	79	68
40	Psychology and Aging	77	79	78	80	74	78	78	74
41	Psychophysiology	77	77	70	69	68	70	80	78
42	British Journal of Social Psychology	76	65	66	62	64	60	72	63
43	Cognition	76	74	75	75	77	76	73	73
44	Cognitive Psychology	76	80	74	76	79	72	82	75
45	Developmental Psychology	76	77	77	75	71	68	70	70
46	Emotion	76	72	69	69	72	70	70	73
47	Frontiers in Behavioral Neuroscience	76	70	71	68	71	72	73	70
48	Frontiers in Psychology	76	75	73	73	72	72	70	82
49	Journal of Autism and Developmental Disorders	76	77	73	67	73	70	70	72
50	Journal of Social and Personal Relationships	76	82	60	63	69	67	79	83
51	Journal of Youth and Adolescence	76	88	81	82	79	76	79	74
52	Cognitive Therapy and Research	75	71	72	62	77	75	70	66
53	Depression & Anxiety	75	78	73	76	82	79	82	84
54	Journal of Child Psychology and Psychiatry and Allied Disciplines	75	63	66	66	72	76	58	66
55	Journal of Occupational and Organizational Psychology	75	85	84	71	77	77	74	67
56	Journal of Social Psychology	75	75	74	67	65	80	71	75
57	Political Psychology	75	81	75	72	75	74	51	70
58	Social Cognition	75	68	68	73	62	78	71	60
59	British Journal of Developmental Psychology	74	77	74	63	61	85	77	79
60	Evolution & Human Behavior	74	81	75	79	67	77	78	68
61	Journal of Research in Personality	74	77	82	80	79	73	74	71
62	Memory	74	79	66	83	73	71	76	78
63	Psychological Medicine	74	83	71	79	79	68	79	75
64	Psychopharmacology	74	75	73	73	71	73	73	71
65	Psychological Science	74	69	70	64	65	64	62	63
66	Behavioural Brain Research	73	69	75	69	71	72	73	74
67	Behaviour Research and Therapy	73	74	76	77	74	77	68	71
68	Journal of Cross-Cultural Psychology	73	75	80	78	78	71	76	76
69	Journal of Experimental Child Psychology	73	73	78	74	74	72	72	76
70	Personality and Social Psychology Bulletin	73	71	65	65	61	61	62	61
71	Social Psychology	73	75	72	74	69	64	75	74
72	Developmental Science	72	68	68	66	71	68	68	66
73	Journal of Cognition and Development	72	78	68	64	69	62	66	70
74	Law and Human Behavior	72	76	76	61	76	76	84	72
75	Perception	72	78	79	74	78	85	94	91
76	Journal of Applied Social Psychology	71	81	69	72	71	80	74	75
77	Journal of Experimental Social Psychology	71	68	63	61	58	56	58	57
78	Annals of Behavioral Medicine	70	70	62	71	71	77	75	71
79	Frontiers in Human Neuroscience	70	74	73	74	75	75	75	72
80	Health Psychology	70	63	68	69	68	63	70	72
81	Journal of Abnormal Child Psychology	70	74	70	74	78	78	68	78
82	Journal of Counseling Psychology	70	69	74	75	76	78	67	80
83	Journal of Educational Psychology	70	74	73	76	76	78	78	84
84	Journal of Family Psychology	70	68	75	71	73	66	68	69
85	JPSP-Interpersonal Relationships and Group Processes	70	74	64	62	66	58	60	56
86	Child Development	69	72	72	71	69	75	72	75
87	European Journal of Social Psychology	69	76	64	72	67	59	69	66
88	Group Processes & Intergroup Relations	69	67	73	68	70	66	68	61
89	Organizational Behavior and Human Decision Processes	69	73	70	70	72	70	71	65
90	Personal Relationships	69	72	71	70	68	74	60	69
91	Journal of Pain	69	79	71	81	73	78	74	72
92	Journal of Research on Adolescence	68	78	69	68	75	76	84	77
93	Self and Identity	66	70	56	73	71	72	70	73
94	Developmental Psychobiology	65	69	67	69	70	69	71	66
95	Infancy	65	61	57	65	70	67	73	57
96	Hormones & Behavior	64	68	66	66	67	64	68	67
97	Journal of Abnormal Psychology	64	67	71	64	71	67	73	70
98	JPSP-Personality Processes and Individual Differences	64	74	70	70	72	71	71	64
99	Psychoneuroendocrinology	64	68	66	65	65	62	66	63
100	Cognition and Emotion	63	69	75	72	76	76	76	76
101	European Journal of Personality	62	78	66	81	70	74	74	78
102	Biological Psychology	61	68	70	66	65	62	70	70
103	Journal of Happiness Studies	60	78	79	72	81	78	80	83
104	Journal of Consumer Psychology	58	56	69	66	61	62	61	66

Download PDF of this ggplot representation of the table courtesy of David Lovis-McMahon.

Introduction

I define replicability as the probability of obtaining a significant result in an exact replication of a study that produced a significant result. In the past five years, there have been concerns about a replication crisis in psychology. Even results that are replicated internally by the same author multiple times fail to replicate in independent replication attempts (Bem, 2011). The key reason for the replication crisis is selective publishing of significant results (publication bias). While journals report over 95% significant results (Sterling, 1959; Sterling et al., 1995), a 2015 article estimated that less than 50% of these results can be replicated (OSC, 2015).

The OSC reproducibility project made an important contribution by demonstrating that published results in psychology have low replicability. However, relying on actual replication studies has limitations. Actual replication studies are expensive, time-consuming, and sometimes impossible (e.g., a longitudinal study spanning 20 years). This makes it difficult to use them to assess the replicability of psychological results, to produce replicability rankings of journals, and to track replicability over time.

Schimmack and Brunner (2016) developed a statistical method (z-curve) that makes it possible to estimate the average replicability of a set of published results from the test statistics reported in the articles. This statistical approach has several advantages over actual replication studies: (a) replicability can be assessed in real time, (b) it can be estimated for all published results rather than a small sample of studies, and (c) it can be applied to studies that are impossible to reproduce. Finally, it avoids a problem of actual replication studies, namely that they can be criticized (Gilbert, King, Pettigrew, & Wilson, 2016). Estimates of replicability based on the original studies do not have this problem because they use only the results reported in the original articles.

Z-curve has been validated with simulation studies and can be used with heterogeneous sets of studies that vary across statistical methods, sample sizes, and effect sizes (Brunner & Schimmack, 2016). I have applied this method to articles published in psychology journals to create replicability rankings of psychology journals in 2015 and 2016. This blog post presents preliminary rankings for 2017 based on articles that have been published so far. The rankings will be updated in 2018, when all 2017 articles are available.

For the 2016 rankings, I used z-curve to obtain annual replicability estimates for 103 journals from 2010 to 2016. Analyses of time trends showed no changes from 2010 to 2015. However, in 2016 there were first signs of an increase in replicability. Additional analyses suggested that social psychology journals contributed most to this trend. The preliminary 2017 rankings provide an opportunity to examine whether there is a reliable increase in replicability in psychology and whether such a trend is limited to social psychology.

Journals

Journals were selected mainly based on impact factor. Several new journals were added to increase the number of journals specializing in five disciplines: social (24), cognitive (13), developmental (15), clinical/medical (18), and biological (13). The other 24 journals were broad journals (e.g., Psychological Science) or journals from other disciplines. The preliminary rankings for 2017 are based on 104 journals; more journals will be added to the final rankings for 2017.

Data Preparation

All PDF versions of published articles were downloaded and converted into text files using the conversion program pdfzilla. Text files were searched for reports of statistical results using a self-written R program. Only F-tests, t-tests, and z-tests were used for the rankings because they can be reliably extracted from diverse journals. t-values that were reported without df were treated as z-values, which leads to a slight inflation in replicability estimates. However, the bulk of test statistics were F-values and t-values with degrees of freedom. Test statistics were converted into exact p-values, and exact p-values were converted into absolute z-scores as a measure of the strength of evidence against the null hypothesis.
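A minimal Python sketch of the conversion step, assuming two-tailed p-values for t- and z-tests and upper-tail p-values for F-tests. The function names are hypothetical and this is not the author's R program. The regularized incomplete beta function is implemented with the standard continued-fraction method (as in Numerical Recipes) so that only the standard library is needed; with SciPy available, `scipy.stats.t.sf` and `scipy.stats.f.sf` would be the usual choice.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_FPMIN = 1e-300


def _nonzero(v):
    # Guard against division by ~0 in the modified Lentz iteration.
    return v if abs(v) >= _FPMIN else _FPMIN


def _betacf(a, b, x, max_iter=10_000, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) on the side where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def p_from_z(z):
    """Two-tailed p-value for a z-test."""
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


def p_from_t(t, df=None):
    """Two-tailed p-value for a t-test; a t reported without df is treated as a z."""
    if df is None:
        return p_from_z(t)
    return reg_inc_beta(df / (df + t * t), df / 2.0, 0.5)


def p_from_f(f, df1, df2):
    """Upper-tail p-value for an F-test with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / (df2 + df1 * f), df2 / 2.0, df1 / 2.0)


def z_from_p(p):
    """Absolute z-score that has the same two-tailed p-value."""
    if p <= 0.0:
        return math.inf  # p underflowed to 0: evidence is effectively unlimited
    return -_STD_NORMAL.inv_cdf(p / 2.0)
```

For example, `z_from_p(p_from_f(4.0, 1, 30))` gives the same absolute z-score as `z_from_p(p_from_t(2.0, 30))`, because F(1, df) is the square of t(df).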

Data Analysis

The data for each year were analyzed using z-curve (Schimmack & Brunner, 2016). Z-curve provides a replicability estimate. In addition, it generates a Powergraph, which is essentially a histogram of absolute z-scores. Visual inspection of Powergraphs can be used to examine publication bias. A drop in the frequency of z-values just below the significance criterion (p < .05, two-tailed, z = 1.96) shows that non-significant results are underrepresented. A further drop may be visible at z = 1.65 because values between z = 1.65 and z = 1.96 are sometimes reported as marginally significant support for a hypothesis. The critical values z = 1.65 and z = 1.96 are marked by vertical red lines in the Powergraphs.
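A text-only stand-in for a Powergraph can make the idea concrete. This is a hypothetical sketch, not the plotting code behind the published Powergraphs; the bin width of 0.25 and the cut-off at z = 6 are assumptions chosen for illustration. Bins containing the critical values 1.65 and 1.96 are flagged so that a drop in counts below them is easy to spot.

```python
def powergraph_counts(z_scores, width=0.25, z_max=6.0):
    """Histogram counts of absolute z-scores in bins of `width` on [0, z_max)."""
    counts = [0] * int(round(z_max / width))
    for z in z_scores:
        a = abs(z)
        if a < z_max:  # values at or above z_max fall outside the plotted range
            counts[int(a // width)] += 1
    return counts


def powergraph_lines(z_scores, width=0.25, z_max=6.0, marks=(1.65, 1.96)):
    """One text row per bin; rows whose bin contains a critical value are flagged."""
    lines = []
    for i, count in enumerate(powergraph_counts(z_scores, width, z_max)):
        lo, hi = i * width, (i + 1) * width
        flag = "  <- critical value" if any(lo <= m < hi for m in marks) else ""
        lines.append(f"{lo:4.2f}-{hi:4.2f} {'#' * count}{flag}")
    return lines
```

Usage: `print("\n".join(powergraph_lines(z_scores)))` for a list of extracted z-scores.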

Replicability rankings rely only on statistically significant results (z > 1.96). The aim of z-curve is to estimate the average probability that an exact replication of a study that produced a significant result produces a significant result again. As replicability estimates rely only on significant results, journals are not penalized for publishing non-significant results. The key criterion is how strong the evidence against the null hypothesis is when an article reports results that lead to the rejection of the null hypothesis.

Statistically, replicability is the average statistical power of the set of studies that produced significant results. As power is the probability of obtaining a significant result, the average power of the original studies is equivalent to the average power of a set of exact replication studies. Thus, the average power of the original studies is an estimate of replicability.
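The claim that the average power of significant original studies equals the success rate of exact replications can be checked with a small Monte Carlo sketch. It assumes normally distributed z-statistics with unit variance and true noncentralities spread uniformly between 0 and 4; both are illustrative choices, not properties of the journals in the table, and `simulate` is a hypothetical helper.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()
Z_CRIT = _STD_NORMAL.inv_cdf(0.975)  # about 1.96, two-tailed alpha = .05


def power(mu, z_crit=Z_CRIT):
    """Probability that |Z| > z_crit when Z ~ N(mu, 1)."""
    return _STD_NORMAL.cdf(mu - z_crit) + _STD_NORMAL.cdf(-mu - z_crit)


def simulate(n_studies=100_000, seed=1):
    """Return (mean power of significant originals, success rate of exact replications)."""
    rng = random.Random(seed)
    powers, successes = [], 0
    for _ in range(n_studies):
        mu = rng.uniform(0.0, 4.0)       # assumed spread of true noncentralities
        z_orig = rng.gauss(mu, 1.0)      # original study
        if abs(z_orig) <= Z_CRIT:
            continue                     # only significant originals are "published"
        powers.append(power(mu))
        z_rep = rng.gauss(mu, 1.0)       # exact replication: same mu, new sampling error
        successes += abs(z_rep) > Z_CRIT
    return sum(powers) / len(powers), successes / len(powers)
```

The two returned values agree up to sampling error, because a replication's chance of success given the true noncentrality is exactly that study's power.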

Links to powergraphs for all journals and years are provided in the ranking table. These powergraphs provide additional information that is not used for the rankings. The only information that is being used is the replicability estimate based on the distribution of significant z-scores.

Results

The replicability estimates for each journal and year (104 × 8 = 832 data points) served as the raw data for the following statistical analyses. I fitted a growth model in Mplus 7.4 to examine time trends and variability across journals and disciplines.

I compared several models. Model 1 assumed no mean-level changes and stable variability across journals (significant variance in the intercept/trait). Model 2 assumed no change from 2010 to 2015 and allowed for mean-level changes in 2016 and 2017, as well as stable differences between journals. Model 3 was identical to Model 2 but also allowed for random variability in the slope factor.

Model 1 did not have acceptable fit (RMSEA = .109, BIC = 5198). Model 2 improved fit (RMSEA = .063, BIC = 5176). Model 3 did not improve model fit (RMSEA = .063, BIC = 5180), the variance of the slope factor was not significant, and BIC favored the more parsimonious Model 2. The parameter estimates suggested that the average replicability estimate increased by 2 points, from 72 in the years 2010 to 2015 to 74 (z = 3.70, p < .001).

The standardized loadings of individual years on the latent intercept factor ranged from .57 to .61. This implies that about one-third of the variance is stable, while the remaining two-thirds of the variance is due to fluctuations in estimates from year to year.
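The one-third figure follows from squaring the standardized loadings, since a squared standardized loading is the proportion of an indicator's variance explained by the factor:

```latex
\lambda^2_{\min} = .57^2 \approx .32, \qquad
\lambda^2_{\max} = .61^2 \approx .37
\quad\Rightarrow\quad
\text{stable share} \approx \tfrac{1}{3}, \qquad
1 - \lambda^2 \approx \tfrac{2}{3}\ \text{(year-to-year fluctuation)}
```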

The average replicability of 72% is notably higher than the estimate of 62% reported in the 2016 rankings. The difference is due to a computational error in the 2016 rankings that mainly affected the absolute values, but not the relative ranking of journals. Z-curve estimates are based on z-scores between 1.96 and 6 and are then adjusted for the percentage of extreme z-scores (z > 6), because all z-scores greater than 6 essentially have 100% power. The R code for the 2016 rankings miscalculated this percentage: I erroneously computed the percentage of extreme z-scores out of all z-scores rather than out of the statistically significant results only. This error became apparent when new simulation studies produced incorrect estimates.
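The effect of the denominator can be illustrated with hypothetical helpers, assuming the adjustment is a simple weighted average in which extreme z-scores contribute a power of 1 and the remaining significant z-scores contribute the z-curve estimate for the 1.96 to 6 range (`est_moderate`, taken as given here rather than estimated). Setting `significant_only=False` reproduces the 2016 error of dividing by all z-scores.

```python
def share_extreme(z_scores, z_crit=1.96, z_extreme=6.0, significant_only=True):
    """Share of z-scores above z_extreme; the correct denominator is significant results only."""
    zs = [abs(z) for z in z_scores]
    if significant_only:
        zs = [z for z in zs if z > z_crit]
    return sum(z > z_extreme for z in zs) / len(zs)


def adjusted_replicability(z_scores, est_moderate, significant_only=True,
                           z_crit=1.96, z_extreme=6.0):
    """Assumed weighting: power 1 for extreme z-scores, est_moderate for the rest."""
    s = share_extreme(z_scores, z_crit, z_extreme, significant_only)
    return s * 1.0 + (1.0 - s) * est_moderate
```

With the toy z-scores `[0.5, 1.2, 2.1, 2.5, 3.0, 7.0, 8.0, 10.0]` and `est_moderate = 0.5`, half of the significant results are extreme and the estimate is 0.75; the erroneous denominator gives a share of 3/8 and an estimate of 0.6875. The error lowers the estimate, in the same direction as the 2016 values.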

Although the previous analysis failed to find significant variability in the slope (change factor), this could be due to the low power of this statistical test. The next models included disciplines as predictors of the intercept (Model 4) or of the intercept and slope (Model 5). Model 4 had acceptable fit (RMSEA = .059, BIC = 5175). Model 5 improved fit (RMSEA = .036), although BIC favored the more parsimonious Model 4 (BIC = 5178 vs. 5175). Because BIC favors parsimony, its preference for the simpler model cannot be interpreted as evidence for the absence of an effect. Model 5 showed two significant (p < .05) effects, for social and developmental psychology. In Model 6, I included only social and developmental psychology as predictors of the slope factor. BIC favored this model over the other models (RMSEA = .029, BIC = 5164). The model results showed improvements for social psychology (increase by 4.48 percentage points, z = 3.46, p = .001) and developmental psychology (increase by 3.25 percentage points, z = 2.65, p = .008). Whereas the improvement for social psychology was expected based on the 2016 results, the increase for developmental psychology was unexpected and requires replication in the 2018 rankings.

The only significant predictors of the intercept were social psychology (-4.92 percentage points, z = 4.12, p < .001) and cognitive psychology (+2.91, z = 2.15, p = .032). The strong negative effect for social psychology (standardized effect size d = 1.14) confirms earlier findings that social psychology was most strongly affected by the replication crisis (OSC, 2015). It is encouraging that social psychology is also the discipline with the strongest evidence of improvement in response to the replication crisis. With an increase of 4.48 points, the replicability of social psychology is now at the same level as that of other disciplines, except cognitive psychology, which is still somewhat more replicable than all other disciplines.

In conclusion, the results confirm that social psychology had lower replicability than other disciplines, but they also show that social psychology has significantly improved in replicability over the past couple of years.

Analysis of Individual Journals

The next analysis examined changes in replicability at the level of individual journals. Replicability estimates were regressed on a dummy variable that contrasted 2010-2015 (0) with 2016-2017 (1). This analysis produced 10 significant increases with p < .01 (one-tailed), whereas only about 1 in 100 would be expected by chance.
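A minimal sketch of the per-journal test, using ordinary least squares on a 0/1 dummy; `dummy_regression` is a hypothetical name, not the author's code. With a single dummy predictor, the slope equals the difference between the 2016-2017 mean and the 2010-2015 mean, and its t statistic has n - 2 degrees of freedom.

```python
import math


def dummy_regression(early, late):
    """OLS of y on a dummy (0 = early years, 1 = late years). Returns (slope, t)."""
    y = list(early) + list(late)
    x = [0.0] * len(early) + [1.0] * len(late)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope = difference in means
    a = my - b * mx              # intercept = mean of the early years
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se_b
```

For the European Journal of Developmental Psychology (2010 to 2015: 65, 79, 71, 74, 83, 67; 2016 and 2017: 88, 93), the slope is about 17.3 points with t of about 3.27 on 6 degrees of freedom, above the one-tailed critical value of about 3.14 for p < .01.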

Five of the 10 journals (50% vs. 20% in the total set of journals) were from social psychology (SPPS + 13, JESP + 11, JPSP-IRGP + 11, PSPB + 10, Sex Roles + 8). The remaining journals were from developmental psychology (European J. Dev. Psy + 17, J Cog. Dev. + 9), clinical psychology (J. Cons. & Clinical Psy + 8, J. Autism and Dev. Disorders + 6), and the Journal of Applied Psychology (+7). The high proportion of social psychology journals provides further evidence that social psychology has responded most strongly to the replication crisis.

Limitations

Although z-curve provides very good absolute estimates of replicability in simulation studies, the absolute values in the rankings have to be interpreted with a big grain of salt for several reasons. Most important, the rankings are based on all test statistics that were reported in an article. Only a few of these statistics test theoretically important hypotheses; others may be manipulation checks or other incidental analyses. For the OSC (2015) studies, the replicability estimate was 69%, whereas the actual success rate was only 37%. Moreover, comparisons of the automated extraction method used for the rankings with hand-coding of focal hypotheses in the same articles also show a difference of 20 percentage points. Thus, a reported replicability of 70% may imply only 50% replicability for a critical hypothesis test. Second, the estimates are based on the ideal assumptions underlying statistical test distributions. Violations of these assumptions (e.g., outliers) are likely to reduce actual replicability. Third, actual replication studies are never exact replication studies, and minor differences between the studies are also likely to reduce replicability. There are currently not enough actual replication studies to correct for these factors, but the average is likely to be less than 72%. It is also likely to be higher than 37% because this estimate is heavily influenced by social psychology, while cognitive psychology had a success rate of 50%. Thus, a plausible range for the typical replicability of psychology is somewhere between 40% and 60%. We might say the glass is half full and half empty, with systematic variation around this average across journals.

Conclusion

It has been 55 years since Cohen (1962) pointed out that psychologists conduct many studies that produce non-significant results (type-II errors). For decades there was no sign of improvement. The preliminary rankings of 2017 provide the first empirical evidence that psychologists are waking up to the replication crisis caused by selective reporting of significant results from underpowered studies. Right now, social psychologists appear to respond most strongly to concerns about replicability. However, it is possible that other disciplines will follow as the open science movement gains momentum. Hopefully, replicability rankings can provide an incentive to consider replicability as one of several criteria for publication. A study with z = 2.20 and another study with z = 3.85 are both significant (z > 1.96), but the study with z = 3.85 has a higher chance of being replicable. Everything else being equal, editors should favor studies with stronger evidence, that is, higher z-scores (a.k.a. lower p-values). By taking the strength of evidence into account, psychologists can move away from treating all significant results (p < .05) as equal and take type-II errors and power into account.
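A rough sketch of why z = 3.85 is more promising than z = 2.20: if each observed z-score were taken at face value as the true noncentrality, the power of an exact replication would follow from the normal distribution. This naive "observed power" ignores selection for significance and therefore overstates power for results close to 1.96, which is one reason z-curve models the distribution of significant z-scores instead of using observed values directly. The comparison is meant only to show the ordering; `observed_power` is a hypothetical helper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def observed_power(z_obs, z_crit=1.96):
    """Naive two-tailed power if the true noncentrality equalled the observed |z|."""
    mu = abs(z_obs)
    return _STD_NORMAL.cdf(mu - z_crit) + _STD_NORMAL.cdf(-mu - z_crit)


for z in (2.20, 3.85):
    print(f"z = {z:.2f}: naive power of an exact replication = {observed_power(z):.2f}")
```

Under this naive reading, the two studies differ sharply (roughly .59 vs. .97), even though both pass the same p < .05 threshold.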

11 thoughts on “Preliminary 2017 Replicability Rankings of 104 Psychology Journals”

234iuz3g4jbd says:

October 25, 2017 at 5:18 am

Why not give standard errors on the estimates? Or at least the number of tests you base them on. Come on, try a little harder to be scientific about this. I agree with your mission, but this makes it really easy to dismiss you.

1. Dr. R says:
  
  October 25, 2017 at 8:27 am
  
  the number of cases are included in the powergraphs when you click on the journal titles. Bootstrapped 95%CI will be added to the powergraphs when the final data are available.
  
  1. 234iuz3g4jbd says:
    
    October 25, 2017 at 8:45 am
    
    Good! I’d prefer it were machine-readable information though.
    
  2. Dr. R says:
    
    October 25, 2017 at 9:03 am
    
    Contact me, if you are interested in analyzing the data.
    

Replicability-Index

Improving the replicability of empirical research
