R-INDEX BULLETIN (RIB): Share the Results of your R-Index Analysis with the Scientific Community

R-Index Bulletin

The world of scientific publishing is changing rapidly, and there is a growing need to share scientific information as quickly and as cheaply as possible.

Traditional journals with pre-publication peer review are too slow and too focused on major, ground-breaking discoveries.

Open-access journals can be expensive.

R-Index Bulletin offers a new opportunity to share results with the scientific community quickly and free of charge.

R-Index Bulletin also avoids the problems of pre-publication peer-review by moving to a post-publication peer-review process. Readers are welcome to comment on posted contributions and to post their own analyses. This process ensures that scientific disputes and their resolution are open and part of the scientific process.

For the time being, submissions can be uploaded as comments to this blog. In the future, R-Index Bulletin may develop into a free online journal.

A submission should contain a brief description of the research question (e.g., what is the R-Index of studies on X, by X, or in the journal X?), the main statistical results (median observed power, success rate, inflation rate, R-Index), and a brief discussion of the implications of the analysis. There is no page limit, and analyses of larger data sets can include moderator analyses. Inclusion of other bias tests (Egger’s regression, TIVA, P-Curve, P-Uniform) is also welcome.
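As a minimal sketch of how these statistics fit together, the Python function below computes them from a list of z-scores. It assumes two-sided tests at alpha = .05, counts a result as a success when z exceeds the critical value (about 1.96), and approximates observed power as Phi(z - 1.96), ignoring the small chance of significance in the opposite tail. The function name `r_index_summary` is hypothetical and not part of any official R-Index tool.

```python
from statistics import NormalDist, median

# Critical z for a two-sided test at alpha = .05 (about 1.96).
Z_CRIT = NormalDist().inv_cdf(1 - 0.05 / 2)


def r_index_summary(z_scores):
    """Summarize a set of absolute z-scores from two-sided tests.

    Observed power is approximated as Phi(z - Z_CRIT), which ignores the
    negligible chance of a significant result in the opposite tail.
    """
    norm = NormalDist()
    success_rate = sum(z > Z_CRIT for z in z_scores) / len(z_scores)
    median_power = median([norm.cdf(z - Z_CRIT) for z in z_scores])
    inflation = success_rate - median_power  # success rate minus median observed power
    return {
        "success_rate": success_rate,
        "median_observed_power": median_power,
        "inflation_rate": inflation,
        "r_index": median_power - inflation,
    }


summary = r_index_summary([1.96, 2.34, 2.03, 2.60])
for name, value in summary.items():
    print(f"{name} = {value:.3f}")
```

Applied to the four z-scores listed in the Williams and Bargh (2008) comment on this page, it gives a median observed power of about .59 and an R-Index of about .18.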

If you have conducted an R-Index analysis, please submit it to R-Index Bulletin to share your findings.

Submissions can be made anonymously or with an author’s name.

Go ahead and press the “Leave a comment” or “Leave a reply” button, or scroll to the bottom of the page and paste your results in the “Leave a reply” box.

3 thoughts on “R-INDEX BULLETIN (RIB): Share the Results of your R-Index Analysis with the Scientific Community”


psychometrix says:

February 18, 2015 at 4:11 pm

Too good to be true: A reanalysis of Damisch, Stoberock, and Mussweiler (2010). Keep Your Fingers Crossed! How Superstition Improves Performance. Psychological Science, 21(7), 1014-1020.

Preliminary note:
Test statistics of the t-tests on p. 1016 (t(48) = 2.0, p < .05 and t(48) = 2.36, p < .03) were excluded from the following analyses because they served only as manipulation checks. The t-test reported on p. 1017 (t(39) = 3.07, p < .01) was also excluded because the mean differences in self-efficacy were only an exploratory analysis.

One test was reported as significant, F(2, 48) = 3.16, p < .05. However, the exact p-value computed in R is .051, which is above the .05 criterion. To remain consistent with the authors' interpretation of this test as significant evidence for their hypothesis, the criterion for this analysis was set to p = .055.
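The exact p-value can be checked without special software. For an F-test with 2 numerator degrees of freedom, the upper-tail probability has the closed form (1 + 2F/df2)^(-df2/2); the sketch below uses it (the helper name is hypothetical). In R, `pf(3.16, 2, 48, lower.tail = FALSE)` should return the same value.

```python
def f_pvalue_df1_2(f_value, df2):
    """Upper-tail p-value of an F(2, df2) statistic.

    Closed form valid only when the numerator df is 2:
    P(F > f) = (1 + 2f / df2) ** (-df2 / 2).
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


p = f_pvalue_df1_2(3.16, 48)
print(f"F(2, 48) = 3.16, p = {p:.3f}")  # p = 0.051
```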

R-Index analysis:
Success rate = 1
Mean observed power = 0.5659
Median observed power = 0.537
Inflation rate = 0.4341 (success rate minus mean observed power)
R-Index = 0.1319 (mean observed power minus inflation rate)
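The reported inflation rate and R-Index follow from the mean observed power rather than the median. Writing the R-Index as observed power minus inflation, the arithmetic is shown below, together with the same formula applied to the reported median (0.537) for comparison:

```latex
\begin{aligned}
\text{Inflation} &= SR - \overline{OP} = 1 - 0.5659 = 0.4341 \\
R &= \overline{OP} - \text{Inflation} = 0.5659 - 0.4341 \approx 0.132 \\[4pt]
\text{Inflation}_{\text{median}} &= 1 - 0.537 = 0.463 \\
R_{\text{median}} &= 0.537 - 0.463 = 0.074
\end{aligned}
```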

According to the R-Index manual (http://www.r-index.org/uploads/3/5/6/7/3567479/r-index_manual.pdf, p. 7):
"An R-Index of 22% is consistent with a set of studies in which the null-hypothesis is true and a researcher reported only significant results".
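One way to see where the 22% benchmark comes from is a short calculation, sketched here under stated assumptions: two-sided alpha = .05, observed power approximated as Phi(z - 1.96), and the R-Index defined as median observed power minus inflation. If the null hypothesis is true, p-values are uniformly distributed; if only p < .05 is reported, the median reported p-value is .025, which corresponds to z of about 2.24 and an observed power of about .61. The inflation is then 1 - .61 = .39, and the R-Index is .61 - .39 = .22.

```python
from statistics import NormalDist

norm = NormalDist()
alpha = 0.05
z_crit = norm.inv_cdf(1 - alpha / 2)  # about 1.96

# Under H0, p-values are uniform on (0, 1). If only p < alpha is reported,
# they are uniform on (0, alpha), so the median reported p-value is alpha / 2.
median_p = alpha / 2
median_z = norm.inv_cdf(1 - median_p / 2)  # about 2.24

# Observed power is monotone in z, so its median is the power at the median z.
median_power = norm.cdf(median_z - z_crit)  # about 0.61
success_rate = 1.0  # only significant results are reported
inflation = success_rate - median_power
r_index = median_power - inflation
print(f"R-Index under H0 with selective reporting: {r_index:.2f}")
```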

Furthermore, the test of insufficient variance (TIVA) was conducted.
A variance of z-values below 1 suggests bias. The chi-square test evaluates the null hypothesis that the variance equals 1.
Results:
Variance = 0.1562
Chi^2(7) = 1.094; left-tailed p = .007
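The TIVA statistic is (k - 1) times the variance of the k z-scores, compared with a chi-square distribution with k - 1 degrees of freedom; the left tail is used because the concern is too little variance. The sketch below assumes k = 8 (implied by df = 7) and evaluates the chi-square CDF with the standard power series for the regularized lower incomplete gamma function, so it needs only the Python standard library. The function names are illustrative. Because it starts from the rounded variance .1562, the statistic comes out as 1.093 rather than 1.094.

```python
import math


def chi2_cdf(x, df, terms=200):
    """Lower-tail chi-square probability, P(df/2, x/2), where P is the
    regularized lower incomplete gamma function, evaluated by its power series."""
    a, y = df / 2, x / 2
    if y <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= y / (a + n)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def tiva(var_z, k):
    """Test of insufficient variance for k z-scores with sample variance var_z.

    Under H0 (variance = 1), (k - 1) * var_z follows chi-square(k - 1);
    a small left-tailed p-value signals too little variance.
    """
    df = k - 1
    stat = df * var_z
    return stat, df, chi2_cdf(stat, df)


stat, df, p = tiva(0.1562, 8)  # k = 8 is implied by df = 7
print(f"Chi^2({df}) = {stat:.3f}; left-tailed p = {p:.3f}")
```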

Thus, a variance of .156, far below the expected value of 1, strongly suggests that the reported results overestimate the population effect size and the replicability of the reported studies.

The present analysis is consistent with earlier claims, based on Francis's Test of Excessive Significance, that these results are too good to be true (Francis et al., 2014; http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114255).

Finally, the study results were analyzed using p-curve (http://p-curve.com/):

Statistical Inference on p-curve:
Studies contain evidential value:
chisq(16) = 10.745; p = .825
Note that a significant p-value would indicate a right-skewed p-curve, that is, evidential value.

Studies lack evidential value:
chisq(16) = 36.16; p = .003
Note that a significant p-value indicates that the p-curve is flatter than expected if the studies had 33% power, that is, the results lack evidential value.

Studies lack evidential value and were intensely p-hacked:
chisq(16) = 26.811; p = .044
Note that a significant p-value indicates that the p-curve is left-skewed, a sign of p-hacking or selective reporting.
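The chisq(16) statistics are consistent with Fisher's method for combining k = 8 significant p-values (df = 2k). The sketch below is an illustration under that assumption, not the p-curve app's code: it implements the right-skew test (pp = p/.05) and the left-skew test (pp = 1 - p/.05) and omits the 33%-power test, which requires noncentral distributions. The p-values in the example are hypothetical and are not the Damisch et al. data.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability for even df (closed form)."""
    k = df // 2
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))


def pcurve_fisher(p_values, alpha=0.05):
    """Right- and left-skew tests on the significant p-values via Fisher's method.

    Returns (chi2, df, p) for each test. A small right-skew p suggests
    evidential value; a small left-skew p suggests p-hacking.
    """
    sig = [p for p in p_values if p < alpha]
    df = 2 * len(sig)
    right = -2 * sum(math.log(p / alpha) for p in sig)
    left = -2 * sum(math.log(1 - p / alpha) for p in sig)
    return {
        "right_skew": (right, df, chi2_sf_even(right, df)),
        "left_skew": (left, df, chi2_sf_even(left, df)),
    }


# Hypothetical p-values for illustration only.
result = pcurve_fisher([0.01, 0.02, 0.03, 0.04])
for test, (chi2, df, p) in result.items():
    print(f"{test}: chisq({df}) = {chi2:.3f}; p = {p:.3f}")
```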

All bias tests suggest that the reported results are biased. Consistent with these statistical results, a replication study failed to reproduce the original findings (see https://osf.io/fsadm/).

Because all studies were conducted by the same team of researchers, the bias cannot be attributed to publication bias. It therefore appears probable that questionable research practices were used to produce the observed significant results. For example, the authors may have run multiple studies and reported only those that produced significant results.

In conclusion, researchers should be suspicious of the power of superstition, or at least keep their fingers crossed when they attempt to replicate the reported findings.


Dr. R says:

May 19, 2015 at 5:43 pm

Reblogged this on Replication-Index.


Dr. R says:

August 1, 2015 at 12:41 pm

Expression of Concern

http://pss.sagepub.com/content/19/3/302.abstract
doi: 10.1111/j.1467-9280.2008.02084.x

Lawrence E. Williams and John A. Bargh

Williams and Bargh (2008) published the article “Keeping One’s Distance: The Influence of Spatial Distance Cues on Affect and Evaluation” in Psychological Science (doi: 10.1111/j.1467-9280.2008.02084.x).

As of August 2015, the article has been cited 98 times in Web of Science.

The article reports four studies that appear to support the claim that priming individuals with the concept of spatial distance produced “greater enjoyment of media depicting embarrassment (Study 1), less emotional distress from violent media (Study 2), lower estimates of the number of calories in unhealthy food (Study 3), and weaker reports of emotional attachments to family members and hometowns (Study 4).”

However, a closer examination of the evidence suggests that these results were obtained with the help of questionable research practices that inflate effect sizes and the strength of evidence against the null-hypothesis (that priming has no effect).

In each of the four studies, the critical test was an analysis of variance (ANOVA) comparing three experimental conditions.

The critical tests were:
F(2,67) = 3.14, p = .049, z = 1.96
F(2,39) = 4.37, p = .019, z = 2.34
F(2,56) = 3.36, p = .042, z = 2.03
F(2,81) = 4.97, p = .009, z = 2.60

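The p-values can be converted into z-scores with z = NORM.S.INV(1 - p/2) in Excel or z = qnorm(1 - p/2) in R. If the reported results were not selected for significance, each z-score of an independent test would have a sampling variance of 1, so the variance of the z-scores across studies should be at least about 1. Insufficient variation in z-scores therefore suggests that the results of the four studies were influenced by questionable research practices.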
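The conversion can be reproduced in a few lines. This sketch recomputes each p-value from its F statistic, using the closed form that holds for F-tests with 2 numerator degrees of freedom, and converts it with the inverse normal CDF (Python's `statistics.NormalDist.inv_cdf`, which plays the role of R's `qnorm`). The helper names are hypothetical.

```python
from statistics import NormalDist


def f_pvalue_df1_2(f_value, df2):
    """Upper-tail p-value of F(2, df2); closed form valid only for df1 = 2."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def p_to_z(p):
    """Convert a two-sided p-value to an absolute z-score (qnorm(1 - p/2) in R)."""
    return NormalDist().inv_cdf(1 - p / 2)


# Critical tests from Williams and Bargh (2008): (F value, denominator df).
tests = [(3.14, 67), (4.37, 39), (3.36, 56), (4.97, 81)]
for f_value, df2 in tests:
    p = f_pvalue_df1_2(f_value, df2)
    print(f"F(2,{df2}) = {f_value}: p = {p:.3f}, z = {p_to_z(p):.2f}")
```

Recomputed from F, the first p-value is about .0497 rather than the reported .049; the resulting z-scores agree with the listed values within rounding.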

The variance of the z-scores is Var(z) = 0.09. A chi-square test against the expected variance of 1 is significant, Chi-Square(df = 3) = .26, left-tailed p = .033.
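
The TIVA result can be checked directly from the four z-scores. With k = 4, the statistic (k - 1) * Var(z) is compared with a chi-square distribution with 3 degrees of freedom, whose CDF has the closed form erf(sqrt(x/2)) - sqrt(2x/pi) * exp(-x/2). This is a sketch; the variable and function names are illustrative.

```python
import math
from statistics import variance

z_scores = [1.96, 2.34, 2.03, 2.60]  # z-scores of the four critical tests
k = len(z_scores)
var_z = variance(z_scores)  # sample variance (n - 1 denominator)
chi2 = (k - 1) * var_z  # follows chi-square(k - 1) under H0: variance = 1


def chi2_cdf_df3(x):
    """Lower-tail chi-square probability for df = 3 (closed form)."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(f"Var(z) = {var_z:.3f}, Chi-Square(3) = {chi2:.2f}, "
      f"left-tailed p = {chi2_cdf_df3(chi2):.3f}")
```

It reproduces Chi-Square(3) of about .26 and a left-tailed p of about .033.
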
The article reports 100% significant results, but median observed power is only 59%. With an inflation of 41% (100% - 59%), the Replicability-Index is 59 - 41 = 18.

An R-Index of 18 is lower than the R-Index of 22 that would be obtained if the null-hypothesis were true and only significant results were reported. Thus, after correcting for inflation, the data provide no support for the alleged effect.

It is therefore not surprising that multiple replication attempts have failed to reproduce the reported results (http://www.psychfiledrawer.org/chart.php?target_article=2).

In conclusion, there is no credible empirical support for the theoretical claims in Williams and Bargh (2008) and the article should not be quoted as providing evidence for these claims.


Replicability-Index

Improving the replicability of empirical research
