Psychology, especially social psychology, has a credibility problem. For decades, psychologists used questionable research practices to confirm their predictions. As a result, psychology journals are filled with successful experiments, p < .05, that fail to provide credible evidence for social psychological theories. In theory, a significance criterion of p < .05 ensures that no more than 5% of tests of a true null hypothesis produce a false positive result. However, if only significant results are selected for publication, it is theoretically possible that all of the published results are false positives. The happy production of incredible results in social psychology only attracted attention when social psychologists published crazier and crazier results, culminating in evidence for time-reversed, unconscious detection of erotic stimuli (Bem, 2011). Even social psychologists had a hard time believing this nonsense.
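The selection argument can be made concrete with a small simulation. The following is only a sketch under stated assumptions, not an analysis of any real literature: every simulated study compares two groups with a true effect of exactly zero, the population standard deviation is assumed to be known (so a simple z-test applies), and only significant results are "published". The function name, sample sizes, and number of studies are hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_selection_for_significance(n_studies=10000, n_per_group=20,
                                        alpha=0.05, seed=1):
    """Simulate studies with a true effect of zero and publish only p < alpha.

    Returns (share of all studies that are significant,
             share of published studies that are false positives).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sqrt(2 / n_per_group)  # assumption: known SD of 1 in both groups
    true_effect = 0.0           # assumption: the null hypothesis is always true

    published_effects = []
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        diff = sum(treatment) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            # Selection for significance: only these studies get published.
            published_effects.append(true_effect)

    significant_rate = len(published_effects) / n_studies
    false_positives = sum(1 for effect in published_effects if effect == 0.0)
    false_positive_share = false_positives / len(published_effects)
    return significant_rate, false_positive_share


if __name__ == "__main__":
    rate, share = simulate_selection_for_significance()
    print(f"significant among all studies: {rate:.3f}")
    print(f"false positives among published studies: {share:.3f}")
```

Across all simulated studies, roughly 5% reach significance, as the significance criterion promises. Among the published studies, however, every result is a false positive, because the filter that decides what gets published is the significance test itself.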
Over the past decade, some social psychologists have tried to improve (or establish) sound scientific practices in social psychology with new initiatives like pre-registration of studies, open data sharing, and using larger samples to reduce sampling error so that results could be significant without inflated effect sizes.
One format that aims to improve the credibility of published findings is the registered report. Registered reports are studies where researchers submit a study design without data to a journal. The editor and reviewers vet the proposed study. If the study is approved, it receives a conditional acceptance, data are collected, and results are reported. One advantage of registered reports is that potential flaws are detected before data are collected, ensuring that studies that are carried out are better designed than studies that did not receive formal peer review. Another advantage is that results will be published even if they do not support the researchers’ hypotheses.
The format of registered reports is fairly new and few articles have used it. Unfortunately, there are already signs that journals and authors are abusing this new format and publishing articles as registered reports that do not have the desirable properties of registered reports. One example is a registered report published in Frontiers in Psychology.
In 2014, Frontiers in Psychology published an article with the title “Registered Report: Measuring Unconscious Deception Detection by Skin Temperature” (van ’t Veer, Stel, van Beest, & Gallucci, 2014). This article was not a registered report. Rather, it reported the results of a so-called pilot study and then outlined the design of a follow-up study. In 2015, the authors published an article with the title “Unconscious deception detection measured by finger skin temperature and indirect veracity judgments— results of a registered report” (van ’t Veer, Gallucci, Stel, & van Beest, 2015).
The publication of two articles is inconsistent with the structure of a registered report, where a data-analysis plan is submitted, approved, and then published with the results after data collection, resulting in a single article. More important is the question of how similar or different the pilot study in the first article is to the new study in the actual registered report.
The pilot study had a sample size of 108 participants. The actual registered study had a sample size of 155 participants. Although the second sample is slightly bigger, this is not a notable difference in sample size. More important, the pilot study was not merely a study that examined the validity of outcome measures or the effectiveness of the experimental manipulation with manipulation checks. Rather, both studies used the full design to test the same hypotheses. In other words, the registered study was mainly a replication of a previous study.
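How much a change from 108 to 155 participants matters depends on the size of the effect being studied. The following rough sketch puts numbers on this under loudly stated assumptions: it uses a normal approximation for a simple two-sided test of a standardized within-participant effect, which is not the mixed model the authors actually fitted, and the effect sizes in the loop are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate two-sided power of a z-test for a standardized effect.

    Assumption: one standardized mean difference per participant with a
    known SD of 1, so the test statistic is effect_size * sqrt(n).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # expected value of the z statistic
    # Probability of landing in either rejection region.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Hypothetical effect sizes; the true effect size is unknown.
    for d in (0.1, 0.2, 0.3):
        pilot = approx_power(d, 108)
        registered = approx_power(d, 155)
        print(f"d = {d}: N = 108 -> {pilot:.2f}, N = 155 -> {registered:.2f}")
```

Under this approximation, the larger sample shrinks the standard error by a factor of sqrt(108/155), or about 0.83. Readers can substitute their own effect size assumptions to see how much that changes the chance of a significant result.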
The main hypothesis (H1) was specified to be “confirmed if the average skin temperature of participants while watching a liar is lower than when watching a truth-teller. This should translate into an interaction between veracity and time, and possibly a main effect of veracity” (p. 7).
The so-called pilot study showed no main effect of veracity. It did show the predicted veracity x time interaction, b = 0.0007, F(1,73218.7) = 387.14, p < 0.001, and a three-way interaction between veracity x time x awareness, b = 0.0002, F(1,73218.7) = 6.69, p = 0.010.
The registered replication study also showed no main effect of veracity. It also failed to replicate the two-way interaction with time (p = .598). Thus, H1 was not confirmed. However, in the discussion section the authors write:
“The observed patterns of temperature change over time only partly confirmed our main hypothesis (H1), and the current findings pertaining to this hypothesis are therefore inconclusive” (p. 7).
This makes no sense. The results did not confirm the hypothesis, the results are inconsistent with those of a nearly equally powered previous study, and the main conclusion that can be drawn from these results is that the studies were underpowered to reliably detect the predicted effect.
The discussion section goes on to interpret the pattern of a non-significant 3-way interaction (p = .06) that was not predicted a priori.
“We found that finger skin temperature consistently decreased while observing a liar. When participants were observing a truth-teller, however, their finger skin temperature decreased more than it did for liars in the phase where participants did not have the goal to detect deception. In contrast, participant’s finger skin temperature stayed higher when observing a truth-teller compared to a liar when participants did have the goal to detect deception” (p. 7).
The discussion leaves out that the so-called pilot study produced a different pattern.
“From Figure 1, it becomes apparent that in the not aware condition (i.e., in the first block of videos) participants were warming up over time while watching both a video of a liar or a truth-teller. Interestingly and surprisingly, in the second block of videos (what we have called the aware phase) finger temperature dropped when watching a video of a truth-teller, more so than when watching a liar.”
Much later in the discussion the authors do make a comparison to the pilot study.
“When comparing the current results to the results obtained in the pilot study, the temperature pattern observed in the second phase of the pilot study seems to resemble the current pattern observed in the not forewarned phase. Although speculative, a perceivable cause of this could be that in the pilot study the forewarning was not manipulated as strong as in the current study, leaving participants still in a relatively ignorant state about what was to come and whether the experimental context was one of deception detection. Being able to expect and prepare for what is to come arguably has some advantages, although it should be noted that not anticipating threats is comparable to an everyday life situation in which people assume they will not be lied to.” (p. 9).
The registered report has been cited only six times. Another study of liars cites the 2015 article with the claim that the new findings “converge and expand on those of van ’t Veer, Gallucci, Stel, and van Beest (2015), who found that participants experienced decreases in finger skin temperature while observing lies relative to truths” (ten Brinke, Lee, & Carney, p. 574). This is an inaccurate description of the actual results (a lie?), which showed no main effect of veracity in either the so-called pilot study or the registered study. Apparently, social psychologists still favor compelling stories over truth.
In conclusion, there is a danger that social psychologists will once again undermine methodological reforms that force them to put their fancy theories to hard empirical tests and to bury them when they fail. Social psychology could clearly benefit from a reality check that puts many zombie theories out of their misery.
That being said, van’t Veer et al.’s registered report did produce some useful insights.
“In our experiment we found liars to be liked and trusted less.” (p. 9).