
Cheery Contrarianism About Reproducibility in Psychology


The Open Science Collaboration’s attempt at replicating 100 findings from cognitive and social psychology is up, and the headline result is that only 36% of them succeeded, where success is defined as a p-value less than .05 (for an effect in the direction of the original, of course) in the replication study. The article is open-access.

Maybe it’s a symptom of the dim view of the state of the field I already had, but I don’t think the news is so terrible, at least compared to what it could have been. There are other ways to evaluate replication success besides clearing the p < .05 bar, and they are reported in Table 1 of the paper. When the original finding and the replication attempt were combined in a meta-analysis, 68% of the findings were significant at p < .05. Admittedly, it’s certainly possible that one replication does not add enough subjects for a good estimate of the true effect. This graph illustrates how effect sizes can plummet as you add more subjects and your estimate converges toward the true value. Less encouragingly, but still a little brighter than the headline number, 47% of the original effect sizes fell within the 95% confidence interval of the replication effect. (“Confidence interval” is often incorrectly defined. I’ll just outsource the correct definition.) In any case, this:

In the investigation, a whopping 75% of the social psychology experiments were not replicated, meaning that the originally reported findings vanished when other scientists repeated the experiments.

is incorrect reporting. In most cases, the effects didn’t vanish; they got smaller.
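To make the meta-analytic criterion concrete, here is a minimal sketch of the kind of combination involved: an inverse-variance (fixed-effect) pooling of an original correlation and one replication. The numbers are made up for illustration, and this is not the paper’s actual analysis code; the point is just that a replication effect much smaller than the original can still leave the pooled estimate clearly distinguishable from zero.

```python
# Minimal fixed-effect (inverse-variance) meta-analysis of an original study
# and one replication. Effect sizes and sample sizes are hypothetical, chosen
# only for illustration; they are not taken from the OSC paper.
import math

from scipy import stats

def fisher_z(r):
    """Fisher z-transform of a correlation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def combine(r_orig, n_orig, r_rep, n_rep):
    """Pool two correlations with inverse-variance weights; return the
    pooled correlation and a two-sided p-value."""
    z1, z2 = fisher_z(r_orig), fisher_z(r_rep)
    w1, w2 = n_orig - 3, n_rep - 3            # 1 / variance of Fisher z
    z_pooled = (w1 * z1 + w2 * z2) / (w1 + w2)
    se_pooled = math.sqrt(1 / (w1 + w2))
    z_stat = z_pooled / se_pooled
    p = 2 * (1 - stats.norm.cdf(abs(z_stat)))
    r_pooled = math.tanh(z_pooled)            # back-transform to r
    return r_pooled, p

# Hypothetical example: a large original effect that shrinks on replication
# but remains significant when the two studies are pooled.
print(combine(r_orig=0.45, n_orig=40, r_rep=0.15, n_rep=120))
```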

Psychology, in most instances, investigates small effects. That is, there is a ton of variation in behavior we have no idea how to capture in a model, so the effect size, a ratio of signal to noise, will necessarily be small. If effect sizes in psychology were very robust, we wouldn’t need inferential statistics. Further, if your study isn’t large enough to reliably detect small effects (statistical power, the probability of detecting a true effect, is a function of effect size and sample size), then, by definition, if you found a significant effect at p < .05, you overestimated the effect size. For this reason, I long ago shed any illusion that most studies I read were estimating effect sizes accurately. For that matter, I don’t put much stock in any single study, even as an indicator of the direction of the effect. But if a meta-analytic combination of two studies still yields a significant result, I read that as: hm, we still don’t know the effect size, but maybe we can start to tentatively believe something about the direction of the effect, which, if it’s consistent, might tell us something theoretically interesting even if it’s small.
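That “significant, therefore overestimated” point is easy to see in a simulation. The sketch below is my own illustration, not anything from the paper: it assumes a small true standardized effect and an underpowered two-group design, runs many simulated studies, and compares the average observed effect among the studies that happened to reach p < .05 with the true value.

```python
# Simulation of the "significance filter": in an underpowered design, the
# studies that reach p < .05 systematically overestimate the true effect.
# The true effect size and sample size are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_d = 0.2        # small true standardized effect (Cohen's d)
n_per_group = 30    # underpowered for an effect this small
n_sims = 20_000

observed_d, significant = [], []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    t, p = stats.ttest_ind(treatment, control)
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    d = (treatment.mean() - control.mean()) / pooled_sd
    observed_d.append(d)
    significant.append(p < 0.05 and d > 0)

observed_d = np.array(observed_d)
significant = np.array(significant)

print(f"power (share significant in the right direction): {significant.mean():.2f}")
print(f"mean observed d overall:          {observed_d.mean():.2f}")
print(f"mean observed d when significant: {observed_d[significant].mean():.2f}")
```

With these made-up settings the studies that clear the significance bar report an effect roughly two to three times the true value, which is exactly the sense in which a low-powered significant result is an overestimate.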

I think it’s just as well that people read this result as bleak, since psychology (like many other fields) desperately needs to change incentives so that researchers are rewarded for producing replicable work rather than for getting their p-values below a line. But my cynical heart is a bit warmed by the fact that psychology is producing some signal in the noise.
