Replication Crisis in Psychology: Part Five

By Katie Surrence

On May 24, 2016

At 8:56 am

In General

Parts one, two, three, and four.

On the question: How should a lab regard its own “failures”?

In a paper in the Journal of Experimental Social Psychology, Baumeister offers another sort of explanation for why experiments might fail (not open access — sorry!):

Patience and diligence may be rewarded, but competence may matter less than in the past. Getting a significant result with n = 10 often required having an intuitive flair for how to set up the most conducive situation and produce a highly impactful procedure. Flair, intuition, and related skills matter much less with n = 50.
In fact, one effect of the replication crisis can even be seen as rewarding incompetence. These days, many journals make a point of publishing replication studies, especially failures to replicate. The intent is no doubt a valuable corrective, so as to expose conclusions that were published but have not held up.
But in that process, we have created a career niche for bad experimenters. This is an underappreciated fact about the current push for publishing failed replications. I submit that some experimenters are incompetent. In the past their careers would have stalled and failed. But today, a broadly incompetent experimenter can amass a series of impressive publications simply by failing to replicate other work and thereby publishing a series of papers that will achieve little beyond undermining our field’s ability to claim that it has accomplished anything.
Having mentored several dozen budding researchers as graduate students and postdocs, I have seen ample evidence that people’s ability to achieve success in social psychology varies. My laboratory has been working on self-regulation and ego depletion for a couple decades. Most of my advisees have been able to produce such effects, though not always on the first try. A few of them have not been able to replicate the basic effect after several tries. These failures are not evenly distributed across the group. Rather, some people simply seem to lack whatever skills and talents are needed. Their failures do not mean that the theory is wrong.

It’s not wrong to think that variation in the technical skills of the experimenter could affect experimental results. Psychophysiology and neuroscience experiments require all kinds of technical skill. Laboratory manipulations that depend on convincing deception or on establishing meaningful relationships require acting and directing skill. Procedures should be better standardized and communicated. But adherence to procedures ought to be something that can be observed and evaluated before the results come in. If you only decide after the fact that the grad students who got results were the skilled ones, then you’re at very high risk of motivated reasoning (this negative result doesn’t count because the grad student was unskilled) or, worse, of creating an environment where negative results are shameful evidence of “incompetence” and are less likely to be reported to a supervising investigator. Another thing I can report, from my own experience and from talking to other early-career researchers, is a feeling of shame about negative results, even when you know rationally that there’s nothing to be ashamed of, and even when you don’t carry the burden of believing your supervisor will attribute them to your failure. When a scientific culture encourages shame, it also encourages deception to hide the source of that shame.
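To see why an uneven pattern of failures across students doesn’t by itself reveal differences in skill, here is a minimal sketch that assumes every advisee is equally skilled and that each attempt has the same chance of reaching significance. The numbers (30 advisees, 3 tries each, a statistical power of 0.5 per try) are hypothetical illustrations, not figures from Baumeister’s lab, and the function names are made up for this example.

```python
from math import comb


def prob_k_successes(attempts: int, k: int, power: float) -> float:
    """Binomial probability that exactly k of `attempts` independent studies
    reach significance when each one has the same chance (`power`) of doing so."""
    return comb(attempts, k) * power**k * (1 - power) ** (attempts - k)


def expected_all_fail(n_students: int, attempts: int, power: float) -> float:
    """Expected number of equally skilled students whose every attempt fails."""
    return n_students * prob_k_successes(attempts, 0, power)


# Hypothetical values: 30 equally skilled advisees, 3 tries each, power 0.5.
n_students, attempts, power = 30, 3, 0.5
for k in range(attempts + 1):
    p = prob_k_successes(attempts, k, power)
    print(f"{k} of {attempts} significant: p = {p:.3f}, "
          f"expected students = {n_students * p:.2f}")
```

Under these assumptions, about 3.75 of the 30 identically skilled students would be expected to fail on all three tries, and the same number would succeed every time, purely by chance. Sorting them into “skilled” and “unskilled” after seeing the results would be reading noise as talent.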

On the other end of the spectrum, one lab, after finding a seemingly (but improbably) huge effect of oxytocin nasal spray on trust, saw its findings prove inconsistently replicable. They published one of their failures to replicate in the journal PLOS ONE, and, going further, they published a paper that combined all their data, both published “successes” and unpublished “failures,” and concluded that there was no reliable effect of oxytocin spray on trust. This was admirable and brave. Some authors have recommended that researchers treat their own results as continuing inputs into one big metastudy, running continuously cumulative meta-analyses that treat significant and non-significant results as part of the same information pool. Baumeister argues that a significant result is indicative of skill or flair, but it would be far less risky to decide beforehand whether procedures are valid, and to treat every new study as useful information.
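A continuously cumulative meta-analysis can be sketched in a few lines. This version assumes a simple fixed-effect, inverse-variance model (random-effects models are often preferred when studies differ in procedure or population), and the `Study` class, the `cumulative_meta` function, and every effect size below are hypothetical, made up for illustration; they are not the oxytocin lab’s data or method. The point it shows is that each new study, significant or not, simply updates the pooled estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    label: str
    effect: float  # standardized effect size estimate (e.g., Cohen's d)
    se: float      # standard error of that estimate


def cumulative_meta(studies):
    """Fixed-effect, inverse-variance cumulative meta-analysis.

    After each study is added, records (label, pooled estimate, pooled SE,
    95% CI lower, 95% CI upper). Every study enters the pool whether or not
    it was individually significant."""
    sum_w = 0.0
    sum_wy = 0.0
    results = []
    for s in studies:
        w = 1.0 / s.se ** 2          # more precise studies get more weight
        sum_w += w
        sum_wy += w * s.effect
        pooled = sum_wy / sum_w
        pooled_se = math.sqrt(1.0 / sum_w)
        # Normal-approximation 95% confidence interval.
        results.append((s.label, pooled, pooled_se,
                        pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se))
    return results


# Made-up numbers for illustration only; not data from any real study.
lab_record = [
    Study("study 1 (significant)", 0.60, 0.25),
    Study("study 2 (null)", 0.05, 0.20),
    Study("study 3 (null)", -0.10, 0.20),
    Study("study 4 (null)", 0.10, 0.15),
]
for label, est, se, lo, hi in cumulative_meta(lab_record):
    print(f"{label:24s} pooled d = {est:+.3f} (SE {se:.3f}), "
          f"95% CI [{lo:+.3f}, {hi:+.3f}]")
```

With these invented numbers, the first study alone gives a pooled d of 0.60, but each later null result pulls the estimate toward zero and narrows the interval, which is the kind of picture a lab only sees if it keeps its “failures” in the pool.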