
Replication Crisis in Psychology: Part Five


Parts one, two, three, and four.

On the question: How should a lab regard its own “failures”?

In a paper in the Journal of Experimental Social Psychology, Baumeister offers another sort of explanation for why experiments might fail (not open access — sorry!):

Patience and diligence may be rewarded, but competence may matter less than in the past. Getting a significant result with n = 10 often required having an intuitive flair for how to set up the most conducive situation and produce a highly impactful procedure. Flair, intuition, and related skills matter much less with n = 50.

In fact, one effect of the replication crisis can even be seen as rewarding incompetence. These days, many journals make a point of publishing replication studies, especially failures to replicate. The intent is no doubt a valuable corrective, so as to expose conclusions that were published but have not held up.

But in that process, we have created a career niche for bad experimenters. This is an underappreciated fact about the current push for publishing failed replications. I submit that some experimenters are incompetent. In the past their careers would have stalled and failed. But today, a broadly incompetent experimenter can amass a series of impressive publications simply by failing to replicate other work and thereby publishing a series of papers that will achieve little beyond undermining our field’s ability to claim that it has accomplished anything.

Having mentored several dozen budding researchers as graduate students and postdocs, I have seen ample evidence that people’s ability to achieve success in social psychology varies. My laboratory has been working on self-regulation and ego depletion for a couple decades. Most of my advisees have been able to produce such effects, though not always on the first try. A few of them have not been able to replicate the basic effect after several tries. These failures are not evenly distributed across the group. Rather, some people simply seem to lack whatever skills and talents are needed. Their failures do not mean that the theory is wrong.

It’s not wrong to think that variation in experimenters’ technical skill could affect experimental results. Psychophysiology and neuroscience experiments demand all kinds of technical skill, and laboratory manipulations that involve convincing deception or establishing a meaningful relationship with a participant require skill in acting and direction. Procedures should be better standardized and communicated. But adherence to procedures ought to be something that can be observed and evaluated before the results come in. If you only decide after the fact that the grad students who got results were the skilled ones, you’re at very high risk of motivated reasoning (“this negative result doesn’t count because the grad student was unskilled”), or worse, of creating an environment where negative results are shameful evidence of “incompetence” and so are less likely to be reported to a supervising investigator. Another thing I can report, from my own experience and from talking to other early-career researchers, is a feeling of shame about negative results, even when you know rationally that there’s nothing to be ashamed of, and even when you don’t carry the added burden of believing your supervisor will attribute them to your failure. When a scientific culture encourages shame, it also encourages deception to hide the source of it.

At the other end of the spectrum, one lab, after finding a seemingly (but improbably) huge effect of oxytocin nasal spray on trust, saw its findings prove inconsistently replicable. They published one of their failures to replicate in PLOS ONE, and then went further: they published a paper that combined all their data, both published “successes” and unpublished “failures,” and concluded that there was no reliable effect of oxytocin spray on trust. This was admirable and brave. Some authors have recommended that researchers treat their own results as continuing inputs into one big metastudy, that is, that they run continuously cumulative meta-analyses, treating significant and non-significant results as part of the same information pool. Baumeister argues that a significant result is evidence of skill or flair, but it would be far less risky to decide beforehand whether procedures are valid and treat every new study as useful information.
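To make the “same information pool” idea concrete, here is a minimal sketch of what a continuously cumulative meta-analysis might look like in Python, assuming made-up effect sizes (Cohen’s d) and group sizes; the numbers and labels are purely illustrative, not taken from the oxytocin studies or anyone’s actual data.

import math

def d_variance(d, n1, n2):
    # approximate sampling variance of Cohen's d for a two-group comparison
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def cumulative_meta(studies):
    # fixed-effect (inverse-variance) pooling, updated after each new study
    sum_w = 0.0
    sum_wd = 0.0
    for label, d, n1, n2 in studies:
        w = 1.0 / d_variance(d, n1, n2)
        sum_w += w
        sum_wd += w * d
        pooled = sum_wd / sum_w
        se = math.sqrt(1.0 / sum_w)
        z = pooled / se
        p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided normal p-value
        print(f"after {label}: pooled d = {pooled:.2f}, SE = {se:.2f}, p = {p:.3f}")

# purely hypothetical sequence: one flashy early "success", then two quieter attempts
cumulative_meta([
    ("study 1", 0.90, 10, 10),
    ("study 2", 0.10, 25, 25),
    ("study 3", -0.05, 30, 30),
])

The point of updating the pooled estimate after every study is that an early, flashy result gets progressively diluted or confirmed as each new attempt, significant or not, enters the same pool, rather than being filed away as a “failure.”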

  • Vance Maverick

    Does Baumeister present an argument that carrying out and writing a persuasive, effective non-replication paper is easier than persuasively writing up a new result? Or is he just being resentful?

  • Srsly Dad Y

    Most of my advisees have been able to produce such effects, though not always on the first try. A few of them have not been able to replicate the basic effect after several tries. These failures are not evenly distributed across the group. Rather, some people simply seem to lack whatever skills and talents are needed. Their failures do not mean that the theory is wrong.

    I mean … srsly? You are a bad grad student in his lab if you can’t generate the answer he knows is true at p < .05. Bad. You failed. But of course there's no reason to suspect anyone of not engaging in high-minded above-board academic inquiry, and no reason at all to be suspicious when labs won't share their data, is there? No no no no no no no no no, that would be cynical.

    • Sebastian_h

      Well I must agree with him that ‘produce’ is exactly the right word for that paragraph.

  • James B. Shearer

That Baumeister quote is really something. How common is that sort of thinking in psychology?

    • Katie Surrence

      I haven’t heard precisely that argument (but I have heard a lot of other kinds of very motivated reasoning) in my own studies, although the case that flair, or lack thereof, explains differences in findings has been made by several prominent researchers.

  • sonamib

    Patience and diligence may be rewarded, but competence may matter less than in the past. Getting a significant result with n = 10 often required having an intuitive flair for how to set up the most conducive situation and produce a highly impactful procedure. Flair, intuition, and related skills matter much less with n = 50.

What the hell does he even mean by this comparison between n=10 and n=50? More data is always better. All else being equal, the results are more reliable when the sample size increases. If the experimenter is bad and doesn’t follow the correct procedure, a larger sample size won’t save them; they will keep reaching the wrong conclusion.

So yeah, I don’t really understand this nonsensical rant about flair not being necessary with n=50. Like other commenters have noted, this paragraph looks an awful lot like Baumeister nudging his experiments to reach the conclusions he wants.

    • ckc_not_kc

      Most of my advisees have been able to produce such effects, though not always on the first try. A few of them have not been able to replicate the basic effect after several tries. These failures

      Nudging!?!?

On the one hand, the point is banal: I’ve had students fail to replicate a computational study because they couldn’t get it to run at all.

    On the other hand, dear god, his attitude is a recipe for confirmation and publication bias. He dropped sharply in my estimation and trust.

I remember complaints in philosophy about people who were merely critical or negative, because it was “harder” to be positive. There’s a point in that (especially given the way a lot of philosophers are unconstructively negative), but there are real dangers too.
