A Huge Study Found Less than Half of Psychology Findings Were Reproducible
More evidence that science isn’t as self-correcting as you might hope.
Scientists love a good study of studies, but a new paper in Science takes this investigative thoroughness to a particularly high level: It features 270 researchers attempting to replicate a whopping 100 experiments from previously published psychology studies.
The point was to see if the studies, which are published in well-respected journals, were generally reproducible. By copying the same experiment with the same method, do you get the same results? In many cases: no.
"Replication effects were half the magnitude of original effects, representing a substantial decline," the authors said in their paper, writing under the Open Science Collaboration and led by Brian Nosek.
"Independent replication is really a feature of good quality, robust scientific evidence," Gavin Sullivan, a reader at the Centre for Research in Psychology, Behavior and Achievement at Coventry University, told me. He's one of the many researchers who took part. If a study is clear about its methodology and findings, you'd expect to be able to repeat it, he said. "If you don't, it makes you wonder what factors were involved—was it internal factors, something to do with the participants; or was it something external to do with the journal?"
"Sometimes journal editors may favour more innovative or even counterintuitive findings."
To do the study, the teams of researchers each took one experiment from a previous study that had appeared in a 2008 edition of one of three leading psychology journals. They were then tasked with following the same protocol as closely as possible, even reaching out to the original authors for guidance.
But this wasn't true for all of the papers tested. "A large portion of replications produced weaker evidence for the original findings despite using materials provided by the original authors, review in advance for methodological fidelity, and high statistical power to detect the original effect sizes," the authors report.
That doesn't mean that the initial studies were bunk. They could well have found the interesting results they reported, but for whatever reason those results may be novel rather than more widely applicable, and more investigation is needed. Sullivan noted that social psychological studies tended to be less reproducible than cognitive studies, perhaps because they are generally more complex, which makes it harder to control for all the variables.
While this paper specifically looked at psychology studies, psychology is certainly not the only field to suffer from the reproducibility issue. A paper earlier this year found that in life sciences, fewer than half of clinical trials were reproducible. The findings in the new paper don't simply suggest, then, that psychology is somehow less disciplined or "scientific" than other sciences. Indeed, the fact these researchers are trying to analyse the credibility of findings from their own discipline is surely an indicator of a commitment to scientific rigour.
Another reason so many irreproducible studies from any field end up published could be a result of journals' preferences. "Sometimes journal editors may favour more innovative or even counterintuitive findings," said Sullivan. This kind of bias can make potentially shaky results more attractive, as we've seen time and again.
This can even incentivize "p-hacking," a term used to describe selectively mining data to try to make something look significant. Obviously that's not the best way to go about getting solid scientific conclusions, but it might make your findings appear more snazzy—and it's known to be widespread.
"You would really want to reduce the possibility that people might do that," said Sullivan, pointing to some journals initiatives to agree to publish reproductions of studies regardless if the results are positive or negative as a good way to counteract the implicit incentive to hack results to make them look more interesting.
The paper concludes, unsurprisingly, that psychology has room for improvement on the reproducibility front. But it also adds the caveat that none of the replication attempts offer clear answers on any one finding. "Humans desire certainty, and science infrequently provides it. As much as we might wish it to be otherwise, a single study almost never provides definitive resolution for or against an effect and its explanation," the authors write.
And as Sullivan remarked, there's no end to pursuing the ideal of self-correction in science. He proposed a further "metaquestion": "There's an interesting question of 'can the reproducibility study be reproduced?'"