Preclinical cancer biology research is not as reproducible as it should be
- The “Reproducibility Project: Cancer Biology” (RPCB) planned to repeat experiments selected from 53 leading articles on cancer biology published during the period 2010-2012.
- The RPCB’s goal was not to find faulty or fraudulent papers, and a failure by the team to replicate an experiment does not mean the original was fake.
- However, the results of the project should give the biomedical research community food for thought.
In 2014, when the first articles from the Reproducibility Project: Cancer Biology (RPCB) were published in the journal eLife, there was widespread concern about what appeared to be low levels of replicability and reproducibility in certain areas of research. Researchers from two pharmaceutical companies, Bayer and Amgen, had reported that they had failed to replicate many of the findings published in cancer biology and other areas of preclinical research. Since then, large-scale studies of replicability and reproducibility in psychology, economics and other areas of research, reports from learned societies, surveys of researchers and popular books have kept concerns about the “reproducibility crisis” in the foreground.
The RPCB had two main objectives: to provide evidence of replicability in preclinical cancer research and to identify factors that influence replicability more generally. Now, seven years later, the last three articles of the project have just been published, and they confirm that there is still considerable scope to improve the reproducibility of preclinical research in cancer biology.
The RPCB was a collaboration between the Center for Open Science and Science Exchange, and the project was funded by a grant from a private foundation (now called Arnold Ventures). To achieve its objectives, the project team planned to repeat selected experiments from 53 leading articles in the field of cancer biology that had been published during the period 2010-2012. eLife agreed to be the editorial partner of the project and to use what was then a new peer review approach to assess the results of the project.
Under this approach, for each article selected, the project team would prepare a “Registered Report” that would describe in detail how the experiments would be conducted and how the data would be analyzed. Every Registered Report would be peer reviewed, and experiments could not begin until it was accepted for publication. The results of the experiments would then be written up as a replication study, which would be peer reviewed to ensure that the experiments and data analysis had been performed according to the Registered Report. Where possible, one of the original article's authors would be involved in the peer review of the Registered Report and the replication study.
A total of 193 experiments from 53 articles were selected for replication, and the project team set out to prepare a Registered Report for each article. However, as explained in detail in “Challenges for the Assessment of Replicability in Preclinical Cancer Biology,” the team encountered problems almost immediately.
For example, many original articles failed to report key descriptive and inferential statistics, and despite contacting the original authors, the project team was unable to obtain these data for 68% of the experiments. Likewise, none of the 193 experiments were described in sufficient detail for the project team to design protocols to repeat them. And although the original authors were often helpful when asked for such details, they were “not at all helpful” (or did not respond to the project team) for 32% of the experiments.
These problems caused the early stages of the project to take longer than expected and to exceed the budget: the end result was that it was only possible to publish 29 Registered Reports.
Once the experimental work had started, two-thirds of the protocols had to be modified to allow the experiments to be carried out. Again, this stage of the project took longer and cost more than expected, and in the end the project team could only repeat 50 experiments from 23 papers: the results of these experiments are reported in an aggregated article. The clear message that emerges here is that the communication of methods and results needs to be improved.
So how reproducible were the 50 experiments the team managed to repeat? As explained in a meta-analysis that combines data from all replications, there are a number of different answers to this question. One reason for this is that many experiments involved measuring more than one effect (such as measuring the influence of an intervention on both tumor burden and overall survival). Indeed, the 50 experiments involved a total of 158 effects. These effects could also be positive effects or null effects. In addition, some of the original articles reported effects in terms of numerical values, while others relied on images.
The team used seven criteria to assess replicability, although some were not suitable for assessing all effects (e.g., some only worked for positive effects, or when numerical values were available). One criterion compared effect sizes for positive effects: this revealed that the median effect size in replications was 85% smaller than in the original experiments; in addition, the effect size in the replication was smaller than the original 92% of the time. The other criteria were binary (a replication was either successful or unsuccessful), and five of these could be used for both positive and null effects when effect sizes were reported as numerical values. For positive effects, 40% of replications were successful on three or more of these criteria, and this figure rose to 80% for null effects.
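The arithmetic behind summary figures of this kind can be sketched in a few lines of Python. This is only an illustration and not the RPCB's analysis code: the function names, the unnamed criteria columns and every number below are hypothetical, and the shrinkage is computed here as a ratio of medians, which is one possible reading of “median effect size ... smaller”.

```python
from statistics import median


def median_shrinkage(original, replication):
    """Fraction by which the median replication effect size is smaller
    than the median original effect size (ratio-of-medians assumption)."""
    return 1 - median(replication) / median(original)


def share_smaller(original, replication):
    """Share of paired effects where the replication is smaller than the original."""
    pairs = list(zip(original, replication))
    return sum(rep < orig for orig, rep in pairs) / len(pairs)


def share_successful(criteria_results, threshold=3):
    """Share of effects that pass at least `threshold` binary criteria.

    Each row holds one effect's pass/fail outcome on each binary criterion.
    """
    passed = sum(sum(row) >= threshold for row in criteria_results)
    return passed / len(criteria_results)


# Hypothetical effect sizes for four paired effects (not RPCB data).
original_effects = [2.0, 1.0, 4.0, 3.0]
replication_effects = [0.5, 1.2, 1.0, 0.3]

# Hypothetical pass/fail outcomes on five unnamed binary criteria.
criteria_results = [
    [True, True, True, False, False],    # passes 3 of 5
    [True, False, False, False, False],  # passes 1 of 5
    [True, True, True, True, True],      # passes 5 of 5
    [False, False, True, True, False],   # passes 2 of 5
]

if __name__ == "__main__":
    print(f"median shrinkage: {median_shrinkage(original_effects, replication_effects):.0%}")
    print(f"replication smaller: {share_smaller(original_effects, replication_effects):.0%}")
    print(f"successful on >= 3 criteria: {share_successful(criteria_results):.0%}")
```

Keeping the threshold as a parameter makes it easy to see how sensitive a headline success rate is to the choice of “three or more” criteria.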
In a separate article, Patrick Kane and Jonathan Kimmelman (who were not part of the RPCB) take a step back and discuss some of the scientific, ethical and policy implications of the project. They compare basic and preclinical cancer biology research to a “diagnostic machine” that is used to decide which clinical hypotheses should be advanced (including which should be tested in clinical trials). While the RPCB results may be “concerning,” Kane and Kimmelman argue that more work is needed to better understand the performance of the diagnostic machine.
And more work is underway on many fronts. National projects to explore various aspects of reproducibility are underway in several countries, including Brazil, Germany and the Netherlands. National reproducibility networks have also been set up in Germany and the United Kingdom.
The purpose of the RPCB was not to find faulty or fraudulent papers, and a failure of the team to reproduce an experiment does not mean that the original was fake (and, similarly, a successful replication does not guarantee that the original was correct, since both the original and the replication may be wrong). However, the results of the project should give the biomedical research community food for thought. Journals have encouraged more comprehensive reporting of methods and results in recent years, but there is still room for improvement, especially when it comes to making data and code freely available. Many studies would benefit from more input from statistical experts, ideally before data collection, and pre-registration should help reduce bias and increase rigor in certain types of studies.
Increased preprinting will also be useful for most articles by increasing both readership and scrutiny, and by making new results available sooner. Finally, a greater emphasis on rigorous, as opposed to eye-catching, science by researchers, institutions, funders and journals would benefit everyone.
Peter Rodgers and Andy Collings are editors at eLife.
This article was first published by eLife and was republished here under a CC BY license.