On the low reproducibility of cancer studies

Earlier reports suggest that close to 90% of cancer biology publications are irreproducible (2). The low number has recently been corroborated by 5 detailed replication studies in eLife (1). While the irreproducibility is often attributed to human factors (1), which are remediable, the reason might be biological and the irreproducibility is intrinsic to such studies. The low reproducibility, reflecting the diversity in the evolutionary pathways of tumorigenesis, will likely impact clinical strategies significantly.


INTRODUCTION
Previous reports have suggested that close to 90% of cancer biology publications are irreproducible. The low number has recently been corroborated by five detailed replication studies in eLife, which have been commented on in Nature (Replication studies offer much more than technical details. Nature 2017;541:259-60). While the irreproducibility is often attributed to human factors, which are remediable, the reason might be biological and the irreproducibility is intrinsic to such studies. The low reproducibility, reflecting the diversity in the evolutionary pathways of tumourigenesis, will likely significantly impact clinical strategies.
Reproducibility is the foundation of experimental science. While there are many factors afflicting different fields of inquiry to varying degrees [1], cancer biology studies appear to stand out. In an earlier report, only 6 of 53 published findings in cancer biology could be confirmed [2], a rate approaching an alarmingly low 10% of reproducibility. According to the report, such a low rate is common in the pharmaceutical industry.
The low rate is of particular concern because these studies are generally published in 'high-impact' journals, thus having real consequences in both clinical practice and basic research. In this critique, we first review recent updates on the extent of reproducibility. Second, the reasons underlying low reproducibility are explored. While concerns about low reproducibility are often expressed in terms of human factors [2][3][4][5], there are deeper biological reasons. Third, a distinction is made between studies that should be reproducible and those that are intrinsically irreproducible. Fourth, suggestions are made to cope with irreproducibility issues.
melanomas (15 out of 107). Furthermore, the introduction of a PREX2 mutation into human melanocytes was reported to substantially reduce the tumour-free survival in xenograft mice. In the replication study, Horrigan et al. [9] (see also Davis [10]) cited recent studies that failed to support the prevalence of PREX2 mutations in human melanomas. In addition, Horrigan et al. failed to corroborate the reduction in tumour-free survival caused by PREX2 mutation because the controls without the mutation die just as rapidly (1 week), in contrast with the original report of 9-week median survival in the control.
In the other study, Willingham et al. [11] suggest that the CD47 protein is overexpressed on the membrane of most cancer cells. Since CD47 signals to macrophages to withhold attack, blocking CD47 reduces the mass of orthotopic breast tumours by ∼10-fold in immune-competent mice that have been injected with MT1A2 mouse cells, relative to the control. In the replication by Horrigan [12], the tumour sizes are curiously reversed between the anti-CD47- and IgG (control)-treated mice. Horrigan noted that the tumour mass is highly variable, often with a 5-fold difference in the same setting. Such high variability is common in tumour evolution but is often treated as noise. It should be noted that Willingham et al. obtained similar results by transplanting human tumour cells into immune-compromised mice. Horrigan also reported that the xenograft experiments have been reproduced only in some follow-up studies.
The one study that was deemed not to be reproducible was that by Sugahara et al. [13], who showed an increase in drug permeability when tumours were treated with the iRGD peptide. In their study, tumours grow from xenografted prostate cancer cells. Mantis et al. [14] could not reproduce the results but did report some successful replication by other studies [15][16][17]. While the study by Sugahara et al. was the only study whereby the experiments could not be reproduced, it is possible that either the control or the experiment can be more variable. Thus, it seems curious to consider one type of irreproducibility to be more serious than another.
For the remaining two studies that were declared to be 'essentially reproducible', one [18] would not be considered reproducible under most circumstances. In Sirota et al., the bioinformatic analysis of drug applications on cell lines led to the identification of cimetidine as an effective agent against lung adenocarcinoma cells. While the replication by Kandela et al. [19] observed the same trend, the difference from the control was not statistically significant. The reason, as noted by Dang [20], is that the effect of cimetidine reported by Sirota et al. is too weak to be considered biologically significant. Overall, the variance, in comparison with the small difference in mean, makes it difficult to justify the conclusion that cimetidine is an effective new drug against lung cancer.
The only reproducible study in the set of five studies is that of Delmore et al. [21]. In the original study, the molecule (+)-JQ1 was reported to downregulate MYC transcription and reduce the burden of multiple myeloma tumours, resulting in the improved survival of xenograft mice. While the results were successfully replicated by Aird et al. (with some variations) [22], a negative control using an enantiomer (−)-JQ1, which does not impact MYC transcription, showed the same biological effect as (+)-JQ1. Given that the negative control also yielded the (unexpected) benefit, we suggest that the original study should be considered uninterpretable.

CAUSES OF IRREPRODUCIBILITY
The five replications corroborate the earlier report of reproducibility at a rate of 6 out of 53 [2]. While previous reports have hinted at technical (or even ethical) lapses [2,4-6], the factors cited are common across biological disciplines. Furthermore, given the care invested in RP:CB, the replication efforts should be quite adequate. We therefore seek biological explanations below.
Whether or not any experimental study can be replicated depends on the measurements being reproduced. In coin tossing, seeing the same side five times in a row will be reproducible no more than 10% of the time. In cancer biology, reproducibility would mean that tumour progression corresponds to highly constrained courses, akin to tissue development. However, if tumour progression follows evolutionary trajectories, the outcome may be highly variable. The course of evolution is often a multi-step process requiring a suite of genetic changes, each of which is governed by stochastic factors including mutation emergence, random drift, and divergent selective pressures. Since each step is contingent on the previous steps taken, divergent outcomes may result from even a small deviation in an earlier stage.
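The coin-tossing figure can be checked with a short calculation. The sketch below is only an illustration, assuming a fair coin and independent tosses; the function name is hypothetical and not part of the original analysis.

```python
from fractions import Fraction


def prob_same_side_run(n_tosses: int) -> Fraction:
    """Probability that n fair, independent tosses all show the same side."""
    # Only two of the 2**n equally likely sequences qualify:
    # all heads or all tails.
    return Fraction(2, 2 ** n_tosses)


p_five = prob_same_side_run(5)
print(p_five, float(p_five))  # prints: 1/16 0.0625
```

Under these assumptions the outcome recurs in about 6% of attempts, which is consistent with the bound of 10% stated above.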
In his book 'Wonderful Life', S. J. Gould [23] raised the issue of the reproducibility of evolution itself. He wondered if the same evolutionary trajectory would be followed had 'the tape of life' been rewound (see also Conway Morris [24]). 'Rewinding the tape of life' back to the time of the Cambrian explosion is of course mere fantasy, but there are indeed evolutionary processes that are continually reiterated. The best example may be the evolution of cancers [25][26][27][28]. It is hence curious that the word 'evolution' does not appear in the RP:CB registered/replication reports, editorials, and commentaries.
Reproducibility of evolution would be equivalent to convergent evolution in which a dominant pathway is repeatedly taken. In convergent evolution, the distinction between phenotype and genotype is crucial. Phenotypic convergence is common in natural populations. Similarly, morphological convergence is a basis on which pathologists define malignancy. The central issue is genotypic convergence: whether the genetic changes underlying the phenotypes are themselves convergent. With constant references to somatic mutation, gene expression, and target therapy, cancer biology publications apparently consider genotypic convergence plausible.
With this backdrop, the Cancer Genome Atlas (TCGA) project [29][30][31] has attempted to identify genes that are commonly mutated in tumours. The results show that genetic convergence is much less frequent than had been hoped for [29]. For example, across 12 cancer types, only two genes are mutated in more than 10% of cases: TP53 and PIK3CA; the former is an outlier in the human genome [32,33] and the latter is a very large gene. The number of frequently mutated genes (in >10% of cases) for a given type of cancer is generally around 10 [29,30]. With such low genic convergence, two cases of the same cancer type usually have few mutated genes in common, or may share no mutated genes at all. These observations suggest that, from a very similar starting point (two human beings), the evolution of cancer usually takes different courses. Even cancer cells from the same starting point (within the same person) would continue to diverge, leading to substantial genetic diversity [34,35] and variable responses to therapeutic treatments [36].
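To see why low genic convergence implies little sharing between two cases, consider a rough sketch. It assumes, purely for illustration, that each frequently mutated gene is hit independently in each case; the mutation frequencies used are hypothetical placeholders, not TCGA values, and the function names are ours.

```python
def expected_shared_genes(freqs):
    """Expected number of genes mutated in both of two independent cases."""
    # A gene with mutation frequency p is mutated in both cases with prob. p*p.
    return sum(p * p for p in freqs)


def prob_no_shared_gene(freqs):
    """Probability that two independent cases share none of the listed genes."""
    prob = 1.0
    for p in freqs:
        prob *= 1.0 - p * p
    return prob


# Hypothetical example: 10 genes, each mutated in 15% of cases.
hypothetical_freqs = [0.15] * 10
print(expected_shared_genes(hypothetical_freqs))
print(prob_no_shared_gene(hypothetical_freqs))
```

With these placeholder values, two cases would share roughly 0.2 of the frequently mutated genes on average and would share none of them about 80% of the time. The calculation ignores rarely mutated genes and any correlation between genes, so it should be read as an order-of-magnitude illustration only.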
The TCGA results have their parallel in natural populations. While genotypic convergence may be observed for highly specialized traits such as echolocation [37], it is nevertheless rare. For adaptations that do not involve highly specialized constructs, diverse molecular mechanisms may operate and genotypic convergence is not expected. For example, human populations living in the high altitudes of the Tibetan, Ethiopian, and Andean Altiplano plateaus have different genetic mutations for hypoxic adaptation [38]. Recently, the search for molecular convergence has expanded to finding signals in the entire genome [37,39-44]. Again, even with the aid of multiple genomes in the same environment, molecular convergence is rare and the signals rarely exceed those of the background noise.
Reproducibility of cancer progression and convergent evolution in organisms is nevertheless observable but the conditions are stringent, i.e. when there is a dominant evolutionary pathway leading to an end state. For example, when the organisms are genetically simple (e.g. viruses), there would often be few genetic solutions. Alternatively, if the selective pressure for specific genetic changes is strong (e.g. [39]), then convergence is a likely outcome. Wu et al. speculate that the selective pressure may be particularly high in 'liquid tumours' where cells with a proliferative advantage can spread rapidly and widely [26]. Indeed, chronic myelogenous leukaemia remains one of the best examples of cancer convergence at the genic level with the BCR-ABL translocation being a diagnostic feature [45,46]. In general, when we take into account the multi-phenotypic and multi-genic nature of tumour evolution (see Hanahan and Weinberg on cancer hallmarks [33] and Kandoth [29] on cancer driver genes), as well as the complexity of the mammalian genome, molecular convergence in cancer progression would likely be the exception rather than the rule [26].
As TCGA reveals the low convergence in real-life tumourigenesis, one might still expect a high level of convergence in mouse models when most conditions are under control. Now, the RP:CB studies have cast doubt on the predictability of evolution even in simple models. In the five RP:CB reports, cells from cancer cell lines are transplanted into mice, as xenografts or autografts. In these studies, experimental (E) and control (C) samples are collected, and designated (E1, C1) for the original studies and (E2, C2) for the replications. Tumour growth in each mouse is the culmination of two evolutionary processes. First, the cell populations have been evolving prior to transplantation [47]. Second, these cells subsequently evolve as xeno(auto)-grafts into tumours. While the second stage is widely discussed (see Wu et al. [26] for references), the first stage has been neglected even though cell lines do evolve continuously. This first stage is reminiscent of the classic 'Luria-Delbrück fluctuations' [48].
Results from E1, C1, E2, and C2 are all conditional distributions (Fig. 1). The final analysis of the RP:CB rests on the comparisons between E1-C1 and E2-C2. In those replication reports that fail to reproduce the original results, C1 and C2 have not evolved along the same path in two reports, and E1 and E2 have evolved divergently in one (see Table 1). In another report, E1-C1 is too small to be biologically or statistically significant.
Tumour evolution in RP:CB may be sketched by a simple genetic model (Fig. 1) that frames evolutionary pathways as conditional probabilities. Each stage of evolution is conditional on prior steps of evolution via segregation of existing polymorphisms and the emergence of de novo mutations. A slight difference in the early stage may pave the way for a much greater divergence at a later time. In stage 2 in Fig. 1, the two replications overlap little in their trajectories and very different tumour phenotypes emerge as a result. A more realistic model than that of Fig. 1 will likely yield even more diverse patterns. We suggest that cancer biology studies should develop explicit evolutionary models, rather than assume simple and reproducible outcomes.
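The contingent model of Fig. 1 can be explored with a small simulation. The sketch below is a simplified illustration rather than the authors' implementation: it keeps the 12 loci and the positive epistasis with the two adjacent loci on each side, but assumes that each locus changes at most once, that the next locus is drawn with a weight that grows with the number of already-changed neighbours, and that the epistasis strength (EPISTASIS_BONUS) and the number of steps are arbitrary, hypothetical choices.

```python
import random

N_LOCI = 12             # loci a-l, as in Fig. 1
NEIGHBOUR_RANGE = 2     # epistasis with two adjacent loci on each side
EPISTASIS_BONUS = 3.0   # hypothetical strength of positive epistasis


def step(genotype, rng):
    """Change one unchanged locus, chosen conditionally on the current genotype."""
    candidates = [i for i in range(N_LOCI) if not genotype[i]]
    if not candidates:
        return genotype
    weights = []
    for i in candidates:
        lo = max(0, i - NEIGHBOUR_RANGE)
        hi = min(N_LOCI, i + NEIGHBOUR_RANGE + 1)
        changed_neighbours = sum(genotype[j] for j in range(lo, hi) if j != i)
        weights.append(1.0 + EPISTASIS_BONUS * changed_neighbours)
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    new_genotype = list(genotype)
    new_genotype[chosen] = True
    return tuple(new_genotype)


def evolve(start, n_steps, rng):
    """Apply n_steps successive conditional changes to a starting genotype."""
    genotype = start
    for _ in range(n_steps):
        genotype = step(genotype, rng)
    return genotype


def overlap(g1, g2):
    """Jaccard overlap between the sets of changed loci in two genotypes."""
    a = {i for i, changed in enumerate(g1) if changed}
    b = {i for i, changed in enumerate(g2) if changed}
    return len(a & b) / len(a | b) if (a | b) else 1.0


def mean_overlap(n_pairs=2000, n_steps=4, seed=0):
    """Average overlap between two replicate lineages from the same start."""
    rng = random.Random(seed)
    start = (False,) * N_LOCI
    total = 0.0
    for _ in range(n_pairs):
        total += overlap(evolve(start, n_steps, rng), evolve(start, n_steps, rng))
    return total / n_pairs


print(mean_overlap())
```

Because the first change is drawn at random and later changes cluster around it, two replicates that start from the identical genotype tend to end up with largely different sets of changed loci. Varying EPISTASIS_BONUS or n_steps shows how the degree of divergence depends on these assumed parameters.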

WHICH STUDIES ARE, OR ARE NOT, REPRODUCIBLE?
The RP:CB reports give a glimpse of what may or may not be reproducible [7]. If the phenotype being assayed does not evolve in the course of the experiments, the reproducibility is generally high. For example, the treatment of cells with the chemical JQ1 leads to the reproducible downregulation of MYC transcription [21,22]. Similarly, Horrigan [12] was able to reproduce the toxicity effect resulting in mild anaemia in normal mice, whose tissues have not evolved.
In the evolution of tumours, reproducibility would be a function of the number, strength, and length of the evolutionary pathways. The TCGA data suggest that the number of genetic pathways for tumourigenesis must be quite large. As the number of steps in each pathway increases, the number of possible alternatives increases exponentially, rendering many observations irreproducible. We should note that the sort of contingent evolution depicted in Fig. 1 results in variable outcomes that may be difficult to capture by increasing the sample size.
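The exponential growth in alternatives, and its consequence for sampling, can be made concrete with a toy calculation. The sketch assumes, for illustration only, a fixed number of equally likely alternatives at every step; the function names and the numbers used are hypothetical.

```python
def n_pathways(branches_per_step, n_steps):
    """Number of distinct pathways with a fixed branching factor per step."""
    return branches_per_step ** n_steps


def prob_same_pathway(branches_per_step, n_steps):
    """Chance that two independent replicates follow the identical pathway."""
    return 1 / n_pathways(branches_per_step, n_steps)


def expected_fraction_sampled(n_paths, sample_size):
    """Expected fraction of equally likely pathways seen in a random sample."""
    return 1 - (1 - 1 / n_paths) ** sample_size


for steps in (1, 2, 4, 8):
    paths = n_pathways(3, steps)
    print(steps, paths, prob_same_pathway(3, steps),
          expected_fraction_sampled(paths, 10))
```

With three alternatives per step, four steps already give 81 pathways, and a sample of 10 is expected to cover only about 12% of them; at eight steps the coverage becomes negligible. Real pathways are neither equally likely nor of fixed length, so the numbers indicate the trend rather than an estimate.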
Facing the diversity of pathways, cancer biology studies often attempt to isolate cases that share part of their pathways; for example, lung cancer cases sharing the mutated EGFR gene. These cases may indeed show robust and more reproducible outcomes in responding to EGFR inhibitors [49]. However, such partially defined genetic pathways are still quite diverse and irreproducible results, as in drug resistance, are not uncommon.

CONCLUSIONS
It is curious that the RP:CB project [4,6], together with the earlier reports [2], has been met with near-total silence. Perhaps the prevailing view that low reproducibility is attributable to human factors [3,5] does not call for intellectual discourse. This view, which has not been supported by any evidence, may have obscured the more fundamental reason behind irreproducibility. From an evolutionary perspective, low reproducibility is intrinsic to such studies because tumourigenesis does not usually traverse the same evolutionary pathway.
Reproducibility is nevertheless the central tenet of cancer biology, which assumes convergent pathways with relatively well-defined genetic changes. Thus, in basic research, genetic changes are identified and therapies are then developed to target these changes. The continual evolution of the underlying genetic architecture means that mutations are 'moving targets', both between and within individuals. From this perspective, target-gene therapy operates against evolutionary rules.
Traits that do not evolve or that evolve slowly may be better targets. Indeed, the evolution of drug resistance continues to be a major impediment in targeted cancer therapy. The efficacies of the top three monoclonal antibody drugs (bevacizumab, trastuzumab, and rituximab) are instructive [50]. Bevacizumab and trastuzumab target VEGF and HER-2, respectively, and have had limited success [50,51]. In contrast, rituximab, which targets CD20 on the cell surface of all pre-B cells, significantly improves survival in patients with B-cell lymphoma [52]. The efficacy of rituximab may be due to the fact that it does not target a product of cellular evolution. Recent strategies that have targeted the basal transcription machinery [53,54] are compatible with this view favouring non-moving targets.
Finally, many diseases are the (by)products of evolution from the viewpoint of Darwinian medicine [55]. Tumourigenesis is different as it is not merely a product of evolution [56]; it is the process itself, or evolution in action [57]. Lewontin [58] pointed out that the essence of evolution is its variability. The diversity, rather than a standard type, is the subject of interest. In cancer studies, the diversity in evolutionary trajectories is also clinically significant. At the centennial of the publication of 'On the Origin of Species', H. J. Muller remarked that biology had gone too long without evolutionary thinking. Five decades later, Muller's remarks remain relevant to studies of cancers.
It should also be noted that four additional reports were released by eLife [59][60][61][62] at or after the time of this submission. The eLife editorial considers three to be reproducible in 'important parts' and one as reproducible in 'some parts'. Among the four studies, all reproducible experiments were carried out on cell lines growing in dishes and the time units for collecting data after treatment were either minutes or hours. The only xenograft tumour model examined was in Shan et al. [59], which tested the efficacy of I-BET151, an inhibitor of the BET bromodomain, as a treatment for mixed-lineage leukaemia (MLL)-fusion leukaemia. The experiment took 2 months, thus allowing for plenty of mutation and selection events. Not unexpectedly, the replication group failed to reproduce the original finding of increased survival in I-BET151-treated mice. The timescale-dependent reproducibility of cancer research strengthens the evolutionary arguments of our report.

Figure 1. A model of pathway diversity in tumour evolution. Each step of the pathway is the realization from a probability distribution that is conditional on the previous steps taken. In this model of 12 loci (a-l), one locus may change at each stage. The vertical bar below indicates the evolvable locus, which may change from, say, e to E (further changes are allowed). The locus that actually changes is marked in red. It is assumed that each locus has positive fitness epistasis with the two adjacent loci on each side (e.g. E interacts positively with C, D, F, and G). Stage 1 represents cell line evolution and stage 2 represents the evolution of these cells into tumours. The evolution of two populations (replications) is portrayed. The evolved genotypes in stage 2 determine the tumour's phenotypes (bottom of the figure), which show no overlap between populations. If 10 samples are taken from each replication, as shown by the small arrows, the two replications would appear to be totally irreproducible.
Table 1. Summary of the first batch of RP:CB studies.

| Original study | Replication study | eLife editorial | This study | Comments |
|---|---|---|---|---|
| Sirota et al. [18] | Kandela et al. [19] | Yes | No | Cimetidine slowing down lung adenocarcinoma growth. Reported effect is weak in the original report and the weak effect is found to be insignificant in the replication. |
| Delmore et al. [21] | Aird et al. [22] | Yes | UI | JQ1 downregulating MYC transcription and slowing down myeloma growth. The experiment is reproducible but a negative control using (−)-JQ1 has the same effect on tumour growth. |
| Sugahara et al. [13] | Mantis et al. [14] | No | No | Co-administration of iRGD with chemo-agent enhances drug uptake by tumour cells. Neither drug uptake nor tumour growth is reproduced. |
| Berger et al. [8] | Horrigan et al. [9] | UI | No | Transplanted melanoma cells expressing a mutated PREX2 gene grow faster as tumours, speeding up death. In replication, the control cells have the same lethal effect. |
| Willingham et al. [11] | Horrigan [12] | UI | No | Anti-CD47 antibody inhibits growth of mouse breast cancer cells by enabling phagocytosis, vis-a-vis the IgG control. The opposite effect is observed in the replication. |

Yes: reproducible; No: not reproducible; UI: uninterpretable.