Abstract

Next-generation sequencing technologies have led to profound characterization of mutation spectra for several cancer types. Hence, we sought to systematically compare genomic aberrations between primary tumors and cancer cell lines. For this, we compiled publicly available sequencing data of 1651 genes across 905 cell lines. We used them to characterize 23 distinct primary tumor sites by a novel approach that is based on Bayesian spam-filtering techniques. Thereby, we confirmed the strong overall similarity of alterations between patient samples and cell culture. However, we also identified several suspicious mutations, which had not been associated with their cancer types before. Based on these characterizations, we developed the inferring cancer origins from mutation spectra (ICOMS) tool. On our cell line collection, the algorithm reached a prediction specificity rate of 79%, which varied strongly between primary cancer sites. On an independent validation cohort of 431 primary tumor samples, we observed a similar accuracy of 71%. Additionally, we found that ICOMS could be employed to deduce further attributes from mutation spectra, including sub-histology and compound sensitivity. Thus, thorough classification of site-specific mutation spectra for cell lines may decipher further genome–phenotype associations in cancer.

INTRODUCTION

Over the past years, targeted therapies have tremendously affected the clinical outcome of several cancer types (1–6). The rationale of these personalized approaches is to exploit the individual biology of tumor lesions by specific kinase inhibitors (7). So far, most cancer-specific therapeutically amenable targets have been deciphered in cell culture models (8–10). Bearing in mind the enormous efforts devoted to the elucidation of mutation spectra of primary tumors (11–15), we aimed at developing a pan-cancer characterization of functionally relevant aberrations in cell lines.

Recently, sequencing data of large cohorts of patient samples [The Cancer Genome Atlas (TCGA)] (11) and cell lines [Cancer Cell Line Encyclopedia (CCLE)] (16) became publicly available. We used them to establish a genomic classification that links each gene to the cancer type in which it is most frequently mutated. Based on this, we developed a computational tool that infers cancer origins from mutation spectra (ICOMS). We demonstrate its application to predict multiple cell line characteristics from mutation spectra, including primary site, sub-histology and response to specific kinase inhibitors.

This study involved sequencing data of 1651 genes across 905 cell lines derived from 23 distinct primary sites [CCLE database, (16)] (Fig. 1A). We validated our findings using a collection of 431 independent primary tumor samples [TCGA database (11–15)] (Fig. 1A).

Figure 1.

(A) Schematic outline of the five main steps in this study. (B) Selective (purple frame) and frequent (brown frame) biologically relevant meta-mutations of eight cancer types (histology-specific colors) are listed in mutation index descending order. Gene name, selected amino acid changes, mutation index as well as histology-specific (green) and total (red) counts are recorded.

RESULTS

A pan-cancer association between primary tumor sites and cell line mutation spectra

Our first aim was to capture the spectrum of characteristic mutations for each cancer type. To this end, we examined hybrid capture sequencing data of 905 cancer cell lines (Supplementary Material, Table S1) derived from 23 distinct primary tumor sites (CCLE database). For each mutation, we recorded its recurrence in the CCLE data set, counted the number of cancer types in which it occurred most frequently (rating), and calculated its mutation index as a novel measure of its suitability for characterizing a certain primary site (Supplementary Material, Table S2 and Methods).

Altogether, we found that the CCLE data set contained 3435 groups of recurrent mutations. Of these, 2490 (72%) were selective, i.e. associated with a single primary site (Supplementary Material, Table S2, green). Further, the mutation index of 2534 mutations (74%) was high enough to serve as an acceptable genomic predictor. Both criteria were fulfilled for 1993 (58%) distinct recurrent mutations.

Next, we clustered these mutations to a panel of 2717 meta-mutations (Supplementary Material, Table S3), groups of genomic alterations in the same gene that can be combined for primary site characterization. Of these, we selected 1131 (42%) meta-mutations, which were unambiguously associated with a single cancer type (Supplementary Material, Table S3, green). They constituted our genomic classification of cancer types.

The number of selective meta-mutations varied greatly between primary tumor sites (Supplementary Material, Fig. S1A), but did not correlate significantly with their a priori representation in the total cell line cohort (P = 0.093, Supplementary Material, Fig. S1B). However, we observed significant correlation (P = 0.002) with the average count of mutations per sample (Supplementary Material, Fig. S2).

Finally, we studied the functional relevance of the strongest genomic predictors of our primary site classification (Fig. 1B). For this, we sorted meta-mutations by mutation indices (Supplementary Material, Table S3) and compared them with previous reports in the literature. Functionally prominent mutations were classified into frequent and selective aberrations; missense mutations were scored by the PolyPhen (version 2) algorithm (17), which assessed their disruption of the global protein structure (Fig. 1B).

In concordance with previous studies, we associated mutations of KRAS, TP53, EGFR, STK11 (18–23) and DDR2 (24) with lung cancer, mutations of BRAF (25–27) with melanoma, mutations of NRAS (28–31) and MYC (32–34) with hematological malignancies, mutations of ARID1A (35–38) and PTEN (39–44) with endometrium cancer, mutations of APC (14,45–47) and BMPR2 (48–50) with colorectal cancer, mutations of TSHR (51–54) and TPO (55,56) with thyroid cancer and mutations of VHL (57–61) with kidney cancer. A detailed examination can be found in the Supplementary Material, Mutation Annotation.

Of note, we also recovered typical hotspots of gain-of-function mutations, e.g. L858R substitutions and exon 19 deletions for EGFR (62), amino acid substitutions at G12 and Q61 for KRAS (21,63–65) as well as V600E mutations for BRAF (25–27) (Fig. 1B). Additionally, we observed that loss-of-function mutations were dispersed over the whole amino acid sequence of tumor suppressor genes, such as ARID1A (35–38), APC (14,45–47) and VHL (57–61) (Fig. 1B and Supplementary Material, Table S3). Thus, tumor suppressors could be differentiated from proto-oncogenes by analysis of meta-mutation spectra.

Owing to their specificity and high mutation index, mutations in BCL3, MLL3 (lung), JAK1, ERBB3, RBL2, ALDH7A1 and HMMR (endometrium), P2RX7, RAC1 and IRS1 (blood), MEN1, FANCI, FANCB and PAX8 (skin) as well as SACS, ROCK1, JARID2 and TET1 (large intestine) showed up as novel promising targets (Fig. 1B and Supplementary Material, Table S3), which had, to the best of our knowledge, not been associated with their respective primary sites before.

Computational prediction of primary tumor sites by ICOMS algorithm

We employed this classification to computationally predict primary sites of cell lines with the ICOMS algorithm, a novel computational tool that we developed to infer cancer origins from mutation spectra. For each cell line, we calculated its mutation-derived coefficients for each of the 23 distinct primary sites (Supplementary Material, Table S4).

Plotting mutation-derived coefficients against cancer types revealed their strong accumulation on the diagonal (Fig. 2). This reflected their discriminatory power, on which the outcome of the ICOMS algorithm was highly dependent. For 20 of 23 histotypes, coefficients were significantly higher in the group of samples for which pathological diagnoses matched than in the complementary control groups (homoscedastic t-test: endometrium, P = 3.7 × 10^−114; blood, P = 5.1 × 10^−64; kidney, P = 9.8 × 10^−93; lung, P = 1.9 × 10^−38; prostate, P = 1.8 × 10^−82; skin, P = 6.4 × 10^−60; thyroid, P = 1.7 × 10^−95, etc.) (Fig. 2 and Supplementary Material, Table S4).
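For readers unfamiliar with the homoscedastic (pooled-variance) two-sample t-test, the test statistic can be sketched as follows. This is a minimal illustration of the standard textbook formula, not the authors' analysis code; the function name and the two input groups (coefficients of matching samples versus control samples) are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def pooled_t_statistic(matched: Sequence[float], control: Sequence[float]) -> float:
    """Two-sample t statistic under the equal-variance assumption.

    `matched` and `control` are hypothetical names for the mutation
    coefficients of samples with and without a matching diagnosis.
    """
    n1, n2 = len(matched), len(control)
    # Pooled sample variance, weighted by the degrees of freedom of each group.
    pooled_var = ((n1 - 1) * variance(matched) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    # Standard error of the difference between the two group means.
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(matched) - mean(control)) / standard_error
```

The P-value then follows from a t distribution with n1 + n2 − 2 degrees of freedom.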

Figure 2.

Bar plots of mutation coefficients for cell lines (ordinate) against cancer types (abscissa), segregated by correct (green), incorrect (red) and missing (blue) ICOMS diagnoses. Color intensities mirror accumulation of mutation coefficients in each field.

However, distribution shapes of mutation coefficients (Fig. 2 and Supplementary Material, Fig. S3) strongly varied between cancer types, e.g. coefficients of hematological and lung malignancies were broadly spread (average control groups: 1.69 and 5.83) with prominent site-specific amplitudes (coefficient peaks: 8.03 and 15.73). On the contrary, coefficient baseline (average control group: 0.01) and amplitudes (2.05) were low for kidney tumors (Supplementary Material, Fig. S3 and Table S4).

This was considered in our algorithm design both by the independent choice of upper and lower thresholds of mutation coefficients (Supplementary Material, Table S5) and by the hierarchy of cancer types according to the histology order (Supplementary Material, Fig. S4), which resolved ICOMS diagnosis conflicts. In order to optimize upper and lower mutation coefficient thresholds, we calculated two ROC analogs (66) per primary site (Supplementary Material, Fig. S3). Naturally, strong a priori representations (e.g. breast, large intestine and ovary) allowed a lower choice of both thresholds, whereas less focal distributions (e.g. blood and lung) required an isolated increase of the lower threshold (Supplementary Material, Table S5).

Based on this optimal calibration, we performed the ICOMS algorithm on the whole CCLE data set (Fig. 3A,B). We calculated both the fraction of samples, for which a computational diagnosis was available (available diagnoses, i.e. sensitivity) and the portion of these, for which computational diagnosis matched CCLE cell line annotation (correct diagnoses, i.e. specificity) (Fig. 3C and Supplementary Material, Table S6).
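The two rates defined here can be written down compactly. The following minimal Python sketch assumes that predictions are stored as a mapping from cell line name to predicted primary site, with `None` marking an unavailable diagnosis; the function name and data layout are illustrative assumptions and are not taken from the ICOMS implementation.

```python
from typing import Dict, Optional, Tuple


def diagnosis_rates(predicted: Dict[str, Optional[str]],
                    annotated: Dict[str, str]) -> Tuple[float, float]:
    """Return (fraction of available diagnoses, fraction of correct ones).

    The first value corresponds to what the text calls sensitivity, the
    second (computed only over available diagnoses) to specificity.
    """
    samples = list(annotated)
    # A diagnosis is available if the prediction is not None.
    available = [s for s in samples if predicted.get(s) is not None]
    # A diagnosis is correct if it matches the annotated primary site.
    correct = [s for s in available if predicted[s] == annotated[s]]
    availability = len(available) / len(samples) if samples else 0.0
    correctness = len(correct) / len(available) if available else 0.0
    return availability, correctness
```

Note that correctness is normalized by the number of available diagnoses, not by the total number of samples, which mirrors the definition given above.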

Figure 3.

(A) Schematic outline of ICOMS prediction steps. (B) Primary associations (green) and exclusions (red) of cancer types as well as their comparability (primary partial order, purple) are combined to derive a final ICOMS prediction (six decision modes). Criteria, marked by ‘-’ are irrelevant for diagnosis. (C) ICOMS accuracy on cell lines. Rates of available (left) and correct (right) diagnoses are illustrated as bar plots for 19 cancer types (histology-specific colors). All cell lines are incorporated equally into the calculation of the average fractions. (D) Fractions of primary, secondary and missing diagnoses (outer ring) are segregated by ICOMS prediction modes (inner ring). Correct and incorrect diagnoses are marked by green and red stripes, respectively. (E) ICOMS accuracy on primary tumor samples (4 cancer types).

Overall, the ICOMS algorithm achieved sensitivity and specificity rates of 60 and 79%, respectively, which strongly varied between cancer types (Fig. 3C). Surprisingly, no significant association of performance with a priori representation in the total cell line cohort (P = 0.064) was observed. Particularly, breast (rank: 11) and central nervous system (rank: 13) achieved low performance results (specificity ≤65%), whereas the weakly represented origins of endometrium (rank: 4), pancreas (rank: 3) and kidney (rank: 6) cancer had excellent accuracy rates (specificity ≥90%) (Fig. 3C).

Comparing the histology distribution in the total cell line cohort to the panel of cell lines for which a computational diagnosis was available confirmed these findings (Supplementary Material, Fig. S5). Lung (P < 0.0001) and blood (P = 0.008) malignancies were significantly over-represented in the latter, whereas primary sites of bone (P = 0.0003), oesophagus (P = 0.042), soft tissue (P = 0.049), upper aerodigestive tract (P = 0.0024) and central nervous (P = 0.036) malignancies, which had achieved weak performance rates, were under-represented. Additionally, examining the histology distribution of mismatches revealed blood (36 mismatches), lung (32 mismatches), pancreas (11 mismatches) and breast (8 mismatches) to be the most frequent targets of incorrect predictions (Supplementary Material, Fig. S5).

Finally, we examined whether the ICOMS prediction mode affected the specificity rate (Fig. 3D, Supplementary Material, Table S6). ICOMS derived its diagnosis either from positive histology selection (primary diagnosis, 36%) or from exclusion of residual histotypes (secondary diagnosis, 24%) (Fig. 3B,D). In contrast, unavailability of an ICOMS diagnosis was caused either by unavailability of predictive mutations or by an incompatible combination of predictive mutations (primary and secondary diagnosis conflicts); the former alternative dominated (71%) the group of unavailable diagnoses in our data set (Fig. 3D). Notably, specificity was significantly higher for primary than for secondary diagnoses (P < 0.0001). This suggests that diagnosis modes can be interpreted as distinct levels of diagnosis validity (Fig. 3B,D and Supplementary Material, Table S6).

Systematic characterization of overlaps between meta-mutation spectra

Our cancer origin classification followed a winner-take-all strategy (67), which made it robust against minor changes in low-frequency histology-mutation associations as well as inadequate primary site annotations in the training set (Supplementary Material, Methods). Hence, merely the most probable primary site of each mutation was considered for prediction (Fig. 1B and Supplementary Material, Table S3), so that we assessed whether strong overlaps of mutation spectra might negatively affect ICOMS accuracy (Fig. 3C,D).

First, we systematically characterized the overlaps of specific meta-mutations (Supplementary Material, Fig. S6A and Table S7) by generalizing the definition of the rating $$g_\tau $$ (Supplementary Material, Fig. S6B and Methods). We detected nine inter-histologic overlaps of mutation spectra. In concordance with previous studies, we found activating mutations of BRAF, apart from their high frequency in melanoma (25–27), also in cell lines derived from colorectal (14,26,68) or thyroid cancer (69,70) (Supplementary Material, Fig. S6A and Table S7). Additionally, both in endometrium (39–44) and brain cancer cell lines, frameshift mutations of PTEN were present (Supplementary Material, Fig. S6A and Table S7), which have been characterized as the second most frequent mutation of glioblastoma multiforme (71–73) (prevalence: ∼20%).

Most strikingly, though, we found that malignancies of lung, endometrium, large intestine and prostate were strongly cognate from a genomic perspective. In total, 20 selective mutations were associated with all four cancer types; 27 and 33 further mutations were related to three or two of these histotypes, respectively (Supplementary Material, Fig. S6A and Table S7). As these histotypes covered 78.7% of all adenocarcinoma lines (Supplementary Material, Table S1), these shared mutations might be characteristic of adenocarcinomas in general and not merely be associated with a specific primary tumor site.

We further examined whether overlapping mutation spectra might negatively influence ICOMS prediction performance (Supplementary Material, Fig. S7). As expected, several frequent aberrations positively affected algorithm accuracy on the primary site they were associated with. However, we observed no significant negative effects for those cancer types, which had been ignored. In particular, BRAF-mutant melanomas showed significant (P < 0.001) increase in prediction correctness, whereas the accuracy did not differ for mutant samples of thyroid (P = 0.18) or large intestine (P = 0.76) origins (Supplementary Material, Fig. S7).

Independent validation of ICOMS algorithm accuracy

Next, we sought to confirm the accuracy of the ICOMS algorithm for disjoint choices of training and validation cohorts. We used training sets to derive genomic classifications and validation sets to assess ICOMS performance. This strategy was implemented by two orthogonal approaches.

First, we compiled 500 random segregations of the CCLE data set (16) into non-overlapping training (95%) and validation (5%) cohorts. From each 95%-training set, we derived a classification of cancer types; subsequently, we employed it to perform ICOMS on the complementary 5%-validation set with unaltered choices of thresholds. For each partition, we recorded ICOMS performance (availability and correctness) both on the total validation set as well as on each histology subset (Supplementary Material, Figs. S8 and S9).
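The repeated random partitioning can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the fixed seed and the rounding of the validation set size are choices made for the example and are not specified in the text. Deriving the classification from each training set and running ICOMS on each validation set are left out.

```python
import random
from typing import List, Sequence, Tuple


def random_partitions(samples: Sequence[str],
                      n_partitions: int = 500,
                      validation_fraction: float = 0.05,
                      seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Split `samples` repeatedly into disjoint (training, validation) sets."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    # Assumption: validation set size is the rounded fraction, at least one sample.
    n_validation = max(1, round(len(samples) * validation_fraction))
    partitions = []
    for _ in range(n_partitions):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # The first n_validation samples form the validation set, the rest the training set.
        partitions.append((shuffled[n_validation:], shuffled[:n_validation]))
    return partitions
```

With 905 cell lines, this rounding yields 45 validation and 860 training samples per partition; validation sets of different partitions may overlap, since each partition is drawn independently.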

As expected, ICOMS performance rates varied strongly, in particular for cancer types, which were weakly represented in the total cell line cohort. However, average rates were close to our previous prediction outcomes on the total CCLE data set (ΔTotal ∼ 2%, ΔMax = 6.2%) for 19 of 23 cancer types (Supplementary Material, Figs. S8 and S9). This provided a first confirmation of the general validity of our approach on cancer cell lines.

Secondly, we rated the performance of ICOMS on an independent validation panel of 431 patient tumor samples [TCGA database (11–15)], which comprised approximately equal portions of cancers originating from lung, kidney, breast and large intestine (Supplementary Material, Table S8). As the genes captured by the CCLE and TCGA platforms only partially overlapped, frequencies of some mutations were shifted between cell lines and primary tumors (Supplementary Material, Fig. S10). Hence, mutation-derived coefficients of TCGA samples calculated on the CCLE classification were naturally lower (e.g. large intestine, P = 2.9 × 10^−8; lung, P = 3.7 × 10^−4) than coefficients of CCLE cell lines, so that mutation coefficient thresholds had to be adapted for primary tumors (Supplementary Material, Table S9).

With this calibration, ICOMS achieved average sensitivity (58%) and specificity (71%) rates on TCGA samples, which were comparable with those on cell lines (Fig. 3E and Supplementary Material, Fig. S11 and Table S10). Concordantly, primary tumors of large intestine, lung and kidney also reached excellent outcomes (sensitivity ≥64%; specificity ≥68%), whereas prediction performance was still unsatisfactory for breast cancer (Fig. 3E, Supplementary Material, Fig. S11 and Table S10).

Clinical application of ICOMS algorithm

Several recent studies (74–79) have devoted enormous technical efforts to isolating circulating tumor cells (CTCs), which have sloughed off cancerous tissue, from peripheral patient blood. As typically only a few tumor cells circulate in patient blood [5–50 CTCs per teaspoon of blood (79)], current diagnostic techniques are limited.

Although the sensitivity and specificity rates that ICOMS reached on the CCLE and TCGA data sets were far too low for safe clinical application, we tested the clinical feasibility of ICOMS for this diagnostic purpose as a proof of concept (Supplementary Material, Fig. S12). We analyzed sequencing data (68 cancer-associated genes) of CTCs, which were collected from peripheral blood of two colorectal cancer patients (80), according to our CCLE classification (Supplementary Material, Table S3). Both samples were correctly allocated to large intestine. While the diagnosis of Patient 1 was unambiguous (secondary diagnosis), a diagnosis conflict (large intestine versus lung) had to be resolved for Patient 2 (Supplementary Material, Fig. S12).

Further application of mutation index classifications and ICOMS prediction algorithm

Finally, we tested two further potential applications of our classification and prediction procedures (Fig. 4 and Supplementary Material, Fig. S13). In analogy to the classification of general cancer types, we first used our algorithms to predict cancer sub-histologies.

For lung cancer, we found unique annotations for 148 of 172 cell lines (Supplementary Material, Table S1), from which we derived a genomic lung cancer classification (Fig. 4). Again, we found that several strong genomic predictors were in concordance with the literature (Fig. 4A). Mutations in KRAS (18,20,21,23), ALK (3,81,82), EGFR (1,2,18,20,21) and STK11 (18–20) were associated with the adenocarcinoma subtype, whereas alterations of RB1 (83,84) and PTEN (83,85) predicted the small-cell (SCLC) histology (Fig. 4A). Mutation indices of large-cell (P = 4.37 × 10^−7, LCLC) and squamous-cell (P = 1.93 × 10^−8, SQLC) carcinomas differed significantly from those of the remaining lung tumors. These subtypes have been reported to exhibit higher spontaneous mutation rates (83), resulting in fewer recurrent and characteristic mutations that could be used for sub-histology classification.

Figure 4.

(A) Biologically relevant meta-mutations of lung cancer subtypes are listed in mutation index descending order. (B) Bar plots of mutation-derived coefficients of lung cancer subtypes. Color intensities mirror accumulation of mutation coefficients in each field. (C) ICOMS accuracy on lung cancer subtypes. All cell lines are incorporated equally into the calculation of the average fractions. (D) Mapping of annotated (top, TCGA) to predicted (bottom, ICOMS) lung cancer subtypes of 104 primary lung adenocarcinoma samples. Samples, for which no computational diagnosis is available, are stained in navy blue.

Nevertheless, mutation indices still showed strong accumulation on the diagonal (Fig. 4B) and discriminatory powers of mutation indices were strongly significant for small-cell (P = 6.91 × 10^−29), large-cell (P = 2.89 × 10^−18), squamous-cell (P = 1.29 × 10^−8) and adenocarcinomas (P = 5.77 × 10^−17). Accordingly, ICOMS performance was satisfactory on lung cancer in general (sensitivity: 81.1%, specificity: 95.0%) (Fig. 4C). The same held true for all subtypes except for large-cell carcinoma, which was erroneously mapped to small-cell and adenocarcinomas in 33% of all diagnoses (Fig. 4C). On the TCGA lung cancer cohort (104 adenocarcinomas), ICOMS reached performance rates of 64.4% (sensitivity) and 97.0% (specificity), respectively (Fig. 4D).

Secondly, we sought to infer cell line response to specific kinase inhibitors from mutation spectra (Supplementary Material, Fig. S13). To this end, we involved screening data of three compounds (Erlotinib, PLX4720 and LBW242) across large panels of cancer cell lines (n = 450; 443; 450) (Supplementary Material, Fig. S13A,B,C). In concordance with other studies (1,2), mutant EGFR was identified as strongest sensitivity marker for the EGFR-inhibitor Erlotinib (Supplementary Material, Fig. S13D). As reported previously (4,5,86), the BRAF-inhibitor PLX4720 displayed strongest potency in BRAF-mutant lines (Supplementary Material, Fig. S13E). The IRS1 gene, known to modulate BRAF addiction (87), was also associated with PLX4720 sensitivity. In contrast, the TP53 gene emerged as strong predictor for PLX4720 resistance, as the majority of sensitive lines (72%) were derived from melanoma, which is known to exhibit low TP53 mutation frequency (88–90) (Supplementary Material, Fig. S13E). More genomic diversity was observed for the IAP-inhibitor LBW242 (Supplementary Material, Fig. S13F). Here, mutations in FBXW7 and NFKB2 pointed to alterations in pathways, which have been associated with the antineoplastic effect of IAP inhibitors in combination with steroids (91–94).

DISCUSSION

Our study developed a comprehensive framework of mathematical approaches, which systematically capture the information about primary tumor sites that is conveyed by the mutation spectra of cell lines (Fig. 1A). For this purpose, we developed a novel mutation index, by which we established a classification of site-specific genomic aberrations for each cancer type (Fig. 1B and Supplementary Material, Table S3). Based on this, we introduced ICOMS (Fig. 3A,B), a computational tool to infer the origin of a cell line merely from its mutation spectrum (Supplementary Material, Tables S6 and S10). We demonstrated that the range of potential ICOMS applications might be extended, including the discrimination of sub-histologies (Fig. 4) as well as the pre-selection of a suitable therapy regimen (Supplementary Material, Fig. S13). In summary, we derive the following three main conclusions from our analyses:

  1. Strong ICOMS performance on an independent cohort of primary tumors (Fig. 3E and Supplementary Material, Table S10) suggested that this approach may complement molecular-pathologic diagnostic procedures of tumor classification in ambiguous cases. As a proof of concept, we demonstrated its clinical feasibility on sequencing data of circulating tumor cells (Supplementary Material, Fig. S12).

  2. Tumor-specific mutation spectra differed strongly in size, recurrence and specificity between cancer types (Fig. 1B and Supplementary Material, Table S3). For this reason, cell lines of certain primary sites could be allocated more easily by ICOMS than others (Fig. 3C,E). This aspect should be considered in the further design and interpretation of cell culture studies; it provides a way to identify those cell lines that are genomically distinct and carry mutations that are characteristic of their respective origin.

  3. We employed mutation indices to characterize spectra of genomic aberrations that are characteristic of a particular cancer type. Thorough comparison to literature revealed that these spectra were mainly in concordance with previous studies (Fig. 1B and Supplementary Material, Mutation Annotation) (11–15,18–61). However, we also identified some novel promising aberrations, which had not been associated with the respective histology before. Yet, most of these novel associations between cancer origin and mutation are still on an unconfirmed level and require further validation by an independent sequencing method.

MATERIALS AND METHODS

Sequencing data and annotation of cell lines and patient tumor samples

We downloaded hybrid capture sequencing data of 905 cell lines (Supplementary Material, Table S1) as MAF-files from Broad-Novartis Cancer Cell Line Encyclopedia (CCLE) Download Portal (http://www.broadinstitute.org/ccle/home) (16). Additionally, we used CCLE cell line response data (half maximal growth inhibitory concentrations, GI50s) to the specific kinase inhibitors Erlotinib, PLX4720 and LBW242 (Supplementary Material, Fig. S13A,B,C). Further cell line annotation data (primary tumor site and sub-histology type) were compiled from the CCLE database (16) or, if not annotated, directly from the cell line provider (ATCC, DSMZ, HSRRB and ECACC) (Supplementary Material, Table S1).

Mutation calling data of 431 patient samples (MAF-files) were obtained from TCGA (11) Data Portal (Data Matrix Access, https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm) (Supplementary Material, Table S8). These data had been submitted by Broad Institute (Lung Adenocarcinoma and Kidney Renal Clear Cell Carcinoma) (12,13), Baylor College of Medicine (Colon Adenocarcinoma) (14) and Washington University School of Medicine (Breast Invasive Carcinoma) (15). For each of these cancer types, we selected ∼100 random samples for download (Supplementary Material, Table S8). Our analysis further included sequencing data (68 cancer-associated genes) of circulating tumor cells (Supplementary Material, Fig. S12), collected from peripheral blood of two colorectal cancer patients (80).

Characterization of mutations by recurrence, rating and mutation index

We called a mutation recurrent if at least one independent mutation was detectable in the data set such that the affected amino acids of the two aberrations were displaced by at most one position (Supplementary Material, Table S2).
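The recurrence criterion can be illustrated with a short Python sketch. It rests on assumptions that the text does not spell out: that two mutations must affect the same gene, that "independent" means observed in a different sample, and that each mutation is summarized by a single amino acid position. The `Mutation` record and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Mutation(NamedTuple):
    sample: str    # cell line or tumor sample identifier
    gene: str      # gene symbol
    position: int  # affected amino acid position


def recurrent_mutations(mutations: List[Mutation], max_shift: int = 1) -> List[Mutation]:
    """Return mutations that have a partner in another sample within `max_shift` residues."""
    by_gene: Dict[str, List[Mutation]] = defaultdict(list)
    for mutation in mutations:
        by_gene[mutation.gene].append(mutation)
    recurrent = []
    for mutation in mutations:
        for other in by_gene[mutation.gene]:
            # Assumption: an independent mutation stems from a different sample.
            if (other.sample != mutation.sample
                    and abs(other.position - mutation.position) <= max_shift):
                recurrent.append(mutation)
                break
    return recurrent
```

Under these assumptions, a KRAS alteration at position 12 in one sample and one at position 13 in another would both count as recurrent, whereas an isolated alteration at position 61 would not.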

For each recurrent mutation, we determined its rating and mutation index. The rating denoted the number of cancer types, for which the mutation was predominantly detected (mutation associated types) (Supplementary Material, Methods). The mutation index $$l_\tau $$ served as overall measure for the qualification of a mutation to discriminate between tissues. We derived the latter from a weighted product of conditional occurrence and histology likelihood, incorporating histology-specific occurrence (mutation frequency in mutation-associated histotypes), mutation-specific occurrence (conditional probability of a mutation-associated histology upon mutation detection) and mutation rendering ratio (ratio of frequency in mutation-associated histotypes and frequency in the $$g_\tau $$ next most common histologies) (Supplementary Material, Table S2 and Methods).

Based on this, we termed a mutation selective, if exactly one histology was discriminated from the others, i.e. $$g_\tau = 1$$ (Supplementary Material, Table S2, green). We considered a mutation as qualified for characterizing its associated cancer types, if $$l_\tau \ge 1.5$$. The latter threshold was derived from overall analysis of histology-wise specificity-sensitivity-patterns (receiver operating characteristic (ROC) curves (66), data not shown).
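The three ingredients of the mutation index named above can be computed as in the following sketch. The exact weighting that combines them into $$l_\tau $$ is given only in the Supplementary Material and is deliberately not reproduced here. The sketch further assumes that the mutation-associated histologies are the $$g_\tau $$ histologies with the highest mutation frequency and that frequencies are pooled across histologies; the function name and input layout are hypothetical.

```python
from typing import Dict, Tuple


def index_components(mutated: Dict[str, int], total: Dict[str, int],
                     g: int = 1) -> Tuple[float, float, float]:
    """Return (histology-specific occurrence, mutation-specific occurrence, rendering ratio).

    `mutated[h]` is the number of mutated samples and `total[h]` the number
    of all samples of histology h; `g` plays the role of the rating.
    """
    frequency = {h: mutated.get(h, 0) / total[h] for h in total}
    ranked = sorted(frequency, key=frequency.get, reverse=True)
    # Assumption: the g most frequently mutated histologies are the associated ones.
    associated, next_common = ranked[:g], ranked[g:2 * g]
    n_associated = sum(mutated.get(h, 0) for h in associated)
    n_all = sum(mutated.values())
    # Mutation frequency within the associated histologies (pooled).
    histology_occurrence = n_associated / sum(total[h] for h in associated)
    # Probability of an associated histology given that the mutation is detected.
    mutation_occurrence = n_associated / n_all if n_all else 0.0
    # Frequency in the g next most common histologies (pooled).
    n_next = sum(mutated.get(h, 0) for h in next_common)
    total_next = sum(total[h] for h in next_common)
    next_frequency = n_next / total_next if total_next else 0.0
    ratio = histology_occurrence / next_frequency if next_frequency else float("inf")
    return histology_occurrence, mutation_occurrence, ratio
```

A selective mutation corresponds to calling the function with `g = 1`, so that a single histology is contrasted with the next most common one.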

Choice of meta-mutations

For each cancer type, we compiled a characteristic list of pairwise disjoint meta-mutations (Supplementary Material, Table S3). The latter are collections of genomic alterations in the same gene that can be combined for predicting a primary site. We obtained them by maximizing a cardinality-penalized version of the mutation index $$l_\tau $$ (Supplementary Material, Methods). We included all optimized meta-mutations, which associated with a single cancer type, in our genomic classification (Supplementary Material, Table S3, green). For each combination of gene and primary site, this resulted in one of the following four relation types:

No Association: the gene bears no meta-mutations that are selective for the respective cancer type.

Amino Acid Specific Association: the gene bears a collection of at most three recurrent aberrations that are characteristic of the respective primary site. Dissimilar alterations of the gene can still be associated with other sites.

Mutation Recurrence Association: the group of recurrent mutations detected in the gene is linked to the respective tissue.

Complete Mutation Spectrum Association: all mutations in the gene, including sporadic alterations, are linked to the respective tissue (Supplementary Material, Table S3 and Methods).

Inferring cancer origins from mutation spectra (ICOMS)

The ICOMS prediction procedure is composed of three main steps (Fig. 3A,B). As input, it uses sample-wise mutation spectra, two coefficient thresholds $$\varepsilon _{\rm H} \ge \delta _{\rm H} $$ (Supplementary Material, Tables S5 and S9) for each histotype H (upper threshold $$\varepsilon _{\rm H} $$, lower threshold $$\delta _{\rm H} $$) and a partial histology order (Supplementary Material, Fig. S4).

Given a mutation spectrum, we computed histology-specific mutation coefficients $$m_{\rm H} $$ (Fig. 2 and Supplementary Material, Table S4) (step 1) and compared them with the upper and lower thresholds $$\varepsilon _{\rm H} $$ and $$\delta _{\rm H} $$ (step 2) (Supplementary Material, Table S5). A histology H was associated as primary histotype if the corresponding $$m_{\rm H} $$ exceeded $$\varepsilon _{\rm H} $$, whereas histologies for which $$m_{\rm H} $$ fell below $$\delta _{\rm H} $$ were excluded from further analysis of the mutation spectrum (excluded histotype) (Fig. 3A). For the residual histotypes, we did not draw any conclusion in this step (secondary histotype).

Finally, these assessments were combined to derive histology predictions (step 3) (Fig. 3B, Supplementary Material, Tables S6 and S10). If a unique histology had been primarily associated in step 2, it was chosen as computational diagnosis (primary diagnosis). If more than one existed, we tested whether the set of primary associations had a unique maximum with respect to the partial histology order defined in the input. If so (resolvable primary diagnosis conflict), the maximum was chosen as final prediction; if not, no diagnosis was available (unresolvable primary diagnosis conflict). If no primary histotypes were available, we examined the set of secondary associations further. If it contained exactly one histology, this histology was selected as prediction (secondary diagnosis); if the set was empty (no mutations available) or contained more than one histology, no diagnosis was available (secondary diagnosis conflict) (Fig. 3B,D and Supplementary Material, Methods).
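
Steps 2 and 3 of the procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the mutation coefficients of step 1 are taken as given, the partial order is passed as a set of pairs `(a, b)` meaning a > b, and the "unique maximum" is read as an element that is greater than every other primary association. The function name `icoms_predict` and the data layout are hypothetical.

```python
def icoms_predict(coeffs, upper, lower, order):
    """Sketch of ICOMS steps 2 and 3.

    coeffs: dict histotype -> mutation coefficient m_H (step 1, assumed given)
    upper:  dict histotype -> upper threshold epsilon_H
    lower:  dict histotype -> lower threshold delta_H
    order:  set of (a, b) pairs meaning a > b in the partial histology order
    Returns (predicted histotype or None, outcome label).
    """
    # Step 2: primary, excluded and secondary (residual) histotypes.
    primary = [h for h, m in coeffs.items() if m > upper[h]]
    secondary = [h for h, m in coeffs.items() if lower[h] <= m <= upper[h]]

    # Step 3: combine the assessments into one prediction.
    if len(primary) == 1:
        return primary[0], "primary diagnosis"
    if len(primary) > 1:
        # A maximum dominates all other primary associations (assumed reading).
        maxima = [h for h in primary
                  if all((h, other) in order for other in primary if other != h)]
        if len(maxima) == 1:
            return maxima[0], "resolvable primary diagnosis conflict"
        return None, "unresolvable primary diagnosis conflict"
    if len(secondary) == 1:
        return secondary[0], "secondary diagnosis"
    return None, "secondary diagnosis conflict"
```

For example, with all upper thresholds at 0.5 and all lower thresholds at 0.1, the coefficients `{"lung": 0.9, "skin": 0.8}` yield a primary conflict that is resolved only if the order contains a relation between the two histotypes.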

Parameter calibration for ICOMS algorithm

In order to optimize the upper and lower mutation coefficient thresholds (Supplementary Material, Tables S5 and S9), we calculated two ROC analogs (66) per primary site (Supplementary Material, Fig. S3 and Methods). More precisely, for each cancer type we continuously varied its associated mutation coefficient threshold. We determined the fraction of samples that exceeded this threshold, separately for the $$n_1 $$ cell lines of the respective cancer type (matched curve) and for the $$n_2 $$ cell lines of the complementary subset (control curve). Given $$(\varepsilon |y_1 )$$ and $$(\varepsilon |y_2 )$$ as coordinates of the matched and control curves, choosing $$\varepsilon $$ as upper threshold results in $$y_1 n_1 $$ correct and $$y_2 n_2 $$ incorrect primary associations. Hence, the ratio $$y_1 n_1 /y_2 n_2 $$ served as optimization term of the upper threshold. Analogously, given $$(\delta |y_1 )$$ and $$(\delta |y_2 )$$ as coordinates of the matched and control curves, setting the lower cutoff to $$\delta $$ produces $$(1 - y_1 )n_1 $$ incorrect and $$(1 - y_2 )n_2 $$ correct exclusions of cancer types from diagnosis, so that the quotient $$(n_2 - y_2 n_2 )/(n_1 - y_1 n_1 )$$ served as optimization term of the lower threshold (Supplementary Material, Fig. S3).
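
The two optimization terms can be made concrete with a short Python sketch. It is only an illustration under assumptions that the text does not specify: thresholds are searched over a finite candidate grid, a zero denominator is treated as an infinite ratio, ties are broken in favour of more correct decisions, and the lower threshold is restricted to values not exceeding the upper one. The function names are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Ratio that stays defined for a zero denominator (sketch assumption)."""
    if denominator > 0:
        return numerator / denominator
    return math.inf if numerator > 0 else 0.0


def calibrate_thresholds(matched, control, candidates):
    """Pick upper and lower thresholds for one cancer type.

    matched:    mutation coefficients of the n1 cell lines of this cancer type
    control:    mutation coefficients of the n2 remaining cell lines
    candidates: finite grid of threshold values to try (assumption)
    """
    n1, n2 = len(matched), len(control)

    def above(values, t):
        # Number of samples whose coefficient exceeds the threshold t.
        return sum(v > t for v in values)

    def upper_score(t):
        correct = above(matched, t)      # y1 * n1 correct primary associations
        incorrect = above(control, t)    # y2 * n2 incorrect primary associations
        return (_ratio(correct, incorrect), correct)

    def lower_score(t):
        correct = n2 - above(control, t)     # n2 - y2 * n2 correct exclusions
        incorrect = n1 - above(matched, t)   # n1 - y1 * n1 incorrect exclusions
        return (_ratio(correct, incorrect), correct)

    upper = max(candidates, key=upper_score)
    lower = max((t for t in candidates if t <= upper), key=lower_score)
    return upper, lower
```

Working with counts rather than fractions keeps the code aligned with the terms $$y_1 n_1 $$, $$y_2 n_2 $$, $$n_1 - y_1 n_1 $$ and $$n_2 - y_2 n_2 $$ used in the text.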

The primary partial order (Supplementary Material, Fig. S4) was likewise chosen by a greedy algorithm, which iteratively selected a new relation by maximizing the difference between correct and incorrect primary conflict solutions. More precisely, starting from an empty set, a relation $$H_1 \gt H_2 $$ was added if the ratio of correct to incorrect primary conflict solutions derived from the respective additional relation was at least 3. If multiple alternatives for the subsequent choice existed, we selected the relation for which the difference between correct and incorrect solutions was maximal. Finally, the primary partial order was defined as the convex hull of the selected relations (Supplementary Material, Fig. S4 and Methods).
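
A possible reading of this greedy selection is sketched below in Python. Several points are assumptions rather than details from the text: the caller supplies a hypothetical `evaluate(relation, selected)` callback that returns the numbers of correct and incorrect primary conflict solutions gained by a candidate relation, relations that would create a cycle are skipped, and the final "convex hull" is interpreted as the transitive closure of the selected relations.

```python
def transitive_closure(relations):
    """Smallest transitive relation containing the given (a, b) pairs."""
    closure = set(relations)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


def greedy_partial_order(candidates, evaluate, min_ratio=3.0):
    """Greedily select relations (h1, h2), each meaning h1 > h2.

    evaluate(relation, selected) -> (correct, incorrect) is a hypothetical
    callback counting the primary conflict solutions the relation would add.
    """
    selected = set()
    remaining = set(candidates)
    while True:
        best, best_gain = None, None
        for rel in sorted(remaining):
            closure = transitive_closure(selected | {rel})
            if any(a == b for a, b in closure):
                continue  # relation would contradict the order built so far
            correct, incorrect = evaluate(rel, selected)
            # Require a correct-to-incorrect ratio of at least min_ratio.
            if correct <= 0 or correct < min_ratio * incorrect:
                continue
            gain = correct - incorrect
            if best is None or gain > best_gain:
                best, best_gain = rel, gain
        if best is None:
            return transitive_closure(selected)
        selected.add(best)
        remaining.discard(best)
```

Passing the already selected relations to the callback lets the counts change as conflicts become resolved, which is what makes the selection iterative rather than a single ranking.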

Inferring compound sensitivity from mutation spectra

We divided cell lines into resistant and sensitive groups according to their response (half-maximal growth inhibitory concentrations, GI50) to the specific kinase inhibitors Erlotinib (EGFR), PLX4720 (BRAF) or LBW242 (IAP) (Supplementary Material, Fig. S13A,B,C). As expected, resistant groups were strongly over-represented (∼97%) in the total cell line cohort. For each compound, we therefore compiled 50 random cell line panels, each of which included all sensitive lines as well as an equal number of resistant cell lines. For each random panel, we determined a genomic sensitivity classification comprising meta-mutations and their mutation indices. We used these classifications to calculate weighted averages of mutation indices, in which resistant and sensitive groups were weighted equally but with opposite signs (Supplementary Material, Fig. S13D,E,F). Similarly, we calculated the weighted fraction of subsets (selective fraction) that rendered a meta-mutation a predictor for either of the two groups. Finally, we sorted the meta-mutations of each group by mutation index to identify the strongest genomic markers for compound sensitivity or resistance (Supplementary Material, Fig. S13D,E,F).
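
The construction of the balanced random panels can be sketched in a few lines of Python. The function name, the fixed seed and the use of sampling without replacement are assumptions made for illustration; the text only states that each panel contains all sensitive lines and an equal number of resistant lines.

```python
import random


def balanced_panels(sensitive, resistant, n_panels=50, seed=0):
    """Build random panels that pair all sensitive lines with equally many
    randomly drawn resistant lines (drawn without replacement per panel)."""
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    k = len(sensitive)
    if k > len(resistant):
        raise ValueError("need at least as many resistant as sensitive lines")
    return [list(sensitive) + rng.sample(list(resistant), k)
            for _ in range(n_panels)]
```

Each panel is thus balanced between the two groups, which counteracts the strong over-representation of resistant lines in the full cohort.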

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

Felix Dietlein is supported by the Deutsche Krebshilfe through Mildred-Scheel-Doktorandenprogramm (Grant 110770).

Conflict of Interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Drs. Markus Dietlein, Alexander Drzezga, Florian Malchers, Martin Peifer, H. Christian Reinhardt and Roman K. Thomas for support and fruitful discussions.

REFERENCES

1. Lynch T.J., Bell D.W., Sordella R., Gurubhagavatula S., Okimoto R.A., Brannigan B.W., Harris P.L., Haserlat S.M., Supko J.G., Haluska F.G., et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N. Engl. J. Med., 2004, vol. 350 (pg. 2129-2139)
2. Rosell R., Carcereny E., Gervais R., Vergnenegre A., Massuti B., Felip E., Palmero R., Garcia-Gomez R., Pallares C., Sanches J.M., et al. Erlotinib versus standard chemotherapy as first-line treatment for European patients with advanced EGFR mutation-positive non-small-cell lung cancer (EURTAC): a multicentre, open-label, randomised phase 3 trial. Lancet Oncol., 2012, vol. 13 (pg. 239-246)
3. Kwak E.L., Bang Y.J., Camidge D.R., Shaw A.T., Solomon B., Maki R.G., Ou S.H., Dezube B.J., Jänne P.A., Costa D.B., et al. Anaplastic lymphoma kinase inhibition in non-small-cell lung cancer. N. Engl. J. Med., 2010, vol. 363 (pg. 1693-1703)
4. Flaherty K.T., Infante J.R., Daud A., Gonzalez R., Kefford R.F., Sosman J., Hamid O., Schuchter L., Cebon J., Ibrahim N., et al. Combined BRAF and MEK inhibition in melanoma with BRAF V600 mutations. N. Engl. J. Med., 2012, vol. 367 (pg. 1694-1703)
5. Bollag G., Hirth P., Tsai J., Zhang J., Ibrahim P.N., Cho H., Spevak W., Zhang C., Zhang Y., Habets G., et al. Clinical efficacy of a RAF inhibitor needs broad target blockade in BRAF-mutant melanoma. Nature, 2010, vol. 467 (pg. 596-599)
6. Verma S., Miles D., Gianni L., Krop I.E., Welslau M., Baselga J., Pegram M., Oh D.Y., Diéras V., Guardino E., et al. Trastuzumab emtansine for HER2-positive advanced breast cancer. N. Engl. J. Med., 2012, vol. 367 (pg. 1783-1791)
7. Harris T. Gene and drug matrix for personalized cancer therapy. Nat. Rev. Drug Discov., 2010, vol. 9 pg. 660
8. Garnett M.J., Edelman E.J., Heidorn S.J., Greenman C.D., Dastur A., Lau K.W., Greninger P., Thompson I.R., Luo X., Soares J., et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature, 2012, vol. 483 (pg. 570-575)
9. Sos M.L., Dietlein F., Peifer M., Schöttle J., Balke-Want H., Müller C., Koker M., Richters A., Heynck S., Malchers F., et al. A framework for identification of actionable cancer genome dependencies in small cell lung cancer. Proc. Natl. Acad. Sci. USA, 2012, vol. 109 (pg. 17034-17039)
10. Sos M.L., Michel K., Zander T., Weiss J., Frommolt P., Peifer M., Li D., Ullrich R., Koker M., Fischer M., et al. Predicting drug susceptibility of non-small cell lung cancers based on genetic lesions. J. Clin. Invest., 2009, vol. 119 (pg. 1727-1740)
11. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 2008, vol. 455 (pg. 1061-1068)
12. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature, 2012, vol. 489 (pg. 519-525)
13. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature, 2013, vol. 499 (pg. 43-49)
14. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature, 2012, vol. 487 (pg. 330-337)
15. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature, 2012, vol. 490 (pg. 61-70)
16. Barretina J., Caponigro G., Stransky N., Venkatesan K., Margolin A.A., Kim S., Wilson C.J., Lehár J., Kryukov G.V., Sonkin D., et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 2012, vol. 483 (pg. 603-607)
17. Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova A., Bork P., Kondrashov A.S., Sunyaev S.R. A method and server for predicting damaging missense mutations. Nat. Methods, 2010, vol. 7 (pg. 248-249)
18. Ding L., Getz G., Wheeler D.A., Mardis E.R., McLellan M.D., Cibulskis K., Sougnez C., Greulich H., Muzny D.M., Morgan M.B., et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature, 2008, vol. 455 (pg. 1069-1075)
19. Sanchez-Cespedes M., Parrella P., Esteller M., Nomoto S., Trink B., Engles J.M., Westra W.H., Herman J.G., Sidransky D. Inactivation of LKB1/STK11 is a common event in adenocarcinomas of the lung. Cancer Res., 2002, vol. 62 (pg. 3659-3662)
20. Gao B., Sun Y., Zhang J., Ren Y., Fang R., Han X., Shen L., Liu X.Y., Pao W., Chen H., et al. Spectrum of LKB1, EGFR, and KRAS mutations in Chinese lung adenocarcinomas. J. Thorac. Oncol., 2010, vol. 5 (pg. 1130-1135)
21. Dogan S., Shen R., Ang D.C., Johnson M.L., D'Angelo S.P., Paik P.K., Brzostowski E.B., Riely G.J., Kris M.G., Zakowski M.F., et al. Molecular epidemiology of EGFR and KRAS mutations in 3,026 lung adenocarcinomas: higher susceptibility of women to smoking-related KRAS-mutant cancers. Clin. Cancer Res., 2012, vol. 18 (pg. 6169-6177)
22. Liu C.X., Li Y., Obermoeller-McCormick L.M., Schwartz A.L., Bu G. The putative tumor suppressor LRP1B, a novel member of the low density lipoprotein (LDL) receptor family, exhibits both overlapping and distinct properties with the LDL receptor-related protein. J. Biol. Chem., 2001, vol. 276 (pg. 28889-28896)
23. Riely G.J., Marks J., Pao W. KRAS mutations in non-small cell lung cancer. Proc. Am. Thorac. Soc., 2009, vol. 6 (pg. 201-205)
24. Hammerman P.S., Sos M.L., Ramos A.H., Xu C., Dutt A., Zhou W., Brace L.E., Woods B.A., Lin W., Zhang J., et al. Mutations in the DDR2 kinase gene identify a novel therapeutic target in squamous cell lung cancer. Cancer Discov., 2011, vol. 1 (pg. 78-89)
25. Davies H., Bignell G.R., Cox C., Stephens P., Edkins S., Clegg S., Teague J., Woffendin H., Garnett M.J., Bottomley W., et al. Mutations of the BRAF gene in human cancer. Nature, 2002, vol. 417 (pg. 949-954)
26. Safaee Ardekani G., Jafarnejad S.M., Tan L., Saeedi A., Li G. The prognostic value of BRAF mutation in colorectal cancer and melanoma: a systematic review and meta-analysis. PLoS One, 2012, vol. 7 pg. e47054
27. Chapman P.B., Hauschild A., Robert C., Haanen J.B., Ascierto P., Larkin J., Dummer R., Garbe C., Testori A., Maio M., et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med., 2011, vol. 364 (pg. 2507-2516)
28. Bowen D.T., Frew M.E., Hills R., Gale R.E., Wheatley K., Groves M.J., Langabeer S.E., Kottaridis P.D., Moorman A.V., Burnett A.K., et al. RAS mutation in acute myeloid leukemia is associated with distinct cytogenetic subgroups but does not influence outcome in patients younger than 60 years. Blood, 2005, vol. 106 (pg. 2113-2119)
29. Bacher U., Haferlach T., Schoch C., Kern W., Schnittger S. Implications of NRAS mutations in AML: a study of 2502 patients. Blood, 2006, vol. 107 (pg. 3847-3853)
30. Farr C.J., Saiki R.K., Erlich H.A., McCormick F., Marshall C.J. Analysis of RAS gene mutations in acute myeloid leukemia by polymerase chain reaction and oligonucleotide probes. Proc. Natl. Acad. Sci. USA, 1988, vol. 85 (pg. 1629-1633)
31. Neubauer A., Maharry K., Mrózek K., Thiede C., Marcucci G., Paschka P., Mayer R.J., Larson R.A., Liu E.T., Bloomfield C.D. Patients with acute myeloid leukemia and RAS mutations benefit most from postremission high-dose cytarabine: a Cancer and Leukemia Group B study. J. Clin. Oncol., 2008, vol. 26 (pg. 4603-4609)
32. Bahram F., von der Lehr N., Cetinkaya C., Larsson L.G. c-Myc hot spot mutations in lymphomas result in inefficient ubiquitination and decreased proteasome-mediated turnover. Blood, 2000, vol. 95 (pg. 2104-2110)
33. Bhatia K., Spangler G., Gaidano G., Hamdy N., Dalla-Favera R., Magrath I. Mutations in the coding region of c-myc occur frequently in acquired immunodeficiency syndrome-associated lymphomas. Blood, 1994, vol. 84 (pg. 883-888)
34. Smith-Sørensen B., Hijmans E.M., Beijersbergen R.L., Bernards R. Functional analysis of Burkitt's lymphoma mutant c-Myc proteins. J. Biol. Chem., 1996, vol. 271 (pg. 5513-5518)
35. Le Gallo M., O'Hara A.J., Rudd M.L., Urick M.E., Hansen N.F., O'Neil N.J., Price J.C., Zhang S., England B.M., Godwin A.K., et al. Exome sequencing of serous endometrial tumors identifies recurrent somatic mutations in chromatin-remodeling and ubiquitin ligase complex genes. Nat. Genet., 2012, vol. 44 (pg. 1310-1315)
36. Maeda D., Shih I.E.M. Pathogenesis and the role of ARID1A mutation in endometriosis-related ovarian neoplasms. Adv. Anat. Pathol., 2013, vol. 20 (pg. 45-52)
37. Wu J.N., Roberts C.W. ARID1A mutations in cancer: another epigenetic tumor suppressor. Cancer Discov., 2013, vol. 3 (pg. 35-43)
38. Wiegand K.C., Lee A.F., Al-Agha O.M., Chow C., Kalloger S.E., Scott D.W., Steidl C., Wiseman S.M., Gascoyne R.D., Gilks B., et al. Loss of BAF250a (ARID1A) is frequent in high-grade endometrial carcinomas. J. Pathol., 2011, vol. 224 (pg. 328-333)
39. Risinger J.I., Hayes A.K., Berchuck A., Barrett J.C. PTEN/MMAC1 mutations in endometrial cancers. Cancer Res., 1997, vol. 57 (pg. 4736-4738)
40. Kong D., Suzuki A., Zou T.T., Sakurada A., Kemp L.W., Wakatsuki S., Yokoyama T., Yamakawa H., Furukawa T., Sato M., et al. PTEN1 is frequently mutated in primary endometrial carcinomas. Nat. Genet., 1997, vol. 17 (pg. 143-144)
41. Bussaglia E., del Rio E., Matias-Guiu X., Prat J. PTEN mutations in endometrial carcinomas: a molecular and clinicopathologic analysis of 38 cases. Hum. Pathol., 2000, vol. 31 (pg. 312-317)
42. Dedes K.J., Wetterskog D., Mendes-Pereira A.M., Natrajan R., Lambros M.B., Geyer F.C., Vatcheva R., Savage K., Mackay A., Lord C.J., et al. PTEN deficiency in endometrioid endometrial adenocarcinomas predicts sensitivity to PARP inhibitors. Sci. Transl. Med., 2010, vol. 2 pg. 53ra75
43. Risinger J.I., Hayes K., Maxwell G.L., Carney M.E., Dodge R.K., Barrett J.C., Berchuck A. PTEN mutation in endometrial cancers is associated with favorable clinical and pathologic characteristics. Clin. Cancer Res., 1998, vol. 4 (pg. 3005-3010)
44. Tashiro H., Blazes M.S., Wu R., Cho K.R., Bose S., Wang S.I., Li J., Parsons R., Ellenson L.H. Mutations in PTEN are frequent in endometrial carcinoma but rare in other common gynecological malignancies. Cancer Res., 1997, vol. 57 (pg. 3935-3940)
45. Rowan A.J., Lamlum H., Ilyas M., Wheeler J., Straub J., Papadopoulou A., Bicknell D., Bodmer W.F., Tomlinson I.P. APC mutations in sporadic colorectal tumors: a mutational ‘hotspot’ and interdependence of the ‘two hits’. Proc. Natl. Acad. Sci. USA, 2000, vol. 97 (pg. 3352-3357)
46. Fodde R. The APC gene in colorectal cancer. Eur. J. Cancer, 2002, vol. 38 (pg. 867-871)
47. Yang J., Zhang W., Evans P.M., Chen X., He X., Liu C. Adenomatous polyposis coli (APC) differentially regulates beta-catenin phosphorylation and ubiquitination in colon cancer cells. J. Biol. Chem., 2006, vol. 281 (pg. 17751-17757)
48. Beck S.E., Jung B.H., Fiorino A., Gomez J., Rosario E.D., Cabrera B.L., Huang S.C., Chow J.Y., Carethers J.M. Bone morphogenetic protein signaling and growth suppression in colon cancer. Am. J. Physiol. Gastrointest. Liver Physiol., 2006, vol. 291 (pg. 135-145)
49. Park S.W., Hur S.Y., Yoo N.J., Lee S.H. Somatic frameshift mutations of bone morphogenic protein receptor 2 gene in gastric and colorectal cancers with microsatellite instability. APMIS, 2010, vol. 118 (pg. 824-829)
50. Slattery M.L., Lundgreen A., Herrick J.S., Kadlubar S., Caan B.J., Potter J.D., Wolff R.K. Genetic variation in bone morphogenetic protein and colon and rectal cancer. Int. J. Cancer, 2012, vol. 130 (pg. 653-664)
51. Spambalg D., Sharifi N., Elisei R., Gross J.L., Medeiros-Neto G., Fagin J.A. Structural studies of the thyrotropin receptor and Gs alpha in human thyroid cancers: low prevalence of mutations predicts infrequent involvement in malignant transformation. J. Clin. Endocrinol. Metab., 1996, vol. 81 (pg. 3898-3901)
52. Matsuo K., Friedman E., Gejman P.V., Fagin J.A. The thyrotropin receptor (TSH-R) is not an oncogene for thyroid tumors: structural studies of the TSH-R and the alpha-subunit of Gs in human thyroid neoplasms. J. Clin. Endocrinol. Metab., 1993, vol. 76 (pg. 1446-1451)
53. Russo D., Arturi F., Schlumberger M., Caillou B., Monier R., Filetti S., Suárez H.G. Activating mutations of the TSH receptor in differentiated thyroid carcinomas. Oncogene, 1995, vol. 11 (pg. 1907-1911)
54. Cetani F., Tonacchera M., Pinchera A., Barsacchi R., Basolo F., Miccoli P., Pacini F. Genetic analysis of the TSH receptor gene in differentiated human thyroid carcinomas. J. Endocrinol. Invest., 1999, vol. 22 (pg. 273-278)
55. Tajima T., Tsubaki J., Fujieda K. Two novel mutations in the thyroid peroxidase gene with goitrous hypothyroidism. Endocr. J., 2005, vol. 52 (pg. 643-645)
56. Altmann K., Hermanns P., Mühlenberg R., Fricke-Otto S., Wentzell R., Pohlenz J. Congenital goitrous primary hypothyroidism in two German families caused by novel thyroid peroxidase (TPO) gene mutations. Exp. Clin. Endocrinol. Diabetes, 2013, vol. 121 (pg. 343-346)
57. Moore L.E., Nickerson M.L., Brennan P., Toro J.R., Jaeger E., Rinsky J., Han S.S., Zaridze D., Matveev V., Janout V., et al. Von Hippel-Lindau (VHL) inactivation in sporadic clear cell renal cancer: associations with germline VHL polymorphisms and etiologic risk factors. PLoS Genet., 2011, vol. 7 pg. e1002312
58. Hemminki K., Jiang Y., Ma X., Yang K., Egevad L., Lindblad P. Molecular epidemiology of VHL gene mutations in renal cell carcinoma patients: relation to dietary and other factors. Carcinogenesis, 2002, vol. 23 (pg. 809-815)
59. Schietke R.E., Hackenbeck T., Tran M., Günther R., Klanke B., Warnecke C.L., Knaup K.X., Shukla D., Rosenberger C., Koesters R., et al. Renal tubular HIF-2α expression requires VHL inactivation and causes fibrosis and cysts. PLoS One, 2012, vol. 7 pg. e31034
60. Kim W.Y., Kaelin W.G. Role of VHL gene mutation in human cancer. J. Clin. Oncol., 2004, vol. 22 (pg. 4991-5004)
61. Cheng L., Zhang S., MacLennan G.T., Lopez-Beltran A., Montironi R. Molecular and cytogenetic insights into the pathogenesis, classification, differential diagnosis, and prognosis of renal epithelial neoplasms. Hum. Pathol., 2009, vol. 40 (pg. 10-29)
62. Sharma S.V., Bell D.W., Settleman J., Haber D.A. Epidermal growth factor receptor mutations in lung cancer. Nat. Rev. Cancer, 2007, vol. 7 (pg. 169-181)
63. Kim S.T., Lim do H., Jang K.T., Lim T., Lee J., Choi Y.L., Jang H.L., Yi J.H., Baek K.K., Park S.H., et al. Impact of KRAS mutations on clinical outcomes in pancreatic cancer patients treated with first-line gemcitabine-based chemotherapy. Mol. Cancer Ther., 2011, vol. 10 (pg. 1993-1999)
64. Laghi L., Orbetegli O., Bianchi P., Zerbi A., Di Carlo V., Boland C.R., Malesci A. Common occurrence of multiple K-RAS mutations in pancreatic cancers with associated precursor lesions and in biliary cancers. Oncogene, 2002, vol. 21 (pg. 4301-4306)
65. Collins M.A., Bednar F., Zhang Y., Brisset J.C., Galbán S., Galbán C.J., Rakshit S., Flannagan K.S., Adsay N.V., Pasca di Magliano M. Oncogenic Kras is required for both the initiation and maintenance of pancreatic cancer in mice. J. Clin. Invest., 2012, vol. 122 (pg. 639-653)
66. Sing T., Sander O., Beerenwinkel N., Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics, 2005, vol. 21 (pg. 3940-3941)
67. Maass W. On the computational power of winner-take-all. Neural Comput., 2000, vol. 12 (pg. 2519-2535)
68. Kalady M.F., Dejulius K.L., Sanchez J.A., Jarrar A., Liu X., Manilich E., Skacel M., Church J.M. BRAF mutations in colorectal cancer are associated with distinct clinical characteristics and worse prognosis. Dis. Colon Rectum, 2012, vol. 55 (pg. 128-133)
69. Xing M. BRAF mutation in thyroid cancer. Endocr. Relat. Cancer, 2005, vol. 12 (pg. 245-246)
70. Tufano R.P., Teixeira G.V., Bishop J., Carson K.A., Xing M. BRAF mutation in papillary thyroid cancer and its value in tailoring initial treatment: a systematic review and meta-analysis. Medicine (Baltimore), 2012, vol. 91 (pg. 274-286)
71. Li J., Yen C., Liaw D., Podsypanina K., Bose S., Wang S.I., Puc J., Miliaresis C., Rodgers L., McCombie R., et al. PTEN, a putative protein tyrosine phosphatase gene mutated in human brain, breast, and prostate cancer. Science, 1997, vol. 275 (pg. 1943-1947)
72. Duerr E.M., Rollbrocker B., Hayashi Y., Peters N., Meyer-Puttlitz B., Louis D.N., Schramm J., Wiestler O.D., Parsons R., Eng C., et al. PTEN mutations in gliomas and glioneuronal tumors. Oncogene, 1998, vol. 16 (pg. 2259-2264)
73. Yang Y., Shao N., Luo G., Li L., Zheng L., Nilsson-Ehle P., Xu N. Mutations of PTEN gene in gliomas correlate to tumor differentiation and short-term survival rate. Anticancer Res., 2010, vol. 30 (pg. 981-985)
74. Vona G., Sabile A., Louha M., Sitruk V., Romana S., Schütze K., Capron F., Franco D., Pazzagli M., Vekemans M., et al. Isolation by size of epithelial tumor cells: a new method for the immunomorphological and molecular characterization of circulating tumor cells. Am. J. Pathol., 2000, vol. 156 (pg. 57-63)
75. Nagrath S., Sequist L.V., Maheswaran S., Bell D.W., Irimia D., Ulkus L., Smith M.R., Kwak E.L., Digumarthy S., Muzikansky A., et al. Isolation of rare circulating tumour cells in cancer patients by microchip technology. Nature, 2007, vol. 450 (pg. 1235-1239)
76. Yu M., Ting D.T., Stott S.L., Wittner B.S., Ozsolak F., Paul S., Ciciliano J.C., Smas M.E., Winokur D., Gilman A.J., et al. RNA sequencing of pancreatic circulating tumour cells implicates WNT signalling in metastasis. Nature, 2012, vol. 487 (pg. 510-513)
77. Maheswaran S., Sequist L.V., Nagrath S., Ulkus L., Brannigan B., Collura C.V., Inserra E., Diederichs S., Iafrate A.J., Bell D.W., et al. Detection of mutations in EGFR in circulating lung-cancer cells. N. Engl. J. Med., 2008, vol. 359 (pg. 366-377)
78. Hou H.W., Warkiani M.E., Khoo B.L., Li Z.R., Soo R.A., Tan D.S., Lim W.T., Han J., Bhagat A.A., Lim C.T. Isolation and retrieval of circulating tumor cells using centrifugal forces. Sci. Rep., 2013, vol. 3 pg. 1259
79. Williams S.C. Circulating tumor cells. Proc. Natl. Acad. Sci. USA, 2013, vol. 110 pg. 4861
80. Heitzer E., Auer M., Gasch C., Pichler M., Ulz P., Hoffmann E.M., Lax S., Waldispuehl-Geigl J., Mauermann O., Lackner C., et al. Complex tumor genomes inferred from single circulating tumor cells by array-CGH and next-generation sequencing. Cancer Res., 2013, vol. 73 (pg. 2965-2975)
81. Soda M., Choi Y.L., Enomoto M., Takada S., Yamashita Y., Ishikawa S., Fujiwara S., Watanabe H., Kurashina K., Hatanaka H., et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature, 2007, vol. 448 (pg. 561-566)
82. Koivunen J.P., Mermel C., Zejnullahu K., Murphy C., Lifshits E., Holmes A.J., Choi H.G., Kim J., Chiang D., Thomas R., et al. EML4-ALK fusion gene and efficacy of an ALK kinase inhibitor in lung cancer. Clin. Cancer Res., 2008, vol. 14 (pg. 4275-4283)
83. Peifer M., Fernandez-Cuesta L., Sos M.L., George J., Seidel D., Kasper L.H., Plenker D., Leenders F., Sun R., Zander T., et al. Integrative genome analyses identify key somatic driver mutations of small-cell lung cancer. Nat. Genet., 2012, vol. 44 (pg. 1104-1110)
84. Harbour J.W., Lai S.L., Whang-Peng J., Gazdar A.F., Minna J.D., Kaye F.J. Abnormalities in structure and expression of the human retinoblastoma gene in SCLC. Science, 1988, vol. 241 (pg. 353-357)
85. Kim S.K., Su L.K., Oh Y., Kemp B.L., Hong W.K., Mao L. Alterations of PTEN/MMAC1, a candidate tumor suppressor gene, and its homologue, PTH2, in small cell lung cancer cell lines. Oncogene, 1998, vol. 16 (pg. 89-93)
86. Tsai J., Lee J.T., Wang W., Zhang J., Cho H., Mamo S., Bremer R., Gillette S., Kong J., Haass N.K., et al. Discovery of a selective inhibitor of oncogenic B-Raf kinase with potent antimelanoma activity. Proc. Natl. Acad. Sci. USA, 2008, vol. 105 (pg. 3041-3046)
87. Reuveni H., Flashner-Abramson E., Steiner L., Makedonski K., Song R., Shir A., Herlyn M., Bar-Eli M., Levitzki A. Therapeutic destruction of insulin receptor substrates for cancer treatment. Cancer Res., 2013, vol. 73 (pg. 4383-4394)
88. Castresana J.S., Rubio M.P., Vazquez J.J., Idoate M., Sober A.J., Seizinger B.R., Barnhill R.L. Lack of allelic deletion and point mutation as mechanisms of p53 activation in human malignant melanoma. Int. J. Cancer, 1993, vol. 55 (pg. 562-565)
89. Papp T., Jafari M., Schiffmann D. Lack of p53 mutations and loss of heterozygosity in non-cultured human melanocytic lesions. J. Cancer Res. Clin. Oncol., 1996, vol. 122 (pg. 541-548)
90. Houben R., Hesbacher S., Schmid C.P., Kauczok C.S., Flohr U., Haferkamp S., Müller C.S., Schramma D., Wischhusen J., Becker J.C. High-level expression of wild-type p53 in melanoma cells is frequently associated with inactivity in p53 reporter gene assays. PLoS One, 2011, vol. 6 pg. e22096
91. Sionov R.V. MicroRNAs and glucocorticoid-induced apoptosis in lymphoid malignancies. ISRN Hematol., 2013, vol. 2013 pg. 348212
92. Petrucci E., Pasquini L., Bernabei M., Saulle E., Biffoni M., Accarpio F., Sibio S., Di Giorgio A., Di Donato V., Casorelli A., et al. A small molecule SMAC mimic LBW242 potentiates TRAIL- and anticancer drug-mediated cell death of ovarian cancer cells. PLoS One, 2012, vol. 7 pg. e35073
93. Eschenburg G., Eggert A., Schramm A., Lode H.N., Hundsdoerfer P. Smac mimetic LBW242 sensitizes XIAP-overexpressing neuroblastoma cells for TNF-α-independent apoptosis. Cancer Res., 2012, vol. 72 (pg. 2645-2656)
94. Dai Y., Rahmani M., Dent P., Grant S. Blockade of histone deacetylase inhibitor-induced relA/p65 acetylation and NF-kappaB activation potentiates apoptosis in leukemia cells through a process mediated by oxidative damage, XIAP downregulation, and c-Jun N-terminal kinase 1 activation. Mol. Cell. Biol., 2005, vol. 25 (pg. 5429-5444)
