Abstract

The majority of compounds designed against cancer drug targets do not progress to become approved drugs, mainly due to lack of efficacy and/or unmanageable toxicity. Robust target evaluation is therefore required before progressing through the drug discovery process to reduce the high attrition rate. There is a wealth of publicly available databases that can be mined to generate data as part of a target evaluation. It can, however, be challenging to learn what databases are available, how and when they should be used, and to understand the associated limitations. Here, we have compiled and present key, freely accessible and easy-to-use databases that house informative datasets from in vitro, in vivo and clinical studies. We also highlight comprehensive target review databases that aim to bring together information from multiple sources into one-stop portals. In the post-genomics era, a key objective is to exploit the extensive cell, animal and patient characterization datasets in order to deliver precision medicine on a patient-specific basis. Effective utilization of the highlighted databases will go some way towards supporting the cancer research community in achieving these aims.

INTRODUCTION

Oncology drug discovery aims to develop therapeutic agents that modulate biological processes to inhibit progression of cancer in the clinical setting. Drug discovery is a resource-intensive process that can broadly be broken down into four steps: target identification and validation, hit identification and validation, lead identification and optimization, and clinical development (1). Each subsequent step is more resource intensive than the last, requiring significant financial investment. The average time for a new drug to go from the start of the process to approval is 10–15 years with costs exceeding $1 billion (2,3). Despite the significant resource involved, over 90% of new oncology agents do not become approved drugs, mainly due to lack of efficacy and/or unmanageable toxicity (4). It is therefore essential that sufficient efforts are invested in the evaluation of a novel target before a project progresses into drug discovery. The main aims of target evaluation (Figure 1) are to establish whether available information suggests that an agent modulating target activity is likely to be efficacious and tolerable in the clinical setting, identify the patient population to target (clinical positioning) and determine whether the project is technically feasible (tractability) (5).

Target evaluation summary. Schematic diagram illustrating tractability, tolerability, efficacy and clinical positioning information that is required as part of a target evaluation assessment to enable decisions to be made regarding progression of a project into the full drug discovery process. Information can be obtained from a variety of sources, including online databases as highlighted.
Figure 1.

As efforts to identify new therapeutic interventions in oncology have moved from broad-spectrum cytotoxic agents to selective agents against specific targets dysregulated in disease or agents that modify the tumour immune response (6), our need to understand target biology in different settings has drastically increased. The complex nature of cancer progression and the interplay between tumour intrinsic effects and interactions with the tumour microenvironment make it extremely challenging to generate all the information required for robust evaluation of a target. For novel targets, it is often the case that there is not a sufficient depth of literature available. However, in the post-genomics era, the extensive availability of omics datasets from cell lines, animal models and patient samples means that detailed information is available to support comprehensive target evaluation. As detailed in Figure 1, there are several sources for information that can inform tractability, tolerability, efficacy and clinical positioning assessments. One valuable source that we believe is currently under-utilized is the wealth of publicly available databases that can be mined to generate these data. Database mining can help the cancer research community achieve two key objectives. The first is to ensure that targets are robustly evaluated prior to entering the drug discovery process, with the aim of reducing the high attrition rate at later stages. The second is to support the progression of targets that offer the opportunity to develop precision medicines on a patient-specific basis.

The wealth of available databases can, however, present a significant challenge. How do we gain an awareness of the databases, learn what information they can provide and understand their limitations? To help address these questions, we highlight several freely available databases and discuss how they can be utilized as part of a target evaluation. To illustrate their utility, we present selected examples of data outputs that can be obtained. We also discuss the limitations of the available databases and suggest additional information that would be of use to the cancer research and drug discovery communities. Finally, we show how the database outputs can be combined to evaluate a novel target and used to predict challenges associated with clinical development.

DATABASES

Cancer cell line

The primary mechanism of action of the majority of approved anti-cancer drugs is to directly inhibit proliferation or to induce tumour cell death (7). Demonstrating cancer cell line target dependency for proliferation and/or viability therefore provides key validation for targets whose function in disease is primarily mediated through cancer cell intrinsic effects. In addition to predicting efficacy of target inhibition, output from cancer cell line profiling databases can be used to guide clinical positioning and support the identification of suitable clinically relevant cellular models in which to test compound efficacy.

The Cancer Dependency Map (DepMap) portal (https://depmap.org/portal/) is an initiative from the Broad and Sanger Institutes that houses large-scale RNAi and CRISPR screening from several sources with the aim of identifying intrinsic vulnerabilities of cancer cell lines. In addition, the DepMap portal also houses small molecule inhibitor screening data to identify pharmacological sensitivities across large cancer cell line panels, which we cover in the ‘Identification and profiling of tool compounds’ section.

Within DepMap, data from large-scale CRISPR (8,9) or RNAi (10–12) screens in cancer cell lines have been combined. The Chronos (13) or DEMETER2 (14) algorithms are used to normalize datasets to produce a single integrated output. Output is presented in the form of a ‘gene effect’ score where anything below 0 represents a loss of viability, with −1 being the median score for common essential genes. Data integration allows over 1000 cancer cell lines for CRISPR and over 700 cell lines for RNAi to be compared. Combining screening data from different primary screens that have been performed using distinct methodologies could result in variability across datasets, diminishing the statistical power to identify cell line-specific vulnerabilities. However, a comparative analysis between two of the largest contributors to the CRISPR screening dataset demonstrated that despite the differences in experimental set-up, there was a high degree of concordance between the studies in terms of both cell line-specific dependencies and predictive biomarkers identified (15), which increases confidence in the use of DepMap CRISPR screening datasets to support cancer target evaluation.
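As a minimal illustration of how gene effect scores might be interpreted once obtained, the following Python sketch buckets scores for a single gene across a handful of cell lines. The cell line names and values are invented for demonstration. The −0.5 cut-off for calling a line dependent follows the convention described in the Figure 2 legend, and −1 is the median score of common essential genes. The function name and category labels are our own assumptions and not part of DepMap.

```python
# Illustrative gene effect scores for one gene across cell lines.
# Cell line names and values are made up for demonstration only.
gene_effect = {
    "LINE_A": -1.42,
    "LINE_B": -0.73,
    "LINE_C": -0.12,
    "LINE_D": 0.08,
    "LINE_E": -1.00,
}

DEPENDENT_CUTOFF = -0.5  # scores below this are classed as dependent
STRONG_CUTOFF = -1.0     # median score of common essential genes


def classify(score: float) -> str:
    """Bucket a single gene effect score (labels are illustrative)."""
    if score <= STRONG_CUTOFF:
        return "strongly dependent"
    if score < DEPENDENT_CUTOFF:
        return "dependent"
    return "not dependent"


calls = {line: classify(score) for line, score in gene_effect.items()}
fraction_dependent = sum(c != "not dependent" for c in calls.values()) / len(calls)

for line, call in sorted(calls.items()):
    print(f"{line}: {gene_effect[line]:+.2f} -> {call}")
print(f"Fraction of lines dependent: {fraction_dependent:.2f}")
```

A gene for which only a small fraction of lines fall into the dependent categories would be a candidate for the kind of selective dependency discussed in this section, subject to proper statistical testing on the full dataset.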

In addition to target dependency information, DepMap also houses genomic and metabolomic characterization from the Cancer Cell Line Encyclopedia (https://sites.broadinstitute.org/ccle/) (16–19). This includes mRNA expression, copy number and mutation status data, with recent efforts being made to expand this characterization to include mass spectrometry-based quantification of the proteome (20). Integration of this comprehensive omics dataset with extensive genetic and pharmacological screening gives the statistical power to make predictions to inform clinical positioning. This is exemplified by the DepMap output of dependent cell lines for the well-validated anti-cancer drug target KRAS (Figure 2A). It is well established that gain-of-function mutations in KRAS are prevalent in pancreatic, colorectal and non-small cell lung cancer (21) and act as a driver of tumour progression. The DepMap output for KRAS demonstrates that dependency on KRAS is strongly selective, with a clear ‘tail’ of strongly dependent cell lines (gene effect score ≤−1) observed with both RNAi and CRISPR screening (Figure 2A). Analysis of this dataset demonstrates that there is a statistical enrichment of KRAS dependency in cell lines of a pancreatic and colorectal origin identified from the CRISPR target dependency screens (Figure 2B), consistent with the high prevalence of gain-of-function mutations in these indications. Further evidence of the power of DepMap to identify selective dependencies on a target based on genomic features of cell lines comes from the observation that the WRN DNA helicase was identified as a selective vulnerability for microsatellite instable (MSI) cancer cell lines using CRISPR and RNAi screening data, now housed within DepMap (22). As a result, there is now significant commercial interest in developing selective WRN small molecule inhibitors for use in MSI tumours.
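To make the idea of lineage enrichment concrete, the sketch below asks whether dependent cell lines are over-represented in one lineage using a one-sided Fisher exact test computed from the hypergeometric distribution. This is a stand-in technique chosen for simplicity and is not necessarily the test applied by the DepMap portal. The panel of lineages and scores is hypothetical, and the −0.5 cut-off is the dependency threshold given in the Figure 2 legend.

```python
from math import comb


def fisher_right_tail(k: int, n_lineage: int, n_dependent: int, n_total: int) -> float:
    """One-sided Fisher exact p-value, P(X >= k), under the hypergeometric null.

    k            dependent lines observed in the lineage
    n_lineage    number of lines in the lineage
    n_dependent  number of dependent lines in the whole panel
    n_total      number of lines in the whole panel
    """
    denominator = comb(n_total, n_lineage)
    upper = min(n_lineage, n_dependent)
    tail = sum(
        comb(n_dependent, i) * comb(n_total - n_dependent, n_lineage - i)
        for i in range(k, upper + 1)
    )
    return tail / denominator


# Hypothetical (lineage, gene effect score) pairs; not real DepMap data.
panel = [
    ("pancreas", -1.3), ("pancreas", -0.9), ("pancreas", -0.7), ("pancreas", -0.1),
    ("other", -0.6), ("other", -0.2), ("other", 0.0), ("other", 0.1),
    ("other", -0.3), ("other", -0.4),
]
CUTOFF = -0.5  # dependency threshold on the gene effect score

n_total = len(panel)
n_dependent = sum(score < CUTOFF for _, score in panel)
n_lineage = sum(lineage == "pancreas" for lineage, _ in panel)
k = sum(lineage == "pancreas" and score < CUTOFF for lineage, score in panel)

p_value = fisher_right_tail(k, n_lineage, n_dependent, n_total)
print(f"{k}/{n_lineage} pancreatic lines dependent, one-sided p = {p_value:.3f}")
```

With so few lines the test has little power; the example is intended only to show the shape of the calculation, which becomes informative at the scale of the full cell line panel.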

Cancer and immune cell database outputs. (A) Cell line dependency information for KRAS obtained from the DepMap portal. The gene effect score for CRISPR (light blue) and RNAi (dark blue) screening for ∼1000 cell lines each is plotted and a text summary is provided above the graph. Cell lines are classed as dependent on the target if the gene effect score is <−0.5 and the median gene effect score for a common essential gene is −1, as indicated by the dashed line. ‘Strongly selective’ dependency on KRAS indicates that there is a subset of the cell lines that are highly dependent on KRAS, but this is not common across the majority of cell lines tested. (B) Gene effect scores for CRISPR are plotted for all cell lines and then compared to the spread of gene effect scores for individual lineages. P-values are used to indicate statistically significant enrichment of target dependency observed within cell lines of pancreatic and colorectal origin relative to the pooled cell line panel. (C) The Tumor–Immune System Interaction Database (TISIDB) was used to calculate the Spearman correlation coefficient between expression of PDCD1 and gene expression of known immunostimulators across human cancers based on The Cancer Genome Atlas (TCGA) datasets. Data are presented as a heatmap with strong positive correlations indicated in red and negative correlations indicated in blue. (D) Individual boxes within the heatmap can be analysed in more detail as shown for the correlation plot of PDCD1 and CD27 expression in lung adenocarcinoma patients. (E) Deeply Integrated human Single-Cell Omics data (DISCO) scRNA-seq uniform manifold approximation and projection (UMAP) is shown for PDCD1 expression in the indicated immune cell subtypes distributed within the kidney atlas.
Figure 2.

Cell Model Passports (https://cellmodelpassports.sanger.ac.uk/) is an integrated component of the DepMap that offers its own user-friendly portal with genomic and clinical characterization of >2000 cancer cell line models (23). The portal allows the user to search by cell line to identify all available information, including presence of driver mutations, mRNA and protein expression, copy number alterations and sensitivity to drugs tested by the Genomics of Drug Sensitivity in Cancer initiative (https://www.cancerrxgene.org/) (24). Integration within DepMap allows correlations between cell features and target dependency to be identified.

Synthetic lethality, where co-deletion/inhibition of two targets results in a loss of viability that is not observed upon perturbation of one target alone, is a strategy that offers the opportunity to achieve a therapeutic window due to limited effects on cells not bearing the cancer-specific alteration. The clinical relevance of this concept has been illustrated by the success of PARP inhibitors in BRCA1/2 mutant settings (25). There have been recent efforts to generate databases that can predict synthetic lethal interactions, such as SynLethDB (26), based on compiling synthetic lethal CRISPR screening data from literature. The power of these databases to identify novel targets is dependent on the quality and quantity of the compiled data and is currently limited by the scale of available synthetic lethality screening studies. In the coming years, we expect that these databases will have the potential to further inform clinical positioning strategies and identify novel targets that may only be revealed in a synthetic lethal setting.
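The definition of synthetic lethality given above can be expressed numerically. One common convention in genetic interaction screening, used here purely as an assumption and not as the scoring scheme of SynLethDB, is a multiplicative null model: the expected relative viability after perturbing two genes is the product of the single-perturbation viabilities, and a strongly negative deviation from this expectation suggests a synthetic lethal interaction. The viability values and the threshold below are hypothetical.

```python
def interaction_score(viability_a: float, viability_b: float, viability_ab: float) -> float:
    """Observed double-perturbation viability minus the multiplicative expectation.

    All viabilities are relative to unperturbed cells (1.0 = no effect).
    Negative scores indicate a stronger-than-expected loss of viability.
    """
    expected = viability_a * viability_b
    return viability_ab - expected


# Hypothetical relative viabilities for two genes, A and B.
single_a = 0.90   # perturbing A alone has little effect
single_b = 0.95   # perturbing B alone has little effect
double_ab = 0.20  # perturbing both causes a marked loss of viability

SYNTHETIC_LETHAL_THRESHOLD = -0.3  # assumed cut-off, for illustration only

score = interaction_score(single_a, single_b, double_ab)
is_candidate = score < SYNTHETIC_LETHAL_THRESHOLD

print(f"Interaction score: {score:.3f}")
print(f"Candidate synthetic lethal pair: {is_candidate}")
```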

Despite the power of cancer cell line databases, there remain several key limitations that should be considered when interpreting outputs. Depletion or deletion of a target is not necessarily recapitulated in full by inhibition of target activity, which is currently the most common modality of novel anti-cancer agents. Therefore, deletion/depletion data should be interpreted with caution, especially for targets that have multiple activities, as exemplified by the several known kinase-independent functions of protein and lipid kinases (27). Another key limitation is that cell screening data housed in DepMap have been generated under standard cell culture media conditions. It has become apparent that the composition of standard cell culture media conditions does not accurately recapitulate the in vivo environment and that utilizing physiologically relevant cell culture media can alter the metabolic profile of cancer cell lines in 2D and 3D growth conditions (28,29). Indeed, CRISPR screens performed in haemopoietic cell lines under physiological culture conditions result in the identification of specific gene dependencies not revealed in conventional culture conditions (30). This demonstrates that DepMap outputs will not always accurately reflect dependencies of cancer cell lines for certain targets, particularly those that play a role in metabolic processes, where dependency may be influenced by gene–nutrient interaction.

Cancer cell line growth in 3D conditions is believed to more accurately recapitulate the in vivo environment (31,32) and CRISPR screens comparing gene dependencies in 2D versus 3D conditions have identified 3D culture-specific vulnerabilities (33). Known cancer drivers and genes found to be heavily mutated in cancer were more likely to be identified by screening under 3D conditions, suggesting that there may be reduced target attrition at later stages of the drug discovery process if 3D culture were used for novel target identification screens. A clear gap therefore exists for a database that houses large-scale CRISPR screening performed using 3D growth and physiologically relevant culture conditions.

Another key limitation of screening data from cancer cell lines is that cancer cell lines alone do not accurately reflect the complex situation in vivo, where the tumour microenvironment, consisting of a range of cell types including stromal and immune cells, can play a key role in outcomes of therapeutic interventions. For targets that act through the tumour microenvironment, a lack of cancer cell intrinsic dependency would therefore not be indicative of the response to perturbation in vivo. This is exemplified by PD-L1, for which no cell lines are identified as dependent in the DepMap CRISPR or RNAi datasets (https://depmap.org/portal/gene/CD274). Anti-PD-L1 antibodies have nevertheless demonstrated efficacy in the clinical setting, where they act by enhancing the T-cell-mediated anti-tumour immune response (34).

A growing body of research highlights the role of the immune system in cancer initiation, progression, treatment and resistance to therapy (35). The interaction between malignant and immune cells is therefore an important consideration and several anti-cancer agents act by directly regulating the tumour immune response. Genome-wide CRISPR screens have been performed in immune cells such as macrophages and T cells (34–36) to identify factors required for cancer-relevant phenotypes, such as proliferation, viability and exhaustion. A database that compiled this information to cross-reference with cancer cell intrinsic dependency databases would be a step towards greater understanding of the impact of target inhibition in a more complex setting.

Once a target has been identified, database mining can also be used to identify interactors of the target of interest. Identifying known interactors can provide further mechanistic insight and lead to additional therapeutic opportunities, such as small molecule-mediated disruption of disease-relevant protein–protein interactions (PPIs). It is also possible that more tractable targets that play a similar role in disease progression can be identified through such efforts. The STRING database (https://string-db.org/) collates PPIs from several sources, including literature mining, computational prediction and databases of interactions identified experimentally (36). Several other PPI databases exist, and a thorough comparison of coverage has been reported (37), demonstrating that STRING has the highest coverage of experimentally verified PPIs and is updated frequently.

Immune cells

The immune oncology field has been driven forward in recent years by the approval of therapeutic antibodies targeting CTLA4, PD-1 and PD-L1 (38). The success of such immune checkpoint inhibitors encouraged development of therapeutic agents against additional targets. However, the outcomes of clinical trials for novel immunotherapies have been underwhelming (39), highlighting the complexity of the tumour microenvironment and the need to better understand the cross-talk between tumour cells and infiltrating components of the immune system.

Understanding the role a target plays in regulating the tumour immune response is key for all aspects of target evaluation. A particular immune cell may be the most appropriate model in which to measure target activity and dependency. Clinical positioning may be more accurately guided by tumour extrinsic effects and a role in normal immune functions will inform tolerability assessments. It is therefore recommended that the databases highlighted in this section are utilized as part of a comprehensive target evaluation.

Gene expression data are the most abundant omics data available for both patient-derived samples and cancer cell lines and are a commonly utilized resource for researchers looking to understand genotype–phenotype relationships. TCGA, for example, has gene expression data from >10 000 patient tumour samples (see the ‘Clinical’ section). However, these data are derived from bulk RNA sequencing (RNA-seq) that does not differentiate between the various cell types, such as immune and tumour cells, present in a sample. To address this issue, various groups have derived deconvolution methods to give estimates of immune infiltration from bulk RNA-seq data, including Cibersort (40), Estimate (41), xCell (41), EPIC (42), quanTIseq (43), mMCPcounter (44) and TIMER (45).

The Tumor IMmune Estimation Resource (TIMER) (http://timer.cistrome.org/), developed by the Liu lab at the Dana-Farber Cancer Institute, provides a portal that allows researchers to explore the infiltration of immune cells in TCGA tumour samples and correlate this with gene expression, mutation and copy number alterations. Infiltration scores calculated with the aforementioned deconvolution algorithms are available. However, cross-comparison of output from the different deconvolution methods is important as no single method is completely accurate and in some cases the infiltration scores can even be conflicting. Such data should therefore be used primarily to guide further experimental validation. TIMER also provides the functionality to determine the association between gene expression, immune infiltration and clinical outcome. While the data for TCGA samples are directly available via the TIMER portal, there is also the option for researchers to upload custom bulk RNA-seq data to analyse data generated elsewhere.
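The recommendation to cross-compare deconvolution methods can be sketched as a simple concordance check. In the example below, each method is assumed to have produced a correlation coefficient between expression of a target and the estimated infiltration of one immune cell type. The coefficients are invented, and the function name and the minimum absolute correlation used to ignore near-zero values are our own assumptions rather than features of TIMER.

```python
# Hypothetical correlation coefficients between target expression and the
# estimated infiltration of one immune cell type, one value per method.
correlations = {
    "Cibersort": 0.42,
    "xCell": 0.35,
    "EPIC": -0.18,
    "quanTIseq": 0.40,
    "TIMER": 0.51,
}

MIN_ABS_RHO = 0.1  # assumed: treat weaker correlations as uninformative


def compare_methods(values: dict, min_abs: float = MIN_ABS_RHO) -> dict:
    """Group methods by the sign of their correlation and flag disagreement."""
    positive = sorted(m for m, rho in values.items() if rho >= min_abs)
    negative = sorted(m for m, rho in values.items() if rho <= -min_abs)
    return {
        "positive": positive,
        "negative": negative,
        "conflicting": bool(positive and negative),
    }


summary = compare_methods(correlations)
print("Positive:", ", ".join(summary["positive"]))
print("Negative:", ", ".join(summary["negative"]))
print("Methods disagree on direction:", summary["conflicting"])
```

A conflicting result of this kind would not settle the question either way, but it would flag the association as one that requires experimental follow-up before being used to guide decisions.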

TISIDB (http://cis.hku.hk/TISIDB/) predicts response to immunotherapy by integrating datasets for a given target from multiple sources, including gene expression data, high-throughput CRISPR/shRNA screening to determine sensitivity to T-cell-mediated killing and literature mining of several thousand publications (46). When querying a target of interest, TISIDB presents associated information in a multi-tab format, which includes text mining results for evidence in the literature that links the target to immunotherapy response. The output also comprises omics data from clinical samples to identify correlations between pre-immunotherapy treatment target expression and immune-relevant cancer subtype or between target expression, mutation, copy number or methylation status with response to immune checkpoint inhibitors, abundance of tumour infiltrating lymphocyte subtypes, expression of known immunomodulators or chemokine expression. Like TIMER, TISIDB provides correlation data as a heatmap that is easy to visualize and interpret. For example, a heatmap is used to visualize the correlation of PDCD1 (which encodes the PD-1 protein) expression and known immunomodulators across a range of human cancers (Figure 2C). Red squares indicate positive correlation between PDCD1 expression and expression of named immunomodulators, which for a novel target could provide strong evidence of a potential link to the tumour immune response, to be followed up experimentally. This dataset can be further probed by visualization of correlations with specific immunomodulators within a cancer indication of interest, as illustrated by the correlation of PDCD1 and CD27 expression in lung adenocarcinoma (Figure 2D), as has been previously shown for breast cancer (47). TISIDB also provides the ability, within the ‘Immunotherapy’ tab, to identify target gene expression differences between responders and non-responders to immune checkpoint inhibition from patient samples. However, low patient numbers are a general feature of studies housed within TISIDB, which limits the ability to identify statistically significant differences. Overall, TISIDB provides a comprehensive overview that supports detailed evaluation of the potential role of a novel target in the cancer immune response.
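For readers unfamiliar with the statistic underlying these heatmaps, the Spearman correlation coefficient is the Pearson correlation of the ranked values. The sketch below computes it from first principles for two genes across a small set of samples. The expression values are hypothetical and do not come from TCGA or TISIDB, and the function names are our own.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for two genes across five samples.
gene_x = [1.2, 3.4, 2.2, 5.0, 4.1]
gene_y = [0.8, 2.9, 3.1, 4.7, 4.0]

rho = spearman(gene_x, gene_y)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks are used, the coefficient is insensitive to the scale and distribution of the expression values, which is one reason it is a common choice for comparing genes across heterogeneous patient cohorts.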

TIMER and TISIDB have been utilized effectively as tools to identify correlations between target expression (48,49) or DNA methylation status (50) and tumour immune cell infiltration and prognosis. This demonstrates the power of such correlative analysis to identify biomarkers or novel targets that regulate the tumour immune response.

The gene expression datasets used by TIMER and TISIDB are primarily from bulk RNA-seq experiments, and the deconvolution methods employed only estimate the heterogeneity of the cell types present within the sample. This has led to a shift towards single-cell RNA-seq (scRNA-seq) to give accurate expression profiles of individual cells. Until recently, scRNA-seq datasets have been notoriously difficult to access and analyse without considerable programming knowledge. Although publications require data to be deposited in repositories such as Gene Expression Omnibus (51) and the Database of Genotypes and Phenotypes (52), there is little consistency in formatting or even at what stage of processing the data are deposited, which complicates data extraction and analysis. Several groups have produced online portals to facilitate easier access to scRNA-seq datasets including the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home) by EMBL-EBI (53) and CellxGene (https://cellxgene.cziscience.com/) from the Chan Zuckerberg Biohub project (54). The datasets provided by Single Cell Expression Atlas are limited, with only 131 human and 111 mouse experiments available at the time of writing. The tool allows expression of a gene of interest to be projected on to the UMAP (55) or t-distributed stochastic neighbour embedding (t-SNE) (56) plots (the two main graphical methods for scRNA-seq data visualization) of a chosen dataset. However, metadata annotations can be variable, with some datasets critically missing cell type assignments. CellxGene houses a much larger collection of datasets with >600 human studies currently available. The explore function provides extensive options for annotating the UMAP of each dataset with study parameters and author cell assignments but also importantly includes quality control metrics such as mitochondrial fraction and unique molecular identifier counts for users to confirm data quality. As with other tools, specific genes can be plotted on the UMAP, but CellxGene also offers a gene set function that allows users to plot multiple genes together, which is useful for exploring gene signatures.
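The two quality control metrics mentioned above are straightforward to compute from a count matrix. The sketch below derives the total unique molecular identifier (UMI) count and the mitochondrial fraction for each cell, relying on the convention that human mitochondrial gene symbols begin with ‘MT-’. The cells, counts and thresholds are hypothetical and chosen only for illustration; appropriate thresholds depend on the tissue and protocol, and this is not how CellxGene itself computes or applies these metrics.

```python
# Hypothetical UMI counts per gene for three cells; not real data.
cells = {
    "cell_1": {"MT-CO1": 5, "MT-ND1": 5, "CD3E": 40, "PDCD1": 50},
    "cell_2": {"MT-CO1": 30, "MT-ND1": 30, "CD3E": 20, "PDCD1": 20},
    "cell_3": {"MT-CO1": 1, "CD3E": 3, "PDCD1": 0},
}

MAX_MITO_FRACTION = 0.2  # assumed threshold, for illustration only
MIN_UMI_COUNT = 10       # assumed threshold, for illustration only


def qc_metrics(counts: dict) -> dict:
    """Return the total UMI count and mitochondrial fraction for one cell."""
    total = sum(counts.values())
    mitochondrial = sum(n for gene, n in counts.items() if gene.startswith("MT-"))
    fraction = mitochondrial / total if total else 0.0
    return {"umi_count": total, "mito_fraction": fraction}


def passes_qc(metrics: dict) -> bool:
    """Keep cells with enough UMIs and a low mitochondrial fraction."""
    return (
        metrics["umi_count"] >= MIN_UMI_COUNT
        and metrics["mito_fraction"] <= MAX_MITO_FRACTION
    )


metrics = {cell: qc_metrics(counts) for cell, counts in cells.items()}
kept = [cell for cell, m in metrics.items() if passes_qc(m)]

for cell, m in metrics.items():
    print(f"{cell}: UMIs={m['umi_count']}, mito fraction={m['mito_fraction']:.2f}")
print("Cells passing QC:", kept)
```

A high mitochondrial fraction or a very low UMI count is commonly taken as a sign of a damaged or poorly captured cell, so inspecting these metrics before interpreting target expression on a UMAP helps avoid conclusions drawn from low-quality cells.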

An alternative portal for exploration of scRNA-seq data is DISCO (https://www.immunesinglecell.org/) (57). DISCO contains a large collection of human scRNA-seq datasets integrated into tissue-specific ‘atlases’. Each atlas can be queried by gene ID with outputs given as UMAPs or violin plots by cell type. Datasets can also be queried in a pan-tissue manner, which can be informative when evaluating a target for which the appropriate tissue context may not yet be known. DISCO allows visualization of target expression in immune cell types across all atlases as a violin plot to give an overview of distribution of target expression. Exploring the different atlases leads to the interactive UMAP tool that can be used to view the expression of a target gene at a cellular level, as shown for PDCD1 in a kidney atlas (Figure 2E) (58). These data can inform the selection of appropriate immune cell types in which to study target biology and direct clinical positioning efforts. There are several disease-specific atlases available, including for pancreatic ductal adenocarcinoma, ovarian cancer and triple negative breast cancer, which have the added functionality of being able to perform disease versus normal comparison for a gene of interest. Expansion of cancer-specific atlases in the coming years will further enhance the utility of DISCO for cancer target evaluation. DISCO also offers a ‘FastIntegration’ tool that allows specific datasets of interest to be integrated and analysed together. Such capabilities have until now not been possible without programming knowledge and use of tools such as the R package Seurat. Data from DISCO can also be downloaded as a Seurat object to facilitate more complex analysis if required.

Mouse models

An integral component of a target evaluation is to determine whether there is sufficient validation of target dependency via genetic or pharmacological methods in pre-clinical models that accurately model the complex nature of cancer in the physiological setting. This complex physiology includes understanding the impact of target modulation in a model that includes a proliferating tumour and components of the tumour microenvironment and that represents the inherent heterogeneity of human disease. Demonstration of efficacy in clinically relevant models is also used to guide clinical positioning strategies. Another key aim of target evaluation is to predict toxicity associated with target inhibition, using genetic or pharmacological methods. Mouse models have been used extensively for each of these aims and are routinely used to predict compound efficacy and toxicity in a clinical setting. Several databases compiling mouse datasets across multiple models provide different platforms that can be mined to both obtain available information and identify suitable models that can be used experimentally.

The International Mouse Phenotyping Consortium (IMPC) (https://www.mousephenotype.org) has been established between 21 research institutes with the aim of creating murine knockouts for every protein-coding gene within the mouse genome (59,60). Knockout generation and characterization are all performed within the consortium. Standard pipelines for phenotypic characterization are applied, enabling valid comparison between all knockout models. If homozygous knockout mice are viable, then extensive phenotyping of the early adult will take place between 9 and 15 weeks, or if not viable, then heterozygotes will be characterized, and the stage of embryonic lethality of the homozygotes will be determined. Querying IMPC by gene name will bring up a link, if available, to a graphical representation of the overall phenotypic characterization where 20 different phenotypic outputs are coloured to represent significant differences to wild type, no significant difference or not tested. The full details of the phenotypic characterization and an analysis of body weight can also be found within this page. Such phenotypic characterization data can be used to flag potential tolerability concerns associated with loss of a target protein that may be previously unknown. Since unmanageable toxicity is a key reason for attrition of targets during the drug discovery and clinical development process, it is essential that potential liabilities are flagged as part of a target evaluation. Knockout mice, embryonic stem cells or targeting vectors can also be purchased directly through this portal.

It is, however, important to note that knockout mouse data are a rather crude way to evaluate potential toxicity liabilities of a therapeutic, such as a small molecule inhibitor. Many proteins have multiple functions and target knockout will ablate all functions, resulting in a phenotype that may not be representative of therapeutic intervention. Key parameters for toxicity are the pharmacokinetic (PK) and pharmacodynamic (PD) properties of a therapeutic agent that are also not reflected by whole body or tissue-specific target knockout. Toxicity may also be driven by off-target effects of a therapeutic agent, which will not be predicted by target-directed evaluation.

The Mouse Models of Human Cancer Database (MMHCdb) (http://tumor.informatics.jax.org/) is a manually curated resource of several types of murine cancer models hosted by The Jackson Laboratory with funding from the National Cancer Institute (NCI) (61). MMHCdb is part of the Mouse Genome Informatics consortium, first released in 1998 as the Mouse Tumor Biology Database (62). The database houses data from over 46 000 models from nearly 7000 different cohorts of mice. Extensive efforts have been made to provide curated, consistent data to inform selection of clinically relevant models. Data are extracted from literature or submitted directly by individuals or large-scale research initiatives. The database bridges the historical gap around gene and strain nomenclature standards from diverse sources.

The three types of mouse models with available information within MMHCdb are inbred mouse models, genetically engineered mouse models (GEMMs) and patient-derived xenografts (PDXs). Information available for inbred mouse models includes an interactive graphical summary of the characteristic cancers observed in over 700 different inbred mouse strains. The Tumor Frequency Grid tool displays the frequency of spontaneous tumours across the different inbred strains. GEMMs are generated by introduction of murine equivalents of human cancer-associated mutations and can be used to study tumour initiation, progression and response to therapy (63). However, as illustrated by strain comparison using the Tumor Frequency Grid, the genetic background in which GEMMs are developed can have an impact on the observed phenotype. It is therefore essential that the influence of genetic backgrounds is taken into consideration when selecting appropriate models for transplantation or GEMM generation or when interpreting study data. To support model selection, the MMHCdb search function allows queries by gene type, cancer type and mouse strain to identify all associated studies and provide further information on tumour onset, pathology and sites of metastasis.

PDX models are generated by implantation of human tumour tissue in an immunodeficient or humanized mouse. In collaboration with EMBL-EBI, MMHCdb co-developed the PDX Finder resource that serves as a global catalogue of PDX models (64). The PDX Finder tool has since grown to include cancer cell line and organoid models and is now available as a stand-alone database, Patient Derived Cancer Models Finder (https://www.cancermodels.org), that can be used to identify clinically relevant patient-derived model systems for a given disease area and explore associated characterization. However, the majority of PDX models available within the MMHCdb are from the immunodeficient NSG host strain and therefore will not provide insight into potential interactions with the tumour immune response.

Syngeneic mouse models, where murine tissue or cell lines are transplanted into immune competent mouse models, allow the study of the tumour immune response in a complex setting. The Tumor Immune Syngeneic MOuse (TISMO) (http://tismo.cistrome.org/) database hosts datasets from 137 public syngeneic mouse model studies, comprising over 1500 samples from 68 different models (65). These models, however, do not cover all cancer indications, with brain cancers, for example, having no representative models within TISMO. In addition to manually curated model characterization, including details of cell line genotype and cancer type, mouse genetic background and implantation site, TISMO provides interactive visual interfaces to explore datasets for gene expression, immune cell infiltrate and response to therapy, in both treatment naïve and immune checkpoint blockade treated models.

There are several specific features of TISMO worth highlighting. The first is the ‘Pathway’ tab, which allows comparison of biological pathways (from KEGG, GO cellular component, WikiPathways, GO molecular function, GO biological process, Reactome and MSigDB C7 immunologic signature) across different tumour models and before and after immune checkpoint inhibitor treatment. These data can provide evidence that a target of interest plays a role in the response to specific therapies in mouse tumour models with specific genetic backgrounds. TISMO also allows upload of user gene sets for custom analysis, which is a powerful feature when evaluating the role of a novel target in the tumour immune response. Within the ‘Infiltrate’ tab, users can compare immune cell infiltration levels across different tumour models, between pre- and post-treatment, and between immune checkpoint inhibitor responders and non-responders. As discussed in the immune cell profiling section, immune cell infiltrates are not measured directly but rather estimated using deconvolution algorithms and therefore the data output should be used to guide further experimental validation. The main drawback compared to human databases is the limited number of immune cell types and signatures available to be assessed. TISMO currently only allows analysis of CD8+ T-cell, CD4+ T-cell, macrophage, dendritic cell, B-cell and neutrophil infiltrates.

Syngeneic mouse models are extensively used in immune oncology studies, generating an ever-expanding volume of gene expression, immune infiltration and treatment response data. The field has suffered from a lack of systematic collection and variation between analysis methods, which is being addressed by databases such as TISMO. TISMO is currently the only database with a comprehensive collection of datasets from syngeneic mouse tumour models. This database has also recently been used to support machine learning on syngeneic mouse tumour profiles to model clinical immunotherapy responses (66). Features that would further enhance the utility of TISMO include the ability to upload proprietary datasets for analysis, correlation assessments between gene expression profiles and immune infiltration levels, and the inclusion of available scRNA-seq datasets.

Identification and profiling of tool compounds

Small molecule chemical probes are used in drug discovery as an orthogonal approach to genetic techniques in cellular and animal models in order to predict efficacy, assess target-related toxicity and explore target biology (Figure 1) (67). These tool compounds are usually small molecule inhibitors, but can be receptor antagonists, receptor agonists or other modulators, such as proteolysis-targeting chimeras (PROTACs; see below). Biological agents such as therapeutic antibodies can also be used for similar aims, but for the purposes of this review we focus on pharmacological agents.

The use of non-selective tool compounds that are unsuitable for biological studies is common, and the resulting data can misinform target evaluation. An example of a non-selective compound that is still widely used is the PI3 kinase inhibitor LY294002 (68). Online resources have therefore been developed to help researchers select and use the best tool compounds for their studies (69). Such information can also be used to evaluate how informative existing literature or screening data generated with a given compound may be.

Probe Miner (https://probeminer.icr.ac.uk) was developed by the Institute of Cancer Research to objectively identify the most suitable tool compounds (70). It uses chemical and bioactivity data from large-scale public databases such as BindingDB (https://www.bindingdb.org/) and ChEMBL (https://www.ebi.ac.uk/chembl/) to assess over 1.8 million compounds (70–73). Probe Miner integrates fitness scores for cellular potency, target selectivity, permeability, structure–activity relationships, inactive analogues and pan-assay interference to automatically rank compounds for a particular protein target (70). Probe Miner displays a distribution of the top 20 rated chemical probes together with a compound viewer containing the chemical structure and a radar plot highlighting the strengths and limitations of each chemical tool. A direct link is also provided to common probes recommended by the Chemical Probes Portal so that the user can access guidance for best use of these reviewed compounds. Moreover, Probe Miner has identified high-quality compounds that have been prioritized for future appraisal by the Chemical Probes Portal (see below). Compounds that do not meet the minimum requirements (potency <100 nM; selectivity >10-fold against any other protein; and permeability, effects in cells at <10 μM) are flagged with a recommendation to use with caution or avoid when better compounds are available.
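The minimum requirements quoted above can be expressed as a simple filter. The following is a minimal sketch, not Probe Miner's own implementation: the `Compound` fields, the function name and the two example compounds (with made-up values) are hypothetical and serve only to show how the three thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    potency_nm: float        # potency against the intended target (nM)
    fold_selectivity: float  # fold selectivity over the closest other protein
    cell_activity_um: float  # concentration at which effects are seen in cells (uM)


def failed_criteria(c: Compound) -> list[str]:
    """Return the minimum requirements that a compound does not meet."""
    failures = []
    if not c.potency_nm < 100:         # potency <100 nM
        failures.append("potency")
    if not c.fold_selectivity > 10:    # selectivity >10-fold
        failures.append("selectivity")
    if not c.cell_activity_um < 10:    # effects in cells at <10 uM
        failures.append("cell activity")
    return failures


# Hypothetical compounds with made-up values, for illustration only.
compounds = [
    Compound("probe_A", potency_nm=5, fold_selectivity=120, cell_activity_um=0.5),
    Compound("probe_B", potency_nm=1400, fold_selectivity=3, cell_activity_um=25),
]

for c in compounds:
    failures = failed_criteria(c)
    if failures:
        print(f"{c.name}: use with caution ({', '.join(failures)})")
    else:
        print(f"{c.name}: meets minimum requirements")
```

A compound that fails any one criterion is flagged, which mirrors the cautionary recommendation described in the text rather than a ranking.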

The Chemical Probes Portal (https://www.chemicalprobes.org) is a manually curated online resource for selecting tool compounds (67,69). It currently contains over 500 compounds that encompass >400 protein targets from 100 protein families. These compounds have been evaluated by chemical probe experts, who provide recommendations on the best available compounds, together with guidance on concentrations and conditions for use in cellular assays and in vivo models. Where available, the portal will highlight any inactive compound analogues and orthogonal compounds that can be used to confirm that observed phenotypes are target engagement dependent. The portal contains links to primary literature references, vendor websites and gene databases, and highlights flawed or outdated ‘historical compounds’ that researchers should avoid. For example, LY294002 is described in the Chemical Probes Portal as a ‘historical compound, not to be used as a selective chemical probe for a specific target’.

Additional resources that can be used to access molecular information for approved drugs and investigational compounds are DrugBank (https://go.drugbank.com/) (74) and the chemical probes collection of the Structural Genomics Consortium (SGC) (https://www.thesgc.org/chemical-probes) (75). Searching DrugBank by target returns known agents with activity towards that target, while searching by drug returns a wealth of information for that specific agent. This includes chemical structure, pharmacology, known drug–drug interactions, chemical properties and links to references for further information. DrugBank is freely available to use for non-commercial applications, but commercial use requires a licence. The SGC is a global partnership between academia, industry and funding agencies with one of the key aims being to create and characterize chemical probes that are made freely available with no restrictions on use. The SGC portal lists available chemical probes, sorted by target protein class. Clicking on the probe of interest links to all available information on probe properties, recommends a chemically similar negative control probe and provides a link to request the probe(s) of interest. Increased awareness and use of resources such as Probe Miner, the Chemical Probes Portal, DrugBank and the SGC will reduce the use of unsuitable tool compounds that can misinform target evaluation and promote the best practice of utilizing chemically similar negative control compounds to confirm on-target phenotypes.

PROTACs have become an increasingly attractive strategy for targeting ‘undruggable’ proteins (76,77). PROTAC molecules can exploit all surface binding sites and are not reliant on binding in a deep hydrophobic pocket or active sites to modulate target protein activity (78). PROTACs are heterobifunctional molecules that contain a warhead small molecule that binds the protein of interest, connected via a linker molecule to a small molecule E3 ligand that recruits an E3 ubiquitin ligase to degrade the bound target protein. PROTAC-DB (http://cadd.zju.edu.cn/protacdb/) is an online public database that collates currently described PROTAC molecules (79) and can be queried by target, compound name/ID or chemical structure. Output is presented as a datasheet showing 2D compound structures (divided into warhead, linker and E3 ligand), biological activities (degradation capacity, binding affinities and cellular activities) and calculated physicochemical properties. The database also utilizes a computational method (PROTAC-Model) to generate predicted ternary complex models for PROTACs that exhibit good degradation capacity (80). PROTACs can be a useful research tool to provide efficacy, tolerability or clinical positioning information. New modalities such as degrader approaches can also influence tractability assessments and allow targets previously deemed to have poor tractability to be revisited.

Targeted cancer therapies act by perturbing specific molecular pathways in tumours. However, analysis of the genomes from specific tumour types has shown that tumours are highly heterogeneous and that this heterogeneity can often explain varied patient responses to targeted therapies. This broad genetic heterogeneity is also observed across cancer cell lines (16). The Broad Institute and Harvard University have developed a novel screening technology called Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) that enables simultaneous high-throughput drug screening in large panels of genetically characterized cell lines (81). This method allows pooled screening of cell lines labelled with unique DNA barcode sequences. Barcode abundance is used to generate cell line sensitivity signatures by comparing treatment to control conditions. Predictive models can identify molecular features that correlate with PRISM sensitivity profiles. The PRISM repurposing dataset (https://depmap.org/repurposing) is available on the DepMap portal and contains viability data generated using the PRISM multiplexed cell line assay to screen 578 cell lines with the Broad Repurposing Library (4518 compounds) (82). Approximately three quarters of the library compounds are approved clinical compounds or in clinical development, with the remainder consisting of tool compounds. This repurposing screen identified tepoxalin, a dual cyclooxygenase and lipoxygenase inhibitor, that selectively killed cell lines with elevated expression of the multidrug resistance protein, MDR1 (82). Expanding the PRISM drug repurposing resource to cover more compounds and cellular models would support repurposing of existing drugs into future cancer therapies. The output can also inform clinical positioning strategies for novel targets by identifying cell line features associated with sensitivity to tool compounds.
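As a rough illustration of how barcode abundance can be turned into a sensitivity signature, the sketch below computes a log2 fold change per cell line between treatment and control conditions. The cell line names, counts, pseudocount and the threshold of -1 are hypothetical, and the counts are assumed to be already normalized; the published PRISM pipeline includes further processing steps that are not reproduced here.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Per-cell-line log2 ratio of barcode abundance (treatment vs control).

    A pseudocount avoids division by zero when a barcode is lost completely.
    """
    return {
        line: math.log2((treated.get(line, 0.0) + pseudocount)
                        / (control[line] + pseudocount))
        for line in control
    }


# Hypothetical, pre-normalized barcode counts for three pooled cell lines.
control = {"line_1": 1023.0, "line_2": 511.0, "line_3": 255.0}
treated = {"line_1": 1023.0, "line_2": 63.0, "line_3": 255.0}

lfc = log2_fold_change(treated, control)

# Call a line "sensitive" when its barcode is depleted more than 2-fold
# (hypothetical threshold chosen only for this example).
sensitive = [line for line, value in lfc.items() if value < -1.0]

for line, value in lfc.items():
    print(f"{line}: log2 fold change = {value:.2f}")
print("sensitive lines:", sensitive)
```

The resulting per-line values are the kind of sensitivity profile that could then be correlated with molecular features of each cell line.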

Clinical

A key aim of target evaluation is to identify a clinical positioning strategy for a compound developed against a specific target. Patient omics profiling information, including gene mutation or target expression profiles, can be correlated with disease-relevant outcome(s) such as patient survival to refine this clinical positioning strategy, with the aim of delivering precision medicine. A clinical positioning strategy can also inform the selection of relevant model systems for efficacy predictions or compound testing. In this section, we describe key databases that house patient omics and survival data and discuss how these can be utilized.

The combination of cost-effective next-generation sequencing together with large-scale cancer genomic efforts, such as TCGA and the International Cancer Genome Consortium (ICGC), meant that online platforms were needed to integrate the ever-increasing datasets generated and make them readily accessible to the wider cancer research community. TCGA was initiated in 2006 as a joint effort between the NCI and the National Human Genome Research Institute to create a comprehensive ‘atlas’ of cancer genomic profiles by cataloguing cancer-causing genome alterations in the most prevalent human tumour types (83). During the subsequent 16 years, the initiative has generated multi-omics data, including gene expression, DNA mutation, copy number variant and DNA methylation, from over 20 000 primary cancer and matched normal samples across 33 cancer types (84). TCGA data can be accessed through the Genomic Data Commons data portal (https://portal.gdc.cancer.gov/). The portal provides different navigation options for browsing available datasets to view summaries of data for each project, explore data at the case, gene and mutation levels, or compare different cohorts or clinical variables of a specific cohort. One limitation for cancer versus normal comparisons from the TCGA is that the number of samples from adjacent normal tissue is often far lower than that for tumour samples, which reduces the statistical power of the analysis. An alternative non-tumour gene expression resource that can be used for comparison purposes is the Genotype-Tissue Expression project (https://gtexportal.org/home/) that has gene expression data for 54 normal tissue types from close to 1000 individuals.

The cBio Cancer Genomics Portal (https://www.cbioportal.org/) was developed at the Memorial Sloan-Kettering Cancer Center to enable the visualization and in-depth analysis of multi-omics patient data for various types of cancer (85,86). It houses data from the entire TCGA Pan-Cancer Atlas, comprising over 10 000 samples (87) as well as additional data from over 200 published studies with almost 70 000 patient samples that have been curated to ensure there is no redundancy between the studies. Users can query selected cancer studies to visualize the available omics data for single or multiple genes across patient samples. For example, querying the TCGA lung adenocarcinoma Pan-Cancer Atlas study for omics data pertaining to RAS pathway members generates a series of reports, including a summary of genomic alterations (Figure 3A). These data demonstrate that KRAS, HRAS, NRAS and BRAF gene alterations are largely mutually exclusive, pointing to the shared functional relationship between pathway members. This can then be explored further using the ‘Pathways’ tab that provides a schematic of signalling pathway(s) and proteins functionally linked to the user’s target query, together with details of alteration frequency for all related targets (Figure 3B). Mutual exclusivity of mutations in cancer can be used to identify vulnerabilities that can be exploited therapeutically, such as the observation that cyclin E1 amplification is mutually exclusive with BRCA1 mutation in high-grade serous ovarian cancers and that BRCA1 is selectively required for survival of cyclin E1 amplified cells (88). Such studies demonstrate the power of this analysis to identify clinical positioning opportunities.
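One common way to quantify mutual exclusivity between two genes is a one-sided Fisher's exact test on the 2x2 table of altered and unaltered samples. The sketch below is a minimal illustration of that test under stated assumptions and is not cBioPortal's code: the function name is hypothetical, the patient counts are made up, and a 0.5 continuity correction is added to the odds ratio so that empty cells do not cause division by zero.

```python
from math import comb, log2


def mutual_exclusivity(both, only_a, only_b, neither):
    """Return (log2 odds ratio, one-sided P-value for mutual exclusivity).

    The P-value is the hypergeometric probability of seeing `both` or
    fewer co-altered samples if the two genes were altered independently.
    """
    n_total = both + only_a + only_b + neither
    n_a = both + only_a  # samples with gene A altered
    n_b = both + only_b  # samples with gene B altered
    k_min = max(0, n_b - (n_total - n_a))
    favourable = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(k_min, both + 1)
    )
    p_value = favourable / comb(n_total, n_b)
    # Continuity-corrected odds ratio; negative log2 values suggest exclusivity.
    odds = ((both + 0.5) * (neither + 0.5)) / ((only_a + 0.5) * (only_b + 0.5))
    return log2(odds), p_value


# Hypothetical cohort: 100 patients, no sample carries both alterations.
log_or, p = mutual_exclusivity(both=0, only_a=30, only_b=20, neither=50)
print(f"log2 odds ratio = {log_or:.2f}, P = {p:.4g}")
```

A negative log2 odds ratio with a small P-value indicates fewer co-altered samples than expected by chance; with many gene pairs, the P-values would additionally need multiple-testing correction.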

Clinical datasets. (A) The output from cBioPortal analysis of HRAS, KRAS, NRAS and BRAF genetic alterations found in a lung adenocarcinoma TCGA study. The overall prevalence of genetic alteration is indicated by a percentage for each gene and then further details of the type of alteration are represented by colour. Each vertical line of rectangles represents a single patient, so co-occurrence or mutual exclusivity of genetic alterations can be visualized, as indicated in this example by a general trend towards lack of co-occurrence of genetic alterations between the queried genes. (B) cBioPortal output using the same input query as panel (A), from the ‘Pathways’ tab. A schematic of the signalling pathway within lung adenocarcinoma is shown together with a genetic alteration frequency for each functionally linked gene. (C) A Kaplan–Meier plot from the Kaplan–Meier Plotter (KMplot) database demonstrating the correlation between high expression of RalA and reduced overall survival in hepatocellular carcinoma. Hazard ratios and logrank P-values are shown to demonstrate whether there is a statistically significant correlation observed.
Figure 3.


Although the previously described databases focus on mRNA expression levels in cancer, the vast majority of anti-cancer drugs are designed against protein targets and changes in protein activity are the major driver of cancer progression. The NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) (https://proteomics.cancer.gov/programs/cptac) utilizes large-scale mass spectrometry-based methods to characterize the proteome of patient samples. This initiative was built on transcriptomic data from TCGA to characterize colorectal, breast and ovarian cancer samples (89–91). These initial studies demonstrated that mRNA levels do not correlate accurately with protein levels and that proteomic analysis could be used to further refine patient stratification and identify novel therapeutic targets. The CPTAC dataset has since expanded into several other indications and now comprises proteomic characterization of over 2000 patient samples (92–97). In addition to measuring protein levels, proteomic datasets also include information on post-translational modifications (PTMs). These data have been used to identify downstream targets of KRAS in pancreatic cancer (95) and to suggest that inhibition of Rb phosphorylation could be a viable therapeutic strategy in colorectal cancer (98). Recent re-analysis of CPTAC datasets using more powerful cloud computing methods has identified PTMs that were present at lower levels than could previously be detected and of a wider range of types than were previously considered (99), suggesting that the full potential of the CPTAC datasets is yet to be realized.

For the purposes of cancer target evaluation, the University of ALabama at Birmingham CANcer data analysis portal (UALCAN) (http://ualcan.path.uab.edu/analysis-prot.html) (100), a user-friendly web portal to analyse CPTAC datasets, houses proteomic data obtained from analysing 2002 patient samples from 17 separate studies (101). The portal allows determination of target protein levels and phosphorylation state across a range of cancer types in comparison to adjacent normal tissue. It can also be used as a user-friendly method of accessing TCGA gene expression and clinical survival data.

The major limitation of current proteomic databases for target evaluation is the limited sample size compared to that available for transcriptomics. Mining datasets comprising hundreds, or low thousands, of patient samples spread across several indications reduces the statistical power to detect significant alterations in protein level and/or PTM status within a single target indication.
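To make the effect of sample size concrete, the sketch below approximates the power of a two-sided, two-sample comparison using a normal approximation. It is illustrative only: the standardized effect size of 0.5, the significance level and the sample numbers are hypothetical, and real proteomic analyses would also need to account for multiple testing and non-normal data.

```python
from statistics import NormalDist


def approx_power(effect_size, n_tumour, n_normal, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    `effect_size` is the difference in group means divided by the
    (assumed common) standard deviation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Smaller groups inflate the standard error and shrink this term.
    noncentrality = effect_size / (1 / n_tumour + 1 / n_normal) ** 0.5
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


# Hypothetical scenario: 100 tumour samples and a varying number of normals.
for n_normal in (5, 10, 25, 100):
    power = approx_power(0.5, n_tumour=100, n_normal=n_normal)
    print(f"normal samples = {n_normal:>3}: power = {power:.2f}")
```

The same calculation shows why a small number of adjacent normal samples limits tumour versus normal comparisons even when the tumour cohort is large.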

An alternative initiative to map the abundance of human proteins across human tissues using antibody-based imaging and mass spectrometry approaches is the Human Protein Atlas (https://www.proteinatlas.org/). The Human Protein Atlas began by characterizing the expression and localization of 700 proteins across 48 normal human tissues and 20 cancer types using tissue microarrays (102) and has progressed to providing expression and localization data for over 16 000 proteins in all major tissues and organs (103). An extension of this initiative is the Human Pathology Atlas (104), which houses mRNA (from TCGA) and protein (from antibody-based imaging) levels for several cancer types. This dataset can be used as independent confirmation of data from CPTAC and knowledge of target protein levels in normal tissue may flag key tissue-specific or general housekeeping functions that could inform tolerability assessments. A future expansion of the Human Protein Atlas that would be of interest would be to build on the PTM analysis of CPTAC and use validated PTM-specific antibodies for cancer-associated proteins to identify and characterize PTM events that could be targeted in cancer.

cBioPortal, TCGA and the Human Protein Atlas datasets allow correlation of omics data with available patient data, providing evidence that target expression and/or mutation are associated with clinical outcome. A simple, user-friendly portal that houses manually curated datasets from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/info/overview.html), the European Genome–Phenome Archive (https://ega-archive.org/) and TCGA is also available. The KMplot web tool (https://kmplot.com/analysis/) allows users to upload their own datasets or mine available miRNA and mRNA expression data and correlate these with patient outcome (105). The mRNA expression data from RNA-seq analysis, for example, cover >7000 patient samples across all indications, correlated with overall survival. KMplot also allows the user to restrict the analysis to patient subgroups defined by, for example, sex, grade, mutation burden or enrichment of specific immune cell subtypes. The output is presented as a Kaplan–Meier survival plot for high/low expression of the target of interest together with a hazard ratio and logrank P-value to determine statistical significance. As an example, RalA is a GTPase that functions downstream of KRAS, and KMplot has data from 259 patients classified as having low RALA expression and 111 with high RALA expression. Unrestricted analysis demonstrates that there is a statistically significant correlation between high RALA expression and reduced overall survival in hepatocellular carcinoma (Figure 3C), consistent with published data (106). Data demonstrating that high target expression correlates with poor outcome in a specific disease setting can inform clinical positioning assessments.
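For readers unfamiliar with the statistics behind such plots, the sketch below implements a Kaplan–Meier estimate and a two-group logrank test in plain Python. It is a minimal illustration and not the KMplot implementation: the function names are hypothetical, the follow-up times are made up, and the hazard ratio (typically obtained from a Cox regression) is not computed.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `events` holds 1 for an observed death and 0 for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
        i += leaving
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group logrank P-value (chi-square test with 1 degree of freedom)."""
    pooled = zip(list(times_a) + list(times_b), list(events_a) + list(events_b))
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # still at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # still at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square, 1 df


# Hypothetical follow-up times (months); every patient has an observed event.
high_expr = [1, 2, 3, 4, 5]
low_expr = [10, 11, 12, 13, 14]
all_events = [1, 1, 1, 1, 1]

print(kaplan_meier(high_expr, all_events))
print(f"logrank P = {logrank_p(high_expr, all_events, low_expr, all_events):.4f}")
```

In practice, the cut-off used to split patients into high and low expression groups strongly influences the result, so the chosen threshold should be reported alongside the P-value.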

Comprehensive target evaluation databases

There are now several open-source comprehensive databases that allow researchers to identify and prioritize potential drug targets for further study (107–109). Such databases aim to integrate information from several of the databases described above into a single easy-to-use platform.

The Open Targets Platform (https://platform.opentargets.org/) was developed as a public–private partnership between several European research institutes and pharmaceutical companies (107). It integrates data from 22 public sources and scores target–disease associations based on information regarding the target, disease, mutation, known drugs, animal models and scientific literature. The Open Targets Platform can be queried by target, drug, specific disease or phenotype (110,111). Searching by target generates a graphical visualization illustrating disease associations grouped by therapeutic area, together with a separate profile page of available information for the target. This profile includes details on known drugs (investigational or approved) for the target, reported safety effects, chemical probes, target expression (RNA and protein), molecular information and pathways, and phenotypes linked to mouse homologues. Also included in the profile page is a target tractability assessment for small molecule, antibody, PROTAC and other therapeutic modalities. Searching on a specific disease presents a list of targets associated with that disease that can be prioritized for further appraisal. Querying the platform for a specific drug shows its mechanism of action, investigational and approved indications, clinical trials, pharmacovigilance and all scientific literature associated with that drug.
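The scoring formula is not described here, but one aggregation scheme for combining several pieces of evidence into a single association score is a harmonic sum, which we understand the platform's documentation to describe. The sketch below is an assumption-laden illustration of that idea only: the function name and evidence scores are hypothetical, and the platform's data source weighting and normalization are not reproduced.

```python
def harmonic_sum(scores):
    """Combine evidence scores so that each additional item counts less.

    Scores are ranked from strongest to weakest and the i-th score is
    divided by i squared before summing.
    """
    ranked = sorted(scores, reverse=True)
    return sum(score / (rank ** 2) for rank, score in enumerate(ranked, start=1))


# Hypothetical evidence scores (0 to 1) for two target-disease pairs.
one_strong_item = [0.9]
many_weak_items = [0.2] * 10

print(f"one strong item : {harmonic_sum(one_strong_item):.3f}")
print(f"many weak items : {harmonic_sum(many_weak_items):.3f}")
```

The design choice illustrated is that a large volume of weak evidence cannot outweigh a single strong observation, which limits bias towards heavily studied targets.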

TargetDB (https://github.com/sdecesco/targetDB) is a resource that allows users to rapidly query multiple public databases and generate an integrated summary of available information for a target (108). It is distributed as a Python package with SQLite database and can be downloaded by following the installation instructions on GitHub. TargetDB uses data obtained from public databases and sources such as ChEMBL, Open Targets, Protein Data Bank (PDB), PubMed, the Human Protein Atlas and UniProt (108). When queried for a single gene target, it generates an Excel file containing numerous worksheets with different target information: general information plus a spider plot summary of available data; PubMed search results listing the 500 most recent publications; disease information showing protein and GWAS associations; disease areas and association scores; protein expression; mouse genotypes and associated phenotypes; isoforms; observed variants and mutants; and a list of available crystal structures for the target. Also included is an analysis of potential small molecule binding pockets present in the target together with a ligandability score. Commercially available bioactive compounds are also listed with SMILES strings, target affinities and links to vendor websites. TargetDB allows queries for multiple targets using the list mode function to generate a report containing information for each target, or spider plot mode that outputs target information as a graphical representation.

The canSAR knowledgebase (https://cansar.ai/) is the largest publicly available online resource for translational cancer research and drug discovery (109,112). It was first released in 2011 (113) by the Institute of Cancer Research and has recently received major updates (109,112). This new edition integrates multi-omics patient and cell line data (ICGC and TCGA) with genetic mutation and dependency data (DepMap). The annotated human proteome (UniProt Swiss-Prot) together with data on protein 3D structures (PDB and AlphaFold), PPIs (IMEx Consortium and others), medicinal chemistry (ChEMBL and BindingDB) and clinical trials (ClinicalTrials.gov) is used by machine learning algorithms to assess ligandability of the target. Searching the canSAR knowledgebase by target takes the user to a dashboard that includes molecular synopsis pages for ligandability, signalling, disease types, experimental tools, features and chemical tools associated with the target. The ‘Disease types’ page shows a word cloud of cancer indications that are sized according to a cancer–target association score. There are clinical, mutation, copy number, expression and combined molecular scores available for each target with the score corresponding to how strong the link is between that particular target feature and the specified cancer target indication. This is a particularly powerful feature that allows previously unknown associations between the target and a disease setting to be identified. The ‘Ligandability’ tab provides a top line summary of areas of the protein associated with druggable cavities, a link to associated structural data and links to any approved or investigational drugs and chemical tools that can be used. canSAR labels associated chemical tools as ‘recommended’ or ‘acceptable’, depending on selectivity profiles, and provides direct links to the Chemical Probes Portal and Probe Miner for additional information. 
The ‘Experimental Tools’ synopsis page includes information on expression systems for expressing active target protein, known target engagement biomarkers and cell lines ranked on target expression, genetic dependencies and chemical dependencies identified with tool compounds known to act on the target. Additional features available for each target include a target gene family cladogram, target interactome and association of target mutation or expression with cancer indication-specific immune subtypes as defined by Thorsson et al. (114).

The comprehensive databases described above offer platforms that can quickly provide an overview of a wide range of available datasets for a given target in cancer and should be considered valuable tools for cancer target evaluation. However, no single platform currently covers all the datasets and tools highlighted in this review and therefore outputs from the various databases described should be considered together as part of a comprehensive target evaluation. The individual databases can also offer additional features or flexibility in data analysis that may provide further information.

Artificial intelligence in drug discovery

Although not the focus of this review, it is impossible to ignore the growing interest in artificial intelligence (AI) and its potential utility across all stages of the drug discovery process. The area has been covered by several recent comprehensive reviews (115–117) that discuss the potential of AI to increase the efficiency of the drug discovery process in target identification, protein structure and druggable site predictions, de novo small molecule design and predictions of drug toxicity. We will therefore only briefly cover applications of AI that we see having a clear impact in the immediate future.

AI offers the opportunity to rapidly integrate the extensive datasets from multiple resources, such as those described in this review, to either identify novel targets or evaluate proposed targets in an unbiased manner (116). In the coming years, omics datasets are only set to become larger and more complex, and AI offers the potential to integrate and interpret these complex outputs from multiple sources. Indeed, several drug discovery companies that currently use AI platforms for target identification have assets in clinical development (118).

As described in previous sections, toxicity is a key reason for attrition of anti-cancer agents during the clinical development process. Current methods to predict toxicity, including evaluating mouse knockout phenotypes or comparing expression or target dependency in cancer versus cells of a non-malignant origin, have limited utility and make it challenging to effectively predict tolerability as part of a target evaluation. This is a key area of importance for the field as more accurate toxicity predictions may better predict the likelihood of clinical success. A potential application of AI will be to make more accurate predictions of toxicity based on predictions of compound PK/PD properties and off-target effects together with liabilities associated with specific chemical characteristics. A recent comprehensive review highlights the current status of toxicity prediction models and challenges associated with development in this area (119). A gold-standard accessible tool in this space will be of immense importance to support the design of compounds with minimal toxicity risks.

The final AI application we wish to highlight is in protein structure predictions. Structural information for a target can be used to identify small molecule binding pockets and to guide compound design in a rational manner. Available structural information is therefore a key factor in tractability assessments. Currently, the Worldwide Protein Data Bank (http://www.wwpdb.org/) is the primary source for such information as the main repository for protein structural data (120). However, coverage of the human proteome is not complete and there may not be available information for a novel target. The AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/), which has been developed by Google DeepMind, uses AI to predict protein structure from primary protein sequence, with almost complete coverage of the human proteome. AlphaFold predictions have been evaluated and found to correlate reliably with experimental data (121–125). Structural enablement of a drug discovery project allows predictions to be made regarding the feasibility of identifying a ligand that can bind with sufficient affinity to modulate the activity of the target of interest. It is important to recognize, however, that predictions made by AlphaFold do not currently account for potential conformational changes induced by, for example, PPIs, post-translational modification or small molecule binding. An example of the application of AlphaFold comes from the development of a novel small molecule inhibitor of CDK20, which utilized AI technology to design small molecules based on the structures predicted by AlphaFold (126).
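When a predicted structure is used for tractability assessment, it can help to check how confident the prediction is in the region of interest. The sketch below is a minimal illustration, assuming a PDB-format file in which the per-residue confidence score (pLDDT) is stored in the B-factor column, as is the convention for AlphaFold models. The function names, the cutoff of 70 and the toy records are assumptions made for illustration and are not part of any database tooling.

```python
def residue_plddt(pdb_text):
    """Map (chain, residue number) to pLDDT for PDB-format text.

    Assumes the AlphaFold convention of storing pLDDT in the B-factor
    field (columns 61-66) of every ATOM record.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            key = (line[21], int(line[22:26]))
            # All atoms of a residue share one pLDDT, so keep the first seen.
            scores.setdefault(key, float(line[60:66]))
    return scores


def confidence_band(plddt):
    """Bin a pLDDT value into the four commonly used confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def low_confidence_residues(scores, cutoff=70.0):
    """Residues at or below the cutoff (an illustrative threshold)."""
    return sorted(key for key, value in scores.items() if value <= cutoff)


def toy_atom_line(serial, resname, resnum, plddt):
    """Build one fixed-width CA ATOM record (toy data, not a real prediction)."""
    return (f"ATOM  {serial:>5}  CA  {resname:>3} A{resnum:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{plddt:>6.2f}")


# Three hypothetical residues with invented confidence values.
TOY_PDB = "\n".join([
    toy_atom_line(1, "MET", 1, 35.5),
    toy_atom_line(2, "ALA", 2, 92.1),
    toy_atom_line(3, "GLY", 3, 68.0),
])

if __name__ == "__main__":
    scores = residue_plddt(TOY_PDB)
    for key, value in sorted(scores.items()):
        print(key, value, confidence_band(value))
    print("Treat with caution:", low_confidence_residues(scores))
```

In practice, the text of a downloaded model file would be passed to residue_plddt in place of the toy records, and putative pockets lined mainly by low-confidence residues would be treated with caution.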

Amid the excitement and significant investment in AI approaches, a note of caution is that there are currently no approved drugs against targets that have been identified primarily through AI, nor approved drugs that have been designed by AI. This is likely to change in the near future, given the recent advances in AI technology and the time taken to move through the drug discovery process. In the coming years, comparing the costs and timelines of AI-driven approaches with those of conventional drug discovery will show where AI can make the biggest impact in expediting the drug discovery process.

COMBINING DATABASE OUTPUTS FOR PRACTICAL PURPOSES

To demonstrate how outputs from a defined set of the databases described within this review can be used in a practical setting, we present two examples. The first illustrates how combining outputs can support evaluation of a novel target. The second retrospectively analyses clinical trial failures to determine whether they could have been predicted via in silico analysis.

To illustrate how database outputs can support target evaluation and guide further target validation and clinical positioning efforts, we use NME6 as an example of a novel anti-cancer target. NME6 is a mitochondrial nucleoside diphosphate kinase whose primary reported function is to ensure sufficient supply of ribonucleotides to the mitochondria. A PubMed search of ‘NME6 AND Cancer’ gives only three results, none of which provide sufficient data to identify NME6 as a novel anti-cancer drug target. A recent bioRxiv preprint proposes that NME6 regulates mitochondrial gene expression and should be considered a novel target in diseases in which mitochondrial gene transcription is altered, including cancer (127).

A minimal set of databases can be used to support the unbiased evaluation of NME6 as a novel anti-cancer drug target (Figure 4A). DepMap output from CRISPR screens identified liver cancer as being the most enriched indication for cancer cell intrinsic NME6 dependency (Figure 4B). The UALCAN portal used to visualize TCGA patient gene expression data demonstrates that NME6 gene expression is increased in cancer tissue relative to adjacent normal samples in several indications, with one of the most striking upregulations being observed in liver hepatocellular carcinoma (Figure 4C; LIHC is boxed for clarity). KMplot was then used to determine whether higher NME6 expression was associated with patient outcome across 20 separate indications. There were four indications where high NME6 expression showed a significant correlation with poor outcome, with the strongest association being observed in liver hepatocellular carcinoma (Figure 4D). The IMPC mouse knockout database was used to flag adverse phenotypes associated with NME6 knockout, which may be indicative of toxicity liabilities. Homozygous NME6 knockout results in embryonic lethality, pointing to an essential role in embryonic development, while heterozygous knockout results in minimal observed phenotypes in adult mice (Figure 4E). There are no experimentally determined NME6 protein structures available within the PDB, but predicted structures are available via AlphaFold (Figure 4F). Together, this set of data provides support from cancer cell line and clinical datasets that NME6 function may be linked to liver cancer progression. Available mouse knockout data do not flag any clear toxicity liabilities associated with heterozygous knockout in the adult mouse and AlphaFold structure prediction gives a starting point for computational chemistry methods to identify putative drug-binding pockets and perform detailed ligandability assessments. 
Several additional databases described within this review can provide further mechanistic insight into target biology, guide the selection of clinically relevant pre-clinical model systems and guide potential NME6 targeting strategies. This minimal set of data from freely available databases addresses several of the key outputs of a target evaluation study (Figure 1), which mitigates the risk associated with committing resource to further exploration of NME6 as a novel anti-cancer drug target.
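The NME6 findings above can be collected into a simple structured record, so that each aim of target evaluation is visibly covered or visibly missing. The following is a minimal sketch only. The Evidence class, the aim labels and the supportive flags are our own illustrative choices (the flags encode a reading of the text, not database output), and no database is queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    database: str     # resource the finding came from
    aim: str          # target-evaluation aim the finding informs
    finding: str      # qualitative summary taken from the text
    supportive: bool  # illustrative judgement, not a database score


# Qualitative findings for NME6 as described in the text (Figure 4).
NME6_EVIDENCE = [
    Evidence("DepMap", "efficacy",
             "liver cancer is the most enriched lineage for NME6 dependency", True),
    Evidence("UALCAN (TCGA)", "clinical positioning",
             "expression increased in LIHC versus adjacent normal tissue", True),
    Evidence("KMplot", "clinical positioning",
             "high expression associated with poor outcome in LIHC", True),
    Evidence("IMPC", "tolerability",
             "homozygous knockout is embryonic lethal", False),
    Evidence("IMPC", "tolerability",
             "heterozygous knockout gives minimal phenotypes in adult mice", True),
    Evidence("PDB", "tractability",
             "no experimentally determined structure", False),
    Evidence("AlphaFold", "tractability",
             "predicted structure available", True),
]

AIMS = ("efficacy", "tolerability", "clinical positioning", "tractability")


def summarise(evidence):
    """Return {aim: (supportive findings, total findings)}."""
    summary = {}
    for item in evidence:
        supportive, total = summary.get(item.aim, (0, 0))
        summary[item.aim] = (supportive + int(item.supportive), total + 1)
    return summary


def uncovered_aims(evidence, aims=AIMS):
    """Aims for which no finding has been recorded yet."""
    covered = {item.aim for item in evidence}
    return [aim for aim in aims if aim not in covered]


if __name__ == "__main__":
    for aim, (supportive, total) in summarise(NME6_EVIDENCE).items():
        print(f"{aim}: {supportive}/{total} findings supportive")
    print("Aims without evidence:", uncovered_aims(NME6_EVIDENCE))
```

A record of this kind does not replace expert judgement. It simply makes explicit which databases contributed to each conclusion and where additional resources described in this review could be added.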

Combining databases for target evaluation. (A) Workflow for completing a minimal target evaluation for a novel target that has an impact on cancer cell intrinsic function. (B) Enriched lineage analysis from DepMap CRISPR screening data, showing that liver cancer cells display a trend towards increased NME6 dependence. (C) UALCAN portal output of patient gene expression data from TCGA comparing NME6 expression in cancer and matched normal tissue. Liver hepatocellular carcinoma is boxed for clarity. (D) KMplot output demonstrating that liver hepatocellular carcinoma patients with high NME6 expression have reduced overall survival compared to those with low NME6 expression. (E) Summary of mouse homozygous and heterozygous NME6 knockout phenotypes from the IMPC. (F) Human NME6 protein structure prediction from AlphaFold (https://alphafold.ebi.ac.uk/entry/O75414).
Figure 4.

A key objective of the use of databases for cancer target evaluation is to reduce the high attrition rate associated with clinical development of novel compounds. As highlighted in previous sections, accurate toxicity predictions are challenging with currently available tools. However, output from the databases described within this review can inform clinical development. To illustrate how integration of database outputs could have been used to predict differing clinical outcomes between alternative indications, we evaluate epidermal growth factor receptor (EGFR) targeting in glioblastoma (GBM). cBioPortal analysis shows that EGFR gene alterations are present in approximately half of GBM patient samples, with gene amplification and mutation being the most prevalent alterations observed (Figure 5A). The high rate of EGFR alteration raised optimism that EGFR could be a therapeutic vulnerability in GBM, a view supported by in vitro data from GBM cell lines (128). However, while EGFR inhibitors are approved for treatment of EGFR mutant NSCLC, they have failed to demonstrate efficacy in GBM (129,130). Multiple reasons may account for failure in the GBM setting. The anatomical properties of the blood–brain barrier can limit drug penetrance within GBM, and compensatory mutations of other receptor tyrosine kinases may drive resistance. cBioPortal analysis of mutation hotspots shows that EGFR mutations in GBM are most frequently found within the extracellular furin-like domain, while EGFR mutations in lung cancer typically occur within the intracellular kinase domain (Figure 5B). These data suggest that first- and second-generation EGFR inhibitors, which target the kinase domain, are not directed at the EGFR mutations observed in GBM, which may explain to some extent their lack of efficacy.
The efficacy of EGFR inhibition is likely to be driven, at least in part, by reshaping of the tumour immune microenvironment, with higher infiltration and proliferation of anti-tumour T cells observed in mouse models following EGFR inhibition (131). GBM is well recognized as a ‘cold’ tumour type, with low infiltration of activated T cells, in which the striking efficacy of immunotherapies observed in some other indications has not yet been achieved (132). Mining the TIMER database for associations between gene alteration and immune cell infiltrate illustrates that EGFR mutation or amplification in GBM patient samples does not correlate with any change in T-cell infiltration (Figure 5C), in contrast to lung adenocarcinoma, where EGFR alteration is associated with reduced T-cell infiltration (Figure 5D). Together, these database analyses indicate that, despite the high levels of EGFR genetic alteration observed in GBM, the lack of kinase-activating mutations and the lack of impact of EGFR alteration on the tumour immune microenvironment mean that EGFR kinase inhibitors are likely to have differing efficacy in EGFR mutant NSCLC and GBM. Such analysis may point to a need to generate compounds that target the EGFR extracellular mutations in order to treat EGFR mutant GBM (133), or to combine EGFR inhibitors with additional agents that promote increased T-cell infiltration.
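The comparison of mutation hotspots by protein domain can be sketched as a simple tally once mutation positions have been exported from a portal. The example below is illustrative only. The domain boundaries are approximate values assumed for the purpose of the sketch and should be checked against a curated annotation, and the position lists are toy inputs loosely modelled on commonly reported hotspots rather than cBioPortal output.

```python
from collections import Counter

# Approximate residue ranges for two EGFR domains. These boundaries are an
# assumption for illustration and should be verified against UniProt or Pfam.
EGFR_DOMAINS = {
    "furin-like (extracellular)": (177, 338),
    "kinase (intracellular)": (712, 979),
}


def domain_of(position, domains):
    """Name of the domain containing the residue position, or 'other'."""
    for name, (start, end) in domains.items():
        if start <= position <= end:
            return name
    return "other"


def tally_by_domain(positions, domains):
    """Count mutations per domain for a list of residue positions."""
    return Counter(domain_of(position, domains) for position in positions)


def fraction_in_domain(positions, domains, name):
    """Fraction of the listed mutations that fall inside the named domain."""
    if not positions:
        return 0.0
    return tally_by_domain(positions, domains)[name] / len(positions)


# Toy position lists (hypothetical samples, one entry per observed mutation).
gbm_like = [289, 289, 598, 108]
lung_like = [858, 858, 790, 746]

if __name__ == "__main__":
    for label, positions in (("GBM-like", gbm_like), ("lung-like", lung_like)):
        kinase = fraction_in_domain(positions, EGFR_DOMAINS, "kinase (intracellular)")
        print(label, dict(tally_by_domain(positions, EGFR_DOMAINS)),
              f"kinase fraction = {kinase:.2f}")
```

A low kinase-domain fraction in one indication and a high fraction in another would be consistent with the pattern described for GBM and lung cancer, although real analyses should use the full mutation export and curated domain annotations.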

Prediction of clinical development challenges. (A) EGFR mutation/alteration frequency across selected cancer types from cBioPortal. (B) EGFR mutation landscape and properties mapped on protein domain schematic for GBM and lung adenocarcinoma (LUAD); mutations are coloured as missense (green), truncating (black), in-frame (brown), splice (orange) and fusion (purple) from cBioPortal. (C) Correlation of EGFR wild type versus mutation (top panel) or EGFR deletion or amplification versus normal (lower panel) with CD8 T-cell infiltration in GBM from TIMER2.0. (D) As for panel (C), with LUAD as indicated.
Figure 5.

PERSPECTIVES

The breadth of free-to-use publicly available cancer databases described within this review (Table 1) allows a comprehensive evaluation of a novel target to be carried out in a short timeframe and without the need for specialist training. The key aims of target evaluation (Figure 1) are to determine whether the risks associated with a specific target programme are sufficiently mitigated by available information to support progression into the drug discovery process. Utilization of the databases described within this review can increase the information available and support assessments of existing data to inform target evaluation and consequently reduce the high attrition rate associated with novel drug development. Effective mining of databases can also be used to identify previously unknown clinical opportunities for a selected target or repurposing opportunities for an existing drug. Incorporation of patient omics data and effective utilization of clinically relevant pre-clinical models provide key data to support the development of precision medicine on a patient-specific basis.

Table 1.

Summary table of recommended databases

Database | Content synopsis | URL
AlphaFold Protein Structure Database | AI prediction of protein structure from primary sequence with almost complete coverage of the human proteome | https://alphafold.ebi.ac.uk/
cBioPortal | Portal for exploring and analysing multi-omics characterization of patient samples | https://www.cbioportal.org/
Cancer Cell Line Encyclopedia | Genomic and metabolic characterization of cancer cell lines | https://sites.broadinstitute.org/ccle/
canSAR | Knowledgebase of multidisciplinary data that applies machine learning approaches to provide drug discovery predictions | https://cansar.ai/
Cell Model Passports | Genomic and clinical characterization of >2000 cancer cell lines | https://cellmodelpassports.sanger.ac.uk/
CellxGene | Portal to access scRNA-seq datasets | https://cellxgene.cziscience.com/
Chemical Probes Portal | Expert-reviewed online resource for identifying and using chemical probes in biomedical research and drug discovery | https://www.chemicalprobes.org
Clinical Proteomic Tumor Analysis Consortium | Houses mass spectrometry characterization of the human proteome from patient samples | https://proteomics.cancer.gov/programs/cptac
Cancer Dependency Map | Houses siRNA, CRISPR and pharmacological screening data for genomically characterized cancer cell line panels | https://depmap.org/portal/
Deeply Integrated human Single-Cell Omics data | Accesses human scRNA-seq datasets integrated into tissue-specific atlases | https://www.immunesinglecell.org/
DrugBank | Detailed information for approved drugs and investigational compounds | https://go.drugbank.com/
Genomic Data Commons data portal | Accesses TCGA multi-omics datasets from >20 000 primary cancer and matched normal samples | https://portal.gdc.cancer.gov/
Genomics of Drug Sensitivity in Cancer | Profiling of the response of >1000 cancer cell lines with over 600 approved and investigational pharmacological agents | https://www.cancerrxgene.org/
Genotype-Tissue Expression project | Gene expression data from 54 non-disease tissue types from close to 1000 individuals | https://gtexportal.org/home/
International Mouse Phenotyping Consortium | Mouse knockout phenotypic characterization from consortium aiming to knock out every protein-coding gene within mouse genome | https://www.mousephenotype.org/
Kaplan–Meier Plotter | Allows correlations between gene expression and patient outcome from manually curated datasets from several sources | https://kmplot.com/analysis/
The Human Protein Atlas | Resource that aims to map the human proteome across all major tissues and organs in normal and disease settings | https://www.proteinatlas.org/
The Mouse Models of Human Cancer Database | Knowledgebase of mouse models of human cancer with data from >46 000 models, including inbred mouse models, PDXs and GEMMs | http://tumor.informatics.jax.org
Open Targets Platform | Database for target identification and prioritization of target–disease associations | https://platform.opentargets.org/
Patient Derived Cancer Models Finder | Tool to identify suitable PDX mouse models | https://www.cancermodels.org
Probe Miner | Resource that uses fitness factors to objectively identify the best tool compounds for experimental use | https://probeminer.icr.ac.uk
PROTAC-DB | Online resource for identifying currently described PROTAC molecules | http://cadd.zju.edu.cn/protacdb/
Single Cell Expression Atlas | Portal to access scRNA-seq datasets | https://www.ebi.ac.uk/gxa/sc/home
STRING: functional protein association network | Database of known and predicted PPIs | https://string-db.org/
Structural Genomics Consortium | Portal to access information on and request chemical probes | https://www.thesgc.org/chemical-probes
TargetDB | Tool for compiling target information from public databases | https://github.com/sdecesco/targetDB
Tumor IMmune Estimation Resource | Portal to explore infiltration of immune cells in TCGA tumour samples and correlate this with gene alterations | http://timer.cistrome.org
Tumor–Immune System Interaction Database | Predicts responses to immunotherapy by integrating datasets from multiple sources, including gene expression, CRISPR/shRNA screening to determine sensitivity to T-cell-mediated killing and literature mining | http://cis.hku.hk/TISIDB/index.php
Tumor Immune Syngeneic MOuse | Syngeneic mouse model datasets including cell line genotype and cancer type, mouse genetic background and implantation site. Provides interactive visual interfaces to explore gene expression, immune cell infiltrate and response to therapy | http://tismo.cistrome.org/
University of ALabama at Birmingham CANcer data analysis portal | Houses proteomic data obtained from mass spectrometry analysis of 2002 patient samples from 17 separate studies | http://ualcan.path.uab.edu/analysis-prot.html
Worldwide Protein Data Bank | Main worldwide repository for protein structural information | http://www.wwpdb.org/
DatabaseContent synopsisURL
AlphaFold Protein Structure DatabaseAI prediction of protein structure from primary sequence with almost complete coverage of the human proteomehttps://alphafold.ebi.ac.uk/
cBioPortalPortal for exploring and analysing multi-omics characterization of patient sampleshttps://www.cbioportal.org/
Cancer Cell Line EncyclopediaGenomic and metabolic characterization of cancer cell lineshttps://sites.broadinstitute.org/ccle/
canSARKnowledgebase of multidisciplinary data that applies machine learning approaches to provide drug discovery predictionshttps://cansar.ai/
Cell Model PassportsGenomic and clinical characterization of >2000 cancer cell lineshttps://cellmodelpassports.sanger.ac.uk/
CellxGenePortal to access scRNA-seq datasetshttps://cellxgene.cziscience.com/
Chemical Probes PortalExpert-reviewed online resource for identifying and using chemical probes in biomedical research and drug discoveryhttps://www.chemicalprobes.org
Clinical Proteomic Tumor Analysis ConsortiumHouses mass spectrometry characterization of the human proteome from patient sampleshttps://proteomics.cancer.gov/programs/cptac
Cancer Dependency MapHouses siRNA, CRISPR and pharmacological screening data for genomically characterized cancer cell line panelshttps://depmap.org/portal/
Deeply Integrated human Single-Cell Omics dataAccesses human scRNA-seq datasets integrated into tissue-specific atlaseshttps://www.immunesinglecell.org/
DrugBankDetailed information for approved drugs and investigational compoundshttps://go.drugbank.com/
Genomic Data Commons data portalAccesses TCGA multi-omics datasets from >20 000 primary cancer and matched normal sampleshttps://portal.gdc.cancer.gov/
Genomics of Drug Sensitivity in CancerProfiling of the response of >1000 cancer cell lines with over 600 approved and investigational pharmacological agentshttps://www.cancerrxgene.org/
Genotype-Tissue Expression projectGene expression data from 54 non-disease tissue types from close to 1000 individualshttps://gtexportal.org/home/
International Mouse Phenotyping ConsortiumMouse knockout phenotypic characterization from consortium aiming to knock out every protein-coding gene within mouse genomehttps://www.mousephenotype.org/
Kaplan–Meier PlotterAllows correlations between gene expression and patient outcome from manually curated datasets from several sourceshttps://kmplot.com/analysis/
The Human Protein AtlasResource that aims to map the human proteome across all major tissues and organs in normal and disease settingshttps://www.proteinatlas.org/
The Mouse Models of Human Cancer DatabaseKnowledgebase of mouse models of human cancer with data from >46 000 models, including inbred mouse models, PDXs and GEMMshttp://tumor.informatics.jax.org
Open Targets PlatformDatabase for target identification and prioritization of target–disease associationshttps://platform.opentargets.org/
Patient Derived Cancer Models FinderTool to identify suitable PDX mouse modelshttps://www.cancermodels.org
Probe MinerResource that uses fitness factors to objectively identify the best tool compounds for experimental usehttps://probeminer.icr.ac.uk
PROTAC-DBOnline resource for identifying currently described PROTAC moleculeshttp://cadd.zju.edu.cn/protacdb/
Single Cell Expression AtlasPortal to access scRNA-seq datasetshttps://www.ebi.ac.uk/gxa/sc/home
STRING: functional protein association networkDatabase of known and predicted PPIshttps://string-db.org/
Structural Genomics ConsortiumPortal to access information on and request chemical probeshttps://www.thesgc.org/chemical-probes
TargetDBTool for compiling target information from public databaseshttps://github.com/sdecesco/targetDB
Tumor IMmune Estimation ResourcePortal to explore infiltration of immune cells in TCGA tumour samples and correlate this with gene alterationshttp://timer.cistrome.org
Tumor–Immune System Interaction DatabasePredicts responses to immunotherapy by integrating datasets from multiple sources, including gene expression, CRISPR/shRNA screening to determine sensitivity to T-cell-mediated killing and literature mininghttp://cis.hku.hk/TISIDB/index.php
Tumor Immune Syngeneic MOuseSyngeneic mouse model datasets including cell line genotype and cancer type, mouse genetic background and implantation site. Provides interactive visual interfaces to explore gene expression, immune cell infiltrate and response to therapyhttp://tismo.cistrome.org/
University of ALabama at Birmingham CANcer data analysis portalHouses proteomic data obtained from mass spectrometry analysis of 2002 patient samples from 17 separate studieshttp://ualcan.path.uab.edu/analysis-prot.html
Worldwide Protein Data BankMain worldwide repository for protein structural informationhttp://www.wwpdb.org/
Table 1.

Summary table of recommended databases

DatabaseContent synopsisURL
AlphaFold Protein Structure DatabaseAI prediction of protein structure from primary sequence with almost complete coverage of the human proteomehttps://alphafold.ebi.ac.uk/
cBioPortalPortal for exploring and analysing multi-omics characterization of patient sampleshttps://www.cbioportal.org/
Cancer Cell Line EncyclopediaGenomic and metabolic characterization of cancer cell lineshttps://sites.broadinstitute.org/ccle/
canSARKnowledgebase of multidisciplinary data that applies machine learning approaches to provide drug discovery predictionshttps://cansar.ai/
Cell Model PassportsGenomic and clinical characterization of >2000 cancer cell lineshttps://cellmodelpassports.sanger.ac.uk/
CellxGenePortal to access scRNA-seq datasetshttps://cellxgene.cziscience.com/
Chemical Probes PortalExpert-reviewed online resource for identifying and using chemical probes in biomedical research and drug discoveryhttps://www.chemicalprobes.org
Clinical Proteomic Tumor Analysis ConsortiumHouses mass spectrometry characterization of the human proteome from patient sampleshttps://proteomics.cancer.gov/programs/cptac
Cancer Dependency MapHouses siRNA, CRISPR and pharmacological screening data for genomically characterized cancer cell line panelshttps://depmap.org/portal/
Deeply Integrated human Single-Cell Omics dataAccesses human scRNA-seq datasets integrated into tissue-specific atlaseshttps://www.immunesinglecell.org/
DrugBankDetailed information for approved drugs and investigational compoundshttps://go.drugbank.com/
Genomic Data Commons data portalAccesses TCGA multi-omics datasets from >20 000 primary cancer and matched normal sampleshttps://portal.gdc.cancer.gov/
Genomics of Drug Sensitivity in CancerProfiling of the response of >1000 cancer cell lines with over 600 approved and investigational pharmacological agentshttps://www.cancerrxgene.org/
Genotype-Tissue Expression projectGene expression data from 54 non-disease tissue types from close to 1000 individualshttps://gtexportal.org/home/
International Mouse Phenotyping ConsortiumMouse knockout phenotypic characterization from consortium aiming to knock out every protein-coding gene within mouse genomehttps://www.mousephenotype.org/
Kaplan–Meier PlotterAllows correlations between gene expression and patient outcome from manually curated datasets from several sourceshttps://kmplot.com/analysis/
The Human Protein AtlasResource that aims to map the human proteome across all major tissues and organs in normal and disease settingshttps://www.proteinatlas.org/
The Mouse Models of Human Cancer DatabaseKnowledgebase of mouse models of human cancer with data from >46 000 models, including inbred mouse models, PDXs and GEMMshttp://tumor.informatics.jax.org
Open Targets PlatformDatabase for target identification and prioritization of target–disease associationshttps://platform.opentargets.org/
Patient Derived Cancer Models FinderTool to identify suitable PDX mouse modelshttps://www.cancermodels.org
Probe MinerResource that uses fitness factors to objectively identify the best tool compounds for experimental usehttps://probeminer.icr.ac.uk
PROTAC-DBOnline resource for identifying currently described PROTAC moleculeshttp://cadd.zju.edu.cn/protacdb/
Single Cell Expression AtlasPortal to access scRNA-seq datasetshttps://www.ebi.ac.uk/gxa/sc/home
STRING: functional protein association networkDatabase of known and predicted PPIshttps://string-db.org/
Structural Genomics ConsortiumPortal to access information on and request chemical probeshttps://www.thesgc.org/chemical-probes
TargetDBTool for compiling target information from public databaseshttps://github.com/sdecesco/targetDB
Tumor IMmune Estimation ResourcePortal to explore infiltration of immune cells in TCGA tumour samples and correlate this with gene alterationshttp://timer.cistrome.org
Tumor–Immune System Interaction DatabasePredicts responses to immunotherapy by integrating datasets from multiple sources, including gene expression, CRISPR/shRNA screening to determine sensitivity to T-cell-mediated killing and literature mininghttp://cis.hku.hk/TISIDB/index.php
Tumor Immune Syngeneic MOuseSyngeneic mouse model datasets including cell line genotype and cancer type, mouse genetic background and implantation site. Provides interactive visual interfaces to explore gene expression, immune cell infiltrate and response to therapyhttp://tismo.cistrome.org/
University of ALabama at Birmingham CANcer data analysis portalHouses proteomic data obtained from mass spectrometry analysis of 2002 patient samples from 17 separate studieshttp://ualcan.path.uab.edu/analysis-prot.html
Worldwide Protein Data BankMain worldwide repository for protein structural informationhttp://www.wwpdb.org/
| Database | Content synopsis | URL |
| --- | --- | --- |
| AlphaFold Protein Structure Database | AI prediction of protein structure from primary sequence with almost complete coverage of the human proteome | https://alphafold.ebi.ac.uk/ |
| cBioPortal | Portal for exploring and analysing multi-omics characterization of patient samples | https://www.cbioportal.org/ |
| Cancer Cell Line Encyclopedia | Genomic and metabolic characterization of cancer cell lines | https://sites.broadinstitute.org/ccle/ |
| canSAR | Knowledgebase of multidisciplinary data that applies machine learning approaches to provide drug discovery predictions | https://cansar.ai/ |
| Cell Model Passports | Genomic and clinical characterization of >2000 cancer cell lines | https://cellmodelpassports.sanger.ac.uk/ |
| CellxGene | Portal to access scRNA-seq datasets | https://cellxgene.cziscience.com/ |
| Chemical Probes Portal | Expert-reviewed online resource for identifying and using chemical probes in biomedical research and drug discovery | https://www.chemicalprobes.org |
| Clinical Proteomic Tumor Analysis Consortium | Houses mass spectrometry characterization of the human proteome from patient samples | https://proteomics.cancer.gov/programs/cptac |
| Cancer Dependency Map | Houses siRNA, CRISPR and pharmacological screening data for genomically characterized cancer cell line panels | https://depmap.org/portal/ |
| Deeply Integrated human Single-Cell Omics data | Accesses human scRNA-seq datasets integrated into tissue-specific atlases | https://www.immunesinglecell.org/ |
| DrugBank | Detailed information for approved drugs and investigational compounds | https://go.drugbank.com/ |
| Genomic Data Commons data portal | Accesses TCGA multi-omics datasets from >20 000 primary cancer and matched normal samples | https://portal.gdc.cancer.gov/ |
| Genomics of Drug Sensitivity in Cancer | Profiling of the response of >1000 cancer cell lines with over 600 approved and investigational pharmacological agents | https://www.cancerrxgene.org/ |
| Genotype-Tissue Expression project | Gene expression data from 54 non-disease tissue types from close to 1000 individuals | https://gtexportal.org/home/ |
| International Mouse Phenotyping Consortium | Mouse knockout phenotypic characterization from consortium aiming to knock out every protein-coding gene within mouse genome | https://www.mousephenotype.org/ |
| Kaplan–Meier Plotter | Allows correlations between gene expression and patient outcome from manually curated datasets from several sources | https://kmplot.com/analysis/ |
| The Human Protein Atlas | Resource that aims to map the human proteome across all major tissues and organs in normal and disease settings | https://www.proteinatlas.org/ |
| The Mouse Models of Human Cancer Database | Knowledgebase of mouse models of human cancer with data from >46 000 models, including inbred mouse models, PDXs and GEMMs | http://tumor.informatics.jax.org |
| Open Targets Platform | Database for target identification and prioritization of target–disease associations | https://platform.opentargets.org/ |
| Patient Derived Cancer Models Finder | Tool to identify suitable PDX mouse models | https://www.cancermodels.org |
| Probe Miner | Resource that uses fitness factors to objectively identify the best tool compounds for experimental use | https://probeminer.icr.ac.uk |
| PROTAC-DB | Online resource for identifying currently described PROTAC molecules | http://cadd.zju.edu.cn/protacdb/ |
| Single Cell Expression Atlas | Portal to access scRNA-seq datasets | https://www.ebi.ac.uk/gxa/sc/home |
| STRING: functional protein association network | Database of known and predicted PPIs | https://string-db.org/ |
| Structural Genomics Consortium | Portal to access information on and request chemical probes | https://www.thesgc.org/chemical-probes |
| TargetDB | Tool for compiling target information from public databases | https://github.com/sdecesco/targetDB |
| Tumor IMmune Estimation Resource | Portal to explore infiltration of immune cells in TCGA tumour samples and correlate this with gene alterations | http://timer.cistrome.org |
| Tumor–Immune System Interaction Database | Predicts responses to immunotherapy by integrating datasets from multiple sources, including gene expression, CRISPR/shRNA screening to determine sensitivity to T-cell-mediated killing and literature mining | http://cis.hku.hk/TISIDB/index.php |
| Tumor Immune Syngeneic MOuse | Syngeneic mouse model datasets including cell line genotype and cancer type, mouse genetic background and implantation site. Provides interactive visual interfaces to explore gene expression, immune cell infiltrate and response to therapy | http://tismo.cistrome.org/ |
| University of ALabama at Birmingham CANcer data analysis portal | Houses proteomic data obtained from mass spectrometry analysis of 2002 patient samples from 17 separate studies | http://ualcan.path.uab.edu/analysis-prot.html |
| Worldwide Protein Data Bank | Main worldwide repository for protein structural information | http://www.wwpdb.org/ |

It is important, however, that database output is interpreted in the context of the specific limitations that we discuss throughout. For example, databases describing the impact of target depletion in cancer cell lines or in immunocompromised mouse models will not be of use when assessing a target’s role in the tumour immune response. Similarly, databases that house compound screening datasets will only be informative when the selectivity profile of the compounds used is known. There are also knowledge gaps that could be filled by the creation of new databases. Databases of key importance might include a resource housing datasets from cancer cell line screening under physiologically relevant cell culture conditions, a portal that allows comparison of target knockout phenotypes between cancer and non-cancer (including stromal and immune) cells, and a database that compiles in vivo target depletion/deletion data from tumour models.
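The two caveats given as examples above can be written down as a simple checklist to run before a database output is used in a target evaluation. The Python sketch below is purely illustrative and is not part of any of the listed resources: the `Resource` class, the `caveats` function and the model-system labels are hypothetical names, and the label assigned to each database is our own reading of the content synopses in the table (with PDX models assumed to be grown in immunocompromised mice).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for model systems that lack an intact immune system.
IMMUNE_DEFICIENT_SYSTEMS = {"cancer_cell_line", "immunocompromised_mouse"}


@dataclass(frozen=True)
class Resource:
    """One database, annotated with the properties the caveats depend on."""
    name: str
    model_system: str              # illustrative label, assigned by us
    compound_screen: bool = False  # True if it houses compound screening data


def caveats(resource: Resource, immune_question: bool,
            selectivity_known: bool) -> List[str]:
    """Return the interpretation caveats that apply to one resource."""
    notes = []
    # Caveat 1: immune-deficient models cannot inform on tumour immunity.
    if immune_question and resource.model_system in IMMUNE_DEFICIENT_SYSTEMS:
        notes.append("model lacks an intact immune system")
    # Caveat 2: compound screens need a known selectivity profile.
    if resource.compound_screen and not selectivity_known:
        notes.append("compound selectivity profile unknown")
    return notes


# Annotations below are assumptions based on the table synopses.
catalogue = [
    Resource("Cancer Dependency Map", "cancer_cell_line", compound_screen=True),
    Resource("Genomics of Drug Sensitivity in Cancer", "cancer_cell_line",
             compound_screen=True),
    Resource("Patient Derived Cancer Models Finder", "immunocompromised_mouse"),
    Resource("Tumor Immune Syngeneic MOuse", "syngeneic_mouse"),
]

for resource in catalogue:
    for note in caveats(resource, immune_question=True, selectivity_known=False):
        print(f"{resource.name}: {note}")
```

In this sketch a syngeneic mouse resource raises no caveat for an immune-related question, whereas cell line and PDX resources are flagged. Further limitations discussed in this review could be added as extra rules in the same way.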

Data Availability

No new data were generated or analysed in support of this research. All databases used to generate data have been cited and a URL provided.

ACKNOWLEDGEMENTS

We would like to acknowledge all members of Cancer Research Horizons who were involved in relevant discussions or gave suggestions based on draft versions of this review. We particularly want to thank Chris Bot, Magdalena Florkowska, Rebecca Harris, John Taylor and Tom MacVicar, whose input was much appreciated.

FUNDING

This work was supported financially by Cancer Research UK/Cancer Research Horizons.

Conflict of interest statement. None declared.

REFERENCES

1.

Hughes
 
J.P.
,
Rees
 
S.
,
Kalindjian
 
S.B.
,
Philpott
 
K.L.
 
Principles of early drug discovery
.
Br. J. Pharmacol.
 
2011
;
162
:
1239
1249
.

2.

Sun
 
D.
,
Gao
 
W.
,
Hu
 
H.
,
Zhou
 
S.
 
Why 90% of clinical drug development fails and how to improve it?
.
Acta Pharm. Sin. B
.
2022
;
12
:
3049
3062
.

3.

DiMasi
 
J.A.
,
Grabowski
 
H.G.
,
Hansen
 
R.W.
 
Innovation in the pharmaceutical industry: new estimates of R&D costs
.
J. Health Econ.
 
2016
;
47
:
20
33
.

4.

Dowden
 
H.
,
Munro
 
J.
 
Trends in clinical success rates and therapeutic focus
.
Nat. Rev. Drug Discov.
 
2019
;
18
:
495
496
.

5.

Emmerich
 
C.H.
,
Gamboa
 
L.M.
,
Hofmann
 
M.C.J.
,
Bonin-Andresen
 
M.
,
Arbach
 
O.
,
Schendel
 
P.
,
Gerlach
 
B.
,
Hempel
 
K.
,
Bespalov
 
A.
,
Dirnagl
 
U.
 et al. .  
Improving target assessment in biomedical research: the GOT-IT recommendations
.
Nat. Rev. Drug Discov.
 
2021
;
20
:
64
81
.

6.

Zhong
 
L.
,
Li
 
Y.
,
Xiong
 
L.
,
Wang
 
W.
,
Wu
 
M.
,
Yuan
 
T.
,
Yang
 
W.
,
Tian
 
C.
,
Miao
 
Z.
,
Wang
 
T.
 et al. .  
Small molecules in targeted cancer therapy: advances, challenges, and future perspectives
.
Signal Transduct. Target. Ther.
 
2021
;
6
:
201
.

7.

Sun
 
J.
,
Wei
 
Q.
,
Zhou
 
Y.
,
Wang
 
J.
,
Liu
 
Q.
,
Xu
 
H.
 
A systematic analysis of FDA-approved anticancer drugs
.
BMC Syst. Biol.
 
2017
;
11
:
87
.

8.

Meyers
 
R.M.
,
Bryan
 
J.G.
,
McFarland
 
J.M.
,
Weir
 
B.A.
,
Sizemore
 
A.E.
,
Xu
 
H.
,
Dharia
 
N.V.
,
Montgomery
 
P.G.
,
Cowley
 
G.S.
,
Pantel
 
S.
 et al. .  
Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells
.
Nat. Genet.
 
2017
;
49
:
1779
1784
.

9.

Behan
 
F.M.
,
Iorio
 
F.
,
Picco
 
G.
,
Goncalves
 
E.
,
Beaver
 
C.M.
,
Migliardi
 
G.
,
Santos
 
R.
,
Rao
 
Y.
,
Sassi
 
F.
,
Pinnelli
 
M.
 et al. .  
Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens
.
Nature
.
2019
;
568
:
511
516
.

10.

Marcotte
 
R.
,
Sayad
 
A.
,
Brown
 
K.R.
,
Sanchez-Garcia
 
F.
,
Reimand
 
J.
,
Haider
 
M.
,
Virtanen
 
C.
,
Bradner
 
J.E.
,
Bader
 
G.D.
,
Mills
 
G.B.
 et al. .  
Functional genomic landscape of human breast cancer drivers, vulnerabilities, and resistance
.
Cell
.
2016
;
164
:
293
309
.

11.

Tsherniak
 
A.
,
Vazquez
 
F.
,
Montgomery
 
P.G.
,
Weir
 
B.A.
,
Kryukov
 
G.
,
Cowley
 
G.S.
,
Gill
 
S.
,
Harrington
 
W.F.
,
Pantel
 
S.
,
Krill-Burger
 
J.M.
 et al. .  
Defining a cancer dependency map
.
Cell
.
2017
;
170
:
564
576
.

12.

McDonald
 
E.R.
 3rd,
de Weck
 
A.
,
Schlabach
 
M.R.
,
Billy
 
E.
,
Mavrakis
 
K.J.
,
Hoffman
 
G.R.
,
Belur
 
D.
,
Castelletti
 
D.
,
Frias
 
E.
,
Gampa
 
K.
 et al. .  
Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening
.
Cell
.
2017
;
170
:
577
592.e10
.

13.

Dempster
 
J.M.
,
Boyle
 
I.
,
Vazquez
 
F.
,
Root
 
D.E.
,
Boehm
 
J.S.
,
Hahn
 
W.C.
,
Tsherniak
 
A.
,
McFarland
 
J.M.
 
Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects
.
Genome Biol.
 
2021
;
22
:
343
.

14.

McFarland
 
J.M.
,
Ho
 
Z.V.
,
Kugener
 
G.
,
Dempster
 
J.M.
,
Montgomery
 
P.G.
,
Bryan
 
J.G.
,
Krill-Burger
 
J.M.
,
Green
 
T.M.
,
Vazquez
 
F.
,
Boehm
 
J.S.
 et al. .  
Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration
.
Nat. Commun.
 
2018
;
9
:
4610
.

15.

Dempster
 
J.M.
,
Pacini
 
C.
,
Pantel
 
S.
,
Behan
 
F.M.
,
Green
 
T.
,
Krill-Burger
 
J.
,
Beaver
 
C.M.
,
Younger
 
S.T.
,
Zhivich
 
V.
,
Najgebauer
 
H.
 et al. .  
Agreement between two large pan-cancer CRISPR–Cas9 gene dependency data sets
.
Nat. Commun.
 
2019
;
10
:
5817
.

16.

Barretina
 
J.
,
Caponigro
 
G.
,
Stransky
 
N.
,
Venkatesan
 
K.
,
Margolin
 
A.A.
,
Kim
 
S.
,
Wilson
 
C.J.
,
Lehar
 
J.
,
Kryukov
 
G.V.
,
Sonkin
 
D.
 et al. .  
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity
.
Nature
.
2012
;
483
:
603
607
.

17.

Cancer Cell Line Encyclopedia Consortium and Genomics of Drug Sensitivity in Cancer Consortium
 
Pharmacogenomic agreement between two cancer cell line data sets
.
Nature
.
2015
;
528
:
84
87
.

18.

Ghandi
 
M.
,
Huang
 
F.W.
,
Jane-Valbuena
 
J.
,
Kryukov
 
G.V.
,
Lo
 
C.C.
,
McDonald
 
E.R.
,
Barretina
 
J.
,
Gelfand
 
E.T.
,
Bielski
 
C.M.
,
Li
 
H.
 et al. .  
Next-generation characterization of the Cancer Cell Line Encyclopedia
.
Nature
.
2019
;
569
:
503
508
.

19.

Li
 
H.
,
Ning
 
S.
,
Ghandi
 
M.
,
Kryukov
 
G.V.
,
Gopal
 
S.
,
Deik
 
A.
,
Souza
 
A.
,
Pierce
 
K.
,
Keskula
 
P.
,
Hernandez
 
D.
 et al. .  
The landscape of cancer cell line metabolism
.
Nat. Med.
 
2019
;
25
:
850
860
.

20.

Nusinow
 
D.P.
,
Szpyt
 
J.
,
Ghandi
 
M.
,
Rose
 
C.M.
,
McDonald
 
E.R.
,
Kalocsay
 
M.
,
Jane-Valbuena
 
J.
,
Gelfand
 
E.
,
Schweppe
 
D.K.
,
Jedrychowski
 
M.
 et al. .  
Quantitative proteomics of the Cancer Cell Line Encyclopedia
.
Cell
.
2020
;
180
:
387
402
.

21.

Huang
 
L.
,
Guo
 
Z.
,
Wang
 
F.
,
Fu
 
L.
 
KRAS mutation: from undruggable to druggable in cancer
.
Signal Transduct. Target. Ther.
 
2021
;
6
:
386
.

22.

Chan
 
E.M.
,
Shibue
 
T.
,
McFarland
 
J.M.
,
Gaeta
 
B.
,
Ghandi
 
M.
,
Dumont
 
N.
,
Gonzalez
 
A.
,
McPartlan
 
J.S.
,
Li
 
T.
,
Zhang
 
Y.
 et al. .  
WRN helicase is a synthetic lethal target in microsatellite unstable cancers
.
Nature
.
2019
;
568
:
551
556
.

23.

van der Meer
 
D.
,
Barthorpe
 
S.
,
Yang
 
W.
,
Lightfoot
 
H.
,
Hall
 
C.
,
Gilbert
 
J.
,
Francies
 
H.E.
,
Garnett
 
M.J.
 
Cell Model Passports—a hub for clinical, genetic and functional datasets of preclinical cancer models
.
Nucleic Acids Res.
 
2019
;
47
:
D923
D929
.

24.

Yang
 
W.
,
Soares
 
J.
,
Greninger
 
P.
,
Edelman
 
E.J.
,
Lightfoot
 
H.
,
Forbes
 
S.
,
Bindal
 
N.
,
Beare
 
D.
,
Smith
 
J.A.
,
Thompson
 
I.R.
 et al. .  
Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells
.
Nucleic Acids Res.
 
2013
;
41
:
D955
D961
.

25.

Rose
 
M.
,
Burgess
 
J.T.
,
O’Byrne
 
K.
,
Richard
 
D.J.
,
Bolderson
 
E.
 
PARP inhibitors: clinical relevance, mechanisms of action and tumor resistance
.
Front. Cell Dev. Biol.
 
2020
;
8
:
564601
.

26.

Deng
 
X.
,
Das
 
S.
,
Valdez
 
K.
,
Camphausen
 
K.
,
Shankavaram
 
U.
 
SL-BioDP: multi-cancer interactive tool for prediction of synthetic lethality and response to cancer treatment
.
Cancers (Basel)
.
2019
;
11
:
1682
.

27.

Rauch
 
J.
,
Volinsky
 
N.
,
Romano
 
D.
,
Kolch
 
W.
 
The secret life of kinases: functions beyond catalysis
.
Cell Commun. Signal.
 
2011
;
9
:
23
.

28.

Vande Voorde
 
J.
,
Ackermann
 
T.
,
Pfetzer
 
N.
,
Sumpton
 
D.
,
Mackay
 
G.
,
Kalna
 
G.
,
Nixon
 
C.
,
Blyth
 
K.
,
Gottlieb
 
E.
,
Tardito
 
S.
 
Improving the metabolic fidelity of cancer models with a physiological cell culture medium
.
Sci. Adv.
 
2019
;
5
:
eaau7314
.

29.

Cantor
 
J.R.
,
Abu-Remaileh
 
M.
,
Kanarek
 
N.
,
Freinkman
 
E.
,
Gao
 
X.
,
Louissaint
 
A.
 Jr
,
Lewis
 
C.A.
,
Sabatini
 
D.M
 
Physiologic medium rewires cellular metabolism and reveals uric acid as an endogenous inhibitor of UMP synthase
.
Cell
.
2017
;
169
:
258
272
.

30.

Rossiter
 
N.J.
,
Huggler
 
K.S.
,
Adelmann
 
C.H.
,
Keys
 
H.R.
,
Soens
 
R.W.
,
Sabatini
 
D.M.
,
Cantor
 
J.R.
 
CRISPR screens in physiologic medium reveal conditionally essential genes in human cells
.
Cell Metab.
 
2021
;
33
:
1248
1263
.

31.

Kapalczynska
 
M.
,
Kolenda
 
T.
,
Przybyla
 
W.
,
Zajaczkowska
 
M.
,
Teresiak
 
A.
,
Filas
 
V.
,
Ibbs
 
M.
,
Blizniak
 
R.
,
Luczewski
 
L.
,
Lamperska
 
K.
 
2D and 3D cell cultures—a comparison of different types of cancer cell cultures
.
Arch. Med. Sci.
 
2018
;
14
:
910
919
.

32.

Jensen
 
C.
,
Teng
 
Y.
 
Is it time to start transitioning from 2D to 3D cell culture
.
Front. Mol. Biosci.
 
2020
;
7
:
33
.

33.

Han
 
K.
,
Pierce
 
S.E.
,
Li
 
A.
,
Spees
 
K.
,
Anderson
 
G.R.
,
Seoane
 
J.A.
,
Lo
 
Y.H.
,
Dubreuil
 
M.
,
Olivas
 
M.
,
Kamber
 
R.A.
 et al. .  
CRISPR screens in cancer spheroids identify 3D growth-specific vulnerabilities
.
Nature
.
2020
;
580
:
136
141
.

34.

Akinleye
 
A.
,
Rasool
 
Z.
 
Immune checkpoint inhibitors of PD-L1 as cancer therapeutics
.
J. Hematol. Oncol.
 
2019
;
12
:
92
.

35.

Esfahani
 
K.
,
Roudaia
 
L.
,
Buhlaiga
 
N.
,
Del Rincon
 
S.V.
,
Papneja
 
N.
,
Miller
 
W.H.
 Jr
 
A review of cancer immunotherapy: from the past, to the present, to the future
.
Curr. Oncol.
 
2020
;
27
:
S87
S97
.

36.

Szklarczyk
 
D.
,
Kirsch
 
R.
,
Koutrouli
 
M.
,
Nastou
 
K.
,
Mehryary
 
F.
,
Hachilif
 
R.
,
Gable
 
A.L.
,
Fang
 
T.
,
Doncheva
 
N.T.
,
Pyysalo
 
S.
 et al. .  
The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest
.
Nucleic Acids Res.
 
2023
;
51
:
D638
D646
.

37.

Bajpai
 
A.K.
,
Davuluri
 
S.
,
Tiwary
 
K.
,
Narayanan
 
S.
,
Oguru
 
S.
,
Basavaraju
 
K.
,
Dayalan
 
D.
,
Thirumurugan
 
K.
,
Acharya
 
K.K.
 
Systematic comparison of the protein–protein interaction databases from a user’s perspective
.
J. Biomed. Inform.
 
2020
;
103
:
103380
.

38.

Twomey
 
J.D.
,
Zhang
 
B.
 
Cancer immunotherapy update: FDA-approved checkpoint inhibitors and companion diagnostics
.
AAPS J.
 
2021
;
23
:
39
.

39.

Nakhoda
 
S.K.
,
Olszanski
 
A.J.
 
Addressing recent failures in immuno-oncology trials to guide novel immunotherapeutic treatment strategies
.
Pharmaceut. Med.
 
2020
;
34
:
83
91
.

40.

Newman
 
A.M.
,
Liu
 
C.L.
,
Green
 
M.R.
,
Gentles
 
A.J.
,
Feng
 
W.
,
Xu
 
Y.
,
Hoang
 
C.D.
,
Diehn
 
M.
,
Alizadeh
 
A.A.
 
Robust enumeration of cell subsets from tissue expression profiles
.
Nat. Methods
.
2015
;
12
:
453
457
.

41.

Yoshihara
 
K.
,
Shahmoradgoli
 
M.
,
Martinez
 
E.
,
Vegesna
 
R.
,
Kim
 
H.
,
Torres-Garcia
 
W.
,
Trevino
 
V.
,
Shen
 
H.
,
Laird
 
P.W.
,
Levine
 
D.A.
 et al. .  
Inferring tumour purity and stromal and immune cell admixture from expression data
.
Nat. Commun.
 
2013
;
4
:
2612
.

42.

Racle
 
J.
,
de Jonge
 
K.
,
Baumgaertner
 
P.
,
Speiser
 
D.E.
,
Gfeller
 
D
 
Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data
.
eLife
.
2017
;
6
:
e26476
.

43.

Finotello
 
F.
,
Mayer
 
C.
,
Plattner
 
C.
,
Laschober
 
G.
,
Rieder
 
D.
,
Hackl
 
H.
,
Krogsdam
 
A.
,
Loncova
 
Z.
,
Posch
 
W.
,
Wilflingseder
 
D.
 et al. .  
Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data
.
Genome Med.
 
2019
;
11
:
34
.

44.

Petitprez
 
F.
,
Levy
 
S.
,
Sun
 
C.M.
,
Meylan
 
M.
,
Linhard
 
C.
,
Becht
 
E.
,
Elarouci
 
N.
,
Tavel
 
D.
,
Roumenina
 
L.T.
,
Ayadi
 
M.
 et al. .  
The murine microenvironment cell population counter method to estimate abundance of tissue-infiltrating immune and stromal cell populations in murine samples using gene expression
.
Genome Med.
 
2020
;
12
:
86
.

45.

Li
 
B.
,
Liu
 
J.S.
,
Liu
 
X.S.
 
Revisit linear regression-based deconvolution methods for tumor gene expression data
.
Genome Biol.
 
2017
;
18
:
127
.

46.

Ru
 
B.
,
Wong
 
C.N.
,
Tong
 
Y.
,
Zhong
 
J.Y.
,
Zhong
 
S.S.W.
,
Wu
 
W.C.
,
Chu
 
K.C.
,
Wong
 
C.Y.
,
Lau
 
C.Y.
,
Chen
 
I.
 et al. .  
TISIDB: an integrated repository portal for tumor–immune system interactions
.
Bioinformatics
.
2019
;
35
:
4200
4202
.

47.

Jiang
 
C.
,
Cao
 
S.
,
Li
 
N.
,
Jiang
 
L.
,
Sun
 
T.
 
PD-1 and PD-L1 correlated gene expression profiles and their association with clinical outcomes of breast cancer
.
Cancer Cell Int.
 
2019
;
19
:
233
.

48.

Feng
 
Z.
,
Chen
 
Y.
,
Cai
 
C.
,
Tan
 
J.
,
Liu
 
P.
,
Chen
 
Y.
,
Shen
 
H.
,
Zeng
 
S.
,
Han
 
Y.
 
Pan-cancer and single-cell analysis reveals CENPL as a cancer prognosis and immune infiltration-related biomarker
.
Front. Immunol.
 
2022
;
13
:
916594
.

49.

Xu
 
H.
,
Zou
 
R.
,
Li
 
F.
,
Liu
 
J.
,
Luan
 
N.
,
Wang
 
S.
,
Zhu
 
L.
 
MRPL15 is a novel prognostic biomarker and therapeutic target for epithelial ovarian cancer
.
Cancer Med.
 
2021
;
10
:
3655
3673
.

50.

Li
 
D.
,
Zhao
 
W.
,
Zhang
 
X.
,
Lv
 
H.
,
Li
 
C.
,
Sun
 
L.
 
NEFM DNA methylation correlates with immune infiltration and survival in breast cancer
.
Clin. Epigenetics
.
2021
;
13
:
112
.

51.

Edgar
 
R.
,
Domrachev
 
M.
,
Lash
 
A.E.
 
Gene Expression Omnibus: NCBI gene expression and hybridization array data repository
.
Nucleic Acids Res.
 
2002
;
30
:
207
210
.

52.

Mailman
 
M.D.
,
Feolo
 
M.
,
Jin
 
Y.
,
Kimura
 
M.
,
Tryka
 
K.
,
Bagoutdinov
 
R.
,
Hao
 
L.
,
Kiang
 
A.
,
Paschall
 
J.
,
Phan
 
L.
 et al. .  
The NCBI dbGaP database of genotypes and phenotypes
.
Nat. Genet.
 
2007
;
39
:
1181
1186
.

53.

Papatheodorou
 
I.
,
Moreno
 
P.
,
Manning
 
J.
,
Fuentes
 
A.M.-P.
,
George
 
N.
,
Fexova
 
S.
,
Fonseca
 
N.A.
,
Füllgrabe
 
A.
,
Green
 
M.
,
Huang
 
N.
 et al. .  
Expression Atlas update: from tissues to single cells
.
Nucleic Acids Res.
 
2019
;
48
:
D77
D83
.

54.

Megill
 
C.
,
Martin
 
B.
,
Weaver
 
C.
,
Bell
 
S.
,
Prins
 
L.
,
Badajoz
 
S.
,
McCandless
 
B.
,
Pisco
 
A.O.
,
Kinsella
 
M.
,
Griffin
 
F.
 et al. .  
cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices
.
2021
;
bioRxiv doi:
06 April 2021, preprint: not peer reviewed
https://doi.org/10.1101/2021.04.05.438318.

55.

Becht
 
E.
,
McInnes
 
L.
,
Healy
 
J.
,
Dutertre
 
C.A.
,
Kwok
 
I.W.H.
,
Ng
 
L.G.
,
Ginhoux
 
F.
,
Newell
 
E.W.
 
Dimensionality reduction for visualizing single-cell data using UMAP
.
Nat. Biotechnol.
 
2019
;
37
:
38
44
.

56.

Van der Maaten
 
L.
,
Hinton
 
G.
 
Visualizing data using t-SNE
.
J.Mach. Learn. Res.
 
2008
;
9
:
2579
2605
.

57.

Li
 
M.
,
Zhang
 
X.
,
Ang
 
K.S.
,
Ling
 
J.
,
Sethi
 
R.
,
Lee
 
N.Y.S.
,
Ginhoux
 
F.
,
Chen
 
J.
 
DISCO: a database of Deeply Integrated human Single-Cell Omics data
.
Nucleic Acids Res.
 
2022
;
50
:
D596
D602
.

58.

Shi
 
J.
,
Hou
 
S.
,
Fang
 
Q.
,
Liu
 
X.
,
Liu
 
X.
,
Qi
 
H.
 
PD-1 controls follicular T helper cell positioning and function
.
Immunity
.
2018
;
49
:
264
274
.

59.

Dickinson
 
M.E.
,
Flenniken
 
A.M.
,
Ji
 
X.
,
Teboul
 
L.
,
Wong
 
M.D.
,
White
 
J.K.
,
Meehan
 
T.F.
,
Weninger
 
W.J.
,
Westerberg
 
H.
,
Adissu
 
H.
 et al. .  
High-throughput discovery of novel developmental phenotypes
.
Nature
.
2016
;
537
:
508
514
.

60.

Groza
 
T.
,
Gomez
 
F.L.
,
Mashhadi
 
H.H.
,
Munoz-Fuentes
 
V.
,
Gunes
 
O.
,
Wilson
 
R.
,
Cacheiro
 
P.
,
Frost
 
A.
,
Keskivali-Bond
 
P.
,
Vardal
 
B.
 et al. .  
The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease
.
Nucleic Acids Res.
 
2023
;
51
:
D1038
D1045
.

61.

Krupke
 
D.M.
,
Begley
 
D.A.
,
Sundberg
 
J.P.
,
Richardson
 
J.E.
,
Neuhauser
 
S.B.
,
Bult
 
C.J.
 
The Mouse Tumor Biology Database: a comprehensive resource for mouse models of human cancer
.
Cancer Res.
 
2017
;
77
:
e67
e70
.

62.

Bult
 
C.J.
,
Krupke
 
D.M.
,
Eppig
 
J.T.
 
Electronic access to mouse tumor data: the Mouse Tumor Biology Database (MTB) project
.
Nucleic Acids Res.
 
1999
;
27
:
99
105
.

63.

Kersten
 
K.
,
de Visser
 
K.E.
,
van Miltenburg
 
M.H.
,
Jonkers
 
J.
 
Genetically engineered mouse models in oncology research and cancer medicine
.
EMBO Mol. Med.
 
2017
;
9
:
137
153
.

64.

Conte
 
N.
,
Mason
 
J.C.
,
Halmagyi
 
C.
,
Neuhauser
 
S.
,
Mosaku
 
A.
,
Yordanova
 
G.
,
Chatzipli
 
A.
,
Begley
 
D.A.
,
Krupke
 
D.M.
,
Parkinson
 
H.
 et al. .  
PDX Finder: a portal for patient-derived tumor xenograft model discovery
.
Nucleic Acids Res.
 
2019
;
47
:
D1073
D1079
.

65.

Zeng
 
Z.
,
Wong
 
C.J.
,
Yang
 
L.
,
Ouardaoui
 
N.
,
Li
 
D.
,
Zhang
 
W.
,
Gu
 
S.
,
Zhang
 
Y.
,
Liu
 
Y.
,
Wang
 
X.
 et al. .  
TISMO: syngeneic mouse tumor database to model tumor immunity and immunotherapy response
.
Nucleic Acids Res.
 
2022
;
50
:
D1391
D1397
.

66.

Zeng
 
Z.
,
Gu
 
S.S.
,
Wong
 
C.J.
,
Yang
 
L.
,
Ouardaoui
 
N.
,
Li
 
D.
,
Zhang
 
W.
,
Brown
 
M.
,
Liu
 
X.S.
 
Machine learning on syngeneic mouse tumor profiles to model clinical immunotherapy response
.
Sci. Adv.
 
2022
;
8
:
eabm8564
.

67.

Blagg
 
J.
,
Workman
 
P.
 
Choose and use your chemical probe wisely to explore cancer biology
.
Cancer Cell
.
2017
;
32
:
9
25
.

68.

Arrowsmith
 
C.H.
,
Audia
 
J.E.
,
Austin
 
C.
,
Baell
 
J.
,
Bennett
 
J.
,
Blagg
 
J.
,
Bountra
 
C.
,
Brennan
 
P.E.
,
Brown
 
P.J.
,
Bunnage
 
M.E.
 et al. .  
The promise and peril of chemical probes
.
Nat. Chem. Biol.
 
2015
;
11
:
536
541
.

69.

Antolin
 
A.A.
,
Workman
 
P.
,
Al-Lazikani
 
B.
 
Public resources for chemical probes: the journey so far and the road ahead
.
Future Med. Chem.
 
2021
;
13
:
731
747
.

70.

Antolin
 
A.A.
,
Tym
 
J.E.
,
Komianou
 
A.
,
Collins
 
I.
,
Workman
 
P.
,
Al-Lazikani
 
B.
 
Objective, quantitative, data-driven assessment of chemical probes
.
Cell Chem. Biol.
 
2018
;
25
:
194
205
.

71.

Chen
 
X.
,
Liu
 
M.
,
Gilson
 
M.K.
 
BindingDB: a web-accessible molecular recognition database
.
Comb. Chem. High Throughput Screen.
 
2001
;
4
:
719
725
.

72.

Gilson
 
M.K.
,
Liu
 
T.
,
Baitaluk
 
M.
,
Nicola
 
G.
,
Hwang
 
L.
,
Chong
 
J.
 
BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology
.
Nucleic Acids Res.
 
2016
;
44
:
D1045
D1053
.

73.

Gaulton
 
A.
,
Hersey
 
A.
,
Nowotka
 
M.
,
Bento
 
A.P.
,
Chambers
 
J.
,
Mendez
 
D.
,
Mutowo
 
P.
,
Atkinson
 
F.
,
Bellis
 
L.J.
,
Cibrian-Uhalte
 
E.
 et al. .  
The ChEMBL database in 2017
.
Nucleic Acids Res.
 
2017
;
45
:
D945
D954
.

74.

Wishart
 
D.S.
,
Feunang
 
Y.D.
,
Guo
 
A.C.
,
Lo
 
E.J.
,
Marcu
 
A.
,
Grant
 
J.R.
,
Sajed
 
T.
,
Johnson
 
D.
,
Li
 
C.
,
Sayeeda
 
Z.
 et al. .  
DrugBank 5.0: a major update to the DrugBank database for 2018
.
Nucleic Acids Res.
 
2018
;
46
:
D1074
D1082
.

75.

Muller
 
S.
,
Ackloo
 
S.
,
Arrowsmith
 
C.H.
,
Bauser
 
M.
,
Baryza
 
J.L.
,
Blagg
 
J.
,
Bottcher
 
J.
,
Bountra
 
C.
,
Brown
 
P.J.
,
Bunnage
 
M.E.
 et al. .  
Donated chemical probes for open science
.
eLife
.
2018
;
7
:
e34311
.

76.

Nalawansha
 
D.A.
,
Crews
 
C.M.
 
PROTACs: an emerging therapeutic modality in precision medicine
.
Cell Chem. Biol.
 
2020
;
27
:
998
1014
.

77.

Bekes
 
M.
,
Langley
 
D.R.
,
Crews
 
C.M.
 
PROTAC targeted protein degraders: the past is prologue
.
Nat. Rev. Drug Discov.
 
2022
;
21
:
181
200
.

78.

Schneider
 
M.
,
Radoux
 
C.J.
,
Hercules
 
A.
,
Ochoa
 
D.
,
Dunham
 
I.
,
Zalmas
 
L.P.
,
Hessler
 
G.
,
Ruf
 
S.
,
Shanmugasundaram
 
V.
,
Hann
 
M.M.
 et al. .  
The PROTACtable genome
.
Nat. Rev. Drug Discov.
 
2021
;
20
:
789
797
.

79.

Weng
 
G.
,
Shen
 
C.
,
Cao
 
D.
,
Gao
 
J.
,
Dong
 
X.
,
He
 
Q.
,
Yang
 
B.
,
Li
 
D.
,
Wu
 
J.
,
Hou
 
T.
 
PROTAC-DB: an online database of PROTACs
.
Nucleic Acids Res.
 
2021
;
49
:
D1381
D1387
.

80.

Weng
 
G.
,
Li
 
D.
,
Kang
 
Y.
,
Hou
 
T.
 
Integrative modeling of PROTAC-mediated ternary complexes
.
J. Med. Chem.
 
2021
;
64
:
16271
16281
.

81.

Yu
 
C.
,
Mannan
 
A.M.
,
Yvone
 
G.M.
,
Ross
 
K.N.
,
Zhang
 
Y.L.
,
Marton
 
M.A.
,
Taylor
 
B.R.
,
Crenshaw
 
A.
,
Gould
 
J.Z.
,
Tamayo
 
P.
 et al. .  
High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines
.
Nat. Biotechnol.
 
2016
;
34
:
419
423
.

82.

Corsello
 
S.M.
,
Nagari
 
R.T.
,
Spangler
 
R.D.
,
Rossen
 
J.
,
Kocak
 
M.
,
Bryan
 
J.G.
,
Humeidi
 
R.
,
Peck
 
D.
,
Wu
 
X.
,
Tang
 
A.A.
 et al. .  
Discovering the anti-cancer potential of non-oncology drugs by systematic viability profiling
.
Nat. Cancer
.
2020
;
1
:
235
248
.

83.

Collins
 
F.S.
,
Barker
 
A.D.
 
Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies
.
Sci. Am.
 
2007
;
296
:
50
57
.

84.

Blum
 
A.
,
Wang
 
P.
,
Zenklusen
 
J.C.
 
SnapShot: TCGA-analyzed tumors
.
Cell
.
2018
;
173
:
530
.

85.

Cerami
 
E.
,
Gao
 
J.
,
Dogrusoz
 
U.
,
Gross
 
B.E.
,
Sumer
 
S.O.
,
Aksoy
 
B.A.
,
Jacobsen
 
A.
,
Byrne
 
C.J.
,
Heuer
 
M.L.
,
Larsson
 
E.
 et al. .  
The cBio Cancer Genomics Portal: an open platform for exploring multidimensional cancer genomics data
.
Cancer Discov.
 
2012
;
2
:
401
404
.

86.

Gao
 
J.
,
Aksoy
 
B.A.
,
Dogrusoz
 
U.
,
Dresdner
 
G.
,
Gross
 
B.
,
Sumer
 
S.O.
,
Sun
 
Y.
,
Jacobsen
 
A.
,
Sinha
 
R.
,
Larsson
 
E.
 et al. .  
Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal
.
Sci. Signal.
 
2013
;
6
:
pl1
.

87.

Hoadley
 
K.A.
,
Yau
 
C.
,
Hinoue
 
T.
,
Wolf
 
D.M.
,
Lazar
 
A.J.
,
Drill
 
E.
,
Shen
 
R.
,
Taylor
 
A.M.
,
Cherniack
 
A.D.
,
Thorsson
 
V.
 et al. .  
Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer
.
Cell
.
2018
;
173
:
291
304
.

88.

Etemadmoghadam
 
D.
,
Weir
 
B.A.
,
Au-Yeung
 
G.
,
Alsop
 
K.
,
Mitchell
 
G.
,
George
 
J.
Australian Ovarian Cancer Study Group Australian Ovarian Cancer Study Group
Davis
 
S.
,
D’Andrea
 
A.D.
,
Simpson
 
K.
 et al. .  
Synthetic lethality between CCNE1 amplification and loss of BRCA1
.
Proc. Natl Acad. Sci. U.S.A.
 
2013
;
110
:
19489
19494
.

89.

Zhang
 
B.
,
Wang
 
J.
,
Wang
 
X.
,
Zhu
 
J.
,
Liu
 
Q.
,
Shi
 
Z.
,
Chambers
 
M.C.
,
Zimmerman
 
L.J.
,
Shaddox
 
K.F.
,
Kim
 
S.
 et al. .  
Proteogenomic characterization of human colon and rectal cancer
.
Nature
.
2014
;
513
:
382
387
.

90.

Mertins
 
P.
,
Mani
 
D.R.
,
Ruggles
 
K.V.
,
Gillette
 
M.A.
,
Clauser
 
K.R.
,
Wang
 
P.
,
Wang
 
X.
,
Qiao
 
J.W.
,
Cao
 
S.
,
Petralia
 
F.
 et al. .  
Proteogenomics connects somatic mutations to signalling in breast cancer
.
Nature
.
2016
;
534
:
55
62
.

91.

Zhang
 
H.
,
Liu
 
T.
,
Zhang
 
Z.
,
Payne
 
S.H.
,
Zhang
 
B.
,
McDermott
 
J.E.
,
Zhou
 
J.Y.
,
Petyuk
 
V.A.
,
Chen
 
L.
,
Ray
 
D.
 et al. .  
Integrated proteogenomic characterization of human high-grade serous ovarian cancer
.
Cell
.
2016
;
166
:
755
765
.

92.

Clark
 
D.J.
,
Dhanasekaran
 
S.M.
,
Petralia
 
F.
,
Pan
 
J.
,
Song
 
X.
,
Hu
 
Y.
,
da Veiga Leprevost
 
F.
,
Reva
 
B.
,
Lih
 
T.M.
,
Chang
 
H.Y.
 et al. .  
Integrated proteogenomic characterization of clear cell renal cell carcinoma
.
Cell
.
2019
;
179
:
964
983
.

93.

Dou
 
Y.
,
Kawaler
 
E.A.
,
Cui Zhou
 
D.
,
Gritsenko
 
M.A.
,
Huang
 
C.
,
Blumenberg
 
L.
,
Karpova
 
A.
,
Petyuk
 
V.A.
,
Savage
 
S.R.
,
Satpathy
 
S.
 et al. .  
Proteogenomic characterization of endometrial carcinoma
.
Cell
.
2020
;
180
:
729
748
.

94.

Satpathy
 
S.
,
Krug
 
K.
,
Jean Beltran
 
P.M.
,
Savage
 
S.R.
,
Petralia
 
F.
,
Kumar-Sinha
 
C.
,
Dou
 
Y.
,
Reva
 
B.
,
Kane
 
M.H.
,
Avanessian
 
S.C.
 et al. .  
A proteogenomic portrait of lung squamous cell carcinoma
.
Cell
.
2021
;
184
:
4348
4371
.

95.

Cao
 
L.
,
Huang
 
C.
,
Cui Zhou
 
D.
,
Hu
 
Y.
,
Lih
 
T.M.
,
Savage
 
S.R.
,
Krug
 
K.
,
Clark
 
D.J.
,
Schnaubelt
 
M.
,
Chen
 
L.
 et al. .  
Proteogenomic characterization of pancreatic ductal adenocarcinoma
.
Cell
.
2021
;
184
:
5031
5052
.

96.

Wang
 
L.B.
,
Karpova
 
A.
,
Gritsenko
 
M.A.
,
Kyle
 
J.E.
,
Cao
 
S.
,
Li
 
Y.
,
Rykunov
 
D.
,
Colaprico
 
A.
,
Rothstein
 
J.H.
,
Hong
 
R.
 et al. .  
Proteogenomic and metabolomic characterization of human glioblastoma
.
Cancer Cell
.
2021
;
39
:
509
528
.

97.

Petralia
 
F.
,
Tignor
 
N.
,
Reva
 
B.
,
Koptyra
 
M.
,
Chowdhury
 
S.
,
Rykunov
 
D.
,
Krek
 
A.
,
Ma
 
W.
,
Zhu
 
Y.
,
Ji
 
J.
 et al. .  
Integrated proteogenomic characterization across major histological types of pediatric brain cancer
.
Cell
.
2020
;
183
:
1962
1985
.

98.

Vasaikar
 
S.
,
Huang
 
C.
,
Wang
 
X.
,
Petyuk
 
V.A.
,
Savage
 
S.R.
,
Wen
 
B.
,
Dou
 
Y.
,
Zhang
 
Y.
,
Shi
 
Z.
,
Arshad
 
O.A.
 et al. .  
Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities
.
Cell
.
2019
;
177
:
1035
1049
.

99.

Prakash
 
A.
,
Taylor
 
L.
,
Varkey
 
M.
,
Hoxie
 
N.
,
Mohammed
 
Y.
,
Goo
 
Y.A.
,
Peterman
 
S.
,
Moghekar
 
A.
,
Yuan
 
Y.
,
Glaros
 
T.
 et al. .  
Reinspection of a Clinical Proteomics Tumor Analysis Consortium (CPTAC) dataset with cloud computing reveals abundant post-translational modifications and protein sequence variants
.
Cancers (Basel)
.
2021
;
13
:
5034
.

100.

Chandrashekar
 
D.S.
,
Karthikeyan
 
S.K.
,
Korla
 
P.K.
,
Patel
 
H.
,
Shovon
 
A.R.
,
Athar
 
M.
,
Netto
 
G.J.
,
Qin
 
Z.S.
,
Kumar
 
S.
,
Manne
 
U.
 et al. .  
UALCAN: an update to the integrated cancer data analysis platform
.
Neoplasia
.
2022
;
25
:
18
27
.

101.

Zhang
 
Y.
,
Chen
 
F.
,
Chandrashekar
 
D.S.
,
Varambally
 
S.
,
Creighton
 
C.J.
 
Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways
.
Nat. Commun.
 
2022
;
13
:
2669
.

102.

Uhlen
 
M.
,
Bjorling
 
E.
,
Agaton
 
C.
,
Szigyarto
 
C.A.
,
Amini
 
B.
,
Andersen
 
E.
,
Andersson
 
A.C.
,
Angelidou
 
P.
,
Asplund
 
A.
,
Asplund
 
C.
 et al. .  
A human protein atlas for normal and cancer tissues based on antibody proteomics
.
Mol. Cell. Proteomics
.
2005
;
4
:
1920
1932
.

103.

Uhlen
 
M.
,
Fagerberg
 
L.
,
Hallstrom
 
B.M.
,
Lindskog
 
C.
,
Oksvold
 
P.
,
Mardinoglu
 
A.
,
Sivertsson
 
A.
,
Kampf
 
C.
,
Sjostedt
 
E.
,
Asplund
 
A.
 et al. .  
Proteomics. Tissue-based map of the human proteome
.
Science
.
2015
;
347
:
1260419
.

104.

Uhlen
 
M.
,
Zhang
 
C.
,
Lee
 
S.
,
Sjostedt
 
E.
,
Fagerberg
 
L.
,
Bidkhori
 
G.
,
Benfeitas
 
R.
,
Arif
 
M.
,
Liu
 
Z.
,
Edfors
 
F.
 et al. .  
A pathology atlas of the human cancer transcriptome
.
Science
.
2017
;
357
:
eaan2507
.

105.

Lanczky
 
A.
,
Gyorffy
 
B.
 
Web-based survival analysis tool tailored for medical research (KMplot): development and implementation
.
J. Med. Internet Res.
 
2021
;
23
:
e27633
.

106.

Tian
 
L.
,
Zhao
 
L.
,
Sze
 
K.M.
,
Kam
 
C.S.
,
Ming
 
V.S.
,
Wang
 
X.
,
Zhang
 
V.X.
,
Ho
 
D.W.
,
Cheung
 
T.T.
,
Chan
 
L.K.
 et al. .  
Dysregulation of RalA signaling through dual regulatory mechanisms exerts its oncogenic functions in hepatocellular carcinoma
.
Hepatology
.
2022
;
76
:
48
65
.

107.

Koscielny
 
G.
,
An
 
P.
,
Carvalho-Silva
 
D.
,
Cham
 
J.A.
,
Fumis
 
L.
,
Gasparyan
 
R.
,
Hasan
 
S.
,
Karamanis
 
N.
,
Maguire
 
M.
,
Papa
 
E.
 et al. .  
Open Targets: a platform for therapeutic target identification and validation
.
Nucleic Acids Res.
 
2017
;
45
:
D985
D994
.

108.

De Cesco
 
S.
,
Davis
 
J.B.
,
Brennan
 
P.E.
 
TargetDB: a target information aggregation tool and tractability predictor
.
PLoS One
.
2020
;
15
:
e0232644
.

109.

Mitsopoulos
 
C.
,
Di Micco
 
P.
,
Fernandez
 
E.V.
,
Dolciami
 
D.
,
Holt
 
E.
,
Mica
 
I.L.
,
Coker
 
E.A.
,
Tym
 
J.E.
,
Campbell
 
J.
,
Che
 
K.H.
 et al. .  
canSAR: update to the cancer translational research and drug discovery knowledgebase
.
Nucleic Acids Res.
 
2021
;
49
:
D1074
D1082
.

110.

Ochoa
 
D.
,
Hercules
 
A.
,
Carmona
 
M.
,
Suveges
 
D.
,
Baker
 
J.
,
Malangone
 
C.
,
Lopez
 
I.
,
Miranda
 
A.
,
Cruz-Castillo
 
C.
,
Fumis
 
L.
 et al. .  
The next-generation Open Targets Platform: reimagined, redesigned, rebuilt
.
Nucleic Acids Res.
 
2023
;
51
:
D1353
D1359
.

111. Ochoa D., Hercules A., Carmona M., Suveges D., Gonzalez-Uriarte A., Malangone C., Miranda A., Fumis L., Carvalho-Silva D., Spitzer M. et al. Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res. 2021; 49:D1302–D1310.

112. di Micco P., Antolin A.A., Mitsopoulos C., Villasclaras-Fernandez E., Sanfelice D., Dolciami D., Ramagiri P., Mica I.L., Tym J.E., Gingrich P.W. et al. canSAR: update to the cancer translational research and drug discovery knowledgebase. Nucleic Acids Res. 2023; 51:D1212–D1219.

113. Halling-Brown M.D., Bulusu K.C., Patel M., Tym J.E., Al-Lazikani B. canSAR: an integrated cancer public translational research and drug discovery resource. Nucleic Acids Res. 2012; 40:D947–D956.

114. Thorsson V., Gibbs D.L., Brown S.D., Wolf D., Bortone D.S., Ou Yang T.H., Porta-Pardo E., Gao G.F., Plaisier C.L., Eddy J.A. et al. The immune landscape of cancer. Immunity. 2018; 48:812–830.

115. Paul D., Sanap G., Shenoy S., Kalyane D., Kalia K., Tekade R.K. Artificial intelligence in drug discovery and development. Drug Discov. Today. 2021; 26:80–93.

116. You Y., Lai X., Pan Y., Zheng H., Vera J., Liu S., Deng S., Zhang L. Artificial intelligence in cancer target identification and drug discovery. Signal Transduct. Target. Ther. 2022; 7:156.

117. Gupta R., Srivastava D., Sahu M., Tiwari S., Ambasta R.K., Kumar P. Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol. Divers. 2021; 25:1315–1360.

118. Jayatunga M.K.P., Xie W., Ruder L., Schulze U., Meier C. AI in small-molecule drug discovery: a coming wave? Nat. Rev. Drug Discov. 2022; 21:175–176.

119. Tran T.T.V., Surya Wibowo A., Tayara H., Chong K.T. Artificial intelligence in drug toxicity prediction: recent advances, challenges, and future perspectives. J. Chem. Inf. Model. 2023; 63:2628–2643.

120. wwPDB consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 2019; 47:D520–D528.

121. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Zidek A., Potapenko A. et al. Applying and improving AlphaFold at CASP14. Proteins. 2021; 89:1711–1721.

122. Akdel M., Pires D.E.V., Pardo E.P., Janes J., Zalevsky A.O., Meszaros B., Bryant P., Good L.L., Laskowski R.A., Pozzati G. et al. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 2022; 29:1056–1067.

123. Pereira J., Simpkin A.J., Hartmann M.D., Rigden D.J., Keegan R.M., Lupas A.N. High-accuracy protein structure prediction in CASP14. Proteins. 2021; 89:1687–1699.

124. Beuming T., Martin H., Diaz-Rovira A.M., Diaz L., Guallar V., Ray S.S. Are deep learning structural models sufficiently accurate for free-energy calculations? Application of FEP+ to AlphaFold2-predicted structures. J. Chem. Inf. Model. 2022; 62:4351–4360.

125. Varadi M., Anyango S., Deshpande M., Nair S., Natassia C., Yordanova G., Yuan D., Stroe O., Wood G., Laydon A. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022; 50:D439–D444.

126. Ren F., Ding X., Zheng M., Korzinkin M., Cai X., Zhu W., Mantsyzov A., Aliper A., Aladinskiy V., Cao Z. et al. AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor. Chem. Sci. 2023; 14:1443–1452.

127. Grotehans N., McGarry L., Nolte H., Kroker M., Narbona-Pérez Á.J., Deshwal S., Giavalisco P., Langer T., MacVicar T. Ribonucleotide synthesis by NME6 fuels mitochondrial gene expression. 2022; bioRxiv doi: https://doi.org/10.1101/2022.11.29.518352, 30 November 2022, preprint: not peer reviewed.

128. Halatsch M.E., Gehrke E.E., Vougioukas V.I., Botefur I.C., F A.B., Efferth T., Gebhart E., Domhof S., Schmidt U., Buchfelder M. Inverse correlation of epidermal growth factor receptor messenger RNA induction and suppression of anchorage-independent growth by OSI-774, an epidermal growth factor receptor tyrosine kinase inhibitor, in glioblastoma multiforme cell lines. J. Neurosurg. 2004; 100:523–533.

129. Westphal M., Maire C.L., Lamszus K. EGFR as a target for glioblastoma treatment: an unfulfilled promise. CNS Drugs. 2017; 31:723–735.

130. Eskilsson E., Rosland G.V., Solecki G., Wang Q., Harter P.N., Graziani G., Verhaak R.G.W., Winkler F., Bjerkvig R., Miletic H. EGFR heterogeneity and implications for therapeutic intervention in glioblastoma. Neuro-Oncology. 2018; 20:743–752.

131. Selenz C., Compes A., Nill M., Borchmann S., Odenthal M., Florin A., Bragelmann J., Buttner R., Meder L., Ullrich R.T. EGFR inhibition strongly modulates the tumour immune microenvironment in EGFR-driven non-small-cell lung cancer. Cancers (Basel). 2022; 14:3943.

132. Bausart M., Preat V., Malfanti A. Immunotherapy for glioblastoma: the promise of combination strategies. J. Exp. Clin. Cancer Res. 2022; 41:35.

133. Binder Z.A., Thorne A.H., Bakas S., Wileyto E.P., Bilello M., Akbari H., Rathore S., Ha S.M., Zhang L., Ferguson C.J. et al. Epidermal growth factor receptor extracellular domain mutations in glioblastoma present opportunities for clinical imaging and therapeutic development. Cancer Cell. 2018; 34:163–177.

Author notes

The authors wish it to be known that, in their opinion, the first three authors should be regarded as Joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
