DepLink: an R Shiny app to systematically link genetic and pharmacologic dependencies of cancer

Abstract Motivation Large-scale genetic and pharmacologic dependency maps are generated to reveal genetic vulnerabilities and drug sensitivities of cancer. However, user-friendly software is needed to systematically link such maps. Results Here, we present DepLink, a web server to identify genetic and pharmacologic perturbations that induce similar effects on cell viability or molecular changes. DepLink integrates heterogeneous datasets of genome-wide CRISPR loss-of-function screens, high-throughput pharmacologic screens and gene expression signatures of perturbations. The datasets are systematically connected by four complementary modules tailored for different query scenarios. It allows users to search for potential inhibitors that target a gene (Module 1) or multiple genes (Module 2), mechanisms of action of a known drug (Module 3) and drugs with similar biochemical features to an investigational compound (Module 4). We performed a validation analysis to confirm the capability of our tool to link the effects of drug treatments to knockouts of the drug’s annotated target genes. By querying with a demonstrating example of CDK6, the tool identified well-studied inhibitor drugs, novel synergistic gene and drug partners and insights into an investigational drug. In summary, DepLink enables easy navigation, visualization and linkage of rapidly evolving cancer dependency maps. Availability and implementation The DepLink web server, demonstrating examples and detailed user manual are available at https://shiny.crc.pitt.edu/deplink/. Supplementary information Supplementary data are available at Bioinformatics Advances online.


Data sources and pre-processing
For the two CRISPR loss-of-function screens of the Broad (version 21Q4) and Sanger (21Q2) DepMap projects, we used Chronos-normalized gene-effect scores to measure gene fitness effects (Behan, et al., 2019;Dharia, et al., 2021).A more negative gene-effect score indicates stronger essentiality of the gene in a cell line, and a score < -1 denotes effective inhibition of cell viability by knocking out the gene.All downloaded data were pre-processed to remove any cell line IDs labeled as "FAIL" or blank (1,054 and 318 cell lines, respectively).Gene dependency data were processed to extract only genes shared in the Broad DepMap and Sanger CRISPR datasets (17,078 common genes) (Dempster, et al., 2019).
For drug screens, PRISM measures treatment response by log-fold changes in cell counts relative to DMSO (4,686 compounds and 568 cell lines in version 19Q4).GDSC uses the half-maximal inhibitory concentration (IC50).The downloaded GDSC data set combined drugs from GDSC1 and GDSC2; this was reorganized to extract an IC50 value per cell line/drug pair.For duplicate drugs per cell line, only one value (first occurrence) was chosen for representational purposes (287 compounds and 973 cell lines in version 2019).A total of 205 drugs were shared between PRISM and GDSC (Corsello, et al., 2020).All screening datasets were downloaded from the Data Portal of Broad DepMap (https://depmap.org/portal/download).
We analyzed the molecular signatures of LINCS to determine if a list of query genes is up-or down-regulated by a drug.The dataset was downloaded from the LINCS database (GSE92742), which contains changes in expression levels of 12,328 genes induced by treatment of 19,811 drugs.We used level-5 data of Z-scores computed by comparing inferred log2-RPKM values between post-and pre-treatment samples for each cell line at each dose and time point, summarized by a weighted average across replicates.For a cell line-drug pair with multiple doses and/or time points, we use the one with the largest standard deviation as the representative sample as it achieves the strongest response after treatment.A series of statistical tests are performed by DepLink to infer whether the query list of genes is significantly perturbed by each drug in the LINCS dataset.

DepLink user interface
Supplementary Figure S1 shows the homepage of DepLink.The navigation bar as a top ribbon of the webpage allows users to select a function, including viewing the global network among gene knockouts and drugs, four main modules for querying gene(s) or a known/novel drug, and a "Help" section.In each module, user starts a query by selecting from a dropdown list for gene names, drug names, and/or cancer types, or typing in a text box.For each input item, a pre-filled default query input or an example is provided to assist users and/or clarify the input requirement.The help page provides detailed documentation of the overall tool and each module, including data sources and examples.Every output figure or table has a question mark button that provides a description and/or legends.

Construction of a global correlation network among gene knockouts and drugs
The main goal of DepLink is to link CRISPR and drug screens to identify similar inhibitory effects on cell viability between a gene knockout and a drug treatment.Pearson correlation coefficients were computed between gene-effect scores (similarities between two gene knockouts), between gene-effect scores and drug responses (between a knockout and a drug treatment), or between drug responses.We implemented the Pearson correlation as the measure of similarity because the interpretation of results is straightforward and meaningful for most biomedical researchers.Thus, the PRISM study performed correlation analysis on data of gene knockouts and drug screens to validate a drug's intended targets (Corsello, et al., 2020).We constructed a global correlation network by pairwise analysis among all genes from Broad DepMap and all drugs in PRISM on 456 common cell lines between the two datasets.We performed a systematic analysis to determine the optimal cutoffs on correlation coefficients for constructing the global gene-drug correlation network.To ensure that the resulting network size is manageable and can be easily visualized, we identified correlation thresholds that yielded 100-1000 connections in each category.For genegene connections, thresholds on absolute correlation coefficients >0.40 and >0.66 yielded 1000 and 100 gene-gene connections, respectively (Supplementary Fig. S2).Similarly, we identified cutoffs of 0.61 and 0.77 for drug-drug pairs, and 0.23 and 0.36 for gene-drug pairs.Within these ranges, we analyzed all networks constructed by any possible combinations of cutoffs with a step size of 0.01.As a result, the thresholds of 0.61, 0.61, 0.27 on gene-gene, drug-drug, and gene-drug correlations, respectively, were selected as they yielded a network with the largest average degree of 6.3 (123 genes, 396 drugs, and 1,630 edges; Supplementary Fig. S3), indicating a dense network.While using the DepLink tool, users can specify different criteria using slide bars.Pairs meeting the criteria are merged in a network with nodes representing genes and drugs, and edges denoting correlations.

Similarity analysis among gene knockouts and drug treatments
In main query modules ("Query Gene" and "Query Drug"), we generated networks with "seconddegree" connections to visualize the top 10 first-degree correlations of the queried gene or drug, and second-degree correlations among these top 10 nodes.So, if a user begins with a query gene to find the ten most highly correlated drugs (first-degree connection) with respect to cell viability.Among those ten drugs, a second-degree edge is established between two drugs if the correlation is ranked among the top 10 for either or both drugs under examination.

Statistical inference of drugs that perturb a set of genes (Module "Query Gene List")
We analyzed the molecular signatures of LINCS to determine if a list of query genes (  ) is up-or down-regulated by a drug.Mathematically, the Z-score of gene g following treatment of drug d in cell line c can be represented as  & (, ), where  ∈ [1. . & ],  ∈ [1. .],  & is the number of cell lines tested for drug d, and D denotes the number of drugs.Three statistical tests were calculated to infer the significance of changes in gene expression due to drug treatment d:   is a set of randomly selected genes with the same size as   .In each iteration of permutation, we randomly selected 7  7 genes from all genes and calculated the mean of averaged Z-scores among the selected genes.We then built a normal distribution based on the means and standard deviations of the 100,000 permutations and used the distribution to assess the empirical P-value of the observed result.Like the previous two tests, the permutation test was performed with a one-tailed setting based on a user's selection of up-or down-regulation of query genes.

Validation analysis using observed-to-expected (O/E) ratio of annotated drug targets
We validated the results using drug target annotations provided by the PRISM project.For each annotated gene-drug pair, we calculated the correlation coefficient using corresponding data from PRISM and Broad DepMap.Multiple targets of a drug were analyzed independently.We calculated an O/E ratio between the observed number of annotated gene-drug pairs passing a threshold on absolute correlation coefficients (0 to 0.7 with a step of 0.01) versus an expected number estimated by the product of the number of all gene-drug pairs passing a threshold and the fraction of annotated gene-drug pairs among all gene-drug combinations.Statistical significance of an O/E ratio was assessed by the right-tailed Fisher's exact test.

Similarity analysis of biochemical fingerprints for a new query drug
Drug fingerprints that consist of sequences of binary bits are commonly used to characterize molecular compounds or to find similarities.We generated drug fingerprints from all drugs of PRISM using canonical Simplified Molecular Input Line Entry System (SMILES) codes by the R/rcdk package (Guha, 2007).In DepLink, a drug fingerprint contains 881 bits defined by PubChem and 166 bits by the Molecular ACCess System (MACCS) keys.The Tanimoto similarity index is used to assess the similarity between a query drug and any drug included in PRISM,  -./01232 ( .,  4 ) =  .∩  4  .∪  4 , where Fa and Fb are fingerprints of drugs a and b, respectively.Dimension reduction on molecular drug fingerprints using t-SNE enables visual inspection of relative distances among the query drug and its top similar drugs in the database.
Generating the visualization of the CDK6 network shown in Fig. 2C of main text The CDK6 network shown in Fig. 2C of main text was generated by combining correlation networks from modules 'Query Gene' and 'Query Drug' to help visualize the validation of CDK4/6 network and serve as an example on how users can use DepLink.

Supplementary Discussion
By querying with CDK6, we demonstrated the capability of DepLink to identify well-characterized gene-drug relationships and shed light on synergistic effects between CDK and MEK inhibitors and investigational CDK4/6 inhibitors.Analysis of drugs highly correlated to CDK6 knockout revealed two distinct clusters: well-known CDK4/6 inhibitors and novel MEK inhibitors.Our data revealed potential synergistic anti-tumor effects between the two types of inhibitor drugs, both of which are designed to inhibit cell proliferation while action on different pathways.MEK inhibitors target the MAPK pathway, which is responsible for regulating expression levels of D-type cyclins; these in turn regulate cell cycles.Cyclin D regulates CDK4 and CDK6 activation, and therefore plays a crucial role in various cancer types (Lavoie, et al., 1996;Scheiblecker, et al., 2020).Both CDK4 and CDK6 are part of the G1 cell cycle checkpoint and allow cells to enter S-phase (Meyerson and Harlow, 1994).In BRAF-V600E-mutated melanomas that are refractory to BRAF/MEK inhibitors, a combination therapy of CDK4/6 (palbociclib) and MEK (trametinib) inhibitors significantly suppressed cell-cycle pathways and reduced tumor growth both in vivo and in vitro (Nassar, et al., 2021).This demonstrating example shows the capability of DepLink in not only identifying well-known drugs but also shedding light on novel mechanisms that warrant further investigations.

• Per-gene P-value:
For a gene g, a one-tailed t-test was performed to calculate Z-scores across cell lines, { & (, )| ∈ [1. .& ]}, against zero to ensure the drug significantly up-(or down-) regulates the gene across different cells.•Across-gene P-value: An average Z-score was calculated per gene across cell lines to measure the overall change in gene expression levels by a drug, i.e.,  & To generate the network using DepLink, one should search 'PRISM' from the output table (Panel 1H) under the module 'Query Gene', sort the column 'Correlation with Broad DepMap', and the CDK6 network is generated in Panel 1I.Similarly, the top 10 genes from each of the CDK4/6 and MEK inhibitors from Fig.2Cof main text can be visualized from the DepLink by entering the drug name in the 'Query Drug' module and select the PRISM database.Upon sorting the column 'Correlation with Broad DepMap', the top 10 genes are selected from Panel 3H of the web tool, excluding CDK6.If any two or more genes exhibited the same correlation with a query drug, either one of them was selected.The data table was exported to Cytoscape for better network visualization.Finally, the network was slightly modified using Adobe Illustrator for publication purposes.