DeepSynergy: predicting anti-cancer drug synergy with Deep Learning

Abstract

Motivation: While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Recently, Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input information, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies.

Results: DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods with an improvement of 7.2% over the second best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and the predicted values of DeepSynergy was 0.73. Applying DeepSynergy for classification of these novel drug combinations resulted in a high predictive performance of an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations.

Availability and implementation: DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy.

Supplementary information: Supplementary data are available at Bioinformatics online.


S1 Content
This report provides supplementary information to the manuscript "DeepSynergy: Prediction of anticancer drug synergies with Deep Learning". It is organized into three sections: data set, methods and results. The first section describes the drugs and cell lines that make up the data set. The second section describes the hyperparameter space for the different methods and the order independence of DeepSynergy. The last section provides further information on the three cross validation strategies.

Table S1 displays the cancer cell lines included in the Merck oncology combination screen. The 39 cell lines originated from 7 different tissue types.

Table S1 (continued):
Cell line         Tissue
-211H             PLEURA
NCI-H1650         LUNG
NCI-H2122         LUNG
NCI-H23           LUNG
NCI-H460          LUNG
NCI-H520          LUNG
OCUB-M            BREAST
OV-90             OVARY
OVCAR-3           OVARY
PA-1              OVARY
RKO               LARGE_INTESTINE
RPMI-7951         SKIN
SK-MEL-30         SKIN
SK-MES-1          LUNG
SK-OV-3           OVARY
SW620             LARGE_INTESTINE
SW837             LARGE_INTESTINE
T47D              BREAST
UACC-62           SKIN
UWB1_289          OVARY
UWB1_289_BRCA1    OVARY
VCAP              PROSTATE
ZR-75-1           BREAST

Table S2 displays the 38 drugs tested in the Merck oncology combination screen. 14 experimental and 24 approved anticancer drugs with diverse targets, modes of action and structure were tested in pairwise combinations against the 39 cell lines. Drugs in the 'exhaustive' set were combined with all compounds in that set, whereas drugs in the 'supplemental' set were only combined with drugs in the 'exhaustive' set.

Order Independence. To obtain an order independent network, each drug combination was presented to DeepSynergy in both orders (drug A - drug B and drug B - drug A), for training as well as for prediction. Each combination was therefore propagated through the network twice. Figure S1 shows the predictions for the two orderings. All values lie close to the identity line and a Pearson correlation coefficient of 0.98 was achieved, which shows that the network is able to disregard the order of the drugs in a combination.

Figure S1: Scatter plot of the predictions obtained by the two different orderings of drug combinations. The x-axis and y-axis show the predictions for the orderings drug A - drug B - cell line and drug B - drug A - cell line, respectively. The Pearson correlation coefficient between the two predictions is 0.98.
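The order independence check described above can be illustrated with a minimal sketch. It assumes that the input of each sample is the concatenation of the features of the first drug, the second drug and the cell line. The function names, the toy linear model and the averaging of the two predictions are assumptions made for illustration and are not taken from the DeepSynergy implementation.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def predict_both_orders(predict, drug_a, drug_b, cell_line):
    """Propagate every combination through `predict` in both drug orders.

    drug_a, drug_b, cell_line: 2-D feature arrays with one row per sample.
    Returns both prediction vectors and their mean. Averaging the two
    predictions is an assumption of this sketch, not a detail given in the text.
    """
    x_ab = np.hstack([drug_a, drug_b, cell_line])  # drug A - drug B - cell line
    x_ba = np.hstack([drug_b, drug_a, cell_line])  # drug B - drug A - cell line
    p_ab = np.asarray(predict(x_ab), dtype=float)
    p_ba = np.asarray(predict(x_ba), dtype=float)
    return p_ab, p_ba, 0.5 * (p_ab + p_ba)


# Toy usage: random features and a hypothetical linear stand-in for the network.
rng = np.random.default_rng(0)
drug_a = rng.normal(size=(50, 4))
drug_b = rng.normal(size=(50, 4))
cell_line = rng.normal(size=(50, 3))
weights = rng.normal(size=11)


def toy_model(x):
    return x @ weights


p_ab, p_ba, p_mean = predict_both_orders(toy_model, drug_a, drug_b, cell_line)
r = pearson(p_ab, p_ba)  # agreement between the two orderings, as in Figure S1
```

A value of `r` close to 1 corresponds to points lying near the identity line in a plot such as Figure S1. The toy model is not trained on both orderings, so its `r` can be far from 1. The averaged prediction is symmetric in the two drugs by construction.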

S4 Results
Predictive performance on novel drug combinations. In addition to the results shown in the main manuscript, we provide the receiver operating characteristic (ROC) and precision recall (PR) curves (Figures S2 and S3, respectively).

Figure S2: Receiver operating characteristic (ROC) curves for all methods averaged over the 5 cross validation folds. Averaged ROC curves are shown as solid lines. Error bars in terms of one standard deviation are shown as shaded areas. The mean area under curve ± standard deviation is displayed in the legend.

Figure S3: Precision recall (PR) curves for all methods averaged over the 5 cross validation folds. Averaged PR curves are shown as solid lines. Error bars in terms of one standard deviation are shown as shaded areas. The mean area under curve ± standard deviation is displayed in the legend.
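The fold-averaged curves of Figure S2 can be illustrated with a minimal sketch of one common way to build them: the ROC curve of each fold is interpolated onto a shared false positive rate grid, and the mean and standard deviation are taken point-wise. The function names and the grid size are assumptions for illustration, and the sketch does not claim to reproduce how the figures were actually generated. It also assumes that every fold contains both classes and it does not treat tied scores specially.

```python
import numpy as np


def roc_points(y_true, scores):
    """False and true positive rates at each score threshold, starting at (0, 0)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true, dtype=bool)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~y) / (~y).sum()])
    return fpr, tpr


def trapezoid_area(x, y):
    """Area under a piecewise linear curve (trapezoidal rule)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


def averaged_roc(folds, n_grid=101):
    """Average per-fold ROC curves on a common false positive rate grid.

    folds: iterable of (y_true, scores) pairs, one per cross validation fold.
    Returns the grid, the mean and standard deviation of the true positive
    rate on that grid, and the mean and standard deviation of the fold AUCs.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    tprs, aucs = [], []
    for y_true, scores in folds:
        fpr, tpr = roc_points(y_true, scores)
        tpr_on_grid = np.interp(grid, fpr, tpr)
        tpr_on_grid[0] = 0.0  # every curve starts in the origin
        tprs.append(tpr_on_grid)
        aucs.append(trapezoid_area(fpr, tpr))
    tprs = np.vstack(tprs)
    return grid, tprs.mean(axis=0), tprs.std(axis=0), float(np.mean(aucs)), float(np.std(aucs))


# Toy usage with two small hypothetical folds.
fold_1 = ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
fold_2 = ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
grid, mean_tpr, std_tpr, mean_auc, std_auc = averaged_roc([fold_1, fold_2])
```

The solid line of such a plot corresponds to `mean_tpr`, the shaded area to `mean_tpr ± std_tpr`, and the legend entry to `mean_auc ± std_auc`. The same point-wise averaging can be applied to PR curves on a common recall grid.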

Measured and predicted synergy scores. Figures S4 and S5 display the distributions of the measured and predicted synergy scores per cell line and per drug, respectively. The distributions are ordered by the correlation coefficient between measured and predicted values. Neither the distribution of the predicted nor that of the measured synergy scores is associated with the performance.

Predictive performance on novel drugs. We compared the methods with respect to their predictive performance on novel drugs, using the "leave drugs out" stratified cross validation strategy (see column 3 of Figure 3 in the main manuscript). Table S8 shows the methods comparison based on the mean squared error (MSE) with corresponding confidence intervals and p-values. Furthermore, we provide the mean root mean squared error (RMSE) and the mean Pearson correlation coefficient over the 38 drugs. Overall, all methods show low predictive performance and thus do not generalize well enough to reliably predict synergies for novel drugs. We assume that the low predictive performance is caused by the low number of training examples: all models can only be trained on 38 drugs, whereas the space of possible drugs is much larger. In Figures S6 and S7 the methods are compared based on their receiver operating characteristic (ROC) and precision recall (PR) curves, respectively, obtained with the leave drugs out cross validation.

Figure S6: Receiver operating characteristic (ROC) curves for all methods averaged over drugs.
Averaged ROC curves are shown as solid lines. Error bars in terms of one standard deviation are shown as shaded areas. The mean area under curve ± standard deviation is displayed in the legend.

Figure S7: Precision recall (PR) curves for all methods averaged over drugs. Averaged PR curves are shown as solid lines. Error bars in terms of one standard deviation are shown as shaded areas. The mean area under curve ± standard deviation is displayed in the legend.

Predictive performance on novel cell lines. We compared the methods with respect to their predictive performance on novel cell lines, using the "leave cell lines out" stratified cross validation strategy (see column 4 of Figure 3 in the main manuscript). Table S9 shows the methods comparison based on the mean squared error (MSE) with corresponding confidence intervals and p-values. Furthermore, we provide the mean root mean squared error (RMSE) and the mean Pearson correlation coefficient over the 39 cell lines. Overall, all methods show low predictive performance and thus do not generalize well enough to reliably predict synergies for novel cell lines. We assume that the low predictive performance is caused by the low number of training examples: all models can only be trained on 39 cell lines, whereas the space of cancer cell lines is much larger. In Figures S8 and S9 the methods are compared based on their receiver operating characteristic (ROC) and precision recall (PR) curves, respectively, obtained with the leave cell lines out cross validation.

Table S9: Methods comparison for the leave one cell line out cross validation based on mean squared error (MSE) with corresponding confidence intervals and p-values, mean root mean squared error (RMSE) as well as mean Pearson correlation coefficient over the 39 cell lines.

Figure S8: Receiver operating characteristic (ROC) curves for all methods averaged over cell lines. Averaged ROC curves are shown as solid lines.
Error bars in terms of one standard deviation are shown as shaded areas. The mean area under curve ± standard deviation is displayed in the legend.

Figure S9: Precision recall (PR) curves for all methods averaged over cell lines. Averaged PR curves are shown as solid lines. Error bars in terms of one standard deviation are shown as shaded areas. The mean area under curve ± standard deviation is displayed in the legend.
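The two extrapolation settings above can be illustrated with a minimal sketch that holds out one drug or one cell line at a time and computes MSE, RMSE and the Pearson correlation coefficient on the held-out samples. The exact fold construction is not specified in this report, so the splitting scheme shown here is an assumption, and all function names are hypothetical.

```python
import numpy as np


def leave_one_drug_out(drug_a, drug_b):
    """Yield (drug, train_idx, test_idx) for every drug.

    A sample belongs to the test set if the held-out drug occurs in either
    position of the combination, so the drug is never seen during training.
    """
    drug_a = np.asarray(drug_a)
    drug_b = np.asarray(drug_b)
    for drug in np.unique(np.concatenate([drug_a, drug_b])):
        is_test = (drug_a == drug) | (drug_b == drug)
        yield drug, np.flatnonzero(~is_test), np.flatnonzero(is_test)


def leave_one_cell_line_out(cell_line):
    """Yield (cell_line, train_idx, test_idx) for every cell line."""
    cell_line = np.asarray(cell_line)
    for name in np.unique(cell_line):
        is_test = cell_line == name
        yield name, np.flatnonzero(~is_test), np.flatnonzero(is_test)


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and Pearson correlation between measured and predicted scores.

    The Pearson correlation is undefined if either vector is constant.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(np.mean((y_true - y_pred) ** 2))
    tc, pc = y_true - y_true.mean(), y_pred - y_pred.mean()
    pearson = float(tc @ pc / np.sqrt((tc @ tc) * (pc @ pc)))
    return {"mse": mse, "rmse": float(np.sqrt(mse)), "pearson": pearson}


# Toy usage: three hypothetical combinations of the drugs A, B and C.
drug_a = ["A", "A", "B"]
drug_b = ["B", "C", "C"]
for drug, train_idx, test_idx in leave_one_drug_out(drug_a, drug_b):
    print(drug, train_idx.tolist(), test_idx.tolist())
```

Under this scheme, holding out a drug removes every combination that contains it, which leaves fewer training samples than holding out a random subset of combinations. Averaging the per-group metrics over all held-out drugs or cell lines would give summary values of the kind reported in Tables S8 and S9.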