NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug–target interactions

Wan, Fangping; Hong, Lixiang; Xiao, An; Jiang, Tao; Zeng, Jianyang

doi:10.1093/bioinformatics/bty543

Abstract

Motivation

Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.

Results

Inspired by recent advance of information passing and aggregation techniques that generalize the convolution neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial prediction performance improvement over other state-of-the-art DTI prediction methods as well as several novel predicted DTIs with evidence supports from previous studies have demonstrated the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.

Availability and implementation

The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Identifying drug–target interactions (DTIs) through computational approaches can greatly narrow down the large search space of drug candidates for downstream experimental validation, and thus significantly reduce the high cost and the long period of developing a new drug (Langley et al., 2017). Currently, the structure based (Morris et al., 2009), ligand-similarity based (Keiser et al., 2007) and machine learning based methods (Luo et al., 2017; Yuan et al., 2016) are three main classes of prediction approaches in computational aided drug screening. The structure based methods generally require the three-dimensional structures of proteins and have limited performance for those proteins with unknown structures, which unfortunately is the case for a majority of targets. The ligand-similarity based methods exploit the common knowledge of known interacting ligands to make prediction. Such approaches cannot lead to confident prediction results if the compound of interest is not indicated in the library of reference ligands. Recently, the machine learning based methods (Bleakley and Yamanishi, 2009; Luo et al., 2017), which fully exploit the latent correlations among the related features of drugs and targets have become a highly promising strategy for DTI prediction. For instance, the DTI network data have been integrated with the drug-structure and protein sequence information into a network-based machine learning model (e.g. a regularized least squares framework) for predicting new DTIs (van Laarhoven et al., 2011; van Laarhoven and Marchiori, 2013; Xia et al., 2010). Inspired by the recent surge of deep learning techniques, models with higher predictive capacity have also been developed in various drug discovery settings (e.g. compound–protein interaction prediction, drug discovery with one-shot learning) (Altae-Tran et al., 2017; Hamanaka et al., 2017; Tian et al., 2016; Wang and Zeng, 2013; Wan and Zeng, 2016; Xu et al., 2017).

In addition to known DTI data, chemical structure, protein sequence information and other properties of drugs and targets can also be characterized by their various functional roles in biological systems (e.g. protein–protein interactions and drug–disease associations). Indeed, by integrating diverse information from heterogeneous data sources, methods like DTINet (Luo et al., 2017), MSCMF (Zheng et al., 2013) and HNM (Wang et al., 2014) can further improve the accuracy of DTI prediction. However, these methods still suffer from certain limitations that need to be addressed. For example, in MSCMF (Zheng et al., 2013), the employed matrix factorization operation of a given DTI network is regularized by the corresponding drug and protein similarity matrices, which are obtained by integrating multiple data sources through a weighted averaging scheme. Under such a data integration strategy, substantial loss of information may occur and thus result in a sub-optimal solution. DTINet (Luo et al., 2017) first uses an unsupervised manner to automatically learn low-dimensional feature representations of drugs and targets from heterogeneous network (HN) data, and then applies inductive matrix completion (Natarajan and Dhillon, 2014) to predict new DTIs based on the learnt features. In such a framework, separating feature learning from the prediction task at hand may not yield the optimal solution, as the features learnt from the unsupervised learning procedure may not be the most suitable representations of drugs or targets for the final DTI prediction task. In addition, by constraining the learning models to only take relatively simple forms (e.g. bilinear or log-bilinear functions), these methods may not be sufficient enough to capture the complex hidden features behind the heterogeneous data. Recent advance of information passing and aggregation techniques that generalize the conventional convolution neural networks (CNNs) to large-scale graph data have shown substantial performance improvement on the network-related prediction tasks (Gilmer et al., 2017; Hamilton et al., 2017). This inspires us to incorporate deeper learning models to extract complex information from a highly HN and discover new DTIs.

In this paper, we propose a new framework, called NeoDTI (NEural integration of neighbOr information for DTI prediction) to predict new DTIs from heterogeneous data. NeoDTI integrates neighborhood information of the HN constructed from diverse data sources via a number of information passing and aggregation operations, which are achieved through the non-linear feature extraction by neural networks. After that, NeoDTI applies a network topology-preserving learning procedure to enforce the extracted feature representations of drugs and targets to match the observed networks. Comprehensive tests on several challenging and realistic scenarios in DTI prediction have demonstrated that our end-to-end prediction model can significantly outperform several baseline prediction methods. Moreover, several novel DTIs predicted by NeoDTI with evidence supports from previous studies in the literature further indicate the strong predictive power of NeoDTI. In addition, the robustness of NeoDTI and its extendability to integrate more heterogeneous data (e.g. compound–protein binding affinity data) have been examined through various tests. All these results suggest that NeoDTI can provide a powerful and useful tool in predicting unknown DTIs, and thus advance the drug discovery and repositioning fields.

2 Materials and methods

2.1 Problem formulation

NeoDTI predicts unknown DTIs from a drug and target related HN, in which drugs, targets and other objects are represented as nodes, and DTIs and other interactions or associations are represented as edges. We first introduce the definition of a HN.

Definition 1 (heterogeneous network). A HN is defined as a directed (or undirected) graph $G = (V, E)$ ⁠, in which each node v in the node set V belongs to an object type from an object type set O, and each edge e in the edge set $E \subset V \times V \times R$ belongs to a relation type from a relation type set R.

The datasets used in our framework to construct the HN (also see Section 3.1) include the object type set $O = {d r u g, t a r g e t, s i d e - e f f e c t, d i s e a s e}$ ⁠, and the relation type set $R = {d r u g - s t r u c t u r e - s i m i l a r i t y, d r u g - s i d e - e f f e c t - a s s o c i a t i o n, d r u g - p r o t e i n - i n t e r a c t i o n, d r u g - d r u g - i n t e r a c t i o n, d r u g - d i s e a s e - a s s o c i a t i o n, p r o t e i n - s e q u e n c e - s i m i l a r i t y, p r o t e i n - d r u g - i n t e r a c t i o n, p r o t e i n - d i s e a s e - a s s o c i a - t i o n, p r o t e i n - p r o t e i n - i n t e r a c t i o n, d i s e a s e - p r o t e i n - a s s o c i a t i o n, d i s e a s e - d r u g - a s s o c i a t i o n, s i d e - e f f e c t - d r u g - a s s o c i a t i o n}$ ⁠. In our current framework, each node only belongs to a single object type although it can be relatively easily extended to a multi-object-type mapping scenario. In addition, all edges are undirected and non-negatively weighted. Also, the same two nodes can be linked by more than one edge, e.g. two drugs can be linked by a $d r u g - d r u g - i n t e r a c t i o n$ edge and a $d r u g - s t r u c t u r e - s i m i l a r i t y$ edge simultaneously.

Given an HN G, NeoDTI aims to automatically learn a network topology-preserving node-level embedding (i.e. a function that maps nodes to their corresponding feature representations that preserve the original topological characteristics as much as possible) from G that can be used to greatly facilitate the prediction of DTIs. Most existing techniques for learning the embeddings of structured data mainly exploit the rationale that the elements of these structured data can be well characterized by their contextual information. For example, in natural language processing, the Word2vec technique (Mikolov et al., 2013) enforces the embedding of words to preserve the semantic relationships with their corresponding surrounding words. The graph embedding techniques, such as Deepwalk (Perozzi et al., 2014) and metapath2vec (Dong et al., 2017), have extended this embedding strategy to further learn the latent representations of network data. Recent advance in generalizing CNNs to analyze large-scale graph data (Defferrard et al., 2016; Kipf and Welling, 2016) and the integration of the information passing and aggregation techniques with different graph convolution operations into a unified framework (Gilmer et al., 2017; Hamilton et al., 2017) have brought significant performance improvement for many network-related prediction tasks, such as predicting the biological activities of small molecules, graph signal processing and social network data analysis. Similar to GraphSAGE (Hamilton et al., 2017) and message passing neural networks (MPNNs) (Gilmer et al., 2017), our framework NeoDTI also applies neural networks to integrate neighborhood information from individual nodes. However, unlike GraphSAGE which mainly focuses on learning a node-level embedding from a homogeneous network or MPNNs which aim at learning a graph-level embedding from heterogeneous graphs for predicting molecular properties, NeoDTI focuses on learning a node-level embedding from a HN. In addition, to the best of our knowledge, NeoDTI is the first framework to systematically integrate the neural information passing and aggregation techniques with the topology-preserving optimization scheme into an end-to-end learning framework to extract the latent features of drugs and targets from a HN to make DTI prediction.

2.2 The workflow of NeoDTI

NeoDTI consists of the following three main steps: (i) neighborhood information aggregation; (ii) updating the node embedding and (iii) topology-preserving learning of the node embedding. Through Steps (i) and (ii), each node in a given HN generates a new feature representation by integrating its neighborhood information with its own features. Through Step (iii), we enforce the embedding of nodes to be topology-preserving, which is useful for extracting the topological features of individual nodes for accurate DTI prediction. Next, we will introduce the mathematical formulations of these three steps.

Definition 2 (neighborhood information aggregation). Given an HN G, an initial node embedding function

f^{0} : V \to ℝ^{d}

that maps each node

v \in V

to its d-dimensional vector representation

f^{0} (v)

and an edge weight mapping function

s : E \to ℝ

that maps each edge

e \in E

to its edge weight s(e), neighborhood information aggregation for node v is defined as:

a_{v} = \sum_{r \in R} \underset{n e i g h b o r h o o d i n f o r m a t i o n a g g r e g a t i o n w i t h r e s p e c t t o e d g e t y p e r}{\underset{︸}{\sum_{\begin{matrix} u \in N_{r} (v), \\ e = (u, v, r) \in E \end{matrix}} \frac{s (e)}{M_{v, r}} σ (W_{r} f^{0} (u) + b_{r})}},

(1)

where

N_{r} (v) = {u, u \in V, u \neq v, (u, v, r) \in E}

denotes the set of adjacent nodes connected to

v \in V

through edges of type

r \in R, σ (\cdot)

stands for a nonlinear activation function over a single-layer neural network parameterized by weights

W_{r} \in ℝ^{d \times d}

and a bias term

b_{r} \in ℝ^{d}

, and

stands for a normalization term.

More specifically, for each edge-type r, the neighborhood information aggregation operation for node v with respect to r can be obtained by first nonlinearly transforming the embedded feature representations of the corresponding adjacent nodes $f^{0} (u), u \in N_{r} (v)$ through an edge-type specific single-layer neural network that is parameterized by weights $W_{r} \in ℝ^{d \times d}$ ⁠, a bias term $b_{r} \in ℝ^{d}$ and a nonlinear activation function $σ (\cdot)$ ⁠, and then averaging by the normalized edge weight, i.e. $\frac{s (e)}{M_{v, r}}$ ⁠. Finally, the output of the neighborhood information aggregation operation a_v for node v is the summation of neighborhood information aggregation with respect to every edge-type r. Here, the initial node embedding $f^{0} (u), \forall u \in V$ is obtained through a random mapping.

Definition 3 (updating the node embedding). Given the aggregated neighbor information a_v’s for all nodes v’s, the process of updating the node embedding is defined as:

f^{1} (v) = \frac{σ (W^{1} c o n c a t (f^{0} (v), a_{v}) + b^{1})}{| | σ (W^{1} c o n c a t (f^{0} (v), a_{v}) + b^{1}) | |_{2}} .

(2)

The above equation states that the new embedding of node $f^{1} (v)$ can be obtained using a single-layer neural network that is parameterized by weights $W^{1} \in ℝ^{d \times (2 d)}$ ⁠, a bias term $b^{1} \in ℝ^{d}$ and a nonlinear activation function $σ (\cdot)$ to nonlinearly transform the concatenation of the original embedding $f^{0} (v)$ and the neighborhood aggregation information a_v, and then normalized by its l₂ norm.

Noted that in principle we could repeat the previous two steps alternately several times to produce more embeddings of nodes (e.g. $f^{2} (\cdot), f^{3} (\cdot), \dots$ ⁠). In practice, we find that we only need to conduct such a process once to obtain reasonably good prediction results, according to our validation tests (as described in Supplementary Materials). In the rest part of this section, we will mainly use $f^{1} (\cdot)$ to demonstrate our algorithm for convenience. In addition, we choose to use $R e L U (x) = m a x (0, x)$ as the activation function $σ (\cdot)$ ⁠.

Definition 4 (topology-preserving learning of the node embedding). Given the embedding of nodes

f^{1} (\cdot)

⁠, topology-preserving learning of the node embedding is defined as:

\min_{\begin{matrix} {f^{0} (u), W^{1}, \\ b^{1}, W_{r}, b_{r}, \\ G_{r}, H_{r}, | u \in V, r \in R} \end{matrix}} \sum_{r \in R} \sum_{\overset{u, v \in V,}{e = (u, v, r) \in E}} {[s (e) - f^{1} {(u)}^{⊤} G_{r} H_{r}^{⊤} f^{1} (v)]}^{2},

(3)

where

G_{r}, H_{r} \in ℝ^{d \times k}

are edge-type specific projection matrices.

The above equation states that, after edge-specific projections of $f^{1} (u)$ and $f^{1} (v)$ by G_r and H_r, respectively, the inner product of the two projected vectors should reconstruct the original edge weight s(e) as much as possible. Note that a similar reconstruction strategy has also been used in (Luo et al., 2017; Natarajan and Dhillon, 2014) to solve the link prediction problems. In addition, if the edge-type r is symmetric, i.e. $r \in {d r u g - s t r u c t u r e - s i m i l a r i t y, p r o t e i n - s e q u e n c e - s i m i l a r i t y, d r u g - d r u g - i n t e r a c t i o n, p r o t e i n - p r o t e i n - i n t e r a c t i o n}$ ⁠, we use the tie weights (i.e. G_r = H_r) to enforce this symmetric property. Here, the summation of the squared reconstruction errors is minimized for all edges with respect to all unknown parameters. Since all mathematical operations in Equations (1), (2) and (3) are differentiable or subdifferentiable (e.g. for the ReLU activation function), all parameters can be trained through an end-to-end manner by performing gradient descent to minimize the final objective function described in Equation (3).

Finally, after Step (iii), the predicted interaction confidence score between drug node u and protein node v can be obtained by

\begin{matrix} f^{1} {(u)}^{⊤} G_{r} H_{r}^{⊤} f^{1} (v), \\ subject to ϕ (u) = d r u g, \\ ϕ (v) = p r o t e i n, \\ r = d r u g - p r o t e i n - i n t e r a c t i o n, \end{matrix}

(4)

where

ϕ (u)

and

ϕ (v)

stand for the node types of u and v, respectively, and r represents their edge-type.

The above operation is equivalent to reconstructing the

d r u g - p r o t e i n

edge weight between nodes u and v. By collecting

f^{1} (u)

’s for all drugs and

f^{1} (v)

’s for all targets, we can form a drug feature matrix F_drug and a target feature matrix F_target. Then, the reconstructed DTI matrix can be written as:

W_{D T I_r e c o n s t r u c t} = F_{d r u g} G_{r} H_{r}^{⊤} F_{t a r g e t}^{⊤} .

(5)

In this sense, we can consider our DTI prediction task as a matrix factorization or completion problem. However, unlike the conventional matrix factorization approaches (Natarajan and Dhillon, 2014; Zheng et al., 2013), NeoDTI incorporates a deeper learning model to construct the feature matrices F_d and F_t by explicitly defining the construction processes of F_d and F_t through Steps (i) and (ii). In addition, through these two steps, NeoDTI incorporates the prior knowledge of network topology into F_d and F_t and specifies the forms of these two matrices to guide the downstream optimization process. As a result, NeoDTI prevents the DTI network as well as other networks from being factorized arbitrarily in Step (iii), which can serve as a useful regularizer and thus lead to performance improvement for DTI prediction (as also demonstrated in our cross-validation tests; see the Results section).

3 Results

3.1 Datasets

We adopted the datasets that were curated in our previous study (Luo et al., 2017), which included six individual drug/protein related networks: drug–protein interaction and drug–drug interaction networks [interactions were extracted from Drugbank Version 3.0 (Knox et al., 2011)], the protein–protein interaction network [interactions were extracted from the HPRD database Release 9 (Keshava Prasad et al., 2009)], drug–disease association and protein–disease association networks [associations were extracted from the Comparative Toxicogenomics Database (Davis et al., 2013)] and the drug-side-effect association network [associations were extracted from the SIDER database Version 2 (Kuhn et al., 2010)]. The basic statistics of these datasets can be found in Supplementary Table S1. We also incorporated drug chemical structure information as well as protein sequence information by creating two extra networks: the drug-structure similarity network [i.e. a pair-wise chemical structure similarity network measured by the dice similarities of the Morgan fingerprints with radius 2 (Rogers and Hahn, 2010), which were computed by RDKit (http://www.rdkit.org)] and the protein sequence similarity network [which was obtained based on the pair-wise Smith–Waterman scores (Smith and Waterman, 1981)). All networks had binary edge weights (one represents a known interaction or association, and zero otherwise) except the drug-structure similarity and the protein sequence similarity networks, which had non-negative real-valued edge weights. We combined all these eight networks to construct the HN (Fig. 1) for evaluating the prediction performance of NeoDTI.

Fig. 1.

Open in new tab Download slide

The schematic workflow of NeoDTI. (a) NeoDTI uses eight individual drug or target related networks (see Section 3.1 for more details of the used datasets). (b) NeoDTI first constructs a heterogeneous network from these eight networks. Different types of nodes are connected by distinct types of edges. Two nodes can be connected by more than one edge (e.g. a solid link representing drug–drug-interaction and a dashed link representing drug-structure-similarity). In addition, NeoDTI associates each node with a feature representation. (c) To extract information from neighborhood, each node adopts a neighborhood information aggregation operation (see Definition 2 in the main text). Each colored arrow represents a specific aggregation function with respect to a specific edge-type. Then each node updates its feature representation by integrating its current representation with the aggregated information (see Definition 3 in the main text). (d) By enforcing the node features to reconstruct the original individual networks as much as possible (see Definition 4 in the main text), NeoDTI effectively learns the topology-preserving node features that are useful for drug–target interaction prediction

3.2 NeoDTI yields superior performance in predicting new drug–target interactions

The DTI prediction can be considered as a binary classification problem, in which the known interacting drug–target pairs are regarded as positive examples, while the unknown interacting pairs are treated as negative examples. Several challenging and realistic scenarios were considered in our tests to evaluate the prediction performance of NeoDTI. The hyperparameters of NeoDTI were determined using an independent validation set (as described in Supplementary Materials). We first ran a 10-fold cross-validation test on all positive pairs and a set of randomly sampled negative pairs, whose number was 10 times as many as that of positive samples. This scenario basically mimicked the practical situation in which the DTIs are sparsely labeled. For each fold, a randomly chosen subset of 90% positive and negative pairs was used as training data to construct the HN and then train the parameters of NeoDTI (i.e. during the topology-preserving learning process, we only calculated the reconstruction loss of the DTI network with respect to training data, while the reconstruction losses of other types of networks were computed as usual), and the remaining 10% positive and negative pairs were held out as the test set. We also compared the performance of NeoDTI with that of six baseline methods, including DTINet (Luo et al., 2017), HNM (Wang et al., 2014), MSCMF (Zheng et al., 2013), NetLapRLS (Xia et al., 2010), DT-Hybrid (Alaimo et al., 2013) and BLMNII (Mei et al., 2013). The details on how to integrate heterogeneous data and how to determine the hyperparameters in these baseline methods can be found in Section 2 of Supplementary Materials. The area under precision recall (AUPR) curve and the area under receiver operating characteristic (AUROC) curve were used to evaluate prediction performance of all prediction methods. We observed that NeoDTI greatly outperformed other baseline methods, with significant improvement (3.5% in terms of AUPR and 3.0% in terms of AUROC) over the second best method (Fig. 2a and Supplementary Fig. S1a).

Fig. 2.

Open in new tab Download slide

Performance evaluation of NeoDTI on several challenging scenarios in terms of the AUPR scores. (a) A 10-fold cross-validation test in which the ratio between positive and negative samples was set to $1 : 10$ ⁠. (b) A 10-fold cross-validation test in which all unknown drug–target interacting pairs were considered. (c–e) Ten-fold cross-validation with positive: negative ratios = $1 : 10$ on several scenarios of removing redundancy in data: (c) DTIs with similar drugs and proteins were removed; (d) DTIs with drugs sharing similar drug interactions were removed; (e) DTIs with drugs sharing similar side-effects were removed. (f) NeoDTI was trained on non-unique drug–target interacting pairs and tested on unique drug–target interacting pairs. More details on the baseline methods can be found in Supplementary Materials. All results were summarized over 10 trials and expressed as mean±SD

Next, we further increased the positive-negative ratio by including all negative examples (i.e. all unknown drug–target interacting pairs) in the 10-fold cross-validation procedure (the ratio between positive and negative samples was around $1.8 \times 10^{- 3}$ ⁠). We observed a larger AUPR improvement (14.1%) over the second best method (Fig. 2b). Although NeoDTI, DTINet, HNM and NetLapRLS achieved comparable results in terms of AUROC in this scenario (Supplementary Fig. S1b), as also stated in previous work (Davis and Goadrich, 2006), here, AUPR generally provides a more informative criterion than AUROC for the highly skewed datasets. Since drug discovery is generally a needle-in-a-haystack problem, the substantial improvement in AUPR truely demonstrated the superior prediction performance of NeoDTI over other methods.

Since the datasets may contain ‘redundant’ DTIs (i.e. a same protein is connected to more than one similar drugs and vice versa), the prediction performance can be easily inflated by easy predictions in this case (Luo et al., 2017). To consider this issue, we followed the same evaluation strategies as in (Luo et al., 2017) by conducting the following additional 10-fold cross-validation tests: (i) removing DTIs with similar drugs (i.e. drug chemical structure similarities >0.6) or similar proteins (i.e. protein sequence similarities $> 40 %$ ⁠); (ii) removing DTIs with drugs sharing similar drug interactions (i.e. Jaccard similarities >0.6); (iii) removing DTIs with drugs sharing similar side-effects (i.e. Jaccard similarities >0.6); (iv) removing DTIs with drugs or proteins sharing similar diseases (i.e. Jaccard similarities >0.6). In all these test scenarios, we kept the ratios between positive and negative samples to be 1:10. As expected, we observed a drop in prediction performance for all prediction methods after the removal of redundant DTIs (Fig. 2c–e and Supplementary Fig. S1c–g). However, NeoDTI still consistently outperformed other prediction methods in terms of both AUPR and AUROC, which also indicated the robustness of NeoDTI after removing the redundancy in data.

In dyadic prediction, if a dataset contains many drugs or targets with only one interacting partner, conventional cross-validation may not be a proper way to evaluate the prediction performance. Here, we call such drugs, proteins and interactions as ‘unique’. In such a case, conventional training methods may lean to exploit the bias toward those unique drugs and targets to boost the performance (van Laarhoven and Marchiori, 2014). To investigate this issue, we further evaluated the prediction performance of NeoDTI by separating unique DTIs from non-unique ones. That is, all methods were trained on non-unique DTIs and then evaluated on unique DTIs. Note that in such a case, the negative examples in the test data were sampled by enforcing the corresponding drugs or targets (or both) to be unique. This scenario basically mimicked the situation in which the DTIs of new drugs or targets are predicted without much prior DTI knowledge. We found that NeoDTI significantly outperformed all the baseline methods at least by 13.3% in terms of AUPR, which suggested that NeoDTI can have a much better generalization capacity over other state-of-the-art methods, when predicting new DTIs for those drugs or targets without much prior DTI knowledge.

3.3 Robustness of NeoDTI

In this section, we further evaluated the robustness of NeoDTI by varying different types of data used in the HN as well as the hyperparameters of NeoDTI. All computational experiments in this section were conducted using a 10-fold cross-validation procedure in which the ratios of positive versus negative samples were set to $1 : 10$ ⁠.

To examine the effects of incorporating heterogeneous data, we first evaluated the performance of NeoDTI when being trained using only the drug–protein interaction network. We observed a substantial drop of prediction performance (11.1% in terms of AUPR and 9% in terms of AUROC), compared to that of the original NeoDTI model trained on all eight networks (Fig. 3a). We then investigated the effects of incorporating individual networks by training NeoDTI again on a HN constructed from each individual network and the drug–protein interaction network. As expected, we found that adding individual drug or target related networks can improve the prediction performance (Fig. 3a and Supplementary Fig. S2a–e). These results suggested that diverse information from multiple data sources can better characterize the latent properties of drugs and targets, and thus incorporating heterogeneous information is necessary to improve the accuracy of DTI prediction. In addition, to examine whether NeoDTI can also be easily extended to incorporate more drug or target related information beyond the previously used datasets, we further incorporated compound-protein binding affinity information into the HN. More specifically, we collected all the binding affinity data between drug-like compounds and proteins that satisfied $K i \leq 1.0 n m$ from the ZINC15 database (Sterling and Irwin, 2015). In total, we extracted 1696 edges that connected 1244 compounds to the proteins used in our previous datasets. We set the negative logarithm of Ki between a pair of compound and protein as the weight of their corresponding interaction. We also linked the compounds and drugs by drug-structure-similarity edges. In addition, if a compound and a drug were similar (i.e. chemical structure similarities >0.6) and connected to the same protein in the test set, we removed this compound–protein pair from the training set to ease the inflation of prediction performance that may be resulted from the redundancy in data. We found that NeoDTI trained on this new HN further improved AUPR from 85.3 to 86.2% and AUROC from 94.6 to 95.1% (Fig. 3b), which demonstrated the easy extendability of NeoDTI to integrate more heterogeneous information.

Fig. 3.

Open in new tab Download slide

Incorporating more drug or target related information can improve the prediction performance of NeoDTI. (a) Incorporating the drug-structure similarity network or protein sequence similarity network. (b) Incorporating the compound-protein binding affinity data. All results were summarized over 10 trials and expressed as mean±SD

The drug/protein-disease association edges used for constructing our HN were derived from the Comparative Toxicogenomics Database (Davis et al., 2013). These edges can be further distinguished as marker, therapeutic and inferred types. The investigation of the effect of distinguishing different edge-types on the prediction performance can be found in Section 3 of Supplementary Materials. We found that utilizing all binary drug/protein-disease associations without distinguishing individual edge-types yielded the best prediction performance.

In our topology-preserving learning of the node embedding, we enforce the feature representations of nodes to reconstruct all types of edges as much as possible. We further investigated the effect of this edge reconstruction strategy by conducting an additional test in which we constrained NeoDTI to only reconstruct the DTI edges. In this test, we observed a decrease in prediction performance with 5.5% in terms of AUPR and 2.7% in terms of AUROC (Supplementary Fig. S2f). Thus, reconstructing other types of network edges is useful for boosting the prediction performance. Such an operation probably serves as a beneficial regularizer to further overcome the potential overfitting problem. In the neighborhood information aggregation step, edge weight s(e) is normalized by $M_{v, r}$ [see Equation (1)]. The investigation of the effect of using $M_{v, r}$ on the prediction performance can be found in Section 4 of Supplementary Materials. We found that incorporating this term is necessary for yielding promising prediction performance.

In addition, we investigated the robustness of NeoDTI against different choices of hyperparameters: (i) for the dimension d of the node embedding, we tested $d = 256, 512$ and 1024; (ii) for the dimension k of the projection matrices, we tested $k = 256, 512$ and 1024; (iii) for the repetition time p of neighborhood information aggregation, we examined $p = 0, 1, 2$ and 3. We found that NeoDTI can produce relatively stable results over a wide range of choices for both d and k, although we observed that increasing the value of d can slightly improve the prediction results (Supplementary Fig. S3a, b). More importantly, we observed significant performance improvement when $p \geq 1$ ⁠, demonstrating the necessity of integrating neighborhood information for the representation learning of node features (Supplementary Fig. S3c). However, we found that increasing the repetition time of neighborhood information aggregation from one to three did not improve the prediction performance (Supplementary Fig. S3c). Thus, in practice, we only need to run the operation of integrating neighborhood information once. In such a case, the first dimension of W¹ in Equation (2) needs not be d. The investigation of the effect of this parameter on the prediction performance can be found in Section 4 of Supplementary Materials. We found that the choice of this parameter did not significantly affect the prediction performance.

3.4 NeoDTI reveals novel DTIs with literature supports

We also predicted the novel DTIs by training NeoDTI using the whole HN, including the aforementioned binding affinity data. We excluded those easy predictions by removing the predicted DTIs that were similar to the known DTIs (i.e. drug chemical structure similarities >0.6 and protein sequence similarities $> 40 %$ ⁠). We then analyzed the predicted DTIs whose prediction confidence scores were significant (three-sigma rule) with respect to the corresponding drugs and targets. The network visualization of the top 100 novel DTIs predicted by NeoDTI can be found in Figure 4.

Fig. 4.

Open in new tab Download slide

Network visualization of the top 100 novel drug–target interactions predicted by NeoDTI. Blue and orange nodes represent proteins and drugs, respectively. Dashed and solid lines represent the known and predicted drug–target interactions, respectively (Color version of this figure is available at Bioinformatics online.)

Among the top 20 predicted DTIs ranked according to their confidence scores, eight DTIs can be supported by previous studies in the literature (Supplementary Table S2). For instance, sorafenib, a drug previously approved for the treatment of advanced renal cell carcinoma, was predicted by NeoDTI to interact with the colony stimulating factor 1 receptor (CSF1R), which plays an important role in the development of mammary gland and mammary gland carcinogenesis (Tamimi et al., 2008). Such a prediction can be supported by a previous study indicating that sorafenib can block CSF1R and induce apoptosis in various classical Hodgkin lymphoma cell lines (Ullrich et al., 2011). In addition, the carbonic anhydrase 6 (CA6), an enzyme abundantly found in salivary glands, has been previously reported to be the target of three drugs, including zonisamide, ellagic acid and mafenide (Knox et al., 2011), was predicted by NeoDTI to also interact with acetazolamide. This prediction can be supported by the previous finding on the CA6 inhibitory activity of acetazolamide (Nishimori et al., 2007). Overall, these novel DTIs predicted by NeoDTI with literature supports further demonstrated its strong predictive power.

4 Conclusion

In this paper, we develop a new framework, called NeoDTI, to integrate diverse information from a HN to predict new DTIs. NeoDTI extracts the complex hidden features of drugs and targets by applying neural networks to integrate neighborhood information in the input HN. By simultaneously optimizing the feature extraction process and the DTI prediction model through an end-to-end manner, NeoDTI can achieve superior prediction performance over other state-of-the-art methods. The effectiveness and robustness of NeoDTI have been extensively validated on several realistic prediction scenarios and supported by the finding that many of the novel predicted DTIs agree well with the previous studies in the literature. Moreover, NeoDTI can incorporate more drug and target related information readily (e.g. compound–protein binding affinity data). Therefore, we believe that NeoDTI can provide a powerful and useful tool to facilitate the drug discovery and drug repositioning processes. In the future, we will further extend NeoDTI by integrating more heterogeneous information and validate some of the prediction results through wet-lab experiments.

Acknowledgements

The authors thank Yunan Luo and Hailin Hu for helpful discussions.

Funding

This work was supported in part by the National Natural Science Foundation of China [61472205, 81630103], China’s Youth 1000-Talent Program, Beijing Advanced Innovation Center for Structural Biology. Beijing Advanced Innovation Center for Structural biology, and the US National Science Foundation [IIS-1646333]. We acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

Conflict of Interest: none declared.

References

Alaimo

S.

et al. (

2013

)

Drug–target interaction prediction through domain-tuned network-based inference

.

Bioinformatics

,

29

,

2004

–

2008

.

Altae-Tran

H.

et al. (

2017

)

Low data drug discovery with one-shot learning

.

ACS Cent. Sci

.,

3

,

283

–

293

.

Bleakley

K.

,

Yamanishi

Y.

(

2009

)

Supervised prediction of drug-target interactions using bipartite local models

.

Bioinformatics

,

25

,

2397

–

2403

.

Davis

A.P.

et al. (

2013

)

The comparative toxicogenomics database: update 2013

.

Nucleic Acids Res

.,

41

,

D1104

–

D1114

.

Davis

J.

,

Goadrich

M.

(

2006

). The relationship between precision-recall and ROC curves. In:

Proceedings of the 23rd International Conference on Machine Learning

, pp.

233

–

240

.

ACM

,

New York, NY, USA

.

Defferrard

M.

et al. (

2016

)

Convolutional neural networks on graphs with fast localized spectral filtering

. In

Lee

D.D.

et al. (eds.)

Advances in Neural Information Processing Systems 29

. Curran Associates, Inc, pp.

3844

–

3852

. http://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf.

Google Scholar

OpenURL Placeholder Text

WorldCat

Dong

Y.

et al. (

2017

) Metapath2vec: scalable representation learning for heterogeneous networks. In:

Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

, pp.

135

–

144

.

ACM

,

New York, NY, USA

.

Gilmer

J.

et al. (

2017

). Neural message passing for quantum chemistry. In:

Precup

D.

,

Teh

Y.W.

(ed.)

Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research

, pp.

1263

–

1272

,

International Convention Centre

,

Sydney, Australia

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Hamanaka

M.

et al. (

2017

)

Cgbvs-dnn: prediction of compound-protein interactions based on deep learning

.

Mol. Inform

.,

36

,

1600045.

Google Scholar

Crossref

WorldCat

Hamilton

W.

et al. (

2017

). Inductive representation learning on large graphs. In:

Guyon

I.

et al. (ed.)

Advances in Neural Information Processing Systems 30

.

Curran Associates, Inc

,

New York

, pp.

1025

–

1035

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Keiser

M.J.

et al. (

2007

)

Relating protein pharmacology by ligand chemistry

.

Nat. Biotechnol

.,

25

,

197

–

206

.

Keshava Prasad

T.

et al. (

2009

)

Human protein reference database2009 update

.

Nucleic Acids Res

.,

37

,

D767

–

D772

.

Kipf

T.N.

,

Welling

M.

(

2016

)

Semi-supervised classification with graph convolutional networks

.

arXiv

,

1609

,

02907

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Knox

C.

et al. (

2011

)

Drugbank 3.0: a comprehensive resource for ‘omics’ research on drugs

.

Nucleic Acids Res

.,

39

,

D1035

–

D1041

.

Kuhn

M.

et al. (

2010

)

A side effect resource to capture phenotypic effects of drugs

.

Mol. Syst. Biol

.,

6

,

1

–

343

.

Google Scholar

Crossref

WorldCat

Langley

G.R.

et al. (

2017

)

Towards a 21st-century roadmap for biomedical research and drug discovery: consensus report and recommendations

.

Drug Discov. Today

,

22

,

327

–

339

.

Luo

Y.

et al. (

2017

)

A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information

.

Nat. Commun

.,

8

,

Google Scholar

OpenURL Placeholder Text

WorldCat

Mei

J.-P.

et al. (

2013

)

Drug–target interaction prediction by learning from local information and neighbors

.

Bioinformatics

,

29

,

238

–

245

.

Mikolov

T.

et al. (

2013

)

Distributed representations of words and phrases and their compositionality

. In

Burges

C.J.C.

et al. (eds)

Advances in Neural Information Processing Systems 26

. Curran Associates, Inc., pp.

3111

–

3119

. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.

Google Scholar

OpenURL Placeholder Text

WorldCat

Morris

G.M.

et al. (

2009

)

Autodock4 and autodocktools4: automated docking with selective receptor flexibility

.

J. Comput. Chem

.,

30

,

2785

–

2791

.

Natarajan

N.

,

Dhillon

I.S.

(

2014

)

Inductive matrix completion for predicting gene–disease associations

.

Bioinformatics

,

30

,

i60

–

i68

.

Nishimori

I.

et al. (

2007

)

Carbonic anhydrase inhibitors. dna cloning, characterization, and inhibition studies of the human secretory isoform vi, a new target for sulfonamide and sulfamate inhibitors

.

J. Med. Chem

.,

50

,

381

–

388

.

Perozzi

B.

et al. (

2014

) Deepwalk: online learning of social representations. In:

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

, pp.

701

–

710

,

ACM

,

New York, NY, USA

.

Rogers

D.

,

Hahn

M.

(

2010

)

Extended-connectivity fingerprints

.

J. Chem. Inf. Model

.,

50

,

742

–

754

.

Smith

T.F.

,

Waterman

M.S.

(

1981

)

Identification of common molecular subsequences

.

J. Mol. Biol

.,

147

,

195

–

197

.

Sterling

T.

,

Irwin

J.J.

(

2015

)

Zinc 15–ligand discovery for everyone

.

J. Chem. Inf. Model

,

55

,

2324

–

2337

.

Tamimi

R.M.

et al. (

2008

)

Circulating colony stimulating factor-1 and breast cancer risk

.

Cancer Res

.,

68

,

18

–

21

.

Tian

K.

et al. (

2016

)

Boosting compound-protein interaction prediction by deep learning

.

Methods

,

110

,

64

–

72

.

Ullrich

K.

et al. (

2011

)

Bay 43-9006/sorafenib blocks csf1r activity and induces apoptosis in various classical hodgkin lymphoma cell lines

.

Br. J. Haematol

.,

155

,

398

–

402

.

van Laarhoven

T.

,

Marchiori

E.

(

2013

)

Predicting drug-target interactions for new drug compounds using a weighted nearest neighbor profile

.

PLoS One

,

8

,

e66952.

van Laarhoven

T.

,

Marchiori

E.

(

2014

) Biases of drug–target interaction network data. In:

IAPR International Conference on Pattern Recognition in Bioinformatics

,

Springer

,

Berlin

, pp.

23

–

33

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

van Laarhoven

T.

et al. (

2011

)

Gaussian interaction profile kernels for predicting drug–target interaction

.

Bioinformatics

,

27

,

3036

–

3043

.

Wan

F.

,

Zeng

J.

(

2016

)

Deep learning with feature embedding for compound-protein interaction prediction

.

bioRxiv

,

086033

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Wang

W.

et al. (

2014

)

Drug repositioning by integrating target information through a heterogeneous network model

.

Bioinformatics

,

30

,

2923

–

2930

.

Wang

Y.

,

Zeng

J.

(

2013

)

Predicting drug-target interactions using restricted boltzmann machines

.

Bioinformatics

,

29

,

i126

–

i134

.

Xia

Z.

et al. (

2010

)

Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces

.

BMC Syst. Biol

.,

4

,

S6

.

Xu

Y.

et al. (

2017

)

Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction

.

J. Chem. Inf. Model

.,

57

,

2672

–

2685

.

Yuan

Q.

et al. (

2016

)

Druge-rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank

.

Bioinformatics

,

32

,

i18

–

i27

.

Zheng

X.

et al. (

2013

) Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In:

Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

, pp.

1025

–

1033

,

ACM

,

New York, NY, USA

.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Associate Editor:

Download all slides

Month:	Total Views:
July 2018	49
August 2018	24
September 2018	49
October 2018	108
November 2018	68
December 2018	150
January 2019	133
February 2019	90
March 2019	133
April 2019	147
May 2019	100
June 2019	100
July 2019	109
August 2019	76
September 2019	66
October 2019	131
November 2019	137
December 2019	129
January 2020	185
February 2020	181
March 2020	192
April 2020	267
May 2020	150
June 2020	151
July 2020	203
August 2020	166
September 2020	189
October 2020	187
November 2020	303
December 2020	249
January 2021	202
February 2021	147
March 2021	321
April 2021	250
May 2021	179
June 2021	189
July 2021	208
August 2021	200
September 2021	194
October 2021	251
November 2021	274
December 2021	282
January 2022	199
February 2022	260
March 2022	246
April 2022	299
May 2022	306
June 2022	223
July 2022	303
August 2022	201
September 2022	212
October 2022	218
November 2022	202
December 2022	188
January 2023	157
February 2023	134
March 2023	215
April 2023	200
May 2023	213
June 2023	157
July 2023	199
August 2023	124
September 2023	159
October 2023	149
November 2023	197
December 2023	165
January 2024	227
February 2024	138
March 2024	203
April 2024	105

Article Contents

NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug–target interactions

Abstract

1 Introduction

2 Materials and methods

2.1 Problem formulation

2.2 The workflow of NeoDTI

3 Results

3.1 Datasets

3.2 NeoDTI yields superior performance in predicting new drug–target interactions

3.3 Robustness of NeoDTI

3.4 NeoDTI reveals novel DTIs with literature supports

4 Conclusion

Acknowledgements

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

Article Contents

NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug–target interactions

Abstract

1 Introduction

2 Materials and methods

2.1 Problem formulation

2.2 The workflow of NeoDTI

3 Results

3.1 Datasets

3.2 NeoDTI yields superior performance in predicting new drug–target interactions

3.3 Robustness of NeoDTI

3.4 NeoDTI reveals novel DTIs with literature supports

4 Conclusion

Acknowledgements

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

This Feature Is Available To Subscribers Only