Drug repositioning with adaptive graph convolutional networks

Abstract Motivation: Drug repositioning is an effective strategy for identifying new indications for existing drugs, providing the quickest possible transition from bench to bedside. With the rapid development of deep learning, graph convolutional networks (GCNs) have been widely adopted for drug repositioning tasks. However, prior GCN-based methods have limitations in deeply integrating node features and topological structures, which may hinder the capability of GCNs. Results: In this study, we propose an adaptive GCN approach, termed AdaDR, for drug repositioning that deeply integrates node features and topological structures. Distinct from conventional graph convolutional networks, AdaDR models the interactive information between them with an adaptive graph convolution operation, which enhances the expressiveness of the model. Concretely, AdaDR simultaneously extracts embeddings from node features and topological structures and then uses an attention mechanism to learn adaptive importance weights for the embeddings. Experimental results show that AdaDR achieves better performance than multiple baselines for drug repositioning. Moreover, in the case study, exploratory analyses are offered for finding novel drug–disease associations. Availability and implementation: The source code of AdaDR is available at: https://github.com/xinliangSun/AdaDR.


Introduction
Computational drug repositioning is considered an important alternative to traditional drug discovery (Baker et al. 2018). It involves the use of de-risked compounds, with potentially lower overall development costs and shorter development timelines (Pushpakom et al. 2019). In other words, computational drug repositioning narrows down the search space for drug-disease associations by suggesting drug candidates for wet-lab validation. Hence, it has attracted remarkable attention. More importantly, some drugs have been successfully repositioned, bringing huge market and social benefits. For example, Sildenafil was initially employed as a chest pain treatment; it was later discovered to be a PDE5 inhibitor, which made Sildenafil a hit on the market.
In the past decades, machine learning-based approaches have gained considerable attention due to their high-quality prediction results in drug repositioning tasks. Most of these are data-driven methods that generally derive latent features from the known drug-disease interaction data and then adopt various machine learning techniques to predict potential indications for a given drug. For example, Gottlieb et al. (2011) developed a computational approach called PREDICT to identify unknown drug-disease associations by integrating drug similarities and disease similarities. Moreover, Connectivity Map data (Lamb et al. 2006) have also been employed in drug repositioning research; for instance, Iorio et al. (2010) used transcriptional responses to perform drug repositioning. However, feature-based machine learning methods heavily rely on feature extraction and the selection of negative samples. With the development of high-throughput technology and continuously updated databases, other types of biological entities are frequently involved in drug-disease prediction, such as proteins, diseases, genes, and side effects. Therefore, network-based methods have been widely adopted. For example, Fiscon and Paci (2021) developed a network-based method named SAveRUNNER for drug repurposing, which offers a promising framework to efficiently detect putative novel indications for currently marketed drugs against diseases of interest. Wang et al. (2022) presented a novel scoring algorithm to repurpose drugs. Although network-based methods have the advantage of good interpretability, their performance is not satisfactory (Luo et al. 2021).
To this end, a surge of more sophisticated techniques, such as matrix factorization and matrix completion approaches, has been applied to drug repositioning tasks. In particular, matrix factorization and matrix completion techniques are highly popular in drug repositioning due to their flexibility in integrating prior knowledge, and they have shown promising results in application. Under the constraint of bounded nuclear norm regularization, Yang et al. (2019) proposed the BNNR method to complete the drug-disease matrix. To incorporate more prior knowledge, iDrug (Chen et al. 2020) was presented, which takes drugs as a bridge to comprehensively utilize target and disease information. Nevertheless, due to the high-complexity matrix operations, it is challenging to deploy matrix factorization and matrix completion approaches on large-scale datasets.
Recently, graph convolutional networks (GCNs) have achieved promising results in various tasks by utilizing both node features and graph topology. A few GCN-based methods have been proposed for drug-disease association prediction. They generally formulate known drug-disease associations as a bipartite graph and then treat the drug repositioning problem as a link prediction task. Besides, prior knowledge, e.g. drug-drug similarities and disease-disease similarities, is also used in these models. For instance, based on a heterogeneous information fusion strategy, Cai et al. (2021) designed inter- and intra-domain feature extraction modules to learn the embeddings of drugs and diseases. Considering the possible interactions between neighbors, Meng et al. (2022) presented a new weighted bilinear graph convolution operation to integrate the information of the known drug-disease associations. Sun et al. (2022) considered the drug's mechanism of action and proposed an end-to-end partner-specific drug repositioning approach.
Although existing GCN methods have achieved promising results in drug repositioning tasks, they have shortcomings in the following aspects. Firstly, they ignore the task-relevant dependency between node features and topological structures, which limits their ability to distinguish the contribution of each component. Secondly, the proposed multi-source models based on GCNs heavily rely on the data sources: when some data are missing, model performance decreases (Li et al. 2022). Although these approaches can boost model performance, they still suffer from this data bottleneck and are incapable of capturing the interactive information between topology and features. Therefore, directly applying the general GCN framework to a drug-disease network inevitably restricts graph structure learning capability.
To tackle the above challenges, in this paper, we propose an adaptive GCN approach for drug repositioning. Inspired by Wang et al. (2020), our key motivation is that the similarity between features and that inferred from topological structures are complementary to each other and can be fused adaptively to derive deeper correlation information. To fully exploit the information in feature space, we obtain k-nearest neighbor graphs generated from the drug similarity features and disease similarity features as their feature structural graphs, respectively. Taking the feature graph and the topology graph, we propagate the drug and disease features over both the topology space and the feature space, so as to extract two embeddings in these two spaces. Considering the common characteristics between the two spaces, we exploit a consistency constraint to extract embeddings shared by them. We further utilize an attention mechanism to automatically learn the importance weights of the different embeddings.
In summary, the main contributions of this work are as follows: • We propose a novel adaptive GCN framework for drug repositioning tasks, which performs graph convolution operations over both topology and feature spaces.
• Considering the differences between topological structures and features, we adopt an attention mechanism to adequately fuse them, so as to distinguish their contributions to the model results.
• Experimental results on benchmark datasets clearly show that AdaDR outperforms the baseline models by a large margin in terms of AUPRC, demonstrating our proposed model's utility in drug repositioning tasks.

Materials and methods
In this section, we first describe the benchmark datasets used with the proposed model. We then introduce the AdaDR framework, which mainly comprises three components. As Fig. 1 depicts: (i) a graph convolution module, which contains the feature convolution layer and the topology convolution layer to represent the graph embeddings; (ii) an adaptive learning module, which distinguishes the importance of the obtained embeddings by utilizing an attention mechanism; in this module, the common semantic information between feature and topology space is also extracted with the consistency constraint; (iii) a prediction module, which concatenates the embeddings as the output to predict results.

Datasets
To comprehensively evaluate the proposed model's performance, we exploit four benchmark datasets: Gdataset (Gottlieb et al. 2011), Cdataset, Ldataset and LRSSL.

Feature convolution layer
To capture the underlying structure of drugs and diseases in feature space, we construct k-nearest neighbor (kNN) graphs based on their respective similarity matrices.
Here, we denote the drug similarity matrix by $X_r \in \mathbb{R}^{n \times n}$, where $n$ is the number of drugs. The adjacency matrix of the drug kNN graph is represented by the binary matrix $A_r \in \mathbb{R}^{n \times n}$, where each entry of $A_r$ is constructed based on the similarity of each pair of drugs. The entry $A^r_{ij}$ of $A_r$ is defined as
$$A^r_{ij} = \begin{cases} 1, & \text{if drug } j \text{ is among the } k \text{ nearest neighbors of drug } i, \\ 0, & \text{otherwise.} \end{cases}$$
In the same way, we denote the disease similarity matrix by $X_d \in \mathbb{R}^{m \times m}$, where $m$ is the number of diseases, and the entry $A^d_{ij}$ of matrix $A_d$ is defined analogously. In feature space, we utilize the typical GCN (Kipf and Welling 2017) to compute the $l$th-layer output on the constructed graphs:
$$Z^{(l+1)}_r = \mathrm{ReLU}\big(D_r^{-\frac{1}{2}} A_r D_r^{-\frac{1}{2}} Z^{(l)}_r W^{(l)}_r\big), \qquad Z^{(l+1)}_d = \mathrm{ReLU}\big(D_d^{-\frac{1}{2}} A_d D_d^{-\frac{1}{2}} Z^{(l)}_d W^{(l)}_d\big).$$
We denote the last-layer output embeddings of drugs and diseases as $Z_{Fr}$ and $Z_{Fd}$, respectively. In this way, we can learn embeddings that capture the specific information in feature space.
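The kNN graph construction and the feature-space propagation above can be sketched in plain numpy. This is a minimal illustration, not the authors' implementation; the function names, the symmetrization of the kNN graph, and the use of self-loops (as in the standard Kipf-Welling formulation) are our assumptions.

```python
import numpy as np

def knn_graph(S, k):
    """Binary kNN adjacency from a similarity matrix S (n x n):
    A[i, j] = 1 if j is among the k most similar nodes to i (self excluded)."""
    n = S.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        sims = S[i].copy()
        sims[i] = -np.inf                     # exclude the node itself
        A[i, np.argsort(sims)[-k:]] = 1.0     # k largest similarities
    return np.maximum(A, A.T)                 # symmetrize

def gcn_layer(A, Z, W):
    """One propagation step ReLU(D^-1/2 A D^-1/2 Z W); self-loops are
    added here as in the standard GCN (an assumption for this sketch)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ Z @ W, 0.0)    # ReLU activation
```

Stacking such layers with the initial input $Z^{(0)}_r = X_r$ would yield the feature-space embedding $Z_{Fr}$.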

Topology convolution layer
For the topology space, we take the known drug-disease associations as the input graph. Specifically, we adopt GCMC (Berg et al. 2018) as the backbone to obtain the representations of drugs and diseases.
In our scenario, the known and unknown drug-disease associations are treated as different edge types, with a separate processing channel for each edge type $t \in \{0, 1\}$. Each edge type's graph convolution can be seen as a form of message passing, where vector-valued messages are passed and transformed across the edges of the graph. In our model, we assign a specific transformation to each edge type, resulting in edge-type-specific messages $\mu_{j \to i, t}$ from disease ($d$) $j$ to drug ($r$) $i$ of the following form:
$$\mu_{j \to i, t} = \frac{1}{c_{ij}} W_t x_j,$$
where $c_{ij} = \sqrt{|\mathcal{N}(r_i)| \, |\mathcal{N}(d_j)|}$ is a symmetric normalization constant, with $\mathcal{N}(r_i)$ denoting the set of neighbors of drug node $i$ and $\mathcal{N}(d_j)$ denoting the set of neighbors of disease node $j$; $W_t$ is an edge-type-specific parameter matrix and $x_j$ is the feature vector of disease node $j$. Messages $\mu_{i \to j, t}$ from drugs to diseases are processed in an analogous way. After the message passing step, we accumulate the incoming messages at every node by summing over all neighbors $\mathcal{N}_{t \in \{0,1\}}(r_i)$ connected by a specific edge type, and by accumulating the results for each edge type into a single vector representation:
$$z_i = \sigma\Big(\mathrm{sum}\Big(\sum_{j \in \mathcal{N}_0(r_i)} \mu_{j \to i, 0}, \sum_{j \in \mathcal{N}_1(r_i)} \mu_{j \to i, 1}\Big)\Big),$$
where $\mathrm{sum}$ denotes an accumulation operation and $\sigma$ denotes an activation function such as $\tanh$. To obtain the final representation of drugs, we transform the intermediate output $z_i$ by a linear operator:
$$z^{Tr}_i = W z_i.$$
The disease embedding $z_j$ is computed analogously. Note that, in the linear operator, the parameter matrix $W$ of drug nodes is the same as that of disease nodes, because the model is trained without side information of the nodes. By applying the above transformation to all nodes in the drug-disease graph, we obtain the final representations of drugs $Z_{Tr}$ and diseases $Z_{Td}$ in the topology space.
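The disease-to-drug direction of this message passing can be sketched as follows. This is an illustrative numpy sketch under stated assumptions, not the paper's code: the use of per-edge-type degrees inside the normalization constant, the clipping of empty neighborhoods, and the function name are assumptions.

```python
import numpy as np

def topology_embeddings(A, X_d, W0, W1):
    """Edge-type-specific message passing from diseases to drugs.

    A: (n_drugs, n_diseases) 0/1 association matrix; t = 1 marks a known
    association, t = 0 an unknown one. W0, W1 are the per-type transforms.
    Each message is scaled by 1/sqrt(|N_t(r_i)| |N_t(d_j)|) (per-type
    degrees -- an assumption), summed per type, accumulated over the two
    types, and passed through tanh.
    """
    out = 0.0
    for t, W in ((0, W0), (1, W1)):
        E = (A == t).astype(float)                        # type-t edges
        deg_r = np.clip(E.sum(axis=1, keepdims=True), 1, None)
        deg_d = np.clip(E.sum(axis=0, keepdims=True), 1, None)
        out = out + (E / np.sqrt(deg_r * deg_d)) @ (X_d @ W)
    return np.tanh(out)                                   # intermediate z_i
```

A final shared linear map $W$ applied to these intermediate vectors would give $Z_{Tr}$; diseases are handled symmetrically with $A$ transposed.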

Attention mechanism for adaptive learning
Now we obtain specific drug embeddings $Z_{Fr}$ and $Z_{Tr}$, and specific disease embeddings $Z_{Fd}$ and $Z_{Td}$, in feature space and topology space, respectively. Since the prediction result can be correlated with all of them, we use an attention mechanism to adaptively learn the corresponding importance of the drug embeddings and disease embeddings as follows:
$$(\alpha_{fr}, \alpha_{tr}) = \mathrm{att}(Z_{Fr}, Z_{Tr}), \qquad (\alpha_{fd}, \alpha_{td}) = \mathrm{att}(Z_{Fd}, Z_{Td}),$$
where $\mathrm{att}$ is a neural network that performs the attention operation. $\alpha_{fr}, \alpha_{tr} \in \mathbb{R}^{n \times 1}$ and $\alpha_{fd}, \alpha_{td} \in \mathbb{R}^{m \times 1}$ indicate the attention values of drug nodes and disease nodes with embeddings $Z_{Fr}, Z_{Tr}$ and $Z_{Fd}, Z_{Td}$, respectively. Specifically, taking $z^i_{Fr} \in \mathbb{R}^{1 \times h}$, i.e. the $i$th row of $Z_{Fr}$, as an example, we first transform the embedding through a nonlinear transformation; a shared attention vector $q \in \mathbb{R}^{h' \times 1}$ is then used to obtain the attention value $\omega^i_{Fr}$ as follows:
$$\omega^i_{Fr} = q^{\mathrm{T}} \tanh\big(W_{Fr} (z^i_{Fr})^{\mathrm{T}} + b_{Fr}\big),$$
where $W_{Fr} \in \mathbb{R}^{h' \times h}$ is the weight matrix and $b_{Fr} \in \mathbb{R}^{h' \times 1}$ is the bias vector for the embedding matrix $Z_{Fr}$. Similarly, we can get the attention value $\omega^i_{Tr}$ for drug node $i$ in the embedding matrix $Z_{Tr}$. In an analogous way, for the $j$th disease node, we can get $\omega^j_{Fd}$ and $\omega^j_{Td}$ from $Z_{Fd}$ and $Z_{Td}$, respectively. We then normalize the attention values with the softmax function to get the final drug weight and disease weight:
$$\alpha^i_{Fr} = \frac{\exp(\omega^i_{Fr})}{\exp(\omega^i_{Fr}) + \exp(\omega^i_{Tr})}, \qquad \alpha^i_{Tr} = \frac{\exp(\omega^i_{Tr})}{\exp(\omega^i_{Fr}) + \exp(\omega^i_{Tr})}.$$
For all the $n$ drug nodes and $m$ disease nodes, we obtain the learned weights $\alpha_{fr} = [\alpha^i_{Fr}], \alpha_{tr} = [\alpha^i_{Tr}] \in \mathbb{R}^{n \times 1}$ and $\alpha_{fd} = [\alpha^j_{Fd}], \alpha_{td} = [\alpha^j_{Td}] \in \mathbb{R}^{m \times 1}$. We then combine these embeddings to obtain the final drug embedding $Z_r$ and disease embedding $Z_d$:
$$Z_r = \mathrm{diag}(\alpha_{fr}) Z_{Fr} + \mathrm{diag}(\alpha_{tr}) Z_{Tr}, \qquad Z_d = \mathrm{diag}(\alpha_{fd}) Z_{Fd} + \mathrm{diag}(\alpha_{td}) Z_{Td}.$$
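The attention fusion for one pair of embedding matrices can be sketched in numpy. This is a hedged illustration: the function signature is ours, and returning the weights alongside the fused embedding is a convenience, not part of the paper's formulation.

```python
import numpy as np

def attention_fuse(Z_F, Z_T, W_F, b_F, W_T, b_T, q):
    """Adaptively fuse feature- and topology-space embeddings.

    Per node i, the score for each space is q^T tanh(W z_i + b); the two
    scores are softmax-normalized into weights alpha_F, alpha_T, and the
    fused embedding is the weighted sum of the two rows.
    """
    s_F = np.tanh(Z_F @ W_F.T + b_F) @ q      # (n,) feature-space scores
    s_T = np.tanh(Z_T @ W_T.T + b_T) @ q      # (n,) topology-space scores
    m = np.maximum(s_F, s_T)                  # stable softmax over 2 scores
    e_F, e_T = np.exp(s_F - m), np.exp(s_T - m)
    alpha_F = (e_F / (e_F + e_T))[:, None]
    alpha_T = 1.0 - alpha_F
    return alpha_F * Z_F + alpha_T * Z_T, alpha_F, alpha_T
```

Because the two weights per node sum to one, the fused row is a convex combination of the feature-space and topology-space rows.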

Prediction and optimization
To obtain the final prediction result, we concatenate two obtained embeddings to represent the drug-disease pair.
Particularly, we utilize a three-layer MLP neural network to compute $\hat{y}_{ij}$, i.e. how likely it is that drug $i$ can be indicated for disease $j$:
$$\hat{y}_{ij} = \mathrm{MLP}\big([z^r_i \, ; \, z^d_j]\big).$$
The binary cross-entropy (BCE) loss is used as the main loss:
$$L_{bce} = -\sum_{(i,j)} \Big( y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log\big(1 - \hat{y}_{ij}\big) \Big),$$
where $(i, j)$ denotes the pair of drug $i$ and disease $j$, and $y_{ij}$ is the ground-truth label. Considering the common semantics between feature space and topology space, we exploit a consistency constraint to enhance their commonality. For drug embeddings, we use $L_2$-normalization to normalize the embedding matrices.
Then, the two normalized matrices can be utilized to capture the similarity of the $n$ drug nodes in the two spaces, $S^r_F$ and $S^r_T$, as follows:
$$S^r_F = \tilde{Z}_{Fr} \tilde{Z}_{Fr}^{\mathrm{T}}, \qquad S^r_T = \tilde{Z}_{Tr} \tilde{Z}_{Tr}^{\mathrm{T}},$$
where $\tilde{Z}_{Fr}$ and $\tilde{Z}_{Tr}$ are the $L_2$-normalized embedding matrices. This gives rise to the following constraint:
$$L_{Cr} = \big\| S^r_F - S^r_T \big\|_F^2.$$
In the same way, the disease embedding constraint $L_{Cd}$ is calculated. We achieve the final loss $L$ by a weighted combination of the BCE loss $L_{bce}$ and the consistency constraints $L_{Cr}$ and $L_{Cd}$:
$$L = L_{bce} + \lambda \, (L_{Cr} + L_{Cd}),$$
where $\lambda$ is the hyperparameter that balances the three terms.
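The combined objective can be sketched in numpy. This is an illustration under assumptions: whether the BCE is averaged or summed and the exact squared-Frobenius form of the consistency penalty are our choices for the sketch.

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-9):
    """Binary cross-entropy over drug-disease pairs (mean reduction)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def consistency_loss(Z_a, Z_b):
    """Squared Frobenius distance between the node-similarity matrices of
    two embedding spaces, computed from L2-normalized rows."""
    def sim(Z):
        Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        return Zn @ Zn.T
    return float(np.sum((sim(Z_a) - sim(Z_b)) ** 2))

def total_loss(y, y_hat, Zr_F, Zr_T, Zd_F, Zd_T, lam):
    """L = L_bce + lambda * (L_Cr + L_Cd)."""
    return (bce_loss(y, y_hat)
            + lam * (consistency_loss(Zr_F, Zr_T)
                     + consistency_loss(Zd_F, Zd_T)))
```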

Model discussion
Integrating the different space information of the same drug/disease into the model provides rich semantics for drug/disease representations. Consequently, the combined prediction of the two spaces can be further boosted. Nevertheless, when an adaptive graph neural network method is used for a drug-disease prediction problem, some questions must be answered, such as whether it is appropriate to exploit an isomorphic graph and a heterogeneous graph to extract embeddings together. In our model, the basic assumption is that the similarity between features and that inferred from topological structures are complementary to each other. In other words, the constructed drug/disease feature graphs and the known drug-disease association topology graph should be approximate. However, the known drug-disease association topology graph is a bipartite graph in which a drug directly links to a disease, while in the drug/disease feature graphs a drug/disease directly links to another drug/disease. That is, the graph information derived from the constructed drug and disease feature graphs may conflict with the known drug-disease association bipartite graph information, and this graph learning mechanism could confuse the proposed model.
To shed more light on graph pattern learning in adaptive GCN models, we provide an illustration in Fig. 2, which depicts the concept of high-order connectivity. The target drug is $r_2$, labeled with the double circle in the left subfigure of the drug-disease association graph. The right subfigure shows the tree structure expanded from $r_2$. High-order connectivity denotes a path that reaches $r_2$ from any node with path length $l$ larger than 1. In this sense, when the path length $l$ is even, drugs are still linked to drugs in the drug-disease bipartite graph. In an analogous way, the same conclusion can be drawn for diseases. Consequently, along paths of even length, the odd-hop connected nodes practically act as bridges, so the target drug/disease node is still linked to nodes of the same type. For this reason, we empirically adopt two layers of convolution in the topology convolution module, since deeper layers can result in poor generalization performance. To sum up, the basic assumption that the constructed drug/disease feature graphs and the known drug-disease association topology graph should be approximate can still be supported.
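The parity argument above can be checked on a toy bipartite graph (an illustrative sketch, not part of the model; the example graph is invented):

```python
import numpy as np

# Toy bipartite association graph: 2 drugs (rows) x 3 diseases (columns).
A = np.array([[1, 1, 0],
              [1, 0, 1]])

# Full adjacency over all 5 nodes: drugs first, then diseases.
B = np.block([[np.zeros((2, 2)), A],
              [A.T, np.zeros((3, 3))]])

# Entry (i, j) of B^l counts walks of length l between nodes i and j.
# Starting from the first drug: after an odd number of hops only disease
# nodes are reachable; after an even number of hops only drug nodes are.
one_hop = np.linalg.matrix_power(B, 1)[0]
two_hop = np.linalg.matrix_power(B, 2)[0]
```

Here `two_hop` is nonzero only on the drug block, confirming that even-length paths in a bipartite graph always return to the same node type.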

Parameter setting
There are several hyperparameters in AdaDR, such as the total number of training epochs $a$, the learning rate $lr$, the dropout rate $\gamma$, the number of neighbors $K$ in the feature graph and the trade-off parameter $\lambda$. We consider different combinations of these parameters from the ranges $a \in \{1000, 2000, 3000, 4000\}$, $lr \in \{0.001, 0.01, 0.1\}$ and $\gamma \in \{0.1, 0.2, 0.3, 0.4\}$. By adjusting the parameters empirically, we set $a = 4000$, $lr = 0.01$ and $\gamma = 0.3$ for AdaDR in all experiments. For the parameters $K$ and $\lambda$, the detailed tuning process is described in Section 3.5. The parameters of the compared approaches are set to the default values from their papers.

Baseline model
To evaluate the performance of our proposed model, we compare AdaDR with seven state-of-the-art drug repositioning methods listed below. The baselines include GCN-based models (e.g. DRHGCN, NIMCGCN, DRWBNCF) and matrix completion-based models (e.g. MBiRW, iDrug, BNNR).
• MBiRW (Luo et al. 2016) is a bi-random walk algorithm, which uses sparse drug-disease associations to enhance the similarity measures of drug and disease to perform association prediction.

Performance of AdaDR in cross-validation
We perform 10-fold cross-validation to evaluate the performance of AdaDR. During 10-fold cross-validation, all known and unknown drug-disease associations are randomly divided into 10 exclusive subsets of approximately equal size. Each subset is treated as the testing set in turn, while the remaining nine subsets are used as the training set.
Then, the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) are adopted to measure the overall performance of AdaDR. It should be noted that AUPRC is often more informative than AUROC when the data exhibit class imbalance (Davis and Goadrich 2006, Saito and Rehmsmeier 2015). Therefore, in our experimental scenario, we pay more attention to the AUPRC of the models. Moreover, to reduce the potential data bias of cross-validation, we repeat the 10-fold cross-validation 10 times for AdaDR and the other models and report the average value and standard deviation of the results. The results on the four benchmark datasets are shown in Table 2.
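Both metrics can be computed from scored pairs with a short numpy sketch. This is an illustration, not the paper's evaluation code; tied scores are ignored for simplicity, and the function names are ours.

```python
import numpy as np

def auroc(y, s):
    """Rank-based AUROC (equivalent to the Mann-Whitney U statistic):
    the probability that a random positive outranks a random negative."""
    order = np.argsort(s)
    ranks = np.empty(len(s)); ranks[order] = np.arange(1, len(s) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auprc(y, s):
    """Average precision: precision averaged at each positive hit while
    scanning predictions from the highest score downwards."""
    order = np.argsort(-s)
    hits = y[order] == 1
    precision = np.cumsum(hits) / np.arange(1, len(y) + 1)
    return precision[hits].mean()
```

Under heavy class imbalance the AUPRC of a near-random ranker collapses toward the positive rate, while its AUROC stays near 0.5, which is why AUPRC is emphasized here.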
Based on the results, we first see that the final average results over the four datasets obtained by AdaDR outperform all comparison methods in 10 times 10-fold cross-validation, owing to its feature integration capacity. For instance, AdaDR achieves a final average AUROC of 0.937, which is 0.6% higher than the second-best method DRHGCN, and an average AUPRC of 0.576, which is 8.8% higher than that of DRHGCN. It is worth noting that AdaDR achieves the highest AUPRC on three datasets (i.e. Gdataset, Cdataset and Ldataset) and obtains the second-best AUROC on the LRSSL dataset, lower only than the best method DRHGCN. Meanwhile, compared with GCN-based methods, e.g. DRHGCN, NIMCGCN and DRWBNCF, AdaDR is superior in terms of average results because of its strong ability to integrate topology and features. Most importantly, AdaDR significantly surpasses the other methods by a large margin on the four benchmarks under the AUPRC metric. For example, our results are 9.8%, 9.1%, 9.1% and 7.1% higher than those of the second-best method DRHGCN in terms of AUPRC on Gdataset, Cdataset, LRSSL and Ldataset, respectively. These results demonstrate the effectiveness of our proposed method.

Predicting indications for new drugs
The newly predicted drug-disease associations can aid in drug repositioning. To this end, we conduct a new experiment to evaluate the capability of AdaDR to predict potential indications for new drugs. Specifically, for each drug r_i, we remove all known drug-disease associations involving r_i as the testing set and use all the remaining associations as the training samples. It should be noted that Gdataset is also known as the gold standard dataset, collecting comprehensive associations from multiple data sources. Thus, we use Gdataset to evaluate model performance and calculate the average over all test results. In total, we test 593 drugs and perform the experiment once. The results on Gdataset are shown in Fig. 3. Compared with the seven other methods, AdaDR achieves the top performance. In terms of AUROC, as shown in Fig. 3a, AdaDR achieves a value of 0.948, which is better than that of the other methods. Meanwhile, as shown in Fig. 3b, AdaDR achieves an AUPRC of 0.393, which is higher than all the other approaches.

Parameter analysis
We further verify experimentally the impact of the trade-off parameter $\lambda$ and the number of neighbors in the feature graph on all datasets. Since the number of neighbors $K$ in the feature graph is crucial for model performance, we analyze the stability of AdaDR on all datasets by varying $K$. The results are shown in Fig. 4. We vary $K$ over the range $\{1, 4, 8, 12, 16\}$. As we can see, for Gdataset, Cdataset and Ldataset, AdaDR achieves the best results when the number of neighbors in feature space is set to $K = 4$. Another interesting observation is that, for LRSSL, the results of AdaDR generally improve as $K$ increases. This is because LRSSL is very sparse: when the number of neighbors increases, more information in feature space is incorporated. The trade-off parameter $\lambda$ is introduced to appropriately weigh the BCE loss and the consistency constraint loss. We let $\lambda$ vary over $\{0.001, 0.01, 0.1, 1, 10, 100\}$ for all datasets. Figure 5 shows the variation of AUROC and AUPRC with different $\lambda$. For Gdataset, Cdataset and Ldataset, the optimal AUROC and AUPRC are obtained when $\lambda = 0.1$; we therefore set $\lambda = 0.1$ on these three datasets. For LRSSL, AdaDR obtains a satisfactory AUROC and AUPRC when the trade-off value is set to $\lambda = 0.1$ and $\lambda = 0.01$, respectively. Finally, for LRSSL, the trade-off value is selected as $\lambda = 0.01$ in our model due to the imbalance of positive and negative samples.

Ablation study
In this section, we compare different strategies for training our AdaDR on all datasets to investigate their effectiveness.
For training an adaptive GCN, we analyze the following four cases: • AdaDR-w/o-l: AdaDR without the constraints $L_{Cr}$ and $L_{Cd}$.
• AdaDR-w/o-f: AdaDR without using the feature space information.
• AdaDR-w/o-t: AdaDR without using the topology space information.
• AdaDR without the attention mechanism.
Table 3 reports the results of the different training strategies for AdaDR. It clearly demonstrates that each component of AdaDR improves the model performance, especially the drug/disease topology features in the adaptive GCN. We make the following four observations: (i) The topology space information is the most important component, because it directly contains drug-disease association information that helps the model learn the potential drug-disease association pattern; compared with the other training strategies, removing it causes the most significant change in model performance. (ii) The feature space information benefits the model: without it, the model learns only from topology space information and therefore fails to sufficiently exploit the data. (iii) Removing the consistency constraint from AdaDR decreases performance, because the consistency constraint improves the generality of the representations and thus benefits learning. (iv) The attention mechanism better encodes the topology space and feature space information; when it is removed from AdaDR, model performance decreases. These observations verify the effectiveness and importance of each component of AdaDR.

Analysis of attention mechanism
To investigate whether the attention values learned by AdaDR are meaningful, we analyze the attention distribution. Our proposed model learns two specific drug embeddings and two specific disease embeddings, each associated with attention values. We conduct the attention distribution analysis on all datasets; the results are shown in Fig. 6. As we can see, for Gdataset, Cdataset and LRSSL, the attention values of the drug-specific embeddings in topology space are larger than those in feature space, which implies that the information in topology space is more important there than the information in feature space. In contrast, on Ldataset the attention values of the drug-specific embeddings in feature space are larger than those in topology space. For the disease-specific embeddings, on Gdataset and Cdataset the attention values in feature space are larger than those in topology space, whereas on LRSSL and Ldataset the attention values in topology space are larger than those in feature space.
In summary, the experiment demonstrates that our proposed AdaDR is able to adaptively assign larger attention values for more important information.

Case studies
We conduct two case studies to further verify AdaDR by performing a literature-based evaluation of new hits. Specifically, we apply AdaDR to predict candidate drugs for two diseases: Alzheimer's disease (AD) and breast carcinoma (BRCA). AD is a progressive neurodegenerative disease that still has no efficacious medications available. BRCA is a malignancy in which breast epithelial cells proliferate out of control under the action of a variety of oncogenic factors. Although there are many drugs for breast cancer, such as Paclitaxel and Carboplatin, a wider choice of drugs may provide better treatment options.
During this process, all the known drug-disease associations in Gdataset are treated as the training set and the missing drug-disease associations are regarded as the candidate set. After all the missing drug-disease associations are predicted, we rank the candidate drugs by their predicted probabilities. We focus on the top five drugs for breast carcinoma and AD and adopt highly reliable sources (i.e. CTD and PubMed) to check the predicted drug-disease associations. Besides, we also predict the drug-disease associations of potential AD repositioning candidates in phase 3 clinical trials as of 2021 (Cummings et al. 2022). We focus on five such drugs: Caffeine, Escitalopram, Guanfacine, Hydralazine and Metformin, and their association with AD.
Our model predicts these drug-disease associations with the highest median rank compared to the six baseline models (Supplementary Table S1). We also observe that our model places more of these drugs among its top 100 predictions.
In addition to the above analysis, we conduct gene ontology enrichment analysis for the predicted drugs to demonstrate the utility of AdaDR. Taking AD as an example, we collect target information from DrugBank for the predicted top five drugs. Then, the Bioconductor package clusterProfiler (Yu et al. 2012) is used to perform the gene ontology enrichment analysis, which utilizes the Gene Ontology database (The Gene Ontology Consortium 2019). To better display the potential biological processes related to AD protein targets, we select the top 15 terms based on adjusted P-value. The result is shown in Fig. 7. Gene ontology enrichment analysis recovers existing mechanisms and also helps identify new processes related to AD protein targets, such as monoamine transport, dopamine uptake and vascular processes in the circulatory system. The enriched gene ontology categories indicate that the predicted AD-related drug targets modulate common regulatory processes. Besides, biological processes that have not been explored in depth, e.g. serotonin receptors (Geldenhuys and Van der Schyf 2011) and neurotransmitter reuptake (Francis 2005), may provide new perspectives for the treatment of AD.

Conclusion
In this paper, we have proposed AdaDR, based on graph neural networks and an attention mechanism, to model drug-disease associations in drug repositioning tasks. We integrate the feature space and topology space information, introduce a consistency constraint to regularize the embeddings in the different spaces, and thereby obtain a simple, efficient, yet effective method that significantly enhances the performance of drug repositioning tasks. Extensive experiments demonstrated that AdaDR is superior to current prediction methods, and various ablation and model studies demystified the working mechanism behind this performance.
Even though AdaDR has achieved better performance, some limitations remain. First, the integration of multi-dimensional drug and disease data plays an important role in precision repositioning, but AdaDR only uses drug-drug and disease-disease similarity. In future work, we will consider more biological information involved in drug repositioning, such as genes, targets, chemical structures, drug-target interactions and pathways. Second, although our proposed model can infer new drugs for diseases by using similarity features, it still lacks explainability for the predicted results. In the future, we can collect more prior biological knowledge, such as disease phenotypes, drug side effects and disease semantic similarity, to construct a knowledge graph network and design an interpretable model.

Figure 1 .
Figure 1. The overall framework of AdaDR consists of three parts: (i) a graph convolution module to represent the drug/disease embeddings in feature and topology space; (ii) an adaptive learning module with an attention mechanism to distinguish the importance of the obtained embeddings; in this module, the consistency constraint is also used to push the embeddings in the different spaces closer; (iii) a prediction module to concatenate the embeddings as the output to predict results.
Here, $Z^{(l)}_r$ and $Z^{(l)}_d$ are the $l$th-layer propagated information for drugs and diseases, respectively; $W^{(l)}_r$ and $W^{(l)}_d$ are the weight matrices of the $l$th layer in the GCN; ReLU denotes the ReLU activation function; the initial embeddings are $Z^{(0)}_r = X_r$ and $Z^{(0)}_d = X_d$; and $D_r$, $D_d$ are the diagonal degree matrices of $A_r$ and $A_d$, respectively.

• iDrug (Chen et al. 2020) is a matrix completion based method, which utilizes cross-network drug-related information to achieve better model performance.
• BNNR (Yang et al. 2019) completes the drug-disease matrix under the low-rank assumption, integrating the drug-drug, disease-disease and drug-disease information.
• DRHGCN (Cai et al. 2021) fuses inter- and intra-domain embeddings to enhance the representations of drugs and diseases.
• NIMCGCN (Li et al. 2020) is a variant of inductive matrix completion. It is widely used to predict drug-disease associations.
• DRWBNCF (Meng et al. 2022) models the complex drug-disease associations with a weighted bilinear neural collaborative filtering approach.

Figure 2 .
Figure 2. An illustration of the drug-disease high-order connectivity. (a) The known drug-disease association bipartite graph; (b) the high-order connectivity depicted as a tree structure. The node $r_1$, labeled with the double circle, is the target drug to treat diseases.

Figure 4 .
Figure 4. Effect of different neighbor numbers on the performance of AdaDR. (a) The variation of AUROC. (b) The variation of AUPRC.

Figure 5 .
Figure 5. Effect of different $\lambda$ values on the performance of AdaDR. (a) The variation of AUROC. (b) The variation of AUPRC.

Figure 6 .
Figure 6. Analysis of the attention distribution. r-topology and r-feature denote the drug topology attention value and drug feature attention value, respectively; d-topology and d-feature denote the disease topology attention value and disease feature attention value, respectively. (a), (b), (c) and (d) show the attention values for Gdataset, Cdataset, Ldataset and LRSSL, respectively.

Figure 7 .
Figure 7. Enriched gene ontology terms (Biological Process) among all predicted AD drug targets. The x axis shows the proportion of targets mapped to each pathway.

Table 1 .
Statistics of the four benchmark datasets (number of drugs, number of diseases, number of associations and sparsity).

Table 3 .
The AUROC and AUPRC of models corresponding to the different training strategies on all datasets.

Table 4 reports the candidate drugs with evidence. For AD and breast carcinoma, the top five drugs ranked by their predicted scores have all been validated by various evidence from authoritative sources and literature (100% success rate). Moreover, our model can make interpretable predictions. Taking Paclitaxel as an example, our model predicts that it can treat breast cancer, which is indeed supported by authoritative sources and literature. Interestingly, Docetaxel appears in our training set, and Paclitaxel and Docetaxel are similar molecules sharing the same taxane core. This reflects that our model can utilize drug similarity information to make meaningful predictions.