Multi-scale topology and position feature learning and relationship-aware graph reasoning for prediction of drug-related microbes

Abstract

Motivation: The human microbiome may affect the effectiveness of drugs by modulating their activities and toxicities. Predicting candidate microbes for drugs can facilitate the exploration of the therapeutic effects of drugs. Most recent methods concentrate on constructing prediction models based on graph reasoning. However, they fail to sufficiently exploit the topology and position information of nodes, the heterogeneity of the multiple types of nodes and connections, and the long-distance correlations among nodes in the microbe–drug heterogeneous graph.

Results: We propose a new microbe–drug association prediction model, NGMDA, which encodes the position and topology features of microbe (drug) nodes and fuses different types of features from the neighbors and from the whole heterogeneous graph. First, we formulate the position and topology features of microbe (drug) nodes using t-step random walks; these features reveal the topological neighborhood of each node at multiple scales and its position in the graph. Second, because the node features are high-dimensional and sparse, we design an embedding enhancement strategy based on supervised fully connected autoencoders to form embeddings with representative features and more discriminative node distributions. Third, we propose an adaptive neighbor feature fusion module, which fuses the features of neighbors through position- and topology-sensitive heterogeneous graph neural networks. A novel self-attention mechanism estimates the importance of the position and topology of each neighbor to a target node. Finally, a heterogeneous graph feature fusion module learns the long-distance correlations among the nodes of the whole heterogeneous graph with a relationship-aware graph transformer. The relationship-aware graph transformer encodes the types of connections among nodes, which helps integrate the diverse semantics of these connections. Extensive comparison experiments demonstrate NGMDA's superior performance over five state-of-the-art prediction methods. Ablation experiments show the contributions of multi-scale topology and position feature learning, the embedding enhancement strategy, neighbor feature fusion, and heterogeneous graph feature fusion. Case studies on three drugs further indicate that NGMDA can discover potential drug-related microbes.

Availability and implementation: Source code and Supplementary Material are available at https://github.com/pingxuan-hlju/NGMDA.


Introduction
The human microbiome is the collection of all microbiota that reside in or on human organs, including bacteria, viruses, protists, fungi, and archaea. Previous studies of the human microbiome demonstrated that interactions between human microbes and their hosts regulate human health, for example by controlling immune function, providing resistance to pathogens, and even influencing brain physiology and behavior (Duvallet et al. 2017, Zhu et al. 2020). Imbalances in the human microbiota are closely related to several diseases, including chronic inflammation, neurological disorders, and breast cancer (Wang et al. 2019, Rackaityte and Lynch 2020).
Microbes can change the toxicity and inhibitory activity of drugs (Nejman et al. 2020, Algavi and Borenstein 2023) and affect the effectiveness of disease treatments by biologically altering a drug's chemical structure (Yin et al. 2022). Hacioglu et al. (2019) suggested that cooperation between Staphylococcus aureus and Candida albicans leads to drug resistance by strengthening biofilm formation. In addition, the gut microbiome produces large quantities of bacterial enzymes that affect therapeutic efficacy (Zimmermann et al. 2019). Therefore, discovering new microbe-drug associations is essential for drug functional studies and precision medicine.
Recently, computational methods have been proposed for predicting drug-target interactions (Li et al. 2022), lncRNA-miRNA interactions (Wang et al. 2022), miRNA-disease associations (Peng et al. 2022a), metabolite-disease associations (Gao et al. 2023), and lncRNA-disease associations (Wang et al. 2023a). Computational methods have also shown the ability to determine potential microbe-drug associations and to identify reliable drug-related candidates for wet-lab experiments. GCNMDA infers microbe-drug association probabilities with a prediction model based on a Conditional Random Field (CRF) and a Graph Convolutional Network (GCN) (Long et al. 2020a). Long et al. (2020b) proposed EGATMDA to learn node features of microbes and drugs using meta-paths and a hierarchical attention mechanism. SCSMDA enhances the representations of drugs and microbes using graph contrastive learning and elaborate meta-paths (Tian et al. 2023). However, these methods have shortcomings. GCNMDA uses vanilla homogeneous models to learn representations of drugs and microbes without considering the abundant heterogeneous information available. In addition, the meta-path-based methods focus on neighbors originating from meta-paths while ignoring other non-neighboring nodes across the entire heterogeneous graph.
We propose NGMDA to predict candidate microbes for drugs by learning the features of drugs and microbes from their neighbors and from the whole heterogeneous graph. Our contributions are summarized as follows:
• The multi-scale topology information of nodes reflects neighbor regions of different ranges, which is important for microbe-drug association prediction. Therefore, we design topology features of microbe (drug) nodes based on t-step random walks to capture the multi-scale topological neighborhoods of nodes. We also extract node position features that describe the position of each node in the entire heterogeneous graph.
• An embedding enhancement strategy (ES) based on fully connected autoencoders supervised by node class labels is proposed to extract important low-dimensional features of the microbe and drug nodes. By classifying the type of each node, this strategy also enhances the differences between the feature distributions of different types of nodes.

Materials and methods
We propose a microbe-drug association prediction model called NGMDA (Fig. 1) that consists of an embedding enhancement strategy (ES), a neighbor feature fusion (NFF) module, and a heterogeneous graph feature fusion (GFF) module. A heterogeneous graph is constructed to describe the diverse connectivity relationships between drugs and microbes (Fig. 1a). The node features of these drugs and microbes are projected into a low-dimensional feature space, and the differences between them are enhanced to obtain refined node embeddings (Fig. 1a). NFF learns the similarity, position, and topology representations of nodes with a position- and topology-sensitive heterogeneous graph neural network (HGNN) (Fig. 1b). GFF learns the multi-modal representations of nodes across the whole heterogeneous graph with a relationship-aware graph transformer (RAGT) (Fig. 1c). These four representations are fed into fully connected layers to predict microbe-drug association probabilities.

Dataset
The associations between drugs and microbes, the similarities between drugs, and the attribute features of the microbes, $X^{micr}$, were collected from previously published microbe-drug association prediction work (Long et al. 2020b). We extracted 2470 microbe-drug associations from the Microbe-Drug Associations Database (MDAD) (Sun et al. 2018), covering 173 microbes and 1373 drugs. DrugBank (Knox et al. 2024) provides the interactions among the drugs. On the basis of the biological hypothesis that drugs with similar treatment functions are more likely to interact with similar microbes, EGATMDA calculated the Gaussian kernel similarities of drugs based on their interactions. The structural similarity of two drugs was measured based on the common subgraphs within their chemical structures (Hattori et al. 2010). The final drug similarities were obtained as the weighted sum of the drug Gaussian kernel similarities and the drug structural similarities. The sequences of microbes were extracted from the NCBI database, and principal component analysis was then used to obtain their important features.

Calculation of microbe similarity
As two microbes with similar gene sequences tend to be similar, we calculate the cosine similarity between the attribute features of each pair of microbes. The similarity between microbes $m_i$ and $m_j$, $K^{micr}_{ij} \in [0, 1]$, is

$$K^{micr}_{ij} = \frac{X^{micr}_i \left(X^{micr}_j\right)^{T}}{\lVert X^{micr}_i \rVert_2 \, \lVert X^{micr}_j \rVert_2},$$

where $X^{micr}_i$ is the $i$-th row of $X^{micr}$, which contains the main gene sequence characteristics of $m_i$, and $(X)^{T}$ is the transpose of $X$. The microbe similarities are listed in Supplementary File SF1.
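The cosine similarity above can be sketched in plain Python as follows, assuming the microbe attribute features are given as a list of rows. `cosine_similarity_matrix` and the toy values are hypothetical and are not taken from the NGMDA code. With non-negative features, as in this toy example, every value lies in [0, 1].

```python
import math


def cosine_similarity_matrix(X):
    """Return K with K[i][j] = cosine similarity between rows X[i] and X[j]."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in X]
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # assumption: an all-zero feature row keeps similarity 0
            dot = sum(a * b for a, b in zip(X[i], X[j]))
            K[i][j] = dot / (norms[i] * norms[j])
    return K


# Toy attribute features for three hypothetical microbes (non-negative values)
X_micr = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
K_micr = cosine_similarity_matrix(X_micr)
print([[round(v, 3) for v in row] for row in K_micr])
# [[1.0, 0.707, 0.0], [0.707, 1.0, 0.707], [0.0, 0.707, 1.0]]
```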

Microbe-drug heterogeneous graph
We constructed a microbe-drug heterogeneous graph $G = (V, E)$, as shown in Fig. 1a. The node set $V$ consists of the drug node subset $V^{drug}$ and the microbe node subset $V^{micr}$, and $\langle i, j \rangle \in E$ represents an edge from node $v_j$ to $v_i$. The drug similarity matrix and the drug-microbe association matrix are denoted by $K^{drug} \in \mathbb{R}^{N_d \times N_d}$ and $B^{bipa} \in \mathbb{R}^{N_d \times N_m}$, respectively, where $N_d$ (or $N_m$) denotes the number of drugs (or microbes). If a known association exists between drug $d_i$ and microbe $m_j$, then $B^{bipa}_{i,j} = 1$; $B^{bipa}_{i,j} = 0$ indicates that no association has been found yet. The similarity matrices contain many low similarity values, which may act as noise in microbe-drug association prediction. Therefore, when constructing the microbe-microbe (or drug-drug) adjacency matrix, edges are added only between microbe (or drug) nodes whose similarity is not less than a threshold $\beta$. The adjacency matrix of the heterogeneous graph $G$, $B^{hete} \in \mathbb{R}^{(N_d+N_m)\times(N_d+N_m)}$, is

$$B^{hete} = \begin{bmatrix} \tilde{K}^{drug} & B^{bipa} \\ \left(B^{bipa}\right)^{T} & \tilde{K}^{micr} \end{bmatrix},$$

where $\tilde{K}^{drug}$ (or $\tilde{K}^{micr}$) is the drug (or microbe) similarity matrix after thresholding.
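A minimal sketch of assembling $B^{hete}$ from the thresholded similarities and the association matrix is shown below. The text does not say whether thresholded similarities keep their values or whether self-loops are kept; this sketch assumes binary edges and drops self-loops. The function names and toy matrices are hypothetical.

```python
def threshold_similarity(K, beta):
    """Keep an edge between two distinct nodes whose similarity is >= beta.

    Assumptions: kept entries are binarized to 1 and self-loops are dropped.
    """
    n = len(K)
    return [[1.0 if i != j and K[i][j] >= beta else 0.0 for j in range(n)]
            for i in range(n)]


def build_hete_adjacency(K_drug, K_micr, B_bipa, beta):
    """Assemble B_hete = [[K~drug, B_bipa], [B_bipa^T, K~micr]] as nested lists."""
    n_d, n_m = len(K_drug), len(K_micr)
    Kd = threshold_similarity(K_drug, beta)
    Km = threshold_similarity(K_micr, beta)
    drug_rows = [Kd[i] + [float(B_bipa[i][j]) for j in range(n_m)] for i in range(n_d)]
    micr_rows = [[float(B_bipa[i][j]) for i in range(n_d)] + Km[j] for j in range(n_m)]
    return drug_rows + micr_rows


# Two hypothetical drugs and two microbes; drug 0 is known to act on microbe 0
K_drug = [[1.0, 0.95], [0.95, 1.0]]
K_micr = [[1.0, 0.30], [0.30, 1.0]]
B_bipa = [[1, 0], [0, 0]]
B_hete = build_hete_adjacency(K_drug, K_micr, B_bipa, beta=0.9)
for row in B_hete:
    print(row)
```

Here only the drug-drug pair passes the threshold (0.95 >= 0.9), so the microbe-microbe block stays empty, and the resulting matrix is symmetric because the association block appears together with its transpose.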

Heterogeneous graph node feature construction
The heterogeneous graph node features are constructed from the drug-drug similarity matrix, the microbe-microbe similarity matrix, and the drug-microbe association matrix. The similarity feature matrix $H_{simi}$ combines the drug and microbe similarities defined above: its $i$-th row is $H_{simi,i} = K^{drug}_i$ for drug $d_i$, and its $(N_d + j)$-th row is $H_{simi,N_d+j} = K^{micr}_j$ for microbe $m_j$, where $K^{drug}_i$ (or $K^{micr}_j$) contains the similarities between $d_i$ (or $m_j$) and the other drugs (or microbes). The multi-modal feature matrix $H_{moda} \in \mathbb{R}^{(N_d+N_m)\times(N_d+N_m)}$ is

$$H_{moda} = \begin{bmatrix} K^{drug} & B^{bipa} \\ \left(B^{bipa}\right)^{T} & K^{micr} \end{bmatrix},$$

where the $i$-th row of $H_{moda}$ records the similarities between $d_i$ and all drugs and the associations between $d_i$ and all microbes. Likewise, the $(N_d + j)$-th row contains the associations between $m_j$ and all drugs and the similarities between $m_j$ and all microbes.
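The two original feature matrices can be sketched as below, assuming raw (unthresholded) similarities are used in $H_{moda}$. Because drug rows of $H_{simi}$ have length $N_d$ and microbe rows have length $N_m$, this sketch keeps them as ragged lists and assumes they are consumed by type-specific encoders. All names and values are hypothetical.

```python
def multimodal_features(K_drug, K_micr, B_bipa):
    """H_moda = [[K_drug, B_bipa], [B_bipa^T, K_micr]] (raw, unthresholded similarities)."""
    n_d, n_m = len(K_drug), len(K_micr)
    drug_rows = [list(K_drug[i]) + list(B_bipa[i]) for i in range(n_d)]
    micr_rows = [[B_bipa[i][j] for i in range(n_d)] + list(K_micr[j]) for j in range(n_m)]
    return drug_rows + micr_rows


def similarity_features(K_drug, K_micr):
    """Row i is K_drug[i] for a drug; row N_d + j is K_micr[j] for a microbe.

    Drug and microbe rows may differ in length (N_d vs N_m); this sketch
    assumes each node type is handled by its own encoder.
    """
    return [list(r) for r in K_drug] + [list(r) for r in K_micr]


K_drug = [[1.0, 0.95], [0.95, 1.0]]
K_micr = [[1.0, 0.30], [0.30, 1.0]]
B_bipa = [[1, 0], [0, 0]]
H_moda = multimodal_features(K_drug, K_micr, B_bipa)
H_simi = similarity_features(K_drug, K_micr)
print(H_moda[0])  # [1.0, 0.95, 1, 0]
print(H_moda[2])  # [1, 0, 1.0, 0.3]
```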
Because similarity and multi-modal features are common node attributes for microbe-drug association prediction (Peng et al. 2017, 2021, Meng et al. 2023, Wang et al. 2023), we designate them as the original features of the nodes. Existing GNN models fail to fully consider the position and topology information of nodes, so we construct position and topology features for the microbe and drug nodes. The position of $v_i$ within the heterogeneous graph is determined by the connections between $v_i$ and the other nodes. The position feature matrix is therefore defined as $H_{posi} = B^{hete}$, and the position feature of $v_i$ is its $i$-th row, $H_{posi,i}$. A $t$-step random walk covers the $t$-hop topological neighborhood of a node within the heterogeneous graph (Dwivedi et al. 2022). The random walk matrix at step $k$ is

$$RW^{k} = \left( \left(D^{hete}\right)^{-1} B^{hete} \right)^{k}, \quad k = 1, 2, \ldots, t,$$

where $t$ is the number of walking steps and $D^{hete}$ is the degree matrix of $B^{hete}$. $RW^{k}_{i,j}$ represents the probability that a random walk starting from $v_i$ reaches $v_j$ at the $k$-th step, and thus contains the $k$-step topological neighborhood information of $v_i$. The topology feature $H_{topo,i} \in \mathbb{R}^{t}$ of $v_i$ is defined as

$$H_{topo,i} = \left[ RW^{1}_{i,i}, RW^{2}_{i,i}, \ldots, RW^{t}_{i,i} \right],$$

which contains the multi-scale topological neighborhood information of $v_i$.
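The position and topology features can be sketched in plain Python as follows: the position feature of a node is simply its row of $B^{hete}$, and the topology feature collects the return probabilities $RW^{k}_{i,i}$ for $k = 1, \ldots, t$. The handling of isolated nodes (zero degree) is an assumption, and the function names are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def random_walk_topology(B_hete, t):
    """Return H_topo with H_topo[i] = [RW^1_ii, ..., RW^t_ii], RW = D^{-1} B_hete."""
    n = len(B_hete)
    deg = [sum(row) for row in B_hete]
    # Row-normalized transition matrix; an isolated node keeps an all-zero row (assumption)
    P = [[B_hete[i][j] / deg[i] if deg[i] > 0 else 0.0 for j in range(n)] for i in range(n)]
    H_topo = [[] for _ in range(n)]
    M = P  # M holds RW^k at step k
    for k in range(1, t + 1):
        for i in range(n):
            H_topo[i].append(M[i][i])  # probability of returning to v_i after k steps
        if k < t:
            M = matmul(M, P)
    return H_topo


# Path graph v0 - v1 - v2; the position features are simply the rows of B_hete
B_hete = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H_posi = B_hete
H_topo = random_walk_topology(B_hete, t=2)
print(H_topo)  # [[0.0, 0.5], [0.0, 1.0], [0.0, 0.5]]
```

With $t = 2$ (the value selected in the parameter settings), the middle node of the path gets a higher 2-step return probability than the end nodes, which is how the feature distinguishes local topologies.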

Enhancing node embedding
The original features specified above are high-dimensional, sparse, and noisy. Simply projecting drug and microbe node features into the same embedding space discards information about the differences between the embedding distributions of different types of nodes.
Figure 2 outlines our node embedding ES, which learns representative embeddings and enhances the differences between the embedding distributions of microbe and drug nodes. As autoencoders can effectively reduce the noise in embeddings, we learn important low-dimensional node embeddings with fully connected autoencoders. The projection and reconstruction processes of the multi-modal and similarity features are similar, so we use the similarity features as an example. The similarity feature of $v_i$, $H_{simi,i}$, is projected into an $N_p$-dimensional space to form

$$H^{enco,0}_{simi,i} = \sigma\left( \mathrm{Linear}^{enco}_{simi,\phi(v_i)}\left(H_{simi,i}\right) \right),$$

where $\mathrm{Linear}^{enco}$ is a linear layer, $\sigma$ is the nonlinear activation function ReLU, and $\phi(v_i)$ indicates the type of $v_i$. The similarity embedding of $v_i$ learned by the $l$-th fully connected encoding layer is

$$H^{enco,l}_{simi,i} = \sigma\left( \mathrm{Linear}^{enco,l}_{simi,\phi(v_i)}\left(H^{enco,l-1}_{simi,i}\right) \right), \quad l = 1, 2, \ldots, L_{enco},$$

where $L_{enco}$ is the total number of encoding layers. $H^{enco,L_{enco}}_{simi,i}$ is used as the input of the decoder, and the output of the $l$-th fully connected decoding layer is

$$H^{deco,l}_{simi,i} = \sigma\left( \mathrm{Linear}^{deco,l}_{simi,\phi(v_i)}\left(H^{deco,l-1}_{simi,i}\right) \right), \quad l = 1, 2, \ldots, L_{deco},$$

where $L_{deco}$ is the total number of decoding layers, $\mathrm{Linear}^{deco}$ denotes a linear layer, and $H^{deco,0}_{simi,i} = H^{enco,L_{enco}}_{simi,i}$.
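After projection, the multi-modal embedding $H^{enco,L_{enco}}_{moda} \in \mathbb{R}^{(N_d+N_m)\times N_p}$ is learned in the same way. The mean square error estimates the reconstruction loss of the node similarity features as

$$\mathcal{L}_{reco,simi} = \frac{1}{|T|} \sum_{v_i \in T} \left\lVert H^{deco,L_{deco}}_{simi,i} - H_{simi,i} \right\rVert_2^2,$$

where $T$ is the batch of nodes in the training set. Similarly, the reconstruction loss of the multi-modal features is $\mathcal{L}_{reco,moda}$. We classify the projected node embeddings to enhance the differences between the drug and microbe embedding distributions. Taking the multi-modal embedding as an example, $H^{enco,L_{enco}}_{moda,i}$ is the input of the classifier and $label_i$ is the corresponding one-hot class label (drug or microbe). The classification loss of the multi-modal embeddings of the training samples is estimated by the cross-entropy loss function

$$\mathcal{L}_{clas,moda} = -\frac{1}{|T|} \sum_{v_i \in T} label_i^{T} \log \mathrm{softmax}\left( H^{enco,L_{enco}}_{moda,i} \, \mathrm{Linear}^{clas}_{moda} \right),$$

where $\mathrm{Linear}^{clas}_{moda} \in \mathbb{R}^{N_p \times 2}$. The classification loss of the similarity embeddings is denoted $\mathcal{L}_{clas,simi}$. The total ES loss of the drug and microbe nodes, $\mathcal{L}_{proc}$, is formed from these reconstruction and classification losses.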
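The two kinds of ES losses can be sketched as below: a per-node squared reconstruction error averaged over the batch, and the cross-entropy of a linear node-type classifier with an $N_p \times 2$ weight matrix. The label coding (0 = drug, 1 = microbe), the toy tensors, and the function names are hypothetical; how NGMDA weights the individual terms inside $\mathcal{L}_{proc}$ is not reproduced here.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def reconstruction_loss(recon, target):
    """Mean over nodes of the squared L2 reconstruction error."""
    per_node = [sum((a - b) ** 2 for a, b in zip(r, x)) for r, x in zip(recon, target)]
    return sum(per_node) / len(per_node)


def classification_loss(embeddings, labels, W_clas):
    """Cross-entropy of a linear node-type classifier (W_clas: N_p x 2).

    Hypothetical label coding: 0 = drug, 1 = microbe.
    """
    total = 0.0
    for h, y in zip(embeddings, labels):
        logits = [sum(h[p] * W_clas[p][c] for p in range(len(h))) for c in range(2)]
        total -= math.log(softmax(logits)[y])
    return total / len(labels)


# Toy batch T of two nodes with N_p = 2
recon = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 0.0]]
emb = [[1.0, 0.0], [0.0, 1.0]]
W_clas = [[2.0, 0.0], [0.0, 2.0]]
L_reco = reconstruction_loss(recon, target)
L_clas = classification_loss(emb, [0, 1], W_clas)
print(L_reco, round(L_clas, 4))
```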

Neighbor feature fusion
The topological neighborhoods and positions of neighboring nodes affect their importance to a target node. We propose an NFF module based on an HGNN with a position- and topology-sensitive self-attention mechanism (PTA) to learn the similarity, position, and topology representations of each microbe and drug node. The relationship types between nodes are critical auxiliary features, so we also estimate the importance of each relationship type, as shown in Fig. 3. The relationship types include the similarity relationship between two drugs (or two microbes) and the association relationship between a drug (or microbe) and a microbe (or drug). The relationship type of $v_j$ to $v_i$ is denoted by $\psi(\langle i,j\rangle)$. The importance of relationship type $\psi(\langle i,j\rangle)$ at the $l$-th layer, $r^{l}_{\psi(\langle i,j\rangle)}$, is learned during training, with $r^{0}_{\psi(\langle i,j\rangle)} = 1$. Multi-head attention can stabilize the learning process of self-attention by computing attention independently in several heads (Veličković et al. 2018).

After obtaining the similarity representations $H^{l-1}_{simi,i}$ and $H^{l-1}_{simi,j}$ of $v_i$ and $v_j$ at the $(l-1)$-th layer, we compute the importance of the similarity representation of $v_j$ to $v_i$ in the $k$-th head of the next layer as

$$s^{l}_{i,j,k} = \mathrm{softmax}\left( W^{l}_{\phi(v_i),k} H^{l-1}_{simi,i} \right)^{T} \frac{W^{l}_{\phi(v_j),k} H^{l-1}_{simi,j}}{\left\lVert W^{l}_{\phi(v_j),k} H^{l-1}_{simi,j} \right\rVert_2},$$

where the softmax function converts the projected representation of $v_i$ into a probability distribution describing the importance of each dimension of the similarity representation. As the magnitude of the latent representation affects the importance score of $v_j$ to $v_i$, we standardize $W^{l}_{\phi(v_j),k} H^{l-1}_{simi,j}$ with $L_2$ normalization. The importance score of $v_j$ to $v_i$ is thus the inner product of the importance distribution of $v_i$ and the normalized representation of $v_j$.

The neighbors of node $v_i$ have different topological neighborhoods and positions, so they contribute differently to the feature learning of $v_i$. Therefore, the importance of each neighbor to $v_i$ is calculated before the features of $v_i$ are updated. We calculate the importance $\gamma^{l}_{i,j} \in [0, 1]$ of the position and topology of $v_j$ to $v_i$ from the position and topology representations, where $H^{l-1}_{posi,i}$ and $H^{l-1}_{topo,i}$ are the position and topology representations of $v_i$, respectively, with $H^{0}_{posi,i} = H_{posi,i}$ and $H^{0}_{topo,i} = H_{topo,i}$. The parameter $\tau \in [0, 1]$ balances the contributions of the position and topology representations. The importance score of $v_j$ to $v_i$ in the $k$-th head is then

$$\alpha^{l}_{i,j,k} = \mathrm{softmax}_{j \in N(i)}\left( r^{l}_{\psi(\langle i,j\rangle)} \cdot s^{l}_{i,j,k} \cdot \gamma^{l}_{i,j} \right).$$

Here, $\alpha^{l}_{i,j,k}$ is position- and topology-sensitive because it integrates the importance of the neighbor's position and topology.
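The scoring pipeline above can be sketched for one head as follows: the similarity score is the inner product of a softmax distribution and an L2-normalized vector, and the final weights are a softmax over neighbors of the product of the relationship-type importance, the similarity score, and the position/topology importance. The inputs are assumed to be already projected by $W^{l}_{\phi(\cdot),k}$, and the values of $\gamma$ are hypothetical because their exact formula is not reproduced here.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_score(proj_i, proj_j):
    """s_{i,j,k}: inner product of softmax(W h_i) with the L2-normalized W h_j.

    proj_i and proj_j are the already-projected similarity representations
    of v_i and v_j in one head k.
    """
    dist_i = softmax(proj_i)
    unit_j = l2_normalize(proj_j)
    return sum(a * b for a, b in zip(dist_i, unit_j))


def pt_attention(r_types, s_scores, gammas):
    """alpha_{i,j,k} = softmax over neighbours j of r_psi * s_{i,j,k} * gamma_{i,j}."""
    return softmax([r * s * g for r, s, g in zip(r_types, s_scores, gammas)])


proj_i = [0.0, 0.0]
s_1 = similarity_score(proj_i, [3.0, 4.0])  # 0.5 * 0.6 + 0.5 * 0.8
s_2 = similarity_score(proj_i, [4.0, 3.0])  # same score, permuted dimensions
# Same relation type and similarity score; neighbour 2 has a larger
# hypothetical position/topology importance gamma
alpha = pt_attention([1.0, 1.0], [s_1, s_2], [0.2, 0.9])
print(round(s_1, 3), [round(a, 3) for a in alpha])
```

Because the three factors are multiplied before the softmax, a neighbor whose position and topology are judged more relevant (larger $\gamma$) receives a larger weight even when its similarity score is identical.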
As residual connections can alleviate over-smoothing and vanishing gradients (Lv et al. 2021), we add a node residual to every attention head. The similarity representation of $v_i$ in the $k$-th head is updated by aggregating the representations of its neighbors weighted by $\alpha^{l}_{i,j,k}$ and adding a residual term, where $W^{l}_{res(k)} \in \mathbb{R}^{N_p \times N_p}$ is the residual weight matrix. The similarity representations of the different heads are then aggregated at the $l$-th layer to obtain $H^{l}_{simi,i}$, where $K_{nff}$ is the number of heads of NFF. As the similarity, position, and topology representations share the same update procedure, the position and topology representations at the $l$-th layer are updated as $H^{l}_{posi,i}$ and $H^{l}_{topo,i}$, respectively.

Relationship type encoding
The similarity and association relationship types reflect the diverse semantic connections between drug and microbe nodes. The relationship type $\psi(\langle i,j\rangle)$ of $v_j$ to $v_i$ is represented as a one-hot vector, which is linearly transformed to obtain the relationship type embedding $e_{\psi(\langle i,j\rangle)} \in \mathbb{R}^{N_p}$.
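A minimal sketch of the relationship type encoding: multiplying a one-hot vector by a weight matrix simply selects one row of that matrix, so the linear transform acts as an embedding lookup. The four-type enumeration, keyed by the types of $v_i$ and $v_j$, is a hypothetical coding for illustration; the exact enumeration used by NGMDA is not given here.

```python
# Hypothetical enumeration of relation types psi(<i, j>) for an edge v_j -> v_i,
# keyed by (type of v_i, type of v_j); the paper's exact coding is not given here.
REL_TYPES = {
    ("drug", "drug"): 0,  # drug-drug similarity
    ("micr", "micr"): 1,  # microbe-microbe similarity
    ("drug", "micr"): 2,  # association, microbe v_j -> drug v_i
    ("micr", "drug"): 3,  # association, drug v_j -> microbe v_i
}


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def relation_embedding(type_i, type_j, W_rel):
    """e_psi = one_hot(psi) @ W_rel, with W_rel of shape (num_types, N_p)."""
    x = one_hot(REL_TYPES[(type_i, type_j)], len(W_rel))
    return [sum(x[t] * W_rel[t][c] for t in range(len(W_rel))) for c in range(len(W_rel[0]))]


# Toy learnable weights for 4 relation types and N_p = 3
W_rel = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
e = relation_embedding("drug", "micr", W_rel)
print(e)  # [0.7, 0.8, 0.9]
```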

Relationship-aware graph transformer
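To capture the connections between a target node and distant nodes, we design the heterogeneous GFF module presented in Fig. 1c. Inspired by previous methods (Diao and Loynd 2022, Peng et al. 2022b, c), we propose an RAGT within GFF. To embed the relationship type $\psi(\langle i,j\rangle)$ of $v_j$ to $v_i$ into the query, key, and value vectors, we concatenate the multi-modal representation of $v_i$ (or $v_j$) with the relationship type embedding $e_{\psi(\langle i,j\rangle)}$. The query, key, and value vectors of the $h$-th head in the $l$-th layer are obtained by linear transformations:

$$q^{l}_{i,j,h} = \left[ H^{l-1}_{moda,i} \,\Vert\, e_{\psi(\langle i,j\rangle)} \right] W^{l}_{Q,h}, \quad k^{l}_{i,j,h} = \left[ H^{l-1}_{moda,j} \,\Vert\, e_{\psi(\langle i,j\rangle)} \right] W^{l}_{K,h}, \quad v^{l}_{i,j,h} = \left[ H^{l-1}_{moda,j} \,\Vert\, e_{\psi(\langle i,j\rangle)} \right] W^{l}_{V,h},$$

where $\Vert$ denotes concatenation, $l = 1, 2, \ldots, L_{gff}$ and $h = 1, 2, \ldots, K_{gff}$, $L_{gff}$ is the total number of layers, and $K_{gff}$ is the number of heads of GFF. After aggregating the neighbor representations of $v_i$ in the $h$-th head according to their importance to $v_i$, we concatenate the representations from all heads. Layer normalization (LayerNorm) is crucial for training and for the expressive capacity of attention (Brody et al. 2023). The multi-modal representation $H^{l}_{moda,i}$ of $v_i$ is therefore updated based on LayerNorm, where $W^{l}_{1}, W^{l}_{2}, W^{l}_{3} \in \mathbb{R}^{N_p \times N_p}$ are weight matrices.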
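One relationship-aware attention head can be sketched as below. The query, key, and value are computed from representations concatenated with the relationship type embedding, as in the equations above; the scaled dot-product scoring and weighted sum are assumptions borrowed from the standard Transformer, since the exact RAGT attention formula is not reproduced here. All weights and names are hypothetical.

```python
import math


def linear(x, W):
    """Row vector x (length d_in) times W (d_in x d_out)."""
    return [sum(x[a] * W[a][b] for a in range(len(x))) for b in range(len(W[0]))]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def relation_aware_head(h_i, neighbours, W_q, W_k, W_v):
    """One relationship-aware attention head for target node v_i.

    neighbours: list of (h_j, e_rel) pairs, where e_rel is the relation-type
    embedding of the edge v_j -> v_i. Scaled dot-product scoring is an
    assumption borrowed from the standard Transformer.
    """
    queries, keys, values = [], [], []
    for h_j, e_rel in neighbours:
        queries.append(linear(h_i + e_rel, W_q))  # [H_i || e_psi] W_Q
        keys.append(linear(h_j + e_rel, W_k))     # [H_j || e_psi] W_K
        values.append(linear(h_j + e_rel, W_v))   # [H_j || e_psi] W_V
    d = len(W_q[0])
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for q, k in zip(queries, keys)]
    att = softmax(scores)
    out = [sum(att[n] * values[n][c] for n in range(len(values))) for c in range(len(values[0]))]
    return att, out


# Toy setting: N_p = 2, relation embedding of length 1 (all values hypothetical)
W_q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
W_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
W_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_i = [1.0, 0.0]
# Identical neighbours with identical relation types get equal weights
att, out = relation_aware_head(h_i, [([0.0, 1.0], [1.0]), ([0.0, 1.0], [1.0])], W_q, W_k, W_v)
print(att, out)  # [0.5, 0.5] [1.0, 2.0]
# Same neighbour features but different relation embeddings change the weights
att2, _ = relation_aware_head(h_i, [([0.0, 1.0], [1.0]), ([0.0, 1.0], [0.0])], W_q, W_k, W_v)
print(att2)
```

The second call shows why the relationship type is folded into the query and key: two neighbors with identical features receive different attention weights purely because their connection types differ.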

Representation integration and optimization
The original features and the representations learned by shallower layers retain detailed information about the nodes, whereas deeper layers learn more abstract information. By concatenating the original features and the representations from each layer of NFF and GFF, we form the final representation $\hat{H}_i$ of drug $d_i$. Likewise, we obtain the final representation $\hat{H}_{j+N_d}$ of microbe $m_j$. After a stack of linear layers with nonlinear activation functions, $\hat{H}_i$ and $\hat{H}_{j+N_d}$ are combined to compute the association prediction score $pred_{i,j} \in \mathbb{R}^{2}$, where $W^{attr}$ (or $W^{pred}$) is a weight matrix and $b^{attr}$ (or $b^{pred}$) is a bias vector. Here, $pred_{i,j} = [(pred_{i,j})_0, (pred_{i,j})_1]$, where $(pred_{i,j})_0$ is the probability that $d_i$ and $m_j$ are unrelated and $(pred_{i,j})_1$ is the probability that they are associated. The association prediction loss is

$$\mathcal{L}_{pred} = -\sum_{(i,j) \in \mathcal{B}} \left[ label(i,j) \log (pred_{i,j})_1 + \left(1 - label(i,j)\right) \log (pred_{i,j})_0 \right],$$

where $\mathcal{B}$ is the training example set and $label(i,j)$ is the association label of $d_i$ and $m_j$. The final loss of NGMDA, $\mathcal{L}$, is the weighted sum of the ES loss $\mathcal{L}_{proc}$ and the association prediction loss $\mathcal{L}_{pred}$, where the balance factor $\varepsilon \in [0, 1]$ is a hyper-parameter.
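The prediction head and its loss can be sketched as follows. This sketch assumes that the two final representations are concatenated and passed through Linear, ReLU, Linear, and softmax; the text only states that they are combined after a stack of linear layers, so the combination step is a hypothetical choice. The loss is the two-class cross-entropy over the training pairs.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(x, W, b):
    return [sum(x[a] * W[a][c] for a in range(len(x))) + b[c] for c in range(len(b))]


def final_representation(parts):
    """Concatenate the original features and every layer's representation."""
    return [v for part in parts for v in part]


def predict_pair(h_drug, h_micr, W_attr, b_attr, W_pred, b_pred):
    """pred_{i,j} = [P(unrelated), P(associated)].

    Assumption: the two final representations are concatenated and passed
    through Linear -> ReLU -> Linear -> softmax.
    """
    hidden = [max(0.0, v) for v in linear(h_drug + h_micr, W_attr, b_attr)]
    return softmax(linear(hidden, W_pred, b_pred))


def prediction_loss(preds, labels):
    """Cross-entropy over the training pairs in B (label 1 = associated)."""
    return -sum(math.log(p[1] if y == 1 else p[0]) for p, y in zip(preds, labels))


h_drug = final_representation([[1.0], [0.5]])  # original feature + one layer (toy)
h_micr = final_representation([[0.0], [1.0]])
W_attr = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
b_attr = [0.0, 0.0]
W_pred = [[0.0, 1.0], [0.0, 1.0]]
b_pred = [0.0, 0.0]
pred = predict_pair(h_drug, h_micr, W_attr, b_attr, W_pred, b_pred)
loss = prediction_loss([pred], [1])
print([round(p, 4) for p in pred], round(loss, 4))
```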
Experimental evaluation and discussion

Evaluation metrics
The performance of NGMDA and the compared methods is evaluated with 5-fold cross-validation. All known associations between drugs and microbes are treated as positive samples and are equally divided into five parts. All unobserved microbe-drug associations are taken as negative samples. In each fold, four parts of the positive samples and an equal number of negative samples randomly selected from the negative set are used for training, and the remaining positive and negative samples are used for testing.
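The sampling protocol above can be sketched as a generator over folds. The shuffling, seed handling, and round-robin split are hypothetical implementation choices; only the structure (four positive parts plus an equal number of random negatives for training, everything else for testing) follows the text.

```python
import random


def five_fold_splits(positives, negatives, n_folds=5, seed=0):
    """Yield (train_pos, train_neg, test_pos, test_neg) for each fold.

    Training uses the other n_folds - 1 positive parts plus an equal number of
    randomly drawn negatives; all remaining pairs are used for testing.
    """
    rng = random.Random(seed)
    pos = list(positives)
    rng.shuffle(pos)
    parts = [pos[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        test_pos = parts[f]
        train_pos = [p for g in range(n_folds) if g != f for p in parts[g]]
        neg = list(negatives)
        rng.shuffle(neg)
        train_neg = neg[:len(train_pos)]
        test_neg = neg[len(train_pos):]
        yield train_pos, train_neg, test_pos, test_neg


# Hypothetical toy data: (drug, microbe) index pairs
positives = [(d, d % 3) for d in range(10)]
negatives = [(d, m) for d in range(10) for m in range(3, 6)]
folds = list(five_fold_splits(positives, negatives))
train_pos, train_neg, test_pos, test_neg = folds[0]
print(len(train_pos), len(train_neg), len(test_pos), len(test_neg))  # 8 8 2 22
```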
The area under the receiver operating characteristic curve (AUC) (Huang and Ling 2005), the area under the precision-recall curve (AUPR) (Saito and Rehmsmeier 2015), and the recall rate of the top-k candidate microbes associated with drugs are used as evaluation metrics. If the association score between $d_i$ and $m_j$ is less than a threshold $\theta$, the pair is considered a negative sample; otherwise, it is identified as a positive sample. The TPRs, FPRs, precisions, and recalls of each drug were calculated at different thresholds $\theta$, and we calculated the average AUC and average AUPR over the 1373 drugs for each fold. The AUCs (or AUPRs) of the five folds were averaged as the final AUC (or AUPR). Because biologists usually select top-ranked candidates for wet-lab experiments, more positive samples are expected to appear among the top-ranked candidates. Hence, we compute the recall rate of the top-k candidate microbes of each drug $d_i$.
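The per-drug metrics can be sketched as below. The AUC is computed through its rank interpretation (the probability that a random positive outranks a random negative), which equals the area under the threshold-swept ROC curve; AUPR is omitted for brevity. Per the text, each metric would be computed per drug and then averaged over drugs and folds. The function names and toy scores are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC via its rank interpretation: the probability that a random
    positive outranks a random negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined for a drug without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_k(scores, labels, k):
    """Fraction of a drug's positive microbes that appear among its top-k candidates."""
    n_pos = sum(labels)
    if n_pos == 0:
        return None
    top_k = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)[:k]
    return sum(labels[idx] for idx in top_k) / n_pos


# One hypothetical drug scored against four test microbes
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))             # 0.75
print(recall_at_k(scores, labels, 1))  # 0.5
print(recall_at_k(scores, labels, 3))  # 1.0
```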

Parameter settings
NGMDA is implemented in the PyTorch framework, runs on a server with a 2080 Ti GPU, and is optimized with the Adam algorithm. The hyper-parameters of the model include the number of random walk steps, the numbers of NFF and GFF layers, and the loss balance factor $\varepsilon$. We first establish a range for each hyper-parameter and then select the value that yields the best performance as its final value. To assess the effect of the number of random walk steps on the prediction performance, it was selected from {1, 2, 4, 8, 16, 32}. The model achieves the highest AUC (0.944) and AUPR (0.728) when the number of steps is 2 (Supplementary Table ST1), so the number of random walk steps is set to 2 for forming the topology features. For NFF and GFF, we fine-tuned the number of layers within {1, 2, 3} and evaluated all combinations of the NFF and GFF layer numbers. As shown in Supplementary Table ST2, the model performs best when both layer numbers are 2. The balance factor $\varepsilon$ regulates the relative importance of the ES loss and the association prediction loss; it was chosen from {0, 0.1, ..., 0.5}, and based on the results in Supplementary Table ST3, $\varepsilon$ was set to 0.2. The drug (microbe) similarity threshold $\beta$ was selected from {0.5, 0.6, ..., 0.9} and set to 0.9 in our experiments (Supplementary Table ST4). The parameter $\tau$ balances the importance of the topology and position features; it was varied from 0 to 1 with a step size of 0.2, and Supplementary Table ST5 indicates that $\tau = 0.4$ is the most favorable for prediction performance.

Ablation experiments
We perform ablation experiments to evaluate the contributions of position and topology feature learning (PTL), ES, NFF, and GFF, as listed in Table 1. For NGMDA without GFF, the AUC and AUPR drop by 1.1% and 6.0%, respectively. The AUC and AUPR of NGMDA without NFF decrease by 1.0% and 5.3%, respectively, compared with the whole model. The AUC and AUPR decrease by 0.6% and 4.6%, respectively, when NGMDA has no ES. The AUC and AUPR of the full model are 0.9% and 1.5% higher, respectively, than those of NGMDA without PTL. We also built a prediction model without multi-scale topology feature learning and one without position feature learning; their AUCs decreased by 0.8% and 0.4%, and their AUPRs decreased by 1% and 0.5%, respectively. After the relationship type integration was removed from the prediction model, its AUC and AUPR decreased by 0.6% and 2.7%, respectively.
The ablation experiments indicate that fusing node features across the whole heterogeneous graph (GFF) contributes the most to model performance (Table 1). A possible reason is that some non-neighboring nodes across the entire heterogeneous graph are also closely related to the target node. NFF makes the second largest contribution, suggesting that the information of the target node's neighbors is also important. The embedding ES boosts the prediction performance, which suggests its value in reducing noise in the node embeddings and enhancing the differences between node distributions. The contributions of the multi-scale topology and position features indicate that neighbors within multiple ranges and the location of each node are important for improving prediction performance. The experimental results (Table 1) also demonstrate that integrating relationship types helps improve prediction performance.

Comparison with other methods
NGMDA is compared with five state-of-the-art microbe-drug association prediction methods: GCNMDA (Long et al. 2020a), EGATMDA (Long et al. 2020b), GSAMDA (Tan et al. 2022), GACNNMDA (Ma et al. 2023), and SCSMDA (Tian et al. 2023). NGMDA and the five compared methods were trained and tested on the same data splits during 5-fold cross-validation. The hyper-parameters of these methods are set according to their corresponding publications. These methods are briefly described in the following.
We first compute the AUC and AUPR of each drug and then average them over the 1373 drugs. As shown in Table 2, NGMDA achieves the best average AUC of 0.944, which is 0.4% higher than that of the second-best model, EGATMDA, and 4.1%, 10.1%, 4.2%, and 2.8% higher than those of GCNMDA, GACNNMDA, GSAMDA, and SCSMDA, respectively. NGMDA also produces the best average AUPR of 72.8%, which is 38.8%, 41.3%, 42.1%, 53.2%, and 48.1% higher than those of SCSMDA, GCNMDA, EGATMDA, GACNNMDA, and GSAMDA, respectively. To test whether NGMDA's prediction performance is significantly higher than that of each compared method, we conducted a statistical test. NGMDA and each compared method each have 1373 AUCs (AUPRs), one per drug, and the paired Wilcoxon test was performed on NGMDA's AUCs (AUPRs) and those of each compared method (Table 3). The results indicate that NGMDA achieves significantly higher prediction performance than all the compared methods. GCNMDA, GACNNMDA, and GSAMDA do not perform as well as NGMDA, EGATMDA, and SCSMDA, likely because they learn node representations with simple models (e.g. GCN and GAT) without considering node or edge types in the microbe-drug heterogeneous graph. EGATMDA and SCSMDA learn the features of drugs and microbes from semantic information based on meta-paths. However, these models focus only on learning the features of neighbor nodes derived from meta-paths and do not consider the remaining nodes across the entire heterogeneous graph.
The average recalls of all drugs under different top-k cutoffs are presented in Fig. 4. NGMDA outperforms all other methods at every cutoff, which we attribute to its enhanced node embeddings and its fusion of the features of neighbor nodes and of the whole heterogeneous graph. When k = 3, our model achieves the highest recall rate of 76.6%, while the second-best recall rate of 48.7% is attained by EGATMDA. SCSMDA ranks fourth with a recall rate of 44.7%, which is 0.5% below that of GCNMDA. When k is 6, 9, and 12, NGMDA maintains the best recall rates of 81.3%, 83.4%, and 85.7%, respectively. The second-best performer is EGATMDA, with recall rates of 67.8%, 74.9%, and 80.1%, respectively. SCSMDA surpasses GCNMDA with recall rates of 63.6%, 67.6%, and 71.4%, respectively, whereas GCNMDA obtains lower recall rates of 61.5%, 66.8%, and 70.8%. GSAMDA does not perform well, with recall rates of 55.7%, 63.7%, and 68.4%, respectively, but is still consistently better than GACNNMDA, which obtains the lowest recall rates of 42.5%, 49.9%, and 56.2%.

Case studies on three drugs
To demonstrate NGMDA's ability to discover drug-related microbial candidates, we perform case studies on Ciprofloxacin, Moxifloxacin, and Vancomycin. Ciprofloxacin treats skin infections, typhoid fever, pneumonia, endocarditis, and other bacterial infections. Moxifloxacin treats pneumonia, tuberculosis, sinusitis, and chronic bronchitis.
Vancomycin is an antibiotic that treats bloodstream infections, endocarditis, and orthopedic infections. For the case studies, the model was trained on all known microbe-drug associations and an equal number of randomly selected unobserved microbe-drug associations. Candidate microbes were then obtained for each of these drugs, and the top 20 candidates are listed in Tables 4-6.
Two microbes, Staphylococcus epidermidis and Salmonella enterica, were validated to be highly susceptible to Ciprofloxacin (Eibach et al. 2016, Szczuka et al. 2017). In addition, Vibrio harveyi, Enterococcus faecalis, and Listeria monocytogenes have been identified as Ciprofloxacin-resistant microbes (Stalin and Srinivasan 2016, Escolar et al. 2017, Kim and Woo 2017). Of the Moxifloxacin-related candidates in Table 5, three are included in MDAD, two in the aBiofilm database, and 14 are supported by the literature. Of the Vancomycin-related candidates in Table 6, Staphylococcus is confirmed by both the MDAD and aBiofilm databases, and 14 candidates are supported by the literature. Among all 60 candidate microbes, nine remain unconfirmed, meaning that no relevant evidence was found to support their associations. This analysis demonstrates that NGMDA can discover potential candidate microbes for the drugs under study.

Prediction of novel microbe-drug associations
We apply NGMDA to predict potential candidate microbes for all drugs. The top 20 candidate microbes are listed in Supplementary File SF2, which biologists can use to screen reliable candidate microbes.

Conclusion
We proposed a novel microbe-drug association prediction model that encodes node neighborhood topologies at multiple scales and performs graph inference by propagating information over different types of connections and nodes. The multi-scale topology feature is formed by estimating the probability that a random walker returns to its starting node after different numbers of steps. The node embedding strategy enhances the representations of microbe and drug nodes so that each node type forms its own distinct distribution. NFF combines the features of different types of neighbors and target nodes by adaptively evaluating the weights of the position features, topology features, and original features of the neighbor nodes. GFF captures long-distance connections and encodes the relationship types between nodes, which enables knowledge propagation across the entire graph and captures diverse relationships. Cross-validation results on public datasets demonstrate the superiority and effectiveness of NGMDA. The average recall rates over drugs and the case studies further demonstrate that NGMDA provides reliable microbe candidates for the drugs under investigation.

Figure 1. Framework of the proposed NGMDA model. (a) Construct the microbe-drug heterogeneous graph and enhance the similarity and multi-modal embeddings. (b) Fuse the features of neighbor nodes with the position- and topology-sensitive HGNN. (c) Learn long-distance connections from the entire heterogeneous graph based on RAGT.

Figure 2. Enhancing node embeddings of microbes and drugs by supervised autoencoders.

Figure 3. Calculation of the position-sensitive and topology-sensitive self-attention of $v_j$ to $v_i$.
• GCNMDA (Long et al. 2020a): It established a microbe-drug heterogeneous network and integrated multiple kinds of similarities, measured based on the chemical structures of drugs, the Gaussian interaction profiles of drugs (microbes), and the microbe sequences. The prediction model was constructed based on GCN and CRF.
• EGATMDA (Long et al. 2020b): It constructed a microbe-disease-drug network and then inferred microbe-drug associations with a hierarchical attention mechanism.
• GSAMDA (Tan et al. 2022): The model calculated drug (microbe) similarities based on the Gaussian interaction profiles and Hamming interaction profiles of drugs (microbes), and learned node features with graph attention networks and a sparse autoencoder.
• GACNNMDA (Ma et al. 2023): Multiple microbe-drug heterogeneous networks were constructed based on the Gaussian interaction and Hamming interaction profiles of drugs (microbes), and potential microbe-drug associations were identified by convolutional neural networks.
• SCSMDA (Tian et al. 2023): The model constructed microbe-drug networks based on microbe gene sequence information, the Gaussian kernel interaction profiles of drugs (microbes), and the chemical structures of drugs. It learned the features of microbe and drug nodes by graph contrastive learning.
A microbe or drug node may be closely related to distant nodes due to the heterogeneity of the microbe-drug graph.
• In the microbe-drug heterogeneous graph, different neighbor nodes often have distinct topological neighborhoods and positions that affect their importance to a target node. A new position-sensitive and topology-sensitive self-attention mechanism (PTA) adaptively distinguishes the contributions of different neighbor nodes. In addition, neighbor feature fusion (NFF) models the heterogeneity of the graph and aggregates node representations with heterogeneous graph neural networks (HGNN) equipped with PTA.

Table 1. Results of the ablation studies.

Table 2. AUCs and AUPRs of the compared methods on all 1373 drugs.

Table 3. Results of the paired Wilcoxon test on the AUCs and AUPRs of 1373 drugs, comparing NGMDA with each of the other methods.
Figure 4. The average recalls of drugs at different top-k settings.
Xuan et al.
literature to verify the microbe-drug association prediction results of NGMDA. Among the top 20 candidate microbes related to Ciprofloxacin, six are recorded in MDAD and four are contained in the aBiofilm database, which suggests that these microbes are indeed associated with Ciprofloxacin, and these 13 candidates are further confirmed

Table 4. The top-20 candidate microbes of Ciprofloxacin.

Table 5. The top-20 candidate microbes of Moxifloxacin.

Table 6. The top-20 candidate microbes of Vancomycin.
by the literature. For example, several microbes, including Candida albicans, Human immunodeficiency virus 1, Streptococcus mutans, and Streptococcus pneumoniae, are inhibited (or killed) by Ciprofloxacin.