Abstract

Drug sensitivity is essential for identifying effective treatments. Meanwhile, circular RNA (circRNA) has shown potential in disease research and therapy. Uncovering the associations between circRNAs and cellular drug sensitivity is crucial for understanding drug response and resistance mechanisms. In this study, we propose DeepHeteroCDA, a novel circRNA–drug sensitivity association prediction method based on a multi-scale heterogeneous network and a graph attention mechanism. We first construct a heterogeneous graph based on drug–drug similarity, circRNA–circRNA similarity, and known circRNA–drug sensitivity associations. Then, we embed the 2D structures of drugs into the circRNA–drug sensitivity heterogeneous graph and use graph convolutional networks (GCN) to extract fine-grained drug embeddings. Finally, by simultaneously updating the graph attention network that processes the heterogeneous network and the GCN that processes drug structures, we obtain a multi-scale heterogeneous network and use a fully connected layer to predict circRNA–drug sensitivity associations. Extensive experimental results highlight the superiority of DeepHeteroCDA. Visualization experiments show that DeepHeteroCDA effectively extracts association information, and case studies demonstrate its effectiveness in identifying potential circRNA–drug sensitivity associations. The source code and dataset are available at https://github.com/Hhhzj-7/DeepHeteroCDA.

Introduction

Circular RNAs (circRNAs) constitute a distinct category of endogenous non-coding RNAs, mainly produced by back-splicing or lariat-driven processes occurring within genes [1]. In contrast to linear RNAs that feature 5' caps and 3' poly(A) tails, circRNAs create covalently closed loops, devoid of both 5' to 3' polarity and polyadenylation [1]. This unique structure not only renders circRNAs resistant to exonuclease-mediated degradation, thereby providing them with significantly enhanced stability compared to linear RNAs [2], but also contributes to their ubiquity across various cell types and organisms. As research into circRNAs advances, their broad biological significance and potential functions are becoming increasingly recognized, attracting widespread attention in the scientific community. For instance, certain circRNAs contain miRNA response elements (MREs), allowing them to function as miRNA sponges, a mechanism that regulates miRNA activity and downstream gene expression [3]. MiRNAs, approximately 22 nucleotides in length, play a key role in post-transcriptional regulation by binding to 3'-untranslated regions (3'-UTRs) of target mRNAs, leading to mRNA destabilization and translational repression [4]. Moreover, circRNAs such as CircHIPK3, Circ_0006528, and Circ_0004870 have been linked to cancer-related hallmarks, promoting tumorigenesis by modulating tumor suppressors, apoptotic pathways, and cellular replication [5]. Due to their stability, conservation, ubiquity, and specificity, circRNAs are emerging as highly promising prognostic and diagnostic biomarkers in cancer research [6].

Drug resistance continues to be a major obstacle in cancer treatment, and recent studies have highlighted the crucial role of circRNAs in regulating this resistance [7]. For example, circRNA-circ_0076305 has been demonstrated to promote resistance to cisplatin (DDP) in non-small cell lung cancer (NSCLC) by regulating ABCC1 through miR-186-5p, thereby reducing the efficacy of the drug in NSCLC treatment [8]. In glioma, circ_0072083 has been implicated in temozolomide (TMZ) resistance by enhancing NANOG expression through multiple pathways, including regulation by miR-1252-5p and ALKBH5-mediated demethylation [9]. Moreover, CircNR3C1 regulates the BRD4/C-myc complex in bladder cancer (BC). CircNR3C1 interacts with BRD4, disrupting the BRD4/C-myc complex that promotes BC progression, and in vivo studies showed that ectopic expression of C-myc can partially reverse the tumor-suppressive effects of circNR3C1 [10]. These findings shed light on the complex mechanisms of drug resistance involving circRNAs. However, our understanding of the complex interactions between circRNAs and drug sensitivity is still limited.

Identifying associations between circRNAs and drug sensitivity using experimental methods is frequently hindered by high costs, intensive labor, and time-consuming processes.

Nevertheless, numerous advanced deep learning methods have been developed for link prediction [11–15]. By reformulating circRNA–drug sensitivity association prediction as a link prediction task, deep learning techniques can also be employed to address this challenge. Deng et al. [16] introduced GATECDA, a framework based on a graph attention auto-encoder, which utilizes graph attention mechanisms to effectively predict circRNA–drug sensitivity associations by capturing essential information from sparse, high-dimensional data. Similarly, Yang et al. [17] developed MNGACDA, leveraging multimodal networks that incorporate graph auto-encoders and attention mechanisms to predict these associations. Luo et al. [18] proposed DPMGCDA, a framework that employs dual perspective learning and a path-masked graph autoencoder to predict circRNA–drug sensitivity associations. Although existing methods have made progress, they neither extract the intrinsic features of drug molecules nor account for the differing information weights of nodes in the network. As a result, these models have a reduced capacity to capture information about drug molecule nodes and give insufficient consideration to crucial structural information and node features within the network.

Here, we introduce DeepHeteroCDA, an innovative deep learning framework that utilizes multi-scale heterogeneous networks for predicting circRNA–drug sensitivity associations. The approach starts by organizing circRNA host gene sequences, drug structural information, and associations between circRNAs and drugs. A heterogeneous graph is then constructed based on drug–drug similarity, circRNA–circRNA similarity, and known circRNA–drug sensitivity associations. We employ a Graph Attention Network (GAT) [19] to update the representations of heterogeneous nodes with adaptive weights, thereby enhancing the extraction of network-level information. To mine molecular-level information from the chemical structures of drugs, the model represents each drug as a 2D graph, with atoms as nodes and chemical bonds as edges, and utilizes GCN [20] to compute drug embedding vectors. We thereby construct a multi-scale heterogeneous graph, in which the atomic-level 2D topological features of drugs are also adaptively updated during the feature update process of the circRNA–drug network. Compared to the circRNA–drug networks proposed by other methods, our multi-scale heterogeneous network can further capture the fine-grained structural features of small molecules, enabling the model to better explore the complex interactions between drugs and circRNAs. Finally, a dense neural network is used to predict potential associations.

To evaluate the effectiveness of DeepHeteroCDA, we conducted a comprehensive assessment using the benchmark dataset, comparing our proposed method with the state-of-the-art methods. The results highlight the superior predictive performance of DeepHeteroCDA. Additionally, an ablation study is performed to evaluate the performance of each component. The case study involving three specific drugs further demonstrates the practical utility of DeepHeteroCDA and the predicted results can aid conventional experiments by enabling the preselection of effective circRNA–drug sensitivity associations.

Materials and methods

Data sets

The benchmark dataset was derived from the circRic database [21]: after a stringent filtering process in which associations with a false discovery rate exceeding 0.05 were excluded, it comprises 4134 experimentally validated circRNA–drug sensitivity associations, involving 271 unique circRNAs and 218 distinct drugs. Based on these associations, a bipartite circRNA–drug topology network is constructed, denoted $AM \in \mathbb{R}^{271 \times 218}$. If an experimental association between the $p$th circRNA and the $q$th drug is verified, the corresponding entry $AM_{pq}$ is assigned a value of 1. Additionally, 4134 unverified circRNA–drug associations were randomly selected as negative samples; these make up just 6.99% of all potential circRNA–drug pairs, so the probability that genuine interactions are misclassified as negative samples is low. Following this selection, we curated a comprehensive dataset of 8268 circRNA–drug pairs, ensuring a balanced dataset by maintaining an equal number of positive and negative samples. The SMILES [22] representations of the drugs were retrieved from the DrugBank database [23].
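As a sketch, the balanced negative sampling described above can be implemented as follows (function and variable names are illustrative, not taken from the released code):

```python
import random

def sample_negatives(pos_pairs, n_circ, n_drug, seed=0):
    """Randomly draw unverified circRNA-drug index pairs as negatives,
    one per positive pair, never re-drawing a known positive."""
    rng = random.Random(seed)
    pos = set(pos_pairs)
    neg = set()
    while len(neg) < len(pos):
        pair = (rng.randrange(n_circ), rng.randrange(n_drug))
        if pair not in pos:
            neg.add(pair)
    return sorted(neg)

# Toy example with two known positives on the 271 x 218 grid
negs = sample_negatives([(0, 1), (2, 3)], n_circ=271, n_drug=218)
```

With the real 4134 positives, the same call yields the balanced 8268-pair dataset described above.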

Overview of DeepHeteroCDA

To predict potential circRNA–drug sensitivity associations, we present DeepHeteroCDA, an innovative method based on a multi-scale heterogeneous network and attention mechanism. Our approach employs an inductive method to extract topological information of drugs, which is subsequently combined with the circRNA–drug heterogeneous network. The workflow of the DeepHeteroCDA model, as illustrated in Fig. 1, consists of the following five key steps:

Figure 1

The framework of DeepHeteroCDA. (I) Feature extraction: extracts similarity features for circRNAs and drugs, including sequence, structure, and known association data. GCN is used to aggregate information from neighboring atoms and chemical bonds through graph convolutional layers, learning the structural features of molecules. (II) Heterogeneous network construction: builds a network integrating circRNA and drug interactions. (III) Information fusing by GAT: uses graph attention networks to aggregate node features. GAT leverages attention mechanisms to weigh and aggregate features of circRNAs and drugs based on their relevance. (IV) DNN-based association prediction: predicts circRNA–drug associations using a deep neural network.

Step 1: CircRNA and drug similarity measurement. CircRNA similarity is quantified using two metrics: sequence-based similarity and GIP (Gaussian interaction profile) kernel similarity [24, 25]. Similarly, drug similarity is evaluated from two perspectives: structural similarity and GIP kernel similarity. The comprehensive similarity matrices for circRNAs and drugs are obtained by merging the respective similarity matrices.

Step 2: Construction of a circRNA–drug heterogeneous graph. A heterogeneous graph is constructed, wherein the similar neighbors for each circRNA or drug are retained to capture essential relationships. The features of circRNA and drug nodes are then mapped into a shared vector space, enabling a unified representation that facilitates the integration of multiple data modalities.

Step 3: Drug topology information extraction. In this step, drug nodes are transformed into 2D graphs, where atoms are treated as nodes and chemical bonds as edges. GCNs are then employed to extract comprehensive and informative representations of drugs from these graph-based structures. Combining the drug topology graphs with the circRNA–drug heterogeneous graph, we obtain a multi-scale heterogeneous network.

Step 4: Aggregation of central node information. GAT is utilized to aggregate information from neighboring nodes, with distinct attention weights assigned to each neighbor. This adaptive attention mechanism allows the model to prioritize more informative nodes during the aggregation process, thereby capturing nuanced relationships between the central node and its neighbors.

Step 5: CircRNA–drug association prediction. We concatenate the final circRNA and drug features from Step 4. This representation is passed through a fully connected layer to generate a prediction score, with cross-entropy loss used for optimization.

The subsequent sections will offer an in-depth analysis and explanation of each of the outlined steps.

CircRNA similarity measurement

As described in [17], we quantify circRNA similarity by calculating the sequence similarity between their host genes. This calculation is based on the Levenshtein distance between sequences, using the ratio function. This method produces an adjacency matrix, denoted $CSM \in \mathbb{R}^{271 \times 271}$, which stores the sequence similarity data of the host genes.
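For illustration, a plain-Python sketch of Levenshtein distance and a normalized sequence similarity derived from it (the exact normalization used by the paper's "ratio" function may differ slightly):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution/match
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def seq_similarity(a, b):
    """Similarity in [0, 1]: 1 - distance / max length (illustrative)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

Filling a 271 x 271 matrix with `seq_similarity` over all host-gene pairs yields a $CSM$-style matrix.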

Meanwhile, it is important to note that the adjacency matrix $CSM$ is sparse. To mine circRNA similarity information more comprehensively, we employ the GIP kernel similarity, which has proven effective in evaluating similarity across various biological entities. We calculate the GIP kernel similarity for circRNAs based on the circRNA–drug sensitivity association matrix $AM$, under the assumption that circRNAs associated with the same drug sensitivities are more likely to be similar. The resulting GIP kernel similarity matrix of circRNAs is denoted as $CGM \in \mathbb{R}^{271 \times 271}$. Inspired by Xiao et al. [26], we derive the circRNA similarity as follows:

$$CM_{ij} = \begin{cases} \dfrac{CSM_{ij} + CGM_{ij}}{2}, & CSM_{ij} \neq 0 \\ CGM_{ij}, & CSM_{ij} = 0 \end{cases} \tag{1}$$

where $CM_{ij}$ is the final similarity of circRNAs $i$ and $j$, $CSM_{ij}$ is their sequence similarity, and $CGM_{ij}$ is their GIP kernel similarity. Since $CSM_{ij}$ and $CGM_{ij}$ capture similarity at different levels, the final similarity is computed as their average when $CSM_{ij} \neq 0$, integrating their complementary information. When $CSM_{ij} = 0$, the final similarity is derived solely from $CGM_{ij}$, avoiding the bias that the missing sequence information would otherwise introduce.
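A minimal NumPy sketch of the GIP kernel on interaction profiles and the fusion rule above; the bandwidth heuristic (reciprocal of the mean squared profile norm) is the standard choice and is an assumption here:

```python
import numpy as np

def gip_kernel(AM):
    """GIP kernel similarity between rows (interaction profiles) of AM:
    K_ij = exp(-gamma * ||AM[i] - AM[j]||^2)."""
    norms = (AM ** 2).sum(axis=1)
    gamma = 1.0 / norms.mean()                      # assumed bandwidth heuristic
    sq = norms[:, None] + norms[None, :] - 2 * AM @ AM.T
    return np.exp(-gamma * sq)

def fuse(CSM, CGM):
    """Fusion rule: average where sequence similarity exists, else GIP only."""
    return np.where(CSM != 0, (CSM + CGM) / 2, CGM)

# Toy profiles: rows 0 and 1 are identical, row 2 differs
CGM = gip_kernel(np.array([[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]))
F = fuse(np.array([[0.5, 0.0]]), np.array([[0.3, 0.7]]))
```

Identical interaction profiles yield a kernel value of 1, and divergent profiles decay toward 0.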

Drug similarity measurement

Drug similarity can be evaluated from two perspectives. First, given the significant influence of drug structure on its function, the drug similarity is assessed based on structural characteristics. Drug structure data are obtained from the PubChem database, and RDKit is used to compute the topological fingerprint for each drug. The structural similarity between drugs is then calculated using the Tanimoto coefficient, resulting in the drug structure similarity matrix, denoted as $DSM \in \mathbb{R}^{218 \times 218}$. Simultaneously, the GIP kernel similarity matrix for drugs, denoted as $DGM \in \mathbb{R}^{218 \times 218}$, is calculated.
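On binary fingerprints, the Tanimoto coefficient reduces to an intersection-over-union of on-bits; a small illustrative sketch (fingerprints given as collections of on-bit indices):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given
    as iterables of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0                      # convention for two empty fingerprints
    return len(a & b) / len(a | b)
```

With real RDKit fingerprints, the same quantity is obtained from the bits set in each fingerprint.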

Similar to circRNA, the drug similarity is derived by combining the two aforementioned similarity measures:

$$DM_{mn} = \begin{cases} \dfrac{DSM_{mn} + DGM_{mn}}{2}, & DSM_{mn} \neq 0 \\ DGM_{mn}, & DSM_{mn} = 0 \end{cases} \tag{2}$$

where $DM_{mn}$ is the final similarity of drugs $m$ and $n$, $DSM_{mn}$ is their structural similarity, and $DGM_{mn}$ is their GIP kernel similarity.

GCN for drug representation

To transform the SMILES of drugs into molecular graphs, we utilize the RDKit toolkit, which enables atoms to be modeled as nodes and bonds as edges in the graph. The initial features of nodes are computed using MoleculeNet [27]. The core of the framework is the implementation of GCN, specifically designed for drug feature extraction. This method allows for the effective extraction of pertinent knowledge from the intricate topological features inherent to drugs.

The GCN is a neural network architecture specifically designed to process data structured in the form of graphs. In graph-based data, the adjacency matrix typically represents the relationships between nodes. GCN functions by generating lower-dimensional node representations through an iterative process, utilizing multiple layers to capture the structural information of the graph.

The following outlines the iterative procedure:

$$\hat{A} = A + I_{N} \tag{3}$$
$$H^{l+1} = \sigma\left( \hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{l} W^{l} \right) \tag{4}$$

where $I_{N}$ denotes the identity matrix, $A$ represents the adjacency matrix, and $\hat{A}$ is the adjacency matrix of the undirected graph with added self-connections. $H^{l}$ refers to the embeddings at the $l$th layer, $\hat{D}$ is the degree matrix of $\hat{A}$, $W^{l}$ is a layer-specific trainable parameter, $H^{0}$ is the initial node feature matrix, and $\sigma(\cdot)$ represents ReLU [28].
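The propagation rule above can be sketched in NumPy as follows (a minimal single-layer illustration, not the authors' implementation):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: add self-loops, symmetrically normalize,
    then apply ReLU(D_hat^{-1/2} A_hat D_hat^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])        # self-connections
    d = A_hat.sum(axis=1)                 # degrees of A_hat
    D_inv_sqrt = np.diag(d ** -0.5)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy 2-node graph with identity features and identity weights
out = gcn_layer(np.array([[0.0, 1.0], [1.0, 0.0]]), np.eye(2), np.eye(2))
```

Stacking several such layers, each with its own $W^{l}$, reproduces the iterative update.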

To capture the overall structural information of the graph in a global feature representation, max pooling is used to aggregate the learned node embeddings into a graph-level feature vector. This operation ensures that local node-level information is effectively summarized at the graph level. Specifically, for drug molecular graphs, after the GCN has learned the embeddings for each atom, global max pooling is applied to derive a single feature representation for each drug. This process can be mathematically represented as:

$$DR_{d} = \max_{o = 1, \ldots, n_{d}} H_{d}^{L}[o,:] \tag{5}$$

where $DR_{d}$ denotes the global feature vector of the $d$th drug, $H_{d}^{L}[o,:]$ represents the feature vector of the $o$th node in the $d$th drug's molecular graph at the $L$th layer of the GCN, and $n_{d}$ is the number of atoms in the molecular graph of the $d$th drug. By applying max pooling, we select the element-wise maximum over the feature vectors of all nodes to obtain a global representation for the drug.

Subsequently, the global feature vectors for all drugs are stacked into a complete drug representation matrix $DR$, represented as:

$$DR = \left[ DR_{1}; DR_{2}; \ldots; DR_{218} \right] \tag{6}$$

where $DR \in \mathbb{R}^{218 \times 489}$, with 218 representing the number of drugs and 489 being the dimensionality of the feature vectors learned by the GCN. Through the global max pooling operation, the node-level representations of drug molecular graphs are aggregated into drug-level global feature representations, which are subsequently used to enhance the representation of drugs in the heterogeneous network.
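A toy NumPy sketch of the max-pooling and stacking steps (the atom embeddings are made up for illustration; real ones come from the GCN):

```python
import numpy as np

def pool_drug(node_embeddings):
    """Element-wise max over all atom embeddings of one drug (Eq. 5)."""
    return node_embeddings.max(axis=0)

# Toy atom embeddings: drug 1 has two atoms, drug 2 has one
H1 = np.array([[1.0, 5.0], [3.0, 2.0]])
H2 = np.array([[0.0, 4.0]])

# Stack per-drug vectors into the drug representation matrix (Eq. 6)
DR = np.stack([pool_drug(H1), pool_drug(H2)])
```

Because the pooled vector has a fixed length, drugs with different atom counts yield rows of equal dimension.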

Heterogeneous graph construction

To achieve circRNA–drug association prediction, we construct a heterogeneous network comprising 271 circRNAs and 218 drugs. The heterogeneous network integrates two types of edges: interaction edges and similarity edges. The interaction edges originate from the circRNA–drug associations, representing exclusive connections between circRNA and drug pairs. However, the bipartite nature of this graph restricts its ability to effectively aggregate information across the network. To address this limitation, we enhance the bipartite graph by incorporating additional similarity networks, capturing the relationships between circRNA pairs and drug pairs. Similarity edges in the circRNA–circRNA and drug–drug networks tend to form dense connections, which can introduce noise and complicate the model. To mitigate this, we retain only the eight most similar neighbors for each circRNA or drug, thereby simplifying the graph while preserving critical similarity relationships. Let $CM$ represent the similarity matrix of circRNA nodes, $DM$ denote the similarity matrix of drug nodes, and $AM$ signify the association matrix of circRNAs and drugs. We can define the heterogeneous network as follows:

$$G = \begin{bmatrix} CM & AM \\ AM^{T} & DM \end{bmatrix} \tag{7}$$

The construction of the heterogeneous graph is essential for enriching the information flow across circRNA and drug nodes. By incorporating interaction and similarity relationships, the model can better capture the intricate patterns and associations that exist between circRNAs and drugs, thereby enhancing predictive performance.
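The top-8 neighbor sparsification and block-matrix assembly described above might look like this in NumPy (illustrative only; whether self-similarity is kept or excluded is an implementation detail not specified here):

```python
import numpy as np

def topk_sparsify(S, k=8):
    """Keep only the k largest entries per row of a similarity matrix."""
    out = np.zeros_like(S)
    for i, row in enumerate(S):
        idx = np.argsort(row)[-k:]      # indices of the k most similar neighbors
        out[i, idx] = row[idx]
    return out

def hetero_adjacency(CM, DM, AM):
    """Block adjacency of the heterogeneous graph: [[CM, AM], [AM^T, DM]]."""
    return np.vstack([np.hstack([CM, AM]), np.hstack([AM.T, DM])])

# Toy example: 2 circRNAs, 3 drugs
CM = np.array([[1.0, 0.2], [0.2, 1.0]])
DM = np.eye(3)
AM = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
G = hetero_adjacency(CM, DM, AM)
```

With the real data, $G$ is a $(271+218) \times (271+218)$ matrix whose off-diagonal blocks carry interaction edges and whose diagonal blocks carry the sparsified similarity edges.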

GAT on heterogeneous graphs

Given that the heterogeneous network contains two distinct types of nodes, the initial representation dimensions of the nodes differ. Aligning their features to the same dimensional space ensures that the aggregation process is coherent and compatible. The initial circRNA node features, denoted as $X_{c}$, are represented as follows:

$$X_{c} = AM \,\|\, CM \tag{8}$$

where $\|$ represents concatenation.

In contrast to the circRNA nodes, which simply concatenate adjacency and similarity matrices to form their initial feature set, the drug nodes undergo an additional layer of processing to capture structural and relational complexities. Specifically, the initial drug features, denoted as $X_{d}$, are constructed by first concatenating the adjacency matrix $AM^{T}$ with $DM$, followed by adding the drug representation matrix $DR$. This representation goes beyond conventional feature vectors by embedding intricate graph-based descriptions of the drug, significantly enhancing the node's capacity to retain chemical structure and similarity information. The initial drug features can be expressed as:

$$X_{d} = \left( AM^{T} \,\|\, DM \right) + DR \tag{9}$$

Next, we project the circRNA and drug features into the same dimensional space. We apply two transformation matrices, $W_{c} \in \mathbb{R}^{256 \times 489}$ for circRNAs and $W_{d} \in \mathbb{R}^{256 \times 489}$ for drugs, where 256 denotes the shared feature dimension. The transformation employs the ReLU nonlinearity to adjust the feature dimensions. The steps for projecting circRNA and drug nodes are outlined below:

$$H_{c} = \mathrm{ReLU}\left( X_{c} W_{c}^{T} \right) \tag{10}$$
$$H_{d} = \mathrm{ReLU}\left( X_{d} W_{d}^{T} \right) \tag{11}$$

where $W_{c}$ and $W_{d}$ are learnable parameters for mapping the 489-dimensional circRNA node features into a reduced 256-dimensional space. In this context, $X_{c}$ refers to the initial feature set of the circRNA nodes, and $H_{c}$ contains the features after projection. Similarly, $W_{d}$ is the linear transformation matrix applied to the drug nodes, whose 489-dimensional features are likewise reduced to 256 dimensions. The drug nodes' original features are denoted by $X_{d}$, and the corresponding projected features are represented by $H_{d}$.
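A small NumPy sketch of the projection step, assuming row-wise node features and the form $H = \mathrm{ReLU}(X W^{T})$ (the exact orientation of the transformation in the released code is an assumption):

```python
import numpy as np

def project(X, W):
    """Map node features into the shared 256-d space: H = ReLU(X W^T)."""
    return np.maximum(X @ W.T, 0.0)

rng = np.random.default_rng(0)
X_c = rng.standard_normal((271, 489))   # initial circRNA features (489-d)
W_c = rng.standard_normal((256, 489))   # learnable projection (assumed shape)
H_c = project(X_c, W_c)
```

After projection, circRNA and drug nodes share a 256-dimensional space, so GAT aggregation across node types is well defined.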

GAT utilizes a multi-head attention mechanism to extract the features of each node's neighborhood by layering multiple network levels, assigning varying weights to the neighboring nodes based on their relative importance. The input to the GAT consists of the graph's structural relationships and the individual node attributes, resulting in a new set of node embeddings as output. For instance, considering nodes $a$ and $b$, GAT applies a linear transformation to both nodes individually to derive higher-level node representations. Subsequently, a self-attention mechanism is applied to node pairs to calculate the weights $e_{ab}$, reflecting the influence of node $b$ on node $a$. The computation of the weights can be expressed using the following formula:

$$e_{ab} = \sigma\left( \mathbf{a}^{T} \left[ W h_{a} \,\|\, W h_{b} \right] \right) \tag{12}$$

where $\sigma$ denotes the LeakyReLU. To ensure comparability among the weights of node $a$'s adjacent nodes, we further apply softmax normalization to adjust the weights of the target node and its surrounding neighbors. After acquiring the normalized attention coefficients, we compute the linear combination of the corresponding neighbor features. Finally, the output feature vector is derived by applying a nonlinear activation function.
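An illustrative NumPy sketch of the attention-coefficient computation and softmax normalization for one node's neighborhood (single head; `a_vec` stands in for the learned attention vector):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_weights(h_a, h_neighbors, W, a_vec):
    """Score each neighbor b of node a by LeakyReLU(a^T [W h_a || W h_b]),
    then softmax-normalize the scores over the neighborhood."""
    wa = W @ h_a
    scores = np.array([leaky_relu(a_vec @ np.concatenate([wa, W @ h_b]))
                       for h_b in h_neighbors])
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

# Toy case: the first neighbor matches the center node, the second is all-zero
alpha = attention_weights(np.ones(4), [np.ones(4), np.zeros(4)],
                          np.eye(4), np.ones(8) * 0.1)
```

The normalized coefficients then weight the neighbors' transformed features in the aggregation step.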

Deep neural network layer

To predict the interaction probability $\hat{P}_{cd}$ between circRNA $c$ and drug $d$, we concatenate the final feature vectors of the circRNA and drug nodes derived from the GAT. These concatenated features are then passed through a fully connected layer to compute the predicted probability $\hat{P}_{cd}$, as follows:

$$\hat{P}_{cd} = \mathrm{sigmoid}\left( W_{o} \left( h_{c} \,\|\, h_{d} \right) + b_{o} \right) \tag{13}$$

where $h_{c}$ and $h_{d}$ are the final GAT-derived feature vectors of circRNA $c$ and drug $d$, and $W_{o}$ and $b_{o}$ are the parameters of the fully connected layer.

Subsequently, the model’s performance is evaluated by calculating the loss between the predicted probability and the actual label using the cross-entropy loss function.

Results

Evaluation metrics

To ensure reliability and stability, we perform five-fold cross-validation (5-CV) and 10-fold cross-validation (10-CV) to evaluate the performance of DeepHeteroCDA. In this experiment, we adopt the area under the receiver operating characteristic (ROC) curve (AUC), the area under the precision–recall (PR) curve (AUPR), accuracy (ACC), F1-score (F1), and recall (R) as assessment metrics for DeepHeteroCDA and the comparison methods. Their definitions are as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$
$$Precision = \frac{TP}{TP + FP} \tag{15}$$
$$Recall = \frac{TP}{TP + FN} \tag{16}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{17}$$

where $TP$ and $TN$ denote true positive and true negative instances, respectively, while $FP$ and $FN$ represent false positive and false negative instances. AUC is the area under the ROC curve that measures a model's ability to distinguish between positive and negative classes across all thresholds, with a higher value indicating better performance. AUPR is the area under the precision–recall curve that evaluates a model's ability to handle imbalanced data by focusing on positive class predictions, with a higher value indicating better performance.
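The threshold-based metrics can be computed from binary labels as follows (a minimal sketch; precision is included because the F1 score requires it):

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels/predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

scores = metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

AUC and AUPR, by contrast, are computed from the ranked prediction scores rather than thresholded labels.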

Comparison with other methods

To the best of our knowledge, there are currently only three computational approaches for predicting potential associations between circRNAs and drug sensitivity, namely DPMGCDA, MNGACDA, and GATECDA. To thoroughly assess the performance of DeepHeteroCDA, we extend our comparison to include these three models as well as four additional models that focus on other association prediction tasks within the bioinformatics domain. Additionally, we incorporate several well-established machine learning models, such as Support Vector Machine (SVM) [29], Random Forest (RF) [30], k-Nearest Neighbors (KNN) [31], XGBoost [32], and AdaBoost [33], which are widely recognized for their robustness and strong performance in association prediction across various domains. Additional details regarding the baseline methods are provided in the Supplementary material.

As shown in Table 1, our proposed approach, DeepHeteroCDA, demonstrated state-of-the-art performance in the 5-CV experiments, achieving an AUC of 0.9233, which is relatively higher than the second-best method, MNGACDA, by 1.5%, followed by improvements of 2.4% over DPMGCDA, 2.6% over XGBoost, 2.8% over LAGCN, 4.0% over RF, 4.3% over AdaBoost, 4.4% over GATECDA, 5.2% over GCNMDA, 5.6% over VGAMF, 6.8% over SVM, 6.9% over KNN, and 7.0% over VGAE. Similarly, DeepHeteroCDA outperformed all other methods in terms of AUPR, with an average value of 0.9293, which represents a relative improvement of 1.3% over DPMGCDA, 1.6% over MNGACDA, 3.0% over LAGCN, 3.3% over XGBoost, 4.1% over GATECDA, 4.6% over AdaBoost and RF, 6.1% over GCNMDA and KNN, 6.4% over VGAE, 7.3% over VGAMF, and 8.7% over SVM. In addition to AUC and AUPR, DeepHeteroCDA outperformed the other methods across all remaining metrics, including recall, F1-score, and accuracy. These results highlight the robustness and effectiveness of DeepHeteroCDA compared to the 12 methods evaluated in this study.

Table 1

Comparison experiments under five-fold CV

Methods         AUC     AUPR    F1-score  Accuracy  Recall
SVM             0.8648  0.8547  0.8049    0.7928    0.8550
RF              0.8881  0.8885  0.8204    0.8165    0.8383
KNN             0.8642  0.8760  0.7926    0.7901    0.8020
XGBoost         0.8997  0.8996  0.8294    0.8252    0.8494
AdaBoost        0.8852  0.8888  0.8207    0.8155    0.8443
VGAE            0.8628  0.8730  0.7988    0.7892    0.8227
VGAMF           0.8740  0.8662  0.8176    0.8104    0.8437
GCNMDA          0.8778  0.8762  0.8198    0.8119    0.8428
GATECDA         0.8846  0.8929  0.8194    0.8168    0.8316
LAGCN           0.8982  0.9023  0.8285    0.8261    0.8403
MNGACDA         0.9098  0.9150  0.8413    0.8379    0.8592
DPMGCDA         0.9015  0.9173  0.8408    0.8424    0.8325
DeepHeteroCDA   0.9233  0.9293  0.8561    0.8520    0.8807

Note: The best performance for each metric is marked in bold, while the second-best performance is underlined.


In the 10-fold cross-validation (10-CV), the DeepHeteroCDA method delivered the highest performance, achieving an AUC score of 0.9270. Compared with the other methods, DeepHeteroCDA showed relative gains, with a 1.4% increase over MNGACDA, 2.1% over DPMGCDA, 2.3% over LAGCN, 2.9% over XGBoost, 3.9% over GATECDA, 4.1% over RF, 4.9% over AdaBoost and GCNMDA, 6.2% over VGAMF, 6.6% over KNN, 7.4% over VGAE, and 7.5% over SVM. Similarly, DeepHeteroCDA attained an average AUPR score of 0.9322, marking relative improvements of 1.1% over DPMGCDA, 1.4% over MNGACDA, 2.7% over LAGCN, 3.2% over XGBoost, 3.4% over GATECDA, 4.7% over RF, 5.2% over GCNMDA, 5.5% over AdaBoost and KNN, 6.8% over VGAE, 7.3% over VGAMF, and 9.2% over SVM. While DeepHeteroCDA achieved the highest accuracy and F1 score among all methods, it showed a slightly lower recall compared to LAGCN. However, the superior performance in terms of AUC, AUPR, and F1 score suggests that DeepHeteroCDA provides the most reliable predictive capabilities overall, as shown in Table 2. Notably, the overall performance in the 10-CV experiment slightly exceeded that of the 5-CV experiment. This enhancement is likely due to the larger training partition used in 10-CV, facilitating better model training and resulting in superior predictive outcomes. Fig. 2 displays the ROC and PR curves of DeepHeteroCDA and the comparison models under 5-CV and 10-CV. The results further demonstrate the superior performance of DeepHeteroCDA.

Table 2

Comparison experiments under 10-fold CV

Methods         AUC     AUPR    F1-score  Accuracy  Recall
SVM             0.8623  0.8533  0.8052    0.7977    0.8361
RF              0.8903  0.8906  0.8213    0.8176    0.8385
KNN             0.8700  0.8834  0.7970    0.7961    0.8005
XGBoost         0.9010  0.9036  0.8310    0.8273    0.8494
AdaBoost        0.8836  0.8839  0.8240    0.8198    0.8439
VGAE            0.8634  0.8725  0.7987    0.7865    0.8314
VGAMF           0.8729  0.8682  0.8113    0.8030    0.8471
GCNMDA          0.8834  0.8864  0.8225    0.8183    0.8420
GATECDA         0.8919  0.9016  0.8234    0.8211    0.8343
LAGCN           0.9065  0.9076  0.8425    0.8372    0.8708
MNGACDA         0.9143  0.9196  0.8445    0.8468    0.8324
DPMGCDA         0.9084  0.9218  0.8443    0.8458    0.8359
DeepHeteroCDA   0.9270  0.9322  0.8570    0.8571    0.8567

Note: The best performance for each metric is marked in bold, while the second-best performance is underlined.


Figure 2

Comparison results of ROC curves and PR curves of DeepHeteroCDA with other methods under five-fold CV and 10-fold CV.

Parameter sensitivity analysis

To evaluate the impact of six key hyperparameters (learning rate, dropout rate, hidden dimension of GAT, number of GAT layers, number of GAT heads, and number of GCN layers), we conducted a series of experiments on the benchmark dataset under 5-CV, varying each hyperparameter within its respective range while keeping all other settings constant.
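The one-factor-at-a-time protocol described above can be sketched as follows. The default values are the best settings reported in this section; the grid shown covers only two of the six hyperparameters for brevity, and the dictionary keys are illustrative names rather than identifiers from our codebase:

```python
# Defaults taken from the best settings reported in this section.
defaults = {
    "learning_rate": 5e-5,
    "dropout": 0.3,
    "gat_layers": 1,
    "gat_hidden": 32,
    "gat_heads": 3,
    "gcn_layers": 3,
}

# Candidate values per hyperparameter (subset of the ranges explored in Fig. 3).
grid = {
    "learning_rate": [5e-3, 5e-4, 5e-5, 5e-6],
    "dropout": [0.1, 0.3, 0.5, 0.7],
}

def one_at_a_time(defaults, grid):
    """Yield configs that vary one hyperparameter while the rest stay at defaults."""
    for name, values in grid.items():
        for v in values:
            cfg = dict(defaults)
            cfg[name] = v
            yield name, cfg

configs = list(one_at_a_time(defaults, grid))
print(len(configs))  # one config per candidate value: 4 + 4 = 8
```

Each yielded config would then be passed to the same 5-CV training routine, so that any performance change is attributable to the single varied hyperparameter.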

The learning rate determines how fast the model updates its weights during training. A well-tuned learning rate ensures stable and efficient convergence, avoiding both overshooting and slow progress. We experimented with learning rates of 5e-3, 5e-4, 5e-5, and 5e-6. As shown in Fig. 3(A), model performance steadily improved as the learning rate decreased from 5e-3 to 5e-5. However, when the learning rate was further reduced to 5e-6, performance dropped significantly, indicating that the rate was too small for the model to update its parameters effectively and reach optimal performance during training.

Figure 3

The influence of different hyperparameters on the model performance under 5-CV. (A) Learning rate; (B) dropout rate; (C) number of GAT layers; (D) hidden dimension of GAT; (E) number of GAT heads; (F) number of GCN layers.

Dropout helps reduce overfitting by randomly deactivating neurons during training, encouraging the model to learn more robust features. We experimented with dropout rates of 0.1, 0.3, 0.5, and 0.7. As shown in Fig. 3(B), the optimal performance was achieved at a dropout rate of 0.3, and performance declined significantly as the rate increased further. Excessive dropout caused the model to lose too much information, leading to unstable training and preventing effective learning.

The number of GAT layers determines how many hops of node interactions are captured. Proper tuning of this parameter helps the model learn useful long-range dependencies without losing node-specific information. As shown in Fig. 3(C), the model achieves its best performance with a single GAT layer, as a larger number of GAT layers caused the model to overfit.

The hidden dimension of GAT defines the size of the feature embedding for each node. Tuning this hyperparameter balances the model's expressiveness, allowing it to learn rich, meaningful node representations. As shown in Fig. 3(D), the predictive performance of DeepHeteroCDA is best when the hidden dimension of GAT is 32.

The number of GAT heads determines how many parallel attention mechanisms are used in the graph attention layers. Proper tuning of this hyperparameter helps the model capture diverse relationships between nodes from multiple perspectives. As shown in Fig. 3(E), the model achieves its best performance with 3 GAT heads. This indicates that a suboptimal number of attention heads reduces performance, with the optimal number balancing model complexity against the ability to capture sufficient information.

The number of GCN layers controls how many layers of convolutional operations are applied to the drug molecular graphs, which influences the depth of information aggregation. Tuning this hyperparameter is essential for ensuring that the model captures both immediate-neighborhood and multi-hop dependencies effectively. As shown in Fig. 3(F), the performance of DeepHeteroCDA peaks when the number of GCN layers is 3. This suggests that deeper GCNs improve performance up to a certain point, after which adding layers may cause overfitting.

Experimental analysis under different data conditions

The circRNA–drug sensitivity association scenarios under actual experimental conditions are often diverse. To ensure a comprehensive evaluation of our model’s robustness, we conducted experiments under various data conditions, including different training data sizes and data noise. Additionally, we performed a performance comparison with DPMGCDA.

To evaluate the model’s efficiency in leveraging data with limited circRNA–drug sensitivity associations, we first performed experiments using five-fold cross-validation with varying sizes of training data, where subsets of the training set (e.g. 90%, 80%, 70%, 60%) were randomly selected for training while the test set remained unchanged. As shown in Fig. 4(A)-(C), the performance of both DeepHeteroCDA and DPMGCDA, as measured by AUC, AUPR, and F1 scores, declines as the size of training data decreases. Across all training data sizes, DeepHeteroCDA consistently outperforms DPMGCDA in all three metrics. The results demonstrate that DeepHeteroCDA maintains stable and superior performance across varying scales of training data.
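A minimal sketch of this training-subset protocol, assuming the training associations are held as a list of (circRNA, drug) index pairs (the toy data below is illustrative):

```python
import random

def subsample_training(train_pairs, fraction, seed=0):
    """Randomly keep a fraction of the training associations; the test set is untouched."""
    rng = random.Random(seed)
    k = int(len(train_pairs) * fraction)
    return rng.sample(train_pairs, k)

# Toy circRNA-drug index pairs standing in for the real training fold.
train_pairs = [(c, d) for c in range(10) for d in range(8)]
for frac in (0.9, 0.8, 0.7, 0.6):
    subset = subsample_training(train_pairs, frac)
    print(frac, len(subset))
```

Fixing the random seed per run keeps the subsets reproducible, so both models can be trained on identical reduced folds.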

Figure 4

Performance comparison on three metrics under different data conditions. (A)-(C) compare performance on three metrics under different training data sizes. (D)-(F) compare performance on three metrics under different levels of data noise.

Under real experimental conditions, data noise is often inevitable. To evaluate the models’ sensitivity to data quality under varying noise levels, we further tested the performance of models by altering a portion of unknown associations in the training set of the five-fold cross-validation to known associations (e.g. 10%, 20%, 30%). As shown in Fig. 4(D)-(F), when the data noise level increases from 0% to 30%, DeepHeteroCDA maintains relatively stable performance across all three metrics, while DPMGCDA exhibits a more significant decline. The reason may be that DPMGCDA relies entirely on the circRNA–drug network to learn association information. In contrast, our proposed DeepHeteroCDA not only leverages GAT to dynamically aggregate effective neighbor information but also employs GCN to learn fine-grained drug features independently of the association network. The results demonstrate that DeepHeteroCDA possesses strong generalization capabilities and robustness. To evaluate the performance of DeepHeteroCDA under different circRNA–drug sensitivity association scenarios, we further conducted additional experiments under different positive-negative sample ratios. The results can be found in Supplementary Table S1.
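The noise-injection procedure above (flipping a fraction of the unknown associations in the training set to known ones) can be sketched as:

```python
import random

def inject_label_noise(labels, noise_ratio, seed=0):
    """Flip a fraction of the unknown (0) associations to known (1) to simulate noise."""
    rng = random.Random(seed)
    noisy = list(labels)
    zero_idx = [i for i, y in enumerate(noisy) if y == 0]
    for i in rng.sample(zero_idx, int(len(zero_idx) * noise_ratio)):
        noisy[i] = 1
    return noisy

labels = [0] * 10 + [1] * 5      # toy training labels
noisy = inject_label_noise(labels, 0.3)
print(sum(noisy) - sum(labels))  # 3 of the 10 unknown labels were flipped
```

Only the training fold is corrupted this way; the test fold keeps its original labels so the evaluation itself stays clean.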

Ablation study

To comprehensively evaluate the individual contributions of each component within DeepHeteroCDA, we constructed several variants of DeepHeteroCDA. In contrast to prior methodologies, our DeepHeteroCDA model integrates the drug molecular graph into the circRNA–drug heterogeneous network and utilizes GCN to extract fine-grained drug-specific features.

Here are the specific meanings of each ablation variant:

  • DeepHeteroCDAw/o MG removes the GCN-based mining of topological graph features of drugs and uses only the structural similarity network and association network information of drugs to extract drug embeddings.

  • DeepHeteroCDAw/o GIP removes the GIP kernel similarity of circRNAs and drugs in the model.

  • DeepHeteroCDAw/o SG removes the sequence similarity of circRNAs and structure similarity of drugs in the model.

  • DeepHeteroCDAw/o GAT uses the circRNA and drug embeddings from the feature extraction module instead of the embeddings extracted by GAT.
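The four ablation variants can be expressed as feature-flag configurations, each disabling exactly one component of the full model. The flag names below are illustrative, not identifiers from our codebase:

```python
# Full model: all components enabled.
FULL = {"molecular_graph": True, "gip_similarity": True,
        "seq_struct_similarity": True, "gat": True}

# Each ablation variant switches off a single component.
VARIANTS = {
    "w/o MG":  {**FULL, "molecular_graph": False},
    "w/o GIP": {**FULL, "gip_similarity": False},
    "w/o SG":  {**FULL, "seq_struct_similarity": False},
    "w/o GAT": {**FULL, "gat": False},
}

def disabled_components(cfg):
    """Components switched off relative to the full model."""
    return sorted(k for k, v in cfg.items() if not v)

for name, cfg in VARIANTS.items():
    print(name, disabled_components(cfg))
```

Running the same training pipeline once per configuration isolates each component's contribution to the scores in Table 3.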

In the conducted experiments, as presented in Table 3, DeepHeteroCDAw/o GIP and DeepHeteroCDAw/o SG exhibit lower performance than DeepHeteroCDA. This finding underscores that integrating diverse types of circRNA and drug features within DeepHeteroCDA yields superior results compared with incorporating only a single type of circRNA or drug feature. The performance of DeepHeteroCDAw/o MG is notably inferior to that of DeepHeteroCDA, indicating the pivotal role of the molecular structural features of drugs extracted by the GCN component and underscoring the importance of capturing the nuanced structural characteristics of drugs for predictive accuracy. Furthermore, the performance of DeepHeteroCDAw/o GAT is also worse than that of DeepHeteroCDA. This outcome suggests that the integration of the GAT within DeepHeteroCDA holds value: the GAT module's capability to assign variable weights to each edge, thereby emphasizing significant neighbors through enhanced weights, contributes to the overall performance boost observed in DeepHeteroCDA.

Table 3

Ablation studies under five-fold CV

Methods                  AUC      AUPR     F1-score  Accuracy  Recall
DeepHeteroCDAw/o GIP     0.8920   0.8998   0.8198    0.8217    0.8068
DeepHeteroCDAw/o SG      0.9039   0.9102   0.8229    0.8288    0.7877
DeepHeteroCDAw/o MG      0.9132   0.9198   0.8399    0.8441    0.8119
DeepHeteroCDAw/o GAT     0.9129   0.9202   0.8378    0.8412    0.8102
DeepHeteroCDA            0.9233   0.9293   0.8561    0.8520    0.8807

Visualization of DeepHeteroCDA embeddings

To further illustrate that DeepHeteroCDA can learn the association information between circRNAs and drug sensitivity, we used t-SNE [34], a dimensionality reduction technique for visualizing data in a lower-dimensional space, to project the concatenated circRNA–drug embeddings from the heterogeneous network of DeepHeteroCDA. As shown in Fig. 5, the green points indicate an association between the circRNA and drug that form the concatenated embedding, while the blue points represent no association. As the number of training epochs increases, the green and blue points become more clearly separated, with sharper boundaries between classes and tighter clustering within each class. The visualization results indicate that even without the prediction module, the embeddings derived from the heterogeneous network of DeepHeteroCDA already capture rich association knowledge between circRNAs and drugs.

Figure 5

Visualization of DeepHeteroCDA embeddings across epochs in 5-CV. Two types of nodes represent whether the circRNA and drug in the concatenated embedding are associated. (A) Epoch 1; (B) Epoch 20; (C) Epoch 200.

Case study

To assess DeepHeteroCDA’s capability in predicting novel associations between circRNAs and drugs, we conducted case studies using the independent CTRP database [35]. We trained DeepHeteroCDA on our dataset and predicted the circRNA–drug sensitivity associations related to three drugs, Linifanib, Piperlongumine, and Vorinostat, in the independent CTRP database. The predictions were ranked by their predicted scores, and the top 20 associations for each drug were validated against experimentally confirmed results in the CTRP database.
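The top-20 validation protocol can be sketched as follows, with hypothetical circRNA names and scores standing in for the model's actual predictions:

```python
def top_k_predictions(scores, k=20):
    """Rank candidate circRNAs for one drug by predicted score; return the top k."""
    return sorted(scores, key=lambda item: -item[1])[:k]

def hit_rate(top, verified):
    """Fraction of top-ranked circRNAs confirmed in the reference database."""
    return sum(1 for name, _ in top if name in verified) / len(top)

# Toy example: (circRNA host gene, predicted score) pairs for a single drug.
scores = [("ANXA2", 0.97), ("RIM1", 0.95), ("CALD1", 0.94), ("LINC01089", 0.60)]
top = top_k_predictions(scores, k=3)
print([name for name, _ in top])
print(hit_rate(top, verified={"ANXA2", "RIM1"}))
```

The same ranking is applied per drug; the verified set is drawn from the experimentally confirmed associations in CTRP.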

For Linifanib [36], a multi-target VEGF and PDGFR receptor family inhibitor, the top 20 predicted circRNAs were evaluated. Among these, 17 circRNAs have been verified in the CTRP database, reinforcing the accuracy of DeepHeteroCDA’s predictions (see Table 4).

Table 4

The top 20 circRNAs associated with Linifanib

Rank  circRNA      Verified   Rank  circRNA      Verified
1     ANXA2*       ●          11    COL7A1*      ●
2     RIM1*        ●          12    PKM*         ●
3     CALD1*       ●          13    HSP90B1*     ●
4     VIM*         ●          14    LTBP3*       ●
5     LINC01089    ○          15    COL8A1*      ●
6     COL6A2*      ●          16    KDELR1       ○
7     DCBLD2*      ●          17    KRT7*        ●
8     FBLN1        ○          18    MGAT4B*      ●
9     TAGLN2*      ●          19    HMGA2*       ●
10    CTTN*        ●          20    PYGB*        ●

Note: ‘●’ indicates verified associations from CTRP, ‘○’ indicates non-significant associations. circRNAs marked with ‘*’ are verified.


Regarding Piperlongumine [37, 38], an alkaloid with recognized anti-tumor properties, the top 20 predicted circRNAs were also examined. Among these predicted results, 14 circRNAs have been corroborated by biological experiments in the CTRP database (see Table 5).

Table 5

The top 20 circRNAs associated with piperlongumine

Rank  circRNA      Verified   Rank  circRNA      Verified
1     COL3A1       ○          11    LINC01089*   ●
2     BPTF         ○          12    KRT7*        ●
3     EFEMP1*      ●          13    COL6A1*      ●
4     FBLN1*       ●          14    CALR         ○
5     POLR2A*      ●          15    PTMS*        ●
6     ASPH*        ●          16    FLOT1*       ●
7     SERPINH1*    ●          17    CSRP1        ○
8     LTBP3*       ●          18    MCAM         ○
9     EFEMP2*      ●          19    PLCB3        ○
10    FBN1*        ●          20    WASF1*       ●

Note: ‘●’ indicates verified associations from CTRP, ‘○’ indicates non-significant associations. circRNAs marked with ‘*’ are verified.


Lastly, for Vorinostat [39, 40], an HDAC inhibitor with anti-proliferative effects on various cancer cells, the top 20 predicted circRNAs were assessed. Among these predicted results, 15 circRNAs have been substantiated by experimental evidence in the CTRP database (see Table 6).

Table 6

The top 20 circRNAs associated with vorinostat

Rank  circRNA     Verified   Rank  circRNA      Verified
1     JUP         ○          11    TFAP2A       ○
2     ANP32B      ●          12    CXCL1        ●
3     NOP53       -          13    KDELR2       ●
4     PYGB        ●          14    HNRNPA2B1    ○
5     TAGLN2      ●          15    KDELR1       ●
6     ARID1A      ●          16    SMC1A        ●
7     HSP90B1     ○          17    CTSB         ●
8     PLOD1       ●          18    CALD1        ●
9     FLNB        ●          19    PRRC2A       ●
10    LGALS3BP    ●          20    FN1          ●

Note: ‘-’ indicates unconfirmed associations.


These case studies underscore the strong predictive ability of DeepHeteroCDA in identifying circRNA–drug associations, thereby emphasizing its potential as a valuable tool in circRNA–drug sensitivity prediction. To improve the interpretability of our model, we further conducted SHAP (SHapley Additive exPlanations) [41] analysis on all circRNA–drug samples in our dataset. The results can be found in Supplementary Figs S2 and S3.

Conclusions and discussion

In this work, we present a novel deep learning method, DeepHeteroCDA, for circRNA–drug sensitivity association prediction. We first constructed a heterogeneous network using drug–drug and circRNA–circRNA similarities along with known circRNA–drug sensitivity associations, and utilized GAT to dynamically model the intricate relationships among diverse nodes. By further representing drugs as 2D structures and utilizing GCN to mine drug structural features during the GAT-based information extraction, we achieved multi-scale information propagation in the heterogeneous graph network. Finally, we used a dense neural network to predict circRNA–drug sensitivity associations. Extensive experiments, including five-fold and 10-fold cross-validation, demonstrate that DeepHeteroCDA outperforms existing methods. Experiments under different data conditions further validate the robustness of DeepHeteroCDA. In case studies, the top-ranked circRNAs predicted by DeepHeteroCDA show strong alignment with experimental evidence, further supporting its reliability. These results underscore DeepHeteroCDA’s potential as a powerful tool for advancing circRNA-related research and enhancing drug response predictions.

While the current study primarily focuses on predictive modeling, we recognize the importance of validating these predictions through experimental biology as part of our future work, and we have outlined a workflow for integrating the model’s predictions with experiments. First, we can use our model to rank circRNA–drug sensitivity instances and identify potential associations. These predictions can then guide functional enrichment and pathway analyses to uncover potential mechanisms. Collaborating with experimental biologists, we can design targeted validation experiments based on our model’s predictions, including in vitro assays and in vivo studies. The experimental results can in turn be used to iteratively refine our model, enhancing its accuracy and biological relevance. Finally, we can explore clinical applications, such as biomarkers for personalized medicine, leveraging the validated predictions from our model.

Key Points
  • DeepHeteroCDA is a novel method that predicts circRNA–drug sensitivity associations using a multi-scale heterogeneous network and a graph attention mechanism.

  • We constructed a multi-scale heterogeneous network that can adaptively update the fine-grained features of drug molecules while mining the features of circRNAs and drugs in the heterogeneous network, enabling the model to better explore the complex interactions between circRNAs and drugs.

  • The results of the comparative experiments highlight the superior predictive performance of DeepHeteroCDA. The case studies underscore its strong ability to identify potential circRNA–drug associations.

  • DeepHeteroCDA is a useful bioinformatics tool that can assist biological experiments by filtering valid circRNA–drug sensitivity associations in advance.

Conflict of interest: None declared.

Funding

This research was supported by the National Natural Science Foundation of China (Grant Nos U23A20321 and 62272490) and the Natural Science Foundation of Hunan Province of China (Grant No. 2025JJ20062).

Data availability

The source code and data sets are available at https://github.com/Hhhzj-7/DeepHeteroCDA.

References

1. Chen L-L, Yang L. Regulation of circRNA biogenesis. RNA Biol 2015;12:381–8.

2. Suzuki H, Tsukahara T. A view of pre-mRNA splicing from RNase R resistant RNAs. Int J Mol Sci 2014;15:9331–42.

3. Min S, Xiao Y, Ma J, et al. Circular RNAs in cancer: emerging functions in hallmarks, stemness, resistance and roles as potential biomarkers. Mol Cancer 2019;18:1–17.

4. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004;116:281–97.

5. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100:57–70.

6. Li X, Yang L, Chen L-L. The biogenesis, functions, and challenges of circular RNAs. Mol Cell 2018;71:428–42.

7. Liu Y, Dong Y, Zhao L, et al. Circular RNA-MTO1 suppresses breast cancer cell viability and reverses monastrol resistance through regulating the TRAF4/Eg5 axis. Int J Oncol 2018;53:1752–62.

8. Wang X, Wang H, Jiang H, et al. Circular RNA circ_0076305 promotes cisplatin (DDP) resistance of non-small cell lung cancer cells by regulating ABCC1 through miR-186-5p. Cancer Biother Radiopharm 2023;38:293–304.

9. Ding C, Yi X, Chen X, et al. Warburg effect-promoted exosomal circ_0072083 releasing up-regulates NANGO expression through multiple pathways and enhances temozolomide resistance in glioma. J Exp Clin Cancer Res 2021;40:164.

10. Xie F, Xiao X, Tao D, et al. CircNR3C1 suppresses bladder cancer progression through acting as an endogenous blocker of BRD4/c-Myc complex. Mol Ther Nucleic Acids 2020;22:510–9.

11. Huang Z-A, Pengwei H, Lun H, et al. Toward multilabel classification for multiple disease prediction using gut microbiota profiles. IEEE Trans Neural Netw Learn Syst 2024;1–14.

12. Wang L, Wong L, You Z-H, et al. AMDECDA: attention mechanism combined with data ensemble strategy for predicting circRNA-disease association. IEEE Trans Big Data 2023;10:320–9.

13. Li Q, You T, Chen J, et al. BioDynGraP: biomedical event prediction via interpretable learning framework for heterogeneous dynamic graphs. Expert Syst Appl 2024;244:122964.

14. Wei M, Wang L, Li Y, et al. BioKG-CMI: a multi-source feature fusion model based on biological knowledge graph for predicting circRNA-miRNA interactions. Sci China Inf Sci 2024;67:189104.

15. Chang-Qing Y, Wang X-F, Li L-P, et al. RBNE-CMI: an efficient method for predicting circRNA-miRNA interactions via multiattribute incomplete heterogeneous network embedding. J Chem Inf Model 2024;64:7163–72.

16. Deng L, Liu Z, Qian Y, et al. Predicting circRNA-drug sensitivity associations via graph attention auto-encoder. BMC Bioinform 2022;23:1–15.

17. Yang B, Chen H. Predicting circRNA-drug sensitivity associations by learning multimodal networks using graph auto-encoders and attention mechanism. Brief Bioinform 2023;24.

18. Luo Y, Deng L. DPMGCDA: deciphering circRNA–drug sensitivity associations with dual perspective learning and path-masked graph autoencoder. J Chem Inf Model 2024;64:4359–72.

19. Veličković P, Cucurull G, Casanova A, et al. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.

20. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

21. Hang Ruan Y, Xiang JK, Li S, et al. Comprehensive characterization of circular RNAs in 1000 human cancer cell lines. Genome Med 2019;11:1–14.

22. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 1988;28:31–6.

23. Knox C, Wilson M, Klinger CM, et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res 2024;52:D1265–75.

24. Li J, Zhang S, Liu T, et al. Neural inductive matrix completion with graph convolutional networks for miRNA-disease association prediction. Bioinformatics 2020;36:2538–46.

25. Zhong T, Li Z, You Z-H, et al. Predicting miRNA–disease associations based on graph random propagation network and attention network. Brief Bioinform 2022;23:bbab589.

26. Xiao Q, Haiming Y, Zhong J, et al. An in-silico method with graph-based multi-label learning for large-scale prediction of circRNA-disease associations. Genomics 2020;112:3407–15.

27. Zhenqin W, Ramsundar B, Feinberg EN, et al. MoleculeNet: a benchmark for molecular machine learning. Chem Sci 2018;9:513–30.

28. Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–14. Madison, WI, USA: Omnipress, 2010.

29. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97.

30. Breiman L. Random forests. Mach Learn 2001;45:5–32.

31. Aha DW, Kibler D, Albert MK. Instance-based learning algorithms. Mach Learn 1991;6:37–66.

32. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–94, 2016.

33. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 1997;55:119–39.

34. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9.

35. Rees MG, Seashore-Ludlow B, Cheah JH, et al. Correlating chemical sensitivity and basal gene expression reveals mechanism of action. Nat Chem Biol 2016;12:109–16.

36. Aversa C, Leone F, Zucchini G, et al. Linifanib: current status and future potential in cancer therapy. Expert Rev Anticancer Ther 2015;15:677–87.

37. Bezerra DP, Pessoa C, de Moraes MO, et al. Overview of the therapeutic potential of piplartine (piperlongumine). Eur J Pharm Sci 2013;48:453–63.

38. Kashyap VK, Darkwah GP, Dhasmana S, et al. Novel nanoformulation of piperlongumine for pancreatic cancer therapy. Cancer Res 2022;82.

39. Grant S, Easley C, Kirkpatrick P. Vorinostat. Nat Rev Drug Discov 2007;6:21–2.

40. Banerjee NS, Moore DW, Broker TR, et al. Vorinostat, a pan-HDAC inhibitor, abrogates productive HPV-18 DNA amplification. Proc Natl Acad Sci 2018;115:E11138–47.

41. Lundberg S. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874, 2017.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.