FGCNSurv: dually fused graph convolutional network for multi-omics survival prediction

Abstract Motivation Survival analysis is an important tool for modeling time-to-event data, e.g. to predict the survival time of patient after a cancer diagnosis or a certain treatment. While deep neural networks work well in standard prediction tasks, it is still unclear how to best utilize these deep models in survival analysis due to the difficulty of modeling right censored data, especially for multi-omics data. Although existing methods have shown the advantage of multi-omics integration in survival prediction, it remains challenging to extract complementary information from different omics and improve the prediction accuracy. Results In this work, we propose a novel multi-omics deep survival prediction approach by dually fused graph convolutional network (GCN) named FGCNSurv. Our FGCNSurv is a complete generative model from multi-omics data to survival outcome of patients, including feature fusion by a factorized bilinear model, graph fusion of multiple graphs, higher-level feature extraction by GCN and survival prediction by a Cox proportional hazard model. The factorized bilinear model enables to capture cross-omics features and quantify complex relations from multi-omics data. By fusing single-omics features and the cross-omics features, and simultaneously fusing multiple graphs from different omics, GCN with the generated dually fused graph could capture higher-level features for computing the survival loss in the Cox-PH model. Comprehensive experimental results on real-world datasets with gene expression and microRNA expression data show that the proposed FGCNSurv method outperforms existing survival prediction methods, and imply its ability to extract complementary information for survival prediction from multi-omics data. Availability and implementation The codes are freely available at https://github.com/LiminLi-xjtu/FGCNSurv.


Introduction
Survival analysis for cancer patients, which aims to model the relationship between multiple explanatory variables and survival outcome, has been an interesting and challenging problem in cancer research. With advances in high-throughput sequencing technologies, a great amount of high-dimensional genomic data is available for survival prediction and treatment improvement, and thus it has been promising to understand disease prognosis with these omics data with high dimensionality. Although the prediction of survival outcome resembles logistic regression in machine learning, the main difficulty is that the time-to-event data are often highly skewed and censored (Dey et al. 2022), where the former means a few patients may survive much longer than the median, and the latter means the failure to fully observe the time-to-event value for all patients due to the termination of the study, or missing patients during the study.
Survival analysis methods can be roughly divided into two categories: parametric models (Klein andMoeschberger 2003, Lemeshow et al. 2011) and Cox-based semi-parametric models (Cox 1972, Bradburn et al. 2003. The parametric models assume that the survival time of cancer patient obeys a specific distribution. The Cox proportional hazard (Cox-PH) model makes parametric assumptions about how predictors affect the hazard function, but has no restrictions on the baseline hazard function. Compared with parametric models, the Cox-PH model cannot estimate the hazard rate at each time point but can analyse the effect of covariates on survival (Kleinbaum and Klein 2012). Because the true hazard function is too complex to model in real world scenarios, the Cox-PH model becomes the most popular method. Furthermore, various extensions to Cox-PH model such as regularized Cox models (Park and Hastie 2007, Goeman 2010, Yang and Zou 2013 and deep neural networks supervised by Cox partial likelihood loss (Luck et al. 2017, Katzman et al. 2018, Kvamme et al. 2019) have been successfully proposed and showed promising performance.
Deep learning approaches have been widely used in survival analysis on a single type of omics data such as microRNA expression, gene expression, DNA methylation for survival prediction, basically by feature-based methods and graph-based methods. For example, Yousefi et al. (2017) proposed to use stacked denoising autoencoders to learn a low dimension feature representation of gene expression data for survival analysis. Ching et al. (2018) developed a Cox-nnet algorithm to predict prognosis of cancer patients. Cox-nnet uses hidden node features for dimension reduction and reveals much richer biological information at the gene levels by evaluating the relative importance of specific genes to the prognosis results. Since deep-learning methods based on limited high-dimensional gene expression data are prone to suffer from overfitting issues, Kim et al. (2020) put forward a VAECox algorithm, which pretrained a variational autoencoder on all gene expression data from TCGA and transferred the trained weights to survival prediction model of target cancer to reduce the possibility of overfitting. Recently, graph convolutional network (GCN), which exploits both node attributes and edges information in the form of a graph, has achieved promising performance in many areas including survival prediction. For example, Ling et al. (2022) proposed a survival model namely AGGSurv based on GCNs with geometric graphs directly constructed from high-dimensional RNA sequencing features. AGGSurv used random subsets of features to construct diverse sparse graphs and trained a Ridge-Cox model to learn how to best aggregate the predictions from the GCNs with the sparse graphs.
Multi-omics learning methods are considered to be more promising than single-omics methods since the integration of information from different sources are helpful to understand inter-patient heterogeneity and improve the performance of survival prediction (Chaudhary et al. 2018). According to the fusion strategies, the multi-omics methods can be categorized to feature-based fusion methods, which fuse the features from multiple omics, and graph-based fusion methods, which fuse the connections among samples from multiple omics. Feature fusion strategies enhance predictive performance of survival models by exploring intrinsic relationship of the features across different omics. For example, Cheerla and Gevaert (2019) developed an unsupervised encoder to compress patient multi-omics data into an informative representation for prognosis. Wang et al. (2021a) proposed GraphSurv method which introduced the KEGG pathway association relationship between genes in the form of a graph and used GCN to compress multi-omics data including mRNA, CNV and methylation to new embeddings for survival prediction. The GPDBN method (Wang et al. 2021b) and HFBSurv method (Li et al. 2022) integrated low-dimensional single-omics features by utilizing a bilinear feature encoding module to fully exploit intrinsic inter-modality and intra-modality relationships, and thus could improve the prediction performance. Besides feature fusion, graph-based fusion method GCGCN (Wang et al. 2020) improved the predictive performance of survival models by using SNF algorithm (Wang et al. 2014) to fuse multiple graphs for capturing shared and complementary information among multi-omics data. Although the advantages of multi-omics methods have been shown for survival prediction, it has large space to improve the prediction accuracy. For example, existing feature-based fusion methods may ignore the connections among different samples, while graphbased fusion methods are prone to lose the interactive information between features from multiple omics. It is still challenging to explore how to best utilize deep learning models for multi-omics survival prediction.
In this work, to take full advantages of the complementary information between different omics, we propose a novel multi-omics method for survival prediction namely FGCNSurv, by dually fusing features and graphs simultaneously in GCN. The generative model FGCNSurv learns a factorized bilinear model (FBM) to capture cross-omics features and quantify complex relations from multi-omics data. By fusing single-omics features and their cross-omics features (feature fusion), and simultaneously fusing multiple graphs from different omics (graph fusion), GCN with the generated dually fused graph could capture higher-level features for computing the survival loss in the Cox-PH model. To verify the effectiveness of FGCNSurv approach, we conducted comprehensive experiments on 11 cancer datasets from The Cancer Genome Atlas (TCGA) and the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset, and the results demonstrate that FGCNSurv shows better performance than other comparison methods for survival prediction.

Materials and methods
In this section, we first introduce and describe the problem of survival analysis, and then propose the dually fused GCN for multi-omics survival prediction.

Survival analysis
Survival data are usually given as fx i ; O i ; D i g n i¼1 , where x i , O i , and D i represent the observed features, the observed time-toevent and the censoring indicator for ith patient. Censoring occurs when a patient is lost or the maximum follow-up time is shorter than the survival time. For the ith patient, the observed time-to-event O i ¼ minðT i ; C i Þ is either the true survival time T i for uncensored patients (D i ¼ 1), or the censoring time C i , for censored patient (D i ¼ 0).
The primary objective of survival analysis is to estimate the distribution of survival time T after diagnosis or treatment of a patient. The probability distribution of the survival time T can be characterized by a survival function or a hazard function equivalently. The survival function S(t) represents the probability of an individual surviving beyond time t: where f(s) represents the probability density function. The hazard function, denoted by kðtÞ, is defined as the rate that an event occurs in an infinitesimal interval after time t, given it has not yet occurred at time t: The full likelihood accounting for both censored and uncensored instances is The Cox model is widely used in the survival analysis (Cox 1972) and assumes hazard function has a multiplicative form As the baseline hazard function k 0 ðÁÞ can be an arbitrary nonnegative function of time, one cannot estimate k 0 ðtÞ and b directly from the likelihood function. To deal with this estimation problem, the partial likelihood L cox (Cox 1972) that does not involve the baseline hazard function is proposed to fit the model, In practice, the negative partial log likelihood is often taken as loss function to estimate b: With the estimated b, the hazard function kðtÞ can be further estimated through the Breslow estimator (Darroch 1984).

Dually fused GCN for multi-omics survival prediction (FGCNSurv)
Multi-omics survival prediction aims to learn a survival prediction model based on multi-omics data for n patients and their corresponding observed times and right censoring indicators. In this work, each patient has both gene expression and microRNA expression representations, which are collected as X ð1Þ ¼ ½x n and X ð2Þ ¼ ½x n , respectively, and thus the survival data are collected in set Our FGCNSurv takes a feature fusion strategy based on the FBM, which can capture cross-omics features and quantify complex relations from multi-omics data, and simultaneously fuses multiple graphs from different omics to accurately unveil the neighborhood relationship among patients. Feature fusion tends to aggregate complementary omics information for each patient, while graph fusion tends to share and propagate the complementary neighborhood information among patients from different omics. With the dually fused attributed graph, the GCN could further capture higher-level features for patients to better interchange the complementary information between the fused graph and features for survival prediction. The survival outcome is finally predicted from the node embeddings by GCN. The architecture of our proposed FGCNSurv is presented in Fig. 1. We describe the details for FGCNSurv in the following.

Feature encoding by highway networks
For preprocessed gene expression data X ð1Þ and microRNA expression data X ð2Þ , we learn two three-layer highway networks (Srivastava et al. 2015) to capture features, respectively. The highway network makes use of a LSTM-style sigmoidal gate to control gradient flow between fully connected (FC) layers, as shown below: where H represents an FC layer and T represents a transformation gate with sigmoid activation function. The highway network is commonly used in deep neural networks and simple tasks, to speed up information propagation by skipping layers that do not work well. Accordingly, we could obtain low-dimensional feature representations Z ð1Þ ; Z ð2Þ 2 R nÂl for all patients with gene expression data and microRNA expression data, respectively.

Feature fusion by FBM
Multi-omics fusion for mining complementary information has been successfully introduced in cancer prognosis. Direct feature combination (Huang et al. 2019) or score level fusion (Sahasrabudhe et al. 2021) to integrate data from different omics may not be able to capture the complex inter-omics relations. To address this issue, we employ an FBM to capture cross-omics feature z ðcÞ for each patient from its corresponding single-omics feature z ð1Þ and z ð2Þ . Through the FBM, for any patient, the jth element in its cross-omics representation z ðcÞ is defined as follows: Figure 1. Illustration of the proposed FGCNSurv architecture. FGCNSurv first captures single-omics features by learnable highway networks, and further construct cross-omics features by a learnable FBM model. By fusing single-omics features and cross-omics features, and simultaneously fusing two single-omics graphs, a GCN model is used to further capture high-level features, which are then used to predict survival outcome by Cox-PH model where U j and V j are two factorized matrices with d columns. To obtain the cross-omics vector Z c 2 R nÂm for all patients, the FBM needs to learn two three-order tensors U ¼ ½U 1 ; . . . ; U m 2 R lÂdÂm and V ¼ ½V 1 ; . . . ; V m 2 R lÂdÂm . The fused feature Z for all patients is finally obtained by concatenating the low level fusion feature Z low ¼ Z ð1Þ þ Z ð2Þ , which is the simple sum of omics-specific features, and the cross-omics feature Z ðcÞ , as defined below: where represents the concatenation of two matrices.

Graph fusion
Although feature fusion could capture the comprehensive information from different omics for each single patient, it ignores the relationship among different patients. Our FGCNSurv further fuses multiple graphs constructed from single-omics data, and generates a fused graph which can accurately unveil the neighborhood relationship among patients. Specifically, given data representations fx i g n i¼1 for n patients, we denote qðx i ; x j Þ as the Euclidean distance between x i and x j and use the exponential similarity kernel to construct a k-nn graph with an adjacency matrix A 2 R nÂn as follows: where N i represents k-nearest neighborhood for patient i, d 2 is used to eliminate the scaling problem, which is experimentally set to be the median of all pairwise squared distances among all patients and l is a hyperparameter that can be set to 0.3 for gene-omics graph and 0.2 for microRNA-omics graph.
By the above k-nn graph construction strategy, based on gene expression data X ð1Þ and microRNA expression data X ð2Þ for all patients, we could construct two single-omics graphs A ð1Þ and A ð2Þ , respectively. Then we fuse the two graphs by taking the average of the two adjacent matrices A ð1Þ and A ð2Þ , to obtain a fused graph with adjacent matrix A for survival analysis.

Dually fused GCN for survival prediction
Graph convolutional neural network (GCN) could aggregate and exchange information between neighboring nodes (samples) to obtain embedding for nodes of graph, and thus has been applied successfully in many classification tasks (Kipf andWelling 2016, Veli ckovi c et al. 2017, Gao and Ji 2019). To capture higher-level features for survival prediction, we could further combine the multi-omics fused feature representations Z and the fused graph A by GCN. Given graph A, the convolution matrixÂ in GCN is given byD À 1 2ÃD À 1 2 , wherẽ A ¼ A þ I n and D~¼ diagðd~1; d~2; . . . ; d~nÞ is the degree matrix calculated fromÃ withd i ¼ P jÃ ij . With node representation matrix Z ¼ ½z 1 ; z 2 ; . . . ; z n for n patients and convolution matrixÂ, we could adapt the GCN to learn the higher-level features Z h as shown below: where W 1 ; W 2 are the weight matrices for the graph convolutional layers and f 1 , f 2 are tanh and sigmoid activation functions, respectively. Based on the features z h i , the Cox model kðtjz h i Þ ¼ k 0 ðtÞexpðb T z h i Þ is used to perform survival prediction with parameter b. The loss function of the FGCNSurv model is defined as the negative of the partial loglikelihood: and all the parameters are learnt by minimizing the loss function.
To learn a FGCNSurv model based on multi-omics survival data, we need to learn four modules including highway networks for capturing single-omics features, FBM for further capturing cross-omics features, GCNs for capturing high-level features based on the constructed dually fused graph, and finally Cox-PH model for predicting the survival outcome.

Results
In this section, we evaluated the survival prediction performance of our FGCNSurv method by comparing it with other methods on 11 experiments from two datasets, using different evaluations metrics.

Datasets and preprocessing
The comparison of survival analysis performance was conducted on 11 different cancer datasets from The Cancer Genome Atlas (TCGA) and the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset as follows: • TCGA dataset (Zhu et al. 2014) TCGA dataset includes multi-omics data for over 11 000 samples from 33 most frequent cancer types and has been widely used in survival prediction methods (Sun et al. 2018a, Shao et al. 2020, Tong et al. 2020. We selected top 10 cancer types with respect to the number of uncensored patient, except LUSC and STAD due to the poor performance of single-omics methods on both gene expression and microRNA expression. The 10 selected cancer types from TCGA are listed as follows: breast invasive carcinoma (BRCA), kidney renal clear cell carcinoma (KIRC), lung adenocarcinoma (LUAD), urothelial bladder carcinoma (BLCA), head and neck squamous cell carcinoma (HNSC), brain lower grade glioma (LGG), liver hepatocellular carcinoma (LIHC), ovarian serous cystadenocarcinoma (OV), colon adenocarcinoma (COAD), and skin cutaneous melanoma (SKCM). We downloaded RNA-Seq data for gene expression and miRNA-Seq data for microRNA expression to evaluate the multi-omics survival prediction methods. • PCAWG dataset (Campbell et al. 2018), the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium aggregated whole-genome sequencing data from 2658 cancer patients across 38 tumor types generated by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) projects. We also downloaded RNA-Seq data for gene expression and miRNA-Seq data for microRNA expression to evaluate the multi-omics survival prediction methods.
The summary of the 11 datasets were listed in Table 1. We preprocessed each dataset with gene expression and microRNA expression data using the following procedure. First, similar to prior work (Dhillon and Singh 2020), we removed the genes/microRNAs with missing values over 10%, estimated the missing values by weighted nearest neighbors algorithm and added a pseudocount 1 to all features, followed by a log transformation. Second, we filtered out the low information burden genes/microRNAs whose values remain almost unchanged across samples to reduce the number of noise-sensitive features, by which the top 6000 genes and 600 microRNA features are chosen, respectively. Finally, we standardized the log-transformed data such that each feature has zero mean and unit variance, and then discretized them to multiple categories.

Evaluation metrics
In this study, we evaluated the performance of FGCNSurv in survival prediction with two metrics: C-index and AUC. The C-index is the most commonly used evaluation metric to measure the predictive ability of the model in survival analysis, which estimates the probability that the predicted results are consistent with the observed ground truth results. For any pair of patients, if the predicted survival time of the one with longer survival time is also longer than the predicted survival time of the other, the predicted outcome is consistent with the actual outcome. A pair is not considered if the earlier time in the pair is censored or both events in the pair are censored. The C-index is defined by: Another evaluation metric, AUC, quantifies the quality of the ranking at event-time level and is defined as follows: where Y and num refer to the set of all possible event time duration in the dataset and the cumulative number of comparable pairs computed across all event time duration, respectively, and IðÁÞ is the indicator function. Both the values of C-index and AUC range from 0 to 1, and a higher value of C-index or AUC indicates better model prediction performance and vice versa.

Evaluation of survival prediction performance
In our experiments, we compared our FGCNSurv method with existing traditional and deep learning-based survival prediction methods on 11 cancer datasets obtained from TCGA and PCAWG dataset. The comparison methods consist of five single-omics methods including RSF ( To comprehensively evaluate our survival prediction method, we adopt repeated holdout cross-validation following Ching et al. (2018). In particular, the dataset was randomly partitioned into 80% training set and 20% testing set. We trained the prediction models by different methods on training set, and then calculated the C-index and AUC by survival prediction for the test patients. We repeated the process for 20 times and reported the mean value and standard deviation of the C-indices and AUCs for each method. For our method FGCNSurv, the dimensions l and m, i.e. the dimensionality of the feature vector Z low of low-level fusion and the cross-omics features Z ðcÞ , were set to 50 and 10, respectively. The neighborhood size k for graph construction was selected by cross-validation on the training data from the set f2; . . . ; 15g. We trained the model with the stochastic gradient descent algorithm Adam optimizer with learning rate 2eÀ4.
We first reported the C-index and AUC values for the breast cancer dataset BRCA from TCGA by different prediction methods in Table 2. Deep learning-based approaches could generally obtain better performance than traditional methods, by only using single-omics data. For example, the deep single-omics method DeepSurv could obtained much higher C-index value than the traditional single-omics method En-Cox, with significant improvement on gene expression and microRNA expression data, respectively. Another deep single-omics method AGGSurv tends to achieve more competitive performance than DeepSurv for either omics, probably because the graph embedding used in AGGSurv could exploit the connections among the samples. More importantly, the multi-omics prediction methods including GCGCN, HFBSurv and our FGCNSurv perform better than single-omics methods, with respect to both C-index and AUC values. Our FGCNSurv could obtain the highest C-index and AUC value among all the methods. These results show the advantages of the FGCNSurv and imply that it could deeply capture the complementary information from multi-omics data for survival prediction.
We also reported the prediction results for the other 10 experiments, including nine TCGA cancer datasets and the PCAWG dataset, with the C-indices in Table 3. As can be seen from the table, our method outperformed existing  The best results are shown in bold.
Dually fused graph convolutional network methods on the 7 of 10 TCGA cancer datasets and the PCAWG dataset, and achieved the second-best C-index values for the other three TCGA cancer datasets. Furthermore, for LIHC dataset, FGCNSurv has a significant improvement over the second-best method. Overall, the results on the total 11 datasets show the advantage of FGCNSurv for predicting survival of different cancer patients with multi-omics data.
Similarly to C-indices, FGCNSurv could also obtain better AUCs than other methods, which were skipped here due to the space limitation.
To further evaluate the performance of FGCNSurv, we performed the logrank test to test whether the patients in predicted high risk group and predicted low risk group have significantly different survival curves. Specifically, we divided the patients in the BRCA dataset into a high risk group and a low risk group based on their predicted hazard ratios, by using the threshold of the median hazard ratios. We then conducted the log-rank test to test whether the ground truth survival time of the samples in the two group are significantly different. The more significant is the difference between the two groups, the better is the prediction method. In Fig. 2, we plotted the Kaplan-Meier curves for the predicted high risk group and low risk group obtained by all methods based on the breast cancer dataset, and also reported the log-rank P-values. We can see that, for single-omic methods, deep learning method DeepSurv could obtain better results than traditional methods En-Cox and RSF for either omics. For example, DeepSurv obtained more significant log-rank P-values of 2.82eÀ13 and 1.69eÀ04 than En-Cox (4.98eÀ02 and 4.01eÀ02) and RSF (6.57eÀ03 and 1.86eÀ02) for gene expression and microRNA expression, respectively. This implies that DeepSurv could produce better survival predictions with more significant differences between the high risk group and low risk group. Furthermore, multi-omics methods including HFBSurv and our FGCNSurv could obtain even more significant results than the above two deep learning based singleomics methods DeepSurv and AGGSurv. Furthermore, our proposed FGCNSurv generated the most significant P-value of 7.81eÀ16, among all the four muti-omics methods, and this shows that our FGCNSurv could obtain the best survival prediction with more significant differences between the high risk group and low risk group, which reflects the effectiveness of the proposed method in survival prediction.

Univariate and multivariate Cox proportional hazards analysis
To evaluate whether the survival prediction performance of FGCNSurv is associated with five clinicopathologic factors of patients, including age at diagnosis, histologic grade, tumor size (T stage), lymph node invasion (N stage), and metastatic spread (M stage), we classified the BRCA patients to a high risk group and a low risk group by the predicted risks from the test sets, and then took the predicted risk as the sixth factor besides the above five. We then performed univariate and multivariate Cox proportional hazards analysis on the six factors including our predicted risk and clinicopathologic factors (Yu et al. 2019). Table 4 reports the hazard ratio by using different factors in univariate and multivariate analysis. As shown in Table 4, two factors of histologic grade and the predicted risk show significant association with survival in both univariate and multivariate Cox proportional analysis, and our predicted risk shows the most evident hazard for survival among all the six factors. The multivariate analysis implies that our risk factor could retain strong and significant independent prognostic factor when correcting for other clinicopathologic variables. The above analysis demonstrates that our FGCNSurv has great predictive power for survival.

Ablation study and parameter robustness
To verify the contribution of our dual fusion strategy in FGCNSurv for multi-omics survival prediction, we conducted ablation experiments on BRCA dataset by changing different parts in our model configuration. We adopted the following six different configurations to evaluate each component of the proposed method: • GCN: GCN on the single-omics graph with patient features obtained from highway networks. • LOW-FF: only using low-level feature fusion, i.e. the simple sum of omics-specific features Z ð1Þ and Z ð2Þ , for loss construction, without FBM and GCN modules. • FBM-FF: only using the cross-omics features Z ðcÞ by FBM for loss construction, without GCN module. • HIGH-FF: using high-level feature fusion by concatenating the low-level features Z low and the cross-omics features Z ðcÞ , without GCN module. • w/o Graph: removing the graph from the whole network structure, and replacing the graph by an identity matrix in GCN. • w/o FBM: removing FBM module from the whole network structure, and replacing Z by the low-level features Z low .
We performed the above six variants of FGCNSurv by changing different parts on the BRCA dataset. GCN is singleomics methods, LOW-FF, FBM-FF, and HIGH-FF are feature Table 3. Performance comparison of FGCNSurv and existing methods using C-index values on 9 TCGA datasets and PCAWG dataset. The best and second best results are highlighted in bold and underlined respectively. fusion methods using different fusion strategies without GCN, w/o graph is FGCNSurv without graph, i.e. the highlevel feature fusion followed by a fully connected network, and w/o FBM is FGCNSurv without FBM, i.e. removing FBM from the FGCNSurv. For a fair comparison, we used the same neural networks in the above six configurations to extract the features of each omics.

C-index
We reported the C-index and AUC values for different configurations in ablation study of FGCNSurv for BRCA dataset in Table 5, and other datasets in Supplementary Table 1. We can see that from Table 5 that, even without GCN module, our feature fusion strategy HIGH improves the simple fusion LOW and FBM. It can be also seen that FGCNSurv outperforms w/o FBM with respect to both C-index and AUC values, which reflects the importance of cross-omics features in our dual fusion strategy. Furthermore, the C-index and AUC values of FGCNSurv both have improvement over w/o Graph, implying the effectiveness of our graph-fusion strategy. The ablation study shows that the graph fusion strategy and factorized bilinear module are two important factors for the performance improvement of FGCNSurv.
We also plotted the corresponding Kaplan-Meier curves of the predicted results by the above approaches in Fig. 3, similar to Section 3.3. We can see that FGCNSurv gives the most  Dually fused graph convolutional network significant P-value of 7.81eÀ16 among all configurations in ablation experiments. The results show that FGCNSurv enables better classification of patients into low and high risk groups with remarkably better stratification, and imply the dual fusion strategy can explore the complementary information of microRNA and gene data for prognosis prediction of breast cancer. Although the only hyperparameter k in FGCNSurv, i.e. the neighborhood size for constructing k-nn graph, could be selected by cross-validation on training data, to show its robustness in a relatively large interval, we trained the FGCNSurv with different k values from 2 to 15, and reported the resulting C-indices for BRCA dataset by FGCNSurv in Fig. 4 and other datasets in Supplementary Fig. 1. We can see that, although the C-index obtained by FGCNSurv changes when k varies, it tends to be robust, especially when k is from 8 to 15, and furthermore, FGCNSurv could always obtain better C-indices than other comparison methods, with different k-values in a relatively large interval. This implies the robustness of the selection of k in real applications.

Conclusion
In this study, we proposed a novel cancer survival prediction method FGCNSurv by dually fused GCN. Different from the existing feature fusion or graph fusion methods for multiomics survival prediction, FGCNSurv takes a strategy of both feature fusion and graph fusion to capture the complementary information across different omics. The experiments on 11 cancer datasets show its advantages for survival prediction. Although FGCNSurv has obtained the best predictive performance, survival prediction is still a challenging problem and there is large space to improve the accuracy. For example, the prediction accuracy might be improved by introducing other omics data such as DNA methylation or pathological images, which have been reported to be helpful for cancer prognosis prediction. Besides, proportional hazard model in our FGCNSurv approach assumes that the patient risk does not vary over time, which is too restrictive in real case. It is promising to further improve our framework by jointly predicting the time of event and its rank in the Cox partial log-likelihood framework based on multi-task learning.