Gang Wen, Limin Li, Federated transfer learning with differential privacy for multi-omics survival analysis, Briefings in Bioinformatics, Volume 26, Issue 2, March 2025, bbaf166, https://doi.org/10.1093/bib/bbaf166
Abstract
Multi-omics data often suffer from the “big |$p$|, small |$n$|” problem where the dimensionality of features is significantly larger than the sample size, making the integration of multi-omics data for survival analysis of a specific cancer particularly challenging. One common strategy is to share multi-omics data from other related cancers across multiple institutions and leverage the abundant data from these cancers to enhance survival predictions for the target cancer. However, due to data privacy and data-sharing regulations, it is challenging to aggregate multi-omics data of related cancers from multiple institutions into a centralized database to learn more accurate and robust models for the target cancer. To address the limitation, we propose a multi-omics survival prediction model with self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach enables the learning of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging multi-omics data of related cancers distributed across multiple institutions while preserving individual privacy. Results from the comprehensive experiments on real-world datasets show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of multi-omics survival prediction model for a target cancer while avoiding the direct sharing of multi-omics data for related cancers.
Introduction
Survival analysis, which aims to understand the relationship between covariates and survival time, has been widely used in fields such as economics, finance, engineering, and medicine. In contrast to regression analysis, the main challenge of survival analysis is that time-to-event data often exhibit significant skewness and censoring [1]. In clinical practice, survival analysis has historically relied on low-dimensional patient features, such as age, gender, and tumor T/N/M stage [2]. Advancements in high-throughput sequencing technology have led to the availability of large volumes of high-dimensional omics data, which are increasingly used to predict survival and improve treatment in cancer patients. Despite the potential of high-throughput omics data to improve the understanding of disease prognosis, their high dimensionality presents new difficulties for survival analysis, particularly in multi-omics datasets with limited training data. An effective strategy is to share patient multi-omics data of related cancers across multiple institutions and leverage the abundant data of related cancers to improve survival predictions for target cancers. However, data privacy and data-sharing regulations pose both challenges and opportunities in aggregating multi-omics datasets of related cancers from various institutions into a centralized repository to develop more generalizable models for the target cancer.
One common method for survival analysis is the Cox proportional hazards (Cox-PH) model [3]. The Cox-PH model assumes that the ratio of hazards between individuals remains constant over time, and it has become the most popular method. Deep learning has excelled in various tasks [4, 5] and is widely applied in survival analysis [6, 7]. For example, DeepSurv [6] combines a deep neural network with the Cox regression model to predict survival for breast cancer patients and outperforms traditional linear and nonlinear machine learning methods. In contrast to DeepSurv, DeepHit [7] uses a deep neural network to directly model the distribution of survival time. DeepHit avoids assuming a specific form for the underlying stochastic process and can capture the time-dependent influence of covariates on survival.
Advancements in high-throughput sequencing technology make large volumes of omics data available for survival prediction. Survival analysis methods based on omics data can be categorized into single-omics methods [8, 9] and multi-omics methods [10–12]. Single-omics methods conduct survival prediction with a single type of biological data, which can reveal the impact of a specific biological feature on survival outcomes. For example, Cox-nnet [8] and CoxPASNet [9] use multilayer networks to map gene expression data into a low-dimensional space for survival prediction. By scoring the features, Cox-nnet can identify the relative importance of specific genes associated with patient survival and reveal useful information regarding biological functions related to prognosis. In contrast to Cox-nnet, CoxPASNet provides explicit model interpretation, can capture the nonlinear and hierarchical mechanisms of biological pathways, and can identify important prognostic factors associated with patient survival. Multi-omics methods offer a more comprehensive understanding of cancer heterogeneity and complexity than single-omics methods by capturing complementary information from multiple omics, and they have shown superior potential for survival prediction. For example, GraphSurv [12] introduces the gene relationships in KEGG pathways as an interpretable graph constraint and compresses multi-omics data, including gene expression, copy number variation, and DNA methylation, with a GCN module to obtain more informative embeddings for survival prediction.
High-throughput omics data often suffer from the “big |$p$|, small |$n$|” problem, in which the dimensionality of features is significantly larger than the sample size, and deep learning methods are prone to overfitting in this situation. An effective strategy is to share multi-omics data from other related cancers across multiple institutions and leverage the information from the abundant data of related cancers to improve survival predictions for target cancers. The Transfer-Cox method [13] transfers useful knowledge from the source domain to the target domain by learning shared representations across the two domains, thus potentially improving survival prediction on target cancer datasets. In contrast to Transfer-Cox, Kim et al. [14] put forward a transfer learning model, called VAECox, for the setting where no survival time information is available for patients in the source domain. VAECox first trains an unsupervised VAE model on gene expression data from multiple cancers, then initializes the weights of the survival prediction model for the target cancers with the VAE model parameters. Compared with knowledge transfer methods [13, 14] based on single-omics data, leveraging multi-omics data from related cancers to improve survival predictions for specific cancers is more challenging. Cho et al. [15] propose learning prior knowledge across various training tasks of related cancers with a meta-learning strategy to enhance the performance of multi-omics survival analysis for the target cancer.
In all machine learning methods, increasing exposure to data diversity by aggregating sample data from multiple institutions into a centralized database can help develop more accurate and robust models. However, data centralization is challenging because data owners often cannot share data across institutions due to privacy concerns and incompatible data-sharing agreements. To mitigate these challenges, federated learning [16–18] offers a means for algorithms to learn from data distributed across institutions without exposing sensitive patient data beyond institutional firewalls. Currently, there are two main modes of implementing federated learning: master-server and peer-to-peer. In master-server mode, each node (participating institution) trains model parameters only on local data; the master server aggregates model parameters from the different nodes and then sends the updated model parameters back to each local node for the next iteration. In peer-to-peer mode, each node transfers its locally trained parameters to all other nodes for the next federated round. Although the nodes never share patient data, leaks or attacks on model specifics may indirectly expose sensitive patient information, because parts of the training data can be reconstructed from gradients [19, 20] or by inversion of model parameters [21, 22]. A popular strategy to address this problem is differential privacy [23]. Differential privacy reduces individually identifiable information while preserving the global distribution of the data by adding a certain level of noise to the model parameters.
Recently, federated learning has been applied to survival analysis to address data silos and privacy protection in this field. Andreux et al. [24] argue that the traditional Cox-PH loss, which is non-separable with respect to the samples, does not fit into the federated learning framework, and they propose federated survival analysis with a discrete-time Cox model. Compared with the Cox-PH model, the discrete-time Cox model is more efficient and more amenable to the federated learning setting. Lu et al. [25] introduce privacy-preserving federated learning for survival analysis based on pathological image data and develop accurate weakly supervised survival prediction models from distributed data silos without direct data sharing. In contrast to these horizontal federated learning methods [24, 25], Wang et al. [26] propose an adaptive vertical federated learning framework (AFEI), which assumes that each data site holds different omics features of the same set of samples and integrates the multi-omics features distributed across multiple institutions using data encryption techniques for cancer prognosis prediction. Although federated learning has achieved some success in survival analysis, it remains challenging to effectively leverage multi-omics data from related cancers distributed across multiple institutions to develop more accurate and robust models for specific cancers with limited training data.
In this work, we propose a multi-omics survival prediction model with self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach enables the development of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging related cancer multi-omics data distributed across multiple institutions while preserving individual privacy. Specifically, federated transfer learning for MOSAHit captures common features across different cancers distributed among multiple institutions through the parameter sharing mechanism, to help learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of local target cancers. Results from the comprehensive experiments on multiple real-world datasets show that the proposed method can effectively alleviate data insufficiency and significantly improve generalization performance of multi-omics survival prediction model on target cancer while avoiding the direct sharing of multi-omics data from related cancers across multiple institutions.
Method: Federated transfer learning with differential privacy for multi-omics survival analysis
Multi-omics features |$X = \{X^{(1)},\cdots ,X^{(v)}\}$| and clinical data (observed time |$O$| and censoring status |$\Delta $|) of patients constitute multi-omics survival data, where |$v$| denotes the number of omics. In general, we assume that the patient’s actual survival time |$T$| is independent of the censoring time |$C$|. |$\Delta _{i}=1$| indicates that the observed time |$O_{i}$| of patient |$i$| is equal to its actual survival time |$T_{i}$|, and the death event occurs before the censoring event. Conversely, |$\Delta _{i}=0$| signifies that the observed time |$O_{i}$| corresponds to the censoring time |$C_{i}$|.
Multi-omics data often suffer from the “big |$p$|, small |$n$|” problem, making the integration of multi-omics data for survival analysis of specific cancers particularly challenging. One common strategy is to transfer knowledge from the plentiful multi-omics data of related cancers across multiple institutions to improve survival predictions of target cancers. Due to privacy concerns, institutional policies, or incompatible data-sharing agreements, one often has difficulty aggregating related cancer multi-omics data from multiple institutions into a centralized database to develop more generalizable models for the target cancer. To solve the problems of data silos and privacy protection in multi-omics survival analysis, we propose a multi-omics survival prediction model with self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach leverages knowledge from the abundant multi-omics data of related cancers distributed across multiple institutions through the parameter sharing mechanism, to help learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of local target cancer with limited training data while preserving individual privacy. In this section, we first introduce survival analysis method with multi-head self-attention mechanism (MOSAHit), and then describe how to train the model parameters using federated transfer learning with differential privacy.
Multi-omics survival analysis model with self-attention mechanism
Integrating multi-omics data offers valuable insights into the heterogeneity and complexity of cancer, demonstrating promising potential for improving the prediction of survival outcomes. In this section, we propose a novel multi-omics survival analysis method, namely MOSAHit. The architecture of our proposed MOSAHit method is presented in Fig. 1. For preprocessed gene expression data |$X^{(1)}$| and microRNA expression data |$X^{(2)}$|, we learn two three-layer fully connected networks that map the high-dimensional features |$X^{(1)}$| and |$X^{(2)}$| into the same embedding space to alleviate differences in statistical properties across omics. Specifically, the three fully connected layers of each network contain 1000, 500, and 500 neurons, respectively, with the feature embeddings for gene expression and microRNA expression data denoted as |$Z^{(1)}=[z^{(1)}_{1},\cdots ,z^{(1)}_{n}]^{T}$| and |$Z^{(2)}=[z^{(2)}_{1},\cdots ,z^{(2)}_{n}]^{T}$|. Learning feature interactions is a fundamental problem in multi-omics data integration. A very popular approach is the Factorized Bilinear Machine [27], which has shown its superiority in cancer survival prediction by capturing and quantifying the complex relationships across multimodal data [28]. In this study, we learn interaction features among multiple omics for survival prediction with a multi-head self-attention mechanism [29]. The self-attention module can learn long-distance contextual dependencies and has been successfully applied to machine translation [29] and text classification [30] in natural language processing. The attention mechanism essentially learns a set of weight coefficients to highlight important regions in the data while suppressing irrelevant areas. The principle of the attention mechanism can be intuitively explained by human visual cognition: when viewing an image, the brain ignores irrelevant information and focuses on the parts of the image that are relevant to judgment.

Figure 1. Illustration of the proposed MOSAHit architecture. The attention mechanism helps determine which multi-omics features should be combined to form meaningful combinatorial features. The multi-head self-attention mechanism maps the low-dimensional feature representation of multi-omics data into different subspaces and learns multiple meaningful combinatorial features separately. MOSAHit integrates multiple meaningful feature combinations to directly predict the probability distribution of survival time.
For the integration of multi-omics data, attention mechanism helps us determine which multi-omics features should be combined to form meaningful combinatorial features. Specifically, given the feature embeddings |$Z^{(1)}$| and |$Z^{(2)}$| of gene and microRNA, we first learn the vector representations |$\{Q^{(1)}=[q^{(1)}_{1},\cdots ,q^{(1)}_{n}]^{T}, K^{(1)}=[k^{(1)}_{1},\cdots ,k^{(1)}_{n}]^{T}, V^{(1)}=[v^{(1)}_{1},\cdots ,v^{(1)}_{n}]^{T}\}$| for gene and |$\{Q^{(2)}=[q^{(2)}_{1},\cdots ,q^{(2)}_{n}]^{T}, K^{(2)}=[k^{(2)}_{1},\cdots ,k^{(2)}_{n}]^{T}, V^{(2)}=[v^{(2)}_{1},\cdots ,v^{(2)}_{n}]^{T}\}$| for microRNA in query, key, and value space by the parameter matrices |$\{W^{Q}, W^{K}, W^{V}\}$| as follows:
|$$Q^{(u)} = Z^{(u)}W^{Q},\quad K^{(u)} = Z^{(u)}W^{K},\quad V^{(u)} = Z^{(u)}W^{V},\quad u = 1, 2.$$|
We then obtain feature representations |$\{Z^{(1,c)}=[z^{(1,c)}_{1}, \cdots , $| |$z^{(1,c)}_{n}]^{T}, Z^{(2,c)}=[z^{(2,c)}_{1}, \cdots , z^{(2,c)}_{n}]^{T}\}$| by weighted sum of values |$\{V^{(1)}, V^{(2)}\}$| as follows:
|$$z^{(u_{1},c)}_{i} = \sum _{u_{2}=1}^{2}\hat{\alpha }^{u_{1},u_{2}}_{i}\, v^{(u_{2})}_{i},\quad u_{1} = 1, 2,$$|
with weights being |$\hat{\alpha }^{u_{1},u_{2}}_{i} = \frac{exp(\alpha ^{u_{1},u_{2}}_{i})}{\sum _{u_{2}}exp(\alpha ^{u_{1},u_{2}}_{i})}$|, where |$\alpha ^{u_{1},u_{2}}_{i} = \frac{\langle \mathbf{q}^{(u_{1})}_{i}, \mathbf{k}^{(u_{2})}_{i} \rangle }{\sqrt{d}},$| |$d$| is the dimension of |$v^{(1)}_{i}$|, and |$\langle \cdot , \cdot \rangle $| denotes inner product operation. Here the weight is computed by scaled dot-product attention and defines the correlation between gene features and microRNA features.
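The scaled dot-product weighting described above can be illustrated for a single patient with a minimal, dependency-free sketch. The function names and the toy query, key, and value vectors are hypothetical and chosen only for illustration; in MOSAHit these vectors would come from the learned projections of the gene and microRNA embeddings, and the sketch covers one attention head only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_omics_attention(queries, keys, values):
    """Combine per-omics value vectors for ONE patient (single head).

    queries, keys, values: lists holding one vector per omics type
    (index 0 = gene, index 1 = microRNA in this toy setup).
    Returns the combined vectors and the attention weights.
    """
    d = len(values[0])  # dimension used for the 1/sqrt(d) scaling
    combined, weights = [], []
    for q in queries:  # u1 ranges over omics types
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)  # normalised over u2
        weights.append(w)
        combined.append([sum(w[u2] * values[u2][j] for u2 in range(len(values)))
                         for j in range(d)])
    return combined, weights

# Toy 2-dimensional query/key/value vectors (illustrative values only).
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
combined, weights = cross_omics_attention(q, k, v)
```

Each row of `weights` sums to one, so every combined vector is a convex combination of the gene and microRNA value vectors.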
To obtain multiple meaningful feature combinations, we learn distinct combinatorial features |$\{Z^{(1,c_{1})},Z^{(2,c_{1})}\},\{Z^{(1,c_{2})},$| |$Z^{(2,c_{2})}\}$| separately using multiple heads. We concatenate learned combinatorial features by
and compress them using a fully connected network to comprehensive multi-omics representations |$Z=[z_{1},\cdots ,z_{n}]^{T}$| as follows:
where |$W_{i}, f_{j} (i=0,1,2; j=1,2)$| are the parameter matrices and activation function of the fully connected network, respectively, and |$\oplus $| represents the concatenation of two matrices.
Since the Cox proportional hazards (Cox-PH) model, which optimizes parameters by learning a loss function that reflects the relative risk among patients, does not align with existing federated learning frameworks [24], we instead directly model the distribution of survival time [7] in this study. We denote the maximum time horizon as |$T_{max}$|, and partition the time interval |$[0, T_{max}]$| into |$R$| sequential sub-intervals |$\{\mathcal{T}_{1},\cdots ,\mathcal{T}_{R}\}$|. Similar to a classification problem, we map the multi-omics feature representation |$z_{i}$| of patient |$i$| to a probability distribution |$y_{i} = [y_{i,1}, \cdots , y_{i,R}]$| over the |$R$| sub-intervals using a softmax layer, where |$y_{i,r}$| represents the probability of death for patient |$i$| during |$r$|th time sub-interval |$\mathcal{T}_{r}$|. Accordingly, for patient |$i$| in the training dataset, the observation time |$O_{i}$| is associated with an integer |$r_{i}$| such that |$O_{i}\in \mathcal{T}_{r_{i}}$|. To learn the probability distribution of the first hitting time over |$\{\mathcal{T}_{1},\cdots ,\mathcal{T}_{R}\}$|, we define a loss function |$L_{1}$| by
|$$L_{1} = -\sum _{i=1}^{n}\left [\Delta _{i}\log y_{i,r_{i}} + (1-\Delta _{i})\log \left (1-\hat{F}(r_{i}|z_{i})\right )\right ],$$|
where cumulative incidence function |$\hat{F}(r_{i}|z_{i})=\sum _{k=1}^{r_{i}}y_{i,k}$| represents the probability of death event during or before the sub-interval |$\mathcal{T}_{r_{i}}$| conditional on covariates |$z_{i}$|. The first term of |$L_{1}$| captures information on deaths at specific time points from uncensored patients, while the second term addresses the censoring bias by maximizing the probability of death for censored patients after their respective censoring time.
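As a hedged illustration of the two terms of |$L_{1}$|, the sketch below evaluates a DeepHit-style negative log-likelihood on toy softmax outputs. The function name, the small constant `eps`, and the choice to sum rather than average over patients are assumptions made for this example, not details stated in the text.

```python
import math

def log_likelihood_loss(probs, bins, events, eps=1e-12):
    """Negative log-likelihood over discretised survival times.

    probs[i]  : R probabilities (softmax output) for patient i
    bins[i]   : 1-based index r_i of the sub-interval containing O_i
    events[i] : 1 if death was observed, 0 if the patient was censored
    eps       : guard against log(0); an assumption of this sketch
    """
    loss = 0.0
    for y, r, delta in zip(probs, bins, events):
        if delta == 1:
            # uncensored: probability mass placed on the observed interval
            loss -= math.log(y[r - 1] + eps)
        else:
            # censored: probability of surviving beyond interval r_i
            cif = sum(y[:r])  # cumulative incidence up to and including r_i
            loss -= math.log(1.0 - cif + eps)
    return loss

toy_probs = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
toy_loss = log_likelihood_loss(toy_probs, bins=[1, 2], events=[1, 0])
```

In this toy case the first patient contributes |$-\log 0.5$| and the censored second patient contributes |$-\log (1-0.3)$|.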
Meanwhile, a ranking loss function |$L_{2}$| is used to penalize incorrect ordering of pairs. The ranking loss |$L_{2}$| is defined as follows:
|$$L_{2} = \sum _{i\neq j} I(\Delta _{i}=1,\, O_{i}<O_{j})\exp \left (-\frac{\hat{F}(r_{i}|z_{i})-\hat{F}(r_{i}|z_{j})}{\mu }\right ),$$|
where |$I$| denotes the indicator function and |$\mu $| is a hyperparameter that can be set to 0.1 experimentally. The ranking loss enforces the criterion that a patient who dies at time |$O_{i}$| should be assigned a higher predicted death risk at time |$O_{i}$| than a patient whose survival time extends beyond |$O_{i}$|, i.e. |$O_{j}>O_{i}$|. The ranking loss ensures that the predicted survival outcome preserves the partial ranking information inferred from the training samples.
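The ranking criterion can be sketched as follows, assuming the exponential penalty used in DeepHit [7] with scale |$\mu $|. The function names and toy inputs are hypothetical, and the exact pair-weighting used by the authors may differ.

```python
import math

def cif(y, r):
    """Cumulative incidence: probability of death in sub-intervals 1..r."""
    return sum(y[:r])

def ranking_loss(probs, bins, events, times, mu=0.1):
    """Penalise pairs (i, j) where patient i died before patient j's
    observed time but was not assigned a higher risk at i's interval."""
    loss = 0.0
    n = len(probs)
    for i in range(n):
        if events[i] != 1:
            continue  # only observed deaths anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                diff = cif(probs[i], bins[i]) - cif(probs[j], bins[i])
                loss += math.exp(-diff / mu)
    return loss

# Patient 0 dies early (interval 1); patient 1 is observed much longer.
times, bins, events = [100, 400], [1, 3], [1, 0]
good = ranking_loss([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]], bins, events, times)
bad = ranking_loss([[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]], bins, events, times)
```

A correctly ordered pair yields a small penalty (`good`), while the reversed ordering is penalised heavily (`bad`).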
To accurately learn the probability distribution of patient survival time, we combine loss |$L_{1}$| with loss |$L_{2}$|. The total loss of our MOSAHit model is shown below:
|$$L = L_{1} + \lambda L_{2},$$|
where |$\lambda $| balances the two terms |$L_{1}$| and |$L_{2}$|, and is set to 0.1 in our experiments. Note that the softmax output layer consists of 30 neurons, and due to the high skewness of time-to-event data, the time intervals defined by the neurons are different. For the first 20 neurons, we search the time interval from {60, 90, 120} (days), and for the last 10 neurons, we set the time interval to 180 days.
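The unequal time grid can be made concrete with a short sketch that builds the 30 interval boundaries and maps an observed time to its interval index |$r_{i}$|. The half-open interval convention and the clamping of times beyond the horizon into the last interval are assumptions of this example.

```python
def bin_edges(first_width, n_first=20, last_width=180, n_last=10):
    """Boundaries (in days) of the sub-intervals: n_first intervals of
    first_width days followed by n_last intervals of last_width days."""
    edges = [0]
    for _ in range(n_first):
        edges.append(edges[-1] + first_width)
    for _ in range(n_last):
        edges.append(edges[-1] + last_width)
    return edges

def time_to_bin(t, edges):
    """1-based index r with edges[r-1] <= t < edges[r].
    Times at or beyond the horizon are clamped to the last interval
    (an assumption of this sketch)."""
    for r in range(1, len(edges)):
        if t < edges[r]:
            return r
    return len(edges) - 1

edges = bin_edges(first_width=90)  # 90 is one of the searched widths
horizon = edges[-1]
```

With a 90-day width for the first 20 intervals, the horizon is 20 × 90 + 10 × 180 = 3600 days.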
Federated transfer learning with differential privacy for MOSAHit method
A generalizable MOSAHit model could be well learned when sufficient training data samples are available for a specific target cancer, but multi-omics data often suffer from the “big |$p$|, small |$n$|” problem, making MOSAHit method prone to overfitting in the specific cancer. Transfer learning approaches can help learn the MOSAHit model for the target cancer via leveraging knowledge from extensive data of related cancers, thereby reducing its reliance on large volumes of labeled data specific to target cancers. However, these approaches could not be directly used when the datasets are distributed across different institutions without data sharing agreement. Due to privacy concerns, developing a generalizable MOSAHit model is often challenging when relying on data silos from multiple institutions instead of a centralized data repository. To integrate data silos from different institutions while preserving privacy, we propose a federated transfer learning framework with differential privacy to learn our multi-omics model MOSAHit. The architecture of the proposed framework is presented in Fig. 2.

Figure 2. Overview of federated transfer learning with differential privacy for the MOSAHit method. Each node trains its own model on local data, adds random noise to the weight parameters, and uploads the values of the trainable model parameters to the hub server at a consistent frequency. Once the hub server receives all parameters, it adopts a weighted average strategy to update the parameters of the global model (|$\theta = \sum _{k=1}^{4} w_{k}\theta _{k}$|, where |$w_{k}$| denotes the weight of local model |$k$|) and sends the new parameters back to each node for synchronization.
Suppose |$M$| institutions possess multi-omics survival datasets for different cancers, where institution node with dataset |$D_{M}$| is for a specific target cancer, and institutions with datasets |$\{D_{1},\cdots ,D_{M-1}\}$| are for related cancers. We adopted a master-server architecture for federated transfer learning, where the local MOSAHit model with parameters |$\theta _{k}$| for each node |$k$| and the global MOSAHit model with parameters |$\theta $| for a centralized hub server have the same network structure and collaborate with each other. The local and global models are updated in an iterative way.
For local model update, each local node first downloads the model parameter |$\theta $| from the hub server, and then updates the parameters using its local data |$D$|. To prevent the leakage of specific patient information, differential privacy mechanism is employed to preserve individual privacy by obscuring the parameters of each local node |$k$| with random noise |$\omega _{k}$|. For each local node, we define a gradient descent mapping |$t:\hat{\mathcal{D}}\rightarrow \Theta $| as
|$$t(D) = \theta - \alpha \nabla L_{D}(\theta ),$$|
where |$\hat{\mathcal{D}}$| represents the collection of local data (|$D \in \hat{\mathcal{D}}$|), |$\alpha $| is the learning rate, and |$\nabla L_{D}(\theta )$| represents the gradient of the loss function for local data |$D$|. Gaussian mechanism |$G$| with parameter |$\sigma $|, defined as
|$$G(D) = t(D) + \omega , \tag{4}$$|
where |$\omega $| are random vectors drawn from |$N(0,\sigma ^{2}I)$|, simply computes |$t(D)$|, and perturbs its outputs with noise drawn from the Gaussian distribution. With Gaussian mechanism, the parameter update process for local node |$k$| is as shown below:
|$$\theta _{k} = \theta - \alpha \nabla L_{k}(\theta ) + \omega _{k}, \tag{5}$$|
where |$\omega _{k}$| follows |$N(0,\sigma ^{2} I)$|, and |$L_{k}$| is the loss function for local node |$k$| with dataset |$D_{k}$|.
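A minimal sketch of one noisy local step is given below: a plain gradient-descent update followed by additive Gaussian noise drawn independently per parameter. The function name and the flat list representation of the parameters are assumptions; the experiments reported later use the Adam optimizer in PyTorch rather than this plain update.

```python
import random

def local_update(theta, grad, lr, sigma, rng):
    """One gradient-descent step perturbed by N(0, sigma^2) noise.

    theta : current global parameters downloaded from the hub server
    grad  : gradient of the local loss at theta
    lr    : learning rate (alpha in the text)
    sigma : standard deviation of the Gaussian noise
    rng   : random.Random instance, passed in for reproducibility
    """
    return [t - lr * g + rng.gauss(0.0, sigma) for t, g in zip(theta, grad)]

rng = random.Random(0)
clean = local_update([1.0, 2.0], [0.5, -1.0], lr=0.1, sigma=0.0, rng=rng)
noisy = local_update([1.0, 2.0], [0.5, -1.0], lr=0.1, sigma=0.05, rng=rng)
```

Setting `sigma=0.0` recovers the unperturbed update, which is a convenient check that the noise is the only difference between the two.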
The differential privacy mechanism ensures that attackers cannot infer any sensitive information about patient |$i$| from the weight parameters of the shared model when patient |$i$| is removed from or added to the data node. In Gaussian mechanism, the parameter |$\sigma $| controls the balance between the privacy preservation and data utilization. Mathematically, given two adjacent datasets, |$D$| and |$D^{\prime}\in \mathcal{D}$|, which differ by only one individual, a randomized algorithm |$F:\mathcal{D} \rightarrow \mathcal{S}\subset \mathbb{R}^{l}$| satisfies |$(\epsilon ,\delta )$| differential privacy if, for any subset |$S\subset \mathcal{S}$|:
|$$P(F(D)\in S) \leq e^{\epsilon }P(F(D^{\prime})\in S) + \delta , \tag{6}$$|
where |$\epsilon $| is the privacy budget that controls the privacy level, and |$\delta $| represents the probability that |$\epsilon $|-differential privacy may be violated. By Theorem 3.22 in the book by Dwork et al. [23], for arbitrary |$\epsilon \in (0, 1)$|, if
|$$\sigma \geq \frac{\sqrt{2\ln (1.25/\delta )}\, s_{t}}{\epsilon }, \tag{7}$$|
then Gaussian mechanism |$G$| with |$\sigma $| defined in (4) satisfies |$(\epsilon ,\delta )$|-differential privacy, where |$s_{t}$| is the |$l_{2}$|-sensitivity of function |$t$| defined as
|$$s_{t} = \max _{\text{adjacent } D, D^{\prime}}\Vert t(D)-t(D^{\prime})\Vert _{2}.$$|
From condition (7), we can obtain
|$$\epsilon \geq \frac{\sqrt{2\ln (1.25/\delta )}\, s_{t}}{\sigma }.$$|
This implies that, for a given level |$\delta $|, increasing the standard deviation |$\sigma $| in Gaussian mechanism |$G$| allows a smaller |$\epsilon $| to satisfy the requirements of differential privacy. Following the work [25], in the deep learning framework, we set |$\sigma = \rho * \eta $|, where |$\rho $| is an adjustable parameter and |$\eta $| represents the standard deviation of the increments in network training parameters at each update of the local model. By adjusting |$\rho $|, we can change the standard deviation |$ \sigma $| of the added Gaussian noise |$\omega $|, thus controlling the level of differential privacy protection.
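The trade-off between |$\sigma $| and |$\epsilon $| can be checked numerically with a small sketch based on the standard Gaussian-mechanism bound, |$\sigma \geq \sqrt{2\ln (1.25/\delta )}\, s_{t}/\epsilon $|. The function names are hypothetical, the sensitivity value is a placeholder, and the use of the population standard deviation for |$\eta $| is an assumption.

```python
import math
import statistics

def gaussian_sigma(epsilon, delta, sensitivity):
    """Smallest sigma allowed by the bound for a given (epsilon, delta)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the bound is stated for epsilon in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

def smallest_epsilon(sigma, delta, sensitivity):
    """Smallest epsilon the bound permits for a given sigma.
    Only meaningful when the result lies in (0, 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / sigma

def noise_scale(rho, increments):
    """sigma = rho * eta, with eta the standard deviation of the
    parameter increments of one local update."""
    return rho * statistics.pstdev(increments)

sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0)
eps_back = smallest_epsilon(sigma, delta=1e-5, sensitivity=1.0)
eps_more_noise = smallest_epsilon(2.0 * sigma, delta=1e-5, sensitivity=1.0)
```

Doubling the noise standard deviation halves the achievable |$\epsilon $| for the same |$\delta $|, which matches the statement above.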
After updating the local parameters by equation (5), local node |$k$| uploads the model parameters |$\theta _{k}$| to the hub server. Once the hub server receives all parameters, it adopts a weighted average strategy to update the parameters of the global model and sends the new parameters back to each local node for the next iteration. The parameter update process for the global model is as follows:
|$$\theta = \sum _{k=1}^{M} w_{k}\theta _{k},$$|
where |$\sum _{k=1}^{M} w_{k} = 1$| and |$w_{k}$| denotes the weight of local model |$k$|. In multitask learning, all tasks are learned simultaneously with equal weights and use their interconnections to boost each other. In contrast to multitask learning, our proposed federated transfer learning requires, empirically, that the weights of samples from the target cancer be greater than those of samples from related cancers, which leads the global model to focus more on the target cancer. The federated transfer learning framework with differential privacy for MOSAHit leverages knowledge from multi-omics data of related cancers distributed across multiple institutions through a parameter-sharing mechanism. This approach enables the learning of more robust and universal combinatorial feature representations across multiple omics, facilitating the development of a more generalized model for survival prediction of target cancers in a privacy-preserving setting.
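The hub-server aggregation amounts to a weighted average of the uploaded parameter vectors. The sketch below assumes flat parameter lists and uses illustrative values for four nodes; the function name is hypothetical.

```python
def aggregate(local_params, weights):
    """Weighted average of local parameter vectors.

    local_params : list of M parameter vectors, one per node
    weights      : list of M non-negative weights that sum to 1
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    size = len(local_params[0])
    return [sum(w * p[i] for w, p in zip(weights, local_params))
            for i in range(size)]

# Three source nodes and one target node (last entry), toy values.
params = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [6.0, 6.0]]
equal = aggregate(params, [0.25, 0.25, 0.25, 0.25])
target_heavy = aggregate(params, [0.2, 0.2, 0.2, 0.4])
```

Raising the weight of the target node pulls the global parameters toward the target model, which is the effect described in the paragraph above.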
Experiments
Datasets and preprocessing
The Cancer Genome Atlas (TCGA) [31] is a publicly available cancer genome database containing whole-genome data of |$\sim $|11 000 patients from 33 common cancers. We integrated RNASeq data for gene expression and miRNA-Seq data for microRNA expression to evaluate the performance of multi-omics survival analysis method MOSAHit trained using federated transfer learning with differential privacy. In this study, the preprocessing procedures of gene expression data and microRNA expression data are as follows. First, we removed low-quality features (genes/microRNAs) with over 10% missing values and used median imputation strategy to fill in the remaining missing data. Second, we performed log transformation and filtered out noise-sensitive features with low variance across samples. Then we standardized these features to eliminate the effects of unit and scale differences among features. Finally, 17 720 genes and 1881 microRNA features are chosen for further study.
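The preprocessing steps above can be sketched per feature as follows. The 10% missing-value cut-off comes from the text; the `log2(x + 1)` transform, the variance threshold, and the use of population statistics are assumptions introduced for this example, since the text does not specify them.

```python
import math
import statistics

def preprocess(columns, max_missing=0.10, min_variance=1e-3):
    """columns: dict mapping feature name -> list of values (None = missing).
    Returns standardised features that pass both filters."""
    out = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            continue  # drop low-quality feature
        med = statistics.median(v for v in values if v is not None)
        filled = [med if v is None else v for v in values]  # median imputation
        logged = [math.log2(v + 1.0) for v in filled]  # assumed transform
        if statistics.pvariance(logged) < min_variance:
            continue  # drop low-variance feature (assumed threshold)
        mean = statistics.mean(logged)
        sd = statistics.pstdev(logged)
        out[name] = [(v - mean) / sd for v in logged]  # z-score
    return out

toy = {
    "too_sparse": [1.0] * 8 + [None, None],  # 20% missing
    "constant": [5.0] * 10,                  # zero variance
    "kept": [0.0, 1.0, 3.0, 7.0, 15.0, 31.0, 63.0, 127.0, 255.0, None],
}
clean = preprocess(toy)
```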
We selected 10 prevalent cancer types from TCGA as target cancers: Breast Invasive Carcinoma (BRCA), Kidney Renal Clear Cell Carcinoma (KIRC), Lung Adenocarcinoma (LUAD), Lung Squamous Cell Carcinoma (LUSC), Head And Neck Squamous Cell Carcinoma (HNSC), Bladder Urothelial Carcinoma (BLCA), Colon Adenocarcinoma (COAD), Liver Hepatocellular Carcinoma (LIHC), Ovarian Serous Cystadenocarcinoma (OV), and Esophageal Carcinoma (ESCA). To further validate the effectiveness of the multi-omics survival analysis method MOSAHit trained within a federated transfer learning framework with differential privacy, we conducted an external experiment on the independent dataset TARGET-WT [32]. The TARGET-WT dataset collects multi-omics and clinical data from pediatric patients with three different high-risk kidney tumors. It is publicly available and can be obtained from the UCSC Xena database (https://xena.ucsc.edu/). Detailed descriptions of the 11 target cancer datasets are provided in Table 1.
Cancer type | # patients | Prop. censored | Cancer type | # patients | Prop. censored |
---|---|---|---|---|---|
BRCA | 1021 | 0.861 | HNSC | 495 | 0.564 |
KIRC | 508 | 0.671 | COAD | 439 | 0.770 |
LIHC | 367 | 0.651 | OV | 372 | 0.390 |
LUAD | 496 | 0.637 | LUSC | 469 | 0.582 |
BLCA | 402 | 0.562 | ESCA | 164 | 0.610 |
TARGET-WT | 131 | 0.588 | | | |
Evaluation metrics
In this study, we evaluated the performance of multi-omics survival analysis approach MOSAHit trained using federated transfer learning with the time-dependent concordance index (|$C^{td}$|-index) [33]. The ordinary C-index [34] measures the predictive ability of survival analysis models by estimating the probability of concordance between the ordering of predicted risks and actual survival times. The C-index cannot capture changes in risk that may occur over time. |$C^{td}$|-index takes time into account and provides an appropriate assessment of how covariates influence survival over time, defined as follows:
|$$C^{td} = P\left (\hat{F}(r_{i}|z_{i}) > \hat{F}(r_{i}|z_{j}) \mid \Delta _{i}=1,\, O_{i}<O_{j}\right ),$$|
where |$\hat{F}$| is the cumulative incidence function. Note that when the proportional hazards assumption holds, the |$C^{td}$|-index is equivalent to the usual C-index. The |$C^{td}$|-index ranges from 0 to 1, with higher values indicating better prediction performance.
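An empirical estimate of the |$C^{td}$|-index can be sketched by counting concordant pairs among comparable ones. The function name is hypothetical, and two conventions here are assumptions of this sketch: ties in predicted risk count as discordant, and `nan` is returned when no comparable pair exists.

```python
def ctd_index(probs, bins, events, times):
    """Fraction of comparable pairs (i died, j observed longer) in which
    patient i has the higher cumulative incidence at i's own interval."""
    concordant = comparable = 0
    n = len(probs)
    for i in range(n):
        if events[i] != 1:
            continue  # censored patients cannot anchor a pair
        risk_i = sum(probs[i][:bins[i]])
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                risk_j = sum(probs[j][:bins[i]])
                if risk_i > risk_j:
                    concordant += 1
    return concordant / comparable if comparable else float("nan")

times, bins, events = [100, 300, 500], [1, 2, 3], [1, 1, 0]
ordered = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
perfect = ctd_index(ordered, bins, events, times)
reversed_score = ctd_index(ordered[::-1], bins, events, times)
```

Predictions that place probability mass in the correct order give a score of 1, while the reversed predictions give 0.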
Experimental setting
In our experiments, we considered four institutional nodes: the multi-omics data for the target cancer were placed on target node |$4$|, and the multi-omics data for the related cancers from the TCGA database were distributed alphabetically across source nodes |$1$|, |$2$|, and |$3$|. We randomly partitioned each target cancer dataset into 80% for training and 20% for testing, stratified by censoring status. In the federated transfer learning setting, we trained a prediction model using the training set of the target cancer together with the multi-omics datasets of related cancers on the source nodes, and evaluated the model on the corresponding testing set of the target cancer. To comprehensively evaluate MOSAHit trained using federated transfer learning with differential privacy, we repeated this process 20 times and reported the mean and standard deviation of the |$C^{td}$|-indices.
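The evaluation protocol described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names `stratified_split` and `repeated_evaluation`, the use of the repeat index as the random seed, and the choice of the sample standard deviation are all hypothetical. The `evaluate` callback stands in for training a model on the training indices and returning its |$C^{td}$|-index on the testing indices.

```python
import random
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def stratified_split(
    events: Sequence[int], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split patient indices into train/test, stratified by censoring status."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for status in (0, 1):  # 0 = censored, 1 = event observed
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def repeated_evaluation(
    events: Sequence[int],
    evaluate: Callable[[List[int], List[int]], float],
    n_repeats: int = 20,
) -> Tuple[float, float]:
    """Repeat the split n_repeats times and summarize the returned scores."""
    scores = [evaluate(*stratified_split(events, seed=r)) for r in range(n_repeats)]
    return mean(scores), stdev(scores)


# Toy usage: 100 patients, half censored; the callback is a placeholder score.
events = [1] * 50 + [0] * 50
train_idx, test_idx = stratified_split(events)
print(len(train_idx), len(test_idx))  # 80 20
```

Stratifying by censoring status keeps the proportion of censored patients similar in the training and testing sets, which matters for datasets as small as the target cancers considered here.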
MOSAHit follows a contemporary deep learning design and is implemented in PyTorch. When training MOSAHit using federated transfer learning with differential privacy, we set the parameter |$\rho $|, which controls the Gaussian noise level, to 0.05 and used the Adam optimizer with a learning rate of 2e-4. Furthermore, because the target node holds far fewer cancer samples than the source nodes, we did not weight the nodes by their number of training samples when updating the parameters of the global model. Instead, we averaged the contributions of the local models equally, which enables the global model to learn effectively from the training samples of the specific target cancer.
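The effect of equal averaging can be illustrated with a small sketch. The function `average_models`, the flat parameter lists, and the node sizes below are hypothetical and serve only to contrast equal weights with sample-size weights; they do not reproduce the actual node sizes or model parameters used in the experiments.

```python
from typing import List, Optional, Sequence


def average_models(
    local_params: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Average flat parameter vectors from several nodes.

    With weights=None every node contributes equally, as described above.
    Passing per-node sample counts gives sample-size weighting instead.
    """
    n_nodes = len(local_params)
    if weights is None:
        weights = [1.0] * n_nodes
    total = float(sum(weights))
    dim = len(local_params[0])
    return [
        sum(w * p[k] for w, p in zip(weights, local_params)) / total
        for k in range(dim)
    ]


# Hypothetical node sizes: three large source nodes and one small target node.
node_sizes = [1500, 1400, 1300, 130]
equal_share = 1 / len(node_sizes)
weighted_share = node_sizes[-1] / sum(node_sizes)
print(f"target share, equal weights: {equal_share:.3f}")
print(f"target share, size weights:  {weighted_share:.3f}")
```

Under sample-size weighting the small target node would contribute only a few percent of the global update in this toy setting, whereas equal averaging gives it the same influence as each source node.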
Results
Performance evaluation of MOSAHit method trained using federated transfer learning with differential privacy
To evaluate the performance of MOSAHit trained using federated transfer learning with differential privacy in multi-omics survival analysis, we compared it with a single-omics method and three typical single-fusion methods: (i) DeepHit, which uses only single-omics (gene or microRNA) embedding features to learn the probability distribution of patient survival time; (ii) MODCHit, which directly concatenates the multi-omics embedding features; (iii) MOEAHit, which adds the multi-omics embedding features element-wise; and (iv) MOFBHit, which applies factorized bilinear fusion to the multi-omics embedding features. For a fair comparison, we extracted the embedding features for all of these methods using the same neural networks.
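The three single-fusion baselines differ only in how they combine the two embedding vectors. The sketch below illustrates the general idea of each operator on plain Python lists. The function names are hypothetical, and the factorized bilinear variant shown (low-rank projections, element-wise product, sum pooling over groups of size `k`) is one common formulation assumed here, which may differ from the exact form used in MOFBHit.

```python
from typing import List, Sequence


def concat_fusion(g: Sequence[float], m: Sequence[float]) -> List[float]:
    """Direct concatenation of two embeddings (the idea behind MODCHit)."""
    return list(g) + list(m)


def add_fusion(g: Sequence[float], m: Sequence[float]) -> List[float]:
    """Element-wise addition of two equally sized embeddings (MOEAHit)."""
    return [a + b for a, b in zip(g, m)]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def factorized_bilinear_fusion(
    g: Sequence[float],
    m: Sequence[float],
    U: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    k: int,
) -> List[float]:
    """Project both embeddings, multiply element-wise, sum-pool groups of k."""
    joint = [a * b for a, b in zip(matvec(U, g), matvec(V, m))]
    return [sum(joint[i:i + k]) for i in range(0, len(joint), k)]


gene = [1.0, 2.0]
mirna = [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
print(concat_fusion(gene, mirna))    # [1.0, 2.0, 3.0, 4.0]
print(add_fusion(gene, mirna))       # [4.0, 6.0]
print(factorized_bilinear_fusion(gene, mirna, identity, identity, k=2))  # [11.0]
```

Concatenation and addition treat the two omics independently per dimension, while the bilinear form models multiplicative interactions between them.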
Table 2 shows the |$C^{td}$|-index values of the different methods under federated transfer learning with differential privacy and under direct learning using only the target cancer data. Several observations follow. First, all methods trained on multi-institutional data using federated transfer learning with differential privacy improve over learning from the target cancer data alone by leveraging knowledge from related cancer datasets. For example, DeepHit trained on gene/microRNA expression data from multiple institutions using federated transfer learning has an average |$C^{td}$|-index of 0.658/0.630 across all target cancers, a relative improvement of 3.9%/3.3% over learning only from the gene/microRNA expression data of the target cancers (0.633/0.610). Second, when these methods are trained on multi-institutional data using federated transfer learning, MOSAHit achieves the best average performance and improves markedly over the single-omics method and the three typical single-fusion methods. Specifically, in the setting of federated transfer learning with differential privacy, MOSAHit outperforms DeepHit (gene/microRNA), MODCHit, MOEAHit, and MOFBHit by about 5.5%/10.2%, 3.6%, 9.5%, and 3.3% in average |$C^{td}$|-index, respectively. These results indicate that the proposed method can capture common features across different cancers distributed among multiple institutions through the parameter sharing mechanism, enabling the learning of more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers while preserving differential privacy.
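The percentages quoted above are relative improvements in the average |$C^{td}$|-index, not absolute differences. As a check, the short sketch below recomputes them from the averages reported in the last column of Table 2. The helper name `relative_gain` is hypothetical.

```python
def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


# Average C^td-index values taken from the last column of Table 2.
direct_avg = {"DeepHit (gene)": 0.633, "DeepHit (microRNA)": 0.610}
federated_avg = {
    "DeepHit (gene)": 0.658,
    "DeepHit (microRNA)": 0.630,
    "MODCHit": 0.670,
    "MOEAHit": 0.634,
    "MOFBHit": 0.672,
    "MOSAHit": 0.694,
}

# Federated transfer learning versus direct learning for DeepHit.
for name, base in direct_avg.items():
    gain = relative_gain(federated_avg[name], base)
    print(f"{name}: federated vs direct = {gain:.1f}%")

# MOSAHit versus each baseline, all trained with federated transfer learning.
mosahit = federated_avg["MOSAHit"]
for name, value in federated_avg.items():
    if name != "MOSAHit":
        print(f"MOSAHit vs {name}: {relative_gain(mosahit, value):.1f}%")
```

Rounded to one decimal place, these values agree with the 3.9%/3.3% and 5.5%/10.2%, 3.6%, 9.5%, and 3.3% figures stated in the text.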
Table 2. |$C^{td}$|-index values for different methods in the contexts of federated transfer learning with differential privacy and direct learning using only the target cancer data. The average of the |$C^{td}$|-index values for each method in each context is shown in the last column. The top two results are emphasized in bold and underlined, respectively
Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | ESCA | OV | LIHC | Ave |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Direct Learning | microRNA | DeepHit | 0.610 | 0.638 | 0.649 | 0.607 | 0.609 | 0.553 | 0.612 | 0.649 | 0.539 | 0.638 | 0.610 |
Direct Learning | gene | DeepHit | 0.668 | 0.686 | 0.653 | 0.684 | 0.622 | 0.581 | 0.618 | 0.580 | 0.577 | 0.662 | 0.633 |
Direct Learning | multi-omics | MODCHit | 0.639 | 0.680 | 0.670 | 0.660 | 0.613 | 0.580 | 0.618 | 0.591 | 0.559 | 0.672 | 0.628 |
Direct Learning | multi-omics | MOEAHit | 0.626 | 0.670 | 0.645 | 0.629 | 0.595 | 0.572 | 0.616 | 0.615 | 0.553 | 0.648 | 0.617 |
Direct Learning | multi-omics | MOFBHit | 0.656 | 0.689 | 0.665 | 0.651 | 0.608 | 0.583 | 0.623 | 0.611 | 0.585 | 0.674 | 0.635 |
Direct Learning | multi-omics | MOSAHit | 0.666 | 0.742 | 0.684 | 0.674 | 0.636 | 0.583 | 0.629 | 0.634 | 0.601 | 0.688 | 0.654 |
Federated Transfer | microRNA | DeepHit | 0.645 | 0.649 | 0.659 | 0.638 | 0.638 | 0.565 | 0.633 | 0.667 | 0.565 | 0.642 | 0.630 |
Federated Transfer | gene | DeepHit | 0.693 | 0.720 | 0.674 | 0.694 | 0.646 | 0.577 | 0.641 | 0.633 | 0.615 | 0.680 | 0.658 |
Federated Transfer | multi-omics | MODCHit | 0.697 | 0.734 | 0.710 | 0.714 | 0.647 | 0.587 | 0.658 | 0.665 | 0.604 | 0.687 | 0.670 |
Federated Transfer | multi-omics | MOEAHit | 0.650 | 0.679 | 0.664 | 0.651 | 0.641 | 0.562 | 0.638 | 0.620 | 0.575 | 0.656 | 0.634 |
Federated Transfer | multi-omics | MOFBHit | 0.702 | 0.740 | 0.707 | 0.712 | 0.654 | 0.589 | 0.655 | 0.647 | 0.620 | 0.697 | 0.672 |
Federated Transfer | multi-omics | MOSAHit | 0.725 | 0.773 | 0.729 | 0.719 | 0.679 | 0.600 | 0.655 | 0.703 | 0.641 | 0.711 | 0.694 |
To further evaluate MOSAHit trained using federated transfer learning with differential privacy, we categorized the patients in the testing dataset of each target cancer into longer-term or shorter-term survivors by the criterion of 5-year survival and estimated the |$C^{td}$|-index between longer-term survivors and shorter-term survivors. Since longer-term survivors are rare in the ESCA dataset, we report the results of the different methods only for the remaining nine target cancers in Table 3. The results show that methods trained using federated transfer learning with differential privacy distinguish longer-term survivors from shorter-term survivors better than direct learning on only the local data of the target cancers, by leveraging the information from the abundant multi-omics data of related cancers. Moreover, among the methods trained using federated transfer learning with differential privacy, our proposed method MOSAHit outperforms the single-omics method and the three typical single-fusion methods. In particular, for KIRC, LUAD, and BLCA, MOSAHit trained using federated transfer learning achieves clear improvements over the second-best method. Taken together, these results support the advantage of the multi-omics survival analysis method MOSAHit in the federated transfer learning setting.
Table 3. Performance comparison of different methods in the contexts of federated transfer learning with differential privacy and direct learning using the |$C^{td}$|-index values between longer-term survivors and shorter-term survivors. The average of the |$C^{td}$|-index values for each method in each context is shown in the last column. The top two results are emphasized in bold and underlined, respectively
Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | OV | LIHC | Ave |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Direct Learning | microRNA | DeepHit | 0.629 | 0.664 | 0.695 | 0.600 | 0.617 | 0.562 | 0.583 | 0.526 | 0.633 | 0.612 |
Direct Learning | gene | DeepHit | 0.685 | 0.728 | 0.728 | 0.694 | 0.617 | 0.620 | 0.604 | 0.585 | 0.689 | 0.661 |
Direct Learning | multi-omics | MODCHit | 0.660 | 0.720 | 0.742 | 0.649 | 0.621 | 0.605 | 0.606 | 0.557 | 0.678 | 0.649 |
Direct Learning | multi-omics | MOEAHit | 0.639 | 0.707 | 0.689 | 0.617 | 0.601 | 0.583 | 0.615 | 0.550 | 0.672 | 0.630 |
Direct Learning | multi-omics | MOFBHit | 0.672 | 0.737 | 0.741 | 0.658 | 0.626 | 0.619 | 0.629 | 0.595 | 0.671 | 0.661 |
Direct Learning | multi-omics | MOSAHit | 0.700 | 0.776 | 0.763 | 0.708 | 0.654 | 0.606 | 0.636 | 0.620 | 0.710 | 0.686 |
Federated Transfer | microRNA | DeepHit | 0.654 | 0.673 | 0.702 | 0.641 | 0.691 | 0.581 | 0.692 | 0.567 | 0.691 | 0.655 |
Federated Transfer | gene | DeepHit | 0.710 | 0.739 | 0.725 | 0.732 | 0.684 | 0.583 | 0.663 | 0.643 | 0.711 | 0.688 |
Federated Transfer | multi-omics | MODCHit | 0.714 | 0.768 | 0.755 | 0.732 | 0.698 | 0.619 | 0.697 | 0.619 | 0.721 | 0.703 |
Federated Transfer | multi-omics | MOEAHit | 0.667 | 0.708 | 0.687 | 0.642 | 0.677 | 0.577 | 0.673 | 0.593 | 0.677 | 0.656 |
Federated Transfer | multi-omics | MOFBHit | 0.719 | 0.779 | 0.767 | 0.752 | 0.703 | 0.624 | 0.693 | 0.657 | 0.705 | 0.711 |
Federated Transfer | multi-omics | MOSAHit | 0.747 | 0.824 | 0.818 | 0.770 | 0.739 | 0.650 | 0.701 | 0.679 | 0.748 | 0.742 |
We trained MOSAHit using federated transfer learning with different Gaussian noise levels |$\sigma $| to analyze how the degree of privacy protection affects the performance of federated transfer learning. Following [35], we controlled the Gaussian noise level by adjusting |$\rho $|. Figure 3 reports the |$C^{td}$|-index values of MOSAHit trained using federated transfer learning with differential privacy on all target cancers when the adjustable parameter |$\rho $| varies in |$\{0, 0.05, 0.5, 1\}$|. Note that a larger value of |$\rho $| indicates a higher level of privacy protection for patients. The results show that model performance deteriorates markedly when |$\rho $| is set too high (e.g. |$\rho $| = 1). Specifically, the average |$C^{td}$|-index across all target cancers ranges from 0.626 to 0.694 as different levels of Gaussian random noise are added during federated weight averaging. Adding a high level of Gaussian noise to the model parameters causes a substantial loss of learned knowledge and hinders the model's ability to learn effectively. This analysis confirms that there is a trade-off between model performance and privacy protection in the federated transfer learning setting.

Figure 3. |$C^{td}$|-index values of MOSAHit method trained using federated transfer learning with different levels of Gaussian random noise. We can observe that when patient information privacy is highly protected, a substantial amount of useful information is lost, making it difficult for the model to learn effectively.
In addition, we compared the performance of MOSAHit trained using federated transfer learning with differential privacy in scenarios with one, two, and three source nodes, to further evaluate the effectiveness of federated transfer learning in multi-omics survival analysis. Figure 4 shows the |$C^{td}$|-index values of MOSAHit trained using federated transfer learning with different numbers of source nodes. The results show that as more source nodes participate in federated transfer learning, the model can integrate useful knowledge from the multi-omics datasets of more related cancers and learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers. Specifically, MOSAHit trained using federated transfer learning with three source nodes reaches an average |$C^{td}$|-index of 0.694 across all target cancers, exceeding the 0.683 obtained with one source node and the 0.687 obtained with two source nodes by 1.6% and 1.0%, respectively. These results again support the effectiveness of federated transfer learning with differential privacy in multi-omics survival analysis.

Figure 4. |$C^{td}$|-index values of MOSAHit method trained using federated transfer learning with differential privacy in the scenarios with one, two, and three source nodes. When more source nodes participate in the federated transfer learning training process, we can learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers.
Furthermore, we analyzed the impact of Gaussian noise on model convergence when training MOSAHit on multi-institutional data using federated transfer learning with differential privacy. Given |$\rho $|, the Gaussian noise level |$\sigma = \rho \cdot \eta $| is mainly determined by |$\eta $|, the standard deviation of the increments in the network parameters at each update of the local model. When |$\rho $| is relatively small, the changes in model parameters gradually decrease during federated transfer learning, so |$\eta $|, and with it |$\sigma $|, gradually shrinks. Figure 1 of the supplementary material shows the loss of MOSAHit on the training set of the target cancers during federated transfer learning when |$\rho $| is set to 0.05. For all target cancers, the training loss decreases gradually with increasing iterations until convergence.
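The relation |$\sigma = \rho \cdot \eta $| can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: the function name `add_scaled_noise`, the flat parameter lists, the use of the population standard deviation for |$\eta $|, and the choice to perturb the updated local parameters before they are shared are all assumptions.

```python
import random
from statistics import pstdev
from typing import List, Sequence, Tuple


def add_scaled_noise(
    old_params: Sequence[float],
    new_params: Sequence[float],
    rho: float,
    rng: random.Random,
) -> Tuple[List[float], float]:
    """Perturb updated parameters with Gaussian noise of scale sigma = rho * eta.

    eta is the standard deviation of the parameter increments of this update
    (assumption: population standard deviation over all parameters).
    """
    increments = [n - o for n, o in zip(new_params, old_params)]
    eta = pstdev(increments)
    sigma = rho * eta
    noisy = [p + rng.gauss(0.0, sigma) for p in new_params]
    return noisy, sigma


rng = random.Random(0)
old = [0.0, 0.0, 0.0, 0.0]
new = [1.0, 2.0, 3.0, 4.0]
for rho in (0.0, 0.05, 0.5, 1.0):  # the values of rho explored in Figure 3
    _, sigma = add_scaled_noise(old, new, rho, rng)
    print(f"rho={rho}: sigma={sigma:.4f}")
```

Because |$\eta $| is computed from the increments themselves, the injected noise shrinks as training converges and the updates become smaller, which is consistent with the stable convergence reported for small |$\rho $|.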
Performance comparison with existing methods trained using federated transfer learning with differential privacy
MOSAHit trained using federated transfer learning with differential privacy is further evaluated against several existing state-of-the-art methods, including the traditional Cox method [3] and the deep learning-based survival prediction methods DeepSurv [6], HFBSurv [28], MDNNMD [36], and SurvCNN [11]. For a fair comparison, we trained the parameters of the existing methods within the same federated transfer learning framework with differential privacy. Figure 5 reports the |$C^{td}$|-index values of the different methods trained using federated transfer learning with differential privacy for all target cancers. The results show that deep learning methods outperform the traditional linear approach when trained using federated transfer learning, reflecting their strong learning capability. For example, the |$C^{td}$|-index of DeepSurv with gene/microRNA expression data shows an average improvement of 1.4%/1.6% over the Cox method across all cancer datasets in the federated transfer learning setting.

Figure 5. |$C^{td}$|-index values of different methods trained using federated transfer learning with differential privacy.
Moreover, we observed that the multi-omics methods HFBSurv, MDNNMD, and MOSAHit produce better survival predictions than the single-omics methods when trained using federated transfer learning with differential privacy. Specifically, these three multi-omics methods outperformed the best-performing single-omics method on 7 of the 10 TCGA cancer datasets in the federated transfer learning setting. By contrast, the multi-omics method SurvCNN does not perform well when trained using federated transfer learning. SurvCNN converts gene expression and microRNA expression data into image formats and trains convolutional neural networks (CNNs) to extract features for survival prediction. CNNs significantly reduce the number of model parameters, but they may limit the learning of complex feature relationships, especially in federated transfer learning settings. More importantly, when the model parameters are trained within a federated transfer learning framework with differential privacy, our proposed method MOSAHit achieves the best performance among all the methods. In particular, for KIRC, COAD, LUSC, ESCA, and OV, MOSAHit outperforms the second-best method by a large margin in the federated transfer learning setting. These results illustrate the advantage of MOSAHit, trained using federated transfer learning with differential privacy, in addressing data silos and privacy protection issues in multi-omics survival analysis.
External validation of MOSAHit method trained using federated transfer learning with differential privacy
We performed external validation on the independent dataset TARGET-WT, which is not included in the TCGA database, to further demonstrate the effectiveness of MOSAHit trained using federated transfer learning with differential privacy. We report the |$C^{td}$|-index values of DeepHit (gene/microRNA), MODCHit, MOEAHit, MOFBHit, and MOSAHit on the TARGET-WT dataset under federated transfer learning with differential privacy and under direct learning using only target cancer data, as shown in Fig. 6(a). The results show that federated transfer learning improves performance on the TARGET-WT dataset of the target node by leveraging useful knowledge from the TCGA cancer datasets distributed across multiple source nodes. For example, the average |$C^{td}$|-index of the different methods trained using federated transfer learning is 0.631, a relative improvement of 3.4% over the corresponding value of 0.610 from direct learning. Moreover, MOSAHit outperforms DeepHit (gene/microRNA), MODCHit, MOEAHit, and MOFBHit when trained using federated transfer learning with differential privacy.

Figure 6. (a) Performance comparison of DeepHit (gene/microRNA), MODCHit, MOEAHit, MOFBHit, and MOSAHit methods on TARGET-WT dataset in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. (b) Performance comparison of MOSAHit method and the existing methods on TARGET-WT dataset when trained using federated transfer learning with differential privacy.
We also compared the performance of MOSAHit and the existing methods on the TARGET-WT dataset in the setting of federated transfer learning with differential privacy. The results presented in Fig. 6(b) indicate that MOSAHit trained using federated transfer learning achieves the best performance among all methods. Specifically, the |$C^{td}$|-index of MOSAHit trained using federated transfer learning is 0.665, outperforming the second-best method, MDNNMD, with a |$C^{td}$|-index of 0.656, by |$\sim $|1.5%. Taken together, these results indicate that federated transfer learning with differential privacy for MOSAHit can help develop a more generalizable multi-omics survival prediction model for the local TARGET-WT dataset with limited training data, by leveraging information from the abundant multi-omics data of TCGA datasets distributed across multiple institutions while preserving individual privacy.
MOSAHit method trained using federated transfer learning with differential privacy on different types of multi-omics data
MOSAHit, trained using federated transfer learning with differential privacy, is further assessed on other combinations of multi-omics data: gene expression with DNA methylation, and microRNA expression with DNA methylation. As for the other omics types, we filtered out the noise-sensitive DNA methylation variables with low variance across samples and retained the top 10 000 features for further study. Note that the OV cancer type was removed because the TCGA database contains very few training samples with DNA methylation data for OV. We therefore applied MOSAHit to the different types of multi-omics data from the remaining nine target cancers under federated transfer learning with differential privacy and under direct learning using only the target cancer data. The results are reported in Tables 4 and 5.
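The variance-based filtering step can be sketched as follows. The function name `top_variance_features` and the toy matrix are hypothetical, and the sketch assumes the variance is computed over a single samples-by-features matrix. How the variance is pooled across institutions is not specified here and is left out.

```python
from statistics import pvariance
from typing import List, Sequence


def top_variance_features(matrix: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the column indices of the k features with the largest variance.

    matrix is samples x features; the returned indices are in ascending order.
    """
    n_features = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Toy methylation matrix: 3 samples x 3 features (values are illustrative).
toy = [
    [0.0, 1.0, 5.0],
    [0.0, 2.0, -5.0],
    [0.0, 3.0, 5.0],
]
keep = top_variance_features(toy, k=2)
print(keep)  # [1, 2]: the constant first feature is dropped
filtered = [[row[j] for j in keep] for row in toy]
```

In the experiments the same idea is applied with k = 10 000, so that near-constant features, which carry little information for ranking patients, are discarded before training.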
Table 4. |$C^{td}$|-index values of different approaches with gene expression and DNA methylation data in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. The last column displays the average rankings of different methods trained using federated transfer learning and direct learning for the nine target cancers. Note that the top two results are emphasized in bold and underlined, respectively
Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | ESCA | LIHC | Avg. ranking |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Direct Learning | DNA methylation | DeepHit | 0.631 | 0.638 | 0.603 | 0.548 | 0.666 | 0.567 | 0.589 | 0.650 | 0.618 | 9.7 |
Direct Learning | gene | DeepHit | 0.692 | 0.699 | 0.648 | 0.667 | 0.611 | 0.562 | 0.618 | 0.587 | 0.672 | 8.2 |
Direct Learning | multi-omics | MODCHit | 0.650 | 0.703 | 0.644 | 0.625 | 0.631 | 0.556 | 0.615 | 0.599 | 0.657 | 8.9 |
Direct Learning | multi-omics | MOEAHit | 0.618 | 0.685 | 0.609 | 0.620 | 0.620 | 0.575 | 0.609 | 0.590 | 0.643 | 10.1 |
Direct Learning | multi-omics | MOFBHit | 0.670 | 0.702 | 0.646 | 0.640 | 0.644 | 0.589 | 0.620 | 0.608 | 0.666 | 6.8 |
Direct Learning | multi-omics | MOSAHit | 0.736 | 0.771 | 0.685 | 0.686 | 0.689 | 0.578 | 0.623 | 0.609 | 0.673 | 3.7 |
Federated Transfer | DNA methylation | DeepHit | 0.676 | 0.695 | 0.615 | 0.629 | 0.670 | 0.567 | 0.628 | 0.667 | 0.633 | 7.0 |
Federated Transfer | gene | DeepHit | 0.719 | 0.762 | 0.672 | 0.717 | 0.641 | 0.590 | 0.654 | 0.628 | 0.676 | 3.6 |
Federated Transfer | multi-omics | MODCHit | 0.693 | 0.765 | 0.650 | 0.681 | 0.653 | 0.576 | 0.646 | 0.640 | 0.657 | 5.3 |
Federated Transfer | multi-omics | MOEAHit | 0.665 | 0.682 | 0.620 | 0.612 | 0.628 | 0.567 | 0.608 | 0.614 | 0.638 | 9.3 |
Federated Transfer | multi-omics | MOFBHit | 0.705 | 0.762 | 0.652 | 0.686 | 0.663 | 0.586 | 0.658 | 0.655 | 0.674 | 3.4 |
Federated Transfer | multi-omics | MOSAHit | 0.765 | 0.804 | 0.701 | 0.731 | 0.687 | 0.603 | 0.654 | 0.689 | 0.680 | 1.2 |
|$C^{td}$|-index values of different approaches with gene expression and DNA methylation data in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. The last column displays the average rankings of different methods trained using federated transfer learning and direct learning for the nine target cancers. Note that the top two results are emphasized in bold and underlined, respectively
|$C^{td}$|-index values of different approaches with microRNA expression and DNA methylation data in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. The last column displays the average rankings of different methods trained using federated transfer learning and direct learning for the nine target cancers. Note that the top two results are emphasized in bold and underlined, respectively
Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | ESCA | LIHC | Avg. ranking |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Direct Learning | microRNA | DeepHit | 0.637 | 0.647 | 0.617 | 0.627 | 0.602 | 0.559 | 0.607 | 0.666 | 0.641 | 7.8 |
Direct Learning | DNA methylation | DeepHit | 0.638 | 0.627 | 0.602 | 0.558 | 0.646 | 0.556 | 0.602 | 0.624 | 0.603 | 9.9 |
Direct Learning | multi-omics | MODCHit | 0.630 | 0.655 | 0.599 | 0.584 | 0.620 | 0.541 | 0.613 | 0.645 | 0.620 | 9.8 |
Direct Learning | multi-omics | MOEAHit | 0.636 | 0.622 | 0.593 | 0.599 | 0.610 | 0.558 | 0.590 | 0.641 | 0.600 | 10.7 |
Direct Learning | multi-omics | MOFBHit | 0.660 | 0.646 | 0.598 | 0.604 | 0.629 | 0.572 | 0.617 | 0.676 | 0.624 | 7.7 |
Direct Learning | multi-omics | MOSAHit | 0.693 | 0.704 | 0.641 | 0.598 | 0.681 | 0.556 | 0.623 | 0.649 | 0.635 | 5.4 |
Federated Transfer | microRNA | DeepHit | 0.653 | 0.650 | 0.613 | 0.653 | 0.628 | 0.587 | 0.632 | 0.639 | 0.641 | 6.7 |
Federated Transfer | DNA methylation | DeepHit | 0.667 | 0.695 | 0.606 | 0.613 | 0.671 | 0.585 | 0.616 | 0.659 | 0.628 | 6.3 |
Federated Transfer | multi-omics | MODCHit | 0.676 | 0.682 | 0.618 | 0.657 | 0.641 | 0.591 | 0.639 | 0.676 | 0.654 | 3.3 |
Federated Transfer | multi-omics | MOEAHit | 0.670 | 0.651 | 0.606 | 0.633 | 0.610 | 0.589 | 0.635 | 0.666 | 0.647 | 5.4 |
Federated Transfer | multi-omics | MOFBHit | 0.704 | 0.701 | 0.636 | 0.648 | 0.645 | 0.597 | 0.645 | 0.686 | 0.646 | 2.6 |
Federated Transfer | multi-omics | MOSAHit | 0.699 | 0.731 | 0.656 | 0.657 | 0.673 | 0.600 | 0.645 | 0.680 | 0.645 | 1.7 |
These results show that federated transfer learning can leverage useful knowledge from related cancer datasets distributed across the source nodes to improve survival prediction for the target cancer in most settings. For example, the MOSAHit method trained using federated transfer learning can learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers, and it improves on direct learning using only the target cancer data for most cancers (e.g. with gene expression and DNA methylation data, the |$C^{td}$|-index rises from 0.736 to 0.765 on BRCA and from 0.771 to 0.804 on KIRC). Moreover, among the methods trained using federated transfer learning, MOSAHit performs best overall for both multi-omics combinations. The last column in Tables 4 and 5 displays the average rankings of the |$C^{td}$|-index for the different methods trained using federated transfer learning and direct learning over the nine target cancer datasets; the MOSAHit method trained using federated transfer learning attains the lowest (best) average ranking in both tables (1.2 and 1.7). These results support the claim that our proposed method enables the learning of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging the abundant multi-omics data of related cancers distributed across multiple institutions while preserving individual privacy.
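To make the "Avg. ranking" column concrete, the following minimal sketch recomputes it from the |$C^{td}$|-index values in the gene expression and DNA methylation table. The row labels and the function name `average_rankings` are illustrative, and the tie rule (a method's rank is one plus the number of strictly better methods) is an assumption, since the text does not state how ties are handled.

```python
# C^td-index values transcribed from the gene expression + DNA methylation table.
# Column order: BRCA, KIRC, BLCA, COAD, LUAD, LUSC, HNSC, ESCA, LIHC.
# Row labels are illustrative shorthand for "training mode / data or method".
scores = {
    "Direct/DNAm-DeepHit": [0.631, 0.638, 0.603, 0.548, 0.666, 0.567, 0.589, 0.650, 0.618],
    "Direct/gene-DeepHit": [0.692, 0.699, 0.648, 0.667, 0.611, 0.562, 0.618, 0.587, 0.672],
    "Direct/MODCHit":      [0.650, 0.703, 0.644, 0.625, 0.631, 0.556, 0.615, 0.599, 0.657],
    "Direct/MOEAHit":      [0.618, 0.685, 0.609, 0.620, 0.620, 0.575, 0.609, 0.590, 0.643],
    "Direct/MOFBHit":      [0.670, 0.702, 0.646, 0.640, 0.644, 0.589, 0.620, 0.608, 0.666],
    "Direct/MOSAHit":      [0.736, 0.771, 0.685, 0.686, 0.689, 0.578, 0.623, 0.609, 0.673],
    "FT/DNAm-DeepHit":     [0.676, 0.695, 0.615, 0.629, 0.670, 0.567, 0.628, 0.667, 0.633],
    "FT/gene-DeepHit":     [0.719, 0.762, 0.672, 0.717, 0.641, 0.590, 0.654, 0.628, 0.676],
    "FT/MODCHit":          [0.693, 0.765, 0.650, 0.681, 0.653, 0.576, 0.646, 0.640, 0.657],
    "FT/MOEAHit":          [0.665, 0.682, 0.620, 0.612, 0.628, 0.567, 0.608, 0.614, 0.638],
    "FT/MOFBHit":          [0.705, 0.762, 0.652, 0.686, 0.663, 0.586, 0.658, 0.655, 0.674],
    "FT/MOSAHit":          [0.765, 0.804, 0.701, 0.731, 0.687, 0.603, 0.654, 0.689, 0.680],
}


def average_rankings(scores):
    """Rank all methods within each cancer (1 = highest score), then average per method."""
    names = list(scores)
    n_cols = len(next(iter(scores.values())))
    avg = {}
    for name in names:
        ranks = []
        for j in range(n_cols):
            # Assumed tie rule: rank = 1 + number of methods with a strictly higher score.
            better = sum(1 for other in names if scores[other][j] > scores[name][j])
            ranks.append(1 + better)
        avg[name] = sum(ranks) / n_cols
    return avg


avg_rank = average_rankings(scores)
for name, rank in sorted(avg_rank.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} {rank:.1f}")
```

Under this tie rule the two MOSAHit rows come out at about 1.2 (federated transfer) and 3.7 (direct learning), in line with the last column of the table. Other rows may differ slightly from the published values because the scores shown here are rounded to three decimals.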
Conclusion
Over the past several years, deep learning algorithms for multi-omics data have received considerable attention as decision support systems to assist clinicians in cancer diagnosis and treatment. Although multi-omics data can reveal dramatic inter-patient molecular differences and thereby help us better understand cancer heterogeneity and complexity, they suffer from the “big |$p$|, small |$n$|” problem, which makes the integration of multi-omics data for survival prediction of a specific cancer particularly challenging. We therefore leverage the abundant data of related cancers to aid the model’s learning on the specific cancer. However, data privacy and security constraints prevent us from aggregating the multi-omics data of related cancers from multiple institutions into a centralized data repository. Federated learning provides a way to learn from data distributed across institutions without sharing sensitive patient data.
In this work, we propose a multi-omics survival prediction model with a self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach enables the development of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging the multi-omics data of related cancers distributed across multiple institutions while preserving individual privacy. Specifically, federated transfer learning for MOSAHit captures common features across different cancers distributed among multiple institutions through the parameter sharing mechanism, which helps it learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of local target cancers. Results from comprehensive experiments on the TCGA datasets and the TARGET-WT dataset show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of multi-omics survival prediction models for a target cancer, without directly sharing the multi-omics data of related cancers across institutions. Recently, computational pathology has achieved impressive results on many clinically relevant tasks, and future research can explore how to effectively integrate multi-omics and pathology data in a distributed environment to improve patient survival predictions without exposing patients’ sensitive information.
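As an illustration of how parameter sharing with differential privacy can work in general, the sketch below shows one aggregation round in the style of differentially private federated averaging: each institution's parameter update is clipped to a maximum L2 norm, the clipped updates are averaged, and Gaussian noise is added. This is a generic sketch under stated assumptions, not the exact protocol used for MOSAHit; the function names, `clip_norm`, `noise_multiplier`, and the toy update vectors are all hypothetical.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def dp_federated_average(global_params, client_updates, clip_norm, noise_multiplier, rng):
    """One round: clip each client's update, average them, add Gaussian noise."""
    n = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    # Assumed noise scale: standard deviation applied to the averaged update.
    sigma = noise_multiplier * clip_norm / n
    new_params = []
    for j, w in enumerate(global_params):
        mean_j = sum(c[j] for c in clipped) / n
        new_params.append(w + mean_j + rng.gauss(0.0, sigma))
    return new_params


# Toy example: three source institutions, a 3-parameter shared model.
rng = random.Random(0)
global_params = [0.0, 0.0, 0.0]
client_updates = [
    [0.3, -0.1, 0.2],
    [3.0, 4.0, 0.0],   # large update; clipped down to norm 1.0
    [0.1, 0.1, 0.1],
]
noisy_params = dp_federated_average(
    global_params, client_updates, clip_norm=1.0, noise_multiplier=0.5, rng=rng
)
print(noisy_params)
```

Clipping bounds how much any single institution can move the shared parameters, which is what lets the added noise mask individual contributions. In a transfer setting, the target institution would then continue training from the shared parameters on its own data.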
We propose a multi-omics survival prediction model with self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach enables the development of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging relevant cancer multi-omics data distributed across multiple institutions while preserving individual privacy.
Federated transfer learning for MOSAHit captures common features across different cancers distributed among multiple institutions through the parameter sharing mechanism, which helps it learn more generalized and universal combinatorial feature representations across multiple omics for survival prediction of local target cancers.
We experimentally validated the MOSAHit method, trained using federated transfer learning with differential privacy, on the TARGET-WT dataset and 10 cancer datasets from the TCGA database. Results from comprehensive experiments on these real-world datasets show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of multi-omics survival prediction models for a target cancer while avoiding the direct sharing of multi-omics data for related cancers.
Competing interests
No competing interest is declared.
Funding
This work was funded by the National Natural Science Foundation of China under grant nos. 12222115 and 92470106.
Data availability
All data used in this manuscript are publicly available. The TCGA dataset underlying this article can be accessed from https://www.cancer.gov/ccg/research/genome-sequencing/tcga. The source code is available at https://github.com/LiminLi-xjtu/Federated-Transfer-MOSAHit.