Abstract

Multi-omics data often suffer from the “big |$p$|, small |$n$|” problem, in which the number of features far exceeds the sample size, making the integration of multi-omics data for survival analysis of a specific cancer particularly challenging. One common strategy is to share multi-omics data of related cancers across multiple institutions and leverage these abundant data to enhance survival prediction for the target cancer. However, data privacy and data-sharing regulations make it difficult to aggregate multi-omics data of related cancers from multiple institutions into a centralized database to learn more accurate and robust models for the target cancer. To address this limitation, we propose a multi-omics survival prediction model with a self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach learns a more robust multi-omics survival prediction model for a local target cancer with limited training data by leveraging multi-omics data of related cancers distributed across multiple institutions, while preserving individual privacy. Comprehensive experiments on real-world datasets show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of the multi-omics survival prediction model for the target cancer, without directly sharing the multi-omics data of related cancers.

Introduction

Survival analysis, which aims to characterize the relationship between covariates and survival time, is widely used in fields such as economics, finance, engineering, and medicine. Unlike standard regression analysis, survival analysis must contend with time-to-event data that are often highly skewed and censored [1]. In clinical practice, survival analysis has historically relied on low-dimensional patient features, such as age, gender, and tumor T/N/M stage [2]. Advances in high-throughput sequencing technology have made large volumes of high-dimensional omics data available, and these data are increasingly used to predict survival and improve treatment for cancer patients. Although high-throughput omics data can deepen our understanding of disease prognosis, their high dimensionality poses new difficulties for survival analysis, particularly for multi-omics datasets with limited training data. An effective strategy is to share patients' multi-omics data of related cancers across multiple institutions and leverage these abundant data to improve survival prediction for the target cancer. However, data privacy and data-sharing regulations make it challenging to aggregate multi-omics datasets of related cancers from various institutions into a centralized repository to develop more generalizable models for the target cancer.

One common method for survival analysis is the Cox proportional hazards (Cox-PH) model [3]. The Cox-PH model assumes that the hazard ratio between any two individuals remains constant over time, and it has become the most widely used survival model. Deep learning has excelled in various tasks [4, 5] and is widely applied in survival analysis [6, 7]. For example, DeepSurv [6] combines a deep neural network with the Cox regression model to predict survival for breast cancer patients and outperforms traditional linear and nonlinear machine learning methods. In contrast to DeepSurv, DeepHit [7] uses a deep neural network to directly model the distribution of survival time. DeepHit thus avoids assuming a specific form for the underlying stochastic process and can capture the time-dependent influence of covariates on survival.

Advances in high-throughput sequencing technology make large volumes of omics data available for survival prediction. Survival analysis methods based on omics data can be categorized into single-omics methods [8, 9] and multi-omics methods [10–12]. Single-omics methods perform survival prediction with a single type of biological data, which can reveal the impact of a specific biological feature on survival outcomes. For example, Cox-nnet [8] and CoxPASNet [9] use multilayer networks to map gene expression data into a low-dimensional space for survival prediction. By scoring features, Cox-nnet can identify the relative importance of specific genes associated with patient survival and reveal useful information about biological functions related to prognosis. In contrast to Cox-nnet, CoxPASNet provides explicit model interpretation: it captures the nonlinear and hierarchical mechanisms of biological pathways and identifies important prognostic factors associated with patient survival. Multi-omics methods offer a more comprehensive understanding of cancer heterogeneity and complexity than single-omics methods by capturing complementary information from multiple omics, and they have shown greater potential for survival prediction. For example, GraphSurv [12] introduces gene relationships from KEGG pathways as an interpretable graph constraint and uses a GCN module to compress multi-omics data, including gene expression, copy number variation, and DNA methylation, into more informative embeddings for survival prediction.

High-throughput omics data often suffer from the “big |$p$|, small |$n$|” problem, in which the number of features is far larger than the sample size, and deep learning methods are prone to overfitting in this situation. An effective strategy is to share multi-omics data of related cancers across multiple institutions and leverage the information in these abundant data to improve survival prediction for target cancers. Transfer-Cox [13] transfers useful knowledge from the source domain to the target domain by learning representations shared across the two domains, which can improve survival prediction on target cancer datasets. Unlike Transfer-Cox, VAECox, a transfer learning model proposed by Kim et al. [14], does not require survival information for patients in the source domain. VAECox first trains an unsupervised VAE on gene expression data from multiple cancers and then initializes the weights of the survival prediction model for the target cancer with the VAE parameters. Compared with knowledge transfer methods based on single-omics data [13, 14], leveraging multi-omics data of related cancers to improve survival prediction for a specific cancer is more challenging. Cho et al. [15] propose a meta-learning strategy that learns prior knowledge across training tasks of related cancers to enhance the performance of multi-omics survival analysis for the target cancer.

For any machine learning method, increasing exposure to data diversity by aggregating samples from multiple institutions into a centralized database can help develop more accurate and robust models. However, data centralization is often infeasible because data owners cannot share data across institutions owing to privacy concerns and incompatible data-sharing agreements. To mitigate these challenges, federated learning [16–18] allows algorithms to learn from data distributed across institutions without exposing sensitive patient data beyond institutional firewalls. There are currently two main modes of federated learning: master-server and peer-to-peer. In master-server mode, each node (participating institution) trains model parameters only on its local data; a master server aggregates the parameters from the nodes and sends the updated parameters back to each node for the next iteration. In peer-to-peer mode, each node transfers its locally trained parameters to all other nodes for the next federated round. Although the nodes never share patient data, leakage of or attacks on model details may still indirectly expose sensitive patient information, because parts of the training data can be reconstructed from gradients [19, 20] or by inverting model parameters [21, 22]. A popular strategy to address this problem is differential privacy [23], which adds a controlled amount of noise to the model parameters to limit individually identifiable information while preserving the global distribution of the data.

Federated learning has recently been applied to survival analysis to address data silos and privacy protection. Andreux et al. [24] argue that the traditional Cox-PH loss, which is not separable with respect to samples, does not fit into the federated learning framework, and they propose federated survival analysis with a discrete-time Cox model, which is more efficient and more amenable to the federated setting than the Cox-PH model. Lu et al. [25] introduce privacy-preserving federated learning for survival analysis of pathological image data and develop accurate weakly supervised survival prediction models from distributed data silos without direct data sharing. In contrast to these horizontal federated learning methods [24, 25], Wang et al. [26] propose an adaptive vertical federated learning framework (AFEI), which assumes that each data site holds different omics features of the same set of samples and integrates the distributed multi-omics features across institutions using data encryption for cancer prognosis prediction. Although federated learning has achieved some success in survival analysis, it remains challenging to effectively leverage multi-omics data of related cancers distributed across multiple institutions to develop more accurate and robust models for specific cancers with limited training data.
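The non-separability argument of Andreux et al. [24] can be seen in a few lines. The NumPy sketch below is our own illustration, not code from [24] or from MOSAHit; the function name and toy data are hypothetical. It implements the negative log partial likelihood of the Cox-PH model (Breslow convention for ties). Each uncensored patient's term normalizes over its risk set, i.e. all patients with |$O_{j}\geq O_{i}$|, so the loss on pooled data is not the sum of the losses computed separately on two institutions' shards.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log Cox partial likelihood (Breslow convention for tied times).

    risk:  predicted log-risk scores, shape (n,)
    time:  observed times O_i, shape (n,)
    event: censoring indicators Delta_i (1 = death observed), shape (n,)
    """
    risk, time, event = (np.asarray(a, dtype=float) for a in (risk, time, event))
    # at_risk[i, j] is True when patient j is still at risk at O_i, i.e. O_j >= O_i
    at_risk = time[None, :] >= time[:, None]
    log_risk_set = np.log((np.exp(risk)[None, :] * at_risk).sum(axis=1))
    return -np.sum(event * (risk - log_risk_set))

# Hypothetical toy data: 8 patients split across two institutions (shards a and b).
risk = np.linspace(-1.0, 1.0, 8)
time = np.arange(1.0, 9.0)                      # O_i = 1, 2, ..., 8
event = np.array([1, 0, 1, 1, 0, 1, 1, 0])
a, b = slice(0, 4), slice(4, 8)

pooled = cox_neg_log_partial_likelihood(risk, time, event)
split = (cox_neg_log_partial_likelihood(risk[a], time[a], event[a])
         + cox_neg_log_partial_likelihood(risk[b], time[b], event[b]))
print(f"pooled={pooled:.3f}, sum over shards={split:.3f}")   # the two values differ
```

Here the pooled loss is larger because the risk sets of the early deaths in shard a contain every patient of shard b; a node holding only shard a cannot evaluate its share of the pooled loss from local data.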

In this work, we propose a multi-omics survival prediction model with a self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach develops a more robust multi-omics survival prediction model for a local target cancer with limited training data by leveraging multi-omics data of related cancers distributed across multiple institutions, while preserving individual privacy. Specifically, federated transfer learning for MOSAHit captures features shared across different cancers held by multiple institutions through parameter sharing, which helps learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of the local target cancer. Comprehensive experiments on multiple real-world datasets show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of the multi-omics survival prediction model on the target cancer, without directly sharing the multi-omics data of related cancers across institutions.

Method: Federated transfer learning with differential privacy for multi-omics survival analysis

Multi-omics features |$X = \{X^{(1)},\cdots ,X^{(v)}\}$| and clinical data (observed time |$O$| and censoring status |$\Delta $|) of patients constitute multi-omics survival data, where |$v$| denotes the number of omics types. As is standard, we assume that a patient’s actual survival time |$T$| is independent of the censoring time |$C$|, and the observed time is |$O_{i}=\min (T_{i}, C_{i})$|. |$\Delta _{i}=1$| indicates that the death event occurs before the censoring event, so the observed time |$O_{i}$| of patient |$i$| equals the actual survival time |$T_{i}$|. Conversely, |$\Delta _{i}=0$| signifies that the observed time |$O_{i}$| equals the censoring time |$C_{i}$|.
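As a concrete illustration of these definitions, the NumPy snippet below builds |$O_{i}$| and |$\Delta _{i}$| from latent times |$T_{i}$| and |$C_{i}$|. The latent values are invented for illustration; in real data only |$O_{i}$| and |$\Delta _{i}$| are observed.

```python
import numpy as np

# Hypothetical latent times for five patients (days).
true_time = np.array([300.0, 1200.0, 450.0, 90.0, 2000.0])      # T_i
censor_time = np.array([500.0, 800.0, 1000.0, 365.0, 1500.0])   # C_i

observed_time = np.minimum(true_time, censor_time)   # O_i = min(T_i, C_i)
event = (true_time <= censor_time).astype(int)       # Delta_i = 1 when the death is observed
print(observed_time.tolist())   # [300.0, 800.0, 450.0, 90.0, 1500.0]
print(event.tolist())           # [1, 0, 1, 1, 0]
```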

Multi-omics data often suffer from the “big |$p$|, small |$n$|” problem, making the integration of multi-omics data for survival analysis of specific cancers particularly challenging. One common strategy is to transfer knowledge from the plentiful multi-omics data of related cancers held by multiple institutions to improve survival prediction for the target cancer. However, privacy concerns, institutional policies, and incompatible data-sharing agreements often make it difficult to aggregate multi-omics data of related cancers from multiple institutions into a centralized database to develop more generalizable models for the target cancer. To address data silos and privacy protection in multi-omics survival analysis, we propose a multi-omics survival prediction model with a self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. Through parameter sharing, this approach leverages knowledge from the abundant multi-omics data of related cancers distributed across multiple institutions to learn more robust and universal combinatorial feature representations across omics for survival prediction of a local target cancer with limited training data, while preserving individual privacy. In this section, we first introduce the multi-omics survival analysis model with a multi-head self-attention mechanism (MOSAHit) and then describe how its parameters are trained using federated transfer learning with differential privacy.

Multi-omics survival analysis model with self-attention mechanism

Integrating multi-omics data offers valuable insights into the heterogeneity and complexity of cancer and shows promising potential for improving the prediction of survival outcomes. In this section, we propose a novel multi-omics survival analysis method, MOSAHit, whose architecture is presented in Fig. 1. For preprocessed gene expression data |$X^{(1)}$| and microRNA expression data |$X^{(2)}$|, we learn two three-layer fully connected networks that map the high-dimensional features |$X^{(1)}$| and |$X^{(2)}$| into the same embedding space, alleviating differences in statistical properties across omics. Specifically, the three fully connected layers contain 1000, 500, and 500 neurons, respectively, and the resulting feature embeddings for gene expression and microRNA expression data are denoted as |$Z^{(1)}=[z^{(1)}_{1},\cdots ,z^{(1)}_{n}]^{T}$| and |$Z^{(2)}=[z^{(2)}_{1},\cdots ,z^{(2)}_{n}]^{T}$|. Learning feature interactions is a fundamental problem in multi-omics data integration. A popular approach is the Factorized Bilinear Machine [27], which has shown its superiority in cancer survival prediction by capturing and quantifying complex relationships across multimodal data [28]. In this study, we learn interaction features among multiple omics for survival prediction with a multi-head self-attention mechanism [29]. Self-attention can learn long-range contextual dependencies and has been successfully applied to machine translation [29] and text classification [30] in natural language processing. The attention mechanism essentially learns a set of weight coefficients that highlight important regions of the data while suppressing irrelevant ones. Its principle can be intuitively explained by human visual cognition: when viewing an image, the brain ignores irrelevant information and focuses on the parts of the image that are relevant to the judgment at hand.
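A minimal NumPy sketch of the two omics-specific encoders is given below. It assumes ReLU activations and He initialization, neither of which is specified in the text, and only shows how layers of 1000, 500, and 500 neurons map the 17 720 gene features and 1881 microRNA features (see the Datasets section) to 500-dimensional embeddings |$Z^{(1)}$| and |$Z^{(2)}$|. The helper names `init_mlp` and `mlp_forward` are hypothetical; the actual model is implemented in PyTorch.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """He-initialised float32 weights and zero biases for a stack of dense layers."""
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        W = rng.standard_normal((fan_in, fan_out), dtype=np.float32) * scale
        params.append((W, np.zeros(fan_out, dtype=np.float32)))
    return params

def mlp_forward(x, params):
    """Apply x @ W + b followed by ReLU at every layer (the activation is an assumption)."""
    for W, b in params:
        x = np.maximum(x @ W + b, 0.0)
    return x

rng = np.random.default_rng(0)
n_genes, n_mirnas = 17720, 1881                   # feature counts after preprocessing
gene_encoder = init_mlp([n_genes, 1000, 500, 500], rng)
mirna_encoder = init_mlp([n_mirnas, 1000, 500, 500], rng)

n = 4                                             # hypothetical mini-batch of patients
X1 = rng.standard_normal((n, n_genes), dtype=np.float32)    # stand-in for X^(1)
X2 = rng.standard_normal((n, n_mirnas), dtype=np.float32)   # stand-in for X^(2)
Z1 = mlp_forward(X1, gene_encoder)                # Z^(1)
Z2 = mlp_forward(X2, mirna_encoder)               # Z^(2)
print(Z1.shape, Z2.shape)                         # (4, 500) (4, 500)
```

Because both encoders end in layers of the same width, the two omics land in a common 500-dimensional space, which is what allows the attention step to compare them directly.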

Figure 1

Illustration of the proposed MOSAHit architecture. The attention mechanism helps determine which multi-omics features should be combined to form meaningful combinatorial features. Multi-head self-attention maps the low-dimensional feature representations of multi-omics data into different subspaces and learns multiple meaningful combinatorial features separately. MOSAHit integrates these feature combinations to directly predict the probability distribution of survival time.

For the integration of multi-omics data, the attention mechanism helps determine which multi-omics features should be combined to form meaningful combinatorial features. Specifically, given the gene and microRNA feature embeddings |$Z^{(1)}$| and |$Z^{(2)}$|, we first learn the vector representations |$\{Q^{(1)}=[q^{(1)}_{1},\cdots ,q^{(1)}_{n}]^{T}, K^{(1)}=[k^{(1)}_{1},\cdots ,k^{(1)}_{n}]^{T}, V^{(1)}=[v^{(1)}_{1},\cdots ,v^{(1)}_{n}]^{T}\}$| for gene and |$\{Q^{(2)}=[q^{(2)}_{1},\cdots ,q^{(2)}_{n}]^{T}, K^{(2)}=[k^{(2)}_{1},\cdots ,k^{(2)}_{n}]^{T}, V^{(2)}=[v^{(2)}_{1},\cdots ,v^{(2)}_{n}]^{T}\}$| for microRNA in the query, key, and value spaces using the parameter matrices |$\{W^{Q}, W^{K}, W^{V}\}$| as follows:

|$Q^{(u)} = Z^{(u)}W^{Q},\quad K^{(u)} = Z^{(u)}W^{K},\quad V^{(u)} = Z^{(u)}W^{V},\quad u \in \{1,2\}.$|

We then obtain the feature representations |$\{Z^{(1,c)}=[z^{(1,c)}_{1}, \cdots , z^{(1,c)}_{n}]^{T}, Z^{(2,c)}=[z^{(2,c)}_{1}, \cdots , z^{(2,c)}_{n}]^{T}\}$| as weighted sums of the values |$\{V^{(1)}, V^{(2)}\}$|:

|$z^{(u_{1},c)}_{i} = \sum _{u_{2}=1}^{2}\hat{\alpha }^{u_{1},u_{2}}_{i}\, v^{(u_{2})}_{i},\quad u_{1}\in \{1,2\},$|   (1)

with weights |$\hat{\alpha }^{u_{1},u_{2}}_{i} = \frac{\exp (\alpha ^{u_{1},u_{2}}_{i})}{\sum _{u^{\prime}_{2}=1}^{2}\exp (\alpha ^{u_{1},u^{\prime}_{2}}_{i})}$|, where |$\alpha ^{u_{1},u_{2}}_{i} = \frac{\langle q^{(u_{1})}_{i}, k^{(u_{2})}_{i} \rangle }{\sqrt{d}}$|, |$d$| is the dimension of |$v^{(1)}_{i}$|, and |$\langle \cdot , \cdot \rangle $| denotes the inner product. The weights are thus computed by scaled dot-product attention and quantify the correlation between the gene features and microRNA features of patient |$i$|.
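The following NumPy sketch of one attention head implements the weighted sum of values and the softmax weights above, treating each patient's gene and microRNA embeddings as a sequence of two tokens. Sharing |$W^{Q}, W^{K}, W^{V}$| across the two omics follows the description above; the head size |$d=64$|, the random inputs, and the function names are hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def omics_attention_head(Z1, Z2, W_q, W_k, W_v):
    """One head: the two omics embeddings of each patient attend to each other.

    Z1, Z2: (n, d_model) gene and microRNA embeddings Z^(1), Z^(2).
    Returns Z^(1,c), Z^(2,c) with shape (n, d) and alpha_hat with shape (n, 2, 2).
    """
    tokens = np.stack([Z1, Z2], axis=1)                        # (n, 2, d_model)
    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v         # each (n, 2, d)
    d = V.shape[-1]
    alpha = np.einsum("iad,ibd->iab", Q, K) / np.sqrt(d)       # alpha_i^{u1,u2}
    alpha_hat = softmax(alpha, axis=-1)                        # normalise over u2
    out = np.einsum("iab,ibd->iad", alpha_hat, V)              # sum_u2 alpha_hat * v_i^(u2)
    return out[:, 0], out[:, 1], alpha_hat

rng = np.random.default_rng(1)
n, d_model, d = 4, 500, 64                                     # d = 64 is hypothetical
Z1, Z2 = rng.normal(size=(n, d_model)), rng.normal(size=(n, d_model))
W_q, W_k, W_v = (rng.normal(0.0, d_model ** -0.5, size=(d_model, d)) for _ in range(3))
Z1c, Z2c, alpha_hat = omics_attention_head(Z1, Z2, W_q, W_k, W_v)
print(Z1c.shape, alpha_hat.shape)                              # (4, 64) (4, 2, 2)
```

Each row of `alpha_hat[i]` sums to 1, so every combinatorial feature is a convex combination of the gene and microRNA value vectors of the same patient.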

To obtain multiple meaningful feature combinations, we use multiple heads to learn distinct combinatorial features |$\{Z^{(1,c_{1})},Z^{(2,c_{1})}\}$|, |$\{Z^{(1,c_{2})},Z^{(2,c_{2})}\}$| separately. We concatenate the learned combinatorial features as

|$Z^{c} = Z^{(1,c_{1})} \oplus Z^{(2,c_{1})} \oplus Z^{(1,c_{2})} \oplus Z^{(2,c_{2})}$|

and compress them with a fully connected network into comprehensive multi-omics representations |$Z=[z_{1},\cdots ,z_{n}]^{T}$| as follows:

|$Z = f_{2}\big(f_{1}(Z^{c}W_{0})W_{1}\big)W_{2},$|   (2)

where |$W_{i}$| (|$i=0,1,2$|) and |$f_{j}$| (|$j=1,2$|) are the parameter matrices and activation functions of the fully connected network, respectively, and |$\oplus $| denotes the concatenation of two matrices.
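The concatenation and compression steps can be sketched in NumPy as follows. The per-head outputs are random stand-ins, and the two heads, the layer widths 256/128/64, and the use of ReLU for |$f_{1}$| and |$f_{2}$| are assumptions made only for illustration. The sketch takes the compression to have the form |$Z = f_{2}(f_{1}(Z^{c}W_{0})W_{1})W_{2}$| with bias terms omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_heads = 4, 64, 2                       # hypothetical batch size, head size, head count

# Random stand-ins for the head outputs {Z^(1,c_h), Z^(2,c_h)}, h = 1, ..., n_heads.
head_outputs = [(rng.normal(size=(n, d)), rng.normal(size=(n, d))) for _ in range(n_heads)]

# Z^c = Z^(1,c_1) (+) Z^(2,c_1) (+) Z^(1,c_2) (+) Z^(2,c_2), concatenated along features.
Z_concat = np.concatenate([z for pair in head_outputs for z in pair], axis=1)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical widths; f_1 and f_2 are assumed to be ReLU and biases are omitted.
sizes = [Z_concat.shape[1], 256, 128, 64]
W0, W1, W2 = (rng.normal(0.0, (2.0 / a) ** 0.5, size=(a, b))
              for a, b in zip(sizes[:-1], sizes[1:]))
Z = relu(relu(Z_concat @ W0) @ W1) @ W2       # Z = f_2(f_1(Z^c W_0) W_1) W_2
print(Z_concat.shape, Z.shape)                 # (4, 256) (4, 64)
```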

Because the Cox proportional hazards (Cox-PH) model, which optimizes parameters with a loss function reflecting the relative risk among patients, does not align with existing federated learning frameworks [24], we instead directly model the distribution of survival time [7] in this study. We denote the maximum time horizon as |$T_{max}$| and partition the time interval |$[0, T_{max}]$| into |$R$| sequential sub-intervals |$\{\mathcal{T}_{1},\cdots ,\mathcal{T}_{R}\}$|. As in a classification problem, we map the multi-omics feature representation |$z_{i}$| of patient |$i$| to a probability distribution |$y_{i} = [y_{i,1}, \cdots , y_{i,R}]$| over the |$R$| sub-intervals using a softmax layer, where |$y_{i,r}$| is the probability that patient |$i$| dies during the |$r$|th sub-interval |$\mathcal{T}_{r}$|. Accordingly, for patient |$i$| in the training dataset, the observed time |$O_{i}$| is associated with an integer |$r_{i}$| such that |$O_{i}\in \mathcal{T}_{r_{i}}$|. To learn the probability distribution of the first hitting time over |$\{\mathcal{T}_{1},\cdots ,\mathcal{T}_{R}\}$|, we define the loss function |$L_{1}$| as

|$L_{1} = -\sum _{i=1}^{n}\Big[\Delta _{i}\log \big(y_{i,r_{i}}\big) + (1-\Delta _{i})\log \big(1-\hat{F}(r_{i}|z_{i})\big)\Big],$|

where the cumulative incidence function |$\hat{F}(r_{i}|z_{i})=\sum _{k=1}^{r_{i}}y_{i,k}$| is the probability that the death event occurs during or before the sub-interval |$\mathcal{T}_{r_{i}}$|, conditional on covariates |$z_{i}$|. The first term of |$L_{1}$| uses the times of death of uncensored patients, while the second term accounts for censoring by maximizing the probability that a censored patient dies after the sub-interval containing their censoring time.
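To make |$L_{1}$| concrete, the NumPy sketch below evaluates it for a mini-batch from the softmax outputs |$y$|, the interval indices |$r_{i}$|, and the censoring indicators |$\Delta _{i}$|. Indices start at 0 (Python convention) rather than 1, and the small constant `eps` guarding against log(0) is an implementation assumption; the function name and toy values are hypothetical.

```python
import numpy as np

def likelihood_loss(y, r, event, eps=1e-8):
    """L_1 = -sum_i [Delta_i log y_{i,r_i} + (1 - Delta_i) log(1 - F_hat(r_i | z_i))].

    y:     (n, R) softmax outputs, y[i, k] = P(death in sub-interval k | z_i)
    r:     (n,) zero-based sub-interval index r_i with O_i in T_{r_i}
    event: (n,) censoring indicators Delta_i
    """
    rows = np.arange(y.shape[0])
    cif = np.cumsum(y, axis=1)                         # F_hat(k | z_i) for every k
    p_death = y[rows, r]                               # y_{i, r_i}
    p_later = np.clip(1.0 - cif[rows, r], 0.0, None)   # 1 - F_hat(r_i | z_i)
    return -np.sum(event * np.log(p_death + eps) + (1 - event) * np.log(p_later + eps))

# Two hypothetical patients and R = 3 sub-intervals.
y = np.array([[0.7, 0.2, 0.1],      # patient 0: died in the first sub-interval
              [0.1, 0.3, 0.6]])     # patient 1: censored in the second sub-interval
r = np.array([0, 1])
event = np.array([1, 0])
print(likelihood_loss(y, r, event))   # equals -log(0.7) - log(0.6) up to eps
```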

Meanwhile, a ranking loss |$L_{2}$| is used to penalize incorrectly ordered pairs of patients. It is defined as follows:

|$L_{2} = \sum _{i\neq j}\Delta _{i}\, I(O_{i} < O_{j})\,\exp \Big(-\frac{\hat{F}(r_{i}|z_{i})-\hat{F}(r_{i}|z_{j})}{\mu }\Big),$|

where |$I$| denotes the indicator function and |$\mu $| is a hyperparameter, which is set to 0.1 in our experiments. The ranking loss enforces the criterion that a patient who dies at time |$O_{i}$| should be assigned a higher predicted risk of death at |$O_{i}$| than a patient who survives beyond |$O_{i}$|, i.e. with |$O_{j}>O_{i}$|. It thus ensures that the predicted survival outcomes preserve the partial ranking information inferred from the training samples.
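A vectorized NumPy sketch of |$L_{2}$| is shown below, again with zero-based interval indices; the function name and toy predictions are hypothetical. Swapping the predictions of the two patients turns a well-ordered pair into a badly ordered one, and the penalty grows by a factor of |$e^{14}$|, because dividing by |$\mu = 0.1$| amplifies the difference in cumulative incidence.

```python
import numpy as np

def ranking_loss(y, r, time, event, mu=0.1):
    """L_2 = sum_{i != j} Delta_i I(O_i < O_j) exp(-(F_hat(r_i|z_i) - F_hat(r_i|z_j)) / mu)."""
    cif = np.cumsum(y, axis=1)                     # F_hat(k | z_j) for every patient j and k
    F = cif[:, r].T                                # F[i, j] = F_hat(r_i | z_j)
    own = np.diag(F)                               # F_hat(r_i | z_i)
    comparable = (event[:, None] == 1) & (time[:, None] < time[None, :])
    return np.sum(comparable * np.exp(-(own[:, None] - F) / mu))

time = np.array([100.0, 500.0])                    # patient 0 died first
event = np.array([1, 0])
r = np.array([0, 2])
good = np.array([[0.8, 0.1, 0.1],                  # high early risk for patient 0
                 [0.1, 0.2, 0.7]])
bad = good[::-1].copy()                            # predictions swapped between patients
print(ranking_loss(good, r, time, event), ranking_loss(bad, r, time, event))
```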

To accurately learn the probability distribution of patient survival time, we combine loss |$L_{1}$| with loss |$L_{2}$|. The total loss of our MOSAHit model is

|$L = L_{1} + \lambda L_{2},$|   (3)

where |$\lambda $| balances the two terms |$L_{1}$| and |$L_{2}$| and is set to 0.1 in our experiments. The softmax output layer consists of 30 neurons (|$R=30$|), and because time-to-event data are highly skewed, the sub-intervals associated with the neurons have different widths: the width of each of the first 20 sub-intervals is selected from {60, 90, 120} days by hyperparameter search, and each of the last 10 sub-intervals spans 180 days.
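The discretization described above can be sketched as follows (NumPy; the function names are hypothetical). With 20 short sub-intervals of width |$w\in \{60, 90, 120\}$| days followed by 10 sub-intervals of 180 days, the horizon is |$T_{max} = 20w + 1800$| days, i.e. 3000, 3600, or 4200 days. Treating each sub-interval as left-open and right-closed, and clipping observed times beyond |$T_{max}$| to the last sub-interval, are our assumptions; the text does not specify either.

```python
import numpy as np

def interval_edges(short_width, n_short=20, long_width=180, n_long=10):
    """Right edges (days) of the R = n_short + n_long sub-intervals covering [0, T_max]."""
    return np.cumsum([short_width] * n_short + [long_width] * n_long)

def interval_index(observed_days, edges):
    """Zero-based index of the first right edge >= O_i (left-open, right-closed
    sub-intervals); times beyond T_max are clipped to the last sub-interval."""
    idx = np.searchsorted(edges, observed_days, side="left")
    return np.minimum(idx, len(edges) - 1)

edges = interval_edges(short_width=90)           # w = 90 days, one of {60, 90, 120}
print(len(edges), edges[19], edges[-1])          # 30 1800 3600
print(interval_index(np.array([45, 90, 91, 1850, 5000]), edges).tolist())   # [0, 0, 1, 20, 29]
```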

Federated transfer learning with differential privacy for MOSAHit method

A generalizable MOSAHit model can be learned when sufficient training samples are available for a specific target cancer, but multi-omics data often suffer from the “big |$p$|, small |$n$|” problem, which makes MOSAHit prone to overfitting when trained on a single cancer. Transfer learning approaches can help learn the MOSAHit model for the target cancer by leveraging knowledge from the extensive data of related cancers, thereby reducing its reliance on large volumes of labeled data for the target cancer. However, these approaches cannot be applied directly when the datasets are distributed across institutions without data-sharing agreements. Because of privacy concerns, developing a generalizable MOSAHit model is challenging when relying on data silos at multiple institutions rather than a centralized data repository. To integrate data silos from different institutions while preserving privacy, we propose a federated transfer learning framework with differential privacy to train MOSAHit. The architecture of the proposed framework is presented in Fig. 2.

Figure 2

Overview of federated transfer learning with differential privacy for MOSAHit. Each node trains its model on local data, adds random noise to the weight parameters, and uploads the trainable model parameters to the hub server at a fixed frequency. Once the hub server has received all parameters, it updates the parameters of the global model by weighted averaging (|$\theta = \sum _{k=1}^{4} w_{k}\theta _{k}$|, where |$w_{k}$| denotes the weight of local model |$k$|) and sends the new parameters back to each node for synchronization.

Suppose |$M$| institutions possess multi-omics survival datasets for different cancers: the institution holding dataset |$D_{M}$| studies the target cancer, and the institutions holding datasets |$\{D_{1},\cdots ,D_{M-1}\}$| hold data on related cancers. We adopt a master-server architecture for federated transfer learning, in which the local MOSAHit model with parameters |$\theta _{k}$| at each node |$k$| and the global MOSAHit model with parameters |$\theta $| at a central hub server share the same network structure and collaborate with each other. The local and global models are updated iteratively.

For the local model update, each local node first downloads the model parameters |$\theta $| from the hub server and then updates them using its local data |$D$|. To prevent leakage of specific patient information, a differential privacy mechanism preserves individual privacy by perturbing the parameters of each local node |$k$| with random noise |$\omega _{k}$|. For each local node, we define a gradient descent mapping |$t:\hat{\mathcal{D}}\rightarrow \Theta $| as

|$t(D) = \theta - \alpha \nabla L_{D}(\theta ),$|

where |$\hat{\mathcal{D}}$| represents the collection of local data (|$D \in \hat{\mathcal{D}}$|), |$\alpha $| is the learning rate, and |$\nabla L_{D}(\theta )$| is the gradient of the loss function on local data |$D$|. The Gaussian mechanism |$G$| with parameter |$\sigma $| is defined as

|$G(D) = t(D) + \omega ,$|   (4)

where |$\omega $| is a random vector drawn from |$N(0,\sigma ^{2}I)$|; that is, |$G$| simply computes |$t(D)$| and perturbs its output with Gaussian noise. With the Gaussian mechanism, the parameters of local node |$k$| are updated as

|$\theta _{k} = \theta - \alpha \nabla L_{k}(\theta ) + \omega _{k},$|   (5)

where |$\omega _{k}$| follows |$N(0,\sigma ^{2} I)$| and |$L_{k}$| is the loss function of local node |$k$| with dataset |$D_{k}$|.
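A minimal NumPy sketch of the noisy local update in equation (5) follows. The quadratic toy loss and `grad_fn` are hypothetical stand-ins for back-propagating the MOSAHit loss on |$D_{k}$|, and the sketch uses a plain gradient step as in the definition of |$t$| (the experiments use Adam). It also does not clip gradients, so on its own it does not bound the sensitivity |$s_{t}$| that the privacy guarantee relies on; per-example clipping, as in DP-SGD, is one common way to do so.

```python
import numpy as np

def noisy_local_update(theta, grad_fn, lr, sigma, rng):
    """theta_k = theta - lr * grad L_k(theta) + omega_k with omega_k ~ N(0, sigma^2 I)."""
    t_of_D = theta - lr * grad_fn(theta)                # gradient-descent mapping t(D)
    omega = rng.normal(0.0, sigma, size=theta.shape)    # Gaussian-mechanism noise
    return t_of_D + omega

# Hypothetical local objective L_k(theta) = 0.5 * ||theta - target||^2.
target = np.array([1.0, -2.0, 0.5])

def grad_fn(theta):
    return theta - target

rng = np.random.default_rng(0)
theta = np.zeros(3)                                     # parameters downloaded from the hub
noiseless = noisy_local_update(theta, grad_fn, lr=0.1, sigma=0.0, rng=rng)
noisy = noisy_local_update(theta, grad_fn, lr=0.1, sigma=0.05, rng=rng)
print(noiseless, noisy)
```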

The differential privacy mechanism ensures that the weight parameters of the shared model reveal little about any individual patient: their distribution changes only by a bounded amount when patient |$i$| is removed from or added to the data node, which limits what an attacker can infer about that patient. In the Gaussian mechanism, the parameter |$\sigma $| controls the trade-off between privacy preservation and data utility. Formally, given two adjacent datasets |$D, D^{\prime}\in \mathcal{D}$| that differ by only one individual, a randomized algorithm |$F:\mathcal{D} \rightarrow \mathcal{S}\subset \mathbb{R}^{l}$| satisfies |$(\epsilon ,\delta )$|-differential privacy if, for any subset |$S\subset \mathcal{S}$|,

|$\Pr [F(D)\in S] \leq e^{\epsilon }\Pr [F(D^{\prime})\in S] + \delta ,$|   (6)

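where |$\epsilon $| is the privacy budget that controls the privacy level and |$\delta $| is the probability that |$\epsilon $|-differential privacy may be violated. By Theorem 3.22 in the book by Dwork et al. [23], for arbitrary |$\epsilon \in (0, 1)$|, if

|$\sigma \geq \frac{c\, s_{t}}{\epsilon }\quad \text{with}\quad c^{2} > 2\ln (1.25/\delta ),$|   (7)

then the Gaussian mechanism |$G$| with |$\sigma $| defined in (4) satisfies |$(\epsilon ,\delta )$|-differential privacy, where |$s_{t}$| is the |$l_{2}$|-sensitivity of the function |$t$|, defined as

|$s_{t} = \max _{D, D^{\prime}\ \text{adjacent}}\left\Vert t(D) - t(D^{\prime})\right\Vert _{2}.$|   (8)

From condition (7), we obtain

|$\epsilon > \frac{\sqrt{2\ln (1.25/\delta )}\, s_{t}}{\sigma }.$|   (9)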

This implies that, for a given |$\delta $|, increasing the standard deviation |$\sigma $| of the Gaussian mechanism |$G$| allows a smaller |$\epsilon $|, i.e. a stronger privacy guarantee. Following [25], in the deep learning framework we set |$\sigma = \rho \eta $|, where |$\rho $| is an adjustable parameter and |$\eta $| is the standard deviation of the increments of the network parameters at each update of the local model. By adjusting |$\rho $|, we change the standard deviation |$\sigma $| of the added Gaussian noise |$\omega $| and thus control the level of differential privacy protection.
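The relation between |$\sigma $|, |$\epsilon $|, and |$\delta $| can be explored numerically. The sketch below (function names hypothetical) computes the threshold that |$\sigma $| must exceed under the Gaussian-mechanism condition for a target |$\epsilon \in (0,1)$|, the corresponding lower bound on |$\epsilon $| for a given |$\sigma $|, and |$\sigma = \rho \eta $|. The values |$\delta = 10^{-5}$| and |$s_{t} = 0.01$| are purely illustrative, and reading |$\eta $| as the standard deviation over all entries of one update's increment vector is our assumption.

```python
import math
import numpy as np

def sigma_threshold(epsilon, delta, sensitivity):
    """Value that sigma must exceed for (epsilon, delta)-DP; stated for epsilon in (0, 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the Gaussian-mechanism bound is stated for epsilon in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon

def epsilon_threshold(sigma, delta, sensitivity):
    """Lower bound on epsilon for a given sigma; larger sigma gives a smaller bound."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / sigma

def noise_std(increments, rho=0.05):
    """sigma = rho * eta, with eta the std of one local update's parameter increments."""
    return rho * float(np.std(increments))

# Hypothetical numbers: delta = 1e-5 and sensitivity s_t = 0.01.
print(sigma_threshold(0.5, 1e-5, 0.01))        # about 0.0969
print(epsilon_threshold(0.2, 1e-5, 0.01))      # about 0.242
print(noise_std(np.array([0.01, -0.01, 0.01, -0.01])))   # rho * eta = 0.05 * 0.01
```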

After updating its local parameters using equation (5), local node |$k$| uploads the model parameters |$\theta _{k}$| to the hub server. Once the hub server has received all parameters, it updates the parameters of the global model by weighted averaging and sends the new parameters back to each local node for the next iteration. The global model is updated as follows:

|$\theta = \sum _{k=1}^{M} w_{k}\theta _{k},$|   (10)

where |$\sum _{k=1}^{M} w_{k} = 1$| and |$w_{k}$| denotes the weight of local model |$k$|. In multitask learning, all tasks are learned simultaneously with equal weights and exploit their interconnections to benefit each other. In contrast, our federated transfer learning empirically requires that samples from the target cancer receive greater weight than those from related cancers, which makes the global model focus more on the target cancer. By sharing parameters, the federated transfer learning framework with differential privacy lets MOSAHit leverage knowledge from multi-omics data of related cancers distributed across multiple institutions. This enables the learning of more robust and universal combinatorial feature representations across omics, yielding a more generalizable survival prediction model for the target cancer in a privacy-preserving setting.
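The aggregation step in equation (10) is a weighted average of parameter vectors. The NumPy sketch below uses toy parameters and hypothetical per-node sample counts to contrast equal model weights |$w_{k}=1/M$| (the averaging described in the experimental setting) with weights proportional to local sample size, as in standard FedAvg. With equal model weights, each target-node sample carries more weight than a source-node sample whenever the target node is smaller, which is one way to realize the emphasis on the target cancer described above.

```python
import numpy as np

def aggregate(local_params, weights):
    """Global update theta = sum_k w_k * theta_k; the weights must sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, local_params))

# Toy parameters for source nodes 1-3 and target node 4.
local_params = [np.full(3, v) for v in (1.0, 2.0, 3.0, 10.0)]
n_samples = np.array([3000, 2500, 2800, 400])      # hypothetical training-set sizes

equal = aggregate(local_params, np.full(4, 0.25))                  # w_k = 1/M
by_size = aggregate(local_params, n_samples / n_samples.sum())     # w_k proportional to n_k

# Effective weight of one training sample on each node under the two schemes.
per_sample_equal = 0.25 / n_samples
per_sample_by_size = np.full(4, 1.0 / n_samples.sum())
print(equal, by_size)
print(per_sample_equal[3] / per_sample_equal[0])   # a target sample counts 7.5x a node-1 sample
```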

Experiments

Datasets and preprocessing

The Cancer Genome Atlas (TCGA) [31] is a publicly available cancer genomics database containing molecular profiles of |$\sim $|11 000 patients across 33 common cancers. We integrated RNA-Seq data for gene expression and miRNA-Seq data for microRNA expression to evaluate the multi-omics survival analysis method MOSAHit trained using federated transfer learning with differential privacy. The gene expression and microRNA expression data were preprocessed as follows. First, we removed low-quality features (genes/microRNAs) with more than 10% missing values and used median imputation to fill in the remaining missing values. Second, we applied a log transformation and filtered out noise-sensitive features with low variance across samples. We then standardized the remaining features to eliminate the effects of unit and scale differences. Finally, 17 720 gene features and 1881 microRNA features were retained for further study.
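These preprocessing steps can be sketched in NumPy as below. The text does not give the log base, the pseudo-count, or the variance cut-off, so `log2(x + 1)` and `var_threshold` are assumptions, and the toy matrix is invented for illustration.

```python
import numpy as np

def preprocess(X, max_missing=0.10, var_threshold=1e-8):
    """X: (patients, features) raw expression values with np.nan marking missing entries."""
    # 1. Drop features with more than 10% missing values, then median-impute the rest.
    X = X[:, np.isnan(X).mean(axis=0) <= max_missing]
    X = np.where(np.isnan(X), np.nanmedian(X, axis=0), X)
    # 2. Log transform (log2(x + 1) is an assumed variant) and drop low-variance features.
    X = np.log2(X + 1.0)
    X = X[:, X.var(axis=0) > var_threshold]
    # 3. Standardize each remaining feature to zero mean and unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=50.0, size=(20, 6))   # hypothetical expression values
X[:6, 0] = np.nan        # feature 0: 30% missing -> removed
X[0, 2] = np.nan         # feature 2: 5% missing  -> kept and imputed
X[:, 1] = 7.0            # feature 1: constant    -> removed by the variance filter
X_clean = preprocess(X)
print(X_clean.shape)     # (20, 4)
```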

We selected 10 prevalent cancer types from TCGA as target cancers: Breast Invasive Carcinoma (BRCA), Kidney Renal Clear Cell Carcinoma (KIRC), Lung Adenocarcinoma (LUAD), Lung Squamous Cell Carcinoma (LUSC), Head and Neck Squamous Cell Carcinoma (HNSC), Bladder Urothelial Carcinoma (BLCA), Colon Adenocarcinoma (COAD), Liver Hepatocellular Carcinoma (LIHC), Ovarian Serous Cystadenocarcinoma (OV), and Esophageal Carcinoma (ESCA). To further validate the effectiveness of MOSAHit trained within a federated transfer learning framework with differential privacy, we conducted an external experiment on the independent TARGET-WT dataset [32], which contains multi-omics and clinical data from pediatric patients with three different high-risk kidney tumors. TARGET-WT is publicly available from the UCSC Xena database (https://xena.ucsc.edu/). Detailed descriptions of the 11 target cancer datasets are provided in Table 1.

Table 1

Statistics of multi-omics data for the target cancers

Cancer type   # patients   Prop. censored   Cancer type   # patients   Prop. censored
BRCA          1021         0.861            HNSC          495          0.564
KIRC          508          0.671            COAD          439          0.770
LIHC          367          0.651            OV            372          0.390
LUAD          496          0.637            LUSC          469          0.582
BLCA          402          0.562            ESCA          164          0.610
TARGET-WT     131          0.588

Evaluation metrics

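In this study, we evaluated the performance of the multi-omics survival analysis approach MOSAHit, trained using federated transfer learning, with the time-dependent concordance index (|$C^{td}$|-index) [33]. The ordinary C-index [34] measures the predictive ability of a survival model by estimating the probability that the ordering of predicted risks is concordant with the ordering of actual survival times. Because it summarizes each patient with a single time-invariant risk score, the C-index cannot capture changes in risk that may occur over time. The |$C^{td}$|-index takes time into account and provides an appropriate assessment of how covariates influence survival over time. It is defined as follows:

|$C^{td} = P\left(\hat{F}(s^{(i)} \mid \mathbf{x}^{(i)}) > \hat{F}(s^{(i)} \mid \mathbf{x}^{(j)}) \,\middle|\, s^{(i)} < s^{(j)},\ \delta^{(i)} = 1\right),$|

where |$\hat{F}(t \mid \mathbf{x})$| is the cumulative incidence function, i.e. the predicted probability that the event has occurred by time |$t$| for a patient with covariates |$\mathbf{x}$|; |$s^{(i)}$| is the observed time of patient |$i$|; and |$\delta^{(i)}$| is the event indicator. Note that when the proportional hazards assumption holds, the |$C^{td}$|-index is equivalent to the usual C-index. The |$C^{td}$|-index ranges from 0 to 1; higher values indicate better predictive performance, and a value of 0.5 corresponds to a random ordering of patients.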
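To make the definition concrete, the following is a minimal Python sketch of the |$C^{td}$|-index, assuming the model outputs a cumulative incidence curve on a discrete time grid (as DeepHit-style models do) and that each observed time is given as an index into that grid. The function name `ctd_index` and the rule that tied predictions count as half-concordant are assumptions for illustration; this is not the authors' evaluation code.

```python
from typing import Sequence


def ctd_index(
    times: Sequence[int],
    events: Sequence[int],
    cif: Sequence[Sequence[float]],
) -> float:
    """Time-dependent concordance for discrete-time survival predictions.

    times[i]  -- discretized observed time (grid index) of patient i
    events[i] -- 1 if patient i's event was observed, 0 if censored
    cif[i][k] -- predicted cumulative incidence F_hat for patient i at index k
    """
    concordant = 0.0
    comparable = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j, t_j in enumerate(times):
            if t_j <= t_i:
                continue  # patient j must still be event-free after t_i
            comparable += 1
            r_i, r_j = cif[i][t_i], cif[j][t_i]  # both evaluated at time t_i
            if r_i > r_j:
                concordant += 1.0
            elif r_i == r_j:
                concordant += 0.5  # tie convention: an assumption
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example with three time points (values are illustrative only).
times = [0, 1, 2]
events = [1, 1, 0]
cif = [[0.5, 0.8, 0.9], [0.2, 0.6, 0.9], [0.1, 0.3, 0.5]]
print(ctd_index(times, events, cif))  # 1.0: every comparable pair is ordered correctly
```

Evaluating both patients of a pair at the earlier patient's event time is what lets the score reflect risk curves that cross over time, which a single fixed risk score cannot express.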

Experimental setting

In our experiments, we considered four distinct institutional nodes: the multi-omics data for the target cancer were placed at target node |$4$|, and the multi-omics data for the related cancers from the TCGA database were distributed alphabetically across source nodes |$1$|, |$2$|, and |$3$|. We randomly partitioned each target cancer dataset into 80% for training and 20% for testing, stratified by censoring status. In the federated transfer learning setting, we trained a prediction model using the training set of the target cancer together with the multi-omics datasets of related cancers across the source nodes, and evaluated it on the paired testing set of the target cancer. To evaluate MOSAHit trained using federated transfer learning with differential privacy comprehensively, we repeated this process 20 times and report the mean and standard deviation of the |$C^{td}$|-indices.
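The evaluation protocol above (an 80/20 split stratified by censoring status, repeated 20 times, then summarized by mean and standard deviation) can be sketched as follows. The helper names are hypothetical, one seed per repetition is an assumption, and the use of the sample (rather than population) standard deviation is also an assumption, since the section does not say which is reported.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def stratified_split(
    events: Sequence[int], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split patient indices into train/test, preserving the censored/uncensored ratio."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for status in (0, 1):  # 0 = censored, 1 = event observed
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# 20 repetitions, one random seed per repetition (toy event labels).
events = [0] * 10 + [1] * 10
splits = [stratified_split(events, 0.2, seed=r) for r in range(20)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))  # 16 4
```

Each split would be used to train and test the model once, and the 20 resulting |$C^{td}$| values would then be passed to `summarize`.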

In this study, MOSAHit follows a standard deep learning design and is implemented in PyTorch. When training MOSAHit using federated transfer learning with differential privacy, we set the Gaussian noise parameter |$\rho $| to 0.05 and used the Adam optimizer with a learning rate of 2e-4. Furthermore, because the target node holds far fewer samples than the source nodes, we did not weight the local models by their sample sizes when updating the global model. Instead, we averaged the contributions of all local models equally, which prevents the larger source nodes from dominating the global update and enables the global model to learn effectively from the training samples of the specific target cancer.
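The effect of this design choice can be illustrated with a small sketch comparing uniform averaging with sample-size-weighted averaging (as in standard FedAvg). Model parameters are represented here as flat lists of floats, and the node sizes are hypothetical, chosen only to mimic three large source nodes and one small target node.

```python
from typing import List, Sequence


def aggregation_weights(node_sizes: Sequence[int], uniform: bool = True) -> List[float]:
    """Per-node weights for averaging local models into the global model."""
    if uniform:
        return [1.0 / len(node_sizes)] * len(node_sizes)  # each node counts equally
    total = sum(node_sizes)
    return [n / total for n in node_sizes]  # FedAvg-style, proportional to sample size


def aggregate(local_params: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of flattened local parameter vectors (one row per node)."""
    return [sum(w * p for w, p in zip(weights, col)) for col in zip(*local_params)]


# Hypothetical node sizes: three large source nodes, then the small target node.
sizes = [3000, 3000, 3000, 400]
print(aggregation_weights(sizes, uniform=False)[-1])  # about 0.043
print(aggregation_weights(sizes, uniform=True)[-1])   # 0.25
```

Under sample-size weighting the target node would contribute only about 4% of each global update in this example, whereas uniform averaging gives it a fixed 1/4 share regardless of how few samples it holds.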

Results

Performance evaluation of MOSAHit method trained using federated transfer learning with differential privacy

To evaluate the performance of MOSAHit trained using federated transfer learning with differential privacy in multi-omics survival analysis, we compared it with single-omics methods and three typical single-fusion methods: (i) DeepHit, which uses only single-omics (gene or microRNA) embedding features to learn the probability distribution of patient survival time; (ii) MODCHit, which directly concatenates the multi-omics embedding features; (iii) MOEAHit, which fuses the multi-omics embedding features by element-wise addition; and (iv) MOFBHit, which fuses the multi-omics embedding features by factorized bilinear pooling. For a fair comparison, the embedding features for all of these methods were extracted with the same neural networks.
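The three single-fusion baselines differ only in how two omics embeddings are combined. A minimal sketch of the three operators on plain Python lists follows; the projection matrices `U` and `V`, the pooling window `k`, and the toy embeddings are hypothetical, and the power and L2 normalization that usually follow sum pooling in multimodal factorized bilinear pooling are omitted. These are generic forms of the operators, not the authors' network code.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]  # row-major: one row per output dimension


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def concat_fusion(g: Sequence[float], r: Sequence[float]) -> Vector:
    """MODCHit-style: stack the two embeddings."""
    return list(g) + list(r)


def add_fusion(g: Sequence[float], r: Sequence[float]) -> Vector:
    """MOEAHit-style: element-wise sum (embeddings must share a dimension)."""
    if len(g) != len(r):
        raise ValueError("element-wise addition needs equal-length embeddings")
    return [a + b for a, b in zip(g, r)]


def factorized_bilinear_fusion(
    g: Sequence[float], r: Sequence[float], U: Matrix, V: Matrix, k: int
) -> Vector:
    """MOFBHit-style sketch: project both embeddings to d*k dimensions, take the
    Hadamard product, then sum-pool each window of k values down to d outputs."""
    if len(U) != len(V) or len(U) % k != 0:
        raise ValueError("U and V must have the same number of rows, divisible by k")
    prod = [a * b for a, b in zip(matvec(U, g), matvec(V, r))]
    return [sum(prod[i:i + k]) for i in range(0, len(prod), k)]


gene_emb, mirna_emb = [1.0, 2.0], [3.0, 4.0]  # toy embeddings
U = [[1, 0], [0, 1], [1, 1], [1, 0]]  # 4 x 2, so d = 2 outputs with k = 2
V = [[1, 0], [0, 1], [0, 1], [1, 1]]
print(concat_fusion(gene_emb, mirna_emb))  # [1.0, 2.0, 3.0, 4.0]
print(add_fusion(gene_emb, mirna_emb))     # [4.0, 6.0]
print(factorized_bilinear_fusion(gene_emb, mirna_emb, U, V, k=2))  # [11.0, 19.0]
```

Concatenation and addition combine the embeddings linearly, whereas the factorized bilinear form multiplies projected features, so it can model pairwise interactions between the two omics at a parameter cost controlled by the factor size `k`.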

Table 2 shows the |$C^{td}$|-index values of the different methods under federated transfer learning with differential privacy and under direct learning using only the target cancer data. Several observations follow. First, when trained on multi-institutional data using federated transfer learning with differential privacy, every method achieves a higher average |$C^{td}$|-index than when trained on the target cancer data alone, indicating that knowledge from related cancer datasets is successfully transferred. For example, DeepHit trained on gene/microRNA expression data from multiple institutions reaches an average |$C^{td}$|-index of 0.658/0.630 across all target cancers, a relative improvement of 3.9%/3.3% over learning only from the gene/microRNA expression data of the target cancers (0.633/0.610). Second, under federated transfer learning, MOSAHit achieves the best performance and clearly improves on the single-omics methods and the three typical single-fusion methods. Specifically, in this setting, MOSAHit outperforms DeepHit (gene/microRNA), MODCHit, MOEAHit, and MOFBHit by about 5.5%/10.2%, 3.6%, 9.5%, and 3.3%, respectively. These results suggest that, through the parameter-sharing mechanism, the proposed method captures features common to different cancers distributed across multiple institutions, and thereby learns more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers while preserving differential privacy.
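The percentages quoted above are relative (not absolute) improvements computed from the averages in the last column of Table 2. A short Python check, using only values reported in the table, reproduces them.

```python
# Average C^td-index values taken from the "Ave" column of Table 2.
direct = {"DeepHit (gene)": 0.633, "DeepHit (microRNA)": 0.610}
federated = {
    "DeepHit (gene)": 0.658,
    "DeepHit (microRNA)": 0.630,
    "MODCHit": 0.670,
    "MOEAHit": 0.634,
    "MOFBHit": 0.672,
    "MOSAHit": 0.694,
}


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of new over old, in percent."""
    return 100.0 * (new - old) / old


# prints 3.9% and 3.3%, then 5.5%, 10.2%, 3.6%, 9.5%, 3.3%
for name, old in direct.items():
    print(f"{name}: federated vs direct {relative_gain(federated[name], old):.1f}%")
for name, old in federated.items():
    if name != "MOSAHit":
        print(f"MOSAHit vs {name}: {relative_gain(federated['MOSAHit'], old):.1f}%")
```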

Table 2

|$C^{td}$|-index values for different methods in the contexts of federated transfer learning with differential privacy and direct learning using only the target cancer data. The average |$C^{td}$|-index of each method in each context is shown in the last column. The top two results are emphasized in bold and underlined, respectively

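Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | ESCA | OV | LIHC | Ave
Direct Learning | microRNA | DeepHit | 0.610 | 0.638 | 0.649 | 0.607 | 0.609 | 0.553 | 0.612 | 0.649 | 0.539 | 0.638 | 0.610
Direct Learning | gene | DeepHit | 0.668 | 0.686 | 0.653 | 0.684 | 0.622 | 0.581 | 0.618 | 0.580 | 0.577 | 0.662 | 0.633
Direct Learning | multi-omics | MODCHit | 0.639 | 0.680 | 0.670 | 0.660 | 0.613 | 0.580 | 0.618 | 0.591 | 0.559 | 0.672 | 0.628
Direct Learning | multi-omics | MOEAHit | 0.626 | 0.670 | 0.645 | 0.629 | 0.595 | 0.572 | 0.616 | 0.615 | 0.553 | 0.648 | 0.617
Direct Learning | multi-omics | MOFBHit | 0.656 | 0.689 | 0.665 | 0.651 | 0.608 | 0.583 | 0.623 | 0.611 | 0.585 | 0.674 | 0.635
Direct Learning | multi-omics | MOSAHit | 0.666 | 0.742 | 0.684 | 0.674 | 0.636 | 0.583 | 0.629 | 0.634 | 0.601 | 0.688 | 0.654
Federated Transfer | microRNA | DeepHit | 0.645 | 0.649 | 0.659 | 0.638 | 0.638 | 0.565 | 0.633 | 0.667 | 0.565 | 0.642 | 0.630
Federated Transfer | gene | DeepHit | 0.693 | 0.720 | 0.674 | 0.694 | 0.646 | 0.577 | 0.641 | 0.633 | 0.615 | 0.680 | 0.658
Federated Transfer | multi-omics | MODCHit | 0.697 | 0.734 | 0.710 | 0.714 | 0.647 | 0.587 | 0.658 | 0.665 | 0.604 | 0.687 | 0.670
Federated Transfer | multi-omics | MOEAHit | 0.650 | 0.679 | 0.664 | 0.651 | 0.641 | 0.562 | 0.638 | 0.620 | 0.575 | 0.656 | 0.634
Federated Transfer | multi-omics | MOFBHit | 0.702 | 0.740 | 0.707 | 0.712 | 0.654 | 0.589 | 0.655 | 0.647 | 0.620 | 0.697 | 0.672
Federated Transfer | multi-omics | MOSAHit | 0.725 | 0.773 | 0.729 | 0.719 | 0.679 | 0.600 | 0.655 | 0.703 | 0.641 | 0.711 | 0.694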

To further evaluate MOSAHit trained using federated transfer learning with differential privacy, we categorized the patients in the testing set of each target cancer as longer-term or shorter-term survivors using a 5-year survival criterion, and computed the |$C^{td}$|-index between the longer-term and shorter-term survivors. Because longer-term survivors are rare in the ESCA dataset, Table 3 reports results only for the remaining nine target cancers. The results show that, compared with direct learning on the local data of the target cancers alone, the methods trained using federated transfer learning with differential privacy better distinguish longer-term from shorter-term survivors by leveraging information from the abundant multi-omics data of related cancers. Moreover, among the methods trained in this setting, MOSAHit outperforms the single-omics methods and the three typical single-fusion methods. For KIRC, LUAD, and BLCA in particular, MOSAHit trained using federated transfer learning achieves substantial improvements over the second-best method (0.824 vs. 0.779, 0.739 vs. 0.703, and 0.818 vs. 0.767, respectively). Taken together, these results further support the superiority of the multi-omics survival analysis method MOSAHit in the federated transfer learning setting.
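The grouping and the restricted |$C^{td}$|-index can be sketched as follows. The section does not state the time unit or how patients censored before five years are handled, so this sketch assumes times in days and excludes such patients because their group is unknown; the helper names and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

FIVE_YEARS_IN_DAYS = 5 * 365.25  # assumption: follow-up times are recorded in days


def split_by_horizon(
    times: Sequence[float], events: Sequence[int], horizon: float = FIVE_YEARS_IN_DAYS
) -> Tuple[List[int], List[int]]:
    """Return (shorter_term, longer_term) patient indices."""
    shorter: List[int] = []
    longer: List[int] = []
    for i, (t, e) in enumerate(zip(times, events)):
        if t >= horizon:
            longer.append(i)  # known to have survived past the horizon
        elif e == 1:
            shorter.append(i)  # observed event before the horizon
        # censored before the horizon: group unknown, excluded (assumption)
    return shorter, longer


def between_group_ctd(
    shorter: Sequence[int],
    longer: Sequence[int],
    times: Sequence[float],
    cif_at: Callable[[int, float], float],
) -> float:
    """Concordance restricted to (shorter-term, longer-term) pairs.

    Every such pair is comparable under the C^td definition: the shorter-term
    patient had an observed event before the horizon, and the longer-term
    patient was still followed at or after the horizon.
    """
    if not shorter or not longer:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for s in shorter:
        t_s = times[s]
        for lg in longer:
            a, b = cif_at(s, t_s), cif_at(lg, t_s)  # F_hat at the shorter-term event time
            score += 1.0 if a > b else (0.5 if a == b else 0.0)
    return score / (len(shorter) * len(longer))


times = [100.0, 2000.0, 400.0, 300.0]
events = [1, 0, 0, 1]
risk = [0.9, 0.1, 0.5, 0.8]  # toy, time-invariant F_hat values
shorter, longer = split_by_horizon(times, events)
print(shorter, longer)  # [0, 3] [1]
print(between_group_ctd(shorter, longer, times, lambda i, t: risk[i]))  # 1.0
```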

Table 3

Performance comparison of different methods in the contexts of federated transfer learning with differential privacy and direct learning, using the |$C^{td}$|-index values between longer-term survivors and shorter-term survivors. The average |$C^{td}$|-index of each method in each context is shown in the last column. The top two results are emphasized in bold and underlined, respectively

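Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | OV | LIHC | Ave
Direct Learning | microRNA | DeepHit | 0.629 | 0.664 | 0.695 | 0.600 | 0.617 | 0.562 | 0.583 | 0.526 | 0.633 | 0.612
Direct Learning | gene | DeepHit | 0.685 | 0.728 | 0.728 | 0.694 | 0.617 | 0.620 | 0.604 | 0.585 | 0.689 | 0.661
Direct Learning | multi-omics | MODCHit | 0.660 | 0.720 | 0.742 | 0.649 | 0.621 | 0.605 | 0.606 | 0.557 | 0.678 | 0.649
Direct Learning | multi-omics | MOEAHit | 0.639 | 0.707 | 0.689 | 0.617 | 0.601 | 0.583 | 0.615 | 0.550 | 0.672 | 0.630
Direct Learning | multi-omics | MOFBHit | 0.672 | 0.737 | 0.741 | 0.658 | 0.626 | 0.619 | 0.629 | 0.595 | 0.671 | 0.661
Direct Learning | multi-omics | MOSAHit | 0.700 | 0.776 | 0.763 | 0.708 | 0.654 | 0.606 | 0.636 | 0.620 | 0.710 | 0.686
Federated Transfer | microRNA | DeepHit | 0.654 | 0.673 | 0.702 | 0.641 | 0.691 | 0.581 | 0.692 | 0.567 | 0.691 | 0.655
Federated Transfer | gene | DeepHit | 0.710 | 0.739 | 0.725 | 0.732 | 0.684 | 0.583 | 0.663 | 0.643 | 0.711 | 0.688
Federated Transfer | multi-omics | MODCHit | 0.714 | 0.768 | 0.755 | 0.732 | 0.698 | 0.619 | 0.697 | 0.619 | 0.721 | 0.703
Federated Transfer | multi-omics | MOEAHit | 0.667 | 0.708 | 0.687 | 0.642 | 0.677 | 0.577 | 0.673 | 0.593 | 0.677 | 0.656
Federated Transfer | multi-omics | MOFBHit | 0.719 | 0.779 | 0.767 | 0.752 | 0.703 | 0.624 | 0.693 | 0.657 | 0.705 | 0.711
Federated Transfer | multi-omics | MOSAHit | 0.747 | 0.824 | 0.818 | 0.770 | 0.739 | 0.650 | 0.701 | 0.679 | 0.748 | 0.742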

To analyze how the degree of privacy protection affects the performance of federated transfer learning, we trained MOSAHit using federated transfer learning with different Gaussian noise levels |$\sigma $|. Following [35], we controlled the Gaussian noise level by adjusting |$\rho $|. Figure 3 reports the |$C^{td}$|-index values of MOSAHit trained using federated transfer learning with differential privacy on all target cancers as |$\rho $| varies in |$\{0, 0.05, 0.5, 1\}$|. A larger value of |$\rho $| indicates a higher level of privacy protection for patients. The results show that model performance deteriorates markedly when |$\rho $| is set too high (e.g. |$\rho = 1$|). Specifically, the average |$C^{td}$|-index across all target cancers ranged from 0.626 to 0.694 depending on the level of Gaussian random noise added during federated weight averaging. Adding a high level of Gaussian noise to the model parameters destroys much of the learned knowledge and hinders the model's ability to learn effectively. These observations confirm that there is a trade-off between model performance and privacy protection in the federated transfer learning setting.

Figure 3

|$C^{td}$|-index values of MOSAHit method trained using federated transfer learning with different levels of Gaussian random noise. We can observe that when patient information privacy is highly protected, a substantial amount of useful information is lost, making it difficult for the model to learn effectively.

In addition, to further evaluate the effectiveness of federated transfer learning in multi-omics survival analysis, we compared the performance of MOSAHit trained using federated transfer learning with differential privacy in scenarios with one, two, and three source nodes. Figure 4 shows the resulting |$C^{td}$|-index values. As more source nodes participate in training, the model can integrate useful knowledge from the multi-omics datasets of more related cancers and learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers. Specifically, with three source nodes MOSAHit reaches an average |$C^{td}$|-index of 0.694 across all target cancers, exceeding the values of 0.683 with one source node and 0.687 with two source nodes by 1.6% and 1.0%, respectively. These results again confirm the effectiveness of federated transfer learning with differential privacy in multi-omics survival analysis.

Figure 4

|$C^{td}$|-index values of MOSAHit method trained using federated transfer learning with differential privacy in the scenarios with one, two, and three source nodes. When more source nodes participate in the federated transfer learning training process, we can learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of target cancers.

Furthermore, we analyzed the impact of Gaussian noise on model convergence when training MOSAHit on multi-institutional data using federated transfer learning with differential privacy. For a given |$\rho $|, the Gaussian noise level |$\sigma = \rho \eta $| is determined mainly by |$\eta $|, the standard deviation of the increments in the network parameters at each update of the local model. When |$\rho $| is relatively small, the changes in the model parameters shrink gradually as federated transfer learning proceeds, so |$\eta $|, and with it the injected noise |$\sigma $|, also decreases gradually. Figure 1 of the supplementary material shows the training loss of MOSAHit on the training set of each target cancer during federated transfer learning with |$\rho $| set to 0.05. For all target cancers, the training loss decreases gradually with increasing iterations until convergence.
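One aggregation round with this noise rule can be sketched as follows, on flattened parameter vectors. The section does not specify whether |$\eta $| is computed per layer, per node, or over all parameters, nor exactly where the noise enters; this sketch assumes one |$\eta $| per node over all of that node's parameter increments, adds the noise to each increment, and then averages the noisy increments uniformly. The function name is hypothetical and this is not the implementation from [35].

```python
import random
import statistics
from typing import List, Sequence


def noisy_uniform_average(
    global_w: Sequence[float],
    local_ws: Sequence[Sequence[float]],
    rho: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """One aggregation round on flattened parameter vectors.

    Each node's increment (local - global) is perturbed with Gaussian noise of
    standard deviation sigma = rho * eta, where eta is the standard deviation of
    that node's increments; the noisy increments are then averaged uniformly.
    """
    rng = random.Random(seed)
    noisy_increments = []
    for w in local_ws:
        delta = [wi - gi for wi, gi in zip(w, global_w)]
        eta = statistics.pstdev(delta)  # spread of this node's parameter increments
        sigma = rho * eta
        noisy_increments.append([d + rng.gauss(0.0, sigma) for d in delta])
    k = len(noisy_increments)
    mean_inc = [sum(col) / k for col in zip(*noisy_increments)]  # 1/k per node
    return [g + m for g, m in zip(global_w, mean_inc)]


# With rho = 0 the round reduces to plain uniform averaging of the local models.
print(noisy_uniform_average([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], rho=0.0))  # [2.0, 3.0]
```

Because |$\sigma $| is proportional to |$\eta $|, the injected noise shrinks automatically as the local increments become small near convergence, which matches the convergence behavior described above; a large |$\rho $| keeps the noise large relative to the useful signal throughout training.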

Performance comparison with existing methods trained using federated transfer learning with differential privacy

MOSAHit trained using federated transfer learning with differential privacy is further evaluated against several existing state-of-the-art methods, including the traditional Cox model [3] and the deep learning-based survival prediction methods DeepSurv [6], HFBSurv [28], MDNNMD [36], and SurvCNN [11]. For a fair comparison, we trained the parameters of these existing methods within the same federated transfer learning framework with differential privacy. Figure 5 reports the |$C^{td}$|-index values of the different methods for all target cancers. The deep learning methods outperform the traditional linear Cox model when trained using federated transfer learning, reflecting their stronger learning capacity. For example, with gene/microRNA expression data, the |$C^{td}$|-index of DeepSurv shows an average improvement of 1.4%/1.6% over Cox across all cancer datasets in the federated transfer learning setting.

Figure 5

|$C^{td}$|-index values of different methods trained using federated transfer learning with differential privacy.

Moreover, the multi-omics methods HFBSurv, MDNNMD, and MOSAHit produce better survival predictions than the single-omics methods when trained using federated transfer learning with differential privacy. Specifically, these three multi-omics methods outperform the best-performing single-omics method on 7 of the 10 TCGA cancer datasets in this setting. By contrast, the multi-omics method SurvCNN does not perform well under federated transfer learning. SurvCNN converts gene expression and microRNA expression data into image formats and trains convolutional neural networks (CNNs) to extract features for survival prediction. Although CNNs substantially reduce the number of model parameters, they may limit the learning of complex feature relationships, especially in federated transfer learning settings. Most importantly, when model parameters are trained within a federated transfer learning framework with differential privacy, our proposed method MOSAHit achieves the best performance among all the methods. For KIRC, COAD, LUSC, ESCA, and OV in particular, MOSAHit outperforms the second-best method by a large margin. These results illustrate the advantage of MOSAHit, trained using federated transfer learning with differential privacy, in addressing data silos and privacy protection in multi-omics survival analysis.

External validation of MOSAHit method trained using federated transfer learning with differential privacy

To further demonstrate the effectiveness of MOSAHit trained using federated transfer learning with differential privacy, we performed external validation on the independent TARGET-WT dataset, which is not part of the TCGA database. Fig. 6(a) reports the |$C^{td}$|-index values of DeepHit (gene/microRNA), MODCHit, MOEAHit, MOFBHit, and MOSAHit on TARGET-WT under federated transfer learning with differential privacy and under direct learning using only the target cancer data. Federated transfer learning improves performance on the TARGET-WT dataset at the target node by leveraging useful knowledge from the TCGA cancer datasets distributed across multiple source nodes. For example, averaged over these methods, the |$C^{td}$|-index under federated transfer learning is 0.631, a 3.4% improvement over the corresponding value of 0.610 under direct learning. Moreover, MOSAHit outperforms DeepHit (gene/microRNA), MODCHit, MOEAHit, and MOFBHit when trained using federated transfer learning with differential privacy.

Figure 6

(a) Performance comparison of DeepHit (gene/microRNA), MODCHit, MOEAHit, MOFBHit, and MOSAHit methods on TARGET-WT dataset in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. (b) Performance comparison of MOSAHit method and the existing methods on TARGET-WT dataset when trained using federated transfer learning with differential privacy.

We also compared MOSAHit with the existing methods on the TARGET-WT dataset in the setting of federated transfer learning with differential privacy. As shown in Fig. 6(b), MOSAHit trained using federated transfer learning achieves the best performance among all methods. Specifically, its |$C^{td}$|-index is 0.665, exceeding that of the second-best method, MDNNMD (0.656), by |$\sim $|1.4%. Taken together, these results indicate that federated transfer learning with differential privacy helps MOSAHit develop a more generalizable multi-omics survival prediction model for the local TARGET-WT dataset with limited training data, by leveraging information from the abundant multi-omics data of TCGA datasets distributed across multiple institutions while preserving individual privacy.

MOSAHit method trained using federated transfer learning with differential privacy on different types of multi-omics data

MOSAHit trained using federated transfer learning with differential privacy is further assessed on two other multi-omics combinations: gene expression with DNA methylation, and microRNA expression with DNA methylation. As for the other omics types, we filtered out noise-sensitive DNA methylation variables with low variance across samples and retained the top 10 000 features for further study. The OV cancer type was excluded because the TCGA database contains very few OV training samples with DNA methylation data. We therefore evaluated MOSAHit on these multi-omics combinations for the nine remaining target cancers, under federated transfer learning with differential privacy and under direct learning using only the target cancer data. The results are reported in Tables 4 and 5.
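The variance-based filtering step can be sketched as follows. The section does not say whether variances are computed on all samples or on the training samples only, nor whether the population or sample variance is used; this sketch uses the population variance over whatever samples it is given, and the helper name and toy matrix are hypothetical. In the paper's setting `k` would be 10 000.

```python
import statistics
from typing import List, Sequence


def top_variance_features(matrix: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k features with the highest variance across samples.

    matrix[i][j] is the value of feature j (e.g. a CpG site's beta value) for sample i.
    """
    n_features = len(matrix[0])
    variances = [statistics.pvariance([row[j] for row in matrix]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])  # keep the original column order


samples = [[1.0, 0.5, 0.0], [1.0, 0.9, 0.2], [1.0, 0.1, 0.4]]
print(top_variance_features(samples, k=2))  # [1, 2]
```

Feature 0 is constant across samples and is dropped first, which is the intended effect of removing low-variance, noise-sensitive variables.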

Table 4

|$C^{td}$|-index values of different approaches with gene expression and DNA methylation data in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. The last column displays the average rankings of different methods trained using federated transfer learning and direct learning for the nine target cancers. Note that the top two results are emphasized in bold and underlined, respectively

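Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | ESCA | LIHC | Avg. ranking
Direct Learning | DNA methylation | DeepHit | 0.631 | 0.638 | 0.603 | 0.548 | 0.666 | 0.567 | 0.589 | 0.650 | 0.618 | 9.7
Direct Learning | gene | DeepHit | 0.692 | 0.699 | 0.648 | 0.667 | 0.611 | 0.562 | 0.618 | 0.587 | 0.672 | 8.2
Direct Learning | multi-omics | MODCHit | 0.650 | 0.703 | 0.644 | 0.625 | 0.631 | 0.556 | 0.615 | 0.599 | 0.657 | 8.9
Direct Learning | multi-omics | MOEAHit | 0.618 | 0.685 | 0.609 | 0.620 | 0.620 | 0.575 | 0.609 | 0.590 | 0.643 | 10.1
Direct Learning | multi-omics | MOFBHit | 0.670 | 0.702 | 0.646 | 0.640 | 0.644 | 0.589 | 0.620 | 0.608 | 0.666 | 6.8
Direct Learning | multi-omics | MOSAHit | 0.736 | 0.771 | 0.685 | 0.686 | 0.689 | 0.578 | 0.623 | 0.609 | 0.673 | 3.7
Federated Transfer | DNA methylation | DeepHit | 0.676 | 0.695 | 0.615 | 0.629 | 0.670 | 0.567 | 0.628 | 0.667 | 0.633 | 7.0
Federated Transfer | gene | DeepHit | 0.719 | 0.762 | 0.672 | 0.717 | 0.641 | 0.590 | 0.654 | 0.628 | 0.676 | 3.6
Federated Transfer | multi-omics | MODCHit | 0.693 | 0.765 | 0.650 | 0.681 | 0.653 | 0.576 | 0.646 | 0.640 | 0.657 | 5.3
Federated Transfer | multi-omics | MOEAHit | 0.665 | 0.682 | 0.620 | 0.612 | 0.628 | 0.567 | 0.608 | 0.614 | 0.638 | 9.3
Federated Transfer | multi-omics | MOFBHit | 0.705 | 0.762 | 0.652 | 0.686 | 0.663 | 0.586 | 0.658 | 0.655 | 0.674 | 3.4
Federated Transfer | multi-omics | MOSAHit | 0.765 | 0.804 | 0.701 | 0.731 | 0.687 | 0.603 | 0.654 | 0.689 | 0.680 | 1.2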
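The captions of Tables 4 and 5 do not state how the average ranking is computed. One reading, sketched below under that assumption, ranks all 12 method and training-mode rows on each cancer (1 = highest |$C^{td}$|-index), lets tied values share the better rank, and averages each row's ranks over the nine cancers. Applied to the federated MOSAHit row of Table 4, this gives rank 1 on seven cancers and rank 2 on LUAD and HNSC, i.e. 11/9 ≈ 1.2, matching the reported value. The helper names are hypothetical.

```python
from typing import List, Sequence


def competition_ranks(values: Sequence[float]) -> List[int]:
    """Rank values in descending order; tied values share the best rank (1, 2, 2, 4, ...)."""
    return [1 + sum(other > v for other in values) for v in values]


def average_rankings(table: Sequence[Sequence[float]]) -> List[float]:
    """table[m][c] = C^td of method/mode row m on cancer c; returns each row's mean rank."""
    n_rows, n_cols = len(table), len(table[0])
    totals = [0] * n_rows
    for c in range(n_cols):
        ranks = competition_ranks([table[m][c] for m in range(n_rows)])
        for m in range(n_rows):
            totals[m] += ranks[m]
    return [t / n_cols for t in totals]


print(competition_ranks([0.658, 0.654, 0.654, 0.620]))  # [1, 2, 2, 4]
```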
Table 5

|$C^{td}$|-index values of different approaches with microRNA expression and DNA methylation data in the contexts of federated transfer learning with differential privacy and direct learning using only target cancer data. The last column displays the average rankings of different methods trained using federated transfer learning and direct learning for the nine target cancers. Note that the top two results are emphasized in bold and underlined, respectively

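Training mode | Data | Method | BRCA | KIRC | BLCA | COAD | LUAD | LUSC | HNSC | ESCA | LIHC | Avg. ranking
Direct Learning | microRNA | DeepHit | 0.637 | 0.647 | 0.617 | 0.627 | 0.602 | 0.559 | 0.607 | 0.666 | 0.641 | 7.8
Direct Learning | DNA methylation | DeepHit | 0.638 | 0.627 | 0.602 | 0.558 | 0.646 | 0.556 | 0.602 | 0.624 | 0.603 | 9.9
Direct Learning | multi-omics | MODCHit | 0.630 | 0.655 | 0.599 | 0.584 | 0.620 | 0.541 | 0.613 | 0.645 | 0.620 | 9.8
Direct Learning | multi-omics | MOEAHit | 0.636 | 0.622 | 0.593 | 0.599 | 0.610 | 0.558 | 0.590 | 0.641 | 0.600 | 10.7
Direct Learning | multi-omics | MOFBHit | 0.660 | 0.646 | 0.598 | 0.604 | 0.629 | 0.572 | 0.617 | 0.676 | 0.624 | 7.7
Direct Learning | multi-omics | MOSAHit | 0.693 | 0.704 | 0.641 | 0.598 | 0.681 | 0.556 | 0.623 | 0.649 | 0.635 | 5.4
Federated Transfer | microRNA | DeepHit | 0.653 | 0.650 | 0.613 | 0.653 | 0.628 | 0.587 | 0.632 | 0.639 | 0.641 | 6.7
Federated Transfer | DNA methylation | DeepHit | 0.667 | 0.695 | 0.606 | 0.613 | 0.671 | 0.585 | 0.616 | 0.659 | 0.628 | 6.3
Federated Transfer | multi-omics | MODCHit | 0.676 | 0.682 | 0.618 | 0.657 | 0.641 | 0.591 | 0.639 | 0.676 | 0.654 | 3.3
Federated Transfer | multi-omics | MOEAHit | 0.670 | 0.651 | 0.606 | 0.633 | 0.610 | 0.589 | 0.635 | 0.666 | 0.647 | 5.4
Federated Transfer | multi-omics | MOFBHit | 0.704 | 0.701 | 0.636 | 0.648 | 0.645 | 0.597 | 0.645 | 0.686 | 0.646 | 2.6
Federated Transfer | multi-omics | MOSAHit | 0.699 | 0.731 | 0.656 | 0.657 | 0.673 | 0.600 | 0.645 | 0.680 | 0.645 | 1.7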

These results show that federated transfer learning can effectively leverage useful knowledge from related cancer datasets distributed across various source nodes to improve survival prediction for the target cancer in most settings. For example, MOSAHit trained with federated transfer learning learns more robust and universal combinatorial feature representations across multiple omics: it outperforms MOSAHit trained by direct learning on target cancer data alone in eight of the nine target cancers in both Table 4 and Table 5. Moreover, across the multi-omics data types, MOSAHit trained with federated transfer learning achieves the best overall performance among the federated transfer learning methods. The last column of Tables 4 and 5 gives the average $C^{td}$-index ranking of each method, trained with either federated transfer learning or direct learning, over the nine target cancer datasets; MOSAHit trained with federated transfer learning has the lowest average ranking in both tables (1.2 in Table 4 and 1.7 in Table 5). Together, these results indicate that the proposed method can learn a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging the abundant multi-omics data of related cancers distributed across multiple institutions while preserving individual privacy.
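To make the average-ranking column concrete, the following minimal Python sketch recomputes it from the Table 5 values. It assumes that, within each target cancer, methods are ranked by $C^{td}$-index in descending order (rank 1 = best) and that tied values share the best (minimum) rank; the paper does not state its tie rule, and the reported rankings may have been computed from unrounded values. The helper average_ranking and the row labels are hypothetical.

```python
# Minimal sketch: recompute the "Avg. ranking" column of Table 5.
# Assumptions (not stated in the paper): rank 1 = highest C^td-index within
# each cancer, and tied values share the best (minimum) rank.

# C^td-index per cancer, copied from Table 5, in column order
# BRCA, KIRC, BLCA, COAD, LUAD, LUSC, HNSC, ESCA, LIHC.
# DL = direct learning; FT = federated transfer learning with differential privacy.
TABLE5 = {
    "DL microRNA DeepHit":        [0.637, 0.647, 0.617, 0.627, 0.602, 0.559, 0.607, 0.666, 0.641],
    "DL DNA methylation DeepHit": [0.638, 0.627, 0.602, 0.558, 0.646, 0.556, 0.602, 0.624, 0.603],
    "DL MODCHit":                 [0.630, 0.655, 0.599, 0.584, 0.620, 0.541, 0.613, 0.645, 0.620],
    "DL MOEAHit":                 [0.636, 0.622, 0.593, 0.599, 0.610, 0.558, 0.590, 0.641, 0.600],
    "DL MOFBHit":                 [0.660, 0.646, 0.598, 0.604, 0.629, 0.572, 0.617, 0.676, 0.624],
    "DL MOSAHit":                 [0.693, 0.704, 0.641, 0.598, 0.681, 0.556, 0.623, 0.649, 0.635],
    "FT microRNA DeepHit":        [0.653, 0.650, 0.613, 0.653, 0.628, 0.587, 0.632, 0.639, 0.641],
    "FT DNA methylation DeepHit": [0.667, 0.695, 0.606, 0.613, 0.671, 0.585, 0.616, 0.659, 0.628],
    "FT MODCHit":                 [0.676, 0.682, 0.618, 0.657, 0.641, 0.591, 0.639, 0.676, 0.654],
    "FT MOEAHit":                 [0.670, 0.651, 0.606, 0.633, 0.610, 0.589, 0.635, 0.666, 0.647],
    "FT MOFBHit":                 [0.704, 0.701, 0.636, 0.648, 0.645, 0.597, 0.645, 0.686, 0.646],
    "FT MOSAHit":                 [0.699, 0.731, 0.656, 0.657, 0.673, 0.600, 0.645, 0.680, 0.645],
}


def average_ranking(scores: dict[str, list[float]]) -> dict[str, float]:
    """Mean rank of each method over all cancers (lower is better)."""
    methods = list(scores)
    n_cancers = len(next(iter(scores.values())))
    totals = dict.fromkeys(methods, 0)
    for j in range(n_cancers):
        for m in methods:
            # Rank = 1 + number of methods with a strictly higher C-index,
            # so tied methods receive the same (best) rank.
            better = sum(scores[o][j] > scores[m][j] for o in methods)
            totals[m] += 1 + better
    return {m: totals[m] / n_cancers for m in methods}


if __name__ == "__main__":
    ranks = average_ranking(TABLE5)
    for method, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        print(f"{method:28s} {rank:.2f}")
```

Under these assumptions the sketch gives 15/9 ≈ 1.67 for MOSAHit, 23/9 ≈ 2.56 for MOFBHit and 30/9 ≈ 3.33 for MODCHit under federated transfer learning, which round to the 1.7, 2.6 and 3.3 reported in Table 5. Counting strictly better competitors, rather than sorting, is what lets tied methods share a rank.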

Conclusion

Over the past several years, deep learning algorithms for multi-omics data have received considerable attention as decision support systems that assist clinicians in cancer diagnosis and treatment. Although multi-omics data can reveal marked inter-patient molecular differences and thus help us better understand cancer heterogeneity and complexity, they suffer from the “big $p$, small $n$” problem, which makes integrating multi-omics data for survival prediction of a specific cancer particularly challenging. We therefore leverage the abundant data of related cancers to aid model learning for the specific target cancer. However, data privacy and security requirements prevent aggregating multi-omics data of related cancers from multiple institutions into a centralized data repository. Federated learning offers a way to learn from data distributed across institutions without sharing sensitive patient data.
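As a rough illustration of learning from distributed data without pooling it, the sketch below implements the server-side aggregation step of plain federated averaging (FedAvg, McMahan et al. [18]): each institution trains locally and shares only its model parameters, which the server averages with weights $n_k / n$ proportional to local sample size. This is a generic sketch, not the MOSAHit training code; the function name fedavg and the toy parameter vectors are hypothetical.

```python
import numpy as np


def fedavg(client_params, client_sizes):
    """Weighted average of client parameter arrays (FedAvg aggregation step).

    client_params: list of arrays of identical shape, one per institution.
    client_sizes:  number of local training samples at each institution.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()  # n_k / n
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    # Contract the client axis; only parameters reach the server,
    # raw patient data stays at each institution.
    return np.tensordot(weights, stacked, axes=1)


if __name__ == "__main__":
    # Hypothetical toy round: three institutions with different cohort sizes.
    params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
    print(fedavg(params, [100, 300, 100]))
```

In this toy round the weights are 0.2, 0.6 and 0.2, so the larger middle cohort pulls the global parameters toward its own.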

In this work, we propose a multi-omics survival prediction model with a self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach enables the development of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging multi-omics data of related cancers distributed across multiple institutions while preserving individual privacy. Specifically, federated transfer learning for MOSAHit captures features common to different cancers distributed among multiple institutions through parameter sharing, which helps it learn more robust and universal combinatorial feature representations across multiple omics for survival prediction of local target cancers. Comprehensive experiments on TCGA datasets and the TARGET-WT dataset show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of multi-omics survival prediction models for a target cancer, without directly sharing multi-omics data of related cancers among institutions. Computational pathology has recently achieved impressive results on many clinically relevant tasks; future research could explore how to effectively integrate multi-omics and pathology data in a distributed environment to improve patient survival prediction without exposing patients’ sensitive information.
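Differential privacy in federated settings is often enforced with the Gaussian mechanism: bound the L2 norm of each shared update by clipping, then add Gaussian noise scaled to that bound before the update leaves the institution (compare Abadi et al. [16], who clip and noise per-example gradients in DP-SGD). The sketch below shows only this clip-and-noise step under stated assumptions: the names privatize_update, clip_norm and noise_multiplier are hypothetical, it is not MOSAHit's actual privacy mechanism, and it performs no privacy accounting, so it does not by itself establish any (ε, δ) guarantee.

```python
import numpy as np


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip an update to L2 norm <= clip_norm, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * clip_norm
    (Gaussian mechanism). No privacy accounting is performed here.
    """
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    # Scale the update down only if it exceeds the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = update * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape)
    return clipped + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_update = np.array([3.0, 4.0])  # hypothetical update, L2 norm 5.0
    shared = privatize_update(local_update, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
    print(shared)
```

Clipping fixes the sensitivity of each shared update, which is what allows the noise scale to be set relative to clip_norm; a larger noise_multiplier gives stronger privacy at the cost of noisier aggregated parameters.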

Key Points
  • We propose a multi-omics survival prediction model with a self-attention mechanism (MOSAHit), trained within a federated transfer learning framework with differential privacy. This approach enables the development of a more robust multi-omics survival prediction model for a local target cancer with limited training data by effectively leveraging multi-omics data of related cancers distributed across multiple institutions while preserving individual privacy.

  • Federated transfer learning for MOSAHit captures features common to different cancers distributed among multiple institutions through parameter sharing, which helps it learn more generalizable and universal combinatorial feature representations across multiple omics for survival prediction of local target cancers.

  • We experimentally validated the MOSAHit method, trained using federated transfer learning with differential privacy, on the TARGET-WT dataset and 10 cancer datasets from the TCGA database. Comprehensive experiments on these real-world datasets show that the proposed method effectively alleviates data insufficiency and significantly improves the generalization performance of multi-omics survival prediction models for a target cancer while avoiding direct sharing of the related cancers' multi-omics data.
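The self-attention fusion named in the first key point can be illustrated, under loose assumptions, with single-head scaled dot-product self-attention (Vaswani et al. [29]) in which each omics type contributes one embedding token, so that every omics type can weight information from the others. This is a generic sketch, not the MOSAHit architecture; the token layout, dimensions, function name self_attention and random weights are all hypothetical.

```python
import numpy as np


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over omics tokens.

    tokens: (n_omics, d_model), one embedding per omics type.
    Returns fused tokens (n_omics, d_head) and weights (n_omics, n_omics).
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, d_head = 8, 4
    # Hypothetical embeddings for gene expression, DNA methylation and microRNA.
    omics_tokens = rng.normal(size=(3, d_model))
    w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    fused, attn = self_attention(omics_tokens, w_q, w_k, w_v)
    print(np.round(attn, 3))  # row i: how much omics i attends to each omics
```

Each row of the attention matrix sums to one, so every fused token is a convex combination of the value vectors of all omics types; in a survival model such fused tokens would typically be pooled and passed to a prediction head.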

Competing interests

No competing interest is declared.

Funding

This work was funded by National Natural Science Foundation of China projects under grant nos. 12222115 and 92470106.

Data availability

All data used in this manuscript are publicly available. The TCGA dataset underlying this article can be accessed from https://www.cancer.gov/ccg/research/genome-sequencing/tcga. The source codes are available at https://github.com/LiminLi-xjtu/Federated-Transfer-MOSAHit.

References

1. Dey T, Lipsitz SR, Cooper Z, et al. Survival analysis-time-to-event data and censoring. Nat Methods 2022;19:906–8.

2. Louis DN, Perry A, Reifenberger G, et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol 2016;131:803–20.

3. Cox DR. Regression models and life-tables. J R Stat Soc B Methodol 1972;34:187–202.

4. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, Minneapolis, Minnesota. Stroudsburg, PA: ACL, 2019, vol. 1, pp. 4171–86.

5. Wu Z, Shen C, Van Den Hengel A. Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognit 2019;90:119–33.

6. Katzman JL, Shaham U, Cloninger A, et al. DeepSurv: Personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18:1–12.

7. Lee C, Zame W, Yoon J, Van Der Schaar M. DeepHit: A deep learning approach to survival analysis with competing risks. In: Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, USA. Menlo Park, CA: AAAI, 2018, vol. 32.

8. Ching T, Zhu X, Garmire LX. Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput Biol 2018;14:1–18.

9. Hao J, Kim Y, Mallavarapu T, et al. Interpretable deep neural network for cancer survival analysis by integrating genomic and clinical data. BMC Med Genomics 2019;12:1–13.

10. Chaudhary K, Poirion OB, Lu L, et al. Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 2018;24:1248–59.

11. Kalakoti Y, Yadav S, Sundar D. SurvCNN: A discrete time-to-event cancer survival estimation framework using image representations of omics data. Cancers 2021;13:3106.

12. Wang Y, Zhang Z, Chai H, et al. Multi-omics cancer prognosis analysis based on graph convolution network. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ: IEEE, 2021, pp. 1564–8.

13. Li Y, Wang L, Wang J, et al. Transfer learning for survival analysis via efficient L2,1-norm regularized Cox regression. In: 2016 IEEE 16th International Conference on Data Mining (ICDM). Piscataway, NJ: IEEE, 2016, pp. 231–40.

14. Kim S, Kim K, Choe J, et al. Improved survival analysis by learning shared genomic information from pan-cancer data. Bioinformatics 2020;36:i389–98.

15. Cho HJ, Shu M, Bekiranov S, et al. Interpretable meta-learning of multi-omics data for survival analysis and pathway enrichment. Bioinformatics 2023;39:btad113.

16. Abadi M, Chu A, Goodfellow I, et al. Deep learning with differential privacy. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. New York, USA: ACM, 2016, pp. 308–18.

17. McMahan HB, Moore E, Ramage D, et al. Federated learning of deep networks using model averaging. arXiv preprint arXiv:1602.05629, vol. 2, 2016.

18. McMahan B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–82.

19. Zhu L, Liu Z, Han S. Deep leakage from gradients. Advances in Neural Information Processing Systems 2019;32:17–31.

20. Geiping J, Bauermeister H, Dröge H, et al. Inverting gradients - how easy is it to break privacy in federated learning? Advances in Neural Information Processing Systems 2020;33:16937–47.

21. Carlini N, Liu C, Erlingsson Ú, et al. The secret sharer: Evaluating and testing unintended memorization in neural networks. In: 28th USENIX Security Symposium (USENIX Security 19). Berkeley, CA: USENIX Association, 2019, pp. 267–84.

22. Zhang Y, Jia R, Pei H, et al. The secret revealer: Generative model-inversion attacks against deep neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2020, pp. 253–61.

23. Dwork C, Roth A. The algorithmic foundations of differential privacy. Found Trends Theor Comput Sci 2014;9:211–407.

24. Andreux M, Manoel A, Menuet R, et al. Federated survival analysis with discrete-time Cox models. arXiv preprint arXiv:2006.08997, 2020.

25. Lu MY, Chen RJ, Kong D, et al. Federated learning for computational pathology on gigapixel whole slide images. Med Image Anal 2022;76:102298.

26. Wang Q, He M, Guo L, et al. AFEI: Adaptive optimized vertical federated learning for heterogeneous multi-omics data integration. Brief Bioinform 2023;24:bbad269.

27. Kim J-H, On K-W, Lim W, et al. Hadamard product for low-rank bilinear pooling. In: 5th International Conference on Learning Representations, ICLR 2017.

28. Li R, Wu X, Li A, et al. HFBSurv: Hierarchical multimodal fusion with factorized bilinear models for cancer survival prediction. Bioinformatics 2022;38:2587–94.

29. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems 2017;30.

30. Guo Q, Qiu X, Liu P, et al. Multi-scale self-attention for text classification. In: Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI, 2020, vol. 34, pp. 7847–54.

31. Zhu Y, Qiu P, Ji Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nat Methods 2014;11:599–600.

32. Huang G, Mao J. Identification of a 12-gene signature and hub genes involved in kidney Wilms tumor via integrated bioinformatics analysis. Front Oncol 2022;12:877796.

33. Antolini L, Boracchi P, Biganzoli E. A time-dependent discrimination index for survival data. Stat Med 2005;24:3927–44.

34. Harrell FE, Califf RM, Pryor DB, et al. Evaluating the yield of medical tests. JAMA 1982;247:2543–6.

35. Li X, Gu Y, Dvornek N, et al. Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results. Med Image Anal 2020;65:101765.

36. Sun D, Wang M, Li A. A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans Comput Biol Bioinform 2018;16:841–50.
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Supplementary data