Interpretable meta-learning of multi-omics data for survival analysis and pathway enrichment

Abstract

Motivation: Despite the success of recent machine learning algorithms applied to survival analysis, their black-box nature hinders interpretability, which is arguably the most important aspect. Similarly, multi-omics data integration for survival analysis is often constrained by underlying relationships and correlations that are rarely well understood. The goal of this work is to alleviate the interpretability problem of machine learning approaches to survival analysis and to demonstrate how multi-omics data integration improves survival analysis and pathway enrichment. We use meta-learning, a machine learning paradigm in which a model is trained on a variety of related datasets and allows quick adaptation to new tasks, to perform survival analysis and pathway enrichment on pan-cancer datasets. In recent machine learning research, meta-learning has been used effectively for knowledge transfer among multiple related datasets.

Results: We use meta-learning with Cox hazard loss to show that the integration of TCGA pan-cancer data increases the performance of survival analysis. We also apply an advanced model-interpretability method called DeepLIFT (Deep Learning Important FeaTures) to show the different sets of enriched pathways obtained for multi-omics and transcriptomics data. Our results show that multi-omics cancer survival analysis enhances performance compared with using transcriptomics or clinical data alone. Additionally, we show a correlation between the variable importances assigned by DeepLIFT and gene coenrichment, suggesting that genes with higher and similar contribution scores are more likely to be enriched together in the same enrichment sets.

Availability and implementation: https://github.com/berkuva/TCGA-omics-integration.


Introduction
Traditionally, survival analysis has been done by means of statistical analyses that aim to identify the distributions of event times and the statistical properties of the estimated parameters. In comparison, machine learning algorithms aim to produce more accurate survival predictions by combining established statistical methods with machine learning techniques. Yet, a high ratio of missing survival information, data sparsity, and missing features and labels can cause a significant loss of performance even with some of the most recent machine learning algorithms. In particular, deep neural network approaches, which have shown promising results in various areas of biomedical research, require a large amount of well-organized and complete training data, which is rare in biomedical settings.
Recently, high-throughput techniques together with the continuously decreasing cost of sequencing have resulted in a wide availability of multi-omics data. Because the different molecular processes starting at DNA and resulting in protein products are intricately interconnected, omics data derived from different stages of gene and protein expression are likely to contain complementary information. However, the explicit relationships among multi-omics data are complex and remain poorly understood. Moreover, many omics datasets suffer from the 'big P, small N' problem: a significantly higher number of features (P) than data samples (N). This makes integrating and interpreting multi-omics data even more challenging.
As a result of those challenges, many existing works on survival analysis use a single omics dataset, such as clinical or transcriptomics data (Spooner et al. 2020; Srujana et al. 2022). Only recently have advanced machine learning-based approaches, including transfer learning and representation learning, been applied to alleviate the 'big P, small N' problem in omics datasets (Kim et al. 2020; Qiu et al. 2020). Although such approaches have shown promising advances in survival analysis, the integration of multiple omics datasets is likely to further improve the performance of advanced machine learning applications on survival analysis and related tasks, owing to the unique feature sets in each omics dataset. However, the lack of standards and applications for omics data integration challenges attempts at multi-omics integration for machine learning applications.
Meta-learning methods have been adopted widely to handle multiple related tasks (Finn et al. 2017). These learning-to-learn methods are first trained on a distribution of related tasks before being fine-tuned and evaluated on target tasks. It has been shown that meta-learning methods often require fewer samples than more traditional deep learning models (Triantafillou et al. 2020; Gevaert 2021). We therefore propose a meta-learning approach that uses multi-omics datasets to train a hazard-predictive model for cancer survival analysis. In particular, we use transcriptomics (RNA-seq), proteomics, and clinical datasets from The Cancer Genome Atlas (TCGA). Whereas most prior studies rely exclusively on one omics dataset for pan-cancer analysis, we apply our meta-learning model to various combinations of integrated omics datasets.
Additionally, we analyze the benefit of multi-omics data integration for identifying differentially expressed genes in deep learning-based survival analysis. Given the black-box nature of deep neural network-based algorithms, the convenience of applying them comes with difficulties in interpretability. To address this, we apply an advanced variable importance analysis method for deep learning, DeepLIFT (Deep Learning Important FeaTures) (Shrikumar et al. 2017), and compare pathway enrichment for transcriptomics and multi-omics data. More specifically, each input gene to our meta-learning model is assigned a DeepLIFT contribution score. We then set a cutoff score and supply the genes scoring above the cutoff as input to g:Profiler (Raudvere et al. 2019) to identify enriched pathways. Additionally, we use Gene Set Enrichment Analysis (GSEA) (Subramanian et al. 2020) for pathway enrichment analysis from a ranked list of genes' DeepLIFT scores. Finally, we use a neural network extension of the Cox regression model and the protein enrichment database STRING (Jensen et al. 2009) to show that genes with similar DeepLIFT contribution scores are statistically more likely to be enriched together.

Multi-omics data integration
Statistical approaches have been applied to integrated pan-cancer multi-omics data. For instance, Mankoo et al. (2011) integrated gene expression, miRNA expression, copy number alteration, and DNA methylation data to predict time to recurrence, survival, and risk in ovarian cancer using a multivariate Cox Lasso. iCluster (Shen et al. 2009) is a tumor subtype estimation method based on joint latent variable modeling for integrative clustering. It fits a regularized latent-variable-based clustering to generate an integrated cluster assignment from joint inference across multiple genomic data types, such as copy number, gene expression, and DNA methylation. PARADIGM (Vaske et al. 2010; Subramanian et al. 2020) infers the activities of patient-specific biological pathways from multi-omics data using Bayesian factor graphs. MANCIE (Zang et al. 2016) uses a Bayesian-supported principal component analysis-based approach to address the noise and biases that arise in high-dimensional genomic data integration. Various non-negative matrix factorization (NMF) methods have also been introduced for identifying modules of correlated multi-omics data, sample clustering, and subtype discovery (Lee and Seung 1999; Zhang et al. 2012; Yang and Michailidis 2016; Chalise and Fridley 2017).
Various machine learning approaches have been adopted for multi-omics data. MOLI (Sharifi-Noghabi et al. 2019) and DeepMO implement subnetworks on each omics dataset and concatenate the obtained deep features for prediction of drug activity (Picard et al. 2021). Ma and Zhang (2018) integrated gene expression, miRNA expression, and DNA methylation data and implemented a Multi-view Factorization AutoEncoder (MAE) to predict target clinical variables.
The recent development of data repositories such as BioGRID (Oughtred et al. 2021) and STRING (Szklarczyk et al. 2021) has allowed the integration of a priori information and inter-omics relationships for model development and interpretation. While new datasets are constantly being added to these repositories, the integration of features that are relevant but vastly different, such as demographics, genes, and mutation significance, remains challenging, and no standards exist for feature engineering and model selection for such multi-omics data, especially pan-cancer data.

Pan-cancer data analysis and molecular pathways
Numerous methods of pan-cancer analysis have been conducted for different cancer types, such as identifying molecular subtypes significant for clinical studies (Sohn et al. 2017;Oh et al. 2020), classifying genes with high morbidity correlation (Teo et al. 2019), and building next-generation sequencing (NGS)-based diagnostics for lung cancer (Chun et al. 2019).
Traditional machine learning algorithms, such as random forests and support vector machines, have shown promising performance in various areas of pan-cancer data analysis, including cancer type classification and detection of biomarkers (Furey et al. 2000;Toth et al. 2019;Liñares-Blanco et al. 2020).
More recently, non-parametric machine learning models, particularly deep neural networks, have been applied to more complex pan-cancer analyses, such as multi-omics cancer type classification and survival prediction. For example, Kong et al. (2021) used an autoencoder and a backward propagation-based neuron importance assignment method to capture nonlinear relationships among genes, developing a robust cancer type classifier that outperformed L1000 expression profiling, a reduced representation of ~1000 genes whose expression is highly representative (Subramanian et al. 2017). Withnell et al. (2021) trained a variational autoencoder to capture the dense genetic relationships within each cancer type in order to classify different cancer types. Kim et al. (2020) applied transfer learning through a variational autoencoder, transferring trained encoder parameters to a Cox hazard-based prediction network. Their VAECox model improved on regularization-based Cox hazard models on 7 out of 10 cancer types tested. They applied Pearson's correlation coefficient to the highest-variance neurons in the hidden layers of their encoder network to determine the most important genes for breast cancer, identifying cancer-related genes with existing literature support. A recent work on cancer type classification found that meta-learning outperformed more traditional machine learning-based approaches (Chou et al. 2020). Qiu et al. (2020) used a meta-learning-based Cox module on various cancer types to train a quickly adaptive model, learning the common latent information across the training cancer types. They showed that their model achieved notable testing performance increases compared with direct learning and transfer learning. They used gradient-based feature importance followed by GSEA to determine the important molecular pathways for each target cancer type.

Data and preprocessing
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics program that provides public datasets covering 33 cancer types. We use its transcriptomics (RNA-seq), proteomics, and clinical datasets. These datasets capture information at different molecular stages: proteins are translated from RNA, and proteins determine the functional mechanisms of cells, leading to various clinically relevant outcomes.
For the clinical and the proteomics datasets, we imputed missing numeric feature values using the k-nearest neighbors (KNN) method with k = 1. We used label encoding to encode categorical features. In the clinical dataset, we manually removed the feature that contains patients' vital status, which has a high correlation with their survival status. The transcriptomics dataset required no imputation. We then applied a z-score transformation to each dataset after imputation. Finally, to remove features with high correlation to each other, we filtered out features with over 0.7 Pearson correlation to at least one other feature.
After imputation, scaling, and feature engineering, the transcriptomics, clinical, and proteomics datasets in TCGA comprised 11 068, 8490, and 5853 patients, respectively, and 13 960, 10 039, and 227 features. To integrate multiple omics datasets, we reapplied the z-score transformation and filtered out features with a correlation of over 0.7 to another feature in the integrated dataset.
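As a concrete illustration, the z-score and correlation-filtering steps described above can be sketched in numpy on synthetic data (the KNN imputation step is omitted; `zscore` and `drop_correlated` are our own helper names, not the paper's code):

```python
import numpy as np

def zscore(X):
    """Column-wise z-score transformation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def drop_correlated(X, threshold=0.7):
    """Greedily drop features whose absolute Pearson correlation with an
    already-kept feature exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], keep

# Synthetic example: feature b nearly duplicates feature a.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = a + 0.01 * rng.normal(size=100)
c = rng.normal(size=100)
X = zscore(np.column_stack([a, b, c]))
X_filtered, kept = drop_correlated(X, 0.7)
```

The greedy scan keeps the first feature of each correlated group; the paper does not specify which member of a correlated pair was retained, so this is one reasonable choice.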
Although there are 33 cancer types in TCGA, the variation among them can introduce high bias and noise when all are used to construct the input for the learning model. Therefore, we carefully selected 17 cancer types that satisfy the following two conditions: (1) a sufficient number of patients in all three datasets and (2) long follow-up times. We reviewed a comprehensive study of the quality of TCGA clinical data and selected ACC, BRCA, CESC, COAD, ESCA, GBM, KIRP, LGG, LUSC, OV, PAAD, STAD, UCEC, and UCS as source tasks and BLCA, HNSC, and LUAD as target tasks. The separation of training and evaluation data was chosen arbitrarily. Even though LUAD and LUSC are both lung-related cancer types, we do not merge them into one lung cancer type as Qiu et al. (2020) did, because they are known to be transcriptomically different (Relli et al. 2019). There were 4647 patients whose data existed in all three omics datasets. We reserved 30 samples from each cancer type as the query set for model evaluation on that cancer type and used the rest as the support set for training the model. The sizes of the support and query sets for each cancer type, as well as the proportion of censored patients, are shown in Table 1. A patient's data is censored if the patient left the TCGA study before death; hence, their time-to-death is unknown.

Optimization
Driven by statistical analysis and the existence of censored data, non-parametric methods have become more popular than parametric methods for two reasons: (i) they make fewer assumptions and (ii) they place no distributional constraints on survival times. Given many collected features in a dataset, Cox regression models determine which features contribute to the probability of survival. Taking the natural logarithm of the hazard function gives log h(t, x) = log h_0(t) + β^T x, where h_0 is the baseline hazard and x is a set of covariates. The hazard ratio between two patients therefore does not depend on the baseline hazard and remains constant over time. This leads to one of the most widely used time-to-event analyses, the Cox Proportional Hazards (Cox-PH) method. Cox-PH is a semi-parametric model that calculates the probability of an event occurring in relation to a time variable.
The Cox hazard loss L_C is used to find the optimal partial log-likelihood estimates of the model parameters β:

L_C(β) = − Σ_{i: C_i = 1} [ β^T X_i − log Σ_{j: Y_j ≥ Y_i} exp(β^T X_j) ],

where for sample i, C_i = 1 indicates the occurrence of the event of interest, X_i is the representation of the dependent variables, and Y_i is the time to event occurrence.
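In code, this partial log-likelihood loss can be sketched as follows (a minimal numpy version; `cox_ph_loss` is our own name, and `risk` stands in for β^T X_i or, equivalently, a network's predicted log-hazard):

```python
import numpy as np

def cox_ph_loss(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.
    risk:  predicted log-hazards (e.g. beta^T x_i; higher = riskier)
    time:  observed times Y_i
    event: C_i = 1 if the event occurred, 0 if censored"""
    loss = 0.0
    for i in np.where(event == 1)[0]:
        # Risk set: patients still under observation at time Y_i.
        at_risk = time >= time[i]
        loss -= risk[i] - np.log(np.sum(np.exp(risk[at_risk])))
    return loss / max(int(np.sum(event)), 1)
```

Assigning the truly higher-risk patient the larger predicted log-hazard lowers this loss, which is the same ordering property the C-index later measures.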
Recently, the Cox-PH model has been combined with various forms of deep neural networks, allowing flexible model selection and thereby enhancing performance compared with regularized Cox-PH methods, as in Kim et al. (2020) and Qiu et al. (2020). To study whether the omics datasets contain complementary survival information despite the weak correlations among them, we implemented a meta-learning model and applied it, with the Cox hazard loss, to the transcriptomics dataset and to various integrated omics datasets.

Meta-learning for survival analysis
Meta-learning is an algorithm that allows quick adaptation across multiple similar datasets. It is first trained on related source data, followed by fine-tuning and evaluation on target tasks. Back-propagation of the gradients generated from the loss incurred on each task leads to the optimal values of θ, as illustrated in Fig. 1. The parameters θ are generically trained to minimize the respective Cox hazard loss L, allowing quick adaptation to new tasks.
The meta-learning model used in our survival analysis is Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML) (Finn et al. 2017). MAML is a task-agnostic meta-learning algorithm that trains model parameters such that a small number of gradient updates will lead to fast learning on a new task. MAML's learning (training) process is largely divided into two parts: the inner loop and the outer loop. The inner loop trains on source tasks, and multiple inner-loop training steps together update the parameters in the outer loop so that they generalize to adapt to new (target) tasks quickly. Within each task, the data are split into support and query sets. The model is trained on the support sets, and its performance is evaluated on each task using the query set. We split the support and query sets so that the query set contains 30 samples, and the rest is used as the support set.

Figure 1. Our meta-learning model's parameters θ are trained over a distribution of source cancer types, and the corresponding losses L are generated; θ is optimized by gradient descent, i.e. back-propagation of ∇L_task_i.
For our MAML model, we used a four-layer feed-forward neural network with ReLU as the nonlinear activation. For inner-loop training, we use the SGD optimizer with a learning rate of 0.01; for outer-loop training, the Adam optimizer with a learning rate of 0.0001. Finally, we compare our model's performance with that of direct learning with 5-fold cross-validation.
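The inner/outer-loop structure can be sketched with a first-order MAML update on toy linear-regression tasks (the paper's actual model is the four-layer network with SGD/Adam described above; the tasks, learning rates, and function names here are purely illustrative):

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the mean-squared error of a linear model y ~ X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

def maml_step(theta, tasks, inner_lr=0.05, outer_lr=0.05, inner_steps=1):
    """One first-order MAML meta-update: adapt a copy of theta on each
    task's support set (inner loop), then move theta along the gradient
    evaluated at the adapted parameters (outer loop)."""
    meta_grad = np.zeros_like(theta)
    for Xs, ys, Xq, yq in tasks:
        w = theta.copy()
        for _ in range(inner_steps):
            w = w - inner_lr * mse_grad(w, Xs, ys)   # inner loop: support set
        meta_grad += mse_grad(w, Xq, yq)             # outer loop: query set
    return theta - outer_lr * meta_grad / len(tasks)

# Two toy "source tasks" drawn from related linear models.
rng = np.random.default_rng(0)
tasks = []
for w_true in (np.array([1.0, 2.0]), np.array([1.5, 1.0])):
    Xs, Xq = rng.normal(size=(20, 2)), rng.normal(size=(20, 2))
    tasks.append((Xs, Xs @ w_true, Xq, Xq @ w_true))

theta = np.zeros(2)
for _ in range(200):
    theta = maml_step(theta, tasks)
```

After meta-training, a single inner-loop step on a new task's support set should already yield a low query-set loss, which is the "fast adaptation" property MAML optimizes for.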

Identifying important molecular pathways with explanations
As detailed below, the variable importance assignment approaches that we applied produce scores associated with our input variables. This includes scores associated with thousands of genes. Consequently, to assess the significance and cancer relevance of the gene sets that are driving survival prediction, we took two common pathway enrichment approaches: (1) applying a variable importance score cutoff and inputting the resulting gene list into a pathway enrichment tool and (2) using all annotated genes in the genome and their associated scores to arrive at enriched pathways.
For the first approach, we used g:Profiler to find functionally enriched pathways in the limited gene list. For the second approach, we performed gene set enrichment analysis (GSEA) using a ranked list of genes with scores. The selection of genes and the assignment of scores were done by applying DeepLIFT (Deep Learning Important FeaTures) to our meta-learning model, which estimates patients' probability of survival by optimizing the Cox loss. DeepLIFT is a recently developed method for gaining interpretability for deep neural networks. Interpretability methods for deep neural networks largely fall into two categories: forward-propagation and backward-propagation methods. Forward-propagation methods, such as those of Alipanahi et al. (2015) and Zhou and Troyanskaya (2015), generate permutations of input features to measure the impact of each feature on the output; this is highly inefficient because the number of possible permutations grows exponentially with the feature set. More recently, backward-propagation interpretability methods such as DeepLIFT have been widely adopted for model intelligibility.
DeepLIFT first assigns reference values, or null values, to the input neurons in the network. It forward propagates the difference between the value and its reference value. Then, it calculates how much each input affects its subsequent neuron, or it 'blames' the difference from reference of the output on the difference from reference of the input. This difference is then allocated (or blamed) to each input neuron by back-propagation, assigning contribution scores to all neurons in the network. Related interpretability methods, such as integrated gradients (Sundararajan et al. 2017) and SHAP (SHapley Additive exPlanations) (Lundberg and Lee 2017), calculate variable contribution in similar back propagation-based approaches.
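For a purely linear model, DeepLIFT's propagation reduces to a closed form that makes this "difference-from-reference" accounting easy to see: each input's contribution is its difference from the reference times its weight, and the contributions sum exactly to the change in the output (the summation-to-delta property). The weights and values below are illustrative, not from the paper's model:

```python
import numpy as np

# Linear model f(x) = w . x + b; for this case DeepLIFT's contribution
# of input i is (x_i - ref_i) * w_i, and the contributions sum to
# f(x) - f(ref) regardless of the bias term.
w = np.array([0.5, -1.2, 2.0])
b = 0.3
x = np.array([1.0, 0.0, 2.0])
ref = np.array([0.2, 0.5, 0.0])   # reference ("null") input

contrib = (x - ref) * w
delta_out = (w @ x + b) - (w @ ref + b)
```

For deep nonlinear networks, DeepLIFT propagates these difference-from-reference terms layer by layer rather than using a single closed form, but the summation-to-delta property is preserved.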

Gene-set co-enrichment
We also studied whether DeepLIFT importance analyses correlate with gene coenrichment using STRING (Jensen et al. 2009), a database of protein-protein interaction networks and functional enrichment. Specifically, we used the meta-learning model from survival analysis that was trained on the source cancer types and fine-tuned to each target cancer type on the transcriptomics data.
We analyzed the degree of correlation between genes with similar contribution scores and genes' coenrichment in functional sets in STRING. To test this, we sorted the genes by their contribution scores by applying DeepLIFT to the fine-tuned meta-learning model and checked how the genes in the same enrichment set are clustered together in the sorted list compared with how they are in a randomly sorted list. In each enrichment set, we chose the first gene as the anchor gene and checked whether the surrounding genes in a certain window size around it in the sorted and unsorted list are in the same enrichment set. We generated STRING enrichment sets using genes in the 50th, 70th, and 90th percentiles of DeepLIFT scores. This process is illustrated in Fig. 2.
Being able to learn gene coenrichment information via importance analyses without prior knowledge indicates the functional, pathological, and semantic relationships among genes that are captured by the model. This experiment is conducted using the transcriptomics genes for the three target cancer types: BLCA, HNSC, and LUAD.

Evaluation metrics
In this section, we explain how we evaluate the performance of survival analysis by meta-learning, enriched pathways, and gene coenrichment.
Cox loss-based models predict the probability of survival at a given time step. The concordance index (C-index) measures how well the predicted ordering of patients' hazard times matches the actual ordering; hence, it is one of the most widely used evaluation metrics for Cox loss-based survival analyses. It is calculated by dividing the number of concordant patient pairs by the sum of concordant and discordant pairs. A pair of patients is concordant if the predicted order of their events is the true order; otherwise, it is a discordant pair. We evaluated the C-index of the meta-learning model on the target tasks under two conditions: using the transcriptomics dataset alone and using different combinations of the three datasets.

Figure 2. The process of studying the correlation between genes with similarly high DeepLIFT contribution scores and their likelihood of being enriched together in the same STRING enrichment sets.
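The pairwise C-index computation can be sketched as follows (`concordance_index` is our own helper; a pair is comparable only when the earlier time belongs to an uncensored patient, and ties in predicted risk are counted as half-concordant, a common convention):

```python
import numpy as np

def concordance_index(time, event, risk):
    """C-index: fraction of comparable patient pairs whose predicted
    risk ordering matches the observed event ordering. A pair (i, j)
    is comparable when patient i has the shorter time and an observed
    (uncensored) event; the earlier-event patient should have the
    higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tie in predicted risk
    return concordant / comparable
```

A value of 1.0 means perfect ranking, 0.5 is chance level, and 0.0 is a perfectly reversed ranking.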
One metric for evaluating the error of survival predictions is the integrated Brier score (Clarke and Wallis 2008), computed from the mean squared differences between predicted and ground-truth survival probabilities at given time steps. We therefore use the integrated Brier score to evaluate the model's survival probabilities for different combinations of the three datasets.
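A simplified sketch of this metric follows (it ignores the inverse-probability-of-censoring weights that the full metric applies to censored patients, so it is only valid as written for uncensored data; `integrated_brier` is our own name):

```python
import numpy as np

def integrated_brier(times_grid, surv_pred, y_time):
    """Simplified integrated Brier score on uncensored data: the squared
    difference between the predicted survival probability S_i(t) and the
    ground-truth indicator 1(Y_i > t), averaged over patients and then
    over the evaluation time grid.
    surv_pred: array of shape (n_patients, len(times_grid))."""
    scores = []
    for k, t in enumerate(times_grid):
        truth = (y_time > t).astype(float)
        scores.append(np.mean((truth - surv_pred[:, k]) ** 2))
    return float(np.mean(scores))
```

Lower is better: a perfect predictor scores 0, and a constant prediction of 0.5 scores 0.25.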
Given a list of enriched genes and attribution scores, GSEA calculates the number of statistically significantly enriched molecular pathways differentially expressed between survived and deceased patients. The higher the attribution scores of the genes coenriched in a pathway for a group of patients, the more statistically significant the pathway. However, there is no standard for objectively comparing the quality of enriched pathways. Therefore, we provide literature support for a few selected pathways, supporting their enrichment status for cancer.
As a follow-up study for interpreting DeepLIFT scores and pathway analysis, we compare how genes with similar DeepLIFT attribution scores appear in the same STRING enrichment sets with how genes with different attribution scores do. We queried STRING with genes with the highest attribution scores in the 50th, 70th, and 90th percentiles to observe correlation between attribution scores and gene coenrichment. Finally, we studied how the distribution of categories of enriched pathways change with different percentiles of DeepLIFT attribution scores.

Meta-learning for survival analysis
In Table 2, we show the performance of meta-learning on individual, paired, and integrated omics datasets on the target cancer types.
For convenience, we denote our meta-learning model M trained on each dataset d as M_d: M_c, M_t, M_p, M_{c,t}, M_{t,p}, M_{p,c}, and M_{c,t,p}, where c, t, and p indicate the clinical, transcriptomics, and proteomics datasets, respectively, and any combination of two or three of these subscripts indicates dataset integration. Similarly, we denote the direct learning baseline model trained on c, t, and p as D_{c,t,p}. The direct learning method was trained on all training data simultaneously, and its performance was evaluated on the 30-sample query set of each target cancer type; its C-index scores were 0.66, 0.62, and 0.50 on BLCA, HNSC, and LUAD, respectively. Qiu et al. (2020) described how meta-learning is superior to direct learning as well as regular transfer learning when all models were trained on the transcriptomics dataset. The results from our experiment extend their analysis: on BLCA, HNSC, and LUAD, M_t recorded C-indices of 0.74, 0.70, and 0.62, outperforming D_{c,t,p}. Compared with M_c or M_t, M_p performed worse, with 0.58 on BLCA, 0.60 on HNSC, and 0.53 on LUAD. We believe this is due to its relatively small number of features compared with the other two datasets.
We then ran survival analysis on pairwise-integrated datasets. To perform data integration, we merged the features from each pair of datasets and removed features with higher than 0.7 Pearson correlation to at least one other feature. The resulting dimensions after integration and feature removal were 23 994 for the clinical and transcriptomics datasets, 13 724 for the proteomics and transcriptomics datasets, and 10 266 for the proteomics and clinical datasets. M_{c,t} performed similarly to M_c but outperformed M_t across all target cancer types, scoring 0.75 on BLCA, 0.77 on HNSC, and 0.72 on LUAD. We analyzed which significantly mutated genes were associated with the highest DeepLIFT scores and validated their direct or indirect contributions to each cancer type with existing research support: COPZ2, MED23, SUZ12, CYP4B1, and NFE2L2 for BLCA (Imaoka et al. 2000; Fan et al. 2014; Sharma et al. 2019; Yi et al. 2021; Zhang et al. 2022); JAK2 for HNSC (William et al. 2021); and 22q12.1, SNRPD3, SLC14A2, RCVTB2, ITGAM, Mrc1, and SETD2 for LUAD (Neville et al. 1995; Wang et al. 2014; Tariq et al. 2017; van Olst et al. 2017; Wang et al. 2017; Bai et al. 2019; Chang et al. 2021; Gao et al. 2021; Li et al. 2021; Larroquette et al. 2022). M_{t,p} saw decreased performance on BLCA and HNSC compared with M_t but an increase on LUAD, scoring 0.64, 0.63, and 0.66, respectively. Compared with M_c, M_{c,p} saw decreased performance on BLCA but meaningful increases on HNSC and LUAD, with C-indices of 0.66, 0.79, and 0.73, respectively.
Finally, we expanded our survival analysis by integrating all three datasets. As we did for the paired datasets, we removed features with 0.7 or higher Pearson correlation, resulting in 24 218 features. M_{c,t,p} achieved the best C-index on all cancer types, with scores of 0.79, 0.84, and 0.73, respectively. That M_{c,t,p} performed better than M_{c,t} tells us the addition of the proteomics dataset contributed to the increased performance. This shows that although the integration of omics datasets can improve survival analysis, the gain depends heavily on the types of omics datasets involved as well as the specific cancer types the model is applied to.
Our results, summarized in Table 2, show that meta-learning can outperform direct learning and that the integration of certain omics datasets can increase performance, depending on the types of integrated omics datasets and on the cancer type. Moreover, it is noteworthy that the performance of survival analysis can rely significantly on several factors, including the data preprocessing methods, the selection of source and target cancer types, and the quantity of samples used for fine-tuning the model to each target cancer. These results mostly align with the integrated Brier scores summarized in Table 3.

Identifying important molecular pathways with explanations
We tried four DeepLIFT reference values: the median and the mean feature values for deceased patients and for survived patients, each taken individually. The median feature values for deceased patients yielded a higher number of enriched molecular pathways.

Table 2. 95% confidence interval for concordance index achieved by meta-learning using individual and integrated datasets.
In Table 4, we show statistically enriched KEGG (Kanehisa and Goto 2000) and WikiPathways (Pico et al. 2008) pathways found using g:Profiler and GSEA. To select the query genes against the background genes for g:Profiler, we filtered out the genes scoring below the mean for each target cancer type. For GSEA, on the other hand, we supplied every gene and its corresponding DeepLIFT contribution score as input. To correctly account for multiple hypothesis testing, we show pathways with adjusted P-values less than 0.05 for g:Profiler and FDR q-values less than 0.25 for GSEA.
For a number of the pathways identified in Table 4, we found existing literature support for their correlation with cancer. For example, DNA repair pathways are closely linked to cancer, as they are responsible for maintaining genetic stability and the integrity of DNA and cells (Alhmoud et al. 2020; Huang and Zhou 2021). When DNA repair pathways are disrupted, the resulting mutations can cause cancer. Additionally, some cancer cells are able to survive the DNA damage caused by chemotherapy because they can repair the damage to their DNA (Kiwerska and Szyfter 2019). Inhibitors of certain DNA repair pathways can thus be used to further increase the effectiveness of chemotherapy treatments (Li et al. 2021).
The VEGF-VEGFR2 signaling pathway plays a critical role in the development and progression of cancer. Vascular endothelial growth factor (VEGF) stimulates angiogenesis (the formation of new blood vessels), which is necessary for tumors to grow and spread (Shibuya 2011). Vascular endothelial growth factor receptor 2 (VEGFR2) is the primary receptor for VEGF and is overexpressed in many types of cancer, leading to an increase in angiogenesis and tumor growth. Inhibitors of this pathway, such as tyrosine kinase inhibitors, have been developed and used as cancer treatments with varying degrees of success (García-Caballero et al. 2022).
Similarly, research suggests that oxidation by cytochrome P450 and ABC transporters are also related to cancer. Specifically, cytochrome P450 is a group of enzymes that are involved in the metabolism of drugs, hormones, and other substances in the body. They are also involved in the metabolism of carcinogens, which are substances that can cause cancer (Elfaki et al. 2018). Moreover, genetic polymorphisms of cytochrome P450 can lead to changes in its activity, which in turn can increase the risk of cancer. ABC transporters can act as a mechanism for multi-drug resistance (MDR). ABC transporters contribute to MDR in cancer by actively pumping chemotherapy drugs out of the cancer cells, reducing their effectiveness (Choi 2005).
The TLR4 signaling pathway is involved in the development and progression of cancer and has been linked to inflammation and tumorigenesis (Chen et al. 2018). Activation of TLR4 can promote the growth and survival of cancer cells via signaling pathways that promote cell proliferation and suppress apoptosis (Huang et al. 2008). TLR4 can also induce angiogenesis, the formation of new blood vessels, which is important for tumor growth and spread. Additionally, TLR4 has been shown to be involved in the activation of immune cells and the regulation of the immune response, which can impact the progression of cancer. TLR4 has therefore been identified as a promising target for cancer immunotherapy (Jiang et al. 2020).

Gene-set co-enrichment
In addition, we used STRING to find the correlation between genes above the 50th, 70th, and 90th percentiles of DeepLIFT contribution scores and gene sets contained within STRING, which group genes whose protein products interact and form pathways. With 13 960 genes in our transcriptomics dataset, the 50th, 70th, and 90th percentiles correspond to 6980, 4188, and 1396 genes, respectively, and the genes from each percentile are independently given as input to STRING. To ensure a proper analysis of enrichment sets against genes' DeepLIFT scores, we retained only enrichment sets containing at least 30 genes.
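The percentile selection and set-size filtering described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the dictionary-of-sets representation of STRING enrichment sets are our own assumptions.

```python
import numpy as np

def select_top_percentile(genes, scores, pct):
    """Return the genes whose DeepLIFT contribution score is at or above
    the given percentile of all scores (e.g. pct=90 keeps the top 10%)."""
    cutoff = np.percentile(scores, pct)
    return [g for g, s in zip(genes, scores) if s >= cutoff]

def filter_enrichment_sets(enrichment_sets, min_size=30):
    """Keep only enrichment sets containing at least `min_size` genes,
    mirroring the >=30-gene filter applied to the STRING sets."""
    return {name: members
            for name, members in enrichment_sets.items()
            if len(members) >= min_size}
```

With 13 960 genes, `select_top_percentile(..., 50)`, `(..., 70)`, and `(..., 90)` would return the 6980, 4188, and 1396 gene subsets used as STRING input.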
After running our pretrained meta-learning model from survival analysis on each target cancer type's data, we sorted the genes by DeepLIFT score and set the first gene from each enrichment set as the anchor gene. We then counted how many genes in the same enrichment set appear near the anchor gene in the sorted list. This process requires a standard for how near we look around the anchor gene, which we refer to as the window size. If a gene lies within ± the window size of the anchor gene, we consider the two genes' DeepLIFT scores to be similar. We set the window size to be the number of genes divided by 10, giving window sizes of 698, 418, and 139 for the genes above the 50th, 70th, and 90th percentiles, respectively.
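The anchor-gene windowing procedure, along with the randomly sorted baseline it is compared against, can be sketched as follows. This is an illustrative sketch under our own naming and data-structure assumptions; the real analysis runs over the full sorted gene lists.

```python
import random

def coenrichment_fraction(sorted_genes, enrichment_set, window):
    """Fraction of an enrichment set's genes (beyond the anchor) lying
    within +/- `window` positions of the anchor gene, where the anchor
    is the set's first (highest-scoring) gene in the sorted list."""
    position = {g: i for i, g in enumerate(sorted_genes)}
    members = [g for g in sorted_genes if g in enrichment_set]
    if len(members) < 2:
        return 0.0
    anchor = position[members[0]]
    near = sum(1 for g in members[1:] if abs(position[g] - anchor) <= window)
    return near / (len(members) - 1)

def shuffled_baseline(sorted_genes, enrichment_set, window, trials=1000, seed=0):
    """Expected co-enrichment fraction when the gene order is random,
    i.e. the chance-level baseline."""
    rng = random.Random(seed)
    genes = list(sorted_genes)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(genes)
        total += coenrichment_fraction(genes, enrichment_set, window)
    return total / trials
```

For the genes above the 50th percentile (6980 genes), `window` would be 698, i.e. the list length divided by 10.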
On BLCA, with genes above the 50th, 70th, and 90th percentiles of DeepLIFT scores, 18.8%, 19.7%, and 19.2% of genes in each enrichment set, respectively, had DeepLIFT scores similar to the anchor gene in the sorted list. By contrast, only 9.4%, 5.0%, and 2.4% of genes were near the anchor gene in a randomly sorted list. Thus, genes with similar DeepLIFT scores were considerably more likely to be coenriched in functional sets than chance predicts. This indicates that a meta-learning model trained with Cox loss on the transcriptomics data assigns similar DeepLIFT contribution scores to genes that are coenriched in the same enrichment sets. Since these enrichment sets are categorized by their functional attributions, this result indicates that the model captures functional and semantic relationships among genes without any prior knowledge. Results for each of the three cancer types are shown in Table 5.

Conclusion
Although data integration and transfer of knowledge have been shown to be effective in other machine learning areas, omics data integration has, by comparison, only recently been explored in depth, owing to data sparsity, complexity, and an extremely large feature space. Toward the goal of advancing omics data integration for survival analysis and pathway enrichment analysis, we used meta-learning to show how multi-omics dataset integration improves cancer survival analysis. Whereas the transcriptomics dataset is the most widely used, our results show that complementing it with clinical and proteomics datasets enhances survival analysis performance. Furthermore, we combined meta-learning with the variable importance analysis method DeepLIFT, compared pathway enrichment using g:Profiler and GSEA, and compared the molecular pathways enriched using the transcriptomics data alone with those enriched using integrated multi-omics data. Finally, we demonstrated that genes with similar, high DeepLIFT attribution scores are more likely to be coenriched in functional enrichment sets, implying that a meta-learning model can learn functional relationships among genes without prior knowledge.
Financial Support: None declared.
Conflict of interest: None declared.