Abstract

Motivation

Multi-omics data provide a comprehensive view of gene regulation at multiple levels, which is helpful in achieving accurate diagnosis of complex diseases like cancer. However, conventional integration methods rarely utilize prior biological knowledge and lack interpretability.

Results

To integrate various multi-omics data of tissue and liquid biopsies for disease diagnosis and prognosis, we developed a biological pathway informed Transformer, Pathformer. It embeds multi-omics input with a compacted multi-modal vector and a pathway-based sparse neural network. Pathformer also leverages a criss-cross attention mechanism to capture the crosstalk between different pathways and modalities. We first benchmarked Pathformer with 18 comparable methods on multiple cancer datasets, where Pathformer outperformed all the other methods, with an average improvement of 6.3%–14.7% in F1 score for cancer survival prediction, 5.1%–12% for cancer stage prediction, and 8.1%–13.6% for cancer drug response prediction. Subsequently, for cancer prognosis prediction based on tissue multi-omics data, we used a case study to demonstrate the biological interpretability of Pathformer by identifying key pathways and their biological crosstalk. Then, for cancer early diagnosis based on liquid biopsy data, we used plasma and platelet datasets to demonstrate Pathformer’s potential for clinical application in cancer screening. Moreover, we revealed deregulation of interesting pathways (e.g. scavenger receptor pathway) and their crosstalk in cancer patients’ blood, providing potential candidate targets for cancer microenvironment study.

Availability and implementation

Pathformer is implemented and freely available at https://github.com/lulab/Pathformer.

1 Introduction

Compared with a single type of data, multi-omics data provide a more comprehensive view of gene regulation (Hasin et al. 2017). Therefore, integrating multi-omics data from tissue and liquid biopsies would help address challenges in disease diagnosis (Ning et al. 2023), treatment (Chiu et al. 2019, Sharifi-Noghabi et al. 2019), and prognosis (Hao et al. 2018), such as deregulated networks between different types of molecules and data noise caused by patients’ heterogeneity (Tarazona et al. 2021). To integrate multi-omics data of cancer, several supervised methods have been developed, such as mixOmics (Rohart et al. 2017), liNN (Kuru et al. 2022), eiNN (Preuer et al. 2018), liCNN (Islam et al. 2020), eiCNN (Fu et al. 2020), MOGONet (Wang et al. 2021), and MOGAT (Xing et al. 2021). Later, the performance and interpretability of multi-omics data integration were further improved using deep learning models informed by biological pathways. For instance, a pathway-associated sparse deep neural network (PASNet) was utilized to predict the prognosis of glioblastoma multiforme (GBM) patients (Hao et al. 2018). Recently, P-NET, a sparse neural network integrating multiple molecular features based on a multilevel view of biological pathways, was introduced to predict subtype and survival of prostate cancer patients (Elmarakeby et al. 2021). In addition, PathCNN, based on a convolutional neural network (CNN), was developed to predict the prognosis of GBM patients using principal component analysis (PCA) to define image-like multi-omics pathways (Oh et al. 2021).

These pathway-informed deep learning methods did not consider the crosstalk between omics and between pathways, although this crosstalk is as biologically significant as the pathways themselves (Kim et al. 2007, Li et al. 2008, Prahallad and Bernards 2016, Liu et al. 2021). Crosstalk means that a member of one pathway regulates a component of another pathway. The balance and oscillation between different pathways contribute to cancer progression and metastasis. For instance, a positive feedback loop between the Wnt pathway and the ERK pathway was revealed in cancer (Kim et al. 2007); crosstalk between the TGF-β pathway and the TNF-α pathway can promote tumor invasion and metastasis by affecting the tumor microenvironment (Liu et al. 2021). Meanwhile, the criss-cross attention mechanism of the Transformer would be very useful to capture the crosstalk information (Jumper et al. 2021). However, incorporating multi-omics data and their crosstalk information in a Transformer is very challenging: when processing multi-omics data, the multi-modal features are usually multiplied by tens of thousands of genes, producing an extremely long input that a common Transformer model cannot accept (usually <512 words). Meanwhile, certain embedding methods for biological data, such as discretization and linear transformation, were introduced in previous Transformer models (Osseni et al. 2022, Cui et al. 2023, Theodoris et al. 2023), but biological information was largely lost during these kinds of embedding.

In order to integrate multi-omics data by embedding biological pathway crosstalk without information loss, we introduce a Transformer model, Pathformer, with three key steps to address the above problems. First, it transforms various modalities into distinct gene-level features using a series of statistical methods, such as the maximum value method, and connects these features into a novel compacted multi-modal vector for each gene, which not only preserves valuable information but also shortens the input. Second, Pathformer utilizes a sparse neural network based on prior pathway knowledge to transform gene embeddings into pathway embeddings. Third, Pathformer naturally incorporates pathway crosstalk network into a Transformer model with bias to enhance the exchange of information between different pathways and between different modalities (e.g. omics) as well.

Here, we first benchmarked Pathformer and 18 other integration methods in various classification tasks, using multiple cancer tissue datasets from TCGA. Then, we used Pathformer to integrate various multi-omics data from tissue and liquid biopsies. Through case studies on survival prediction of breast cancer and noninvasive diagnosis of pan-cancer, we revealed interesting pathways, genes, and regulatory mechanisms related to cancer in human tissue and plasma, demonstrating the prediction accuracy and biological interpretability of Pathformer in various clinical applications.

2 Materials and methods

2.1 Overview of Pathformer

Pathformer is mainly designed to integrate various multi-omics data from tissue and liquid biopsies, which can be used for different classification tasks in disease diagnosis and prognosis, such as cancer early detection, cancer staging and survival prediction (Fig. 1a). It has six modules: (i) biological pathway and crosstalk network calculation module, (ii) multi-omics data input module, (iii) biological multi-modal embedding module (key module), (iv) Transformer module with pathway crosstalk network bias, (v) classification module, and (vi) biological interpretability module.

Figure 1. Overview of Pathformer. Schematic of Pathformer (a), which integrates multi-omics data of tissue and liquid biopsies for disease diagnosis and prognosis. Pathformer has six modules: (i) biological pathway and crosstalk network calculation module, (ii) multi-omics data input module (b), (iii) biological multi-modal embedding module (c), (iv) Transformer module with pathway crosstalk network bias, (v) classification module, and (vi) biological interpretability module. FE, conversion function in the gene embedding; G, gene; P, pathway; W, weight of pathway-based sparse neural network.


Pathformer combines prior biological pathway information (module 1, Fig. 1a) with multi-modal data (module 2, Fig. 1b) for disease diagnosis and prognosis. It introduces a new embedding method to incorporate biological multi-modal data at both the gene level and the pathway level. It first transforms different modalities uniformly to the gene level through a series of statistical indicators, then concatenates these modalities into compacted multi-modal vectors to define the gene embedding, and finally employs a sparse neural network based on the gene-to-pathway mapping to transform the gene embedding into the pathway embedding (module 3, Fig. 1c). Pathformer then enhances the fusion of information between various modalities and pathways by combining pathway crosstalk networks with a Transformer encoder (module 4, Fig. 1a, Supplementary Fig. S1). Finally, a fully connected layer serves as the classifier for different downstream classification tasks (module 5). In addition, Pathformer uses a biological interpretability module with attention weights and SHapley Additive exPlanations (Lundberg and Lee 2017) values to identify important genes, pathways, modalities, and their crosstalk or regulation (module 6). These six modules are described in detail below.

2.2 Module 1: curation of biological pathways and calculation of initial crosstalk network

We curated 2289 biological pathways from four public databases including Kyoto Encyclopedia of Genes and Genomes database (KEGG) (Kanehisa and Goto 2000), Pathway Interaction database (PID) (Schaefer et al. 2009), Reactome database (Reactome) (Croft et al. 2010), and BioCarta Pathways database (BioCarta) (Nishimura 2001). Then, we filtered these pathways by three criteria: the gene number, the overlap ratio with other pathways (the proportion of genes in the pathway that are also present in other pathways), and the number of pathway subsets (the number of sub-pathways included in the pathway). Following the principle of moderate size and minimal overlap with other pathway information, we selected 1497 pathways with gene number between 15 and 100, or gene number >15 and overlap ratio <1, or gene number >15 and the number of pathway subsets <5. Next, we used BinoX (Ogris et al. 2017) to calculate the crosstalk relationship of pathways and build a pathway crosstalk network with adjacency matrix $P \in \mathbb{R}^{N_p \times N_p}$, $N_p = 1497$ (more details in Supplementary Note S1).
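The three alternative selection criteria can be read as a single boolean filter. The following is a minimal sketch rather than the authors' code: the `Pathway` record, its field names, and the toy entries are hypothetical, and the bounds "between 15 and 100" are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Pathway:
    name: str
    n_genes: int          # number of genes in the pathway
    overlap_ratio: float  # fraction of its genes also present in other pathways
    n_subsets: int        # number of sub-pathways it contains


def keep_pathway(p: Pathway) -> bool:
    """Return True if the pathway meets any of the three criteria in the text."""
    moderate_size = 15 <= p.n_genes <= 100          # assumed inclusive bounds
    not_fully_overlapping = p.n_genes > 15 and p.overlap_ratio < 1
    few_subsets = p.n_genes > 15 and p.n_subsets < 5
    return moderate_size or not_fully_overlapping or few_subsets


# Hypothetical toy pathways, for illustration only.
candidates = [
    Pathway("toy_small", 8, 0.2, 0),       # too few genes: dropped
    Pathway("toy_medium", 40, 1.0, 7),     # moderate size: kept
    Pathway("toy_large", 300, 0.6, 12),    # large, not fully overlapping: kept
    Pathway("toy_umbrella", 300, 1.0, 9),  # large, fully overlapping, many subsets: dropped
]
selected = [p.name for p in candidates if keep_pathway(p)]
```

Because the criteria are joined by "or", a large pathway is excluded only when it is both fully covered by other pathways and composed of five or more sub-pathways.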

2.3 Modules 2 and 3: multi-omics data input and multi-modal embedding

Biological multi-modal data preprocessing and embedding are two key modules of Pathformer (Fig. 1b and c). In module 2 (Fig. 1b), to capture more comprehensive regulatory information, we expanded biological multi-omics data into multi-modal data, including not only data from different omics sources but also variant features of the same omics, such as RNA splicing, RNA editing, and RNA alternative promoter usage. To obtain multi-modal data, we used a standardized bioinformatics pipeline to calculate different omics or variant features of the same omics from raw sequence reads (more details in Supplementary Note S2). These multi-modal data have different dimensions, including nucleotide level, fragment level, and gene level. For example, Pathformer’s input for cancer tissue datasets from The Cancer Genome Atlas (TCGA) (Cancer Genome Atlas Research Network 2013) includes gene-level RNA expression, fragment-level DNA methylation, and both fragment-level and gene-level DNA CNV. Modalities and their dimension levels for different datasets are described in Supplementary Table S1.

In module 3 (Fig. 1c), we proposed a new biological multi-modal embedding method for Pathformer, which consists of gene embedding $E_G$ and pathway embedding $E_P$. We represented the biological multi-modal input matrix of a sample as $M$, described as follows:
$$M = (M_1, M_2, \ldots, M_m), \quad M_i \in \mathbb{R}^{N_{M_i}} \tag{1}$$
where $m$ is the number of modalities, and $N_{M_i}$ is the length of input for modality $i$, such as the number of genes for RNA expression, the number of editing sites for RNA editing, and the number of CpG islands for DNA methylation.
Next, we used a series of statistical indicators to convert different modalities into gene-level modal features, and then concatenated these modal features into a compacted multi-modal vector as gene embedding $E_G \in \mathbb{R}^{N_g \times D_g}$, which is calculated as follows:
$$E_G = F_E(M) = [V_1, \ldots, V_i, \ldots, V_m] = [G_1, \ldots, G_l, \ldots, G_{N_g}]^{\top} \tag{2}$$
$$V_i = F_E^{i}(M_i) = [V_{i1}, \ldots, V_{ij}, \ldots, V_{ie_i}] \tag{3}$$
$$V_{ij} = f_{*}(M_i) \tag{4}$$
where $G_l = [V_1^{l}, \ldots, V_i^{l}, \ldots, V_m^{l}] \in \mathbb{R}^{D_g}$ is the gene embedding of the $l$th gene and is a compacted multi-modal vector; $V_i$ is the modal feature matrix of modality $i$; $V_{ij}$ is the $j$th dimension of the modal feature matrix for modality $i$; $N_g$ is the number of genes; $D_g = e_1 + e_2 + \cdots + e_m$ is the dimension of gene embedding; $e_i$ is the dimension of the modal feature matrix for modality $i$; $F_E$ is the conversion function, which uses statistical indicators to uniformly convert different modalities to the gene level; $F_E^{i}$ is the conversion function of modality $i$, and each modality’s function is constructed from distinct statistical indicator functions $f_{*}$ (more details in Supplementary Table S1). These statistical indicator functions include gene-level score ($f_1$), count ($f_2$), minimum ($f_3$), maximum ($f_4$), mean ($f_5$), entropy ($f_6$), weighted mean in whole gene ($f_7$), and weighted mean in window ($f_8$), the formulas of which are in Supplementary Note S3.
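To make the gene embedding concrete, the sketch below builds a compacted vector for each gene from one gene-level modality and one fragment-level modality. It is an illustration under stated assumptions, not the published implementation: the function names, the gene names, the choice of four indicators (count, minimum, maximum, mean), and the zero fill for genes without fragments are all hypothetical, and the exact indicator formulas are those of Supplementary Note S3, which is not reproduced here.

```python
from statistics import mean


def fragment_features(values):
    """Summarise fragment-level values mapped to one gene: count, min, max, mean."""
    if not values:  # assumption: a gene with no measured fragment gets zeros
        return [0.0, 0.0, 0.0, 0.0]
    return [float(len(values)), min(values), max(values), mean(values)]


def gene_embedding(rna_expression, methylation_fragments):
    """Concatenate per-modality features into one compact vector for a gene."""
    v_rna = [rna_expression]                            # gene-level score, e_1 = 1
    v_meth = fragment_features(methylation_fragments)   # four indicators, e_2 = 4
    return v_rna + v_meth                               # D_g = e_1 + e_2 = 5


# Hypothetical sample: gene -> (RNA expression, methylation values of its fragments).
sample = {
    "GENE_A": (5.2, [0.10, 0.80, 0.30]),
    "GENE_B": (0.0, []),
}
E_G = [gene_embedding(rna, meth) for rna, meth in sample.values()]
```

Each gene contributes one row of fixed length, so the input length scales with the number of genes rather than with the number of fragments or sites per modality.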
Subsequently, we used the known gene-pathway mapping relationship to develop a sparse neural network based on prior pathway knowledge (PSNN) to transform gene embedding $E_G$ into pathway embedding $E_P \in \mathbb{R}^{N_p \times D_p}$, as described below:
$$E_P = W_{sparse}^{\top} E_G + B \tag{5}$$
where $N_p$ is the number of pathways, $D_p = D_g$ is the dimension of pathway embedding, $W_{sparse} \in \mathbb{R}^{N_g \times N_p}$ is a learnable sparse weight matrix, and $B$ is a bias term. $W_{sparse}$ is constructed based on the known relationship between pathways and genes. When a given gene and pathway are unrelated, the corresponding element of $W_{sparse}$ is always 0. Otherwise, it is learned through training. Therefore, pathway embedding is a dynamic embedding method. The PSNN can not only restore the mapping relationship between genes and pathways, but also capture the different roles of different genes in pathways and preserve the complementarity of different modalities. Additionally, this biological multi-modal embedding step does not require additional gene selection, thereby avoiding bias and overfitting problems resulting from artificial feature selection.
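One common way to realise such a pathway-constrained sparse layer is to multiply a dense trainable weight matrix by a fixed binary membership mask. The NumPy sketch below assumes this masking approach and assumes the pathway embedding takes the form of the masked weights transposed, times the gene embedding, plus a bias, which is the form that matches the stated dimensions. The mask, sizes, and random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_pathways, d_g = 5, 2, 3

# Binary gene-to-pathway membership (rows: genes, columns: pathways).
mask = np.array([[1, 0],
                 [1, 0],
                 [0, 1],
                 [1, 1],
                 [0, 0]], dtype=float)

W = rng.normal(size=(n_genes, n_pathways))  # dense trainable weights
B = np.zeros((n_pathways, d_g))             # bias term
E_G = rng.normal(size=(n_genes, d_g))       # gene embedding, one row per gene


def pathway_embedding(E_G, W, mask, B):
    """Map gene embeddings to pathway embeddings through masked weights."""
    W_sparse = W * mask            # weights of unrelated gene-pathway pairs stay 0
    return W_sparse.T @ E_G + B    # shape (n_pathways, d_g)


E_P = pathway_embedding(E_G, W, mask, B)
```

With this construction, a gene that belongs to no pathway (the last row of the mask) has no influence on any pathway embedding, and in a gradient-based framework the masked entries would also receive zero gradient.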

2.4 Module 4: transformer module with pathway crosstalk network bias

We developed the Transformer module based on criss-cross attention (CC-attention) with bias for data fusion of pathways, modalities, and their crosstalk (Supplementary Fig. S1). This module has 3 blocks, each containing multi-head column-wise self-attention (col-attention), multi-head row-wise self-attention (row-attention), layer normalization, GELU activation, residual connection, and network update. Particularly, col-attention is used to enhance the exchange of information between pathways, with the pathway crosstalk network matrix serving as the bias for col-attention to guide the flow of information. Row-attention is employed to facilitate information exchange between different modalities, and the updated pathway embedding matrix is used to update the pathway crosstalk network matrix by calculating the correlation between pathways.

Multi-head column-wise self-attention contains eight heads, and each head is a mapping of $Q_1$, $K_1$, $V_1$, and $P$, which are the query, key, and value of pathway embedding $E_P$ and the pathway crosstalk network matrix $P$, respectively. First, we represented the $h$th column-wise self-attention by $A_{col}^{(h)}$, calculated as follows:
(6)
(7)
(8)
where $h = 1, 2, \ldots, H$ is the $h$th head; $H$ is the number of heads; $W_{Q1}^{(h)} \in \mathbb{R}^{D_p \times d}$, $W_{K1}^{(h)} \in \mathbb{R}^{D_p \times d}$, and $W_{V1}^{(h)} \in \mathbb{R}^{D_p \times d}$ are the weight matrices as parameters; $d$ is the attention dimension; $\mathrm{dropout}_{0.2}$ is a dropout neural network layer with a probability of 0.2; and softmax is the normalized exponential function.
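As an illustration of how a pathway crosstalk matrix can act as an attention bias, the sketch below implements one head of standard scaled dot-product attention with an additive bias on the attention scores. This is a generic formulation assumed for illustration, not a transcription of Equations (6) to (8): the scaling by the square root of the attention dimension, the additive placement of the bias, the omission of dropout, and all toy values are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def biased_attention_head(E_P, P, W_Q, W_K, W_V):
    """One attention head over pathways with an additive crosstalk bias."""
    Q, K, V = E_P @ W_Q, E_P @ W_K, E_P @ W_V   # each of shape (N_p, d)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + P           # bias added to pathway-pathway scores
    A = softmax(scores, axis=-1)                # (N_p, N_p), each row sums to 1
    return A, A @ V


rng = np.random.default_rng(0)
N_p, D_p, d = 4, 6, 3
E_P = rng.normal(size=(N_p, D_p))
P = np.zeros((N_p, N_p))
P[0, 1] = P[1, 0] = 5.0                         # strong prior crosstalk between pathways 0 and 1
W_Q, W_K, W_V = (rng.normal(size=(D_p, d)) for _ in range(3))

A_biased, out = biased_attention_head(E_P, P, W_Q, W_K, W_V)
A_plain, _ = biased_attention_head(E_P, np.zeros((N_p, N_p)), W_Q, W_K, W_V)
```

Comparing `A_biased` with `A_plain` shows the intended effect: a positive prior between two pathways raises the attention weight they assign to each other while the rows remain normalized.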
Next, we merged multi-head column-wise self-attention and performed a series of operations as follows:
(9)
(10)
(11)
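where $h = 1, 2, \ldots, H$ is the $h$th head; $H$ is the number of heads; $\odot$ is the matrix dot product operator; $W_{g1}^{(h)} \in \mathbb{R}^{D_p \times d}$, $W_{U1}^{(h)} \in \mathbb{R}^{d \times D_p}$, $W_{O11} \in \mathbb{R}^{D_p \times o}$, and $W_{O12} \in \mathbb{R}^{o \times D_p}$ are the weight matrices as parameters; $o$ is a constant; LN is the layer normalization function; GELU is the Gaussian error linear unit, a smooth variant of the ReLU activation function; and $\mathrm{dropout}_{0.2}$ is a dropout neural network layer with a probability of 0.2.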
Multi-head row-wise self-attention enables information exchange between different modalities. It is a regular dot-product attention. It also contains eight heads, and the hth row-wise self-attention, i.e. Arow(h), is calculated as follows:
(12)
(13)
(14)
where $h = 1, 2, \ldots, H$ is the $h$th head; $H$ is the number of heads; $W_{Q2}^{(h)} \in \mathbb{R}^{N_p \times d}$, $W_{K2}^{(h)} \in \mathbb{R}^{N_p \times d}$, and $W_{V2}^{(h)} \in \mathbb{R}^{N_p \times d}$ are the weight matrices as parameters; $d$ is the attention dimension; $\mathrm{dropout}_{0.2}$ is a dropout neural network layer with a probability of 0.2; and softmax is the normalized exponential function.
Subsequently, we merged multi-head row-wise self-attention and performed a series of operations. The formulas are as follows:
(15)
(16)
(17)
where $h = 1, 2, \ldots, H$ is the $h$th head; $H$ is the number of heads; $\odot$ is the matrix dot product operator; $W_{g2}^{(h)} \in \mathbb{R}^{N_p \times d}$, $W_{U2}^{(h)} \in \mathbb{R}^{d \times N_p}$, $W_{O21} \in \mathbb{R}^{N_p \times o}$, and $W_{O22} \in \mathbb{R}^{o \times N_p}$ are the weight matrices as parameters; $o$ is a constant; $\beta$ is a constant coefficient for row-attention; LayerNorm is the layer normalization function; GELU is the Gaussian error linear unit, a smooth variant of the ReLU activation function; and $\mathrm{dropout}_{0.2}$ is a dropout neural network layer with a probability of 0.2. $O_2$ is the pathway embedding input of the next Transformer block. In other words, when $E_P$ is $E_P^{(0)}$, $O_2$ is $E_P^{(1)}$. Superscripts in parentheses denote the block index.
Then, we used the updated pathway embedding O2 to update the pathway crosstalk network. We exploited the correlation between embedding vectors of two pathways to update the corresponding element of the pathway crosstalk network matrix. The formula is as follows:
(18)
where $P'$ is the updated pathway crosstalk network matrix of the next Transformer block. In other words, when $P$ is $P^{(0)}$, $P'$ is $P^{(1)}$. Superscripts in parentheses denote the block index.
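The update step can be pictured as recomputing a pathway-by-pathway matrix from the current embeddings. The sketch below assumes Pearson correlation between the embedding vectors of each pair of pathways and assumes that the result directly replaces the previous matrix. The text does not specify either choice, so both are illustrative assumptions, as are the function name and toy data.

```python
import numpy as np


def update_crosstalk(O2):
    """Pairwise Pearson correlation between pathway embedding vectors (rows of O2)."""
    return np.corrcoef(O2)


rng = np.random.default_rng(0)
O2 = rng.normal(size=(4, 6))   # 4 pathways, embedding dimension 6
O2[1] = 2.0 * O2[0] + 1.0      # pathway 1 is an increasing linear function of pathway 0
P_next = update_crosstalk(O2)  # shape (4, 4), used as the bias in the next block
```

In this toy case the entry for pathways 0 and 1 equals 1, reflecting perfectly correlated embeddings, while the other off-diagonal entries take values between -1 and 1.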

2.5 Module 5: classification module

Given the classification tasks in disease diagnosis and prognosis, we used a fully connected neural network as the classification module to transform the pathway embedding encoded by the Transformer module into the probability of each label. Its three fully connected layers have 300, 200, and 100 neurons, respectively, with dropout probability $\mathrm{dropout}_c$, which is a hyperparameter. More details are described in Supplementary Note S4.

2.6 Module 6: biological interpretability module

The biological interpretability module enables us to calculate the contribution of each modality, identify important pathways and their key genes, and uncover the most critical pathway crosstalk subnetworks.

To calculate the contribution of each omics and each modality, we first integrated all matrices of row-attention maps into one matrix by element-wise averaging. Then, we averaged this mean row-attention matrix along its columns to obtain the attention weights of modalities. More details are described in Supplementary Note S5.
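The two averaging steps can be sketched as follows. This is a minimal illustration with random toy attention maps. It assumes that each map is a square matrix over the embedding dimensions with softmax-normalized rows, and that "along the columns" means averaging down each column so that one weight per embedding dimension remains. The exact procedure is given in Supplementary Note S5 and may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_maps, D = 6, 4   # e.g. maps collected over blocks and heads, D embedding dimensions

# Toy row-attention maps: each row is a softmax distribution over the D dimensions.
logits = rng.normal(size=(n_maps, D, D))
maps = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

mean_map = maps.mean(axis=0)               # element-wise average of all maps, shape (D, D)
dimension_weights = mean_map.mean(axis=0)  # average down each column, shape (D,)
```

Because every row of each map sums to 1, the resulting weights also sum to 1, so they can be read as the share of attention received by each modality feature.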

To identify important pathways and their key genes, we used SHapley Additive exPlanations (SHAP) values (Lundberg and Lee 2017) to calculate the contribution of each feature. SHAP is an additive explanation model inspired by coalitional game theory, which regards all features as “contributors.” The SHAP value assigned to each feature explains the relationship between modalities, pathways, genes, and classification; it was computed with the “SHAP” package in Python v3.6.9. Pathways with the top 15 SHAP values in the classification task are considered important pathways. For each pathway, genes with the top five SHAP values are considered the key genes of the pathway. For each gene, the modality whose SHAP value ranks higher than those of the other modalities is considered the core modality of the gene. More details are described in Supplementary Note S5.
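Once importance scores are available, the selection of important pathways and key genes reduces to a top-k ranking. The sketch below starts from hypothetical, precomputed scores and does not call the SHAP package. How per-sample SHAP values are aggregated into one score per pathway or gene is an assumption left outside the example, and the names and numbers are made up.

```python
def top_k(scores, k):
    """Return the k names with the largest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical precomputed importance scores (for example, aggregated SHAP values).
pathway_scores = {"pathway_A": 0.42, "pathway_B": 0.07, "pathway_C": 0.31}
gene_scores = {
    "pathway_A": {"G1": 0.20, "G2": 0.02, "G3": 0.15},
    "pathway_B": {"G4": 0.05, "G5": 0.01},
    "pathway_C": {"G1": 0.11, "G6": 0.19},
}

# The text uses k = 15 for pathways and k = 5 for genes; k = 2 keeps the toy example small.
important_pathways = top_k(pathway_scores, k=2)
key_genes = {p: top_k(gene_scores[p], k=2) for p in important_pathways}
```

The same helper could rank the modalities of a single gene to pick its core modality.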

In particular, the pathway crosstalk network matrix is used to guide the direction of information flow and is updated according to the updated pathway embedding in each Transformer block. Therefore, the updated pathway crosstalk network contains not only the prior information in the initial network (module 1) but also the multi-modal data information derived from the Transformer module (module 4), which represents the specific regulatory mechanism in each classification task. We defined a sub-network score from the SHAP value of each pathway in the sub-network, so as to find the most important sub-network for prediction, i.e. the hub module of the updated pathway crosstalk network. The calculation of the sub-network score can be divided into four steps: average pathway crosstalk network matrix calculation, network pruning, sub-network boundary determination, and score calculation. More details of sub-network score calculations are described in Supplementary Note S5.

2.7 Experimental settings

2.7.1 Data collection and preprocessing

We analyzed both tissue biopsy and liquid biopsy data in this study. First, for benchmark testing on cancer diagnosis and prognosis, we collected multiple datasets of different cancer types from TCGA (tissue data) to evaluate the classification performance, including 10 datasets for early- and late-stage classification, 10 datasets for low- and high-risk survival classification, and 5 datasets for drug response prediction (Supplementary Fig. S2). In addition, we collected and processed two types of body fluid datasets: the plasma dataset [373 samples assayed by total cell-free RNA-seq (Chen et al. 2022, Tao et al. 2023)] and the platelet dataset [918 samples assayed by blood platelet RNA-seq (Best et al. 2015, Best et al. 2017)]. Through our bioinformatics pipeline, 3 and 7 biological modalities were derived from the TCGA (tissue biopsy) datasets and the liquid biopsy datasets, respectively. More details are described in Supplementary Note S2.

2.7.2 Model training and test

We implemented Pathformer’s network architecture using the “PyTorch” package in Python v3.6.9, and our code can be found in the GitHub repository (https://github.com/lulab/Pathformer). For model training and testing, we used 5-fold cross-validation, repeated twice with shuffling. Before evaluating the performance on test sets, we optimized hyperparameters (e.g. learning rate, dropout probability of classification, and constant coefficient for row-attention) and epoch numbers inside the training set only. More details of model training and testing are described in Supplementary Note S6.
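The evaluation scheme, 5-fold cross-validation repeated twice with shuffling, yields 10 train/test splits per dataset. The following standard-library sketch shows one way to generate such splits. The function name, the seed, and the absence of stratification are assumptions, since the text does not say whether folds were stratified by class.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=2, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for shuffled, repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                                   # new shuffle for every repeat
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            # Hyperparameters and epoch numbers would be tuned on train_idx only.
            yield repeat, k, train_idx, test_idx


splits = list(repeated_kfold(n_samples=23))
```

Each sample appears in exactly one test fold per repeat, so every reported score averages 10 held-out evaluations.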

2.7.3 Evaluation criteria

When evaluating the classification performance, we used at least three evaluation indicators, area under the receiver operating characteristic curve (AUC), weighted-averaged F1 score (F1score_weighted), and macro-averaged F1 score (F1score_macro). Notably, we prioritized F1score_macro as the main evaluation criterion in this paper. This choice stems from the imbalance of sub-classes in our data, where F1score_macro stands out as a fairer and more robust indicator compared to other metrics such as AUC.
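A small worked example shows why the macro-averaged F1 score is the stricter choice under class imbalance. The pure-Python functions below follow the usual definitions of per-class, macro-averaged, and weighted-averaged F1; the labels are toy data, not results from the paper.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score of one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)


def f1_weighted(y_true, y_pred):
    """Mean of the per-class F1 scores weighted by class frequency."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) * y_true.count(c) for c in labels) / len(y_true)


y_true = [0] * 8 + [1] * 2   # imbalanced toy labels
y_pred = [0] * 10            # a classifier that always predicts the majority class

macro = f1_macro(y_true, y_pred)        # 4/9, about 0.444
weighted = f1_weighted(y_true, y_pred)  # 32/45, about 0.711
```

The majority-class predictor still reaches a weighted F1 of about 0.71, whereas its macro F1 drops to about 0.44 because the minority class, which it never detects, counts equally.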

3 Results

3.1 Benchmark of Pathformer and 18 multi-omics data integration methods using TCGA data

We benchmarked Pathformer against 18 other multi-omics integration methods on various classification tasks in cancer diagnosis, treatment, and prognosis (Fig. 2). These methods can be categorized into three types. Type I includes early and late integration methods based on conventional classifiers, such as support vector machine (SVM), logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost). Type II includes partial least squares-discriminant analysis (PLSDA) and sparse partial least squares-discriminant analysis (sPLSDA) of mixOmics (Rohart et al. 2017). Type III consists of deep learning-based integration methods, i.e. eiNN (Preuer et al. 2018), liNN (Kuru et al. 2022), eiCNN (Fu et al. 2020), liCNN (Islam et al. 2020), MOGONet (Wang et al. 2021), MOGAT (Xing et al. 2021), P-NET (Elmarakeby et al. 2021), and PathCNN (Oh et al. 2021). Among these, eiNN and eiCNN are early integration methods based on a fully connected neural network (NN) and a convolutional neural network (CNN), respectively; liNN and liCNN are the corresponding late integration methods; MOGONet and MOGAT are multi-modal integration methods based on graph neural networks; P-NET and PathCNN are representative multi-modal integration methods that combine pathway information. More details of the comparison methods are in Supplementary Note S7.

Figure 2. Performance comparison among multiple multi-omics data integration methods. Average macro-averaged F1 score (a), its percentage gap from Pathformer (b), and its standard deviation (c) are shown for each method on the TCGA datasets (all cancer types) for cancer low- and high-risk survival classification, early- and late-stage classification, and clinical drug response prediction, respectively. Error bars are from 5-fold cross-validation repeated twice (10 values) of all datasets.


To evaluate the performance, we tested the methods on multiple TCGA datasets for three tasks: cancer survival prediction, cancer staging, and drug response prediction. DNA methylation, DNA copy number variation (CNV), and RNA expression were used as input. The optimal hyperparameter combinations for each dataset are listed in Supplementary Table S2. Considering the imbalanced numbers of sub-classes in the TCGA data, we utilized the macro-averaged F1 score as the primary evaluation metric for hyperparameter optimization and performance evaluation (Fig. 2 and Table 1). Other evaluation indicators (e.g. AUC) are listed in Supplementary Table S3.

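Table 1. Performance comparison among multiple multi-omics data integration methods for TCGA datasets.

Methods: Type I (eiLR, eiRF, eiSVM, eiXGBoost, liLR, liRF, liSVM, liXGBoost) and Type II (PLSDA, sPLSDA).

| Task | Dataset | eiLR | eiRF | eiSVM | eiXGBoost | liLR | liRF | liSVM | liXGBoost | PLSDA | sPLSDA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Survival classification^a | BRCA^d | 0.571 | 0.494 | 0.467 | 0.564 | 0.569 | 0.473 | 0.468 | 0.523 | 0.561 | 0.549 |
| | KIRC | 0.584 | 0.636 | 0.656 | 0.653 | 0.619 | 0.622 | 0.614 | 0.585 | 0.660 | 0.617 |
| | LUAD | 0.530 | 0.443 | 0.487 | 0.530 | 0.479 | 0.453 | 0.438 | 0.481 | 0.444 | 0.450 |
| | LUSC | 0.520 | 0.477 | 0.471 | 0.519 | 0.481 | 0.452 | 0.508 | 0.491 | 0.496 | 0.495 |
| | HNSC | 0.529 | 0.497 | 0.466 | 0.476 | 0.512 | 0.478 | 0.470 | 0.483 | 0.527 | 0.521 |
| | BLCA | 0.479 | 0.475 | 0.459 | **0.549** | 0.456 | 0.438 | 0.475 | 0.504 | 0.506 | 0.465 |
| | LIHC | 0.497 | 0.579 | 0.436 | 0.550 | 0.487 | 0.428 | 0.439 | 0.500 | 0.514 | 0.442 |
| | SKCM | 0.557 | 0.609 | 0.630 | 0.594 | 0.554 | 0.609 | 0.561 | 0.532 | 0.635 | 0.593 |
| | LGG | 0.689 | 0.759 | 0.640 | 0.723 | 0.718 | 0.724 | 0.714 | 0.715 | **0.778** | 0.726 |
| | Pan-cancer^e | 0.674 | 0.709 | 0.705 | 0.694 | 0.696 | 0.712 | 0.710 | 0.697 | 0.669 | 0.658 |
| Stage classification^b | BRCA | 0.510 | 0.488 | 0.451 | 0.518 | 0.522 | 0.454 | 0.436 | 0.475 | 0.456 | 0.450 |
| | KIRC | 0.647 | **0.723** | 0.661 | 0.686 | 0.654 | 0.704 | 0.675 | 0.670 | 0.710 | 0.717 |
| | LUAD | 0.506 | 0.503 | 0.460 | 0.526 | 0.473 | 0.497 | 0.460 | 0.471 | 0.462 | 0.503 |
| | LUSC | 0.501 | 0.509 | 0.467 | 0.498 | 0.506 | 0.469 | 0.470 | 0.483 | 0.487 | 0.465 |
| | STAD | 0.541 | 0.540 | 0.528 | 0.548 | **0.581** | 0.543 | 0.552 | 0.493 | 0.565 | 0.540 |
| | BLCA | 0.612 | 0.647 | 0.555 | 0.639 | 0.624 | 0.611 | 0.564 | 0.542 | 0.609 | 0.568 |
| | LIHC | 0.567 | 0.528 | 0.464 | 0.540 | 0.496 | 0.507 | 0.438 | 0.526 | 0.521 | 0.552 |
| | SKCM | 0.579 | 0.571 | 0.551 | 0.535 | 0.548 | 0.570 | 0.557 | 0.516 | 0.562 | 0.514 |
| | THCA | 0.613 | 0.660 | 0.550 | **0.664** | 0.626 | 0.622 | 0.601 | 0.601 | 0.627 | 0.596 |
| | Pan-cancer^f | 0.631 | 0.622 | 0.557 | 0.629 | 0.620 | 0.604 | 0.567 | 0.608 | 0.572 | 0.559 |
| Drug response prediction^c | Carboplatin^g | 0.559 | 0.564 | 0.541 | 0.587 | 0.541 | 0.525 | 0.513 | 0.543 | 0.553 | 0.556 |
| | Cisplatin | 0.577 | 0.564 | 0.507 | 0.564 | 0.525 | 0.482 | 0.444 | 0.543 | 0.568 | 0.566 |
| | Fluorouracil | 0.488 | 0.506 | 0.466 | 0.524 | **0.528** | 0.459 | 0.484 | 0.460 | 0.509 | 0.444 |
| | Gemcitabine | 0.553 | 0.552 | 0.556 | 0.540 | 0.533 | 0.546 | 0.532 | 0.542 | 0.606 | **0.617** |
| | Paclitaxel | 0.574 | 0.529 | 0.549 | 0.591 | **0.602** | 0.533 | 0.496 | 0.501 | 0.591 | 0.551 |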
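
Methods: Type III (eiNN, eiCNN, liNN, liCNN, MOGONet, MOGAT, PathCNN, P-NET) and Pathformer.

| Task | Dataset | eiNN | eiCNN | liNN | liCNN | MOGONet | MOGAT | PathCNN | P-NET | Pathformer |
|---|---|---|---|---|---|---|---|---|---|---|
| Survival classification^a | BRCA^d | 0.573 | 0.510 | 0.576 | 0.536 | 0.510 | 0.466 | 0.558 | **0.640** | **0.673** |
| | KIRC | 0.640 | 0.548 | 0.631 | 0.596 | 0.632 | 0.637 | 0.643 | **0.661** | **0.688** |
| | LUAD | 0.503 | 0.462 | 0.521 | 0.504 | 0.455 | 0.438 | 0.522 | **0.551** | **0.633** |
| | LUSC | 0.483 | 0.529 | 0.480 | 0.501 | 0.497 | 0.434 | **0.560** | 0.509 | **0.615** |
| | HNSC | 0.537 | 0.512 | 0.523 | 0.487 | 0.469 | 0.462 | 0.525 | **0.559** | **0.606** |
| | BLCA | 0.497 | 0.474 | 0.518 | 0.460 | 0.470 | 0.443 | 0.486 | 0.470 | **0.601** |
| | LIHC | 0.593 | 0.578 | 0.551 | 0.465 | 0.476 | 0.448 | **0.598** | 0.578 | **0.651** |
| | SKCM | 0.583 | 0.582 | 0.590 | 0.489 | 0.599 | 0.605 | 0.551 | **0.641** | **0.694** |
| | LGG | 0.695 | 0.631 | 0.702 | 0.706 | 0.706 | 0.595 | 0.766 | 0.715 | **0.787** |
| | Pan-cancer^e | 0.687 | 0.688 | 0.713 | 0.697 | 0.676 | 0.689 | 0.712 | **0.733** | **0.735** |
| Stage classification^b | BRCA | **0.545** | 0.522 | 0.535 | 0.518 | 0.466 | 0.463 | 0.525 | 0.522 | **0.573** |
| | KIRC | 0.669 | 0.647 | 0.681 | 0.643 | 0.646 | 0.637 | 0.694 | 0.624 | **0.726** |
| | LUAD | 0.549 | **0.563** | 0.531 | 0.559 | 0.461 | 0.493 | 0.554 | 0.543 | **0.629** |
| | LUSC | 0.515 | 0.501 | 0.528 | 0.473 | 0.476 | 0.459 | **0.534** | 0.526 | **0.564** |
| | STAD | 0.555 | 0.505 | 0.518 | 0.482 | 0.574 | 0.562 | 0.521 | 0.537 | **0.596** |
| | BLCA | 0.636 | 0.634 | 0.616 | 0.565 | 0.553 | 0.554 | 0.566 | **0.660** | **0.706** |
| | LIHC | 0.569 | 0.584 | 0.565 | 0.586 | 0.474 | 0.461 | 0.563 | **0.612** | **0.616** |
| | SKCM | **0.614** | 0.553 | 0.582 | 0.557 | 0.526 | 0.528 | 0.540 | 0.571 | **0.646** |
| | THCA | 0.528 | 0.469 | 0.547 | 0.468 | 0.644 | 0.574 | 0.595 | 0.632 | **0.690** |
| | Pan-cancer^f | 0.617 | 0.606 | 0.630 | 0.618 | 0.624 | 0.473 | 0.643 | **0.664** | **0.657** |
| Drug response prediction^c | Carboplatin^g | 0.589 | 0.588 | 0.556 | 0.565 | 0.504 | 0.504 | **0.602** | 0.581 | **0.680** |
| | Cisplatin | 0.555 | 0.593 | **0.608** | 0.574 | 0.469 | 0.525 | 0.575 | 0.531 | **0.652** |
| | Fluorouracil | 0.482 | 0.512 | 0.495 | 0.492 | 0.500 | 0.442 | 0.486 | 0.489 | **0.602** |
| | Gemcitabine | 0.588 | 0.565 | 0.564 | 0.528 | 0.585 | 0.558 | 0.549 | 0.506 | **0.721** |
| | Paclitaxel | 0.583 | 0.575 | 0.583 | 0.487 | 0.504 | 0.527 | 0.564 | 0.539 | **0.660** |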
a

Ten TCGA datasets of survival classification are tested. Average macro-averaged F1 scores are listed for the two unbalanced classes, high- and low-risk survival cancer patients. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

b

Ten TCGA datasets of stage classification are tested. Average macro-averaged F1 scores are listed for the two unbalanced classes, early- (stage I and II) and late-stage (stage III and IV) cancer patients. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

c

Five drug response datasets from TCGA are tested. Average macro-averaged F1 scores are listed for the two unbalanced classes, responder (including complete response and partial response) and nonresponder (including stable disease and progressive disease) from cancer patients. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

d

Abbreviation of cancer type according to the TCGA terms.

e

Pan-cancer dataset of survival classification contains 33 cancer types of TCGA terms.

f

Pan-cancer dataset of stage classification contains 21 cancer types of TCGA terms.

g

Abbreviation of drug type.

The bold values are within the top two in each dataset.

Table 1.

Performance comparison among multiple multi-omics data integration methods for TCGA datasets.

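| Task | Dataset | eiLR (I) | eiRF (I) | eiSVM (I) | eiXGBoost (I) | liLR (I) | liRF (I) | liSVM (I) | liXGBoost (I) | PLSDA (II) | sPLSDA (II) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Survival classification [a] | BRCA [d] | 0.571 | 0.494 | 0.467 | 0.564 | 0.569 | 0.473 | 0.468 | 0.523 | 0.561 | 0.549 |
| | KIRC | 0.584 | 0.636 | 0.656 | 0.653 | 0.619 | 0.622 | 0.614 | 0.585 | 0.660 | 0.617 |
| | LUAD | 0.530 | 0.443 | 0.487 | 0.530 | 0.479 | 0.453 | 0.438 | 0.481 | 0.444 | 0.450 |
| | LUSC | 0.520 | 0.477 | 0.471 | 0.519 | 0.481 | 0.452 | 0.508 | 0.491 | 0.496 | 0.495 |
| | HNSC | 0.529 | 0.497 | 0.466 | 0.476 | 0.512 | 0.478 | 0.470 | 0.483 | 0.527 | 0.521 |
| | BLCA | 0.479 | 0.475 | 0.459 | 0.549 | 0.456 | 0.438 | 0.475 | 0.504 | 0.506 | 0.465 |
| | LIHC | 0.497 | 0.579 | 0.436 | 0.550 | 0.487 | 0.428 | 0.439 | 0.500 | 0.514 | 0.442 |
| | SKCM | 0.557 | 0.609 | 0.630 | 0.594 | 0.554 | 0.609 | 0.561 | 0.532 | 0.635 | 0.593 |
| | LGG | 0.689 | 0.759 | 0.640 | 0.723 | 0.718 | 0.724 | 0.714 | 0.715 | 0.778 | 0.726 |
| | Pan-cancer [e] | 0.674 | 0.709 | 0.705 | 0.694 | 0.696 | 0.712 | 0.710 | 0.697 | 0.669 | 0.658 |
| Stage classification [b] | BRCA | 0.510 | 0.488 | 0.451 | 0.518 | 0.522 | 0.454 | 0.436 | 0.475 | 0.456 | 0.450 |
| | KIRC | 0.647 | 0.723 | 0.661 | 0.686 | 0.654 | 0.704 | 0.675 | 0.670 | 0.710 | 0.717 |
| | LUAD | 0.506 | 0.503 | 0.460 | 0.526 | 0.473 | 0.497 | 0.460 | 0.471 | 0.462 | 0.503 |
| | LUSC | 0.501 | 0.509 | 0.467 | 0.498 | 0.506 | 0.469 | 0.470 | 0.483 | 0.487 | 0.465 |
| | STAD | 0.541 | 0.540 | 0.528 | 0.548 | 0.581 | 0.543 | 0.552 | 0.493 | 0.565 | 0.540 |
| | BLCA | 0.612 | 0.647 | 0.555 | 0.639 | 0.624 | 0.611 | 0.564 | 0.542 | 0.609 | 0.568 |
| | LIHC | 0.567 | 0.528 | 0.464 | 0.540 | 0.496 | 0.507 | 0.438 | 0.526 | 0.521 | 0.552 |
| | SKCM | 0.579 | 0.571 | 0.551 | 0.535 | 0.548 | 0.570 | 0.557 | 0.516 | 0.562 | 0.514 |
| | THCA | 0.613 | 0.660 | 0.550 | 0.664 | 0.626 | 0.622 | 0.601 | 0.601 | 0.627 | 0.596 |
| | Pan-cancer [f] | 0.631 | 0.622 | 0.557 | 0.629 | 0.620 | 0.604 | 0.567 | 0.608 | 0.572 | 0.559 |
| Drug response prediction [c] | Carboplatin [g] | 0.559 | 0.564 | 0.541 | 0.587 | 0.541 | 0.525 | 0.513 | 0.543 | 0.553 | 0.556 |
| | Cisplatin | 0.577 | 0.564 | 0.507 | 0.564 | 0.525 | 0.482 | 0.444 | 0.543 | 0.568 | 0.566 |
| | Fluorouracil | 0.488 | 0.506 | 0.466 | 0.524 | 0.528 | 0.459 | 0.484 | 0.460 | 0.509 | 0.444 |
| | Gemcitabine | 0.553 | 0.552 | 0.556 | 0.540 | 0.533 | 0.546 | 0.532 | 0.542 | 0.606 | 0.617 |
| | Paclitaxel | 0.574 | 0.529 | 0.549 | 0.591 | 0.602 | 0.533 | 0.496 | 0.501 | 0.591 | 0.551 |
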
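
| Task | Dataset | eiNN (III) | eiCNN (III) | liNN (III) | liCNN (III) | MOGONet (III) | MOGAT (III) | PathCNN (III) | P-NET (III) | Pathformer |
|---|---|---|---|---|---|---|---|---|---|---|
| Survival classification [a] | BRCA [d] | 0.573 | 0.510 | 0.576 | 0.536 | 0.510 | 0.466 | 0.558 | 0.640 | 0.673 |
| | KIRC | 0.640 | 0.548 | 0.631 | 0.596 | 0.632 | 0.637 | 0.643 | 0.661 | 0.688 |
| | LUAD | 0.503 | 0.462 | 0.521 | 0.504 | 0.455 | 0.438 | 0.522 | 0.551 | 0.633 |
| | LUSC | 0.483 | 0.529 | 0.480 | 0.501 | 0.497 | 0.434 | 0.560 | 0.509 | 0.615 |
| | HNSC | 0.537 | 0.512 | 0.523 | 0.487 | 0.469 | 0.462 | 0.525 | 0.559 | 0.606 |
| | BLCA | 0.497 | 0.474 | 0.518 | 0.460 | 0.470 | 0.443 | 0.486 | 0.470 | 0.601 |
| | LIHC | 0.593 | 0.578 | 0.551 | 0.465 | 0.476 | 0.448 | 0.598 | 0.578 | 0.651 |
| | SKCM | 0.583 | 0.582 | 0.590 | 0.489 | 0.599 | 0.605 | 0.551 | 0.641 | 0.694 |
| | LGG | 0.695 | 0.631 | 0.702 | 0.706 | 0.706 | 0.595 | 0.766 | 0.715 | 0.787 |
| | Pan-cancer [e] | 0.687 | 0.688 | 0.713 | 0.697 | 0.676 | 0.689 | 0.712 | 0.733 | 0.735 |
| Stage classification [b] | BRCA | 0.545 | 0.522 | 0.535 | 0.518 | 0.466 | 0.463 | 0.525 | 0.522 | 0.573 |
| | KIRC | 0.669 | 0.647 | 0.681 | 0.643 | 0.646 | 0.637 | 0.694 | 0.624 | 0.726 |
| | LUAD | 0.549 | 0.563 | 0.531 | 0.559 | 0.461 | 0.493 | 0.554 | 0.543 | 0.629 |
| | LUSC | 0.515 | 0.501 | 0.528 | 0.473 | 0.476 | 0.459 | 0.534 | 0.526 | 0.564 |
| | STAD | 0.555 | 0.505 | 0.518 | 0.482 | 0.574 | 0.562 | 0.521 | 0.537 | 0.596 |
| | BLCA | 0.636 | 0.634 | 0.616 | 0.565 | 0.553 | 0.554 | 0.566 | 0.660 | 0.706 |
| | LIHC | 0.569 | 0.584 | 0.565 | 0.586 | 0.474 | 0.461 | 0.563 | 0.612 | 0.616 |
| | SKCM | 0.614 | 0.553 | 0.582 | 0.557 | 0.526 | 0.528 | 0.540 | 0.571 | 0.646 |
| | THCA | 0.528 | 0.469 | 0.547 | 0.468 | 0.644 | 0.574 | 0.595 | 0.632 | 0.690 |
| | Pan-cancer [f] | 0.617 | 0.606 | 0.630 | 0.618 | 0.624 | 0.473 | 0.643 | 0.664 | 0.657 |
| Drug response prediction [c] | Carboplatin [g] | 0.589 | 0.588 | 0.556 | 0.565 | 0.504 | 0.504 | 0.602 | 0.581 | 0.680 |
| | Cisplatin | 0.555 | 0.593 | 0.608 | 0.574 | 0.469 | 0.525 | 0.575 | 0.531 | 0.652 |
| | Fluorouracil | 0.482 | 0.512 | 0.495 | 0.492 | 0.500 | 0.442 | 0.486 | 0.489 | 0.602 |
| | Gemcitabine | 0.588 | 0.565 | 0.564 | 0.528 | 0.585 | 0.558 | 0.549 | 0.506 | 0.721 |
| | Paclitaxel | 0.583 | 0.575 | 0.583 | 0.487 | 0.504 | 0.527 | 0.564 | 0.539 | 0.660 |
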
[a] Ten TCGA datasets of survival classification are tested. Average macro-averaged F1 scores are listed for the two unbalanced classes, high- and low-risk survival cancer patients. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

[b] Ten TCGA datasets of stage classification are tested. Average macro-averaged F1 scores are listed for the two unbalanced classes, early- (stage I and II) and late-stage (stage III and IV) cancer patients. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

[c] Five drug response datasets from TCGA are tested. Average macro-averaged F1 scores are listed for the two unbalanced classes, responder (including complete response and partial response) and nonresponder (including stable disease and progressive disease) from cancer patients. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

[d] Abbreviation of cancer type according to the TCGA terms.

[e] Pan-cancer dataset of survival classification contains 33 cancer types of TCGA terms.

[f] Pan-cancer dataset of stage classification contains 21 cancer types of TCGA terms.

[g] Abbreviation of drug type.

The bold values are within the top two in each dataset.

Overall, Pathformer performed significantly better than the 18 other integration methods in terms of F1score_macro (Fig. 2a and b) and cross-validation variance (Fig. 2c). In the low- and high-risk survival classification tasks, Pathformer’s F1score_macro improved on the other eight deep learning methods (Type III) by 6.3% to 14.6% on average. Compared with eiXGBoost, the best-performing conventional machine learning method (Types I and II), Pathformer’s F1score_macro improved by 8.3% on average (Fig. 2b). In the early- and late-stage classification tasks, the average improvement in F1score_macro was 5.1% to 12% over the deep learning methods (Type III) and 6.2% over eiXGBoost (Fig. 2b). In the drug response prediction tasks, the average improvement in F1score_macro was 8% to 13.6% over the deep learning methods (Type III) and 8.6% over eiXGBoost (Fig. 2b). Moreover, Pathformer showed reduced variance in cross-validation (Fig. 2c) and a stronger correlation between predictive confidence scores and the fraction of positives (Supplementary Fig. S7), indicating greater stability and reliability.
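The averages quoted in this paragraph can be checked against the macro-averaged F1 scores in Table 1. The sketch below assumes that "average improvement" means the mean per-dataset difference in F1score_macro, expressed in percentage points. That reading is an assumption, although it reproduces the reported gaps over eiXGBoost for survival and stage classification. The function name `mean_improvement` is hypothetical.

```python
# Macro-averaged F1 scores copied from Table 1, in the table's row order
# (ten datasets per task, ending with the pan-cancer dataset).
PATHFORMER_SURVIVAL = [0.673, 0.688, 0.633, 0.615, 0.606, 0.601, 0.651, 0.694, 0.787, 0.735]
EIXGBOOST_SURVIVAL = [0.564, 0.653, 0.530, 0.519, 0.476, 0.549, 0.550, 0.594, 0.723, 0.694]
PATHFORMER_STAGE = [0.573, 0.726, 0.629, 0.564, 0.596, 0.706, 0.616, 0.646, 0.690, 0.657]
EIXGBOOST_STAGE = [0.518, 0.686, 0.526, 0.498, 0.548, 0.639, 0.540, 0.535, 0.664, 0.629]


def mean_improvement(ours, baseline):
    """Mean per-dataset F1 difference in percentage points (assumed definition)."""
    diffs = [a - b for a, b in zip(ours, baseline)]
    return 100.0 * sum(diffs) / len(diffs)


survival_gain = mean_improvement(PATHFORMER_SURVIVAL, EIXGBOOST_SURVIVAL)
stage_gain = mean_improvement(PATHFORMER_STAGE, EIXGBOOST_STAGE)
print(f"survival: {survival_gain:.1f}, stage: {stage_gain:.1f}")
# prints "survival: 8.3, stage: 6.2"
```

Under this reading the percentages are absolute differences in F1 score, not relative gains.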

The detailed performance comparisons of Pathformer and the other integration methods for different cancer types are shown in Table 1 and Supplementary Table S3. In survival classification, Pathformer achieved the highest F1score_macro and F1score_weighted in all 10 datasets, and the highest AUC in 7 of 10 datasets. In stage classification, Pathformer achieved the highest F1score_macro in 9 of 10 datasets, the highest F1score_weighted in 8 of 10 datasets, and the highest AUC in 6 of 10 datasets. In drug response prediction, Pathformer achieved the highest F1score_macro, F1score_weighted, and AUC in all datasets.
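Because the classes in these tasks are unbalanced, F1score_macro and F1score_weighted can differ noticeably. The following minimal sketch uses the standard definitions of the two averages; the labels are made-up toy data, and the helper names are hypothetical rather than taken from the Pathformer code.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over the classes present in y_true."""
    support = Counter(y_true)
    scores = {label: f1_per_class(y_true, y_pred, label) for label in support}
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * n for c, n in support.items()) / len(y_true)
    return macro, weighted


# Toy labels (not from the paper): 8 low-risk (0) and 2 high-risk (1) patients.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(macro, weighted)
```

In this toy case the majority class scores 0.875 and the minority class 0.5, so the macro average (0.6875) penalizes the weak minority-class performance more than the weighted average (0.8) does.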

3.2 Ablation analysis of Pathformer

We used ablation analysis to evaluate the contributions of different input modalities and calculation modules in the Pathformer model, based on nine datasets for cancer survival prediction (Fig. 3), nine datasets for cancer stage classification (Supplementary Fig. S8), and five datasets for drug response prediction (Supplementary Fig. S9). The pan-cancer datasets for survival and stage classification were not used here. First, to evaluate how integrating different data modalities contributes to classification, we compared seven models: Pathformer with three modalities as input (RNA expression + DNA methylation + DNA CNV), Pathformer with two modalities as input (RNA expression + DNA methylation, RNA expression + DNA CNV, and DNA methylation + DNA CNV), and Pathformer with a single modality as input (RNA expression-only, DNA methylation-only, and DNA CNV-only). Comparing these models on cancer survival risk classification, we found that the model with all three modalities as input achieved the best performance, followed by the model with RNA expression and DNA CNV, and the model with DNA methylation-only (Fig. 3a). Furthermore, the performance of single-modality models varied greatly between datasets. For example, the DNA methylation-only model performed better than the RNA expression-only and DNA CNV-only models in the LUSC, LIHC, and LGG datasets, but the opposite was observed in the LUAD and BLCA datasets. Ablation analysis of different modalities on cancer stage classification (Supplementary Fig. S8a) and drug response prediction (Supplementary Fig. S9a) showed similar results. These findings underscore the distinct behaviors of different modalities in different cancer types, highlighting the necessity of multi-modal data integration in various cancer stage and survival risk classification tasks.
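The seven input configurations in the modality ablation are exactly the non-empty subsets of the three modalities. A short sketch of how such a grid could be enumerated is given below; it only lists the combinations and is not taken from the Pathformer code.

```python
from itertools import combinations

MODALITIES = ("RNA expression", "DNA methylation", "DNA CNV")

# All non-empty subsets of the three modalities: 3 singles, 3 pairs, 1 triple.
input_sets = [
    combo
    for size in range(1, len(MODALITIES) + 1)
    for combo in combinations(MODALITIES, size)
]

for combo in input_sets:
    print(" + ".join(combo))
```

Each printed line corresponds to one ablation model that would be trained and evaluated with the same cross-validation protocol.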

Figure 3.

Ablation analysis of Pathformer for different input modalities and different calculation modules. (a) Different types of input modalities (omics data types) were used as input for TCGA cancer low- and high-risk survival classification. (b) Ablation analysis of different calculation modules in Pathformer. Error bars are from 2 times 5-fold cross-validation across 9 datasets, representing 95% confidence intervals. CC-attention, Pathformer without pathway crosstalk network bias; Transformer, Pathformer with only normal attention and pathway embedding; PSNN, Pathformer with only classification module with pathway embedding; NN, Pathformer with only classification module with gene embedding.

Next, to evaluate the importance of the different calculation modules in Pathformer, we compared four additional variants of Pathformer, namely CC-attention, Transformer, PSNN, and NN, in which one or more modules of Pathformer are successively removed. The “CC-attention” model is Pathformer without pathway crosstalk network bias. The “Transformer” model is Pathformer without pathway crosstalk network bias and row-attention, using only the normal attention mechanism and pathway embeddings. The “PSNN” model directly uses the classification module with pathway embedding as input. The “NN” model directly uses the classification module with gene embedding as input. As shown in Fig. 3b and Supplementary Figs S8 and S9, the complete Pathformer achieved the best classification performance, while the performance of CC-attention, Transformer, PSNN, and NN decreased successively. This indicates that the pathway crosstalk network, attention mechanism, and pathway embedding are all integral components of Pathformer. In particular, CC-attention performed significantly worse than Pathformer, providing strong evidence for the necessity of incorporating pathway crosstalk in Pathformer.

3.3 Biological interpretability of Pathformer in breast cancer prognosis prediction using tissue data

To further understand the decision-making process of Pathformer and validate the reliability of its biological interpretability, we present a case study on breast cancer survival risk classification. We demonstrate that Pathformer can use attention weights and SHAP values to identify modalities, pathways, and genes statistically associated with breast cancer prognosis, in agreement with known biological knowledge (Fig. 4).

Figure 4.

Biological interpretation of the breast cancer survival data using Pathformer. (a) Contributions of different modalities for breast cancer (BRCA) survival risk classification calculated by attention weights (averaging attention maps of row-attention). (b) Important pathways and their key genes with top SHapley Additive exPlanations (SHAP) values for BRCA survival risk classification. Among the key genes, different colors represent different pillar modalities of the genes. (c) A hub module of the updated pathway crosstalk network for BRCA survival risk classification. Color depth and size of a node represent the degree of the node. Line thickness represents the weight of an edge. All links are predicted by Pathformer, where known links are reported by the initial crosstalk network and new links are new predictions. (d) Kaplan–Meier curves of the most active pathway identified by Pathformer. P-value calculated through log-rank test.

First, at the omics and modality level, we visualized the contributions of different modalities for breast cancer survival risk classification using the attention weights (Fig. 4a). Transcriptomic data contributed the most to breast cancer prognostic prediction, which is consistent with the results of the ablation analysis (Fig. 3a) and with findings in the literature (Huang et al. 2019, Tong et al. 2021). Additionally, from Fig. 4a and Supplementary Fig. S10a, we observed that the contributions of features within the same modality, such as DNA methylation, differed between BRCA prognosis and staging. These findings further validate the necessity of biological multi-modal embedding and integration.

Next, at the pathway and gene levels, we identified the 15 pathways with the highest SHAP values and, for each pathway, the 5 genes with the highest SHAP values in breast cancer survival risk classification (Fig. 4b). Then, we presented a hub module of the updated pathway crosstalk network (Fig. 4c). These key pathways and genes identified by SHAP and the hub module are biologically meaningful and consistent with previous biological experiments. For instance, the complex I biogenesis pathway, which was identified as the most critical pathway during the classification and a key node in the hub module of the updated pathway crosstalk network, was reported to play an important role in cancer cell proliferation and metastasis (Urra et al. 2017). Five mitochondrial genes (MT-ND4, MT-ND1, MT-ND3, MT-ND6, and MT-ND2), which were identified as key genes of the complex I biogenesis pathway by Pathformer, were also reported to be associated with breast cancer prognosis (Kopinski et al. 2021). Another example is FTL, which our SHAP values predicted to be up-regulated in the high-risk cancer group and which was reported to promote breast cancer cell proliferation in knockout experiments (Tang et al. 2023).
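The two-level selection described here (top pathways, then top genes within each pathway) can be sketched as a simple ranking. The snippet below assumes that one importance value per pathway and per gene has already been computed, for example as a mean absolute SHAP value; that aggregation, the function names, and the toy values are all assumptions for illustration, not results or code from the paper.

```python
def top_k(scores, k):
    """Return the k keys with the largest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def key_pathways_and_genes(pathway_shap, gene_shap, n_pathways=15, n_genes=5):
    """Map each top-ranked pathway to its top-ranked member genes."""
    return {
        pathway: top_k(gene_shap.get(pathway, {}), n_genes)
        for pathway in top_k(pathway_shap, n_pathways)
    }


# Made-up importance values; placeholders, not values from the paper.
pathway_shap = {"pathway_A": 0.30, "pathway_B": 0.10, "pathway_C": 0.20}
gene_shap = {
    "pathway_A": {"gene_1": 0.05, "gene_2": 0.20, "gene_3": 0.10},
    "pathway_B": {"gene_4": 0.02},
    "pathway_C": {"gene_5": 0.07, "gene_6": 0.09},
}
selected = key_pathways_and_genes(pathway_shap, gene_shap, n_pathways=2, n_genes=2)
print(selected)
```

With the defaults `n_pathways=15` and `n_genes=5`, the same call would return the selection sizes used for Fig. 4b.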

Subsequently, to give a more intuitive view of how the active pathways identified by Pathformer relate to breast cancer survival risk classification, we plotted survival curves comparing patients with high and low scores in active pathways (Fig. 4d and Supplementary Fig. S11). The pathway score for each sample was obtained by averaging across the dimensions of the pathway embedding updated by Pathformer. Log-rank tests indicated that most active pathways identified by Pathformer, such as the complex I biogenesis pathway, were significantly associated with patient survival. Finally, to gain further insight into how Pathformer uses key features for accurate decision-making, we visualized pathway embedding changes and discussed the commonalities among correctly classified samples (Supplementary Note S9). We visually examined the feature extraction capability of Pathformer’s Transformer module with CC-attention by comparing PCA of pathway embedding matrices before and after the Pathformer update, together with a pathway score heatmap (Supplementary Fig. S12a and b). We also found that Pathformer’s accuracy is not influenced by clinical indicators such as cancer subtype and age (Supplementary Fig. S12c).
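The survival comparison can be sketched in three steps: average each sample's pathway embedding into a score, split samples into high and low groups, and compare the groups with a two-group log-rank test. The sketch below is a plain-Python illustration under stated assumptions: the median is used as the high/low cutoff (the text does not state the cutoff), the embeddings and follow-up data are made up, and the function names are hypothetical.

```python
import math


def pathway_score(embedding):
    """Average one pathway's embedding vector into a single score."""
    return sum(embedding) / len(embedding)


def split_by_median(scores):
    """Return True for samples scoring above the median (assumed cutoff)."""
    ordered = sorted(scores)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [s > median for s in scores]


def logrank_test(times, events, is_high):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 d.f."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        # Samples still at risk at time t, with a flag for an event exactly at t.
        at_risk = [(h, bool(tt == t and e))
                   for tt, e, h in zip(times, events, is_high) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for h, _ in at_risk if h)
        d = sum(1 for _, died in at_risk if died)
        d_high = sum(1 for h, died in at_risk if h and died)
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance if variance else 0.0
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up data: six samples, two embedding dimensions, all events observed.
embeddings = [[0.9, 0.7], [0.8, 0.6], [0.7, 0.9], [0.1, 0.3], [0.2, 0.2], [0.3, 0.1]]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
scores = [pathway_score(e) for e in embeddings]
is_high = split_by_median(scores)
chi2, p_value = logrank_test(times, events, is_high)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")
```

In practice a survival analysis library would normally be used for the test and for drawing Kaplan–Meier curves; the hand-written version is only meant to make the computation explicit.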

3.4 Performance of Pathformer for the noninvasive diagnosis of cancer using liquid biopsy data

In clinical practice, cancer diagnosis involves not only using tissue data for cancer staging but also using liquid biopsy data (e.g. plasma) for noninvasive early detection and screening. The latter has even greater clinical significance because early detection substantially increases the 5-year survival rate of cancer patients. For instance, the 5-year survival rate of colon cancer was reported as 93.2% for stage I but only 8.1% for stage IV (O’Connell et al. 2004). Therefore, we applied Pathformer to liquid biopsy data, aiming to distinguish cancer patients from healthy controls. We curated two types of cell-free RNA sequencing (cfRNA-seq) data: plasma datasets (98 healthy donors and 275 cancer samples) and platelet datasets (286 healthy donors and 632 cancer samples). We then calculated seven RNA-level modalities as Pathformer’s multi-modal input: RNA expression, RNA splicing, RNA editing, RNA alternative promoter (RNA alt. promoter), RNA allele-specific expression (RNA ASE), RNA single nucleotide variations (RNA SNV), and chimeric RNA. Liquid biopsy data collection and preprocessing procedures are described in Supplementary Note S2, and model parameters and settings in Supplementary Note S6. Because these seven RNA modalities may carry redundant information, we selected the best modality combination based on two repeats of 5-fold cross-validation (Supplementary Note S10). The results showed that the plasma data with seven modalities and the platelet data with three modalities obtained the best performance (AUCs > 0.9). Additionally, we found that Pathformer’s performance was superior to that of the other integration methods on the liquid biopsy data (Table 2 and Supplementary Table S4). Because cancer screening usually requires high specificity, we report sensitivities at 99% specificity in Table 2. Pathformer achieved an average sensitivity of 48.8% in the plasma dataset and 48.1% in the platelet dataset. Notably, the sensitivity at 99% specificity remained above 45% in the plasma data even for early-stage cancer patients, showing Pathformer’s potential for early cancer diagnosis.
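Sensitivity at a fixed specificity is obtained by choosing the decision threshold from the healthy controls and then counting how many cancer samples exceed it. The sketch below shows one common way to do this; the exact thresholding procedure used for Table 2 is not described in this section, so the rule here (scores strictly above the threshold are called positive) and the function name are assumptions, and the scores are made up.

```python
def sensitivity_at_specificity(control_scores, cancer_scores, specificity=0.99):
    """Sensitivity at the lowest threshold whose specificity is at least `specificity`.

    A sample is called positive when its score is strictly above the threshold.
    """
    ranked = sorted(control_scores, reverse=True)
    # Number of controls allowed to score above the threshold (false positives).
    allowed_fp = int(len(ranked) * (1 - specificity) + 1e-9)
    allowed_fp = min(allowed_fp, len(ranked) - 1)
    threshold = ranked[allowed_fp]
    detected = sum(1 for s in cancer_scores if s > threshold)
    return detected / len(cancer_scores), threshold


# Made-up prediction scores: 100 healthy controls and 4 cancer samples.
controls = [i / 100 for i in range(100)]
cancers = [0.5, 0.985, 0.99, 0.995]
sensitivity, threshold = sensitivity_at_specificity(controls, cancers)
print(sensitivity, threshold)
```

With 100 controls, a 99% specificity target allows exactly one false positive, so the threshold is the second-highest control score.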

Table 2.

Cancer detection performance of Pathformer and other integration methods based on the cell-free RNA liquid biopsy data.

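| Type | Method | Dataset | Macro-averaged F1 score | Weighted-averaged F1 score | AUC | Sensitivity (99% specificity) |
|---|---|---|---|---|---|---|
| Type I | eiLR | Plasma [a] | 0.795 | 0.840 | 0.875 | 0.316 |
| | eiXGBoost | Plasma | 0.777 | 0.831 | 0.869 | 0.324 |
| | eiSVM | Plasma | 0.814 | 0.861 | 0.910 | 0.367 |
| | eiRF | Plasma | 0.792 | 0.847 | 0.882 | 0.370 |
| | liSVM | Plasma | 0.641 | 0.754 | 0.904 | 0.431 |
| | liRF | Plasma | 0.754 | 0.823 | 0.897 | 0.438 |
| | liLR | Plasma | 0.698 | 0.788 | 0.910 | 0.462 |
| | liXGBoost | Plasma | 0.790 | 0.845 | 0.911 | 0.467 |
| Type II | PLSDA | Plasma | 0.712 | 0.794 | 0.843 | 0.321 |
| | sPLSDA | Plasma | 0.717 | 0.796 | 0.859 | 0.366 |
| Type III | PathCNN | Plasma | 0.424 | 0.626 | 0.542 | 0.070 |
| | eiCNN | Plasma | 0.619 | 0.740 | 0.671 | 0.254 |
| | MOGAT | Plasma | 0.799 | 0.842 | 0.870 | 0.307 |
| | eiNN | Plasma | 0.821 | 0.859 | 0.910 | 0.393 |
| | liNN | Plasma | 0.772 | 0.821 | 0.886 | 0.395 |
| | P-NET | Plasma | 0.725 | 0.806 | 0.869 | 0.409 |
| | MOGONet | Plasma | 0.840 | 0.873 | 0.872 | 0.412 |
| | liCNN | Plasma | 0.784 | 0.832 | 0.884 | 0.445 |
| Pathformer | Pathformer | Plasma | 0.843 | 0.877 | 0.914 | 0.488 |
| Pathformer | Pathformer | Plasma (early-stage [b]) | 0.853 | 0.869 | 0.916 | 0.479 |
| Type I | eiLR | Platelet [c] | 0.853 | 0.871 | 0.938 | 0.409 |
| | eiRF | Platelet | 0.826 | 0.853 | 0.930 | 0.418 |
| | eiSVM | Platelet | 0.752 | 0.802 | 0.886 | 0.423 |
| | liSVM | Platelet | 0.749 | 0.798 | 0.908 | 0.425 |
| | liLR | Platelet | 0.717 | 0.785 | 0.874 | 0.446 |
| | liRF | Platelet | 0.817 | 0.849 | 0.940 | 0.447 |
| | liXGBoost | Platelet | 0.879 | 0.897 | 0.959 | 0.453 |
| | eiXGBoost | Platelet | 0.853 | 0.875 | 0.939 | 0.461 |
| Type II | sPLSDA | Platelet | 0.711 | 0.767 | 0.849 | 0.253 |
| | PLSDA | Platelet | 0.788 | 0.826 | 0.899 | 0.370 |
| Type III | PathCNN | Platelet | 0.408 | 0.561 | 0.492 | 0.027 |
| | eiCNN | Platelet | 0.478 | 0.603 | 0.567 | 0.048 |
| | liCNN | Platelet | 0.579 | 0.671 | 0.662 | 0.134 |
| | eiNN | Platelet | 0.768 | 0.804 | 0.834 | 0.422 |
| | P-NET | Platelet | 0.548 | 0.659 | 0.934 | 0.439 |
| | liNN | Platelet | 0.843 | 0.873 | 0.909 | 0.445 |
| | MOGAT | Platelet | 0.702 | 0.772 | 0.923 | 0.445 |
| | MOGONet | Platelet | 0.706 | 0.776 | 0.929 | 0.469 |
| Pathformer | Pathformer | Platelet | 0.889 | 0.903 | 0.938 | 0.481 |
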
[a] All cancer stages from I to IV.

[b] Cancer stage I and stage II.

[c] Stage information not available. All types of cancer patients are used as positives; the healthy controls are used as negatives. Each value is the mean of 5-fold cross-validation repeated twice (10 values).

The bold values are the highest in each dataset.


3.5 Biological interpretability of Pathformer in cancer patients’ blood data

Based on the above analysis, we sought new insight into the deregulated alterations in plasma through Pathformer’s biological interpretability module (Fig. 5a and Supplementary Fig. S18a). First, we found that the pathways and genes with high SHAP values were associated with dysregulated alterations reported in previous experimental studies. For example, the binding and uptake of ligands (e.g. oxidized low-density lipoprotein, oxLDL) by scavenger receptors pathway, which ranked among the top pathways by SHAP value, was reported to play a crucial role in cancer prognosis and carcinogenesis by promoting the degradation of harmful substances and accelerating the immune response (Ryu et al. 2020). Two further examples are the DAP12 signaling pathway and the DAP12 interactions pathway, which ranked highly by SHAP value in both plasma and platelet data and were reported to regulate natural killer cell immune responses against certain tumor cells through platelet modulation (Campbell and Colonna 1999, Placke et al. 2011).

Figure 5.

Biological interpretation of the cancer patients’ plasma data using Pathformer. (a) Important pathways and their key genes revealed by Pathformer in the plasma cell free RNA-seq data when classifying cancer patients from healthy controls. The pathways and their key genes were selected with top SHAP values. Among the key genes, different colors represent different pillar modalities (e.g. RNA expression, RNA editing, etc.) of the genes. (b) Hub modules of pathway crosstalk network are shown for plasma cell free RNA-seq data. Color depth and size of node represent the degree of node. Line thickness represents the weight of edge. All links are predicted by Pathformer, where known links are reported by the initial crosstalk network and new links are new predictions.

Furthermore, Pathformer can explore potential novel interactions between biological processes in cancer patients’ plasma by updating the pathway crosstalk network (Fig. 5b). For example, the link between the binding and uptake of ligands by scavenger receptors pathway and the iron uptake and transport pathway was a novel addition to the known links (i.e. Pathformer’s input pathway crosstalk network, curated from published databases; see Methods). This finding aligns with a previous report of SCARA5 (a scavenger receptor class A member) as a ferritin receptor (Yu et al. 2020). The crosstalk between these two pathways was amplified by Pathformer in the plasma dataset, probably because both were important for classification. In summary, Pathformer’s updated pathway crosstalk network can effectively visualize the information flow between pathways related to cancer classification tasks, providing new insight into the crosstalk of biological pathways in cancer patients’ plasma.
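The distinction between known and new links can be illustrated with a short sketch. It assumes the updated network is available as weighted pathway pairs, that links are treated as undirected, and that a weight cutoff decides which links are kept; none of these details are taken from Pathformer's implementation. The pathway identifiers, weights, and threshold are hypothetical toy values.

```python
def classify_links(updated_weights, known_links, threshold):
    """Split predicted pathway links into known and new.

    updated_weights: dict mapping (pathway_a, pathway_b) -> weight.
    known_links: iterable of (pathway_a, pathway_b) pairs from the prior network.
    threshold: minimum weight for a link to be kept (illustrative cutoff).
    """
    # Assumption: links are undirected, so (a, b) and (b, a) are the same link.
    known = {frozenset(pair) for pair in known_links}
    result = {"known": [], "new": []}
    for (a, b), weight in updated_weights.items():
        if a == b or weight < threshold:
            continue  # skip self-links and weak links
        label = "known" if frozenset((a, b)) in known else "new"
        result[label].append((a, b, weight))
    return result


def node_degree(links):
    """Count how many kept links touch each pathway."""
    degree = {}
    for a, b, _ in links:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


# Toy network with four hypothetical pathways.
updated = {
    ("P1", "P2"): 0.8,
    ("P2", "P3"): 0.6,
    ("P1", "P3"): 0.1,
    ("P3", "P4"): 0.7,
}
prior = [("P2", "P1"), ("P3", "P4")]

links = classify_links(updated, prior, threshold=0.5)
degrees = node_degree(links["known"] + links["new"])
print(links["new"])
print(degrees)
```

In a plot such as Fig. 5b, the degree computed here would correspond to node size and color depth, and the link weight to line thickness.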

4 Conclusion and discussion

Pathformer successfully applied a Transformer model to integrate multi-modal data for cancer diagnosis and prognosis. In particular, it introduced a novel biological embedding method based on compacted multi-modal vectors (Fig. 1b). Moreover, it used the criss-cross attention mechanism of the Transformer to capture crosstalk between biological pathways and regulation between modalities (i.e. different omics).

4.1 Clinical applications of Pathformer

Pathformer can be applied to various classification tasks in disease diagnosis, treatment, and prognosis, such as early detection, cancer staging, survival prediction, and drug response prediction. Its predictive accuracy, stability, reliability, and biological interpretability were demonstrated through extensive benchmarks and case studies focusing on cancer prognosis and noninvasive diagnosis. Ablation analysis demonstrated the pivotal role of integrating multiple modalities and of the core modules (CC-attention, PNSS, and pathway embedding) in accurate classification. Our discussion of explained variances across integration models and multi-omics data further corroborated the conclusions from the benchmark tests and ablation analyses (Supplementary Note S11). Moreover, this framework is adaptable to the diagnosis and prognosis of other complex diseases, such as autoimmune and neurodegenerative diseases.

4.2 Potential targets revealed in cancer patients’ blood

Using Pathformer, we identified potential noninvasive diagnostic biomarkers for cancer, such as the scavenger receptor related pathways and the DAP12 related pathways, which are associated with extracellular vesicle transport (Kzhyshkowska et al. 2006) and immune response (Campbell and Colonna 1999), respectively. We also found a new cancer-related pathway crosstalk in blood, between the binding and uptake of ligands by scavenger receptors pathway and the iron uptake and transport pathway. These results provide candidate targets for mechanistic studies of the cancer microenvironment and immune system, and possibly new targets for cancer treatment.

4.3 Limitations of Pathformer and future directions

For gene selection, Pathformer used genes from four common pathway databases, all of which consist of protein-coding genes. However, a substantial body of literature has reported that noncoding RNAs are also crucial in cancer prognosis and diagnosis (Qi et al. 2016). Incorporating noncoding RNAs and their related functional pathways into Pathformer is therefore a promising direction for future work. For clinical applications in liquid biopsy, we used only multi-modal features derived from cfRNA-seq, because the published cell-free multi-omics datasets (Tao et al. 2023) are usually too small for training and testing. Regarding computational efficiency and memory cost, Pathformer still has room for improvement. Its pathway embedding prevents the memory overflow that long inputs would otherwise cause in the Transformer module, but training still requires significant time and memory (Supplementary Note S12). Consequently, Pathformer may again run into memory overflow when more pathways or gene sets (e.g. transcription factors) are added. In future work, we may introduce linear attention to further improve computational speed. Furthermore, the potential signatures, regulations, and biomarkers identified by Pathformer still need to be validated by further biological experiments and clinical tests.
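As an illustration of why linear attention could reduce cost, the sketch below shows one well-known kernelized variant in plain Python. It replaces the softmax similarity with the feature map elu(x) + 1, so it is not numerically equivalent to softmax attention, and it is not Pathformer's criss-cross attention; which variant would suit Pathformer is left open here. All function names and toy inputs are assumptions made for this example. The quadratic version scores every query against every key, while the linear version summarizes keys and values once and reuses that summary, so its cost grows linearly with the number of tokens.

```python
import math


def feature_map(x):
    """elu(x) + 1: a positive feature map used in kernelized linear attention."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q, K, V):
    """Reference version: one similarity per (query, key) pair, O(n^2)."""
    out = []
    for q in Q:
        fq = feature_map(q)
        sims = [dot(fq, feature_map(k)) for k in K]
        total = sum(sims)
        out.append([
            sum(s * v[c] for s, v in zip(sims, V)) / total
            for c in range(len(V[0]))
        ])
    return out


def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once, O(n)."""
    d, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d)]  # sum over j of phi(k_j) v_j^T
    z = [0.0] * d                         # sum over j of phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for r in range(d):
            z[r] += fk[r]
            for c in range(d_v):
                kv[r][c] += fk[r] * v[c]
    out = []
    for q in Q:
        fq = feature_map(q)
        norm = dot(fq, z)
        out.append([
            sum(fq[r] * kv[r][c] for r in range(d)) / norm
            for c in range(d_v)
        ])
    return out


# Toy inputs: three tokens, feature dimension two.
Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.7]]
K = [[0.2, 0.1], [-0.3, 0.6], [0.9, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_gap = max(abs(a - b) for rs, rf in zip(slow, fast) for a, b in zip(rs, rf))
print(f"largest difference between the two versions: {max_gap:.2e}")
```

The two functions agree because the feature-map similarity factorizes, which lets the sum over keys be moved outside the loop over queries.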

Acknowledgements

We extend our heartfelt thanks to the two anonymous reviewers for their perceptive suggestions. Their suggestions have greatly enhanced the depth of our manuscript, resulting in a more comprehensive and impactful presentation of our work.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

None declared.

Funding

This work was supported by the National Natural Science Foundation of China [32170671, 82341101, 82371855], Tsinghua University Guoqiang Institute Grant [2021GQG1020], Tsinghua University Initiative Scientific Research Program of Precision Medicine [2022ZLA003], and the Bioinformatics Platform of National Center for Protein Sciences (Beijing) [2021-NCPSB-005]. This study was also supported by Bayer Micro-funding and the Bio-Computing Platform of Tsinghua University Branch of China National Center for Protein Sciences.

Data availability

All datasets used in this study are publicly available for academic research use. The TCGA datasets were derived from sources in the public domain: https://www.cancer.gov/ccg/. The plasma dataset is available in Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession codes GSE174302 and GSE186607. The platelet dataset is available in Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession codes GSE68086 and GSE89843. Details of their usage are given in Methods and Supplementary Notes. Source code for data preprocessing and model training is freely available on GitHub (https://github.com/lulab/Pathformer) with detailed instructions. Source code for running the compared methods is also included.

References

Best MG, Sol N, In 't Veld SGJG et al. Swarm intelligence-enhanced detection of non-small-cell lung cancer using tumor-educated platelets. Cancer Cell 2017;32:238–52.e9.

Best MG, Sol N, Kooi I et al. RNA-Seq of tumor-educated platelets enables blood-based pan-cancer, multiclass, and molecular pathway cancer diagnostics. Cancer Cell 2015;28:666–76.

Campbell KS, Colonna M. DAP12: a key accessory protein for relaying signals by natural killer cell receptors. Int J Biochem Cell Biol 1999;31:631–6.

Cancer Genome Atlas Research Network. The cancer genome atlas pan-cancer analysis project. Nat Genet 2013;45:1113–20.

Chen S, Jin Y, Wang S et al. Cancer type classification using plasma cell-free RNAs derived from human and microbes. Elife 2022;11:e75181.

Chiu Y-C et al. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med Genomics 2019;12:143–55.

Croft D, O'Kelly G, Wu G et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 2010;39:D691–7.

Cui H, Wang C, Maan H et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nature Methods 2024:1–11.

Elmarakeby HA, Hwang J, Arafeh R et al. Biologically informed deep neural network for prostate cancer discovery. Nature 2021;598:348–52.

Fu Y, Xu J, Tang Z et al. A gene prioritization method based on a swine multi-omics knowledgebase and a deep learning model. Commun Biol 2020;3:502.

Hao J, Kim Y, Kim T-K et al. PASNet: pathway-associated sparse deep neural network for prognosis prediction from high-throughput data. BMC Bioinformatics 2018;19:510.

Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol 2017;18:83–15.

Huang Z, Zhan X, Xiang S et al. SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Front Genet 2019;10:166.

Islam MM et al. An integrative deep learning framework for classifying molecular subtypes of breast cancer. Comput Struct Biotechnol J 2020;18:2185–99.

Jumper J, Evans R, Pritzel A et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9.

Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27–30.

Kim D, Rath O, Kolch W et al. A hidden oncogenic positive feedback loop caused by crosstalk between Wnt and ERK pathways. Oncogene 2007;26:4571–9.

Kopinski PK, Singh LN, Zhang S et al. Mitochondrial DNA variation and cancer. Nat Rev Cancer 2021;21:431–45.

Kuru HI, Tastan O, Cicek AE. MatchMaker: a deep learning framework for drug synergy prediction. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2334–44.

Kzhyshkowska J, Gratchev A, Goerdt S. Stabilin-1, a homeostatic scavenger receptor with multiple functions. J Cell Mol Med 2006;10:635–49.

Li Y, Agarwal P, Rajagopalan D. A global pathway crosstalk network. Bioinformatics 2008;24:1442–7.

Liu Z-W, Zhang Y-M, Zhang L-Y et al. Duality of interactions between TGF-β and TNF-α during tumor formation. Front Immunol 2021;12:810286.

Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017;30.

Ning C, Cai P, Liu X et al. A comprehensive evaluation of full-spectrum cell-free RNAs highlights cell-free RNA fragments for early-stage hepatocellular carcinoma detection. EBioMedicine 2023;93:104645.

Nishimura D. BioCarta. Biotech Softw Internet Rep Comput Softw J Sci 2001;2:117–20.

O'Connell JB, Maggard MA, Ko CY. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J Natl Cancer Inst 2004;96:1420–5.

Ogris C, Guala D, Helleday T et al. A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation. Nucleic Acids Res 2017;45:e8.

Oh JH, Choi W, Ko E et al. PathCNN: interpretable convolutional neural networks for survival prediction and pathway analysis applied to glioblastoma. Bioinformatics 2021;37:i443–50.

Osseni MA, Tossou P, Laviolette F et al. MOT: a multi-omics transformer for multiclass classification tumour types predictions. bioRxiv, https://doi.org/10.1101/2022.11.14.516459, 2022, preprint: not peer reviewed.

Placke T, Kopp H-G, Salih HR. Modulation of natural killer cell anti-tumor reactivity by platelets. J Innate Immun 2011;3:374–82.

Prahallad A, Bernards R. Opportunities and challenges provided by crosstalk between signalling pathways in cancer. Oncogene 2016;35:1073–9.

Preuer K, Lewis RPI, Hochreiter S et al. DeepSynergy: predicting anti-cancer drug synergy with deep learning. Bioinformatics 2018;34:1538–46.

Qi P, Zhou X-y, Du X. Circulating long non-coding RNAs in cancer: current status and future perspectives. Mol Cancer 2016;15:39–11.

Rohart F, Gautier B, Singh A et al. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput Biol 2017;13:e1005752.

Ryu S, Howland A, Song B et al. Scavenger receptor class A to E involved in various cancers. Chonnam Med J 2020;56:1–5.

Schaefer CF, Anthony K, Krupa S et al. PID: the pathway interaction database. Nucleic Acids Res 2009;37:D674–9.

Sharifi-Noghabi H, Zolotareva O, Collins CC et al. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 2019;35:i501–i509.

Tang C, Zhang B, Yang Y et al. Overexpression of ferritin light chain as a poor prognostic factor for breast cancer. Mol Biol Rep 2023;50:8097–109.

Tao Y, Xing S, Zuo S et al. Cell-free multi-omics analysis reveals potential biomarkers in gastrointestinal cancer patients’ blood. Cell Rep Med 2023;4:101281.

Tarazona S, Arzalluz-Luque A, Conesa A. Undisclosed, unmet and neglected challenges in multi-omics studies. Nat Comput Sci 2021;1:395–402.

Theodoris CV, Xiao L, Chopra A et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24.

Tong L, Wu H, Wang MD. Integrating multi-omics data by learning modality invariant representations for improved prediction of overall survival of cancer. Methods 2021;189:74–85.

Urra FA, Muñoz F, Lovy A et al. The mitochondrial complex(I)ty of cancer. Front Oncol 2017;7:118.

Wang T, Shao W, Huang Z et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun 2021;12:3445.

Xing X, Yang F, Li H et al. An interpretable multi-level enhanced graph attention network for disease diagnosis with gene expression data. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Houston, TX: IEEE, 2021, 556–61.

Yu B, Cheng C, Wu Y et al. Interactions of ferritin with scavenger receptor class A members. J Biol Chem 2020;295:15727–41.

Author notes

Xiaofan Liu and Yuhuan Tao contributed equally.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Jonathan Wren

Supplementary data