Abstract

Splicing factors (SFs) are the major RNA-binding proteins (RBPs) and key molecules that regulate the splicing of mRNA molecules through binding to mRNAs. The expression of splicing factors is frequently deregulated in different cancer types, causing the generation of oncogenic proteins involved in cancer hallmarks. In this study, we investigated the genes that encode RNA-binding proteins and identified potential splicing factors that contribute to the aberrant splicing applying a random forest classification model. The result suggested 56 splicing factors were related to the prognosis of 13 cancers, two SF complexes in liver hepatocellular carcinoma, and one SF complex in esophageal carcinoma. Further systematic bioinformatics studies on these cancer prognostic splicing factors and their related alternative splicing events revealed the potential regulations in a cancer-specific manner. Our analysis found high ILF2-ILF3 expression correlates with poor prognosis in LIHC through alternative splicing. These findings emphasize the importance of SFs as potential indicators for prognosis or targets for therapeutic interventions. Their roles in cancer exhibit complexity and are contingent upon the specific context in which they operate. This recognition further underscores the need for a comprehensive understanding and exploration of the role of SFs in different types of cancer, paving the way for their potential utilization in prognostic assessments and the development of targeted therapies.

Introduction

Numerous studies indicate that dysregulated splicing factors (SFs) are linked with various human diseases, including different types of cancer. In cancer, these SFs control the expression levels and alternative splicing (AS) of target RNAs related to cancer cell growth, death, blood vessel formation, aging, and cell movement. SF-mediated regulation ultimately contributes to cancer development and pathologie [1]. Two families of splicing factors, namely the serine–arginine (SR) factors and heterogeneous ribonucleoproteins (hnRNP), have been extensively studied for their roles in promoting or inhibiting exon recognition, respectively [2–4]. Recently, studies have revealed that RNA binding motif proteins 5, 6, and 10 (RBM5, RBM6, and RBM10) are instrumental in regulating the splicing of NUMB, thereby impacting the growth of lung cancer cells [5–7]. Currently, various strategies have been developed to target splicing factors for potential cancer therapy, including small molecule inhibitors, metal ions, small interfering RNAs, and decoy RNA oligonucleotides [8–10]. Since SFs play a vital role in regulating cancer, they could be promising targets for cancer treatment. Understanding the functional, and pathological roles of RNA-binding proteins (RBPs) in cancers will provide valuable insights for cancer therapy. Nevertheless, at the time of the study, there was a lack of comprehensive analysis to integrate high-throughput data from various dimensions to systematically identify cancer-related splicing factors and investigate the effect of their abnormal expression and potential functions. This made it challenging to gain a holistic understanding of the role of SFs in cancer. As shown here, systematic analyses of the splicing factors using diverse bioinformatics approaches are urgently needed to provide therapeutic targets for cancer.

Cross-linking and immunoprecipitation (CLIP)-based techniques enable researchers to map RBP binding sites so that they can detect the potential targets of individual RBP and the RBP binding affinity, which provide a basis for studying splicing factor regulation [11]. However, the CLIP experiment of all RBPs is not available, the main challenge faced with the systematic analyses of all human RBPs and identification of the splicing factors in cancer would be the lack of CLIP experiments. Here, we used the Pearson correlation coefficient as the RBP effect score to predict the splicing change of AS events and performed multiple analyses to study the splicing factor regulation including survival analysis, identify the potential SF complex, identify the SF-related AS events, SF-affected gene function, SF-affected gene expression (Fig. 1). In general, our work developed a pipeline identifying the potential splicing factors in cancer and highlighted their significance as potential biomarkers for cancer prognosis, which provided potential biomarkers in cancer prognosis. We aimed to shed light on the role of splicing factors in cancer pathogenesis and prognosis, offering valuable insights for future research and clinical applications.

Overview of the analysis pipeline. (A) Identifying RNA binding proteins associated with alternative splicing events in cancer. (B) The structure design for the random forest was used to select cancer prognostic splicing factors. (C) The downstream analyses of cancer prognostic splicing factors, including survival analysis of each RBP, correlation study between RBPs, differential analysis between good and poor prognostic groups, function annotation of SF-related AS events, and SF-affected gene expression.
Figure 1

Illustrates the overview of the identity of the prognosis splicing factors and the downstream analysis. Using the Pearson correlation coefficient as the RNA binding protein effect score to predict the splicing change of AS events and performed multiple analyses to study the splicing factor regulation including survival analysis, identify the potential splicing factor complex, identify the splicing factor-related AS events, splicing factor affected gene function, splicing factor -affected gene expression.

Results

Abnormal RBP and alternative splicing events in cancer

Alternative splicing enables the regulated generation of multiple mRNA and protein products from a single gene. Cancer cells have general as well as cancer-type-specific alterations in the splicing process that can have prognostic value and contribute to every hallmark of cancer progression [12]. We performed differential analysis to identify the differential AS events, 13 046 AS events in 4156 genes showed different PSI values between tumor and normal tissues, including 2649 ES events, 1334 A5SS events, 1218 A3SS events, 3270 MXE events, and 4575 IR events (Fig. 2A, Supplementary Table 1). Further functional enrichment analysis indicated 4156 genes were mainly involved in cancer relating pathways such as “VEGF signaling pathway”, “NOD-like receptor signaling pathway”, and “PPAR signaling pathway” (Supplementary Table 2). These insights suggest that the identified AS events and their related genes have the potential to significantly impact the course of cancer progression, highlighting their importance in the context of cancer biology and therapy.

Differential RBP and AS events in cancer. (A) The number of differentially expressed RBPs of individual cancers. (B) The number of differential AS events of individual cancers. For each cancer, the number in each bar from left to right represents the number of differential A3SS, A5SS, ES, IR, and MXE. (C) The number of differential RBP in multiple cancer types. (D) The boxplot of SNRPB in ten cancers, the solid line boxes represent the tumor tissues, the dotted line boxes represent the normal tissues. (E) The number of positive and negative associated RBP-AS pairs in each correlation level of 17 cancers. The significant in figures based on * p.adjusted < 0.05; ** p.adjusted<0.01; *** p.adjusted<0.001, **** p.adjusted<0.0001.
Figure 2

Illustrates the transcriptome profile of cancer, showcasing aberrant RNA binding protein, alternative splicing events, and their interconnections. This analysis provides insights into the roles played by RBPs and alternative splicing events in cancer progression.

The analysis of differential gene expression has revealed that many RBPs exhibit differential expression exclusively in specific types of cancer (Fig. 2B). This suggests that there is cancer-specific regulation of splicing across different cancer types. Additionally, among the differentially expressed RBPs, nine were found to be altered in more than ten cancer types. These RBPs include EEF1A2, SNRPB, RNASE1, EZH2, MEX3A, PABPC1L, RNASEH2A, RPL22L1, and EXO1. Notably, a previous study highlighted that SNRPB-mediated RNA splicing is crucial for tumor cell proliferation and stemness in hepatocellular carcinoma [13]. Consistent with these findings, our analysis reveals that SNRPB is upregulated in ten different types of cancer. Furthermore, it predominantly exhibits positive regulation over AS events in nine out of the ten cancers analyzed (Fig. 2C and D). These findings not only reinforce the emerging significance of SNRPB but also emphasize the broader importance of RBPs as potential therapeutic targets in the battle against cancer. Understanding the dysregulation of RBPs and their influence on alternative splicing events can provide valuable insights for developing targeted therapies and personalized treatment strategies.

RBP effect score demonstrated strong performance in predicting splicing changes

Splicing factor regulation is indeed a crucial aspect of controlling post-transcriptional gene expression in eukaryotic cells. Dysregulation of splicing factors can lead to abnormal splicing events, which in turn can contribute to tumor initiation, growth, metastasis, and resistance to therapy [14, 15]. To investigate the relationship between differential RBPs and differential AS events, association studies were performed. These studies identified a total of 1 419 758 RBP-AS correlation pairs. Among them, 747 272 pairs exhibited positive correlations, indicating that the expression levels of the RBPs were positively associated with differential AS events. Conversely, 672 485 pairs showed negative correlations, suggesting an inverse relationship between RBP expression and AS events. It is worth noting that nearly half of the RBP-AS pairs (48.21%) displayed low correlation, indicating weak associations. 50.57% of RBP-AS pairs represent moderate correlation, while only a small fraction (1.19%) demonstrated high or very high correlation (Fig. 2E). These RBPs, whose expression levels correlated with the differential AS events, are likely to play a regulatory role in splicing changes associated with cancer.

These findings highlight the importance of understanding the correlation between RBPs and AS events to identify potential splicing regulators involved in cancer-related processes. To identify high-impact splicing factors, we employed random forests to investigate the splicing effects of splicing changes based on the correlation score between RBPs and AS events. Initially, we trained the model using different levels of RBP correlation regulation to predict splicing changes. The results indicated that incorporating more RBP effects improved the model’s performance. Figure 3A illustrates that solely considering highly correlated RBP effects is insufficient for accurately predicting splicing changes. However, when all RBP effect scores of AS events were utilized to predict splicing changes, the model exhibited optimal performance, with accuracy ranging from 0.78 ± 0.11 to 0.99 ± 0.01, precision from 0.75 ± 0.06 to 1.00 ± 0.00, recall from 0.55 ± 0.19 to 1.00 ± 0.00, F1-score from 0.68 ± 0.19 to 0.99 ± 0.01, and AUC from 0.92 ± 0.07 to 0.99 ± 0.00. Subsequently, we utilized the Gini importance to identify potential splicing factors.

Performance of Random Forest model. (A) The accuracy, AUC, recall, precision, and F1 score for the random forest model with different inputs. The x-axis represents the threshold of the correlation coefficient. (B) The accuracy, AUC, recall, precision, and F1 score for the random forest model with different RBPs were selected as features. The x-axis represents the threshold of the Gini importance. The dotted vertical line was in the Gini importance = 0.01.
Figure 3

Illustrates the performance of the random forest model to predict the splicing change of AS events. The figure showed when all RBP effect scores were utilized to predict splicing changes, the model exhibited optimal performance. RBPs with a Gini importance greater than or equal to 0.01 were used to run the model the predictions were able to keep a high level of accuracy.

To select RBPs as potential splicing factors, it is crucial to apply a threshold to the Gini variable importance. To identify key regulators of alternative splicing, we selected the RBP features with higher Gini importance and ran the model. As depicted in Fig. 3B, only when RBPs with a Gini importance greater than or equal to 0.01 were used to run the model, were the predictions able to keep a high level of accuracy. Based on this threshold, potential splicing factors were selected for each cancer type. Finally, 210 RBPs were identified as potential splicing factors. Among them, 134 SFs were specific to individual cancer types, while 76 SFs were found to be associated with more than two cancer types, suggesting their potential as common splicing regulators across multiple cancer types (Supplementary Table 3).

The regulation of therapeutic splicing factors in cancer exhibits complexity and context-dependent characteristics

In this study, we employ a two-component Gaussian mixture model to analyze the expression patterns of individual RBPs across cancer patients. By clustering patients into high-expressed and low-expressed groups for each RBP, we were able to identify distinct expression profiles. Furthermore, we conducted survival analyses to determine the prognostic relevance of these SFs. Our results revealed 56 potential SFs that significantly influenced patient prognosis across 13 different types of cancer (Table 1, Supplementary Table 4). These findings highlight the importance of alternative splicing regulated by specific SFs in influencing patient outcomes in these cancer types. Moreover, five splicing factors were found to affect the patient prognosis in multiple cancer types (PTGES3, PABPC1L, SPATS2, GAPDH, DHX34, Fig. 4A–C). Indeed, among the identified splicing factors, DHX34 exhibited distinct prognostic patterns in colorectal cancer and liver cancer. Additionally, high expression levels of three SFs, namely PTGES3, SPATS2, and GAPDH, consistently correlated with poor patient prognosis. Interestingly, the high expression of PABPC1L was found to suppress cancer progression and prolong survival in patients with LUAD and READ (Supplementary Fig. 1A–C). These observations highlight the complex and context-dependent roles of specific SFs in different cancer types and emphasize their significance as potential prognostic markers or therapeutic targets.

Table 1

Potential prognosis-related splicing factor in cancer.

CancerPotential splicing factors  
(High expression with good prognosis)
Potential splicing factors  
(High expression with poor prognosis)
BLCARBPMSRUVBL1
CHOLSCAF1RCL1, ANG
COADPUS1
ESCARAN, PTGES3, SNRPG, TRMT112, DCAF13, NOP56
KICHSPATS2L, EEF1E1
KIRCLARS2
KIRPSNRPN, MRPS6, GAPDH, RPS2
LIHCTCOF1, CHTOP, UBAP2L, ILF3, ILF2, SNRPB, LARP1, SPATS2, AKAP8L, U2AF2, SNRNP70, DHX34, RBMX, TRMT1, SNRPE, PTGES3, DYNLL1, DAP3, ATXN2L, NOL7
LUADPABPC1LGAPDH, SPATS2, RBMS2
LUSCEXO1LRRFIP1, SECISBP2L
READDHX34, EIF2S2, PABPC1L,PDCD4, RIOK3, EIF4E3, FAM46A
STADNOP58
UCECCSDC2, MARS
CancerPotential splicing factors  
(High expression with good prognosis)
Potential splicing factors  
(High expression with poor prognosis)
BLCARBPMSRUVBL1
CHOLSCAF1RCL1, ANG
COADPUS1
ESCARAN, PTGES3, SNRPG, TRMT112, DCAF13, NOP56
KICHSPATS2L, EEF1E1
KIRCLARS2
KIRPSNRPN, MRPS6, GAPDH, RPS2
LIHCTCOF1, CHTOP, UBAP2L, ILF3, ILF2, SNRPB, LARP1, SPATS2, AKAP8L, U2AF2, SNRNP70, DHX34, RBMX, TRMT1, SNRPE, PTGES3, DYNLL1, DAP3, ATXN2L, NOL7
LUADPABPC1LGAPDH, SPATS2, RBMS2
LUSCEXO1LRRFIP1, SECISBP2L
READDHX34, EIF2S2, PABPC1L,PDCD4, RIOK3, EIF4E3, FAM46A
STADNOP58
UCECCSDC2, MARS
Table 1

Potential prognosis-related splicing factor in cancer.

CancerPotential splicing factors  
(High expression with good prognosis)
Potential splicing factors  
(High expression with poor prognosis)
BLCARBPMSRUVBL1
CHOLSCAF1RCL1, ANG
COADPUS1
ESCARAN, PTGES3, SNRPG, TRMT112, DCAF13, NOP56
KICHSPATS2L, EEF1E1
KIRCLARS2
KIRPSNRPN, MRPS6, GAPDH, RPS2
LIHCTCOF1, CHTOP, UBAP2L, ILF3, ILF2, SNRPB, LARP1, SPATS2, AKAP8L, U2AF2, SNRNP70, DHX34, RBMX, TRMT1, SNRPE, PTGES3, DYNLL1, DAP3, ATXN2L, NOL7
LUADPABPC1LGAPDH, SPATS2, RBMS2
LUSCEXO1LRRFIP1, SECISBP2L
READDHX34, EIF2S2, PABPC1L,PDCD4, RIOK3, EIF4E3, FAM46A
STADNOP58
UCECCSDC2, MARS
CancerPotential splicing factors  
(High expression with good prognosis)
Potential splicing factors  
(High expression with poor prognosis)
BLCARBPMSRUVBL1
CHOLSCAF1RCL1, ANG
COADPUS1
ESCARAN, PTGES3, SNRPG, TRMT112, DCAF13, NOP56
KICHSPATS2L, EEF1E1
KIRCLARS2
KIRPSNRPN, MRPS6, GAPDH, RPS2
LIHCTCOF1, CHTOP, UBAP2L, ILF3, ILF2, SNRPB, LARP1, SPATS2, AKAP8L, U2AF2, SNRNP70, DHX34, RBMX, TRMT1, SNRPE, PTGES3, DYNLL1, DAP3, ATXN2L, NOL7
LUADPABPC1LGAPDH, SPATS2, RBMS2
LUSCEXO1LRRFIP1, SECISBP2L
READDHX34, EIF2S2, PABPC1L,PDCD4, RIOK3, EIF4E3, FAM46A
STADNOP58
UCECCSDC2, MARS
Prognostic splicing factors and splicing factor complexes in cancer. (A) Gini importance of 30 prognostic splicing factors in 11 cancers. (B) Gini importance of 20 prognostic splicing factors in LIHC. (C) Gini importance of 6 prognostic splicing factors in ESCA. (D) The experiment supported physical interaction in LIHC. The scores in line represent the active interaction sources. (E) The experiment supported physical interaction in ESCA. The scores represent the active interaction sources.
Figure 4

Illustrates a total of 56 cancer prognostic splicing factors and 3 splicing factor complexes were found to regulate the abnormal alternative splicing in 17 cancers.

Notably, we observed that only three cancer types (COAD, KIRC, and STAD) were affected by a single potential splicing factor that significantly influenced patient prognosis. This suggests that the impact of splicing factors on prognosis may vary across different cancer types. Understanding these specific associations between splicing factors and cancer prognosis in individual cancer types is crucial for developing targeted therapeutic strategies and predictive markers tailored to each specific context (Supplementary Fig. 1D). 10 cancer types exhibited multiple SFs that were involved in alternative splicing regulation and showed associations with cancer prognosis. Thus, we explored the potential splicing factor complexes in cancer, we identified SF complexes where the proteins not only interacted physically but also exhibited correlated expression patterns. After analyzing the available data, we successfully identified three SF complexes that met our criteria in two different types of cancer (Fig. 4D and E, Table 2, Supplementary Table 5). These complexes consisted of SFs that were known to physically interact with each other, and these interactions had been experimentally verified with high confidence. By examining these splicing factor regulation complexes, we gain insights into the potential cooperative mechanisms underlying alternative splicing regulation in cancer.

Table 2

Potential prognosis-related splicing factor complex in cancer.

CancerSplicing factor complexHigh expression prognosis status
ESCANOP56, DCAF13Poor
LIHCSNRNP70, SNRPE, UBAP2LGood
LIHCILF3, ILF2Good
CancerSplicing factor complexHigh expression prognosis status
ESCANOP56, DCAF13Poor
LIHCSNRNP70, SNRPE, UBAP2LGood
LIHCILF3, ILF2Good
Table 2

Potential prognosis-related splicing factor complex in cancer.

CancerSplicing factor complexHigh expression prognosis status
ESCANOP56, DCAF13Poor
LIHCSNRNP70, SNRPE, UBAP2LGood
LIHCILF3, ILF2Good
CancerSplicing factor complexHigh expression prognosis status
ESCANOP56, DCAF13Poor
LIHCSNRNP70, SNRPE, UBAP2LGood
LIHCILF3, ILF2Good

High ILF2-ILF3 expression correlates with poor prognosis in LIHC through alternative splicing

Our study identified a total of 20 potential splicing factors that were found to be related to the prognosis of patients with LIHC (liver hepatocellular carcinoma). Among them, Five SFs were involved in two SF complexes (ILF2-ILF3 and SNRNP70-SNRPE-UBAP2L). Previous studies have indicated that ILF2/3 can be recruited to precursor mRNA molecules near specific exons, leading to the exclusion of those exons from the final mature mRNA. This phenomenon has been observed in cell culture experiments as well as studies conducted using mice as model organisms [16]. Besides, ILF2 has been reported as a regulator of RNA splicing and DNA damage response in 1q21-amplified multiple myeloma [17]. Our associated study found ILF2 and ILF3 were highly correlated and the patients with high ILF2 and ILF3 expression showed poor prognosis in LIHC (Fig. 5A).

High ILF2-ILF3 expression correlates with poor prognosis in LIHC through alternative splicing. (A) ILF2 and ILF3 profile in LIHC in PCPG. From left to right, the figures are the ILF2 and ILF3 expression correlation; the survival analysis between high and low-expressed ILF2-ILF3 patients; the expression of ILF2 in high-expressed ILF2-ILF3 patients; low-expressed ILF2-ILF3 patients and normal tissues; the expression of ILF3 in high-expressed ILF2-ILF3 patients, low-expressed ILF2-ILF3 patients and normal tissues. (B) The analyses to identify ILF2-ILF3-related AS events. (C) DisGeNET enrichment analysis for 228 AS events in 135 genes. (D) ORF annotation of 228 AS events. (E) MTSS1 exon7 (chr8:125592791-125592803) skipping in LIHC. From left to right, the figures are the PSI value of MTSS1 exon7 skipping in high and low-expressed ILF2-ILF3 patients and normal tissues; the correlation between MTSS1 exon7 skipping and ILF2 expression, the x-axis represents the log2 (expression) of ILF2; the correlation between MTSS1 exon7 skipping and ILF3 expression, the x-axis represents the log2 (expression) of ILF3. (F) ACSM3 intron (chr16:20807811-20808207) retention in LIHC. From left to right, the figures are the PSI value of ACSM3 intron retention in high and low-expressed ILF2-ILF3 patients and normal tissues; the correlation between ACSM3 intron retention and ILF2 expression, the x-axis represents the log2 (expression) of ILF2; the correlation between ACSM3 intron retention and ILF3 expression, the x-axis represents the log2 (expression) of ILF3. (G) ACSM3 differential expression in LIHC. Lift is the expression of ACSM3 in high, low-expressed patients and normal tissue. (H) Intron retention in the ACSM3 3′UTR region increases the miRNA binding. The significant in figures based on * p.adjusted < 0.05, ** p.adjusted<0.01, *** p.adjusted<0.001, **** p.adjusted<0.0001.
Figure 5

Illustrates a case study of a splicing factor complex to regulate cancer-related alternative splicing in cancer. ILF2-ILF3 regulates 228 abnormal alternative splicing events and leads to poor prognosis in LIHC.

To gain insights into the underlying mechanisms of cancer progression and understand the functional roles of the ILF2-ILF3 complex, we conducted a differential analysis. Specifically, we compared patients with high expression of ILF2-ILF3 to those with low expression and identified 676 AS events that exhibited differential values. Among these, 288 AS events were not only differentially expressed between LIHC tumor tissues and normal tissues but also showed an association with the expression of ILF2 and ILF3 (Fig. 5B, Supplementary Table 6). Moreover, conducting a disease enrichment analysis, we discovered that these 288 AS events, located in 135 genes, were significantly enriched in liver diseases and secondary malignant neoplasms of the liver (Fig. 5C). These findings suggest that the ILF2-ILF3 complex and its associated AS events may play essential roles in liver disease progression. Further investigations into the functional implications of these AS events could deepen our understanding of the underlying mechanisms involved and potentially lead to the identification of novel approaches for therapeutic interventions.

Based on our analysis, we found that 206 occurred in the coding region of genes, indicating potential changes in the protein-coding sequence. Among these AS events, 146 were predicted to result in in-frame alterations, which may lead to changes in the protein’s functional domains or interactions. On the other hand, 60 AS events were predicted to cause frame-shift alterations, potentially resulting in a disruption of the open reading frame and significant changes in the resulting protein. Additionally, 13 AS events were observed in the 5′UTR, while 24 AS events occurred in the 3′ UTR region, suggesting potential regulatory effects on gene expression (Fig. 5D, Supplementary Table 7).

Noting the tumor-suppressing function of MTSS1 in hematopoietic malignancies, it is worth mentioning that the downregulation of MTSS1, potentially caused by DNA methylation, has also been observed in various other cancer types [18–20]. Our findings reveal that exon 7 (chr8:125592791-125592803) skipping in MTSS1 results in a shift in the open reading frame, leading to the exclusion of residues 154 to 157 in O43312. This alteration causes the loss of crucial functional elements such as “chain,” “Compositional bias,” “InterPro Representative Domain,” “Region,” and “Coiled-coil” in Q06481 (Fig. 5E and Supplementary Fig. 2), potentially disrupting the protein’s normal biological functions and structural integrity. Additionally, previous studies have reported the overexpression of ACSM3 suppressing ovarian cancer progression and inhibiting the pathogenesis of high-grade serous ovarian carcinoma by promoting AMPK activity [21]. In the context of this study, the retained intron (chr16:20807811-20808207) in the 3′UTR region of ACSM3 increases the binding of miRNAs such as hsa-mir-3920, hsa-mir-5696, and has-mir-4452 (Supplementary Table 8). The intron retention in ACSM3 leads to mRNA degradation and contributes to a poor prognostic outcome for LIHC (Fig. 5F–H). The ILF2-ILF3 complex is involved in regulating both the exon skipping of MTSS1 and the intron retention in ACSM3, which can lead to the loss of protein function or mRNA degradation (Fig. 6). These mechanisms are significant contributors to the poor prognostic outcomes observed in LIHC. Understanding these regulatory processes is essential for identifying potential therapeutic targets and developing effective strategies to improve patient outcomes. Further research in this area will advance our understanding of LIHC pathogenesis and aid in the development of more targeted and efficient treatment approaches.

Potential splicing regulatory mechanisms of ILF2-ILF3 complex. The ILF2-ILF3 complex is involved in regulating both the exon skipping of MTSS1 and the intron retention in ACSM3, which can lead to the loss of protein function or mRNA degradation. These mechanisms are significant contributors to the poor prognostic outcomes observed in LIHC.
Figure 6

Illustrates ILF2-ILF3 regulating the exon skipping of MTSS1 and the intron retention in ACSM3 to affect LIHC progression.

Discussion

Accumulated evidence showed that studying the role of splicing factors in alternative splicing is crucial to understanding cancer progression. The aberrant expression of splicing factors in tumor cells distributes the splicing profiles [22, 23]. Thus, splicing factors can potentially be used as the cancer biomarker that affects splicing changes with tumorigenic directions. A better understanding of the functions of prognostic splicing factors will be crucial for developing new therapeutic strategies and representing potential targets for personalized cancer therapies. Classical studies usually integrate CLIP experiments to explore the splicing regulation by RBPs [24–26]. However, due to the nature of the lack of publicly available CLIP experiments, we tried to identify the potential prognostic splicing factors from the transcriptomic data of RNA-seq data. In this work, we have systematically studied the alternative splicing events in TCGA by developing a pipeline for identifying the potential splicing factors, which regulate abnormal splicing in cancer. Finally, we identified 56 prognostic splicing factors across 13 different cancer types.

Our study involved a comprehensive and systematic analysis of 56 cancer prognostic splicing factors to investigate the regulation of SFs, which included the identification of splicing factor complexes, as well as the examination of SF-related exon skipping that could lead to protein loss of function and SF-related 3′UTR regions that could affect miRNA binding and cause mRNA degradation. Our findings suggested that each cancer type has its unique regulatory patterns, which highlights the importance of identifying reliable predictive markers for individual cancer types.

The identified splicing factors and biomarkers hold significant potential for various clinical applications. For example, one possible application is using small molecule inhibitors to disrupt the interaction between ILF2-ILF3 complex and their related AS events, leading to altered splicing patterns and potentially inhibiting cancer cell growth. Another potential application is to design antisense oligonucleotides to target the MTSS1 exon-skipping splicing site or ACSM3 retained intron splicing sites to avoid splicing abnormalities in cancer. Additionally, RNA interference techniques can selectively silence or knock down the high expression of the ILF2-ILF3 complex. By inhibiting the expression of these aberrantly expressed splicing factors, RNAi can restore proper splicing and potentially impede tumor growth.

However, it is important to note that while targeting splicing abnormalities holds great promise, the development of splicing-based therapies is still in its early stages. A comprehensive understanding of the specific splicing alterations in different cancers, identification of druggable targets, and careful evaluation of therapeutic efficacy and safety are ongoing research areas. Moreover, personalized approaches considering individual tumor splicing profiles and molecular characteristics may be necessary to maximize the effectiveness of splicing-based treatments. As research progresses, the potential for leveraging splicing modulation in cancer therapy continues to be an exciting area of investigation.

Despite our efforts, there are several limitations to our work. First, our work is based on the existing RBP genes, meaning that it is difficult to identify the novel splicing factors. Secondly, it is hard to achieve the physical RBP-AS interaction due to a lack of CLIP experiments and tumor heterogeneity which will reduce the reliability of the biomarkers. However, our work is still valuable since we performed comprehensive systematic analyses for the potential splicing factor. In summary, our pipeline and systematic analyses for RBPs can be used to predict the prognostic splicing factors in either cancer or any other disease type with available transcriptome data.

Materials and methods

Dataset

To identify the splicing factor in cancer development, we collected the AS events of 32 cancer types in the Cancer Genome Atlas (TCGA) (https://gdc.cancer.gov/about-data/publications/PanCanAtlas-Splicing-2018) [27]. This data set includes 336 961 exon skipping (ES) events, 135 988 alternative 5-splice sites (A5SS) events, 180 487 alternative 3′-splice sites (A3SS) events, 939 606 mutually exclusive exons (MXE) events, and 122 246 intron retention (IR) events. The gene expression data and clinical information were downloaded from Broad Institute GDAC Firehose (http://gdac.broadinstitute.org/) [28].

Identifying the differential AS events and differential RBPs between cancer and normal tissues

For each cancer type, we first selected the AS events that were detected in 80% of patients and required a minimum of five normal tissues for inclusion. This stringent criterion allowed us to focus on robust AS events that were consistently observed across a significant portion of the patient population. A total of 17 different cancers met these criteria and were chosen for further investigation. Then, to study the differential alternative splicing events, we performed the Wilcoxon test and used the Benjamini-Hochberg false discovery rate (FDR) for multiple testing corrections based on percent spliced-in (PSIs) values between tumor and matched normal samples in each cancer type(|ΔPSI| > 0.1 and p.adjusted < 0.05). The previous study on the census of human RNA-binding proteins defined about 1500 genes that encode RBPs [29]. To identify differentially expressed RBPs, the Wilcoxon test and multiple testing corrections were also performed based on the expression of RBPs between tumor and matched normal samples in each cancer type (|log2foldchange| > 1 and p.adjusted < 0.05).

Association study between differential AS events and differential RBPs

To identify RBP-associated AS events, we performed the association studies between the expressions of these genes encoding RBPs and PSI values of individual differential AS events using Pearson’s correlation and the Benjamini-Hochberg false discovery rate for multiple testing correction. A Previous study with high citations reports a rule of thumb for interpreting the size of a correlation coefficient and clustering the correlation coefficient into very high (0.9 to 1.0), high (0.7 to 0.9), Moderate (0.5 to 0.7), low (0.3 to 0.5) and negligible (0.0 to 0.3) correlation, respectively [30]. In this paper, we identify the RBP-associated AS events based on |ρ| > 0.3 and p.adjusted < 0.05.

Training of random Forest model

We used a random forest as the classifier to predict the AS events change between cancer and normal and evaluate the important RBP features based on the Gini index [31]. First, for each cancer type, differential AS events were clustered into two groups based on the splicing change between cancer and normal tissues (ΔPSI > 0.1 as “IN”, ΔPSI < −0.1 as “OUT”, the number of all sample sizes shown in the Supplementary table 2. Then, we used the Pearson correlation coefficient as the RBP effect score to predict the splicing change(“IN”/“OUT”) of each AS event. To identify the best correlation coefficient threshold used to predict the splicing change, we set a series threshold of correlation coefficient (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9) to train the model. If the correlation coefficient is bigger than the threshold, keep the score to predict the splicing change, otherwise, the score will be standardized as 0. The input data of the Random Forest Classifier includes an m*n feature matrix and a m*1 label. m was the number of differential AS events, and n was the number of differential RBPs that might cause the splicing change.

For each cancer type, we selected 20% of AS events from the dataset as a testing set and performed five-fold cross-validation on the training data, then applied a grid search strategy to select hyper-parameters that maximize the classification score using the RandomForestClassifier() and GridSearchCV() of the sklearn package in python [20]. Then, we evaluate the model and define the correlation coefficient threshold based on the accuracy, precision, recall, F1 score, and area under curve (AUC).

Selection of Gini importance threshold and identification of the potential splicing factors

The Gini importance of the random forest provided a superior means for measuring RBP feature relevance on splicing changes. To select the potential splicing factor, we select the top RBP features based on the Gini importance and rerun the model for individual cancer. Finally, the RBP with a Gini important score > 0.01 was selected as a potential splicing factor for further analysis.

Identifying cancer prognostic splicing factors

For each RBP candidate selected from the random forest classifications, we classified the cancer patients into two clusters (high-expressed RBP cluster and low-expressed RBP cluster) based on the gene expression of the RBP genes using the two-component Gaussian mixture model (GMM) from the python scikit-learn package [32]. Then, Kaplan–Meier analysis was performed using the Python lifelines package [33] to identify cancer prognostic splicing factors (P-value < 0.05).

Identify the potential splicing factor complex

We downloaded the protein physical interactions from the STRING database and identified the potential splicing factor regulation complexes [34]. The splicing factor complex requires proteins to physically interact with others which has been verified by experiments with high confidence (active interaction sources > 0.7) and expression associated with each other (|ρ| > 0.3 and p.adjusted < 0.05).

Identifying the cancer prognostic SF-related AS events

To identify the cancer prognostic SF-related AS events, we first performed differential PSI values analysis between the high-expressed SF group and low-expressed SF group (ΔPSI| > 10% and p.adjusted < 0.05). Then associated studies were performed between SF’s expression and AS events. The cancer prognostic SF-related AS events were identified if AS events were differential expressed between the high-expressed SF group and low-expressed SF group and associated with SF(|ρ| > 0.3 and p.adjust < 0.05). To gain further insights, function enrichment analyses for the SF-related AS events were conducted using Enrichr [35].

Open reading frame (ORF) and protein function annotation of AS events

To analyze AS events, we assigned each event to specific regions of the transcript, namely the coding sequence (CDS), 5′ untranslated region (5′UTR), and 3′ untranslated region (3′UTR). Then, for AS events in CDS, we examined isoform sequences’ open reading frames. We reported transcripts as “in-frame” if the number of sequences was a multiple of three after removing the skipped exon sequence. One or two nucleotide insertions were classified as “frame-shift”.

Analysis of identifying miRNA binding sites in 3′-UTR-located AS events

For the AS events’ splicing regions in the 3′-UTR regions. we extracted the splicing region sequences of these AS events based on the GENCODE v19 reference genome sequence. The miRNA binding sites in alternative splicing of the region are based on miRDB [36] and TargetScarn [37]. To reduce the false-positive predictions, we adopted the miRNAs whose target scores are greater than 100 from miRnada and predicted binding to an alternative splicing region by TargetScarn.

Acknowledgements

We thank the members of the Center for Computational Systems Medicine (CCSM) for the valuable discussion.

Author contributions

M.Y. and X.Z. contributed to the conception and study design. M.Y. and J.L. contributed to the analysis and interpretation of data. M.Y. contributed to the drafting of the manuscript. M.Y. and P.K. contributed to the critical revision of the manuscript. X.Z. contributed to the leadership and supervision of various aspects of this work. All the authors contributed to the article and approved the submitted version.

Conflict of interest statement. The authors declare no potential conflicts of interest.

Funding

M.Y. was supported by the China Postdoctoral Science Foundation [2022M712900, 2023T160590]; J.L. and X.Z. were supported by the National Institutes of Health [R01GM123037, U01AR069395-01A1, R01CA241930]; National Science Foundation [2 217 515, 2 326 879]; P.K., was supported by National Institutes of Health [R35GM138184].

References

1.

Kang
 
D
,
Lee
 
Y
,
Lee
 
J-S
.
RNA-binding proteins in cancer: functional and therapeutic perspectives
.
Cancer
 
2020
;
12
:
2699
.

2.

Fu
 
X-D
,
Ares
 
M
.
Context-dependent control of alternative splicing by RNA-binding proteins
.
Nat Rev Genet
 
2014
;
15
:
689
701
.

3.

Howard
 
JM
,
Sanford
 
JR
.
The RNAissance family: SR proteins as multifaceted regulators of gene expression
.
Wiley Interdiscip Rev RNA
 
2015
;
6
:
93
110
.

4.

Morton
 
M
,
AlTamimi
 
N
,
Butt
 
H
. et al.  
Serine/arginine-rich protein family of splicing regulators: new approaches to study splice isoform functions
.
Plant Sci
 
2019
;
283
:
127
34
.

5.

Zhou
 
C
,
Gao
 
X
,
Hu
 
S
. et al.  
RBM-5 modulates U2AF large subunit-dependent alternative splicing in C. elegans
.
RNA Biol
 
2018
;
15
:
1295
308
.

6.

Bechara
 
EG
,
Sebestyén
 
E
,
Bernardis
 
I
. et al.  
RBM5, 6, and 10 differentially regulate NUMB alternative splicing to control cancer cell proliferation
.
Mol Cell
 
2013
;
52
:
720
33
.

7.

Loiselle
 
JJ
,
Roy
 
JG
,
Sutherland
 
LC
.
RBM10 promotes transformation-associated processes in small cell lung cancer and is directly regulated by RBM5
.
PLoS One
 
2017
;
12
:
e0180258
.

8.

Wang
 
Y
,
Yang
 
F
,
Shang
 
J
. et al.  
Integrative analysis reveals the prognostic value and functions of splicing factors implicated in hepatocellular carcinoma
.
Sci Rep
 
2021
;
11
:
1
14
.

9.

Sahin
 
I
,
George
 
A
,
Seyhan
 
AA
.
Therapeutic targeting of alternative RNA splicing in gastrointestinal malignancies and other cancers
.
Int J Mol Sci
 
2021
;
22
:
11790
.

10.

Gleave
 
ME
,
Monia
 
BP
.
Antisense therapy for cancer
.
Nat Rev Cancer
 
2005
;
5
:
468
79
.

11.

Lee
 
FC
,
Ule
 
J
.
Advances in CLIP technologies for studies of protein-RNA interactions
.
Mol Cell
 
2018
;
69
:
354
69
.

12.

Bonnal
 
SC
,
López-Oreja
 
I
,
Valcárcel
 
J
.
Roles and mechanisms of alternative splicing in cancer—implications for care
.
Nat Rev Clin Oncol
 
2020
;
17
:
457
74
.

13.

Zhan
 
Y-T
,
Li
 
L
,
Zeng
 
T-T
. et al.  
SNRPB-mediated RNA splicing drives tumor cell proliferation and stemness in hepatocellular carcinoma
.
Aging (Albany NY)
 
2021
;
13
:
537
54
.

14.

Du
 
J-X
,
Zhu
 
G-Q
,
Cai
 
J-L
. et al.  
Splicing factors: insights into their regulatory network in alternative splicing in cancer
.
Cancer Lett
 
2021
;
501
:
83
104
.

15.

Dvinge
 
H
,
Kim
 
E
,
Abdel-Wahab
 
O
. et al.  
RNA splicing factors as oncoproteins and tumour suppressors
.
Nat Rev Cancer
 
2016
;
16
:
413
30
.

16.

Rigo
 
F
,
Hua
 
Y
,
Chun
 
SJ
. et al.  
Synthetic oligonucleotides recruit ILF2/3 to RNA transcripts to modulate splicing
.
Nat Chem Biol
 
2012
;
8
:
555
61
.

17.

Marchesini
 
M
,
Ogoti
 
Y
,
Fiorini
 
E
. et al.  
ILF2 is a regulator of RNA splicing and DNA damage response in 1q21-amplified multiple myeloma
.
Cancer Cell
 
2017
;
32
:
88
100.e6
.

18.

Xie
 
F
,
Ye
 
L
,
Ta
 
M
. et al.  
MTSS1: a multifunctional protein and its role in cancer invasion and metastasis
.
Front Biosci (Schol Ed)
 
2011
;
3
:
621
31
.

19.

Parr
 
C
,
Jiang
 
WG
.
Metastasis suppressor 1 (MTSS1) demonstrates prognostic value and anti-metastatic properties in breast cancer
.
Eur J Cancer
 
2009
;
45
:
1673
83
.

20.

Cong
 
M
,
Wang
 
Y
,
Yang
 
Y
. et al.  
MTSS1 suppresses mammary tumor-initiating cells by enhancing RBCK1-mediated p65 ubiquitination
.
Nat Cancer
 
2020
;
1
:
222
34
.

21.

Yan
 
L
,
He
 
Z
,
Li
 
W
. et al.  
The overexpression of acyl-CoA medium-chain Synthetase-3 (ACSM3) suppresses the ovarian cancer progression via the inhibition of integrin β1/AKT signaling pathway
.
Front Oncol
 
2021
;
11
:
644840
.

22.

Chang
 
C
,
Rajasekaran
 
M
,
Qiao
 
Y
. et al.  
The aberrant upregulation of exon 10-inclusive SREK1 through SRSF10 acts as an oncogenic driver in human hepatocellular carcinoma
.
Nat Commun
 
2022
;
13
:
1
17
.

23.

Wan
 
L
,
Deng
 
M
,
Zhang
 
H
.
SR splicing factors promote cancer via multiple regulatory mechanisms
.
Genes
 
2022
;
13
:
1659
.

24.

Witten
 
JT
,
Ule
 
J
.
Understanding splicing regulation through RNA splicing maps
.
Trends Genet
 
2011
;
27
:
89
97
.

25.

Carazo
 
F
,
Gimeno
 
M
,
Ferrer-Bonsoms
 
JA
. et al.  
Integration of CLIP experiments of RNA-binding proteins: a novel approach to predict context-dependent splicing factors from transcriptomic data
.
BMC Genomics
 
2019
;
20
:
1
11
.

26.

Wang
 
Y
,
Chen
 
SX
,
Rao
 
X
. et al.  
Modulator-dependent RBPs changes alternative splicing outcomes in kidney cancer
.
Front Genet
 
2020
;
11
:
265
.

27.

Kahles
 
A
,
Lehmann
 
K-V
,
Toussaint
 
NC
. et al.  
Comprehensive analysis of alternative splicing across tumors from 8,705 patients
.
Cancer Cell
 
2018
;
34
:
211
224.e6
.

28.

Samur
 
MK
.
RTCGAToolbox: a new tool for exporting TCGA firehose data
.
PLoS One
 
2014
;
9
:
e106397
.

29.

Gerstberger
 
S
,
Hafner
 
M
,
Tuschl
 
T
.
A census of human RNA-binding proteins
.
Nat Rev Genet
 
2014
;
15
:
829
45
.

30.

Mukaka
 
MM
.
A guide to appropriate use of correlation coefficient in medical research
.
Malawi Med J
 
2012
;
24
:
69
71
.

31.

Breiman
 
L
.
Random forests
.
Mach Learn
 
2001
;
45
:
5
32
.

32.

Pedregosa
 
F
,
Varoquaux
 
G
,
Gramfort
 
A
. et al.  
Scikit-learn: machine learning in python
.
J Mach Learn Res
 
2011
;
12
:
2825
30
.

33.

Davidson-Pilon
 
C
.
Lifelines: survival analysis in python
.
J Open Source Software
 
2019
;
4
:
1317
.

34.

Szklarczyk
 
D
,
Gable
 
AL
,
Nastou
 
KC
. et al.  
The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets
.
Nucleic Acids Res
 
2021
;
49
:
D605
12
.

35.

Kuleshov
 
MV
,
Jones
 
MR
,
Rouillard
 
AD
. et al.  
Enrichr: a comprehensive gene set enrichment analysis web server 2016 update
.
Nucleic Acids Res
 
2016
;
44
:
W90
7
.

36.

Wong
 
N
,
Wang
 
X
.
miRDB: an online resource for microRNA target prediction and functional annotations
.
Nucleic Acids Res
 
2015
;
43
:
D146
52
.

37.

Agarwal
 
V
,
Bell
 
GW
,
Nam
 
J-W
. et al.  
Predicting effective microRNA target sites in mammalian mRNAs
.
elife
 
2015
;
4
:
e05005
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)