Decoding the genetic symphony: Profiling protein-coding and long noncoding RNA expression in T-acute lymphoblastic leukemia for clinical insights

Abstract
T-acute lymphoblastic leukemia (T-ALL) is a heterogeneous malignancy characterized by the abnormal proliferation of immature T-cell precursors. Despite advances in immunophenotypic classification, understanding the molecular landscape and its impact on patient prognosis remains challenging. In this study, we conducted comprehensive RNA sequencing in a cohort of 35 patients with T-ALL to unravel the intricate transcriptomic profile. Subsequently, we validated the prognostic relevance of 23 targets in an independent cohort of 99 patients with T-ALL. The targets encompassed (i) protein-coding genes: BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, and TAL1; (ii) epigenetic modifiers: DOT1L, EP300, EML4, RAG1, EZH2, and KDM6A; and (iii) long noncoding RNAs (lncRNAs): XIST, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, ST20, MEF2C-AS1, and MALAT1. Principal component analysis revealed distinct clusters aligning with immunophenotypic subtypes, providing insights into the molecular heterogeneity of T-ALL. The identified signature genes exhibited associations with clinicopathologic features. Survival analysis uncovered several independent predictors of patient outcomes. Higher expression of MEF2C, BAALC, HHEX, and LYL1 genes emerged as robust indicators of poor overall survival (OS), event-free survival (EFS), and relapse-free survival (RFS). Higher LMO2 expression was correlated with adverse EFS and RFS outcomes. Intriguingly, increased expression of lncRNA ST20 coupled with RAG1 demonstrated a favorable prognostic impact on OS, EFS, and RFS. In conclusion, several hitherto unreported associations of gene expression patterns with clinicopathologic features and prognosis were identified, which may help understand T-ALL's molecular pathogenesis and provide prognostic markers.


Introduction
T-lineage acute lymphoblastic leukemia (T-ALL) represents a formidable challenge in the field of oncology, with its unique genetic and clinical features. It is characterized by the malignant transformation of immature T-cell precursors. It accounts for 20-25% of adult and 10-15% of pediatric ALL cases in Europe, the United States, and Japan (1,2). It has been reported to be more prevalent in developing countries than in developed countries, possibly due to genetic and environmental factors that are yet unidentified (3). It is more prevalent in males than in females (1). Although the prognosis of T-ALL has improved considerably over the years, the outcomes remain inferior to those of B-lineage ALL, particularly in relapsed and refractory settings (4).
Understanding the molecular intricacies that drive T-ALL is paramount for advancing our knowledge and improving patient outcomes. The emergence of high-throughput techniques has provided immense insights into the genomic organization of functional elements in the human genome. This, combined with cell biology techniques applied to T-ALL, has led to significant advances in our understanding of the disease and has allowed the development of novel therapeutic approaches (5). The malignant transformation that culminates in T-ALL is a multistep process in which genetic alterations occurring in crucial cellular pathways work together to produce the T-ALL phenotype. Activating mutations in the NOTCH1 gene, mutations in the FBXW7 tumor suppressor, and loss of the CDKN2A locus frequently occur in T-ALL (6-9) and also impact patient survival (9,10). Besides genetic mutations, gene expression profiling in patients with T-ALL has revealed aberrant expression of a diverse group of transcription factors such as LYL1, LMO1, LMO2, TAL1, TLX1, TLX3, HOXA, NKX2.1, NKX2.2, NKX2.5, MYC, MYB, and SPI1 in distinct T-ALL subtypes (7,11). Several common genetic defects have also been observed among distinct genetic subgroups, which commonly involve oncogenic signaling cascades, including IL7R/JAK/STAT, PI3K/AKT, and RAS/MEK/ERK signaling (1,12). Some less understood facets of T-ALL include epigenetic deregulation, ribosomal dysfunction, and altered expression of oncogenic miRNAs or long noncoding RNAs (lncRNAs).
Although a better understanding of molecular pathophysiology and immunophenotyping has refined the classification of T-ALL, this has not translated into application in patient management, as the clinical relevance of these subtypes remains either unclear or controversial (1,7,13,14). None of the currently known genetic markers is used for risk stratification of patients with T-ALL. The only subtype of T-ALL that has a place in the 2022 revision of the WHO classification is early thymic precursor ALL (ETP-ALL) (13,15,16). This study delves into the intricate landscape of protein-coding and lncRNA expression in T-ALL. By dissecting the molecular signatures of T-ALL at the RNA level, we aim to shed light on the underlying mechanisms that drive this disease. Furthermore, we seek to elucidate the clinical relevance of these targets in T-ALL diagnosis, prognosis, and treatment response, with the ultimate goal of paving the way for more precise and effective therapeutic interventions.

Results
This study included 134 subjects with de novo T-ALL. The patients were immunophenotypically classified into immature (pro-T-ALL and pre-T-ALL), cortical, and mature T-ALL based on the European Group for the Immunological Characterization of Leukemias criteria (17,18). ETP-ALL was recognized based on previously defined criteria (cCD3 positive, CD1a negative, CD5 dim/negative; lack of expression of both CD4 and CD8; and positivity of stem cell and/or myeloid markers [HLA-DR, CD13, CD33, CD34, or CD117]) (16,18,19). The patients were divided into two cohorts: discovery cohort (n = 35) and validation cohort (n = 99). Total RNA sequencing was performed in the discovery cohort. Further, the clinical and prognostic relevance of 23 selected targets identified in the discovery cohort was checked in the validation cohort (Fig. 1).
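The ETP-ALL criteria listed above amount to a conjunction of marker rules, which can be sketched as a small predicate. This is an illustrative sketch only: the `Immunophenotype` container, its field names, and the "negative"/"dim"/"bright" coding of CD5 are assumptions made for the example, not the laboratory's actual data model or gating strategy.

```python
from dataclasses import dataclass, field

# Stem-cell and myeloid markers named in the ETP-ALL criteria.
STEM_MYELOID_MARKERS = {"HLA-DR", "CD13", "CD33", "CD34", "CD117"}


@dataclass
class Immunophenotype:
    """Hypothetical container for flow-cytometry calls on one sample."""
    ccd3_positive: bool
    cd1a_positive: bool
    cd5_status: str  # assumed coding: "negative", "dim", or "bright"
    cd4_positive: bool
    cd8_positive: bool
    positive_markers: set = field(default_factory=set)


def is_etp_all(p: Immunophenotype) -> bool:
    """Apply the criteria: cCD3+, CD1a-, CD5 dim/negative, CD4-/CD8-,
    and at least one stem-cell or myeloid marker positive."""
    return (
        p.ccd3_positive
        and not p.cd1a_positive
        and p.cd5_status in {"negative", "dim"}
        and not p.cd4_positive
        and not p.cd8_positive
        and bool(p.positive_markers & STEM_MYELOID_MARKERS)
    )
```

In practice each marker call depends on positivity thresholds set by the flow-cytometry laboratory, which this sketch takes as given.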

Analysis of the RNA expression profile in the discovery cohort
We identified the gene expression profile of our patients with T-ALL. We did a supervised analysis based on the immunophenotype of the T-ALL cases to identify differentially expressed RNAs (protein-coding genes, epigenetic modifiers, and lncRNAs) for the three T-ALL subtypes, viz. immature, cortical, and mature T-ALL. In addition, we used an unsupervised approach to classify our cases using principal component analysis (PCA) on BioVinci version 1.1.5, r20181005 (BioTuring, USA; https://vinci.bioturing.com/feature).

Determination of expression profile of epigenetic modifiers
We also investigated the transcriptomic profile of various epigenetic-modifying genes. Our analysis revealed overexpression of histone methyltransferases like SETD2, ASH1L, and SUZ12, along with overexpression of the histone demethylase KDM6A and transcription regulators like ATM and PHF6 in all subtypes of T-ALL. Histone deacetylase HDAC4 was also found to be overexpressed in all the subtypes, pointing toward increased methylation and repression of the target genes in T-ALL. On comparative analysis among the three subtypes, HDAC9 and SMYD3 were found to be up-regulated, while EZH2 was down-regulated in the immature T-ALL subtype. HDAC10 was underexpressed in cortical T-ALL, and EP300, PKN1, EML4, and DOT1L were overexpressed in the mature subtype of T-ALL. HDAC7 was underexpressed in immature and cortical T-ALL (Fig. S1B and Table S4).

Determination of expression profile of lncRNAs
Two thousand two hundred and forty-three lncRNAs were found to be differentially expressed, out of which 223 lncRNAs were retained after filtering for an FPKM (Fragments Per Kilobase of transcript per Million mapped reads) score >2. Unique lncRNA signatures were found to be associated with each subtype of T-ALL. Previously reported lncRNAs in T-ALL, like XIST, were expressed in immature T-ALL (26), while LUNAR1, known as a specific NOTCH1-regulated lncRNA, was expressed in cortical and mature T-ALL (27). In addition, we observed an overexpression of HOTTIP and MEF2C-AS1 in immature T-ALL; LINC01221, LINC00202, LINC00461, and LINC00648 in cortical T-ALL; and MALAT1, ST20, and PCAT14 in mature T-ALL. PCAT18 was overexpressed in all subtypes of T-ALL (Fig. S1C and Table S5).
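For readers unfamiliar with the FPKM filter mentioned above, the following minimal sketch shows the standard FPKM formula and a threshold filter. The function names and the example numbers are hypothetical, and the actual pipeline (described in the Supplementary Methods) may compute and filter FPKM differently.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments / (length in kb) / (total mapped fragments in millions)
         = fragments * 1e9 / (length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def filter_by_fpkm(fpkm_by_transcript: dict, threshold: float = 2.0) -> list:
    """Return the names of transcripts whose FPKM exceeds the threshold."""
    return sorted(name for name, value in fpkm_by_transcript.items() if value > threshold)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments gives an FPKM of 10.
example_value = fpkm(500, 2000, 25_000_000)
```

The threshold of 2 follows the text; whether it was applied per sample or to a subtype average is not stated here, so the sketch leaves that choice to the caller.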

Principal component analysis
In PCA, the input was normalized gene expression per patient sample and one normal thymus sample (kind courtesy, Dr Jan Cools, Belgium). We found three separate clusters, as shown in Fig. 3, in which cluster 1, cluster 2, and cluster 3 comprised 9, 5, and 19 T-ALL cases, respectively. Three samples, including the normal thymus, did not group with any of the clusters. The patients with immature T-ALL immunophenotype were clustered together as cluster 1. Cluster 2 consisted of five samples, and all were cortical T-ALL. In cluster 3, out of 19 samples, 3 were immature, 4 were mature, and the remaining were cortical T-ALL. The three samples that did not fall into any cluster were the normal thymus, one immature T-ALL, and one mature T-ALL.
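As a rough illustration of what the unsupervised step does, the sketch below centers a samples-by-genes matrix and projects each sample onto the first principal component, found by power iteration on the covariance matrix. It is not the BioVinci implementation used in the study; the toy matrix, function names, and the choice of power iteration are assumptions made to keep the example self-contained.

```python
import math


def center_columns(rows):
    """Subtract each gene's (column's) mean across samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (genes x genes) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pc1_scores(rows):
    """Project each sample onto the first principal component."""
    centered = center_columns(rows)
    v = first_component(covariance(centered))
    return [sum(x * c for x, c in zip(row, v)) for row in centered]


# Hypothetical normalized expression: 4 samples x 3 genes, two obvious groups.
TOY_EXPRESSION = [
    [8.0, 1.0, 2.0],
    [7.5, 1.2, 2.1],
    [1.0, 6.0, 2.0],
    [1.2, 6.5, 1.9],
]
```

With the toy matrix, the first two samples receive scores of one sign and the last two the opposite sign, which is the kind of separation that the cluster assignment in Fig. 3 reflects. The sign of a principal component is arbitrary, so only the relative positions are meaningful.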

Clinical characteristics of patients in the validation cohort
In the validation cohort (n = 99), there were 47 immature (including 12 ETP-ALL), 36 cortical, and 16 mature T-ALL cases. The median age

Fig. 1. Workplan of the study. One hundred and thirty-four patients with T-ALL were recruited in the study. The patients were divided into two cohorts: discovery cohort (n = 35) and validation cohort (n = 99). Total RNA sequencing was done in the discovery cohort. RNAseq data were analyzed using both supervised and unsupervised approaches. The supervised analysis was based on immunophenotype (immature, cortical, and mature) of the patients at the diagnosis. The gene expression profiles (GEP) of protein-coding genes, histone modifiers, and lncRNAs for the three subtypes of T-ALL were determined. The clinical significance of the 23 selected targets (identified from the RNAseq data of the discovery cohort) was tested in the validation cohort.

We checked the expression levels of 23 RNA targets which were found to be differentially expressed in the discovery cohort. These targets were selected for each subtype of T-ALL, i.e. immature, cortical, and mature T-ALL, using available literature for T-ALL and other hematological and solid malignancies. They were chosen from different classes: (i) protein-coding genes: BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, and TAL1; (ii) epigenetic modifiers: DOT1L, EP300, EML4, EZH2, RAG1, and KDM6A; and (iii) lncRNAs: XIST, ST20, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, MEF2C-AS1, and MALAT1. The expression of these targets was estimated by real-time PCR in the validation cohort to assess their clinical and prognostic significance in T-ALL.

Association between gene expression and patients' variables in the validation cohort
On analysis of the association of patients' characteristics with expression levels of protein-coding and noncoding RNAs, we found an association between RAG1 expression and the age of the patients. Specifically, RAG1 expression was higher in patients younger than 12 years compared with those older than 12 years (P = 0.034). Higher expression of XIST was observed in females (P = 0.011). Low MALAT1 expression was associated with low total leukocyte count (TLC) at diagnosis (P = 0.02). Patients with low XIST, low KDM6A, and high TAL1 more frequently had NCI high risk (P = 0.01, 0.018, and 0.04, respectively). Prednisolone resistance was associated with high MEF2C and HHEX expression (P = 0.048 and 0.018, respectively). Postinduction minimal residual disease (MRD) positivity (≥0.01%) was associated with high PCAT18 (P = 0.04), HHEX (P = 0.027), and MEF2C (P = 0.007) expression.

Association between gene expression and treatment outcome in the validation cohort
In the validation cohort, 77 patients were treated with the ICiCLe protocol (20), 19 with the BFM-90 protocol (21), and 3 with hyper-CVAD therapy (23). Complete remission was defined as bone marrow blasts <5% with a recovery of blood counts at the end of 4 weeks of induction chemotherapy. Any failure to do so (including the persistence of leukemic blasts at an extramedullary site) or death during induction therapy for any reason was considered induction failure. Patients who failed with one protocol were reinduced with another. The median follow-up was 27 months (range, 0.5-78 months). Complete remission was achieved in 84 of 99 (84.8%) patients with induction chemotherapy. Two patients died during induction therapy.

Discussion
T-ALL is a genetically heterogeneous malignancy that poses significant challenges in risk stratification and clinical management. In our study, we employed RNA sequencing to comprehensively analyze the expression profiles of protein-coding genes, epigenetic modifiers, and lncRNAs in patients with T-ALL. The aim was to unravel the molecular landscape of T-ALL and assess the clinical relevance of identified RNA targets. Our findings revealed distinct expression profiles of protein-coding genes, epigenetic modifiers, and lncRNA transcripts across immunophenotypic subtypes of T-ALL. Key genes encoding transcription factors of early hematopoiesis, such as MEF2C, LYL1, LMO2, HHEX, RUNX2, HOXA10, HOXA9, RUNX1T1, and ZBTB16, were up-regulated in immature T-ALL. The dysregulation of MEF2C has been previously shown in immature T-ALL (28-34). Colomer-Lahiguera et al. (29) reported that MEF2C dysregulation is associated with CDKN1B deletion and poor prednisolone response in T-ALL. We also found an association between prednisolone resistance and high MEF2C expression. La Starza et al. (35) showed its up-regulation in T-ALL cases with interstitial deletion of 5q. Nagel et al. (34) proposed distinct mechanisms for aberrant MEF2C gene expression, either by NKX2-5 signaling or by chromosomal deletion of 5q. They also showed that MEF2C inhibits BCL2-regulated apoptosis by inhibition of NR4A1/NUR77 (34). In addition, Kawashima-Goto et al. (30) reported that BCL2 inhibitors might be helpful for treating T-ALL with high expression levels of MEF2C. In our study, MEF2C gene expression emerged as a significant predictor of EFS, RFS, and OS. Although MEF2C overexpression is associated with chemoresistance and poor outcomes in AML (36), its prognostic relevance in T-ALL is not clear. It was recently reported by our group for the first time (37).
In our study, MN1, BAALC, and IGFBP7 were overexpressed in immature T-ALL. Up-regulation of these genes is believed to arise from T-cell progenitors retaining myeloid differentiation potential (25,38,39). Like previous studies, we found BAALC overexpression to be associated with the expression of CD34 and myeloid markers (25,38). Previous studies showed that overexpression of these genes was associated with poor outcome and resistance to chemotherapy (38,40). We did not find any significant association between BAALC expression and prednisolone sensitivity. Interestingly, higher BAALC expression was associated with worse patient outcome. Increased BAALC expression is an important marker for chemoresistance and poor patient prognosis in myeloid and lymphoid malignancies. Baldus et al. (40) reported that low expression of ERG and BAALC predicts a favorable outcome in T-ALL; however, another report suggested no association of BAALC with the prognosis in a larger cohort of 232 adult patients with T-ALL (41).
HHEX (hematopoietically expressed homeobox transcription factor) plays a pivotal role in the development of various hematological malignancies, most notably T-ALL and AML (42). It has been shown to act as a direct transcriptional target of LMO2 and to be concordantly expressed with LYL1 in human ETP-ALL (43). It has also been reported to be required for radio-resistance of leukemic stem cells. Its expression is known to be down-regulated by deacetylation treatment, signifying the role of this therapy in T-ALL (43). In our study, HHEX overexpression was associated with worse OS, EFS, and RFS. This has never been reported in the literature before.
LYL1 codes for a transcription factor involved in leukemia progression (44,45). LYL1 interacts with LMO2 expressed in ETP-ALL. It regulates the stem cells' signature of thymocytes and generates T-cell leukemia (46). Therefore, LYL1 is an essential factor for LMO2-driven T-cell leukemia (47). We observed that LYL1 overexpression in T-ALL was associated with unfavorable EFS and RFS. In myeloid malignancies, LYL1 has been shown to be associated with a lower remission rate, higher relapse rate, and poor patient survival (48). In addition, we also observed higher LMO2 expression to be associated with poor EFS and RFS. Contrary to this, previous studies have suggested LMO2 expression to be associated with a better prognosis in B-ALL and T-ALL (49,50). We also found overexpression of ZBTB16 (PLZF) in immature T-ALL; although not stressed in previous western studies, this was a notable finding in a Chinese study (28,51). ZBTB16 (or promyelocytic leukemia zinc finger, PLZF) contains one BTB domain and nine zinc fingers. Its overexpression was shown in that study to result from ZBTB16::ABL1 translocation and occurred in different patients along with other mutations, including NOTCH1, ZEB2, PTEN, MYCN, and PIK3CD. In addition, laboratory studies in Jurkat cells and mice showed ZBTB16::ABL1 to be a leukemogenic driver lesion that caused increased proliferation and a 4-fold heightened protein tyrosine kinase activity that was amenable to tyrosine kinase inhibitor (TKI) activity (28). Although we did not find this translocation in our patients, our finding is also significant because, along with LYN overexpression, ZBTB16 overexpression suggests that our patients with immature T-ALL may benefit from TKIs.
Apart from these known genes, we identified aberrantly expressed, previously unreported genes such as RUNX1T1, RUNX2, PLD4, NT5E (CD73), HOPX, TP63, and HOXA11-AS. Furthermore, a role for RUNX2 in T-ALL has been suggested in a study by Nagel et al. (34), who, in order to uncover additional target genes, investigated in detail the aberrant expression of MEF2C mediated by complex deletion at 5q, del(5)(q14) in T-ALL cell line LOUCY. This could be evidence that RUNX2 instead of RUNX1 could be involved in the manifestation of ETP-ALL that allows in vivo functional evaluation of putative oncogenes and preclinical drug testing. Further, analysis of cortical T-ALL yielded differentially expressed genes with CD1A, CD1C, CD4, CFTR, FAT3, NKX2-1, TLX1, TLX3, and RAG1 being reported earlier in various reports, while EREG, PAX6, and ZIC2 were identified to be up-regulated in the present study. Neumann et al. (52), in a study of adult ETP-ALL, showed that cadherins FAT1 (25%) and FAT3 (20%) were mutated, implicating alterations in cell adhesion and activation of the Wnt pathway. In another study, Neumann et al. (53) showed that FAT1 expression was correlated with a more mature leukemic immunophenotype in T-ALL, with 74% of patients with thymic T-ALL being FAT1 positive compared with 45% of patients with mature T-ALL and only 4% of patients with early T-ALL. Expression of FAT1 in our cortical T-ALL is in keeping with this finding. Like the previous study (53), we also observed a correlation between FAT1 expression and patient outcome.
Mature T-ALL is a rare subgroup immunophenotypically diagnosed by CD1a− and sCD3+. Molecularly, TAL1 is a driver gene for late cortical T-ALL (1). We found TAL1 to be overexpressed in both mature and cortical T-ALL. Among the protein-coding genes, APC2, BCL3, CCR4, ST20, EML4, and NCOR2 were some of the key up-regulated genes. TAL1 underexpression was found to be associated with poor OS and EFS in our study.
Aberrant histone modifications are a hallmark of cancer and are associated with dysregulated expression of histone modifiers. We also studied their expression to identify a set of histone-modifying enzymes that are up-regulated or explicitly down-regulated in different subtypes of T-ALL. EZH2, a member of the polycomb repressor complex, was underexpressed in our immature T-ALL cases. This may be related to EZH2 mutations in immature T-ALL (54). Danis et al. (55) mechanistically linked EZH2 inactivation to stem-cell-associated transcriptional programs and increased growth/survival signaling, features that convey an adverse prognosis in patients. However, we did not find a correlation between EZH2 expression and outcome. Loss-of-function mutations and deletions in SETD2 have been shown to lead to chemotherapy tolerance and clonal survival by cell cycle arrest followed by apoptosis. Hence, the overexpression of SETD2 has been postulated to contribute to chemotherapy resistance in many cancers, including leukemias (54, 56-58). We found overexpression of SETD2 in T-ALL when compared with normal thymus. SETD2 overexpression in leukemic patients may therefore contribute to chemotherapy resistance. This can be further investigated to see the involvement of histone methylation at the genomic level and to correlate it with the transcriptional inferences. In pediatric cases, higher expression of HDAC7 and HDAC9 in ALL can be associated with poor prognosis. In our study, we observed overexpression of HDAC9 in our immature and cortical cases. CREBBP, EP300, ASH1L, ATM, PKN1, KDM2B, KDM4B, and DOT1L showed significant differential expression in mature T-ALL. EP300 and CREBBP have lysine acetyltransferase activity in transcription coactivation (59-62). Targeted histone lysine acetylation by EP300 and CREBBP can influence chromatin conformation (60), and concomitant binding of EP300 and acetylation of H3K27 are hallmarks of promoter or enhancer activation (63). For the first time, we observed that low expression of RAG1 is associated with poor OS, EFS, and RFS. DOT1-like (DOT1L) histone lysine methyltransferase methylates H3K79 and plays a significant role in embryogenesis and hematopoiesis. Its function is unknown in T-ALL, but its aberrant activation is associated with other acute leukemias (64,65). DOT1L catalytic activity depends on the monoubiquitination of lysine 120 in histone H2B (H2BK120Ub), which provides crosstalk between histone posttranslational modifications (66). Recent studies suggested a role of DOT1L in H3K79 methylation and monoubiquitination of lysine (H2BK120Ub) that may pave the way for developing novel DOT1L-driven antileukemia therapies (67,68). DOT1L was overexpressed in our patients with mature T-ALL, and it may be worth investigating whether they could be candidates for DOT1L-driven antileukemia therapy.
Apart from proteins, the noncoding repertoire forms another layer of regulation in normal cell homeostasis. Using RNAseq, we tried to identify the differentially expressed noncoding RNAs, especially lncRNAs, which were well documented earlier for their role in cancers. Recent studies have revealed an aberrant lncRNA expression profile in T-ALL, leading to deregulated downstream signaling pathways (69). An in-depth analysis revealed 2,243 differentially expressed lncRNAs, of which 223 passed the >2 FPKM filter. The NOTCH1-regulated lncRNA LUNAR1 was overexpressed in cortical and mature T-ALL (27). This may be related to a higher incidence of activating NOTCH1 mutations in these T-ALL subtypes (8,70). HOTTIP and MEF2C-AS1 were overexpressed in immature T-ALL; LINC00202, LINC00648, and LINC00461 in cortical T-ALL; and MALAT1 in mature T-ALL. We also observed higher LINC00202 expression to be associated with poor OS, while lower expression of LINC00461 was significantly associated with adverse patient outcomes. Neither of these associations has been reported previously. XIST was associated with worse EFS and RFS. HOTTIP has been reported to be aberrantly activated in AML. It promotes hematopoietic stem-cell renewal leading to AML-like disease in mice (71). This may explain its overexpression in immature T-ALL, which has myeloid potential, in our study. We found overexpression of ST20 (Suppressor of Tumorigenicity 20) in T-ALL cases with better OS, EFS, and RFS. This has never been reported before.
MALAT1 (metastasis-associated lung adenocarcinoma transcript 1) is known to be involved in a plethora of biological processes ranging from alternative splicing and nuclear organization to epigenetic regulation of gene expression. It is also associated with various pathological conditions such as breast cancer, lung adenocarcinomas, hepatocellular carcinomas, bladder cancers, and diabetes (72-74). The up-regulated level of MALAT1 is often used as a prognostic marker for various cancer types (75). At a molecular level, MALAT1 plays an important role in modulating several signaling pathways like MAPK/ERK, PI3K/AKT, WNT, and NF-kB, leading to a modification of proliferation, cell death, cell cycle, migration, invasion, immunity, angiogenesis, and tumorigenicity. We also report the association of MALAT1 with adverse OS in T-ALL. The exact mechanism of how MALAT1 helps in cancer development and progression is unknown. MALAT1 can be a therapeutic target and a potential diagnostic and prognostic biomarker for cancers (73,76,77).
Our PCA results identified three separate clusters. The clustering seems to somewhat reflect the immunophenotypic characteristics of the leukemia samples: cluster 1 was associated with the immature immunophenotype, cluster 2 was exclusive to cortical T-ALL, and cluster 3 was more heterogeneous, including a mix of immature, mature, and cortical T-ALL cases. The presence of two outlier T-ALL samples that did not fall into any cluster adds complexity to the analysis. These outliers may represent unique molecular or genetic profiles that differ from the main clusters identified. Overall, our results provide valuable insights into the heterogeneity of gene expression patterns in T-ALL. Our results also indicate that the immunophenotyping of T-ALL, based on currently available immunomarkers, does not fully capture the molecular phenotype of the leukemic cells. This may prompt further research to identify and validate novel biomarkers or integrate multiple data types (genomic, proteomic, etc.) to enhance the precision of T-ALL diagnosis and potentially guide more targeted therapeutic approaches.
We also searched for fusion transcripts in our patients and found many known and novel fusion transcripts. We found TCR genes to be fused with known oncogenes like NKX2-1, CCND3, and TAL1 (28,78). STIL::TAL1 was identified in four cases, but it was not specific for any particular T-ALL subtype. We also found a previously reported MIR181A1HG::HOXA11-AS fusion in a case of immature T-ALL (78). The MIR181A1HG gene is located on chromosome 1q32. This region has also been reported to be rearranged with the MYC gene in a case of T-ALL (78). Interestingly, we found a SEPTIN6::ABL2 fusion in a cortical T-ALL case. This has been recently described in a T-ALL case at diagnosis and relapse (79). This fusion was shown to have oncogenic potential and responded to TKI, highlighting the fact that this fusion oncoprotein can be used as a therapeutic target in T-ALL (79). We found a CRLF2::IGH fusion in a case with cortical immunophenotype. Although CRLF2 overexpression has been described in T-ALL, CRLF2::IGH has not been reported (80). In contrast to our finding, CRLF2 overexpression has been reported to be associated with an immature-like immunophenotype (80). TPM4::KLF2, RB1::RCBTB2, and NCOR2::BCL7A fusions have been reported in B-ALL (81-83). The novel fusion transcripts found in our study were CEP128::JAK2, CDK6::WDR74, ARID4B::ABL2, NBPF26::NOTCH2, RUNX1::SLC44A3, and CD2AP::IL7. These novel fusions highlight the power of RNA sequencing in identifying fusion transcripts even in cases with normal karyotype (78). The oncogenic potential of these fusions needs to be validated in future studies.
Our results suggest that certain RNA signatures may have prognostic value, potentially aiding in risk stratification and personalized treatment approaches for patients with T-ALL. The study also addresses the gap in knowledge regarding the prognostic relevance of lncRNAs and histone modifiers in T-ALL. By shedding light on these unexplored aspects, our findings contribute to a more comprehensive understanding of the molecular landscape of T-ALL, paving the way for future research and potential clinical applications. Our findings also provide insights into the heterogeneity of fusion transcripts in T-ALL, including their distribution across different subtypes and the presence of multiple fusion transcripts in some cases. The data suggest a complex landscape of genetic alterations in T-ALL, which could have implications for understanding the disease and developing targeted therapies.
While our study provides valuable insights, it is important to acknowledge its limitations. The relatively modest sample size and the need for further validation in larger cohorts are recognized. Additionally, the dynamic nature of RNA expression patterns in leukemia necessitates longitudinal studies to elucidate the temporal evolution of these molecular signatures during disease progression and treatment response. Furthermore, the oncogenicity of the novel fusion transcripts identified in our study needs to be validated by in vitro studies. Patient-derived animal models should be used to investigate leukemogenesis and drug response.
In conclusion, our study elucidates the expression profile of protein-coding genes, epigenetic modifiers, and lncRNAs in T-ALL, revealing potential clinical implications. These findings not only advance our understanding of the molecular basis of T-ALL but also open avenues for the development of targeted therapies and improved risk stratification in the clinical management of this challenging disease. Future investigations building upon these results may uncover additional layers of complexity in T-ALL biology, ultimately guiding the development of more effective and personalized treatment strategies.

Patients
A total of 134 T-ALL cases diagnosed by morphology, cytochemistry, and immunophenotyping were enrolled in this study. The patients with T-ALL were divided into two cohorts: discovery (n = 35) and validation (n = 99) (Fig. 1). All patients or legal guardians gave their informed consent for blood/bone marrow collection and biological analyses in accordance with the Declaration of Helsinki. The Institutional Ethical Committee of the All India Institute of Medical Sciences, New Delhi, approved the study. The transcriptome data from pooled RNA of 5 normal human thymus samples were made available from the European Genome-phenome Archive (EGA, http://www.ebi.ac.uk/ega/, kind courtesy of Dr Jan Cools, Belgium). The data were used as a control in the required analyses, such as PCA. These thymus samples were de-identified prior to use in our study.

RNA sequencing and analysis
RNA was isolated from patient samples by the TRIzol method (Thermo Fisher Scientific, MA, USA). Paired-end whole transcriptome sequencing was performed on the Illumina HiSeq2000 platform using the TruSeq RNA sample preparation kit (Illumina, San Diego, CA, USA). Sequence reads were processed to identify the expression profile of protein-coding and noncoding RNAs by supervised and unsupervised approaches (Supplementary Methods). To investigate the role of histone modifiers in T-ALL development, the transcript abundance of epigenetic modifiers was measured across mature, cortical, and immature subtypes of T-ALL (Supplementary Methods). To identify novel lncRNA transcripts, a computational pipeline combining open reading frame prediction with the coding potential calculator (CPC algorithm) was used to annotate the protein-coding potential of transcripts (Supplementary Methods). Fusion transcripts were examined across all samples using the freely accessible tool FusionCatcher (https://doi.org/10.1101/011650; Supplementary Methods). The novel fusions were validated by RT-PCR followed by Sanger sequencing.

Determination of the expression of RNA targets by real-time PCR
Total RNA was extracted from blast-enriched mononuclear cells isolated from bone marrow samples collected at the time of diagnosis using the TRIzol method (Thermo Fisher Scientific). The concentration and quality of RNA were determined with a spectrophotometer. RNA was reverse transcribed to cDNA using random hexamers, RNase inhibitor, dNTPs, and M-MuLV reverse transcriptase enzyme (Fermentas, USA). The expression levels of the targets were measured by real-time PCR (CFX96, Bio-Rad, Hercules, CA, USA) using TaqMan probe PCR master mix (Bio-Rad). The primers and probes used are given in the Supplementary Excel file. In all cases, the samples were run in triplicate. The Ct values were normalized with housekeeping genes. The relative expression was calculated using the comparative cycle threshold method.
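The comparative cycle threshold calculation can be sketched as follows. This is a minimal illustration assuming the common 2^(-ΔΔCt) form with roughly 100% amplification efficiency, a single housekeeping gene, and a calibrator sample; the function names, the calibrator, and the Ct values are hypothetical, since the text does not specify them.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target minus mean Ct of the housekeeping gene."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_target, sample_reference,
                        calibrator_target, calibrator_reference):
    """Fold change by the comparative Ct method: 2 ** (-ddCt).

    Each argument is a list of replicate Ct values (e.g. triplicates).
    Assumes about 100% PCR efficiency for both target and reference.
    """
    ddct = (delta_ct(sample_target, sample_reference)
            - delta_ct(calibrator_target, calibrator_reference))
    return 2.0 ** (-ddct)


# Hypothetical triplicates: sample dCt = 6, calibrator dCt = 8, so ddCt = -2.
fold_change = relative_expression(
    [24.0, 24.5, 23.5], [18.0, 18.0, 18.0],
    [26.0, 26.0, 26.0], [18.0, 18.0, 18.0],
)
```

A ΔΔCt of −2 corresponds to a four-fold higher expression in the sample than in the calibrator. When several housekeeping genes are used, their Ct values are usually combined before the subtraction.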

Statistical analysis
Fisher's exact test for categorical data and the nonparametric Mann-Whitney U test for continuous variables were used to compare baseline clinical variables across groups in the validation cohort. The Kruskal-Wallis test was used to determine the association between gene expression and immunophenotype. A P-value ≤0.05 (two-sided) was considered significant. EFS was defined as the time from diagnosis to the date of the last follow-up in complete remission or the first event (i.e. induction failure, relapse, secondary neoplasm, or death from any cause). OS was defined as the time from diagnosis to death or the last follow-up. Patients lost to follow-up were censored at the last contact. RFS was defined as the time from the remission date to the last follow-up, relapse date, or death from any cause. The last follow-up was carried out on 15 May 2022. Kaplan-Meier (KM) survival analysis was performed to estimate survival rates, with the differences compared using a two-sided log-rank test. Cox proportional hazard models were constructed as univariate and multivariate analyses for association with EFS, RFS, and OS. Covariates included in the full model of OS, EFS, and RFS were gender, WBC (<50 × 10^9/L, ≥50 × 10^9/L), age, gene expression, immunophenotype, response to prednisolone treatment, and presence of MRD after the end of induction chemotherapy. Patients with high and low expression were delineated using the KM plotter tool (84) for each target (BAALC, HHEX, MEF2C, FAT1, LYL1, LMO2, LYN, TAL1, DOT1L, XIST, PCAT18, PCAT14, LINC00202, LINC00461, LINC00648, MEF2C-AS1, ST20, RAG1, EP300, EML4, EZH2, MALAT1, and KDM6A). All analyses were performed using the SPSS statistical software package, version 20.0, and STATA software, version 11.
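To make the survival estimate concrete, the sketch below implements the Kaplan-Meier product-limit estimator for right-censored follow-up data. It is an illustration only: the study used SPSS/STATA, the function name and toy follow-up times are hypothetical, and the log-rank test and Cox models are not reproduced here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up duration per patient (e.g. months).
    events: 1 if the event (death, relapse, ...) occurred at that time,
            0 if the patient was censored at that time.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up of five patients (months) with two censored.
toy_curve = kaplan_meier([2, 4, 4, 6, 9], [1, 1, 0, 1, 0])
```

For the toy data, survival steps down to 0.8, 0.6, and 0.3 at months 2, 4, and 6. Patients censored at a given time are still counted as at risk at that time, which is the usual convention.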

Fig. 2. Radar plot displaying fold changes of selected differentially expressed protein-coding genes in T-ALL subtypes: immature, cortical, and mature. The three subgroups are represented in different colors. Each circle represents the log2-fold change of expression for the differentially expressed genes.

Fig. 3. PCA of 35 T-ALL samples and 1 normal thymus sample based on the normalized FPKM count. The samples were clustered into three major clusters: I, II, and III. The detail of the immunophenotype of the samples is given in the table. Two T-ALL samples and one normal thymus did not group with any of the clusters. The different colored dots represent samples from different patients.

Fig. 4. Circos plot depicting the fusion transcripts found in T-ALL cases in the discovery cohort.

Table 1. Fusion transcripts observed in T-ALL samples in the discovery cohort by RNA sequencing.
ETP-ALL.It regulates the stem cells' signature of thymocytes, and generates T-cell leukemia

Table 2. Univariate analysis for prognostic association of RNA targets and other covariates in patients with T-ALL in the validation cohort.
Values in bold indicate statistically significant parameters.