Exitrons: offering new roles to retained introns—the novel regulators of protein diversity and utility

Abstract Exitrons are exonic introns. This subclass of retained introns lies within protein-coding exons and does not contain a premature termination codon (PTC). Therefore, when retained, an exitron is translated as part of the protein. Intron retention is a frequent alternative splicing event, predominant in plants, that either leads to degradation of the transcript or yields a stable intermediate that is processed upon induction by specific signals or changes in cell status. Interestingly, exitrons have coding ability and may confer additional attributes on the proteins that retain them. Exitron-containing and exitron-spliced isoforms can therefore be a driving force for protein diversity in the proteome of an organism. This review establishes a basic understanding of exitrons, discussing their genesis, key features, identification methods and functions, and considers their other potential roles. It also aims to provide a fundamental background for those who find exitronic sequences in their gene(s) of interest and to suggest the future course of studies.


Introduction
In most eukaryotic genes, intronic sequences interrupt the reading frame of the protein-coding sequences, that is, the exons. During splicing, the introns are removed and the exons are joined to form a mature mRNA transcript. However, alternative splicing (AS) events can create multiple transcripts and proteins from a single gene by regulating the splicing events involving exons and introns (Staiger and Brown 2013). AS of a single gene can produce diverse and dynamic products whose tissue expression and abundance vary depending on developmental stages and environmental cues (Kalsotra and Cooper 2011; Staiger and Brown 2013). AS events are known to be present in about 61 % of multi-exonic genes of Arabidopsis (Marquez et al. 2012) and at least 90 % of mammalian genes (Wang et al. 2008), creating multiple protein isoforms with different biological functions. Major AS events include exon skipping (ES), alternative 5ʹ splice sites (A5SS), alternative 3ʹ splice sites (A3SS), mutually exclusive exons (MXEs) and intron retention (IR) (Ner-Gaon et al. 2004; Marquez et al. 2012; Braunschweig et al. 2013, 2014; Reddy et al. 2013). These AS processes, widespread in plants and humans, produce two or more mature mRNAs from the same precursor mRNA (pre-mRNA), contributing substantially to protein diversity (Syed et al. 2012; Reddy et al. 2013) (Fig. 1). ES, in which single or multiple exons are spliced out along with the flanking introns or retained, is the most common form of AS in metazoans but is uncommon in plants (Kim et al. 2007). Alternative 5ʹ donor and 3ʹ acceptor sites, in which two or more splice sites are present at one end of an exon, alter the exon boundaries. Their use depends on the feasibility of using different splice sites at the 5ʹ and 3ʹ ends of the exon and results in longer or shorter exons from the same transcript (Sugnet et al. 2004; Syed et al. 2012). Another category of AS is mutually exclusive exons (MXEs), in which only one of the two exons is retained in the mature mRNA while the other exon is always spliced out (Lam et al. 2021). MXEs have been found in transmembrane transporters and are involved in ion channel activity (Hatje and Kollmar 2013; Hatje et al. 2017). Another type of AS is IR, which is considered to be the most common form of AS in plants and accounts for 28-64 % of the total AS events, depending on the growth conditions and tissue types (Ner-Gaon et al. 2004; Filichkin et al. 2010; Kalyna et al. 2012; Marquez et al. 2012; Mandadi and Scholthof 2015). IR is an AS type in which introns are retained in the mature RNAs rather than being spliced out. IR events were initially thought to be associated with the down-regulation of gene expression through nonsense-mediated decay (NMD) (Ge and Porse 2014), in which transcripts containing a premature stop codon are targeted for degradation. Further studies of IR events in plants show that they play a significant role in regulating growth, development, and physiological and stress responses (Kalyna et al. 2012; Syed et al. 2012; Drechsel et al. 2013; Filichkin et al. 2015). Originally described in plants and viruses, IR is now also recognized as a major form of AS in mammalian systems (Hammarskjold 1997; Ner-Gaon et al. 2004; Braunschweig et al. 2014; Rekosh and Hammarskjold 2018). The wide range of studies on IR carried out in mammalian systems has shown the importance of regulated IR mechanisms in cell development, differentiation and responses to cellular stress (Edwards et al. 2016; Memon et al. 2016; Pimentel et al. 2016; Ullrich and Guigo 2020).
A subfamily of IR events in which an intron constitutes an internal region of the annotated protein-coding sequence was initially reported as cryptic introns in Arabidopsis thaliana (Marquez et al. 2012). Because these retained introns lie within protein-coding exons, they carry no stop codons but have the core splice signals: 5ʹ and 3ʹ splice sites and branch points. Based on their dual nature (exonic and intronic), these specialized exonic introns were named exitrons (Marquez et al. 2015).
Exitrons and other IR events have clearly distinguishable features, leading to different fates for their transcripts. IR transcripts are known to be retained in the nucleus as incompletely processed transcripts with forestalled translation (Boothby et al. 2013; Shalgi et al. 2014), whereas exitron-containing transcripts are associated with the polyribosome fraction and, hence, are translated. IR transcripts harbour a premature termination codon (PTC) due to the retained intron (Braunschweig et al. 2014), while transcripts with retained exitrons have no PTCs because exitrons are protein-coding sequences. The exitron-containing transcripts are more abundant than the IR transcripts (Marquez et al. 2012). As exitrons find their way into the complex world of gene regulation and proteome plasticity, our review focuses on their genesis, the mechanisms of exitron retention and splicing, and their diverse roles in plant adaptation, development and immune responses.

Genesis of Exitrons: Unleashing the Mystery
Exitrons (previously called cryptic introns), having features of both introns and protein-coding exons, are flanked directly by exons. Therefore, exitrons have great potential to enhance protein diversity in Arabidopsis and humans (Marquez et al. 2015). Exitrons tend to have lengths that are multiples of three bases, thereby decreasing the chance of a change in the reading frame. Fig. 2A shows a model depicting the generation of exitrons, which regulate significant functions contributing to proteome diversity. One hypothesis, 'splicing memory', has been proposed for the genesis of exitrons (Marquez et al. 2012, 2015), in which the loss of introns from the ancestral gene gives rise to exitrons in the modern genes of both plants and humans. Upon intron loss and retroposition (insertion of reverse-transcribed spliced mRNA into a new genomic position), genes retain impressions of former exon borders and 'remember' previous exonic information. If the ancestral region underwent AS, vestigial exonic splicing regulatory components present at exon boundaries (Reed and Maniatis 1986) can provide position-dependent information that facilitates the evolution of core splicing signals and the re-establishment of the modern gene structure by exitron splicing (EIS).
A conserved exitron splicing event has been observed between humans and plants (Marquez et al. 2015). Interestingly, a retained exitron, being an internal part of a protein-coding exon, generates a longer protein as the major isoform when the transcript is translated. In contrast, splicing of an exitron can have three outcomes: an internally deleted protein isoform, a protein with an altered carboxyl terminus, or a PTC introduced downstream of the splice junction that triggers nonsense-mediated RNA decay (NMD) (Fig. 2B). Both exitron-retained and exitron-spliced transcripts are exported to the cytoplasm and translated, unlike other IR transcripts, which are mostly retained in the nucleus. Exitron retention and splicing can thereby affect the post-translational modifications (PTMs) that regulate protein function and increase protein diversity. However, ancestral AS events, conserved splicing regulatory elements and other aspects of EIS evolution still need to be investigated in detail.
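The dependence of these outcomes on exitron length can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a published method: the function names, the coordinate convention (0-based, end-exclusive positions within the coding sequence) and the category labels are hypothetical, and the sketch does not attempt to decide whether a shifted stop codon actually triggers NMD or only alters the C-terminus.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def classify_exitron_splicing(cds, start, end):
    """Classify the effect of removing cds[start:end] (the exitron).

    Assumes `cds` is the exitron-retained coding sequence, read from the
    start codon and ending with its stop codon (hypothetical convention).
    """
    spliced = cds[:start] + cds[end:]
    stop = first_stop(spliced)
    if stop is None:
        # Reading frame runs off the end of the sequence provided.
        return "no_stop_in_frame"
    if (end - start) % 3 == 0 and stop == len(spliced) - 3:
        # Frame preserved and original stop reused: internal deletion.
        return "in_frame_deletion"
    # Stop codon reached at a new position: altered C-terminus or a
    # possible NMD target (not distinguished in this sketch).
    return "premature_or_shifted_stop"


# Toy sequences invented for illustration only.
in_frame = "ATGGCTGCT" + "GTCCAG" + "GCAGCATAA"       # 6-nt exitron
shifted = "ATGGCTGCT" + "GTCCAGG" + "TGACCGCATAA"     # 7-nt exitron
print(classify_exitron_splicing(in_frame, 9, 15))
print(classify_exitron_splicing(shifted, 9, 16))
```

In this toy example, removing the 6-nt exitron leaves the frame intact, whereas removing the 7-nt exitron brings a stop codon into frame immediately after the junction.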

Characteristic Features of Exitrons
Exitrons are highly conserved from Arabidopsis to human genomes. Exitrons have the following features: (i) High GC content: The GC content of exitrons is higher than that of conventional introns and closer to that of exons. (ii) Weaker splice site signals: Exitrons possess weaker splice site signals than conventional introns. (iii) Absence of stop codons: In contrast to introns, exitrons do not contain stop codon(s) and thus do not cause premature termination of translation. (iv) Length in a multiple of three bases: Exitronic sequences typically have a size corresponding to a multiple of three nucleotides (a rare occurrence in conventional introns), which is essential to maintain the ORF for successful translation. (v) Nuclear export: The transcripts carrying the exitron are usually transported from the nucleus to the cytoplasm for translation, while the transcripts with other retained introns are incompletely processed and remain in the nucleus. (vi) Predominant transcript: The transcripts with retained exitrons are the major isoforms compared to those in which the exitron is spliced out (Marquez et al. 2015). Exitrons (exonic introns) can enrich the encoded protein in disordered regions, short linear motifs, and phosphorylation and ubiquitination sites, thus impacting protein function.
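The sequence-level features in this list (GC content, length in a multiple of three and absence of in-frame stop codons) are simple to compute for a candidate sequence. The Python sketch below shows one way to do so; the function names and the `frame` parameter are hypothetical, and no thresholds are applied because the review does not specify any. Splice site strength, nuclear export and isoform abundance require other data and are not covered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 if empty)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def exitron_features(seq, frame=0):
    """Summarise sequence features of a candidate exitron.

    `frame` is the number of leading bases that complete a codon begun
    in the upstream exonic sequence (hypothetical parameter).
    """
    seq = seq.upper()
    # Complete codons read in the host reading frame.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return {
        "gc_content": gc_content(seq),
        "length_multiple_of_three": len(seq) % 3 == 0,
        "has_in_frame_stop": any(c in STOP_CODONS for c in codons),
    }


# Toy sequence invented for illustration only.
print(exitron_features("GCCGGATCC"))
```

A candidate that has a length divisible by three and no in-frame stop codon is compatible with features (iii) and (iv); GC content is only meaningful when compared with the intronic and exonic GC content of the same genome.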

Factors Affecting Exitron Splicing/Retention
Even though the factors involved in IR and AS have been studied extensively, the precise mechanisms of exitron retention and EIS need to be better understood. The 5ʹ and 3ʹ splice site signals and the presence of branch points are critical factors for removing introns during splicing (Wang and Burge 2008). These signals represent only part of the information that defines introns (Lim and Burge 2001). Conditions that limit spliceosome availability, namely, down-regulation of spliceosomal components and deficient splice site recognition, affect IR events (Wong et al. 2013). The presence or absence of regulatory splicing cis-elements, the length and GC content of exons and introns, distinct DNA methylation patterns, histone modifications, and nucleosome positioning over exons, introns and exon/intron boundaries are some of the factors that contribute significantly to the recognition of splice site signals and to changes in splice site usage that result in AS (Braunschweig et al. 2013; Reddy et al. 2013).
The transcription speed of RNA pol II may be affected by the chromatin state, which may, in turn, affect AS events (Alexander et al. 2010; Ullah et al. 2018; Zhu et al. 2018). Evidence that the chromatin environment influences splicing by regulating the processivity of RNA pol II and the recruitment of splicing factors has been reported in several studies (Nojima et al. 2018; Jabre et al. 2019; Yu et al. 2019; Kindgren et al. 2020; Li et al. 2020; Zhu et al. 2020). Transcriptome (RNA-seq) and nucleosome positioning (MNase-seq) data from a study in A. thaliana elucidated the nucleosome positioning-mediated control of cold-induced alternative splicing events (Jabre et al. 2021). The authors reported that exitrons exhibited nucleosome positioning patterns distinct from those of other alternatively spliced regions. In particular, a clear difference between the nucleosome positioning patterns of exitrons and other retained introns was observed, indicating that they are regulated differently.

Function of Exitrons
Exitron-retained splice variants generally manifest tissue-specific functions and play crucial roles in post-translational protein modification in plants and animals (Marquez et al. 2015). One of the earliest examples of an IR event that creates a novel protein isoform is the retention of intron 10 in the mammalian NXF1 (nuclear RNA export factor 1) gene (Li et al. 2006). The small, truncated NXF1 protein acts as a cofactor for the nuclear export of the long, normal NXF1 protein, defining a novel self-regulation mechanism for the gene (Li et al. 2016).
Among the different auxin response factors reported, ARF6 and ARF8 play crucial roles in the development of various floral organs (Nagpal et al. 2005). In Arabidopsis, ARF8.4, a novel splice variant of ARF8.2, contains a retained exitron (intron 8); upon translation, the protein is imported into the nucleus and regulates the growth of the stamen filament and anther opening at early growth stages by activating the MYB26 gene (Ghelli et al. 2018). The authors also showed that ARF8.4 binds to specific auxin-related sequences in the AUX/IAA19 and MYB26 promoters and activates their transcription more efficiently than ARF8.2 (Fig. 3A). Another exitron-containing gene, FLAGELLIN-SENSING 2 (FLS2), codes for a leucine-rich repeat receptor-like protein that can sense bacterial flagellin and trigger a series of immune responses in dicot plants (Cheng et al. 2020). The 5ʹ splice site region of FLS2 genes is highly conserved in dicots, and the exitron proximal to the 5ʹ end had a stimulatory role in gene expression through the intron-mediated enhancer mechanism. NbFLS2-1-AT1, a protein product of the alternatively spliced FLS2-1 exitron, acts as a suppressor of ROS production induced by flagellin 22 (flg22), a potent elicitor of the plant immune response, suggesting that one of the exitrons plays a negative role in regulating the FLS2 pathway. Another study reported exitron-mediated enzyme localization for the MBD4L gene. Methyl-CpG-binding domain protein 4-like (MBD4L) is a DNA glycosylase in Arabidopsis that is involved in DNA repair and that excises, in vitro, U, halogenated derivatives of U and T mis-paired to G, with a preference for CpG (Ramiro-Merina et al. 2013). Initial studies showed that AtMBD4L had three alternative transcripts named At3g07930.1, At3g07930.2 and At3g07930.3 (Nota et al. 2015). In an attempt to amplify and clone At3g07930.3 (MBD4L.3), an additional smaller fragment, At3g07930.4 (MBD4L.4), was obtained, which arose from splicing of a previously unidentified intron. Both predicted proteins contain the conserved C-terminal DNA glycosylase domain (the last 115 amino acids); at the N-terminus, MBD4L.3 includes two nuclear localization signals (NLS), while MBD4L.4 has one NLS (Nota et al. 2015). Further studies showed that the two isoforms have distinct localization patterns (Cecchini et al. 2022): MBD4L.3 is localized in the nucleoplasm and MBD4L.4 in the nucleolus. Interestingly, the nucleolar variant MBD4L.4 increased under heat stress (Cecchini et al. 2022), showing a functional role of exitrons in protein trafficking under abiotic stress (Fig. 3A and Table 1).
Exitrons whose spliced transcripts are linked to disease pathogenesis have also been reported in humans (Sibley et al. 2016). The consequences of exitron splicing include indels that cause in-frame changes in protein sequences and can introduce highly immunogenic neoantigens that promote anti-tumour immune responses (Wang et al. 2021). Various studies have found roles for exitrons in breast cancer (Wang et al. 2021), prostate cancer, gastric cancer (Zhang et al. 2022) and other cancer types (Fig. 3B). Wang and Yang (2021), using a bioinformatic tool named ScanExitron, identified exitron splicing in 33 cancer types across 9599 tumour transcriptomes. They observed that exitron splicing affected 63 % of human coding genes and that 95 % of these exitron splicing events were tumour specific. Exitron splicing changed the fate of novel and known cancer genes, leading to loss-of-function or gain-of-function changes that enhance tumour progression. They also identified exitron splicing-derived neo-epitopes, which can bind to MHC class I or II and could potentially be targeted for immunotherapy. Wang and Yang (2021) further developed an integrated protocol for identifying exitrons and exitron-derived neoantigens from RNA-seq data using bioinformatic tools such as ScanExitron and ScanNeo. Splicing of exitrons was observed in genes such as TAF15, FUS and EWSR1, which may be involved in promoting cancer progression (Zhang et al. 2022). An exitron splicing event in exon 2 of forkhead box protein O4 (FOXO4), which is implicated in regulating cell growth and cellular differentiation, was significantly upregulated in tumour cells compared with normal human cells (Wang and Yang 2021).
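How exitron splicing can give rise to peptides absent from the exitron-retained protein can be sketched in a few lines of Python. This is a hypothetical illustration and not the ScanExitron or ScanNeo algorithm: the function names and coordinate convention are assumptions, and no MHC binding prediction is attempted. The sketch translates the retained and spliced coding sequences with the standard genetic code and reports k-mers found only in the spliced product.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X: unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def kmers(protein, k):
    """All length-k peptide windows of a protein, as a set."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def novel_peptides(retained_cds, start, end, k=9):
    """Peptides of length k present only after removing cds[start:end].

    Coordinates are 0-based and end-exclusive (hypothetical convention).
    """
    spliced_cds = retained_cds[:start] + retained_cds[end:]
    reference = kmers(translate(retained_cds), k)
    return sorted(kmers(translate(spliced_cds), k) - reference)


# Toy sequence invented for illustration; k=3 only to keep it short.
toy = "ATGGCTGCT" + "GTCCAG" + "GCAGCATAA"
print(novel_peptides(toy, 9, 15, k=3))
```

Candidate peptides produced this way would still need expression support and binding prediction before being considered neo-epitopes.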

Methodologies for Detection
In an intensive study of IR events in Arabidopsis and humans, Marquez et al. (2015) identified exitrons by using TopHat (http://ccb.jhu.edu/software/tophat/index.shtml) to map alignments inside annotated protein-coding exons. Only those splice junctions with three reads and no mismatch in the alignment were selected to define exitrons. The exitrons identified carry weaker splice signals than other introns, for example, constitutive introns, alternative introns and retained introns (Marquez et al. 2015). Marquez et al. evaluated the strength of the splice sites of exitrons and introns of A. thaliana using position weight matrices defined by Sheth et al. (2006). The presence of exitrons in the ARF8 and MBD4L genes was identified during attempts to amplify and clone their respective reported variants (Ghelli et al. 2018; Cecchini et al. 2022). When cDNAs of different plant species were used in RT-PCR amplification of FLS2 genes, additional bands, which arose from splicing of the exitron, were observed on agarose gels. RT-PCR-seq was further used to validate the presence of exitrons in FLS2 genes across dicots (Cheng et al. 2020). A bioinformatics tool, ScanExitron, was developed by Wang and Yang (2021) for the detection and annotation of exitrons from human RNA-seq data. ScanExitron is a machine learning algorithm-based tool that identifies potential exitrons by analysing transcript sequences and exon-intron boundaries; the candidates can then be analysed for biological and functional implications. While ScanExitron analyses short-read RNA sequence data from the Illumina sequencing platform, ScanExitronLR (Fry et al. 2022) uses long-read RNA sequences, which promise to address the false-positive sequencing errors often found in short reads derived from repetitive regions. This tool uses transcript-specific annotation, that is, the expectation-maximization algorithm provided by LIQA (Hu et al. 2021), to overcome the higher sequencing error rate of long reads. Outputs of ScanExitronLR can be applied to subsequent investigations of differential exitron splicing as well as to exitron annotation, including frameshift type, nonsense-mediated decay features and Pfam domain interruptions. The different methods adopted for identifying and validating exitrons are listed in Table 1.
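The junction-based selection described at the start of this section can be sketched as a simple filter. The Python example below is a minimal illustration under stated assumptions, not the published pipeline: the record types, field names and coordinate convention are hypothetical, the read threshold is interpreted as "at least three", and the parsing of aligner output is omitted.

```python
from typing import NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Junction(NamedTuple):
    chrom: str
    start: int       # first spliced-out base (0-based)
    end: int         # one past the last spliced-out base
    reads: int       # number of reads supporting the junction
    mismatches: int  # mismatches in the supporting alignments


def candidate_exitrons(junctions, coding_exons, min_reads=3):
    """Keep junctions lying strictly inside one annotated coding exon.

    min_reads=3 is an assumed reading of the "three reads" criterion.
    """
    hits = []
    for j in junctions:
        if j.reads < min_reads or j.mismatches != 0:
            continue
        for e in coding_exons:
            # Exonic sequence must remain on both sides of the junction.
            if e.chrom == j.chrom and e.start < j.start and j.end < e.end:
                hits.append(j)
                break
    return hits


# Toy records invented for illustration only.
exons = [Exon("chr1", 100, 500)]
junctions = [
    Junction("chr1", 200, 290, 5, 0),   # inside the exon, well supported
    Junction("chr1", 200, 290, 2, 0),   # too few reads
    Junction("chr1", 450, 600, 10, 0),  # extends beyond the exon
    Junction("chr1", 200, 260, 4, 1),   # alignment mismatch
]
print(candidate_exitrons(junctions, exons))
```

A linear scan over exons is adequate for a toy input; genome-scale data would normally call for an interval index, which is left out here for brevity.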

Conclusions
Exitrons have only recently gained attention as an IR subclass in which introns with exonic features are found within a conventional protein-coding exon. With rapid advances in RNA sequencing technologies and the continued expansion of proteome and protein-protein interaction datasets, further investigation will likely lead to the discovery of many more exitron-containing genes in the near future. Splicing or retaining exitrons through an AS mechanism can increase protein diversity and thus give rise to diverse phenotypic changes. In plants, exitrons play key roles in development and in stress and defense responses, while in humans, they are primarily associated with tumour progression. The exitron present in a gene can act as a potential site for PTM and thus have a regulatory role. Exitrons can also provide amino acid residues that serve as potential sites for additional disulphide bonds, thus enhancing protein stability. Additional domains and signal peptides within the exitron could also increase protein diversity (Fig. 4). However, how an intron evolved into an exitron with a protein-coding feature remains a mystery. Exitrons and other IR events have observable distinguishing characteristics that determine different fates for their transcripts, but a clear demarcation between the mechanisms of the two events has yet to be established. Exitron splicing is also seen in annotated intronless genes, suggesting a potential source of a novel splicing mechanism. This emerging gene regulatory mechanism and its role in proteome complexity and phenotypic diversity therefore need to be studied extensively. In this context, the integration of RNA sequencing datasets at the tissue or single-cell level with information on RNA-binding proteins could help elucidate the regulation of exitron retention and splicing.

Source of funding
The present work is financially supported by the Department of Biotechnology, Ministry of Science and Technology, Government of India (BT/PR42008/AGIII/103/1285/2021).
The research grant was awarded to SKS.

Figure 1. Different types of alternative splicing events.

Figure 2. Schematic representation of the genesis of exitrons and of exitron splicing events. (A) The exitron (shown in red) in the modern gene evolved through the loss of introns in the ancestral gene and became an integral part of the gene, having features of both protein-coding exons and introns. (B) Retained exitrons produce a full-length protein isoform, whereas spliced exitrons result in protein variants, namely, an internally deleted protein isoform, a downstream frameshift from the splice junction causing a C-terminal change in the protein, or NMD-sensitive transcripts.

Figure 3. Representation of the functions of exitrons and exitron splicing in plants and humans. Panel (A) depicts the role of exitrons and exitron splicing in plant stress response, trafficking, development and defense response. Panel (B) shows exitrons acting as a source of neoantigens and their role in tumour progression.

Figure 4. Model depicting the potential roles of exitron-containing proteins. The presence of domains, phosphorylation site(s) and localization signals in the exitron-encoded peptide can give additional functional diversity to the protein. Cysteine residues could be a potential source of additional disulphide bonds that enhance protein stability. Exitrons could also contribute to balanced protein turnover.