All exons are not created equal—exon vulnerability determines the effect of exonic mutations on splicing

Abstract It is now widely accepted that aberrant splicing of constitutive exons is often caused by mutations affecting cis-acting splicing regulatory elements (SREs), but there is a misconception that all exons have an equal dependency on SREs and thus a similar vulnerability to aberrant splicing. We demonstrate that some exons are more likely to be affected by exonic splicing mutations (ESMs) due to an inherent vulnerability, which is context dependent and influenced by the strength of exon definition. We have developed VulExMap, a tool which is based on empirical data that can designate whether a constitutive exon is vulnerable. Using VulExMap, we find that only 25% of all exons can be categorized as vulnerable, whereas two-thirds of 359 previously reported ESMs in 75 disease genes are located in vulnerable exons. Because VulExMap analysis is based on empirical data on splicing of exons in their endogenous context, it includes all features important in determining the vulnerability. We believe that VulExMap will be an important tool when assessing the effect of exonic mutations by pinpointing whether they are located in exons vulnerable to ESMs.


Introduction
With the increasing use of genome-wide sequencing, detection of variants is now widely implemented in routine diagnostics ( 1 ).Whereas interpretation of the effect of classical missense and nonsense mutations that directly affect the amino acid sequence of a protein appears straightforward, identification and characterization of exonic mutations that alter the splicing code ( 2 ,3 ) is still challenging.
Correct assessment of the impact that a mutation may have on the splicing code is vital for correct classification of variants, and numerous in silico tools have been developed to help in predicting this ( 4-7 ).These tools have been widely implemented diagnostically to enable clinicians to correctly call the pathogenicity of identified sequence variants and make more accurate decisions ( 8 ).Mutations located in the splice site regions at the terminal parts of exons are now often correctly recognized as splice altering rather than amino acid altering ( 9 ), but variants located outside of the canonical splice sites are often misclassified when based solely on their effect on the amino acid code ( 10 ).Several tools have been developed for analysis of a mutation's effect on the splicing regulatory elements (SREs) ( 11 ).These tools estimate the impact of a mutation, either through an analysis of known SRE sequences or, more recently, by incorporating machine learning to account for the large variety of features that may be impacted by the mutation ( 5 , 12 , 13 ).There are many challenges to accurately predict splicing mutations using such in silico tools, and often the predictions do not accurately or consistently predict the effect of a mutation so that this mimics the observed in vivo effect on splicing ( 14 ).One of the primary shortcomings of the current in silico tools is the lack of consideration of genomic context-dependent effects and the complex interplay between genomic context, splice site strengths and SREs, which is influenced by numerous parameters.Although empirical data from patient cells naturally include all information and therefore are superior to models, in vivo validation of splicing is often limited by the availability of patient tissue samples or requires generation of a suitable splicing reporter able to mimic the endogenous context accurately.
Importantly, it is usually assumed that all exons have an equal dependency on SREs and, consequently, that sequence variants that alter SREs are equally likely to cause aberrant splicing in all exons.Despite this, studies employing model exons demonstrate that some exons can be skipped by different mutations located across the entire exon, while others are only affected by mutations located in close proximity to the splice sites ( 15 ).Further, it has recently been reported that up to 77% of exonic mutations in MLH1 exon 10 and 60% of exonic mutations in BRCA2 exon 3 affect splicing ( 9 ,10 ), suggesting that the majority of exonic mutations affect splicing and that this is a very frequent disease mechanism.In sharp contrast to this, another study reported that only 10% of 4964 exonic disease-associated mutations altered splicing ( 16 ).It is at present unknown what causes these large differences in the reported proportion of exonic mutations that affect splicing of an exon and whether 10% or 70% should be expected when these findings are extrapolated to other exons.We have previously demonstrated that A C ADM exon 5 is particularly vulnerable to splicing mutations and that identical mutations in SREs have different effects dependent on the exon where they reside, as well as the genomic context ( 17 ).We hypothesize that such different vulnerability of exons may explain the widely different proportions of exonic splicing mutations (ESMs) observed in the studies above.Based on our observations from A C ADM , and other genes harboring particularly vulnerable exons, we speculated that this vulnerability can be revealed as a low degree of exon skipping from normal cells.Therefore, we developed VulExMap that designates whether an exon is vulnerable, based on junction counts from Snaptron GTEx data ( 18 ).We used VulExMap to show that, on a transcriptome-wide basis, vulnerability to ESMs differs between constitutive exons.Importantly, we collected 359 previously reported ESMs from 75 genes and demonstrate that ESMs are mainly located in the small proportion of exons predicted to be vulnerable by VulExMap.

Discovery of vulnerable exons in RNA sequencing data-VulExMap
The VulExMap tool is divided into tabs, each of which is a layer deeper into the process of analyzing the data.The first tab is for uploading data [for now it is only possible to upload a BED file containing single nucleotide polymorphisms (SNPs); all other files are pre-loaded files used when analyzing the data for this paper].The second tab displays all possible genes and allows you to choose one to plot, and the third is a graph of the gene showing all the exons and color coding them according to vulnerability.To identify vulnerable exons, we downloaded junction counts from the Snaptron server ( 18 ) belonging to the GTEx samples ( 19 ), and estimated inclusion levels using flattened gene models of the hg38 NCBI refGene annotation downloaded from UCSC ( 20 ).SAJR ( 21 ) was used to generate the gene models and, using its splice site annotation of gene segments, we computed inclusion estimates as outlined in Supplementary Figure S1 .To ensure accurate results, we excluded samples with insufficient coverage by considering only those with ≥30 combined inclusion and exclusion junction counts for each segment in the final inclusion estimates.Additionally, we limited downstream analyses to cassette exons with a mean junction read count of at least 50.We then calculated the coefficient of variation (CV) of the percent spliced in (PSI) for each cassette exon as the standard error of the mean (SEM) divided by the mean of the PSI across the samples.Next, we divided exons into bins based on the mean PSI so that we obtained bins of cassette exons with PSI in the ranges We then defined thresholds of minimum and maximum CV for vulnerable cassette exons using the median of the CV of cassette exons in the [85-90]PSI bin as the maximum and the median of the CV in the [99-99.5]PSIbin as the minimum.We then defined resilient exons to be exons that were robustly included in all sequenced samples, i.e.PSI > 99% and CV < minimum CV threshold, while exons that had a higher deviation of inclusion between samples resulting in a CV value between the two CV thresholds and a PSI > 85% were defined as vulnerable.All other cassette exons were categorized as alternative, unless their PSI was exactly 0, in which case we classed them as not spliced in.Exons with junction counts below the threshold were classified as NA.For the online webservice, we applied these classification criteria to all segments, including segments that are not cassette exons, but in the analyses herein we exclusively considered cassette exons as defined in the ref-Gene annotation.The online VulExMap webtool is available at https://vulexmap.compbio.sdu.dk .

Sequence analysis of resilient and vulnerable exons
The vulnerable, resilient and alternative segments for each dataset were compared by characteristics associated with exon definition.We used putative exonic splicing enhancer (PESE) and putative exonic splicing silencer (PESS) motif databases ( 22 ) to identify general ESE and ESS motif density.We used MaxEntScan ( 23 ) to obtain maximum entropy scores of the donor and acceptor sites as a measure of splice site strength.Up-and downstream splice sites were obtained using the immediate up-and downstream exons annotated in the hg38 refGene annotation table from UCSC ( 20 ).K-mer analysis was performed with jellyfish 2.2.6 ( 24 ).Multiple transcripts in some genes were resolved by using the transcript with the lowest identification number.

ESM database
To collect all previously reported ESMs, PubMed was searched using the search terms 'exonic splicing mutation' and 'exon skipping minigene assay', as well as in-house knowledge of ESM reports.All reported ESMs were then manually curated in ENSEMBL to the canonical refseq transcript.Mutations located in the first or last three bases of an exon were excluded from the database, as these were presumed to disrupt the splice site sequences.Furthermore, mutations that were reported to create a new splice site were also excluded.Finally, mutations in annotated alternative exons were also excluded from the database.All mutations in Supplementary Table S1 were, if necessary, reassigned correct HGVS nomenclature, and position on hg38 is included.The database was last updated on December 1, 2023.

Transfection of cells
HeLa cells were cultured in RPMI 1640 medium (Lonza, Copenhagen, Denmark) supplemented with 10% (v / v) fetal calf serum, 0.29 mg / ml glutamine, 100 U / ml penicillin and 0.1 mg / ml streptomycin at 37 • C in 5% (v / v) carbon dioxide.The cells were grown to ∼75% confluence in 12-well plates and transfected with 0.2 μg of minigene plasmid.Transfections were carried out in two biological replicates with duplicates.

RNA extraction and cDNA synthesis
After 48 h, the cells were lysed using 0.5 ml of Trizol® reagent (Fisher) per well.Total RNA was extracted using chloroform and precipitated with isopropanol.Complementary DNA (cDNA) was synthesized from 0.5 μg of RNA using the High capacity cDNA kit (Invitrogen).

PCR analysis of splicing
To investigate the mutations effect on splicing, plasmidspecific primers were used.In pSPL3 we used the primers SD6, 5 TCTGA GTCA CCTGGA CAA CC and SA2, 5 A TCTCAGTGGT A TTTGTGAGC.In the ACADM_ pcDNA minigenes, the primers were located in exon 4 forward primer, 5 CCTGGAACTTGGTTT AA TG; and exon 6-pcDNA reverse primer, 5 A GA CTCGA GTTA CTAATTAATTA CA CATC.The cDNA was amplified using TEMPase HOT Start DNA polymerase (Ampliqon).The polymerase chain reaction (PCR) program consisted of 15 min at 95 • C, followed by 32 cycles of 30 s at 95 • C, 30 s at 55 • C and 30 s at 72 • C, and then by 5 min at 72 • C.After PCR, the samples were visualized by agarose gel electrophoresis using SeaKem LE agarose (Lonza) and Gelred (Biotium).Exon skipping was quantified using the DNF-910 dsDNA analysis kit (35-1500 bp) on a Fragment analyzer (Advanced Analytical).

Vulnerable exons can be detected empirically with VulExMap
We previously defined A C ADM exon 5 as a vulnerable exon, because disruption of any of several SREs by diseaseassociated exonic point mutations can affect exon inclusion ( 17 ).Interestingly, it has previously been observed that A C ADM exon 5 is skipped in a small proportion of cDNA from patient and control cells ( 27 ,28 ).We noted that low levels of background skipping have also been reported in other genes, especially in exons with a high occurrence of ESMs (29)(30)(31)(32).We therefore hypothesized that low levels of skipping could be a general indicator of exon vulnerability to ESMs.To test this hypothesis, we developed VulExMap.This tool can identify vulnerable exons based on analysis of RNA-seq junction count data, allowing the creation of a map of vulnerable exons on a genomic scale, to help researchers evaluate if mutations that affect SREs are likely to cause exon skipping.In order to establish VulExMap, we used Snaptron GTEx junctions ( 18 ), because a large sample size is necessary in order to enable detection of the small, but significant levels of exon skipping that indicate exon vulnerability.Based on inclusion levels (PSI) and CV, VulExMap categorizes exons as either vulnerable, resilient or alternative (Figure 1 A).We decided to only include exon segments with a mean combined inclusion and exclusion junction count ≥ 50 and a minimum junction read count of 30 reads in VulExMap.This was done to be sure that samples with low expression of the segment are not included, thereby ensuring a correct CV.Additionally, we restricted the analysis to inclusion and exclusion reads from segments with sufficient coverage to ensure accurate PSI values for the given segment.If too few junctions are used to calculate the PSI for segments, those in genes with low expression would exhibit a lower PSI, even with very few exclusion reads.
To confirm that exons labeled as vulnerable are not tissuespecific alternative exons, we partitioned the PSI values for each segment across different tissues.Our analysis affirmed that the vast majority of exons with a PSI < 50% in one or two tissues were indeed classified as alternative ( Supplementary Figure S2 A), while more vulnerable exons had a PSI ≤ 85% in one or more tissues ( Supplementary Figure S2 B).Because we have set the lower PSI threshold for vulnerability at 85%, and use the median CV from 80% to 85%, VulExMap correctly classifies tissue-specific alternative exons as alternative rather than vulnerable.We developed an online web tool available at https://vulexmap.compbio.sdu.dk, where VulExMap can be used to analyze any gene of interest to detect vulnerable exons ( Supplementary Figure S3 ).
With these threshold settings, we find that on a global scale, 25% (44371) of exons with junction data in Snaptron ( 18) are vulnerable to ESMs (Figure 1 B).When we used VulExMap to analyze A C ADM (Figure 1 C) this supported that A C ADM exon 5 is in fact vulnerable.Interestingly, VulExMap also categorizes A C ADM exon 2 as vulnerable, and a silent c.85C > A mutation in this exon has been reported to cause disease (33)(34)(35)(36).To investigate this, we therefore cloned A C ADM exon 2 and flanking intronic sequences into a splicing reporter vector with either the wild-type (WT) sequence or the sequence containing mutations identified by newborn screening to cause medium-chain acyl-CoA dehydrogenase deficiency (MCADD) ( 33 ) (Figure 1 D).Consistent with the predicted vulnerability, we observed exon 2 skipping from several mutations in this exon, including the two c.85C > A and c.87A > G silent mutations.Analysis of lymphoblasts from a patient harboring the c.87A > G mutation as one of the disease alleles showed that this mutation does indeed cause exon 2 skipping also in the endogenous gene ( Supplementary Figure S4 ).As the 5 ss of exon 2 is weak (MaxEnt 1.75), we strengthened the score of the 5 ss (MaxEnt 10.86) in the splicing reporter to test if this would make exon 2 resilient and thereby less responsive to the mutations.When optimizing the 5 ss we observed complete exon inclusion from all mutations (Figure 1 E).
Interestingly, mutations altering the motif ATCGACA ( A C ADM cDNA position 83-89) also result in exon skipping in other genes, suggesting that this motif functions as an ESE.In the context of F9 exon 5, a c.484C > A mutation creates an identical TA GA C motif and causes complete exon skipping, but both c.484C > G and C > T have also been reported to partially skip F9 exon 5 ( 37 ).This pattern aligns with our observations from A C ADM exon 2, where the c.85C > A mutation induces more pronounced exon skipping than c.85C > T (Figure 1 D), suggesting that the C > A mutation's impact might stem from both disrupting an ESE and creating an ESS simultaneously.Supporting this notion, the c.840C > T mutation in SMN2 creates an identical TA GA CA hnRNPA1-binding ESS motif ( 26 ,38-40 ) to that generated by the C > A mutation in A C ADM exon 2 and F9 exon 5.This suggests that these mutations exert a dual effect, abrogating an ESE while establishing an ESS.
To validate the creation of the TA GA CA ESS, we performed RNA affinity pull-down using 15-mer biotinylated RNA oligonucleotides (Figure 1 F).As previously reported, the SMN1 CA GA CA ESE binds SRSF1, which is abolished in the SMN2 TA GA CA motif.Interestingly both SMN1 and SMN2 oligos bind hnRNPA1.Consistently, we observed similar effects on the A C ADM c.85C > A and F9 c.484C > A mutants, with the F9 mutant significantly increasing hnRNPA1 binding compared with the WT.
While VulExMap classification identifies A C ADM exon 2 and F9 exon 5 as vulnerable, SMN1 exon 7 is classified as resilient ( Supplementary Figure S5 ), underscoring the complex interplay of multiple conditions in determining exon resilience to splicing mutations.

Vulnerable exons share multiple characteristics with alternative exons
Because our previous experimental analysis of A C ADM ( 17 ) showed that vulnerability is determined by several factors, such as SREs, splice site strength and the genomic context, we compared exons which are classified as either vulnerable ( n = 44 371), resilient ( n = 110 212) or alternative ( n = 20 467).We first scored the SRE density in the three groups.We used the sample of 2069 PESEs and 974 PESSs from Zhang et al .( 41 ) (Figure 2 A, B).Interestingly, this showed that there are significantly fewer ESEs / bp in the vulnerable exons compared with the resilient exons ( P < 2.22e-16, Wilcoxon rank sum).Conversely, the vulnerable exons have a significantly higher ESS density than the resilient exons ( P < 2.22e-16).Analysis of GC content revealed that vulnerable exons also have a significantly lower GC content ( P < 0.0005) than resilient exons, which further supports the notion that exon definition is weaker for vulnerable exons (Figure 2 C), since GC content is directly linked with exon definition ( 42 ).The vulnerable exons were also shorter than the resilient exons (Figure 2 D) and were flanked by longer introns.
Analysis of splice site strength shows that the vulnerable exons have significantly weaker 3 ss and 5 ss than the resilient exons (Figure 2 F).This is very interesting, as it has previously been reported that exons with weak splice sites generally have a higher density of ESEs in order to achieve sufficiently strong exon definition ( 43 ), whereas we observe the opposite for the group of vulnerable exons, indicating that their vulnerability may in part be caused by a low density of ESEs, which is very close to being insufficient to compensate for the weaker splice sites.Furthermore, the proximal splice site of the exons that flank the vulnerable exons are stronger than the splice sites of the vulnerable exons, indicating that competition with the downstream and upstream splice sites is also important in defining vulnerability of an exon (Figure 2 G).
Because vulnerability of each individual exon is defined by several features each contributing with different weight to the overall vulnerability of that individual exon, there are exons in each group (vulnerable, resilient and alternative) with scores above or below the mean for a feature in another group.Therefore, we separated the different features into positive (PESE, GC content, exon length, splice site strength) and negative (PESS, flanking intron length, difference from flanking  exon splice site) characteristics associated with either exon inclusion or splicing repression.We then took the total mean for all exons in all groups and counted the density of exons in each group with a score above the mean of a certain feature.When grouping the density of the features, we observe that for the positive characteristics, the alternative exons had the highest density for few characteristics above the mean, whereas the resilient exons had the highest density for having more than two characteristics associated with exon inclusion (Figure 2 H).Conversely, the resilient exons had the highest density for having one or two characteristics associated with repression, whereas the vulnerable and alternative exons both had the highest density of negative features (Figure 2 I).This indicates that resilient exons are innately more strongly defined than vulnerable exons, but they may still contain few features normally associated with weak exons.Similarly vulnerable exons have fewer positive characteristics than resilient exons but are more strongly defined than alternative exons (Figure 2 J).

Exonic splicing mutations are over-represented in vulnerable exons
Based on our findings we speculated whether vulnerable exons are more sensitive to changes to the splicing code caused by ESMs.To investigate if mutations in vulnerable exons are more likely to cause exon skipping, we first manually searched the literature for ESMs that had been functionally validated to cause exon skipping (Figure 3 A).Exonic mutations in the splice sites, i.e. the first three and last three nucleotides of the exon, were excluded from the analysis, as these would typically result in exon skipping regardless of the vulnerability of the exon.Mutations that activate cryptic splice sites, as well as mutations in alternative exons, were also excluded, as we only wanted to establish the distribution of mutations affecting constitutively spliced exons.In total, 359 ESMs were included in our database, of which 340 are single nucleotide variants ( Supplementary Table S1 ).Next, we analyzed the distribution of these ESMs in the exons classified by VulExMap.Interestingly, 240 of 358 ESMs (67%) were located in vulnerable exons, whereas only 118 of 358 ESMs (33%) were located in resilient exons (Figure 3 B).Only one ESM could not be classified, due to the lack of USH2A expression in GTEx.In total there were 86 (54%) different vulnerable exons and 72 (46%) resilient exons in the ESM database (Figure 3 C), although vulnerable exons only make up 25% of all exons (19% in the genes in the database, Figure 3 D).There is a statistically significant difference ( P 4.933e-18) in the proportion of vulnerable exons between exons harboring ESMs (88 / 158 exons were vulnerable) and vulnerable exons in general within the same genes harboring ESMs (388 / 1893 exons were vulnerable).With a 2.66-fold relative difference and an odds ratio of 4.06 (2.91-5.66),this suggests that exonic mutations are almost three times more likely to affect splicing when located in a vulnerable exon than when located in a resilient exon.Due to this distribution, we wanted to know if vulnerable exons in general contained more mutations, and whether these were associated with aberrant splicing.When mapping the GTEx splicing quantitative trait loci (sQTLs) ( 19 ) and dbSNP156 variants ( 44 ) to exons classified by VulExMap, we observed that vulnerable exons had twice as many sQTLs / SNP as resilient exons (Figure 3 E; Supplementary Figure S6 ).Alternative exons had the highest number of both sQTLs and SNPs in general, which indicates a higher tolerance towards missense variants.Furthermore, vulnerable exons have slightly fewer synonymous variants than resilient exons (Figure 3 F), which may represent an evolutionary constraint against mutations, as silent mutations have a higher chance of causing exon skipping in vulnerable exons.To demonstrate this, we obtained minor allele frequency (MAF) data from GnomAD ( 45 ) and noted that synonymous SNPs in vulnerable exons exhibit a lower MAF compared with that of synonymous SNPs in resilient exons (Figure 3 G; Supplementary Figure S7 A).We did not observe any differences when comparing the MAF of missense and nonsense mutations between vulnerable and resilient exons, nor did we observe any difference when only analyzing common SNPs with an MAF ≥ 0.1 ( Supplementary Figure S7 B).These results demonstrate that synonymous SNPs are more likely to have a deleterious effect in vulnerable exons, making them less prevalent in the general population.
We have linked the ESM database with VulExMap, so that all ESMs from the database are automatically displayed in the corresponding exons at https://vulexmap.compbio.sdu.dk .Furthermore, VulExMap allows the user to upload their own bed6 file with mutations for quick visualization of the position of mutations in a gene of interest.

Exonic splicing mutations affect similar motifs in vulnerable exons
Although there are common motifs for the SREs in different exons, the consequences on splicing from abrogating identical SREs in different exons are not necessarily the same ( 17 ).To identify essential elements for exon definition, we performed K-mer analysis of enriched hexamers in the vulnerable and resilient group of exons.Interestingly, the most enriched hexamer in the resilient exons was the GAAGAA ESE (Figure 4 A), which we have previously demonstrated to be critical for inclusion of the vulnerable A C ADM exon 5 ( 17 ).Furthermore, we observe that the top hexamers in the resilient exons were all highly represented in the PESE database ( Supplementary Table S2 ).When comparing with the alternative exons, a wider dispersion of K-mer z-scores was observed ( Supplementary Figure S8 ).Although the Kmer analysis revealed that the GAAGAA ESE is the most enriched hexamer in the resilient exons, we observed that the ESM database ( Supplementary Table S1 ) contains 10 mutations that disrupt the GAAGAA motif located in vulnerable exons.The only example of a resilient exon being skipped by a mutation disrupting a GAAGAA motif is ATM exon 40 (NM_000051: c.5932G > T).The GAAGAA hexamer has previously been identified as an SRSF1-binding ESE ( 46 ), which is consistent with our previous work with the GAAGAA ESE in ACADM exon 5 ( 17 ).None of the ESMs created the GAAGAA motif.Furthermore, GCTGGG and CTGGGG hexamers were enriched in the vulnerable exons (Figure 4 A).The CTGGGG motif was created by three different ESMs whereas the GCTGGG motif was created by three ESMs and abolished by two ESMs.Most probably, these two hexamers are enriched because they harbor GGG triplet motifs, which are known to function as ESSs ( 47 ,48 ).In fact, 27 (8%) of the ESMs created triplet GGG motifs, whereas only three ESMs disrupted triple GGG motifs.In total, 17 ESMs resulted in creation of the TAGG ESS motif, whereas this motif was not disrupted by any ESM.To investigate if indeed all GAAGAA-disrupting mutations in the ESM database affect SRSF1 binding and that the mutations creating TAGG all affect hnRNPA1 binding, we performed SPRi analysis employing biotinylated RNA oligonucleotides, with either WT or mutant 15-mer sequences (Figure 4 B; Supplementary Table S3 ).For SRSF1 we observe a significant reduction of binding to the GAAGAA oligonucleotides, when the GAAGAA motif is disrupted ( P = 0.027, Wilcoxon signed rank).The only case where the mutation actually increases binding of SRSF1 is MFSD8 c.750A > G, where the mutation changes GAAGAA to GAGGAA, which is still considered a strong SRSF1-binding ESE according to DeepCLIP analysis ( 49 ) ( Supplementary Table S4 ).Our analysis showed that the exon skipping effect of this mutation is instead caused by a simultaneously increased binding of hnRNPA1 to an already strong ESS (Figure 4 B; Supplementary Tables S3 and  S4 ).In the only resilient exon with a GAAGAA-disrupting mutation, ATM exon 40 (NM_000051: c.5932G > T), we observe disruption of SRSF1 binding and a simultaneous increase in hnRNPA1 binding consistent with a dual effect from loss of an ESE and gain of an ESS (Figure 4 B; Supplementary Table S3 ).This indicates that the resilient ATM exon 40 requires a larger, dual change in exon definition in order to be skipped.Analysis of synonymous variants from dbSNP156 normalized to the total number of nucleotides in the resilient, vulnerable and alternative exons shows that vulnerable e x ons ha v e fe w er synon ymous SNPs than resilient e x ons, but more than alternativ e e x ons.( G ) T he a v erage MAF from gnomAD illustrates that synonymous variants in resilient exons have a higher MAF than vulnerable and alternative exons.The median MAF value is shown.
The TAGG-creating mutations all increase binding of hn-RNPA1 in the resilient exons, with the exception of PYGM c.1085G > A, where the TGGG to TAGG change instead increases binding of hnRNPA2 (R max increased from 135.4 to 186.9, Supplementary Table S3 ).Furthermore, the TAGG mutations created in the vulnerable exons do not all increase binding of hnRNPA1, but some instead increased binding of hnRNPA2 or hnRNPH ( Supplementary Table S3 ).The overlapping ESS motifs increase the likelihood of a mutation creating an ESS, while simultaneously disrupting an ESE.
To validate the importance of SRSF1 and hnRNPA1 as splicing regulators in the endogenous context, we mapped EN-CODE eCLIP data ( 50 ) from K562 cells across all the exons classified by VulExMap (Figure 4 C).Consistent with the PESE and PESS distribution (Figure 2A, B), we observed a higher distribution of SRSF1 eCLIP reads across the resilient exons, and the highest distribution of hnRNPA1 eCLIP reads across the alternative exons, with the vulnerable exons having fewer hnRNPA1 eCLIP reads than alternative exons, but more hn-RNPA1 eCLIP reads than both alternative and resilient exons in the flanking up-and downstream exons, underscoring the importance of context for exon vulnerability.
These data indicate that vulnerable exons are defined by a very finely tuned balance between positive and negative SREs, which can be easily disrupted by a point mutation that either abolishes an ESE or creates an ESS, whereas the balance in resilient exons is less fragile and requires stronger shifts, possibly due to the context in which the exon gains its resilience to aberrant splicing.

Vulnerability to splicing mutations is context dependent
We have previously shown that splicing of the vulnerable A C ADM exon 5 is strongly affected when inserted into a different context, whereas the resilient A C ADM exon 6 was not affected when inserted in a similar context ( 17 ).Due to the high frequency of mutated GAAGAA motifs in the ESM database, we chose to investigate the vulnerable BRCA2 exon 3 (NM_000059) and the resilient ATM exon 40 (NM_000051), which are both skipped by mutations disrupting a GAAGAA ESE.We chose two mutations from BRCA2 exon 3, namely the c.100G > A mutation, which disrupts SRSF1 and hnRNP A1 binding completely in the SPRi experiments ( Supplementary Table S3 ), and the c.145G > T mutation, which simultaneously disrupts SRSF1 binding and only partially decreases hnRNPA1 binding to a strong ESS in the SPRi data and increases hnRNPA1 according to DeepCLIP analysis ( Supplementary Table S4 ; Supplementary Figure S9 ).In the resilient ATM exon 40, only the c.5932G > T mutation has been reported to cause exon skipping by disrupting a GAAGAA ESE, which we demonstrate both disrupts SRSF1 binding and creates hnRNPA1 binding with SPRi (Figure 4 B; Supplementary Table S3 ) We therefore designed an artificial c.5935G > A mutation (GAAGAA to GAAAAAA), which is predicted not only to affect SRSF1 binding, but also to decrease hnRNPA1 binding when analyzed by Deep-CLIP ( Supplementary Table S4 ; Supplementary Figure S10 ).We used RNA affinity pull-down analysis to confirm these effects of the two BRCA2 and ATM mutations and observe that BRCA2 c.100G > A disrupted binding of both a positive (SRSF1) and a negative (hnRNPA1) binding factor, while the c.145G > T mutation disrupted SRSF1 binding but increased hnRNPA1 binding (Figure 5 A).Surprisingly the ATM c.5932G > T mutation only affected SRSF1 binding, while hn-RNPA1 binding was maintained, despite SPRi and DeepCLIP analysis indicating that this mutation should increase binding of hnRNPA1 (Figure 5 B; Supplementary Tables S3 and  S4 ; Supplementary Figure S10 ).As expected, the c.5935G > A mutation disrupted SRSF1 binding and decreased hnRNPA1 binding.Therefore, we hypothesize that BRCA2 c.145G > T and ATM c.5932G > T will have a more severe effect on splicing compared with the milder BRCA2 c.100G > A and ATM c.5935G > A mutations.
We cloned BRCA2 exon 3 and ATM exon 40 into the pSPL3 splicing reporter and introduced selected mutations.Next, we also subcloned the WT and mutated exons into the A C ADM minigene used in our previous study ( 17 ), substituting the vulnerable A C ADM exon 5. We inserted BRC A2 exon 3 and ATM exon 40 both into the WT A C ADM minigene context and into a context where the strength of the downstream 3 ss is increased from 4.77 (MaxEnt score) to 8.63 (MaxEnt score) in order to test the effect of increasing vulnerability by altering the context of the inserted exons (Figure 5C, E).Interestingly, we observed that when the vulnerable BRCA2 exon 3 is inserted into the pSPL3 splicing reporter, both the c.100G > A and the c.145G > T mutation cause some, but not complete, exon skipping.The c.145G > T mutation had the strongest effect on splicing, possibly due to the hnRNPA1-binding ESS, which is no longer inhibited by SRSF1 binding (Figure 5 D).Surprisingly, when we inserted the vulnerable BRCA2 exon 3 into the context of the vulnerable A C ADM exon 5, neither of the two mutations was able to cause exon skipping.When the downstream 3 ss was strengthened, the more severe c.145G > T mutation was able to cause a low degree of exon 3 skipping, but not to the same extent as in the pSPL3 splicing reporter, which has a lower 3 ss score and a longer downstream intron (Figure 5 D).This indicates that, whereas BRCA2 exon 3 may be vulnerable in its endogenous context and in the pSPL3 splicing reporter context, the flanking exons of A C ADM exon 5 do not provide sufficient competition, and the exon becomes resilient in the more favorable A C ADM exon 5 minigene context.
When we inserted the resilient ATM exon 40 into the pSPL3 splicing reporter, we surprisingly observed exon skipping from both the natural c.5932G > T mutation and the weaker artificial c.5935G > A mutation (Figure 5 F).As ex-pected, the c.5932G > T mutation had a much more severe effect on splicing, causing almost complete exon skipping, but the c.5935G > A mutation was also able to cause partial exon skipping.When we inserted ATM exon 40 into the A C ADM exon 5 minigene context, we observed that only the more severe c.5932G > T mutation was able to cause exon skipping, but when the downstream 3 ss was optimized, we again observed exon skipping from both mutations, with the strongest effect caused by the c.5932G > T mutation.This illustrates that the resilient ATM exon 40 is vulnerable in the pSPL3 context, resilient in the WT A C ADM exon 5 context and vulnerable again in the optimized A C ADM contexts.Aside from the downstream 3 ss strength, a major difference between the pSPL3 and A C ADM minigene is the length of the downstream intron.Interestingly, the downstream intron length is the only significant difference between the vulnerable and resilient exons in the ESM database ( Supplementary Figure S11 ), with vulnerable exons generally having longer downstream introns.
Taken together, these results suggest that although ATM exon 40 is scored as resilient in the endogenous context and BRCA2 exon 3 is scored as vulnerable in the endogenous context, they behave differently when inserted in the splicing reporters and minigenes.This underscores the importance of assessing exon vulnerability or resilience to splicing mutations in the endogenous context.Because VulExMap is based on empirical data on exon skipping levels from RNA-seq data from an endogenous context, we believe that it will be useful when assessing the effect of exonic splicing mutations.

Discussion
Based on our studies of the vulnerable exon 5 in A C ADM ( 17 ), we hypothesized that the inherent vulnerability to ESMs is different between exons and that this vulnerability is reflected in the observed low levels of basal exon skipping of an exon in the endogenous context in normal cells ( 27 ,28 ).Consistent with this hypothesis, it has also been reported from other genes, such as HRAS ( 48 ), BRCA2 ( 29 ,30 ) and CFTR ( 31 , 32 , 51 ), that some WT constitutive exons where ESMs have been reported are slightly less efficiently included in normal tissues.We therefore speculated whether this is a general feature of vulnerable exons, which could be used to pinpoint vulnerability to ESMs.Consequently, we designed VulExMap, which identifies low levels of exon skipping in RNA-seq data, and we demonstrate in the present study that this approach enables easy prediction of vulnerable exons.We previously also observed that A C ADM exon 2 is skipped in a small proportion of cDNA from patient and control cells ( 27 ,28 ), and consistent with this it was scored as vulnerable by VulExMap analysis.Here we confirm that several mutations in A C ADM exon 2 associated with MCADD affected splicing in a splicing reporter and that this mimicked ESM-based exon 2 skipping observed in patient cells (Figure 1 D; Supplementary Figure S4 ).We previously observed the phenomenon of low levels of exon skipping of HRAS exon 2 in normal cells, where an exonic mutation in a patient with Costello syndrome was predicted to cause the severe p.Gly12Val change, but instead functions as an ESM that causes a milder disease phenotype ( 48 ).VulExMap also confirmed that HRAS exon 2 is vulnerable ( Supplementary Figure S12 ; Supplementary Table S1 ).
With 45 reported ESMs, BRCA2 is the gene with by far the highest number of reported ESMs ( Supplementary Table S1 ).These are clustered in seven exons (exon 3, 5, 7, 12, 18, 19 and 23 NM_000059) of which all, except exon 23 (containing only one ESM), are predicted by VulExMap to be vulnerable (Figure 6 A).A recent study of 50 BRCA2 exon 3 variants revealed that 30 resulted in some degree of exon 3 skipping ( 30 ), and another study reported that 32 of 52 selected exonic variants in exon 17 and exon 18 caused exon skipping ( 29 ).
In CFTR , five of the six exons where ESMs have been reported (exon 3, 5, 10, 12, 13 and 14, NM_000492) are vulner-able according to VulExMap (Figure 6 B) and only one (exon 5) is resilient.Consistent with this, low levels of exon skipping from normal cells have been reported from CFTR exons 10, 13 and 14 ( 31 , 52 , 53 ).ESMs have been reported based on initial analysis of patient mRNA in the predicted vulnerable CFTR exons 3, 10, 12, 13 and 14, whereas the two reported ESMs in the resilient exon 5 have only been analyzed in a hybrid minigene ( 32 ), where the strength of the upstream 5 ss was increased.This would alter the vulnerability of CFTR exon 5 in the reporter and could thereby lead to artificially high exon skipping from the two reported ESMs.We show in the present study that vulnerability of BRCA2 exon 3 and ATM exon 40 to ESMs is context specific, and we have previously shown that the splice sites of neighboring exons and intron size are important in defining vulnerability by analyzing splicing of WT and mutant A C ADM exon 5 and exon 6 in different contexts ( 17 ).Consistent with our studies, Hefferon and co-workers ( 54 ), who analyzed CFTR exon 10 splicing kinetics, showed that strengthening of the upstream (exon 9) 5 ss also weakens definition of the downstream exon and therefore increases vulnerability.Moreover, a recent study ( 55 ) also demonstrated that the strength of the upstream 5 ss has a strong effect on inclusion of the downstream exons employing a massive parallel splicing assay.This is consistent with our observations on the splicing patterns observed when BRCA2 exon 3 and ATM exon 40 are inserted into different contexts and suggests that using splicing reporters, such as the pSPL3 vector, may not always accurately mimic the effect a mutation may have on splicing.
At present, exons in a disease gene are considered equally likely to be affected by ESMs, and the proportion of exonic mutations that cause exon skipping by affecting SREs has been reported to be as high as 77% in MLH1 exon 10 and 60% in BRCA2 exon 3 ( 30 ,56 ).Our analysis with VulExMap demonstrates that both MLH1 exon 10 and BRCA2 exon 3 are vulnerable ( Supplementary Table S1 ) and therefore more sensitive to ESMs.On the other hand, Soemedi et al .tested 4964 exonic disease-associated mutations in a fixed genomic context employing a splicing reporter and found that only 10% of the mutations altered splicing ( 16 ).However, the splicing reporter in their assay consisted of a design with an atypical context, where only exons ≤ 100 bp was used and the flanking introns were only ∼200-300 bp long ( Supplementary Figure S13 ).This is far below the mean length of exons and flanking introns (mean exon length is ∼150 bp, mean upstream intron length is ∼7000 bp and mean downstream intron length is ∼6000 bp) and may have resulted in a very efficient splicing reporter, which is only sensitive to mutations that cause a major alteration to the splicing code.Furthermore, as we have observed that downstream intron length is directly correlated with exon vulnerability, the very short downstream intron may increase the splicing efficiency, and thus the resilience of the investigated exons.On average, the group of vulnerable exons identified by VulExMap are more likely to have a weaker 3 ss than that of the flanking downstream exon (Figure 2 F, G).This may contribute to explain why these exons are particularly vulnerable to ESMs as it is well documented that the cooperation and competition of the up-and downstream splice sites influences the efficiency of exon inclusion (57)(58)(59).This further underscores that when testing the effects of exonic mutations on splicing by employing minigenes, it is imperative that these very closely reflect the endogenous context.
Although the vast majority (68%) of reported ESMs are located in vulnerable exons there is, as mentioned above, also a proportion of ESMs which are located in resilient exons ( Supplementary Table S1 ).Although some of these ESMs may have been misclassified because exon skipping has only been observed by minigene analysis and not in the endogenous context, it is also apparent that disrupted splicing is not simply an all or none process.Our previous analysis of A C ADM exon 5 mutations ( 17 ( 37 ).This reflects that the relative importance of the affected SRE is determined by exon vulnerability, but also that the severity of the change that the mutation imposes on the SRE is important [i.e.whether skipping can be caused by creating an ESS or by disrupting an ESE (single change to the splicing code), or if the ESM simultaneously needs to create an ESS and disrupt an ESE (dual change to the splicing code)] (Figure 7 ).Our PESE and PESE analysis (Figure 2A, B) revealed that resilient exons are more strongly defined, due to the over-representation of PESEs and under-representation of PESSs compared with vulnerable exons.This is consistent with recent findings, where it was demonstrated that alternative exons are enriched with suboptimal ESEs, that are only one mutation away from becoming an ESS ( 15 ).Furthermore, due to the multi-faceted capacities of some splicing factors, predicting the outcome from loss of binding of a specific factor is a complex matter.Notably, the SPRi analysis revealed that the GAAGAA-disrupting mutations within vulnerable exons led to diminished hnRNPA1 binding, even though these mutations caused exon skipping (Figure 4 B).This highlights the intricate interplay of splicing factors, where a disruption in both inhibitory and stimulatory elements can culminate in abnormal splicing patterns within vulnerable exons.Moreover, the reduction in hnRNPA1 binding might also have different implications for splicing by potentially altering U2AF's capability to engage with decoy 3 splice sites ( 61 ).
As shown in the initial design of VulExMap ( Supplementary Figure S2 ), low occurrence of tissue-specific alternative exons may result in a few exons that are not classified correctly.Due to the heterogeneous expression of several splicing regulatory factors across tissues ( 62 ), there is also a risk that some exons are vulnerable only in certain tissues.In a recent study, an extensive splicing analysis was conducted of 97 mutations in F8 from patients with hemophilia A ( 63 ).
Here the authors show that when inserted into a heterogeneous splicing reporter, F8 exons 9, 13, 15, 17, 22, 24 and 25 were all resilient to mutation-induced aberrant splicing.VulExMap correctly classifies all of these exons except exon 22 as resilient ( Supplementary Figure S14 A).The authors also report that F8 exons 7, 11, 16 and 18 are affected by splicing mutations, and show that F8 exon 16 is vulnerable to splicing mutations.Despite this, VulExMap only classifies F8 exon 7 as vulnerable.We do observe, however, that F8 has a very heterogenous expression across the tissues included in GTEx, and that F8 exon 16 is classified as vulnerable in 5 out of 27 tissues in GTEx ( Supplementary Figure S14 B).Moreover, VulExMap uses annotated junctions, but considers each cassette exon as isolated and does not specifically account for use of alternative splice sites in the vicinity, or of intron retention.Therefore, some vulnerable exons may appear resilient due to more complicated alternative splicing patterns, as for instance F8 exon 16, which contains an alternative 3 ss with a new and complex regulatory mechanism, as noted by Tse and colleagues ( 63 ).
Previous studies investigating the predictive power of in silico tools, such as tESRseq ( 64) and ( 5 ), found that the accuracy of these tools varies for different genes and exons ( 56 ) because they simply score if an altered motif results from the mutation, and this is more likely to affect a vulnerable exon.Consistent with our findings, Canson et al .recently reported that these tools have the highest accuracy in BRCA1 exon 6, BRCA2 exon 7, CFTR exon 12, MLH1 exon 10 and NF1 exon 37 ( 65 ), which are all vulnerable according to VulExMap analysis ( Supplementary Table S1 ).Recent studies have implemented new decision pipelines for determining the pathogenicity of exonic variants ( 30 ,66 ), and we suggest that knowledge of vulnerability should also be included to aid such classification.While tools such as HexoSplice ( 13 ) and EX-SKIP ( 4 ) were able to predict a splicing change in the majority of the mutations in the ESM database, there was no significant difference in the predicted scores between resilient and vulnerable exons ( Supplementary Figure S15 ).More recently, the advanced SpliceAI tool ( 67) is a frequently used tool for predicting splice mutations ( 68 ).However, while SpliceAI excels in evaluating splice site mutations, precisely gauging the impact of ESMs is more challenging.When evaluating the mutations in the ESM database using SpliceAI's scoring system, 257 out of 359 mutations were predicted to alter either a donor or an acceptor site by < 0.2, which is the software's lowest threshold ( Supplementary Table S5 ; Supplementary Figure S16 ).This underscores the complexity of predicting splicing mutations that do not directly alter splice site sequences.Notably, our exclusion of mutations that generate cryptic splice sites further reduces the utility of SpliceAI for ESM evaluation in our database.This is consistent with recent findings where SpliceAI or similar tools were used to evaluate mutations located outside of the splice sites ( 69 ).
In summary, we have developed VulExMap which, based on empirical data, classifies the vulnerability of an exon when present in its endogenous genomic context.We used it to demonstrate that, on a transcriptome-wide basis, vulnerability to ESMs differs between constitutive exons.Importantly, VulExMap provides a simplistic and user-friendly way to evaluate whether a mutation is located in a vulnerable exon and therefore has a several fold higher risk of causing effects on splicing.We thus believe that VulExMap will be useful in future genetic diagnosis.

Figure 1 .
Figure 1.VulExMap re v eals vulnerable constitutiv e e x ons.( A ) L ogarithm of the CV of binned internal e x ons with sufficient reads.Ex ons w ere binned into groups based on mean PSI le v el, and CV thresholds were established as the median CV value of the exons with PSI in the ranges of 80-85% and 99-99.5% (indicated in dashed vertical lines).( B ) Distribution of segments from GTEx classified by VulExMap.The segments were classified based on PSI and CV, and only included if the segments had > 50 junctions co v ering the e x on and a minimum junction read count of 30 reads.Exons with a PSI > 99.5 were classified as resilient, exons with a PSI < 99.5 and > 85 were classified as vulnerable, and exons with a PSI < 85 were classified as alternative.If an exon had a PSI = 0, it was not spliced in, and if the exon was not extensively included, if was termed NA. ( C ) VulExMap of ACADM re v eals three vulnerable e x ons.T he primary protein-coding refseq transcript NM_0 0 0 016.1 is sho wn directly belo w the gene model.( D ) pSPL3 splicing reporter with the vulnerable ACADM e x on 2 was used to investigate mutations from patients with medium-chain acyl-CoA dehydrogenase (MCAD) deficiency.Real-time PCR (RT-PCR) was carried out on cDNA from HeLa cells transfected with the plasmids.PCR products were quantified by fragment analyzer, n = 4. ( E ) An optimized 5 ss was introduced to some of the vectors which was transfected as above.( F ) RNA affinity pull-down of 15-mer biotinylated RNA oligonucleotides with either wild-type (WT) or mutant ACADM c.85C > A, SMN1 / SMN2 c.840C > T or F9 c.484C > T. Proteins were analyz ed b y SDS-PAGE and w estern blotting against SR SF1, hnRNPA1 and hnRNPL (binding control).Binding w as quantified with ImageJ and normalized to WT oligonucleotides, n = 2.

Figure 2 .
Figure 2. Resilient exons possess clearer definitions compared with vulnerable exons.From the GTEx RNA-seq data, segments that were identified as vulnerable ( n = 52 315), resilient ( n = 109 460) or alternative (n = 26 091) were included for comparison.Only internal exons were included.(A-G) In each group, PESEs / 100 bp ( A ), PESSs / 100 bp ( B ) and GC content ( C ) were calculated.To determine context, we also compared exon length ( D ), flanking intron length ( E ) and splice site strength ( F ).To acquire the difference between the flanking up-and downstream splice sites, we took the 3 ss score of the downstream 3 ss and subtracted the 3 ss score of the e x ons in each group, or the 5 ss score of the upstream 5 ss and subtracted the 5 ss score ( G ). Wilco x on rank sum test was performed between the three groups.**** P ≤ 0.0 0 0 05, *** P ≤ 0.0 0 05, ** P ≤ 0.005, ns P > 0.05.( H and I ) Density of features associated with e x on definition (H) and e x on repression (I).The mean score of each feature abo v e w as calculated, and the e x ons with a score abo v e the total mean were grouped by whether they had 0, 1, 2, 3, 4 or 5 different feature scores abo v e the total mean.( J ) Graphical representation of the many factors that can affect exon definition.Resilient exons are, in general, long, with a high density of ESEs and a low density of ESSs, strong splice sites compared with the flanking e x ons' splice sites, and shorter introns.Vulnerable e x ons are w eak er on multiple parameters than resilient e x ons, but more defined than alternative exons.

Figure 3 .
Figure 3. Vulnerable e x ons are enriched with ESMs and predisposed to skipping.( A ) The ESM database was generated by manually searching PubMed using the search terms 'e x onic splicing mutation' and 'e x on skipping minigene assa y'.Mutations w ere assigned correct HGMD nomenclature.Mutations in alternative exons, the first and last 3 bp of an exon or mutations that created new splice sites were excluded from the database.All mutations in the ESM database were uploaded to VulExMap.Figure made with BioRender.( B ) Distribution of 359 ESMs in vulnerable and resilient e x ons, from 75 genes.VulExMap was used to classify the exons containing the ESMs from Supplementary Table S1 .If the gene had a mean segment count < 50, then those ESMs could not be classified = NA (only USH2A could not be classified due to low expression in GTEx).( C ) Classification of exons with ESMs, eliminating any bias for multiple ESMs in the same exon, still show an over-representation of ESMs in vulnerable exons.( D ) Distribution of all exons in the 75 genes from the ESM database that could be classified according to the pre-determined threshold values.( E ) Distribution of all dbSNP156variants, outside of the first and last three bases of the exon, in GTEx sQTLs shows that vulnerable exons have more SNPs associated with splicing.( F ) Analysis of synonymous variants from dbSNP156 normalized to the total number of nucleotides in the resilient, vulnerable and alternative exons shows that vulnerable e x ons ha v e fe w er synon ymous SNPs than resilient e x ons, but more than alternativ e e x ons.( G ) T he a v erage MAF from gnomAD illustrates that synonymous variants in resilient exons have a higher MAF than vulnerable and alternative exons.The median MAF value is shown.

Figure 4 .
Figure 4. Overlapping motifs indicate exon vulnerability.(A ) K-mer analysis (jellyfish) was carried out on the sequences of the exons that were either vulnerable or resilient, and differentially enriched hexamers were compared between the two groups.Shown are the top enriched hexamers in the vulnerable (blue) and resilient (green) e x ons.Significantly enriched k-mers were identified using a proportion test and adjusting for multiple testing with Bonferroni.( B ) Eleven mutations affecting the GAAGAA motif (10 in vulnerable exons and 1 in a resilient exon) and 17 mutations creating a TAGG ESS motif (10 in vulnerable e x ons and 7 in resilient e x ons) w ere introduced in 15-mer biotin ylated RNA oligonucleotides, and the WT and mutant (MUT) sequences were analyzed with SPRi using recombinant SRSF1 and hnRNPA1.Binding is shown as log2 (R max ).Red lines indicate the two mutations in ATM and MFSD8 .Wilco x on paired test was performed between WT and MUT binding scores.* P ≤ 0.05, ns P > 0.05.( C ) ENCODE eCLIP data from K562 cells against SRSF1 and hnRNPA1 were plotted across the last 50 bp of the upstream exon, the first and last 300 bp of the upstream intron, the first and last 50 bp of the resilient vulnerable and control e x ons, the first and last 300 bp of the downstream intron and the first 50 bp of the downstream e x on.Control is the mean binding of all e x ons in the three groups.

Figure 5 .
Figure 5. Exonic context determines splicing outcome in vectors.( A ) Two mutations in the vulnerable BRCA2 exon 3 were selected, and the effect the mutations had on protein binding was verified with RNA-affinity pulldown with nuclear extract.hnRNPL was included as a binding control.The c.10 0G > A mut ation is considered a mild mut ation, and the c.145G > T mut ation is se v ere.Binding w as quantified with ImageJ and normaliz ed to WT oligonucleotides, n = 2. ( B ) In the resilient ATM e x on 40, only one splicing mutation, c.5932G > T, has been reported .A second mutation at c.5935G > A was included, as this was predicted to be less severe than c.5932G > T ( Supplementary Figure S5 ).Pull-down with nuclear extract confirmed that c.5932G > T w as se v ere and c.5935G > A w as mild.Binding w as quantified with ImageJ and normaliz ed to WT oligonucleotides, n = 2. ( C ) The vulnerable BR CA2 e x on 3 with WT, c.100G > A or c.145G > T w as introduced into three different v ectors, the pSPL3 splicing reporter or the ACADM e x on 5 minigene from ( 17 ) with either a WT or optimized (OPT) downstream 3 ss.Also shown is the context in endogenous BRCA2.( D ) The splicing reporters with BR CA2 e x on 3 w ere transfected into HeLa cells, and splicing w as e v aluated with R T-PCR.( E ) T he resilient ATM e x on 40 with WT, c.5932G > T or c.5935G > A was introduced into the same splicing reporters as BRCA2 exon 3. Also shown is the context of endogenous ATM. ( F ) The splicing reporters with ATM e x on 40 w ere transfected into HeLa cells, and splicing w as e v aluated with R T-PCR.PCR products w ere quantified b y fragment analyz er, n = 4.

Figure 6 .
Figure 6.VulExMap shows enrichment of ESMs in known vulnerable exons in BRCA2 and CFTR .( A ) VulExMap of BRCA2 with GTEx data.Black lines indicate ESMs: nine in the vulnerable e x on 3 (a), five in the vulnerable exon 5 (b), 11 in the vulnerable exon 7 (c), eight in the vulnerable exon 12 (d), nine in the vulnerable e x on 18 (e), one in the vulnerable e x on 19 (f) and one in the resilient e x on 24 (g).( B ) VulExMap of CFTR with GTEx data.Black lines indicate ESMs: five in the vulnerable exon 3 (a), two in the resilient exon 5 (b), six in the vulnerable exon 10 (c), one in the vulnerable exon 12 (d), two in the vulnerable e x on 13 (e) and two in the vulnerable exon 14 (f).

Figure 7 .
Figure 7. Mutations creating the same motif do not alw a y s result in aberrant splicing.Model of single and dual change to the splicing code.When a mutation disrupts an ESE or creates an ESS, the potential ESS elements across the vulnerable e x on will attract a negative splicing factor and cause e x on skipping by displacing the positive splicing factors.Resilient exons have a higher density of ESEs, as well as stronger splice sites, and are therefore better suited to handle single changes to the splicing code.A mutation that only disrupts an ESE or creates an ESS will often not be sufficient to cause e x on skipping.Only if a mutation se v erely alters the splicing code, by both disrupting an ESE and creating an ESS (dual change), will the balance be shifted sufficiently to cause skipping of the e x on.Figure made with BioRender.
), as well as the present analysis of mutations in BRCA2 exon 3 and ATM exon 40 indicate that the effect a mutation has on splicing is highly dependent both on the pre-existing balance between ESEs and ESSs in the affected exon and on the alteration (ESE loss or ESS gain versus ESE loss + ESS gain) to the splicing code caused by the mutation.The TA GA CA hnRNPA1-binding motif induces exon skipping in the vulnerable exons A C ADM exon 2, F9 exon 5 and BRCA2 exon 12, and in the resilient SMN1 exon 7 by disrupting SRSF1 binding to the CA GA CA motif (Figure 1 F).Previous studies highlight that only the C > T mutation causes complete SMN1 exon 7 skipping, with no effect from C > G and only a minor impact from C > A ( 60 ).In contrast, the vulnerable A C ADM exon 2 is skipped by both C > A and C > T, and in the vulnerable F9 exon 5 C > A, C > T and C > G all cause exon skipping