Stargardt disease-associated missense and synonymous ABCA4 variants result in aberrant splicing

Abstract
Missense variants in ABCA4 constitute ~50% of causal variants in Stargardt disease (STGD1). Their pathogenicity is attributed to their direct effect on protein function, whilst their potential to disrupt pre-mRNA splicing remains poorly understood. Interestingly, synonymous ABCA4 variants have previously been classified as ‘severe’ variants based on in silico analyses. Here, we systematically investigated the role of synonymous and missense variants in ABCA4 splicing by combining computational predictions and experimental assays. To identify variants of interest, we used SpliceAI to assign splicing-defect predictions to the variants found in a dataset of 5579 biallelic STGD1 probands. We selected those variants with predicted delta scores for acceptor/donor gain > 0.20 and no previous reports on their effect on splicing. Fifteen ABCA4 variants were selected, 4 of which were predicted to create a new splice acceptor site and 11 to create a new splice donor site. In addition, three variants of interest with delta scores < 0.20 were included. The variants were introduced into wild-type midigenes that contained 4–12 kb of ABCA4 genomic sequence, which were subsequently expressed in HEK293T cells. By using RT-PCR and Sanger sequencing, we identified splice aberrations for 16 of 18 analyzed variants. SpliceAI correctly predicted the outcomes for 15 out of 18 variants, illustrating its reliability in predicting the impact of coding ABCA4 variants on splicing. Our findings highlight a causal role for coding ABCA4 variants in splicing aberrations, improving the severity assessment of missense and synonymous ABCA4 variants and guiding new treatment strategies for STGD1.


Introduction
Biallelic ABCA4 variants impair the function of the ATP-binding cassette subfamily A member 4 protein (ABCA4) and are responsible for Stargardt disease type 1 (STGD1) and related retinopathies (1). Almost 2400 distinct disease-associated variants in the ABCA4 gene have been discovered (www.lovd.nl/ABCA4). Located in the rod photoreceptor disc membranes and the cone photoreceptor cell membrane, ABCA4 acts as a flippase of N-retinylidene-phosphatidylethanolamine (N-Ret-PE), the Schiff-base adduct of retinal and phosphatidylethanolamine formed during the visual cycle (2). Its dysfunction leads to accumulation of di-retinoid compounds in photoreceptor cells, with consequent visual impairment of varying severity (3)(4)(5)(6). STGD1 probands display very different phenotypes depending on the allele combination, from early-onset STGD1 or panretinal cone-rod dystrophy because of two severe alleles, to late-onset STGD1 characterized by foveal sparing because of p.(Asn1868Ile) or p.(Gly1961Glu) in trans with a severe allele (7). Cases of low-penetrance alleles, the discovery of sex imbalance for some combinations of ABCA4 variants, and putative trans-modifiers in PRPH2 and ROM1 strongly suggest a multifactorial or polygenic inheritance of disease for a subset of STGD1 cases (8)(9)(10). Therefore, to provide a prognosis of disease progression that is as accurate as possible, it is important to assess the severity of ABCA4 variants.
Missense variants in ABCA4 are the major underlying cause of STGD1, accounting for 48% of all unique and 62% of all disease-associated variants in STGD1 probands (11). Comprehensive in vitro functional analyses of missense variants determined their pathogenic effect across all structural domains in ABCA4, where most of the variants were predicted to either impair protein folding and trafficking or severely reduce its ATPase activity (12)(13)(14)(15)(16). So far, the effect of missense variants in ABCA4 has been determined solely on the basis of the change in the amino acid sequence, whilst evidence of their possible effect on splicing is lacking. Based on extensive in silico studies by Cornelis et al. (11), the severity prediction of ABCA4 variants was carried out for variants identified in 5579 biallelic STGD1 probands. In that analysis, seven synonymous variants were classified as 'severe' or 'moderately severe,' without experimental proof. This observation led to the hypothesis that disease-associated coding variants in ABCA4, either nonsynonymous or synonymous, might have disruptive effects on splicing.
In this study, we report causality for synonymous and missense variants in ABCA4 by attributing novel missplicing events to them. By combining the splice prediction tool SpliceAI (17) and in vitro midigene assays, we were able to shed additional light on the pathogenicity of previously classified severe STGD1 variants and to attribute new severity scores to others. Our findings have significant implications for a more accurate prognosis of disease progression and the development of appropriate therapeutic strategies targeting the (splice) coding ABCA4 variants in individuals with STGD1.

Selection of coding variants in ABCA4 based on SpliceAI predictions
We set out to identify missense and synonymous variants in ABCA4 that might affect splicing by assigning SpliceAI delta scores to all variants previously identified by Cornelis et al. (11). Specifically, that study assessed all ABCA4 variants detected in 5579 biallelic STGD1 probands and categorized them based on their severity. Variants that were of particular interest were those classified as moderately severe or severe, as their effect may not solely be based on an amino acid change. Also, missense variants that were previously classified as variants of uncertain significance and those that could not be categorized because of limited data were included with the intent of assigning them a severity score based on their impact on splicing (11). Importantly, we omitted the variants located at the first, second, penultimate and last position within exons under the assumption that these are very likely to affect splicing. The computational analysis prioritized those variants that displayed a delta score (DS) > 0.20 for splice acceptor gain (AG) or donor gain (DG). This threshold, regarded as the 'high recall' threshold by the developers (17), showed reliability in previous computational analyses of ABCA4 deep-intronic and non-canonical splice site variants (18). Here, we hypothesized that DS > 0.20 likely predicts the formation of new splice sites caused by coding missense and synonymous ABCA4 variants, which will compete with the canonical splice acceptor site (SAS) or splice donor site (SDS). As a result, 15 variants of interest were identified, with 4 variants having an AG DS > 0.20 and 11 with a DG DS > 0.20. Among these variants, only c.3407G>T was attributed both a DG DS and an AG DS > 0.20. Since the DG DS was higher than the AG DS (0.95 > 0.30), this variant was assigned to the DG group. We added three other variants to the analysis even though their AG DSs were < 0.20. Two of these variants are located near a so-called dual SAS/SDS in exon 30 (c.4454C>T and c.4457C>T), and thus are likely to affect splicing. Additionally, one variant, c.4203C>A, was previously classified as benign based on ACMG/AMP criteria and served as a negative control for the splicing analysis (19). The chosen AG and DG variants are presented in Tables 1 and 2, respectively, while the complete overview of attributed SpliceAI predictions is shown in Supplementary Material, Table S1.
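The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholding and grouping logic, not the pipeline used in the study: the `Variant` class and `assign_group` function are hypothetical names, the tie-breaking behaviour for equal scores is an assumption, and scores not quoted in the text are set to 0.0 as placeholders.

```python
from dataclasses import dataclass
from typing import Optional

DS_THRESHOLD = 0.20  # 'high recall' cut-off quoted in the text


@dataclass
class Variant:
    name: str   # cDNA notation, e.g. "c.3407G>T"
    ag: float   # SpliceAI delta score for acceptor gain
    dg: float   # SpliceAI delta score for donor gain


def assign_group(v: Variant, threshold: float = DS_THRESHOLD) -> Optional[str]:
    """Return 'AG' or 'DG' when a gain score exceeds the threshold, else None.

    When both scores pass, the higher one decides the group, as was done
    for c.3407G>T. Equal scores fall into 'AG' (an arbitrary assumption).
    """
    if max(v.ag, v.dg) <= threshold:
        return None
    return "DG" if v.dg > v.ag else "AG"


if __name__ == "__main__":
    examples = [
        Variant("c.3407G>T", ag=0.30, dg=0.95),  # both scores from the text
        Variant("c.1977G>A", ag=0.99, dg=0.0),   # dg is a placeholder
        Variant("c.4454C>T", ag=0.14, dg=0.0),   # dg is a placeholder
    ]
    for v in examples:
        print(v.name, assign_group(v))
```

Variants that return `None` here, such as c.4454C>T, were included in the study for other reasons (proximity to the dual splice site in exon 30, or use as a negative control), so a threshold filter alone does not reproduce the final list of 18.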

ABCA4 variants leading to alternative splice acceptor sites
Seven variants were selected based on their AG DS and introduced into previously described wild-type (WT) BA constructs, ensuring that the exon containing the variant was flanked by at least one exon and the adjacent intronic sequences (20). The details of the WT constructs used can be found in Figure 1 and Supplementary Material, Table S2. Despite the use of a high-fidelity DNA polymerase, we observed polymorphisms (Supplementary Material, Table S3) introduced upon mutagenesis. However, they are not predicted to alter pre-mRNA splicing.
As depicted in Figure 2, nearly all the investigated coding variants led to the expected misspliced ABCA4 RNAs, apart from c.4203C>A and c.5367C>G, which did not alter splicing when compared with the corresponding WT control (Supplementary Material, Figs S1 and S2, respectively). Variant c.1977G>A was predicted to severely affect the SAS of exon 14. In fact, we identified the expected isoform with an alternative SAS 41 nt downstream of the original SAS, together with the ABCA4 transcript lacking both the 41 nt and the complete exon 15. These events led to predicted premature stop codons, resulting in p.Phe647Alafs*105 and p.Phe647Alafs*73, respectively. No WT ABCA4 mRNA was identified for this variant (Fig. 2A and Supplementary Material, Fig. S3).
Variant c.3703A>G led to a transcript missing 96 nt at the 5′ end of exon 25, as predicted by SpliceAI. The WT isoform was the only one identified in the WT control plasmid, as opposed to the c.3703A>G sample, where it was present only in traces (Fig. 2B and Supplementary Material, Fig. S4).
Exon 30 variants c.4454C>T and c.4457C>T weakened the predicted strength of the canonical SAS of exon 30 and generated an alternative SAS 114 nt downstream of the canonical SAS. This aberration was found exclusively in combination with the complete skip of exon 29. All samples displayed the WT RNA as the predominant isoform, and small amounts of an ABCA4 transcript lacking exon 29 (Fig. 2C and Supplementary Material, Fig. S5).
Variant c.5088C>G was anticipated to impose the formation of a new SAS 66 nt downstream from the original SAS. However, the isoform identified upon RT-PCR, as shown in Figure 2D and Supplementary Material, Figure S6, lacked 69 nt from the 5′ end of exon 36. This in-frame deletion was observed in only 15.0 ± 0.5% of total ABCA4 RNA. The WT ABCA4 transcript and a transcript lacking exon 35 and the first 69 nt of exon 36 were detected in both mutant and WT midigenes.

ABCA4 variants resulting in alternative splice donor sites
The 11 ABCA4 variants that were predicted to create a new SDS, and thus compete with the canonical SDS, were introduced into the appropriate WT constructs (Fig. 1 and Supplementary Material, Table S2) (20).
Variant c.1022A>T, as predicted by SpliceAI, resulted in the deletion of the last 83 nt of exon 8 (fragment 2 in Fig. 3A and Supplementary Material, Fig. S7).This resulted in a frameshift   was identified in all mutant midigenes and the control WT sample (20), either as a single event (fragment 2, p.His721_Val794del), or combined with the predicted SDS disruption (fragment 4, p.Met710_Val794del).
For the mutant c.3096A>T BA15 construct, the RT-PCR analysis revealed the presence of two shortened ABCA4 RNA products, one missing the last 96 nt of exon 21 (fragment 2, p.Gly1032_Ser1063del), and the other lacking the complete exon 21 (fragment 3, p.His1017Glnfs*111) (Fig. 3C and Supplementary Material, Fig. S9). The creation of the alternative SDS in exon 21 (fragment 2) was strongly predicted by SpliceAI. The WT BA15 sample yielded only the WT ABCA4 transcript (Fig. 3C).
The RT-PCR of samples transfected with c.3407G>T and c.3462C>T BA16 midigenes confirmed the previously assigned (11) severe nature of the two investigated variants, as shown in Figure 3D and Supplementary Material, Fig. S10. In fact, the presence of either variant severely affected the abundance of WT ABCA4, which was present at very low levels when compared with the WT midigene. Variant c.3407G>T led to the expression of an ABCA4 RNA product missing 117 nt at the 3′ end of exon 23. This event was strongly predicted by SpliceAI, where the canonical SDS was weakened and an alternative SDS was favored. This resulted in an in-frame deletion of 39 amino acids (p.Gly1136_Glu1174del). Similarly, c.3462C>T created an alternative SDS 62 nt upstream of the canonical exon 23 SDS, also strongly predicted by SpliceAI. The open reading frame was disrupted (p.Leu1155Aspfs*19). None of the alternative isoforms were detected in the BA16 WT sample.
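Whether a partial exon deletion keeps the reading frame follows directly from its length: a loss divisible by three removes whole codons' worth of sequence, and any other length shifts the frame. The short Python sketch below makes that arithmetic explicit for the deletion sizes reported in this section; `deletion_consequence` is a hypothetical helper name, and it reports only the net number of residues lost, not the exact protein notation (which depends on where the deletion falls relative to codon boundaries).

```python
def deletion_consequence(deleted_nt: int) -> str:
    """Classify a deletion within the coding sequence by its length in nucleotides."""
    if deleted_nt <= 0:
        raise ValueError("deletion length must be a positive number of nucleotides")
    if deleted_nt % 3 == 0:
        # Reading frame preserved; net loss of deleted_nt / 3 residues.
        return f"in-frame, net loss of {deleted_nt // 3} amino acids"
    # Any other length shifts the downstream reading frame.
    return "frameshift"


if __name__ == "__main__":
    # Deletion sizes quoted in the text for individual variants.
    for variant, size in [
        ("c.3407G>T", 117),
        ("c.3462C>T", 62),
        ("c.3703A>G", 96),
        ("c.1977G>A", 41),
    ]:
        print(variant, size, deletion_consequence(size))
```

For c.3407G>T this gives a net loss of 39 residues, matching p.Gly1136_Glu1174del, while the 62-nt loss caused by c.3462C>T is not a multiple of three and therefore shifts the frame.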
As shown in Figure 3E and Supplementary Material, Fig. S11, exon 30 variants c.4446C>A and c.4469G>A resulted in complex aberrant splicing events that involved exons 29 and 30. SpliceAI predicted that c.4446C>A would lead to the use of a cryptic SDS located 96 nt upstream of the canonical SDS of exon 30. This RNA product represented 92.7 ± 0.1% of total ABCA4 RNA, while an isoform combining the same event with skipping of exon 29 represented 7.3 ± 0.1%. The predicted ABCA4 proteins are p.Val1482_Gln1513del and p.[Ser1418_Pro1451delinsArg,Val1482_Gln1513del], but as the latter was present in < 15% of total ABCA4 RNA, it was not represented in the new protein notation. As for the c.4469G>A variant, two similarly expressed transcripts were identified with RT-PCR. The first showed an exon 30 shortened by 73 nt at its 3′ end, which is in line with the previously determined splicing prediction and previous research (21). This led to a shift in the open reading frame and a premature stop codon (p.Cys1490Glufs*12). In the second identified fragment, the complete exon 29 was lacking, whereas exon 30 underwent a 2-nt elongation at its 5′ end and the abovementioned 73-nt truncation at its 3′ end.
The ABCA4 variants c.6207C>T, c.6272T>A and c.6339C>G were predicted to strengthen cryptic SDSs in exons 45  Unlike the BA28 WT, none of the three mutant BA28 constructs produced WT RNA (apart from c.6207C>T for which WT RNA was measured at non-significant levels, 3.5 ± 0.7%).

Severity categorization of missense variants based on the levels of correctly spliced RNA
To assess the severity of variants based on their impact on splicing, cDNA products were quantified using the gel images. Details regarding the analysis for AG and DG variants can be found in Supplementary Material, Tables S4 and S5, respectively. Figure 4 illustrates the percentages of WT RNA together with the severity thresholds estimated earlier (11). As the c.4203C>A and c.5367C>G variants did not disturb splicing at all, these were classified as 'benign.' Variants c.5088C>G, c.4454C>T and c.3096A>T were categorized as 'mild,' as they yielded between 40% and 80% of WT RNA. None of the variants were considered 'moderately severe,' whereas the remaining 12 variants were labeled as 'severe' since they resulted in < 20% of WT ABCA4. The intervals for mild, moderately severe and severe ABCA4 alleles were derived from unpublished theoretical modelling studies in our group. An overview of all splicing events is given in Supplementary Material, Figure S13. Since several variants led to in-frame deletions within the coding sequence of ABCA4, there is, theoretically, a possibility of residual ABCA4 activity. Therefore, we performed an in silico analysis in which we mapped these deletions onto the amino acid sequence and determined their possible impact on the protein's function. As expected, all of the in-frame deletions resulted in removal of critical elements of ABCA4, without which the protein would not be able to maintain its functionality (Supplementary Material, Tables S4 and S5).
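The categorization by residual WT RNA amounts to a simple binning rule, sketched below in Python. Only the 'mild' (40% to 80%) and 'severe' (at most 20%) intervals are stated in the text; the 20% to 40% interval for 'moderately severe' and the above-80% cut-off for 'benign' are assumptions made here to close the gaps, and `severity_from_wt_rna` is a hypothetical function name.

```python
def severity_from_wt_rna(pct_wt: float) -> str:
    """Map the percentage of correctly spliced (WT) RNA to a severity label.

    Stated in the text: 'severe' at <= 20% and 'mild' between 40% and 80%.
    Assumed here: 'moderately severe' between 20% and 40%, 'benign' above 80%.
    """
    if not 0.0 <= pct_wt <= 100.0:
        raise ValueError("percentage of WT RNA must lie between 0 and 100")
    if pct_wt <= 20.0:
        return "severe"
    if pct_wt < 40.0:
        return "moderately severe"  # assumed interval
    if pct_wt <= 80.0:
        return "mild"
    return "benign"  # assumed cut-off


if __name__ == "__main__":
    # WT RNA percentages quoted elsewhere in the text.
    print("c.6207C>T", severity_from_wt_rna(3.5))
    print("c.3096A>T", severity_from_wt_rna(41.0))
    print("c.4203C>A", severity_from_wt_rna(100.0))
```

With the values quoted in the text, c.6207C>T (3.5% WT RNA) falls in the 'severe' bin, c.3096A>T (41%) in 'mild', and the negative control c.4203C>A (100%) in 'benign'.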
In addition, the severity of the missense variants was assessed by consulting data from previously published STGD1 probands for their age of disease onset and the ABCA4 allele found in trans. A missense variant would be considered severe if: (1) the ABCA4 variant in trans was previously categorized as 'mild,' since a mild allele must be accompanied by a severe allele in order for STGD1 to manifest, or (2) the age of onset was before 10 years of age, which in most cases is associated with the presence of two severe ABCA4 alleles. The details of this analysis are reported in Supplementary Material, Table S6. For the 18 investigated variants, we were able to attribute the 'severe' label to six variants, i.e. c.1977G>A, c.3407G>T, c.4469G>A, c.6207C>T, c.6272T>A and c.6339C>G. Interestingly, all of these variants yielded ≤ 20% of WT RNA and were, thus, classified as 'severe' in Figure 4, which is in line with the above in silico analysis.
Table 3 provides a summary of the newly discovered impact of the investigated variants on RNA and protein. The updated RNA and protein products have been deposited into the LOVD database for ABCA4 (www.lovd.nl/ABCA4). Moreover, the table reports their ACMG/AMP classification according to the previously published studies (22), as well as their updated ACMG/AMP classification, taking into account the impact on splicing reported in this study. It is important to note that, out of 18 investigated missense variants, the ACMG/AMP classification of four variants was raised to a higher severity class, indicating increased reliability of their pathogenic nature.

Discussion
In this study, we investigated whether synonymous and missense coding variants in ABCA4 had an effect on pre-mRNA splicing. Using SpliceAI to predict the possible effect on splicing and midigene-based splice assays to validate the predictions, we report novel missplicing events in 16 out of 18 investigated coding ABCA4 variants. These novel aberrations in splicing led to in-frame deletions within exons or frameshifts, because of newly activated (one SAS, five SDS) or strengthened (four SAS, five SDS) cryptic splice sites. Interestingly, in silico analysis of the in-frame deletions reported in Supplementary Material, Tables S4 and S5 suggests that, although these do not disrupt the open reading frame, they affect major domains within the ABCA4 protein and therefore have a severe effect on the protein's remaining activity. Furthermore, 12 variants led to ≤ 20% of WT ABCA4 RNA, which corroborates their pathogenicity and their classification as 'severe' STGD1 variants.
So far, therapeutic strategies targeting missense variants associated with inherited retinal diseases (IRDs) have focused on genetic-based approaches that aim to express the wild-type protein, including gene augmentation, nuclease-based genome editing and RNA editing. A potential RNA therapy currently applicable to missense variants is ADAR-mediated editing, which, however, targets only G>A missense variants. This therapeutic strategy employs short RNA molecules that recruit the endogenous adenosine deaminase acting on RNA (ADAR) enzymes, which convert an adenosine to inosine, which is recognized as guanine (23)(24)(25). It is estimated that 23% of all missense variants underlying IRDs are treatable by ADAR-mediated editing, while this number falls to 10% for all causal ABCA4 variants (26). Next to this, Ascidian Therapeutics showed promising results with their Exon Editor technology, which makes use of the RNA trans-splicing process to replace the disease-associated pre-mRNA sequence in ABCA4. This approach would address both the splicing aberration and the amino acid change caused by the missense variants in this study (27). On the other hand, encouraging progress in the development of potential therapeutic modalities targeting splicing variants in ABCA4 has been reported by several groups, either by applying permanent intronic DNA-based alterations (28) or reversible RNA-based approaches that involve the application of antisense oligonucleotides (AONs) (29)(30)(31)(32)(33)(34)(35). The variants reported in this study could be good targets for AON-based approaches, which would resolve the pathogenicity imposed by the aberration in splicing. However, the presence of the amino acid change would still hinder the protein's full functionality. Therefore, it is important to determine the effect of the amino acid substitution on the overall functionality of the protein. The specialized function of ABCA4 imposes challenges in determining its activity, as it does not behave like many other transmembrane proteins. While its structure resembles that of the cystic fibrosis transmembrane conductance regulator (CFTR) from the ABCC7 group (36), ABCA4's function is more complex than CFTR's straightforward ion flux, which allowed setting up protocols for in vitro functional assays (37). So far, functional analyses of ABCA4 have been performed only in extracellular environments or cultured cells and in conditions that involve the use of mild detergents, which can, regardless of their mild nature, affect the activity of the protein and give a false picture of a variant's effect. For example, the functional assessment of the very common p.(Gly1961Glu) variant suggests a detrimental effect on ABCA4 function, regardless of its previously established mild or hypomorphic nature (15). All things considered, the above-mentioned ADAR-mediated editing and Exon Editor technology would address the entire underlying cause, whereas AONs would only address the aberrant splicing, leaving the impact of the amino acid change untreated. However, since the ADAR system relies firstly on the recruitment and, secondly, on the limited availability of endogenously expressed ADARs, correcting the target can only be achieved within the confines of the available ADAR enzymes.
We observed very low or no expression of correctly spliced ABCA4 mRNA for 12 out of 18 tested variants, and the severity category for some of them worsened. In fact, with regard to the ACMG/AMP classification, the certainty for severity increased in four variants because of the observed in-frame deletions in the RNA or shifts in the open reading frame. Before this study, variants c.1977G>A, c.3407G>T, c.3462C>T and c.3703A>G were classified as variants of uncertain significance. Based on the splice assay results, they became likely pathogenic. Of the five synonymous variants that were part of the analysis, only the c.4203C>A variant, which was previously classified as 'benign' and thus served as a negative control (11), led to exclusively correctly spliced ABCA4 transcript, as expected. On the other hand, three synonymous variants (c.3462C>T, c.4446C>A, c.6207C>T) yielded ≤ 20% of correctly spliced mRNA, which allowed them to be labeled as 'severe.' In addition, the c.3096A>T variant led to the generation of 41% correctly spliced mRNA, which categorized it as a 'mild' STGD1 variant. Since these variants do not alter the amino acid sequence, the discovery of missplicing suggests that a splicing modulation approach could be the only treatment needed to alleviate the STGD1 phenotype. In addition, this study excluded coding ABCA4 variants located at the first, second, penultimate and last positions of exons, as these very likely have an impact on splicing. The in vitro assessment of these would likely reveal additional missense and synonymous ABCA4 variants that influence splicing, and they are therefore interesting candidates for future studies. Moreover, we omitted variants that were previously classified as 'moderately severe' and 'mild,' regardless of whether the DS for AG or DG attributed by SpliceAI exceeded the predefined threshold. These variants may have a partial effect on splicing based on their severity assessment, making them compelling candidates for future investigations.
Bioinformatic pathogenicity prediction is widely used to determine the possible underlying cause of the association between variants and disease. However, it is worth mentioning that certain pathogenic events linked to missplicing arise in a tissue-specific environment, and in silico prediction tools that operate using generic splicing data may fail to predict them. Therefore, these pathogenic events may require validation through in vitro experiments (18). For example, the common severe ABCA4 c.5461-10T>C variant is not predicted to cause splicing aberrations when using SpliceAI (delta scores for AG, AL, DG, and DL < 0.08); however, in vitro investigations clearly showed its severe effect because of exon 39 or exons 39/40 skipping resulting in frameshifts (38). The use of midigenes for splicing prediction purposes has greatly facilitated the discovery of novel splicing aberrations in ABCA4, which allowed for the development of splicing-targeted therapies. Results from previous investigations have reaffirmed the preference for midigenes over minigenes. Minigenes usually contain one exon and short sequences of adjacent introns, thus their very restricted genomic content may provide misleading results (20). Most variants examined in this study were attributed DS > 0.20 by SpliceAI either for creation or activation of alternative cryptic SASs or SDSs, which were confirmed, in most cases, by midigene assays. However, c.5088C>G was the only variant to cause a splicing aberration at a site different from the one predicted by SpliceAI, whilst c.5367C>G did not display any splicing disruption and yielded 100% correctly spliced ABCA4 RNA. Variant c.5367C>G was attributed a DS above the 0.20 threshold (DS = 0.32) for AG and did not fall within any severity category because of limited data from previously screened STGD1 probands (11). In addition, this variant is also the only AG variant that did not display an AL DS > 0.10, as opposed to all other selected AG variants, which suggests that the AG prediction might need to be combined with a prediction for AL in order to result in the predicted missplicing event. It is possible that c.5367C>G could result in a splicing aberration if investigated in a wider genomic context, or that its pathomechanism is yet to be discovered. In fact, some studies have highlighted inconsistencies between splicing aberrations detected using midigenes or simpler cellular models, as opposed to more sophisticated models derived from reprogrammed induced pluripotent stem cells, such as photoreceptor precursor cells or retinal organoids. As the genomic context of the complex models closely resembles that of the native retinal environment, these are more representative of the actual splicing process in the retina (29,32,39,40). These findings point to the limits imposed by the midigene system. Therefore, development of novel therapeutic compounds should always involve validation in complex cellular systems, while midigenes could serve as a screening tool only.
Four exon 30 variants were analyzed, two of which, i.e. c.4454C>T and c.4457C>T, were associated with AG DSs of 0.14 and 0.12, while the other two variants, c.4446C>A and c.4469G>A, were attributed high DG DSs (0.89 and 0.80, respectively). We identified that variants c.4454C>T and c.4457C>T strengthened an existing cryptic SAS at position c.4466, while variant c.4469G>A strengthened an existing cryptic SDS at the neighboring c.4467 position. Moreover, variant c.4446C>A leads to activation of a second existing cryptic SDS at position c.4444. The existence of these 'dual splice sites,' a term coined for neighboring and partially overlapping acceptor and donor splice sites, may result in the binding of many splice factors and thereby could play a role in the effect of the relatively mild variants c.4454C>T and c.4457C>T (41). However, we also observed that all these variants contributed to changes of exonic splicing regulatory (ESR) sequences, as shown in Supplementary Material, Figure S14, regardless of the significant differences in the attributed DSs. In fact, both of the low-DS variants, c.4454C>T and c.4457C>T, contributed to the deletion of exonic splice enhancers (ESEs), just as the c.4446C>A variant did. At the same time, variants c.4446C>A, c.4454C>T and c.4469G>A led to the generation of novel exonic splice silencers in exon 30. These observations support the hypothesis initially proposed by Moles-Fernández and colleagues that SpliceAI may face challenges in detecting changes in the ESR landscape, thus overlooking the effect a variant may have on splicing (42). Previous research has shown that even synonymous coding variants can have a role in altering ESRs and causing missplicing events associated with disease. For example, Collin et al. described a synonymous variant in the TECTA gene that causes the loss of an ESE, resulting in defective splicing and, consequently, DFNA8/12 hearing impairment (43). Therefore, besides assessing the effect a variant may have on the splice sites, it is crucial to determine its effect on the ESR landscape to accurately predict potential splicing aberrations.
SpliceAI was deemed a dependable in silico tool for assessing potential splice alterations in non-canonical splice site and deep-intronic ABCA4 variants, following a comparison of several established and deep learning tools (18). However, it is worth mentioning that positive predictions for these types of variants should not be generalized to the overall validity of variant interpretation, especially since the tool's training may only focus on estimating the potential impact on splicing for such variants rather than for coding variants. Nonetheless, the fact that 15 out of 18 investigated variants demonstrated the missplicing predicted by SpliceAI in midigene-based assays lends support to the reliability of this in silico tool for interpreting splicing in coding ABCA4 variants. We observed a significant difference in the mean DSs between AG (0.44) and DG (0.83). Among the analyzed AG variants, two variants, c.1977G>A and c.3703A>G, were attributed considerably higher DSs of 0.99 and 0.97, respectively, followed by a DS of 0.34 that was attributed to c.5088C>G. The large discrepancy in DS was reflected in the amount of correctly spliced RNA, as c.1977G>A and c.3703A>G yielded considerably lower amounts of correctly spliced ABCA4 and were categorized as 'severe' variants, unlike the other investigated AG variants, none of which fell within the 'severe' category. Furthermore, the majority of analyzed DG variants were assigned to the 'severe' category based on the amount of correctly spliced ABCA4, save the 'mild' c.3096A>T variant, which also had the lowest DS among the investigated DG variants. Even though the strength of a DS attributed by SpliceAI does not necessarily correlate with the amount of misspliced product, but rather represents the likelihood for the event to happen, we could not overlook this interesting observation. The possible correlation between the strength of the DS and the amount of correctly spliced RNA in missense ABCA4 variants needs further investigation in order to draw more robust conclusions. In addition, we observed that 14 out of 15 variants with AG or DG DS > 0.20 resulted in the predicted splicing effects, which allows us to conclude that the arbitrary threshold of AG or DG DS > 0.20 is very likely to predict missplicing. However, further investigations that involve coding variants with AG or DG DSs < 0.20 are necessary to strengthen this conclusion.
In conclusion, we report novel missplicing events linked to missense and synonymous variants in ABCA4 by implementing in silico-based splicing predictions by SpliceAI and in vitro splicing assays. The aberrant splicing allows a better understanding of the causality of the tested variants, and a more trustworthy severity assessment. The implications of these findings are significant, as they point toward novel treatment strategies that could be used for STGD1 individuals who carry these variants. Particularly, those involving synonymous variants that lead to splice aberrations may be amenable to splice-modulating therapies. These results underscore the urgent need to further explore the pathomechanisms of other coding variants in ABCA4 in order to better understand the manifestation of disease and develop and pursue new treatment approaches targeting STGD1.

SpliceAI for the selection of missense and synonymous variants for midigene splice assays
To select the missense and synonymous variants in ABCA4 with a potential effect on splicing, the intronic regions were excluded from the analysis. Furthermore, variants affecting nucleotides located at the first, second, penultimate and last position of each exon were not analyzed, as these are most likely to interfere with splicing. The remaining coding variants were filtered based on the assigned protein effect, and only the missense or synonymous variants previously identified in STGD1 probands (therefore reported in LOVD; www.lovd.nl/ABCA4) that were previously (11) classified as 'severe' or 'moderately severe,' or that lacked a severity score, were selected. These were assigned a delta score ranging from 0 (no predicted effect on splicing) to 1 (very likely effect on splicing) for acceptor gain (AG), acceptor loss (AL), donor gain (DG) and donor loss (DL) using SpliceAI. This study focused on those variants that were assigned a delta score > 0.20 for AG or DG. Variants c.4203C>A, c.4454C>T and c.4457C>T were assigned an AG DS < 0.20, but were included in the analysis either as a negative control or because of their localization near the cryptic SAS/SDS in exon 30 of ABCA4.
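The pre-filtering steps that precede SpliceAI scoring can be illustrated with the following Python sketch. It is a hedged reconstruction of the stated rules rather than the actual analysis code: the function names, the use of 1-based inclusive exon coordinates, and the string labels for protein effect and severity are all assumptions introduced for illustration.

```python
from typing import Optional

# Labels from the earlier severity classification that qualified a variant;
# None stands for a variant that lacked a severity score.
ELIGIBLE_SEVERITIES = {"severe", "moderately severe", None}
ELIGIBLE_EFFECTS = {"missense", "synonymous"}


def is_boundary_position(pos: int, exon_start: int, exon_end: int) -> bool:
    """True if pos is the first, second, penultimate or last nucleotide of the exon.

    Coordinates are assumed to be 1-based and inclusive on the same scale.
    """
    if not exon_start <= pos <= exon_end:
        raise ValueError("position lies outside the exon")
    return pos - exon_start < 2 or exon_end - pos < 2


def keep_for_scoring(pos: int, exon_start: int, exon_end: int,
                     effect: str, severity: Optional[str]) -> bool:
    """Apply the three stated filters: exon position, protein effect, prior label."""
    if is_boundary_position(pos, exon_start, exon_end):
        return False
    return effect in ELIGIBLE_EFFECTS and severity in ELIGIBLE_SEVERITIES


if __name__ == "__main__":
    # Hypothetical exon spanning positions 100 to 200.
    print(keep_for_scoring(150, 100, 200, "missense", "severe"))    # kept
    print(keep_for_scoring(101, 100, 200, "missense", "severe"))    # boundary position
    print(keep_for_scoring(150, 100, 200, "synonymous", "mild"))    # label not eligible
```

Variants that pass these filters would then be scored and thresholded on their AG and DG delta scores; the negative control and the two low-scoring exon 30 variants were added by hand and fall outside this rule.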

Figure 1. Schematic representation of the wild-type ABCA4 midigene constructs used in this study for introducing ABCA4 coding variants for splicing analysis. The numbers below the rectangles indicate the ABCA4 exons. The binding locations of primers used for transcript analysis by RT-PCR are represented by triangles.

and premature stop codon (p.Tyr339Cysfs*37). This event was not identified in the WT BA7 sample. The high DSs for the BA12 DG variants c.2128A>G and c.2273C>T predicted the creation of new SDSs, competing with the canonical SDSs of exons 14 and 15, respectively. These

Figure 2. Splicing defects caused by five missense ABCA4 variants creating new SASs. The RT-PCR products of RNA derived from mutant midigenes and the corresponding WT plasmids were visualized with gel electrophoresis. The green triangles represent the splice site predictions for acceptor gain (AG) by SpliceAI. Acceptor losses (AL; red triangles) are also shown if the delta scores (DSs) were > 0.10. Fragments for which the sequence information suggested a PCR artifact are indicated by asterisks. (A) For the mutant c.1977G>A midigene, two splicing defects were detected (fragments 2 and 4) that corresponded to a skip of the first 41 nt of exon 14 and a complex splice defect resulting in the absence of the first 41 nt of exon 14 and the complete exon 15. Skipping of exon 15 was also observed in the WT construct, as seen previously (20). (B) RT-PCR and Sanger sequencing of the c.3703A>G BA17 midigene revealed an in-frame 96-bp deletion (fragment 2) at the 5′ end of exon 25, as opposed to the WT BA17, which did not show this defect. (C) Missense variants c.4454C>T and c.4457C>T led to the generation of fragments that were confirmed by Sanger sequencing to lack exon 29 completely (fragment 2), or to combine a deletion of exon 29 with a deletion of the first 114 nt of exon 30 (fragment 3). The latter was absent in the RNA derived from WT BA20. (D) Variant c.5088C>G led to a novel transcript with an in-frame deletion of 69 nt at the 5′ end of exon 36 (fragment 2). A complete deletion of exon 35 was observed in fragment 3.

Figure 3. Splicing defects caused by 11 variants creating new SDSs. The RNA products extracted from midigene-transfected HEK293T cells were studied with RT-PCR and Sanger sequencing. The green triangles represent the splice site predictions for donor gain (DG) by SpliceAI. Donor losses (DL; red triangles) are also shown if the delta scores (DSs) were > 0.10. (A) Variant c.1022A>T led to the formation of an alternative SDS 83 nt upstream of the canonical SDS. (B) Variants c.2128A>G and c.2273C>T caused in-frame deletions at the 3′ end of exons 14 and 15, respectively, because of newly created SDSs upstream of the canonical donor sites. Skipping of ABCA4 exon 15 was observed in both mutant and WT constructs as a single event (fragment 2) or, in the case of c.2128A>G, in combination with the predicted splicing aberration (fragment 4). (C) RT-PCR for the c.3096A>T BA15 construct identified two isoforms that were not present in the BA15 WT sample (fragments 2 and 3). Fragment 2 missed the last 96 nt of exon 21, which was predicted by the in silico analysis. Asterisks denote fragments in which the sequence information resulted from PCR artifacts, or for which the sequence could not be identified. (D) Variants c.3407G>T and c.3462C>T caused the activation of cryptic SDSs 117 nt (fragment 2) and 62 nt (fragment 3) upstream of the canonical SDS in exon 23, respectively. (E) Variants c.4446C>A and c.4469G>A resulted exclusively in misspliced ABCA4 RNAs. For c.4446C>A, sequence analysis confirmed fragment 3 to lack 96 nt at the 3′ end of exon 30, while fragment 4 missed the complete exon 29 and first 96 nt of exon 30. Similarly, for c.4469G>A, fragments 5 and 6 represented exon 29 skipping alone and in combination with a deletion of 73 nt at the 3′ end of exon 30, respectively. Complete skip of exon 29 was detected upon expression of the BA16 WT, and was observed for variant c.4446C>A as well. (F) Aberrant splicing was detected for variants c.6207C>T, c.6272T>A and c.6339C>G introduced in BA28, where new cryptic SDSs were activated 77 nt (fragment 2) or 9 nt (fragment 3) upstream of the canonical SDS of exon 45, or 47 nt upstream of the canonical SDS of exon 46 (fragment 4).
and 46, which was confirmed by RT-PCR results presented in Figure 3F and Supplementary Material, Fig. S12. Specifically, c.6207C>T caused the activation of a new cryptic SDS that excluded the last 77 nt of exon 45, resulting in p.Thr2070*. Variant c.6272T>A produced an in-frame deletion of the last 9 nt in exon 45 because of a new SDS upstream of the canonical SDS, leading to p.Val2092_Leu2094del. Variant c.6339C>G in exon 46 caused a shift in the open reading frame by activating a new SDS 47 nt upstream of the 3′ end of exon 46, resulting in p.Val2114Hisfs*5.
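The reading-frame consequences reported above follow from whether the number of deleted nucleotides is a multiple of three. The short Python sketch below makes that arithmetic explicit for the three BA28 variants; the function name is hypothetical, and the check deliberately ignores everything except deletion length (it does not predict the resulting protein change).

```python
def deletion_effect(deleted_nt: int) -> str:
    """Classify an exonic deletion by its effect on the reading frame."""
    if deleted_nt <= 0:
        raise ValueError("deleted_nt must be a positive number of nucleotides")
    # A deletion preserves the frame only if it removes whole codons.
    return "in-frame" if deleted_nt % 3 == 0 else "out-of-frame"


# Deletion lengths as reported in the text for exons 45 and 46.
deletions = {"c.6207C>T": 77, "c.6272T>A": 9, "c.6339C>G": 47}

effects = {variant: deletion_effect(n) for variant, n in deletions.items()}

for variant, effect in effects.items():
    print(f"{variant}: {deletions[variant]} nt deleted, {effect}")
```

Only the 9-nt deletion removes whole codons (three residues, Val2092 to Leu2094), whereas the 77-nt and 47-nt deletions disrupt the frame, in line with the premature stop and frameshift annotations given for those two variants.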
variants and their novel effect on RNA and protein, with the old ACMG/AMP classification assessed by Cornelis et al. (22) and the updated ACMG/AMP classification based on the observed splice defects.The variants for which the ACMG/AMP classification changed are represented in bold.VUS, variant of uncertain significance

Table 1. ABCA4 coding variants predicted to introduce new SASs. DS AG, delta score for strongest acceptor gain prediction. Severity category assessed by Cornelis et al. (11). N/A, not applicable because of limited amount of data in previous study.

Table 2. ABCA4 coding variants predicted to introduce new SDSs. DS DG, delta score for strongest donor gain prediction. Severity category assessed by Cornelis et al. (11). N/A, not applicable because of limited amount of data in previous study.