Widespread alternative splicing dysregulation occurs presymptomatically in CAG expansion spinocerebellar ataxias

Abstract
The spinocerebellar ataxias (SCAs) are a group of dominantly inherited neurodegenerative diseases, several of which are caused by CAG expansion mutations (SCAs 1, 2, 3, 6, 7 and 12) and more broadly belong to the large family of over 40 microsatellite expansion diseases. While dysregulation of alternative splicing is a well-defined driver of disease pathogenesis across several microsatellite diseases, the contribution of alternative splicing in CAG expansion SCAs is poorly understood. Furthermore, despite extensive studies on differential gene expression, there remains a gap in our understanding of presymptomatic transcriptomic drivers of disease. We sought to address these knowledge gaps through a comprehensive study of 29 publicly available RNA-sequencing datasets. We identified that dysregulation of alternative splicing is widespread across CAG expansion mouse models of SCAs 1, 3 and 7. These changes were detected presymptomatically, persisted throughout disease progression, were repeat length-dependent, and were present in brain regions implicated in SCA pathogenesis including the cerebellum, pons and medulla. Across disease progression, changes in alternative splicing occurred in genes that function in pathways and processes known to be impaired in SCAs, such as ion channels, synaptic signalling, transcriptional regulation and the cytoskeleton. We validated several key alternative splicing events with known functional consequences, including Trpc3 exon 9 and Kcnma1 exon 23b, in the Atxn1 154Q/2Q mouse model. Finally, we demonstrated that alternative splicing dysregulation is responsive to therapeutic intervention in CAG expansion SCAs, with an Atxn1-targeting antisense oligonucleotide rescuing key splicing events. Taken together, these data demonstrate that widespread presymptomatic dysregulation of alternative splicing in CAG expansion SCAs may contribute to disease onset and early neuronal dysfunction, and may represent novel biomarkers across this devastating group of neurodegenerative disorders.


Introduction
Spinocerebellar ataxias (SCAs) are a genetically heterogeneous group of rare, dominantly inherited neurodegenerative disorders characterized by progressive ataxia. 2,3 Other interconnected regions of the nervous system, in particular the brainstem, are also involved and drive symptomology to a greater or lesser extent across the different SCAs. The most common SCAs are those caused by CAG repeat expansion mutations, including SCA types 1, 2, 3, 6, 7, 12 and 17, in which the expanded CAG tracts are located in ATXN1, ATXN2, ATXN3, CACNA1A, ATXN7, PPP2R2B and TBP, respectively. 1,2 For SCAs 1-3, 6, 7 and 17, the CAG repeat expansions are within the coding regions of the respective genes, resulting in expression of polyglutamine expansion proteins. The expanded polyglutamine tracts alter protein conformation, leading to disruption of normal function and protein aggregation. 3 Given the shared trinucleotide repeat expansion mutation and the shared broad symptomology across CAG expansion SCAs, disruption to several cellular pathways has been implicated in disease pathogenesis across CAG expansion SCAs. Disruption to neuronal ion channels and alterations in membrane potential that impact Purkinje neuron firing and function have been reported across many CAG expansion SCAs. 2,4-7 These changes in ion channel properties and membrane potential are, not surprisingly, associated with changes to synaptic signalling and function. 3 Alterations in nuclear processes, such as DNA damage and repair and transcriptional dysregulation, have also been reported across multiple CAG expansion SCAs. 2,3 Whilst attempts have been made to understand how CAG expansion mutations lead to disruption of these cellular pathways and contribute to neuronal dysfunction, degeneration and symptom onset, it remains a challenge to identify key transcriptomic drivers of disease pathogenesis in presymptomatic models of CAG expansion SCAs.
In several other trinucleotide repeat expansion diseases, alternative splicing dysregulation has been directly linked to key symptoms. 8 Alternative splicing is a highly regulated mechanism that increases genetic diversity by processing RNA transcripts into distinct combinations of exons. 10 This process produces the mature mRNA transcripts and protein isoforms required by the cell type or developmental stage for normal functioning. 9 In disease, these regulatory processes can go awry, leading to dysregulation of alternative splicing and to the expression of different mRNA and protein isoforms, which can have specific functional consequences and result in disease symptoms. 8,10 The contribution of disease-associated dysregulation of alternative splicing to key symptoms is well characterized in the CTG repeat expansion disease myotonic dystrophy type 1 (DM1). 8 For example, missplicing of the CLCN1 chloride channel mRNA leads to inclusion of exon 7A, which contains a premature stop codon. This triggers nonsense-mediated decay of CLCN1 transcripts, resulting in fewer CLCN1 ion channels and the hallmark symptom of DM1, myotonia. 11 Likewise, missplicing of SCN5A sodium channel mRNA leads to inclusion of fetal exon 6A instead of adult exon 6B and results in cardiac arrhythmia. 12,14,15 Together, these studies demonstrate that alternative splicing dysregulation can contribute directly to disease symptoms and that misspliced events can be used as biomarkers for monitoring disease progression in patients.
Studies in other CTG/CAG trinucleotide repeat expansion diseases have also implicated dysregulation of alternative splicing in disease pathogenesis. For example, in SCA8, the expression of the CUG expansion RNA has been shown to lead to splicing changes in a glutamate ionotropic receptor NMDA type subunit and a GABA transporter, which may contribute to the predicted loss of GABAergic inhibition in SCA8. 16 Likewise, expression of expanded CUG RNAs was also linked to missplicing of a glutamate ionotropic receptor NMDA type subunit in SCA2. 17,19,20 Similarly, expression of CAG repeat expansions has been linked to missplicing of CLCN1 and SERCA1 (ATP2A1) in cell culture systems, with SERCA1 being misspliced in SCA3 patient-derived fibroblasts. 21 While these studies indicate the potential for alternative splicing dysregulation in several CAG repeat expansion SCAs, we do not currently understand the extent of splicing changes and their possible contribution to disease across these disorders. To address this gap, we performed a comprehensive analysis of alternative splicing across published RNA sequencing (RNA-Seq) data from CAG expansion SCA mouse models. In our analysis we compared mouse models expressing short and long CAG repeat expansions, investigated the interplay between differential gene expression and alternative splicing dysregulation, and assessed the potential of alternative splicing as a target engagement biomarker for therapeutic studies in CAG SCAs.

RNA sequencing data analysis
RNA-Seq datasets were acquired using the NCBI Sequence Read Archive (SRA) Explorer. FASTQ file quality was assessed using FastQC (version 0.11.9) and datasets with an average read depth of >35 million paired-end reads were included in this study. FASTQ files with >20% adapter content were trimmed using fastp (version 0.23.2) and, for samples comprising more than one FASTQ file, files were merged to reach the read depth threshold. FASTQ files were then aligned to the GRCm38/mm10 mouse reference genome using STAR (version 2.7.10a). 22 RNA-Seq reads aligned to the Atxn1 reference sequence were viewed using the Integrative Genomics Viewer (IGV 2.16.0) 23 to identify mutations associated with the 154Q expansion allele. Differential gene expression analysis was performed in RStudio (2022.12.0; R 4.2.2) using DESeq2 (version 3.16) 24 and genes that passed a threshold of Padj < 0.05 and |log2FC| > 1.5 were considered significantly differentially expressed. Alternative splicing analysis was performed using rMATS (version 4.1.2) 25 and events were considered significantly misspliced if the false discovery rate (FDR) was < 0.1 and the absolute change in per cent spliced in (|ΔPSI|) was > 0.1. The number of significant skipped exon events passing a threshold of P < 0.05 and |ΔPSI| > 0.1 is also reported for comparison with published datasets. All ΔPSI values are converted from a ratio to a percentage with the threshold adjusted accordingly: |ΔPSI| > 10% (written as ΔPSI > 10% in the text). Exon numbers are referred to by previously published exon numbers for the same coordinates or based on counting the first exon in a gene as exon 1. Coordinates for reported exons are provided in Supplementary Table 1. Upset plots were generated using the ComplexHeatmap (2.10.0) R package.
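The significance thresholds described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, and the column names `FDR` and `IncLevelDifference` are assumed to match the rMATS output tables, so they should be checked against the actual files. The sign of ΔPSI depends on the order in which the two sample groups were supplied to rMATS.

```python
from typing import Dict, Iterable, List


def is_misspliced(fdr: float, delta_psi: float,
                  max_fdr: float = 0.1, min_abs_dpsi: float = 0.1) -> bool:
    """Splicing threshold from the text: FDR < 0.1 and |dPSI| > 0.1 (ratio scale)."""
    return fdr < max_fdr and abs(delta_psi) > min_abs_dpsi


def is_differentially_expressed(padj: float, log2fc: float,
                                max_padj: float = 0.05,
                                min_abs_lfc: float = 1.5) -> bool:
    """Expression threshold from the text: Padj < 0.05 and |log2FC| > 1.5."""
    return padj < max_padj and abs(log2fc) > min_abs_lfc


def filter_events(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Keep significant events and add dPSI as a percentage.

    Each row is a dict of strings, e.g. as produced by csv.DictReader on a
    tab-separated rMATS table (column names are an assumption).
    """
    kept = []
    for row in rows:
        dpsi = float(row["IncLevelDifference"])
        if is_misspliced(float(row["FDR"]), dpsi):
            kept.append({**row, "delta_psi_percent": dpsi * 100.0})
    return kept
```

Applied to a parsed skipped exon table, `filter_events` returns only the events that would be counted as significantly misspliced under the stated thresholds.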

Gene ontology enrichment analysis
Gene ontology enrichment analysis was performed using Metascape (version v3.5.20230101) 26 and the Database for Annotation, Visualization and Integrated Discovery (DAVID). 27,28 Functional annotation data from DAVID were used to create enrichment maps in Cytoscape (version 3.9.1) 29 with a node Q-value of 0.05 and an edge cut-off of 0.375 using the edge-weighted spring-embedded layout based on overlap size.

Allele-specific expression analysis for Atxn1 2Q and Atxn1 154Q using RNA-Seq data
Custom FASTA genomes were generated using the GRCm38/mm10 mouse reference genome edited to remove all sequences of Atxn1 and Atxn1-l and add in minimal sequences for Atxn1 2Q and Atxn1 154Q alleles such that a 126 bp read or a 150 bp read would align with at least three sequence differences between the two alleles. Kallisto (version 0.48.0) 30 was used to perform pseudo-alignment and quantify transcripts per million (TPM ± standard error). These custom genomes are available on request from the authors. Differential gene expression analysis was performed using sleuth (version 3.9) 31 and data are reported as log2FC between SCA1 and SCA1 treatment or genetic cross ± standard error with FDR-corrected P-values.

Mouse studies
Mice used in this study were housed and treated in accordance with the NIH Guide for the Care and Use of Laboratory Animals and complied with the Albany Medical College Institutional Animal Care and Use Committee (IACUC) guidelines under approved animal care and use protocol numbers 20-04002 and 23-03002. Atxn1 154Q/2Q mice were originally obtained from Jackson Laboratories (strain number 005601) and maintained according to established breeding protocols with genotyping performed by PCR, at weaning and retroactively, following existing protocols. 32 Age- and gender-matched Atxn1 154Q/2Q and wild-type (WT) littermates were anaesthetized at 6, 12 or 23 weeks of age using urethane in saline (1.2-1.5 g/kg) followed by a double thoracotomy and perfusion through the ascending aorta with 20 ml 1× PBS. Whole cerebellum was removed and stored at −70°C.

RT-PCR splicing analysis
RNA was extracted from cerebellum of wild-type and SCA1 Atxn1 154Q/2Q KI mice using TRIzol (Ambion, Life Technologies) following the manufacturer's instructions and a DNA digestion was performed using the TURBO DNA-free Kit (Invitrogen). RNA concentrations were measured using a NanoDrop and 500 ng total RNA was reverse transcribed using SuperScript IV reverse transcriptase (Invitrogen) with random hexamers (IDT). PCR for selected splicing events was performed using the Taq 2x master mix (NEB) with 2 μl cDNA under the following conditions: 95°C for 30 s; 32 cycles of 95°C for 30 s, primer-specific Tm for 30 s and 68°C for 30 s; 68°C for 5 min. Primer sequences, annealing temperatures and product sizes are as follows: Kcnma1 exon 23b Fw (5ʹ GGACAGATCATCACCCGACA 3ʹ), Rv (5ʹ ACAAGCAAAGGGCTGTGTGA 3ʹ), Tm 59.6°C, inclusion 260 bp, exclusion 180 bp; Anks1b exon 5 Fw (5ʹ ACAGACAGAGAATTCTACAAGCGA 3ʹ), Rv (5ʹ TTGAGGCTGTGGCTTCATTA 3ʹ), Tm 48.4°C, inclusion 610 bp, exclusion 540 bp; Trpc3 exon 9 primers are as previously reported. 33 PCR products were resolved on a 5300 Fragment Analyzer (Agilent Technologies) using the 905 separation gel for Kcnma1 and the 910 separation gel for Trpc3 and Anks1b, following the manufacturer's protocol. PCR products were resolved in technical triplicates on the fragment analyser and the average PSI is reported for each sample. PCR products were also resolved via agarose gel electrophoresis to confirm banding patterns seen via fragment analysis.
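As an illustration of how a per-sample PSI could be derived from the fragment analysis, the sketch below assumes that PSI is the inclusion product signal divided by the sum of the inclusion and exclusion product signals, averaged over the technical triplicates. The text does not state which signal was used (for example, molar concentration or peak area), so the inputs and function names here are assumptions.

```python
from statistics import mean
from typing import Iterable, Tuple


def psi_percent(inclusion: float, exclusion: float) -> float:
    """PSI (%) from the signals of the inclusion and exclusion PCR products."""
    total = inclusion + exclusion
    if total <= 0:
        raise ValueError("inclusion + exclusion signal must be positive")
    return 100.0 * inclusion / total


def sample_psi(replicates: Iterable[Tuple[float, float]]) -> float:
    """Average PSI over technical replicates given as (inclusion, exclusion) pairs."""
    return mean(psi_percent(inc, exc) for inc, exc in replicates)
```

For example, a replicate with an inclusion signal three times the exclusion signal corresponds to a PSI of 75%.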

Allele-specific RT-qPCR for Atxn1 2Q and Atxn1 154Q
To confirm the specificity of our bioinformatic allele-selective expression analyses for Atxn1 2Q and Atxn1 154Q, an allele-specific RT-qPCR assay was performed. The following primer sets were used for detection of the expanded allele, 154Q-Fw (5ʹ CTTACGCGGGCTTTATCCCT 3ʹ) and 154Q-Rv (5ʹ AGCCTTGTGTCCCGGCG 3ʹ), and the wild-type allele, 2Q-Fw (5ʹ CAGGCACCAGGACATAAGGTTG 3ʹ) and 2Q-Rv (5ʹ CGTCTGATGGGGATGGAGGT 3ʹ). Agarose gel electrophoresis was used to confirm primer specificity. Control reactions were performed for Gapdh Fw (5ʹ GCGAGACCCCACTAACATCA 3ʹ), Rv (5ʹ GGCGGAGATGATGACCCTTT 3ʹ) and total Atxn1 Fw (5ʹ GAGAATCGAGGAGAGCCAC 3ʹ), Rv (5ʹ AGACTTCGACACTGACCT 3ʹ). All qPCR reactions were performed in technical quadruplicates using 1 μl cDNA with the PowerUP SYBR green master mix (ThermoFisher) according to the manufacturer's instructions. To confirm specificity of primers, qPCR was performed on RT reactions in which the RT enzyme was replaced with H2O (RT−) under the same conditions. RT-qPCR data were analysed using the 2^−ΔΔCt method. 34
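For readers unfamiliar with the 2^−ΔΔCt calculation, a minimal sketch follows. It assumes that technical replicate Ct values are averaged before normalization and that a single reference gene (Gapdh in this study) is used; the function and argument names are illustrative and not taken from the authors' analysis.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_sample: Sequence[float],
                        ct_reference_sample: Sequence[float],
                        ct_target_control: Sequence[float],
                        ct_reference_control: Sequence[float]) -> float:
    """Fold change of a target transcript by the 2^-ddCt method.

    dCt  = mean Ct(target) - mean Ct(reference), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    dct_sample = mean(ct_target_sample) - mean(ct_reference_sample)
    dct_control = mean(ct_target_control) - mean(ct_reference_control)
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)
```

A target that crosses threshold one cycle later in the sample than in the control, after normalization to the reference gene, gives a relative expression of 0.5.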

Data analysis
For qPCR and alternative splicing validation analyses, statistical analysis was performed using GraphPad Prism 9. Grubbs' test with an alpha of 0.05 was used to identify and remove any outliers for alternative splicing analyses of 6-week-old mice. Data are represented as mean ± standard error of the mean (SEM) and statistical analyses were performed using two-tailed unpaired Student's t-tests.

RNA sequencing data from CAG expansion SCA mouse models show alternative splicing dysregulation
Here we sought to understand if changes in alternative splicing broadly represent a transcriptomic signature in CAG expansion SCAs. To do this, we performed alternative splicing analysis of publicly available RNA-Seq datasets. We performed a comprehensive search using the Gene Expression Omnibus (GEO) for RNA-Seq data from SCA mouse models using exact matches with search terms 'Spinocerebellar ataxia type N' and 'SCAN' where N = 1, 2, 3, 6, 7, 8, 12 or 17. All datasets identified using these search terms that were from brain regions of CAG repeat expansion-expressing mouse models, generated using RNA-Seq and available before August 2022 were included in our analysis. In addition, publicly available datasets reported with BioProject numbers but no GSE numbers were identified from publications. 37-49 To quantify a specific alternative splicing event using RNA-Seq data, multiple reads must map across the exon-exon junctions for both the inclusion and exclusion products. This contrasts with differential gene expression, where reads mapping anywhere in a specific transcript can be used for quantification of that transcript's expression. 51,52 The 14 studies identified have variable read depths and read lengths (Table 1). Based on a conservative estimate from our previous alternative splicing studies alongside those of others, 13,25,51,53 to maximize sensitivity of detecting alternative splicing events and validity of cross-dataset comparisons, we set a minimum threshold of 35 million paired-end reads. Of the identified studies, 11 passed this threshold, including data from SCA1, SCA3 and SCA7 mouse models. Together these 11 studies include 29 control versus disease model comparisons, which we will refer to as datasets (Table 1). 53 Across all datasets analysed, changes in alternative splicing, which may represent missplicing or dysregulation of alternative splicing, were identified (Table 1 and Supplementary Tables 2-5). These data demonstrate the presence of a shared transcriptomic hallmark across CAG expansion SCA mouse models.
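To make concrete why junction-spanning reads limit splicing quantification, the sketch below estimates PSI for a skipped exon from junction read counts alone. This is a deliberate simplification and not the rMATS statistical model, which normalizes by effective isoform lengths: here the two inclusion junctions are simply averaged, and the minimum read count `min_reads` is a hypothetical parameter chosen for illustration.

```python
from typing import Optional


def junction_psi(upstream_inclusion_reads: int,
                 downstream_inclusion_reads: int,
                 skipping_reads: int,
                 min_reads: int = 10) -> Optional[float]:
    """Estimate PSI (0-1) for a skipped exon from junction-spanning reads.

    Inclusion is supported by two junctions (upstream exon to cassette exon,
    cassette exon to downstream exon), so their counts are averaged; skipping
    is supported by the single upstream-to-downstream junction. Returns None
    when too few junction reads are available to quantify the event.
    """
    inclusion = (upstream_inclusion_reads + downstream_inclusion_reads) / 2.0
    total = inclusion + skipping_reads
    if total < min_reads or total == 0:
        return None
    return inclusion / total
```

Because only junction-spanning reads are informative, an event in a lowly expressed gene can fall below the minimum count in a shallow dataset even though the gene itself remains quantifiable for differential expression, which is the motivation for a read depth threshold.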

Alternative splicing dysregulation is a cerebellar transcriptomic hallmark of CAG expansion SCA mouse models
To understand whether dysregulation of alternative splicing could contribute to cerebellar phenotypes in CAG expansion SCA mouse models, we performed a comprehensive analysis of the 13 cerebellar datasets identified (Table 1). Of the significant misspliced events identified with a ΔPSI > 10% and FDR < 0.1, skipped exon (SE) or cassette exon events were the most frequently dysregulated across all datasets. In 12 of the 13 datasets, skipped exon events accounted for more than 50% of the total misspliced events (Fig. 1A). Cassette exon or skipped exon events are alternative splicing events where an intervening exon is included or excluded between two other exons to form two different mature mRNAs and result in distinct protein isoforms. 8,10,25,54 Alternative splicing of mutually exclusive exons (MXE), retained introns (RI), alternative 5ʹ splice site (A5SS) and alternative 3ʹ splice site (A3SS) were also detected in all datasets (Fig. 1A and Supplementary Table 6). As skipped exon events were the most frequently dysregulated, we sought to understand the extent and effect of their dysregulation.
Although differences in read depth, read length, library preparation and sequencing methods hinder comparisons of splicing event numbers between different studies, 25,50-52,54 for datasets within a study there was an increase in skipped exon events throughout disease progression. For example, ATXN1-82Q cerebellum had 93 skipped exon events at 5 weeks and 136 at 12 weeks. Likewise, in the Atxn1 154Q/2Q mouse model there was an increase from 87 skipped exon events at 5 weeks to 154 at 12 weeks of age in dataset GSE122099. This was also seen in the 304Q/304Q SCA3 mouse model with 547 skipped exon events at 2 months of age and 648 at 12 months (Fig. 1A; FDR < 0.1, ΔPSI > 10%). All datasets showed dysregulation of both inclusion events (an exon is included more frequently in SCA than wild-type mice) and exclusion events (an exon is excluded more frequently in SCA than wild-type mice) and across the datasets there was no predominance for dysregulation of inclusion versus exclusion events. The mean ΔPSI for inclusion events ranged from 19.5% to 42.1% with the maximum ΔPSI per dataset ranging from 52.1% to 94.2%. For exclusion events, the mean ΔPSI ranged from 20% to 37.4% with the maximum ΔPSI for exon exclusion ranging from 49.7% to 89.9% across datasets (Fig. 1B).
We next sought to understand the extent to which skipped exon missplicing was shared between the datasets. Each dataset contained skipped exon events that were misspliced in only that dataset, as well as events that were misspliced in two, three or more datasets (Fig. 1C, Supplementary Fig. 1A and Supplementary Table 4). Across all datasets, a total of 2945 skipped exon events (ΔPSI > 10%, FDR < 0.1) were detected, with 524 being shared between two or more datasets and one event, Trpc3 exon 9, misspliced in seven datasets (Fig. 1D and Supplementary Fig. 1A). By performing a principal component analysis (PCA) of skipped exon events shared between four or more datasets, we saw a global trend of SCA samples (triangles) clustered to the left and wild-type samples (circles) to the right. This distribution occurred as a global trend across all datasets and within each dataset, except for the 2-month and 17.5-month SCA3 datasets, which showed limited overlap with the SCA1 and SCA7 datasets (Fig. 1E, Supplementary Fig. 1A and Supplementary Table 4).
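The tally of events shared between datasets can be illustrated with a short sketch. It assumes each dataset has been reduced to a set of event identifiers that are comparable across datasets (for example, a string built from gene and exon coordinates); the identifier format and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def sharing_counts(events_by_dataset: Dict[str, Set[str]]) -> Counter:
    """Count, for every event, the number of datasets in which it is misspliced."""
    counts: Counter = Counter()
    for events in events_by_dataset.values():
        counts.update(set(events))  # each dataset contributes at most 1 per event
    return counts


def shared_events(events_by_dataset: Dict[str, Set[str]],
                  min_datasets: int = 2) -> Set[str]:
    """Events misspliced in at least `min_datasets` datasets."""
    counts = sharing_counts(events_by_dataset)
    return {event for event, n in counts.items() if n >= min_datasets}
```

Calling `shared_events` with `min_datasets=2` corresponds to the "two or more datasets" criterion used for the gene ontology analyses, and `min_datasets=4` to the event set used for the cerebellar PCA.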
We next wanted to identify if the cerebellar missplicing occurred in disease-relevant pathways previously implicated in CAG expansion SCAs. We performed gene ontology enrichment analysis of the genes with skipped exon events dysregulated in two or more datasets (524 skipped exon events) using Metascape. 26 The four most enriched gene ontology summary terms were regulation of synapse organization, actin filament-based processes, regulation of membrane potential, and regulation of system process. Of the top 20 enriched gene ontology terms, 17 clustered into three broad categories based on the top four terms: synapse structure and function (e.g. splicing factor NOVA regulated synaptic proteins) (Supplementary Table 7), cytoskeleton and neuron projection (e.g. neuron projection development), and ion channels and membrane potential (e.g. calcium ion import) (Fig. 1F, Supplementary Fig. 2 and Supplementary Table 8). These pathways have all previously been shown to be affected or implicated in SCA disease. 1,2 To confirm the enrichment of these pathways at the level of individual datasets, we performed gene ontology enrichment analysis using Metascape 26 (Supplementary Fig. 3A) and functional annotation clustering using DAVID 27-29 (Fig. 1G and Supplementary Fig. 4) for significant skipped exon events (P < 0.05, ΔPSI > 10%). Both analyses confirmed that skipped exon events occurred in genes implicated in synapse structure and function, the cytoskeleton and ion channels. Interestingly, both of these broader analyses revealed enrichment of nuclear process-associated terms such as DNA damage and repair, and zinc-fingers and transcriptional regulation. Terms specifically enriched in single datasets, such as Ankyrin repeat domain in the 12-month 304Q/304Q SCA3 dataset, were also identified in this analysis, but these were less common than the shared terms (Fig. 1G, Supplementary Figs 3A and 4 and Supplementary Table 8). Together these data demonstrate that dysregulation of alternative splicing is a shared transcriptomic hallmark of cerebellum from CAG expansion SCA mouse models and that alternative splicing occurs in genes involved in cellular pathways previously implicated in SCAs.

Alternative splicing dysregulation occurs across brainstem and cortical regions of CAG expansion SCA mouse models
We next investigated whether alternative splicing occurred in other affected brain regions of SCA mouse models. Similar to the cerebellar analyses, skipped exon events accounted for >50% of all misspliced events (ΔPSI > 10%, FDR < 0.1) in SCA mice versus wild-type mice for 10 of the 13 cortical and brainstem region datasets (Fig. 2A and Supplementary Table 6). Importantly, few events were detected in the inferior olives from ATXN1-82Q mice (Fig. 2A), which only express the CAG expansion transgene in cerebellar Purkinje neurons. 35,55 Across these datasets, dysregulation of both inclusion and exclusion events was observed, with the mean ΔPSI for inclusion events ranging from 17.6% to 42.8% and for exclusion events from 17.1% to 39.8%. The ranges for maximum ΔPSI values were 43.1% to 100% for inclusion events and 40.3% to 86.6% for exclusion events (Fig. 2B). Again, we identified more dysregulated skipped exon events unique to each dataset than shared, but all datasets except the ATXN1-82Q inferior olive datasets included events that were misspliced in five or six datasets (Fig. 2C and Supplementary Fig. 1B). Overall, 2039 skipped exon events were identified, with 396 dysregulated in two or more datasets (Fig. 2D, Supplementary Fig. 1B and Supplementary Table 5). Using events that were shared between five or six datasets, we performed PCA and saw separation of the wild-type (circles, upper left) and SCA (triangles, lower right) samples both as a global trend and within each dataset (Fig. 2E).
To understand possible implications of these misspliced events, gene ontology enrichment analysis of the genes with skipped exon events dysregulated in two or more datasets (396 skipped exon events) was performed using Metascape. 26 The four most enriched gene ontology summary terms were potassium ion transmembrane transport, negative regulation of axon extension, regulation of neuron migration, and centriole replication (Fig. 2F, Supplementary Fig. 5 and Supplementary Table 8). Gene ontology enrichment analysis of significant skipped exon events (P < 0.05, ΔPSI > 10%) at the level of individual datasets, using both Metascape 26 and functional annotation clustering, 27-29 confirmed that skipped exon events occurred in genes implicated in the cytoskeleton and neuron projections, ion channels and membrane potential, and nuclear processes, including regulation of transcription and DNA damage and repair. These broader analyses also identified enrichment of terms related to synapse structure and function, such as synaptic signalling, regulation of vesicle-mediated transport and postsynaptic density (Fig. 2G, Supplementary Figs 3B and 6 and Supplementary Table 8). Together these analyses demonstrate that alternative splicing dysregulation also occurs across affected brain regions other than cerebellum from SCA1 and SCA3 mouse models and is implicated in the dysregulation of disease-relevant pathways.

Mouse models with short versus pathogenic CAG repeats have distinct splicing profiles
To understand how alternative splicing dysregulation is connected with CAG repeat expansions, we assessed if missplicing was repeat-tract length-dependent. Of the 11 studies identified that passed our read depth threshold, three included data from mouse models with short, non-pathogenic CAG tracts generated using the same strategy as the CAG expansion SCA mouse model in each study. 38,39,46 We performed alternative splicing analysis of these short repeat mice versus the age-matched expansion SCA mice and, where available, age-matched wild-type mice (Table 1). We noted an increase in the number of skipped exon events (FDR < 0.1, ΔPSI > 10%) in the expansion mice compared to the short repeat mice when analysed against wild-type mice for both SCA1 (30Q versus WT: 136 skipped exon; 82Q versus WT: 208 skipped exon) and SCA3 (15Q versus WT: 76 skipped exon; 84Q versus WT: 157 skipped exon) datasets (Table 1). Analysis of the short repeat versus pathogenic repeat length mice revealed alternative splicing changes for all event types, with skipped exon events accounting for >50% of misspliced events across all three datasets (Fig. 3A). The maximum ΔPSI for inclusion values across these comparisons ranged from 66.5% to 84.4% and for exclusion values from 64.7% to 84.5%. The mean ΔPSI for inclusion events ranged from 26.4% to 36.3% and for exclusion values from 27.7% to 33.4% (Fig. 3B and Supplementary Table 9). These data demonstrate that the magnitude of splicing changes for skipped exon events from short repeat versus SCA mouse models is comparable to the magnitude of splicing changes seen for wild-type versus SCA comparisons (Figs 1B and 2B).
Gene ontology enrichment analysis 26 of skipped exon events significantly alternatively spliced between short and pathogenic repeat length mice (P < 0.05, ΔPSI > 10%) demonstrated that misspliced genes were in disease-related processes. Amongst the enriched terms were synapse structure and function (e.g. splicing factor NOVA regulated synaptic proteins) (Supplementary Table 7), the cytoskeleton (e.g. microtubule cytoskeleton organization), nuclear processes (e.g. nuclear localization) and ion channels (e.g. calcium ion transmembrane transport). Interestingly, nuclear process-associated terms were more commonly enriched across the SCA3 datasets, while ion channel-associated terms were more commonly enriched across the cerebellar datasets (Fig. 3C and Supplementary Table 8).
To understand the relative splicing profiles of short repeat and SCA mouse models, we performed PCAs. For the SCA1 dataset, a PCA of significantly misspliced skipped exon events (FDR < 0.1, ΔPSI > 10%) in one or more of the pairwise alternative splicing analyses (30Q versus WT, 82Q versus WT and 30Q versus 82Q) was performed (Fig. 3D). For the SCA3 datasets, the PCA was performed based on skipped exon events significantly misspliced in two or more pairwise comparisons (Fig. 3E). In both cases, clear separation between short and pathogenic repeat samples was observed. For the SCA1 dataset, PC1 explained 32.5% of the variance between samples, with separation of the 82Q mice versus the 30Q and wild-type mice occurring in this direction. The 30Q and wild-type mice did show distinct clustering along PC2, which explained less variance in the data (25.5%) (Fig. 3D) and could be driven by effects based on transgene insertion sites and transgene copy number. 55 For the SCA3 data, the 15Q and wild-type pons samples clustered together and were distinct from the 84Q pons samples (blue, dataset GSE117605). The 15Q and 84Q cerebellar samples (green, dataset GSE178367) likewise showed distinct clustering, with the 15Q samples clustering nearer the wild-type and 15Q pons samples (Fig. 3E).
[Figure 2 caption, truncated in source] ... 8 for member gene ontology terms. Full gene ontology terms: *GO:0045843 negative regulation of striated muscle tissue development; # GO:0010823 negative regulation of mitochondrion organization; ‡ GO:0061178 regulation of insulin secretion involved in cellular response to glucose stimulus. (G) Functional classification analysis of significantly misspliced skipped exon events, P < 0.05, ΔPSI > 10%; see Supplementary Fig. 6 for detailed annotation. FDR = false discovery rate; PSI = per cent spliced in.
Finally, we assessed the contribution of alternative splicing dysregulation to repeat length-dependent transcriptomic signatures. We performed transcriptomic analysis between the short and pathogenic repeat length mice. For the presymptomatic (5-week) SCA1 82Q versus 30Q cerebellum, we saw over three times as many misspliced skipped exon events (n = 163) as differentially expressed genes (n = 47). Similarly, in the early symptomatic (22-week) SCA3 84Q versus 15Q pons, only three genes were differentially expressed but there were 119 misspliced skipped exon events. However, the symptomatic (17.5-month) SCA3 84Q versus 15Q cerebellum showed more differentially expressed genes (n = 561) than misspliced skipped exon events (n = 445) (Fig. 3F). Together these data demonstrate that mouse models of short CAG repeats and CAG repeats in the pathogenic expansion range for SCAs have distinct splicing profiles affecting pathways relevant to SCA disease, and that at early disease time points alternative splicing may contribute more to global transcriptomic dysregulation than differential gene expression.

Presymptomatic SCA mouse models show disease relevant dysregulation of skipped exon events
We observed an increase in misregulation of skipped exon events across disease progression (Fig. 1A) and greater dysregulation of alternative splicing than differential gene expression in pre- and early symptomatic CAG expansion versus short repeat mice (Fig. 3F). Based on these observations, we next sought to understand if dysregulation of alternative splicing could represent a pathogenic mechanism for early neuronal dysfunction in CAG expansion SCAs. We performed transcriptomic analysis for all presymptomatic datasets, which included six datasets from 5- and 6-week-old ATXN1-82Q and Atxn1 154Q/2Q SCA1 mice and one dataset from 2-month-old 304Q/304Q SCA3 mice. In all datasets, using stringent thresholds for both differential gene expression (Padj < 0.05, |log2FC| > 1.5) and alternative splicing (FDR < 0.1, ΔPSI > 10%), we identified more skipped exon events than differentially expressed genes (Fig. 4A and Supplementary Table 10). This effect was especially pronounced in the SCA3 dataset [one differentially expressed gene (DEG); 547 skipped exons] and the 6-week-old Atxn1 154Q/2Q dataset (seven DEGs; 298 skipped exons; Fig. 4A), the latter of which has roughly double the read depth and longer read length than the other presymptomatic SCA1 datasets (Table 1).
Between the four SCA1 presymptomatic cerebellar datasets there were 28 skipped exon events alternatively spliced (FDR < 0.1, ΔPSI > 10%) in two or more of the datasets. Gene ontology enrichment analysis 26 of these events identified that the genes function in: synapse structure and function (e.g. splicing factor NOVA regulated synaptic proteins) (Supplementary Table 7), the cytoskeleton (e.g. regulation of actin cytoskeleton organisation), ion channels (e.g. response to calcium ion) and cell adhesion (e.g. positive regulation of cell substrate adhesion). Of the 15 enriched gene ontology terms, 13 clustered into these four functional categories (Fig. 4B, Supplementary Table 8 and Supplementary Figs 3A and 4 for individual dataset gene ontology analyses). As cell adhesion had not previously been identified as an enriched functional category in our analyses, we performed gene ontology analysis using Metascape 26 for the presymptomatic SCA3 cerebellar dataset (FDR < 0.1, ΔPSI > 10%). Interestingly, we also identified cell adhesion-associated terms such as cell-substrate junction assembly and integrin-mediated cell adhesion, suggesting that cell adhesion may be impaired during early stages of disease. We again identified terms related to the cytoskeleton (e.g. regulation of microtubule-based processes) and ion channels and membrane potential (e.g. protein localization to membrane), but we identified very few terms associated with synapse structure and function (Fig. 4C and Supplementary Table 8). These analyses demonstrate that presymptomatic transcriptomic dysregulation in both SCA1 and SCA3 mice is characterized by disease-relevant defects in alternative splicing and not by widespread changes in differential gene expression.

Dysregulation of key alternative splicing events occurs across SCA1, SCA3 and SCA7 mouse models
To understand the possible contribution of alternative splicing to neuronal dysfunction and disease in CAG expansion SCAs, we tracked three specific alternative splicing events dysregulated in presymptomatic mice across all datasets. The three skipped exon events that were dysregulated in six or seven cerebellar datasets (Fig. 1D) were all significantly dysregulated in two of the four  (S776A) in the Atxn1 154Q allele (study GSE163885) 42 and the other had a mutation of a key amino acid (M120A) in the enzyme (PKA Cα) responsible for phosphorylating serine 776 in the ATXN1[82Q] mouse model (study GSE114815) 44; both these mouse models showed reduced levels of expanded ataxin1 protein. For all studies, the PSI values were calculated for the treatment condition for all skipped exon events significantly alternatively spliced (FDR < 0.1, ΔPSI > 10%) between control (WT or 15Q) and SCA mice (Supplementary Table 11); we then performed PCA based on these events. Across all eight datasets analysed there was clear separation between the three conditions (control, SCA, SCA-treated). For seven of the eight datasets, this separation occurred primarily in the PC1 direction, and for six of these datasets the SCA-treatment mice were approximately equidistant between the control and SCA mice (Fig. 6A and B). The major exception to this was the ATXN1[82Q]-M120A dataset, in which the SCA-genetic cross mice clustered distinct from, but close to, the SCA mice, with the wild-type mice separated along PC1 from both these clusters (Fig. 6A). Overall, this analysis demonstrates that therapeutic strategies can mitigate alternative splicing dysregulation but that this response occurs to different extents depending on the treatment or genetic manipulation strategy.
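The PCA step can be illustrated with a small, dependency-free sketch that projects samples onto the first principal component of a samples-by-events PSI matrix. The PSI values are invented, and power iteration is used here only to keep the example self-contained; it is not a statement about the software used in the study, where a standard library routine (for example a singular value decomposition) would normally be preferred.

```python
import math

def first_pc(matrix, n_iter=200):
    """Return (PC1 scores, fraction of variance explained) for a samples x events matrix."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Sample covariance matrix between events (m x m).
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration converges to the leading eigenvector of the covariance matrix.
    vec = [1.0] + [0.0] * (m - 1)
    for _ in range(n_iter):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m))
    total_var = sum(cov[i][i] for i in range(m))
    scores = [sum(r[j] * vec[j] for j in range(m)) for r in centred]
    return scores, eigval / total_var

# Hypothetical PSI values (fractions) for three events in six samples.
psi = [
    [0.30, 0.80, 0.50],  # control 1
    [0.32, 0.78, 0.52],  # control 2
    [0.60, 0.50, 0.20],  # SCA 1
    [0.62, 0.48, 0.22],  # SCA 2
    [0.45, 0.65, 0.35],  # SCA-treated 1
    [0.47, 0.63, 0.37],  # SCA-treated 2
]
scores, explained = first_pc(psi)
for label, s in zip(["ctrl", "ctrl", "SCA", "SCA", "treated", "treated"], scores):
    print(f"{label:8s} PC1 = {s:+.3f}")
print(f"PC1 explains {explained:.1%} of the variance")
```

In this toy matrix the treated samples fall between the control and SCA samples along PC1, which is the pattern described for partial rescue.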
To investigate the extent of splicing rescue in each dataset, we next categorized the skipped exon events (control versus SCA: FDR < 0.1, ΔPSI > 10%) based on the size of change (per cent rescue) between the SCA and SCA treatment mice. Events were classified as rescued if the PSI for SCA-treatment changed >10% in the direction of the control mice PSI value and had a ΔPSI > 5% compared to SCA mice. Events were classified as changed in the opposite direction if the ΔPSI > 5% for SCA-treatment versus SCA mice and there was >10% shift in PSI in the opposite direction to the control mice. Events that were not rescued had a ΔPSI < 5% between SCA-treatment and SCA mice or showed <10% rescue. For all datasets except ATXN1[82Q]-M120A, the events that were not rescued or showed an opposite effect represented approximately a quarter to a third of the events per dataset. For the ATXN1[82Q]-M120A dataset, the numbers of rescued and not-rescued/opposite effect events were comparable (Fig. 6C).
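The classification rules above can be written as a short sketch. The per cent rescue formula used here, the shift from SCA towards control PSI divided by the control-to-SCA difference, is an assumption consistent with the description rather than a formula quoted from the study; the function names and PSI values (given as fractions) are hypothetical.

```python
def percent_rescue(psi_control, psi_sca, psi_treated):
    """Assumed definition: shift of treated PSI from SCA towards control, as a
    percentage of the control-to-SCA difference. Negative values indicate a
    shift away from control."""
    return 100.0 * (psi_treated - psi_sca) / (psi_control - psi_sca)

def classify_event(psi_control, psi_sca, psi_treated, min_shift=10.0, min_dpsi=0.05):
    """Classify an event as rescued, opposite effect, not rescued or no information."""
    if psi_treated is None:
        return "no information"
    dpsi = abs(psi_treated - psi_sca)          # delta PSI, treated versus untreated SCA
    rescue = percent_rescue(psi_control, psi_sca, psi_treated)
    if dpsi > min_dpsi and rescue > min_shift:
        return "rescued"
    if dpsi > min_dpsi and rescue < -min_shift:
        return "opposite effect"
    return "not rescued"

# Hypothetical event: PSI 0.30 in control and 0.60 in SCA mice.
for treated in (0.37, 0.58, 0.70, None):
    print(treated, classify_event(0.30, 0.60, treated))
```

Requiring both a minimum absolute ΔPSI and a minimum per cent rescue avoids labelling an event as rescued when a small absolute shift happens to be a large fraction of a modest control-to-SCA difference.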
Next, we further filtered these skipped exon events to identify events significantly alternatively spliced between SCA and SCA-treated conditions and assessed whether key SCA splicing changes were rescued. We focused on the SCA1-ASO datasets and the ATXN1[82Q]-M120A dataset as these showed significant dysregulation of core SCA splicing changes with an FDR < 0.1 and ΔPSI > 10% for wild-type versus SCA mice (Fig. 5 and Supplementary Fig. 7). In the SCA1 cerebellar ASO dataset, we observed a 78% rescue of Trpc3 exon 9 missplicing (FDR < 0.1; Fig. 6D) and a 77% rescue of Kcnma1 exon 23b missplicing (P < 0.05; Fig. 6E) following Atxn1 ASO treatment. In addition, we assessed rescue of Itpr1 exon 41, which was significantly misspliced (FDR < 0.1, ΔPSI > 10%) in five of 13 cerebellar datasets (Fig. 1D and Supplementary Table 4) and found that the missplicing was rescued by 96% (FDR < 0.1, Fig. 6F). Of all the cerebellar treatment datasets, ATXN1[82Q]-M120A was the only dataset that showed significant changes in splicing in the opposite direction to control mice (Supplementary Table 11). For example, Camk2a exon 14 showed a 78% exacerbation of missplicing (FDR < 0.1; Supplementary Fig. 10A). In contrast to treatment with the Atxn1 ASO, mutation of PKA Cα M120A did not rescue Trpc3 exon 9 or Kcnma1 exon 23b missplicing (Supplementary Fig. 10B and C) but did lead to a 35% rescue of Itpr1 exon 41 missplicing (FDR < 0.1, Supplementary Fig. 10D). Finally, we identified that Atxn1 ASO treatment significantly rescued Bcas1 exon 9 and exon 10 missplicing in the medulla and pons with a minimum rescue of 66% for exon 10 (Fig. 6G) and 82% for exon 9 (Supplementary Fig. 10E). Together these data demonstrate that alternative splicing dysregulation can be used as a transcriptomic readout in therapeutic studies for CAG expansion SCAs.
To investigate the relationship between CAG expansion expression and alternative splicing we took advantage of the sequence differences between the 2Q and 154Q alleles 57 present in the same mouse, to distinguish between the alleles bioinformatically. To confirm these sequence differences, we performed a PCR across the repeat expansion and sequenced the 2Q and 154Q alleles (Supplementary Fig. 11A-C). To validate the specificity of using these sequence differences to distinguish between the 154Q and 2Q alleles, we designed qPCR primers selective for each allele (Supplementary Fig. 11A). The 154Q-allele selective primers did not amplify a product in wild-type mice (Supplementary Fig. 11D and E) and while there was no difference in the total levels of Atxn1 RNA between wild-type and Atxn1 154Q/2Q mice, as expected, the 2Q-allele selective primers detected a 52% reduction in Atxn1 2Q RNA in Atxn1 154Q/2Q mice compared to wild-type mice (P = 0.0019; Supplementary Fig. 11D and F).
Having confirmed the specificity of this approach using qPCR, we generated a custom genome containing sequences, of the same length, specific to the 2Q and 154Q alleles. To account for read length, unique genomes were used for datasets with different read lengths such that every read would have to overlap with at least three of the sequence differences to align to either allele. Because of this, relative TPMs for Atxn1 alleles are not comparable between datasets and do not represent TPMs for full length Atxn1. Comparable to the qPCR approach, the 154Q allele was not detected in wild-type mice from datasets GSE114674 and GSE163885, and SCA1 mice showed ∼30-55% less expression of the 2Q allele in both datasets (GSE114674: WT 40.84 ± 13.44, SCA1 18.77 ± 5.13, Fig. 6H; GSE163885: WT 24.99 ± 6.29, SCA1 17.19 ± 6.717; Fig. 6I). Housekeeping genes did not show differences between WT, SCA1 and SCA1 treated mice within each dataset (Fig. 6H and I). While no difference was seen in expression for either the Atxn1 2Q or Atxn1 154Q alleles for SCA1 versus SCA1-S776A mice, both alleles showed a reduction in expression (Atxn1 2Q: log2FC = −1.03; P = 0.12) with significant reduction of the Atxn1 154Q allele in the SCA1-ASO mice compared to the wild-type mice (Atxn1 154Q: log2FC = −1.61; P = 0.037; Fig. 6J). Together these data demonstrate that expression of Atxn1 2Q and Atxn1 154Q alleles can be selectively quantified and that alternative splicing dysregulation can be rescued by reducing levels of CAG expansion RNAs or by reducing expansion protein levels, as is the case for GSE163885. 42 These data suggest that alternative splicing dysregulation may represent a tractable biomarker for preclinical therapeutic studies in CAG expansion SCAs.
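As a worked check of the figures above, the reported group means can be converted to percentage reductions, and the reported log2 fold changes to percentage changes. The helper names are hypothetical; the input values are the means and log2FC values quoted in the text, and the standard deviations are ignored in this simple sketch.

```python
def percent_reduction(reference, case):
    """Percentage reduction of `case` relative to `reference`."""
    return 100.0 * (1.0 - case / reference)

def log2fc_to_percent_change(log2fc):
    """Convert a log2 fold change to a percentage change (negative means reduction)."""
    return 100.0 * (2.0 ** log2fc - 1.0)

# Relative TPM means for the Atxn1 2Q allele as reported in the text (WT, SCA1).
reported_2q = {
    "GSE114674": (40.84, 18.77),
    "GSE163885": (24.99, 17.19),
}
reductions = {gse: percent_reduction(wt, sca1) for gse, (wt, sca1) in reported_2q.items()}
for gse, value in reductions.items():
    print(f"{gse}: 2Q allele reduced by {value:.0f}%")

# log2 fold changes reported for the SCA1-ASO comparison.
change_2q = log2fc_to_percent_change(-1.03)
change_154q = log2fc_to_percent_change(-1.61)
print(f"Atxn1 2Q: {change_2q:.0f}%; Atxn1 154Q: {change_154q:.0f}%")
```

The two datasets give reductions of roughly 54% and 31%, consistent with the stated range of ∼30-55%, and the log2 fold changes correspond to reductions of roughly one half and two thirds, respectively.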

Discussion
Dysregulation of alternative splicing has been identified and extensively studied as a key driver of disease pathogenesis in the CTG repeat expansion disorder myotonic dystrophy. 8,10,14 Despite evidence implicating splicing changes in multiple CAG expansion disorders, 16-21 the extent of alternative splicing dysregulation and its possible contribution to disease across CAG expansion SCAs is poorly understood. Here, we identified that dysregulation of alternative splicing is widespread across CAG expansion mouse models of SCAs 1, 3 and 7. These changes were detected presymptomatically, persisted throughout disease and were present in the cerebellum and other brain regions implicated in SCA pathogenesis, including the pons and medulla. Widespread alternative splicing dysregulation was also detected when CAG expansion SCA mice were compared to mouse models expressing non-pathogenic length repeat tracts. Furthermore, these differences in alternative splicing were comparable in magnitude to the differences between CAG expansion SCA and wild-type mice. We demonstrated that across disease progression, changes in alternative splicing occurred in genes that function in pathways known to be impaired in SCAs, such as ion channels, synaptic signalling and the cytoskeleton. Finally, we demonstrate that alternative splicing can be used as a target engagement biomarker with key CAG SCA alternative splicing events rescued by Atxn1 targeting ASO.

Figure 6 (C) Number of skipped exon events rescued in each dataset for all events significantly alternatively spliced between wild-type and SCA mice. Rescue is defined as >10% rescue (change in splicing towards wild-type PSI) in SCA treated versus SCA untreated mice with a minimum ΔPSI of 5% between SCA treated and untreated. Opposite effect is defined as >10% change in splicing away from wild-type PSI with a minimum ΔPSI of 5% between SCA treated and untreated. All events not classified as rescued or opposite effect are considered non-rescue events unless no information is available for the events in the treatment condition.
One of the key findings of this study is the presence of widespread alternative splicing dysregulation prior to global changes in differential gene expression (Fig. 7A). While previous studies have demonstrated widespread differential gene expression changes from early symptomatic stages of disease onwards, differential gene expression studies have so far failed to identify consistent, presymptomatic drivers of disease at the transcriptomic level. 35,37,38 Here we demonstrate presymptomatic missplicing of multiple skipped exon events that could directly contribute to neuronal dysfunction prior to degenerative changes. These changes in alternative splicing were also seen between short repeat and expanded repeat mice at presymptomatic and early symptomatic stages of disease when differential gene expression changes were limited. 38 Importantly, at both pre- and early symptomatic time points in inferior olive of ATXN1-82Q mice, we did not detect widespread missplicing and, of the changes seen, there was very limited overlap seen with other datasets. Since the ATXN1 transgene is only expressed in Purkinje neurons in this mouse model, 35,55 this finding indicates that alternative splicing dysregulation is likely not a late, downstream response to disease and that it is primarily governed by cell autonomous effects. Together these data demonstrate that alternative splicing dysregulation is a repeat length-dependent transcriptomic hallmark of CAG expansion SCAs and may represent a presymptomatic driver of disease pathogenesis in these disorders.
Given that dysregulated skipped exon events were primarily found in genes linked to pathways known to be disrupted in CAG expansion SCAs, it is plausible that dysregulation of these alternatively spliced events could contribute to the disruption of these interrelated pathways. Interestingly, some pathways were affected in subgroups of the datasets studied, indicating disease stage, brain region or SCA type-specific effects of missplicing. For example, cell adhesion was enriched for presymptomatic cerebellar datasets and nuclear process associated terms were more frequently associated with SCA3 or brainstem datasets than with SCA1 cerebellar datasets. The latter of these findings may reflect a contribution of the early, skipped exon missplicing in zinc fingers and transcriptional regulators to the subsequent widespread dysregulation of gene expression, but accounts for the strong link between alterations in mutant Atxn1 interactions with the transcriptional repressor capicua and differential gene expression in SCA1 cerebellum. 3 Conversely, enrichment of cytoskeleton and neuron projection associated terms occurred across all datasets. Numerous lines of evidence suggest that cytoskeletal disruption occurs in ataxias. 58-64 Most notably, mutations in the β-III spectrin gene cause SCA5 59 and impair spectrin-actin cytoskeletal dynamics and dendritic arborization. 58 We identified dysregulation of alternative splicing of Anks1b, which acts as a scaffold protein linking the membrane and cytoskeleton at postsynaptic densities. 65 While studies have demonstrated a role for Anks1b in glutamatergic neurotransmission and synaptic plasticity, 65 other Ankyrin cytoskeletal scaffolding proteins have been shown to be essential for Purkinje neuron survival by connecting potassium channels to the β-III spectrin cytoskeleton. 66 Similarly, across datasets from brainstem and SCA3 cerebellum, we saw dysregulation of alternative splicing of Bcas1, a marker of early myelinating oligodendrocytes, loss of which causes hypomyelination. 67,681][72][73][74] These data indicate that, at the level of pathways and individual genes, dysregulation of alternative splicing could contribute to disruption of cellular processes known to be implicated in pathogenesis of ataxia.
While the missplicing of Anks1b and Bcas1 suggests possible functional links to pathways impaired in SCAs, the relevance of the individual splicing events is not clear; however, for several of the skipped exon events identified in ion channels the specific functional effect of the missplicing is well defined. Two key events that were identified and validated in this study occurred in ion channels in which mutations have been shown to cause ataxia: Trpc3 mutations cause SCA41 4,5 and mutations in Kcnma1 cause progressive ataxia 4,75,76 (Fig. 7B). We also detected missplicing of skipped exon events in an additional six genes known to cause eight different types of SCA, 2,4 in two or more cerebellar datasets (Fig. 7B). In Trpc3 we found that exon 9 is included more often in SCA cerebellum than wild-type cerebellum. Normally, exon 9 is excluded ∼70% of the time in the cerebellum, which leads to Trpc3 proteins with incomplete CIRB domains and increased membrane conductance due to enhanced channel opening frequency and increased Ca2+ entry. 33,56 In SCAs, this cerebellar-specific adaptation is disrupted by an increase in exon 9 inclusion, which would lead to reduced Ca2+ entry and decreased membrane conductance (Fig. 7C). Likewise, an increase in inclusion of exon 23b in Kcnma1 in SCA cerebellum leads to an incomplete calcium bowl, a region within the RCK2 domain, which impairs Ca2+ binding and reduces channel activation and K+ efflux, in turn slowing synapse repolarization 77,78 (Fig. 7D). Importantly, impaired calcium signalling, disrupted membrane conductance and impaired K+ channel functioning have previously been reported in CAG expansion SCAs. 2,4 These specific events highlight how missplicing could directly impair cerebellar-specific adaptations and functions of ion channels encoded by genes which harbour mutations that cause ataxia.
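To make the inclusion and exclusion percentages concrete, the sketch below computes PSI (per cent spliced in) for a skipped exon from inclusion and skipping read counts, using the length-normalized form employed by tools such as rMATS. Whether the study computed PSI in exactly this way is an assumption. The read counts and effective lengths are invented, and were chosen so that the wild-type example gives a PSI of 0.30, matching an exon that is excluded ∼70% of the time.

```python
def psi(inclusion_reads, skipping_reads, inclusion_len, skipping_len):
    """Length-normalized PSI for a skipped exon event.

    inclusion_len and skipping_len are the effective lengths of the inclusion
    and skipping isoform regions, so that counts are compared per unit length.
    """
    inc = inclusion_reads / inclusion_len
    skp = skipping_reads / skipping_len
    return inc / (inc + skp)

# Hypothetical counts for one exon in wild-type and SCA cerebellum.
psi_wt = psi(inclusion_reads=60, skipping_reads=70, inclusion_len=2.0, skipping_len=1.0)
psi_sca = psi(inclusion_reads=120, skipping_reads=40, inclusion_len=2.0, skipping_len=1.0)
delta_psi = psi_sca - psi_wt

print(f"PSI WT = {psi_wt:.2f}, PSI SCA = {psi_sca:.2f}, delta PSI = {delta_psi:+.2f}")
```

A positive ΔPSI in the SCA versus wild-type comparison corresponds to increased exon inclusion, the direction described above for Trpc3 exon 9 and Kcnma1 exon 23b.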
In addition to genetic causes of ataxia, there are also several forms of paraneoplastic cerebellar degeneration which cause ataxia and are associated with antineuronal antibodies produced by cancers, which target specific proteins or cells. 79 One of these immunological forms of ataxia is caused by anti-Ri antibodies, which target NOVA alternative splicing regulator-1 (NOVA1) and NOVA2. 79,80 The NOVA proteins are neuron-specific RNA binding proteins that regulate alternative splicing of specific events with critical roles in axon projections, voltage-gated ion channel regulation, and synapse structure and function, with NOVA1 being predominantly expressed in the cerebellum and spinal cord. 81 In our study, splicing factor NOVA regulated synaptic proteins was one of the most frequently dysregulated gene ontology enrichment terms across all datasets, and the most enriched term in presymptomatic SCA1 datasets and comparisons between short and long repeat length mouse models. While this finding indicates overlap between alternative splicing in CAG expansion SCAs and another form of ataxia, it also suggests the potential for NOVA RBPs to be implicated in the mechanism of alternative splicing in CAG expansion SCAs.
Although further studies are needed to understand the mechanism(s) of widespread alternative splicing dysregulation in CAG expansion SCAs, studies from other microsatellite expansion diseases indicate possible mechanisms that could contribute to the dysregulation of alternative splicing. For example, in DM1, the CUG expansion RNAs sequester the muscleblind-like (MBNL) family of RBPs into nuclear RNA foci. MBNL proteins regulate alternative splicing of specific MBNL-dependent skipped exon events. In DM1, this sequestration reduces the functional pool of MBNL proteins, resulting in missplicing of MBNL-regulated alternative splicing events. 8,10 This RNA-sequestration model of alternative splicing dysregulation is one possible mechanism through which widespread missplicing could be occurring in CAG SCAs. Indeed, this dysregulation could be mediated through either or both the sense CAG RNAs and antisense CUG RNAs, the latter of which is already supported by studies in SCA8 16 and SCA2. 17 However, other studies focusing on SCA3 and Huntington's disease suggest that CAG expansion RNAs may play a role in alternative splicing dysregulation. 21 Another possible mechanism supported by studies in Huntington's disease is a reduction in expression of multiple RBPs, 18 which would again lead to a reduction in the functional pool of RBPs or change in the balance of RBP expression and subsequent dysregulation of alternative splicing events mediated by these RBPs. Due to the clear implication of differential gene expression in CAG SCAs, 35,37,38,41,46,49 effects on RBP expression may contribute to splicing dysregulation in later stages of disease progression. However, due to the detection of widespread alternative splicing dysregulation prior to widespread differential gene expression, this mechanism is unlikely to contribute to presymptomatic missplicing.
4][85] Polyglutamine and RAN proteins have been shown to be preferentially expressed and aggregate in different cell types and brain regions. 71,82,86 As RBPs have brain region-specific expression profiles, 9,10,81 the differential affinities of each expansion protein for specific RBPs could lead to brain region-specific protein sequestration resulting in the differences in alternative splicing seen in this study. Changes in expression of both expansion proteins and RBPs, their cellular localization or post-translational modifications could all potentially impact this mechanism of alternative splicing dysregulation. While this mechanism is supported by evidence that reducing Atxn1 expression and aggregation 42,44 rescued missplicing to a certain extent (Fig. 6A, C and J and Supplementary Fig. 10A-D), further studies are needed to understand the contribution of protein- and RNA-mediated RBP sequestration, accounting for effects of polyglutamine and RAN proteins and sense and antisense RNAs, and identifying which RBPs are relevant for missplicing in CAG SCAs.
Here we identified dysregulation of alternative splicing as a novel presymptomatic pathogenic mechanism in CAG expansion SCAs by performing analyses of previously published RNA-Seq datasets. While confounding factors such as read length, read depth and limitations due to polyA selection 25,50-52,54,87 limit comparisons between datasets, this study highlights what can be learned from mining published data and the benefits of data-sharing. Future studies will be needed to determine whether alternative splicing dysregulation is a key transcriptomic hallmark in CAG expansion SCA patient-derived model systems and if misspliced transcripts could be used as target engagement biomarkers in CAG SCAs. It is important to note that the use of rRNA depletion methods with long read lengths and paired-end sequencing in these types of studies would enable a greater volume of information to be extracted about alternatively spliced exons and the maturation of alternatively spliced isoforms. 25,50-52,54 Additionally, while we presented clear functional consequences for two of the identified splicing events, future studies will be required to understand the direct contribution of these alternative splicing events to disease symptoms, and will have to account for brain region-specific missplicing. Together, these studies provide an overview of alternative splicing dysregulation in CAG expansion SCAs and represent a platform for future investigations into the mechanisms and therapeutic potential, both as biomarkers and therapeutic candidates, of alternative splicing dysregulation across this class of devastating diseases.

Figure 1
Figure 1 Widespread dysregulation of alternative splicing in CAG expansion SCA mouse model cerebellum. (A) Percentage of significantly misspliced skipped exon (SE), retained intron (RI), mutually exclusive exons (MXE), alternative 5' splice site (A5SS) and alternative 3' splice site (A3SS) events as a proportion of total splicing events in spinocerebellar ataxia (SCA) versus wild-type (WT) mice, number of each event shown on bar, FDR < 0.1, ΔPSI > 10%. (B) Percentage of exon inclusion (positive) or exclusion (negative) for significantly alternatively spliced skipped exon events per dataset, FDR < 0.1, ΔPSI > 10%, mean ± standard deviation (SD). (C) Number of skipped exon events per dataset with the proportion of events dysregulated in two to seven datasets shown, FDR < 0.1, ΔPSI > 10%. (B and C) Datasets are in the same order as A. (D) Total number of skipped exon events dysregulated in more than one dataset, FDR < 0.1, ΔPSI > 10%. (E) Principal component analysis of the 31 skipped exon events dysregulated in four or more datasets. (F) Enrichment of summary gene ontology terms identified using Metascape for the 524 skipped exon events dysregulated in two or more datasets; broad functional categories of terms are indicated by bar colour; see Supplementary Fig. 2 and Supplementary Table 8 for member gene ontology terms. Full gene ontology terms: *GO:2000463 positive regulation of excitatory postsynaptic potential; #GO:0086012 membrane depolarization during cardiac muscle cell action potential. (G) Functional classification analysis of significantly misspliced skipped exon events, P < 0.05, ΔPSI > 10%; see Supplementary Fig. 4 for detailed annotation. FDR = false discovery rate; PSI = per cent spliced in.

Figure 2
Figure 2 Alternative splicing is dysregulated across affected brain regions in CAG expansion SCA mouse models. (A) Percentage of significantly misspliced skipped exon (SE), retained intron (RI), mutually exclusive exons (MXE), alternative 5' splice site (A5SS) and alternative 3' splice site (A3SS) events as a proportion of total splicing events in spinocerebellar ataxia (SCA) versus wild-type (WT) mice, number of each event shown on bar, FDR < 0.1, ΔPSI > 10%. (B) Percentage of exon inclusion (positive) or exclusion (negative) for significantly alternatively spliced skipped exon events per dataset, FDR < 0.1, ΔPSI > 10%, mean ± standard deviation (SD). (C) Number of skipped exon events per dataset with the proportion of events dysregulated in two to seven datasets shown, FDR < 0.1, ΔPSI > 10%. (B and C) Datasets are in the same order as A. (D) Total number of skipped exon events dysregulated in more than one dataset, FDR < 0.1, ΔPSI > 10%. (E) Principal component analysis of the nine skipped exon events dysregulated in five or six datasets. (F) Enrichment of summary gene ontology terms identified using Metascape for the 396 skipped exon events dysregulated in two or more datasets; broad functional categories of terms are indicated by bar colour; see Supplementary Fig. 5 and Supplementary Table 8 for member gene ontology terms. Full gene ontology terms: *GO:0045843 negative regulation of striated muscle tissue development; #GO:0010823 negative regulation of mitochondrion organization; ‡GO:0061178 regulation of insulin secretion involved in cellular response to glucose stimulus. (G) Functional classification analysis of significantly misspliced skipped exon events, P < 0.05, ΔPSI > 10%; see Supplementary Fig. 6 for detailed annotation. FDR = false discovery rate; PSI = per cent spliced in.

Figure 3
Figure 3 Short and pathogenic repeat length mouse models show distinct splicing profiles. (A) Percentage of significantly misspliced skipped exon (SE), retained intron (RI), mutually exclusive exons (MXE), alternative 5' splice site (A5SS) and alternative 3' splice site (A3SS) events as a proportion of total splicing events in spinocerebellar ataxia (SCA) versus short repeat mice, number of each event shown on bar, FDR < 0.1, ΔPSI > 10%. (B) Percentage of exon inclusion (positive) or exclusion (negative) for significantly alternatively spliced skipped exon events per dataset, FDR < 0.1, ΔPSI > 10%, mean ± standard deviation (SD). (C) Enrichment of gene ontology terms for significantly misspliced skipped exon events between short and pathogenic repeat length mice across the three datasets; broad functional categories of terms are indicated by background colours. (D) Principal component analysis of significantly dysregulated skipped exon events in the SCA1 dataset GSE75778. (E) Principal component analysis of skipped exon events significantly dysregulated in two or more pairwise comparisons from the SCA3 datasets GSE117605 and GSE178367. (F) Number of differentially expressed genes (DEGs) and skipped exon events in short repeat versus pathogenic repeat length mice; numbers of DEGs are shown on bars; DEGs: |log2FC| > 1.5, Padj < 0.05; SE: FDR < 0.1, ΔPSI > 10%. FDR = false discovery rate; PSI = per cent spliced in.

Figure 7
Figure 7 Potential functional consequences and disease relevance of missplicing in CAG expansion SCAs. (A) Temporal overview of transcriptional dysregulation across mouse models. Alternatively spliced skipped exon events (SE, blue) are dysregulated presymptomatically in SCA mouse models. At the transition to symptomatic stages of disease, the number of differentially expressed genes (DEGs, pink) increases. Skipped exon alternative splicing events see a more-tempered increase in number than DEGs as disease progresses. (B) Overview of the number of misspliced skipped exon events shared between multiple cerebellar datasets highlighting genes known to cause SCAs that have misspliced skipped exons across two or more datasets in our analysis (SE events: FDR < 0.1, ΔPSI > 10%). Data recreated from Fig. 1D. (C) In SCA models, there is increased inclusion of exon 9 in transient receptor potential cation channel subfamily C member 3 (Trpc3); in wild-type (WT) mice, exclusion of exon 9 is a cerebellar-specific alternative splicing event. Exon 9 spans the calmodulin (CaM) and inositol trisphosphate receptor (IP3R) binding (CIRB) motif (highlighted in orange). When translated, the complete CIRB motif induces a conformational change in the channel that reduces activation and subsequent Ca2+ influx. This results in reduced neuronal GPCR-Ca2+ signalling efficiency and reduced membrane conductance. (D) In SCA models, there is increased inclusion of exon 23b in potassium calcium-activated channel subfamily M alpha 1 (Kcnma1); in wild-type mice, exclusion of exon 23b is a cerebellar-specific alternative splicing event. Exon 23b partially spans the regulator of K+ conductance (RCK2) domain and the calcium bowl region, a high-affinity Ca2+ binding site (highlighted in red). When translated, exon 23b inclusion impairs Ca2+ binding in the calcium bowl causing a reduction in channel activation and a reduction in the subsequent K+ efflux. This reduced calcium sensing capability results in slower synapse repolarization. (A-D) This figure was generated using BioRender.com. FDR = false discovery rate; PSI = per cent spliced in.

Table 1 RNA sequencing datasets used in this study
Datasets in bold do not pass read depth threshold for alternative splicing analysis. An expanded version of Table 1 including replicate number, library selection method, sequencing method and read length for each dataset can be found in the Supplementary material. ASO = antisense oligonucleotide; FDR = false discovery rate; KI = knock-in; mo = months; NA = not applicable (datasets did not pass read depth threshold and so alternative splicing analysis was not performed); PSI = per cent spliced in; SCA = spinocerebellar ataxia; w = weeks; WT = wild-type.
a Publication states 12 weeks, GSE states 12 months.
b FastQ files merged to reach read depth threshold.
c Technical replicates combined into one FastQ file.
d FastQ files contain ∼15-20% adapter content but had sufficient read depth and were therefore left untrimmed.
e FastQ files trimmed for high adapter content to increase quality and merged to reach read depth threshold.
f Dataset GSE178367 does not contain wild-type mice; skipped exon numbers are reported for YAC84Q versus YAC15Q.
Library selection and mouse model terminology matches dataset publications.