Analyzing pre-symptomatic tissue to gain insights into the molecular and mechanistic origins of late-onset degenerative trinucleotide repeat disease

Abstract How genetic defects trigger the molecular changes that cause late-onset disease is important for understanding disease progression and therapeutic development. Fuchs’ endothelial corneal dystrophy (FECD) is an RNA-mediated disease caused by a trinucleotide CTG expansion in an intron within the TCF4 gene. The mutant intronic CUG RNA is present at one–two copies per cell, posing a challenge to understand how a rare RNA can cause disease. Late-onset FECD is a uniquely advantageous model for studying how RNA triggers disease because: (i) Affected tissue is routinely removed during surgery; (ii) The expanded CTG mutation is one of the most prevalent disease-causing mutations, making it possible to obtain pre-symptomatic tissue from eye bank donors to probe how gene expression changes precede disease; and (iii) The affected tissue is a homogeneous single cell monolayer, facilitating accurate transcriptome analysis. Here, we use RNA sequencing (RNAseq) to compare tissue from individuals who are pre-symptomatic (Pre_S) to tissue from patients with late stage FECD (FECD_REP). The abundance of mutant repeat intronic RNA in Pre_S and FECD_REP tissue is elevated due to increased half-life in corneal cells. In Pre_S tissue, changes in splicing and extracellular matrix gene expression foreshadow the changes observed in advanced disease and predict the activation of the fibrosis pathway and immune system seen in late-stage patients. The absolute magnitude of splicing changes is similar in pre-symptomatic and late stage tissue. Our data identify gene candidates for early drivers of disease and biomarkers that may represent diagnostic and therapeutic targets for FECD. We conclude that changes in alternative splicing and gene expression are observable decades prior to the diagnosis of late-onset trinucleotide repeat disease.


INTRODUCTION
An inherent paradox of inherited late-onset degenerative disease is the fact that the genetic basis for disease exists years before symptoms arise. At some point the genetic defect begins to trigger a cascade of cellular changes that result in disease, but how is the cascade initiated? Unfortunately, the search for early drivers of many genetic diseases that involve mutant RNA is hindered by the lack of accessible pre-symptomatic tissue in which to identify early changes in gene expression that occur years prior to diagnosis. Corneal tissue provides an accessible source of both pre- and post-symptomatic tissue to explore the molecular origins of inherited late-onset disease.
Corneal diseases represent one of the leading causes of vision loss and blindness globally (1). Inherited corneal dystrophies can compromise the structure and transparency of the cornea. Late-onset Fuchs' endothelial corneal dystrophy (FECD) is one of the most common genetic disorders, affecting four percent of the population in the US over the age of 40 (2)(3)(4). The corneal endothelium is the inner hexagonal monolayer responsible for maintenance of stromal dehydration and corneal clarity.
Two-thirds of FECD cases are caused by an expansion of the trinucleotide CTG within the TCF4 gene (15)(16)(17)(18), making the corneal dystrophy the most common human disorder mediated by simple DNA repeats. FECD can also be caused by a CTG expansion within the 3′-untranslated region (3′-UTR) of the DMPK gene (19)(20)(21), implicating mutant expanded CUG RNA as the root cause for repeat-associated FECD (FECD REP). The remaining one third of patients lack the expanded repeat (FECD NR), but the two types of FECD are indistinguishable during normal clinical observation.
FECD can be treated with corneal transplantation and surgical outcomes have improved following the development of endothelial keratoplasty in which a single-cell endothelium monolayer is transplanted using donor corneal tissue (22). While effective, this approach is limited by the availability of donor corneas and access to suitable treatment centers. A deeper understanding of FECD might permit the development of pharmacological approaches that might reach a larger patient population and reduce or delay the need for surgical intervention. Successful drug development would benefit from a more detailed understanding of disease progression and the fundamental drivers of early disease.
Trinucleotide or hexanucleotide repeats are the cause of many delayed-onset neurodegenerative disorders such as myotonic dystrophy type 1 and C9orf72 amyotrophic lateral sclerosis/frontotemporal dementia (ALS/FTD) (23). FECD has unique advantages as a model for gaining insight into the molecular mechanisms of delayed-onset degenerative disease: (i) The tissue is more readily available for analysis because it is routinely removed during surgery (Figure 1A-C); (ii) Samples are a single near homogeneous layer of cells, facilitating RNAseq and other methods for analyzing gene expression (Figure 1D); (iii) The corneal endothelium and disease progression can be evaluated visually during a standard clinical examination; (iv) Like peripheral neurons, the corneal endothelium originates from neural crest and may serve as a model post-mitotic tissue to examine the effects of toxic repeat RNA over time; (v) Donor corneal tissues that are not used for surgery are available from eye bank samples for comparison; (vi) Because the prevalence of the triplet repeat mutation within the TCF4 gene is relatively high, donor eyes provide a significant number of pre-symptomatic samples (Pre S) that possess the CTG expanded repeat; (vii) The availability of tissue from four different cohorts (Figure 2), control, Pre S, FECD NR and FECD REP, allows multiple cross-comparisons between the different stages and types of FECD. Analysis of the Pre S cohort has the potential to yield insights into early drivers of disease.
Here, we use the advantages gained from analyzing the corneal endothelial monolayer to evaluate transcriptomic data from four cohorts of corneal endothelial tissue: control, Pre S, FECD REP and FECD NR. We identify extensive and large magnitude changes in RNA splicing that are shared between Pre S and FECD REP cohorts but are not observed in FECD NR samples. Levels of gene expression are changed in Pre S samples relative to controls and suggest early triggering of the fibrosis pathway prior to clinical observation of disease symptoms. In late stage disease, pathways related to fibrosis and activation of the immune system are shared by FECD REP and FECD NR, but mitochondrial dysfunction is more pronounced in the FECD REP cohort. These results lay a basis for understanding the onset of FECD and other trinucleotide repeat diseases and provide potential targets for therapeutic design.

Isolation of corneal tissue
The study was conducted in compliance with the tenets of the Declaration of Helsinki and with the approval of the institutional review board of the University of Texas Southwestern Medical Center (UTSW). Subjects underwent a complete eye examination including slit lamp biomicroscopy by a cornea fellowship-trained ophthalmologist. Patients underwent endothelial keratoplasty for FECD severity Krachmer grade 5 (≥5 mm central confluent guttae without stromal edema) or 6 (≥5 mm central confluent guttae with stromal edema) assessed by slit lamp microscopy (24). After surgery, explanted endothelium-Descemet's membrane monolayers were preserved in Optisol GS corneal storage media (Bausch & Lomb, Rochester, NY, USA) prior to storage at −80°C. Genomic DNA was extracted from peripheral blood leukocytes of each study subject using Autogen Flexigene (Qiagen, Valencia, CA, USA).
Corneal endothelial samples from post-mortem donor corneas preserved in Optisol GS corneal storage media (Bausch & Lomb, Rochester, NY, USA) were obtained from the eye bank of Transplant Services at UT Southwestern. Certified eye bank technicians screened the donor corneal endothelium with slit lamp biomicroscopy and Cellchek EB-10 specular microscopy (Konan Medical). Endothelium-Descemet's membrane monolayers from donor corneas were micro-dissected and stored as previously described (25). DNA from the remaining corneal tissue of each sample was extracted with TRIzol reagent (ThermoScientific).

TCF4 CTG18.1 polymorphism genotyping
Genomic DNA from subjects' peripheral leukocytes or corneal tissue was used for genotyping. The CTG18.1 trinucleotide repeat polymorphism in the TCF4 gene was genotyped using a combination of short tandem repeat (STR) and triplet repeat primed polymerase chain reaction (TP-PCR) assays as we have previously described (20). For the STR assay, a pair of primers flanking the CTG18.1 locus was utilized for PCR amplification, with one primer labeled with FAM on the 5′ end. The TP-PCR assay was performed using the 5′ FAM-labeled primer specific for the repeat locus paired with repeat sequence targeted primers for PCR amplification. PCR amplicons were loaded on an ABI 3730XL DNA analyzer (Applied Biosystems, Foster City, CA, USA) and the results analyzed with ABI GeneMapper 4.0 (Applied Biosystems). Large triplet repeat expansions were sized by Southern blot analysis using digoxigenin-labeled probes.

RNA isolation and sequencing
Total RNA was isolated from each of 25 tissue samples (6 FECD REP, 4 FECD NR, 6 Pre S and 9 Controls) by homogenization in QIAzol lysis reagent, chloroform extraction and isolation with NucleoSpin RNA XS (Macherey-Nagel GmbH & Co., Germany). RNA quantity and quality were determined by Bioanalyzer 2100. RNA libraries were prepared for each tissue sample with high RIN (> 5.0), using the TruSeq RNA sample Prep kit version 2 (Illumina, San Diego, CA, USA). For TruSeq stranded total RNAseq, ribosomal transcripts were depleted from total RNA, using Ribo-Zero Gold RNA removal kit followed by replacement of deoxythymidine triphosphate (dTTP) with deoxyuridine triphosphate (dUTP) during reverse transcription in the second strand synthesis, using TruSeq stranded total library preparation kit. The resulting libraries were minimally amplified to enrich for fragments using adapters on both ends and then quantified for sequencing at eight samples/flow cell by using a NextSeq 500/550 (Illumina) sequencer (PE 150).

Analysis of differentially expressed or spliced genes
Whole transcriptomic sequencing data from each tissue sample were analyzed using a pipeline that includes STAR for initial mapping and Cufflinks (v2.21) for gene and isoform differential analysis, among other publicly available programs. For gene/isoform differential analysis, a minimum expression level of 1.5 FPKM and a false discovery rate (FDR) < 0.05 were chosen as the thresholds. The meta gene pathway analysis was carried out with IPA (Qiagen). The binary alignment map files from STAR were analyzed using rMATS (v.4.0), which quantitates the expression level of alternatively spliced genes between groups. To find the most significant events, we used stringent filtering criteria within rMATS to perform pairwise comparisons among the four groups: percent spliced in (PSI) changes > 0.15; FDR < 0.001. For PSI, rMATS calculates a value for every differential splicing event, ranging from 0 to 1, with 0 meaning completely excluded and 1 meaning uniformly included in the splicing products. Alternative splicing events were also compared to those obtained in tibialis anterior muscle of myotonic dystrophy type 1 (DM1) patients. DM1 raw data were obtained from Gene Expression Omnibus (GSE86356; six Control and six DM1 tissue samples were used) and analyzed in the same way as the FECD data (visit DMseq.org for more information).
cultured as described (26). The C9 and VVM84 skin fibroblasts were maintained at 37°C and 5% CO2 in Minimal Essential Media Eagle (MEM) (Sigma, M4655) supplemented with 15% heat inactivated fetal bovine serum (Sigma) and 0.5% MEM nonessential amino acids (Sigma).

Measurement of TCF4 RNA half-life
Endothelial or fibroblast cells were seeded in 6-well plates at 90% confluence. The next day, actinomycin D was added to the wells at a final concentration of 5 μg/ml. Cells were harvested using TRIzol reagent (Sigma) at different time points. The TCF4 RNA levels were analyzed by qPCR.
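A half-life can be estimated from such a time course by fitting first-order decay. The following is a minimal sketch, assuming single-exponential decay after transcription arrest and RNA levels already expressed as a fraction of the time-zero value. The function name and the example time course are hypothetical and are not taken from this study.

```python
import math

def estimate_half_life(times_min, fraction_remaining):
    """Estimate RNA half-life (minutes) from a transcription-arrest time course.

    Assumes first-order decay, N(t) = N0 * exp(-k * t), and fits
    ln(fraction) = -k * t by least squares with the line forced through the origin.
    """
    num = sum(t * math.log(f) for t, f in zip(times_min, fraction_remaining))
    den = sum(t * t for t in times_min)
    k = -num / den  # decay constant per minute
    if k <= 0:
        raise ValueError("no decay detected in the time course")
    return math.log(2) / k

# Hypothetical time course: the RNA level halves every 10 minutes.
times = [0, 10, 20, 40]
fractions = [1.0, 0.5, 0.25, 0.0625]
half_life = estimate_half_life(times, fractions)
```

Forcing the fit through the origin reflects the normalization to the time-zero sample; a fit with a free intercept would be an equally reasonable choice.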

Validation of differential gene expression and alternative splicing patterns
Total RNA was extracted from control, Pre S or FECD corneal endothelial tissues (Supplementary Table S1) using the NucleoSpin RNA XS kit (Macherey-Nagel). cDNAs were prepared by reverse transcription. RT-PCR was performed using ChoiceTaq Blue Mastermix (Denville). PCR amplification was as follows: 94°C for 3 min (1 cycle); 94°C denaturation for 30 s, 60°C annealing for 30 s and 72°C extension for 1 min (38 cycles); and a 7 min 72°C extension. The PCR primers were used as reported (27) (Supplementary Table S2). The amplification products were separated by 1.5% agarose gel electrophoresis. qPCR experiments were performed on a 7500 real-time PCR system (Applied Biosystems) using iTaq SYBR Green Supermix (Bio-Rad). Data were normalized relative to levels of RPL19 mRNA.
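Normalization to a reference transcript such as RPL19 is often computed with the 2^-ΔΔCt method. The text does not state which calculation was used, so the sketch below is an assumption for illustration only; it also assumes roughly 100% amplification efficiency for both primer pairs. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene versus control by the 2^-ddCt method.

    Assumes ~100% amplification efficiency (product doubles every cycle).
    The reference gene would be RPL19 in the setting described above.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the test tissue
    delta_control = ct_target_control - ct_ref_control   # dCt in the control tissue
    return 2.0 ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the
# test sample while the reference gene is unchanged, i.e. a 4-fold increase.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```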

Experimental design
Samples were homogeneous endothelial cell monolayers surgically removed from patients or dissected from donor tissue (Figures 1 and 2A). The homogenous nature of the tissue samples, in contrast to the more complex mixtures of cells often contained in tissues related to other disease, facilitates subsequent analysis and data interpretation.
We compared four groups of human tissue by RNA sequencing (RNAseq) with attention to changes in alternative splicing, differential gene expression and pathway analysis ( Figure 2A). Tissue that was mutant for the CTG expanded repeat within intron 2 of the TCF4 gene was obtained from pre-symptomatic eye bank donors (Pre S) and from FECD patients after transplant surgery (FECD REP). Tissue that was negative for the expanded CTG mutation within TCF4 was obtained from eye bank donors (Control) and from patients with non-CTG related FECD (FECD NR). Genotype data indicated that all samples had normal numbers of CTG repeats within the DMPK gene ( Figure 2B).
Control tissues from the eye bank were chosen for analysis that possessed normal endothelial morphology by specular microscopy, were negative for the expanded CTG mutation and were from donors with ages comparable to our FECD patient cohort ( Figure 2B). Obtaining tissue from Pre S individuals was possible because of the relatively high prevalence, 3%, of the expanded triplet repeat mutation in TCF4 gene within the general Caucasian population (2,3). Pre S tissue was identified by the presence of the CTG repeat expansion by genotyping. Normal corneal endothelial morphology (absence of central corneal guttae) in Pre S tissues was confirmed by specular microscopy.
RNA sequencing (RNAseq) was performed on tissue samples from nine non-FECD/non-expanded repeat donors (Control), six pre-symptomatic donors with the expanded repeat (Pre S), six patients with the expanded repeat and FECD (FECD REP) and four patients with FECD who did not have the expanded CTG repeat mutation (FECD NR) (Figure 2B) (Supplementary Figures S1-3). Only samples with RNA integrity numbers greater than five were used (Supplementary Figure S3). Gene body coverage analysis suggests that all RNA samples are largely intact (Supplementary Figure S2). We carried out paired-end 150 nt RNA-Seq on an Illumina NextSeq sequencer. We regularly obtained an output of 50-60 million raw paired reads per sample. Obtaining adequate sequencing depth required using all or most of each sample. Because of the rarity of Pre S samples, a fraction of tissue was withheld from sequencing for eventual experimental validation of gene expression changes suggested by RNAseq.
Clustering analysis of overall gene expression patterns revealed that samples from control and Pre S donors were closer to one another than to samples from patients with FECD REP or FECD NR late stage disease ( Figure 2C and D; Supplementary Figure S1E and F). This result is consistent with the clinical observation of large visible differences between diseased and non-diseased tissue and that the clinical manifestations of FECD REP and FECD NR are almost identical.

Stability of TCF4 intron 2 in FECD mutant and non-mutant tissue
Analysis of RNAseq data from intron 2 (the intron that contains the CTG repeat) within the TCF4 gene showed a striking difference between representative samples from individuals with the expanded CTG repeat mutation (FECD REP and Pre S) and individuals who lack the mutation (control and FECD NR) ( Figure 3A). Samples from the cohort of control individuals showed similar low numbers of reads upstream or downstream relative to the trinucleotide repeat region. RNA obtained from FECD NR patients' tissue also showed the same similarity between upstream and downstream reads.
By contrast, both Pre S and FECD REP tissue showed more reads for intronic RNA upstream of the trinucleotide repeat relative to downstream (Figure 3A). The overall expression of mature TCF4 mRNA from the four cohorts was not significantly different, making haploinsufficiency of gene product less likely as the mechanism of disease (Supplementary Figure S4). These results suggest that an early molecular disease trigger, increased stability of the mutant TCF4 intron 2 upstream of the expanded repeat, occurs at the pre-symptomatic stage and distinguishes FECD REP from FECD NR in late stage disease.
To investigate factors that might contribute to the prevalence of upstream intronic reads, we examined cultured cells derived from FECD REP patient corneal endothelium (F35T), FECD REP patient skin fibroblasts (VVM84), control (without CUG expanded repeat) corneal endothelium (W4056) or control skin fibroblasts (C9) (Figure 3B and C). VVM84 FECD skin fibroblasts and F35T FECD corneal endothelial cells both have the expanded CTG repeat, but the F35T corneal cells have nuclear CUG foci that can be detected by RNA-FISH while VVM84 skin cells lack detectable foci, indicating cell-specificity for expanded CUG repeat RNA accumulation. Expression of TCF4 is similar in F35T, F45SV and VVM84 cells (Supplementary Figure S5).
We treated cells with actinomycin D to arrest transcription and examined the half-life of mature TCF4 mRNA and of sequences either upstream or downstream relative to the intronic TCF4 CTG repeat using quantitative PCR (qPCR) (Figure 3B and C). Regardless of whether the expanded repeat mutation was present, the half-life of the mature TCF4 mRNA was similar, >8 h (Figure 3C, top). Likewise, the half-life of the intron 2 downstream region was also similar in each cell line, varying from 10 to 30 min (Figure 3C, middle). These data suggest that the mutation does not affect stability of the parent mRNA and has only a modest effect on the downstream region of intron 2.
By contrast, we observe a striking ∼20-fold increase in the half-life of the upstream region of intron 2 in FECD corneal cells (F35T and F45SV) that possess the expanded CTG mutation and have detectable foci ( Figure 3C, bottom). For cell lines that lacked expanded CUG nuclear foci, this region had a half-life of only 10 min. For FECD corneal cells, the half-life was 3-4 h.
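As a quick consistency check on the reported fold change, using only the half-lives quoted above (10 min versus 3-4 h, that is, 180-240 min):

```latex
\[
\frac{t_{1/2}^{\text{FECD corneal cells}}}{t_{1/2}^{\text{cells without foci}}}
= \frac{180\text{--}240\ \text{min}}{10\ \text{min}}
\approx 18\text{--}24
\]
```

This range brackets the approximately 20-fold increase stated in the text.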
This increased half-life of upstream intronic RNA is consistent with the observation from RNAseq of many more reads covering the upstream region of intron 2 TCF4 from Pre S or FECD REP tissue relative to samples from individuals in the control or FECD NR cohorts who lack the expanded CTG repeat ( Figure 3A). Stabilization of the upstream portion of TCF4 intron 2 in corneal endothelial tissue and cultured corneal endothelial cells makes more mutant repeat RNA available to perturb gene expression.
To address the possibility that the 5′ half of intron 2 might be retained in the mature mRNA, we performed PCR using primers complementary to exon 2 and exon 3 in both FECD and control cells. We observed a single PCR product of the predicted length in both FECD and control cells (Supplementary Figure S6). These data support the conclusion that the intron is not detectably retained.

Changes in alternative splicing triggered by the expanded repeat within TCF4
CUG repeat RNA is known to bind the splicing factors muscleblind-like 1 and 2 (MBNL1 and MBNL2) (28)(29)(30)(31). Previous studies have proposed that sequestration of MBNL1 and MBNL2 reduces levels of available MBNL protein, causing the changes in splicing observed in tissue from FECD REP patients (27,32,33) and in patients with myotonic dystrophy who possess expanded CUG repeats within the 3′-untranslated region of the DMPK gene (28)(29)(34).
We used RNAseq to evaluate splicing changes between control tissue and the Pre S, FECD REP and FECD NR tissue cohorts (Figure 4). To classify changes, we used the FDR and delta PSI (the net change of inclusion percentage) as the determinant metrics. Changes with an FDR < 0.001 and a delta PSI of at least 0.15 were considered significant.
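The significance filter described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the `SplicingEvent` record and its field names are hypothetical (they are not rMATS column names), and the threshold is applied to the absolute value of delta PSI so that both increased and decreased inclusion are captured.

```python
from dataclasses import dataclass

@dataclass
class SplicingEvent:
    gene: str         # placeholder gene label
    delta_psi: float  # difference in inclusion level between groups, in [-1, 1]
    fdr: float        # false discovery rate for the event

def is_significant(event, fdr_cutoff=0.001, min_abs_delta_psi=0.15):
    """Apply the FDR and delta PSI thresholds quoted in the text."""
    return event.fdr < fdr_cutoff and abs(event.delta_psi) >= min_abs_delta_psi

# Hypothetical events: only the first passes both thresholds.
events = [
    SplicingEvent("GENE_A", -0.40, 1e-6),  # large change, very low FDR
    SplicingEvent("GENE_B", 0.05, 1e-8),   # low FDR but change too small
    SplicingEvent("GENE_C", 0.30, 0.02),   # large change but FDR too high
]
significant = [e.gene for e in events if is_significant(e)]
```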
Regardless of which tissue was analyzed, the primary changes in alternative splicing relative to control tissue were increases in the number of skipped exon (SE) events (Figure 4A). FECD REP or FECD NR tissues showed more splicing changes than Pre S tissue (Figure 4B). The greater number of splicing changes is consistent with the extensive cellular degeneration observed in late stage disease. However, while not as many as in tissue from advanced disease, ∼450 changes in alternative splicing were observed in Pre S tissue.
Three hundred and thirteen of the alternative splicing events in Pre S tissue involved exon skipping (Supplementary Table S3). Heatmap evaluation of the 313 skipped exons in Pre S tissue revealed that the genes hosting the skipped exon events clustered most closely with FECD REP tissue (Figure 4C). One hundred and thirty-two skipped exon events were shared between Pre S and FECD REP tissue, whereas only 28 were shared between Pre S and FECD NR tissue (Figure 4B), consistent with the hypothesis that the changes in Pre S tissue foreshadow the alterations observed in FECD REP.
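Counting shared events between cohorts amounts to a set intersection once each event has a stable identifier. The sketch below assumes an event can be keyed by gene and exon coordinates; the keys and gene labels are hypothetical placeholders, not events from this study.

```python
def shared_events(events_a, events_b):
    """Return the events present in both collections (set intersection)."""
    return set(events_a) & set(events_b)

# Hypothetical skipped exon events keyed as (gene, exon_start, exon_end).
pre_s = {("GENE_A", 100, 200), ("GENE_B", 300, 400), ("GENE_C", 500, 600)}
fecd_rep = {("GENE_A", 100, 200), ("GENE_B", 300, 400), ("GENE_D", 700, 800)}
fecd_nr = {("GENE_C", 500, 600), ("GENE_E", 900, 1000)}

n_shared_rep = len(shared_events(pre_s, fecd_rep))  # events shared with FECD REP
n_shared_nr = len(shared_events(pre_s, fecd_nr))    # events shared with FECD NR
```

Keying on coordinates rather than gene name alone keeps two different exons of the same gene from being counted as one event.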
While not nearly as frequent, other forms of alternative splicing events, alternative 5′ splice site (A5SS), alternative 3′ splice site (A3SS), mutually exclusive exon (MXE) and retained intron (RI), also showed that the Pre S cohort was more similar to the FECD REP group than to the FECD NR cohort (Supplementary Figure S7A-D). Taken together, these data demonstrate that splicing changes in Pre S tissue are precursors to the changes in FECD REP, but not FECD NR, late stage disease.
We then separated out for evaluation the top 25 skipped exon events in Pre S tissue that are shared with FECD REP. As with the overall group of 313 events, the splicing of these genes was much more similar to FECD REP tissue than to control or FECD NR tissue (Figure 4D). Of the top 25 skipped exon events shared by FECD REP and Pre S tissue, approximately half are also seen in tibialis anterior muscle of myotonic dystrophy type 1 (DM1) subjects (35) (Figure 4D and E; Supplementary Figure S7E-G). This similarity, in spite of the comparison being made between samples from different tissues, suggests that the expanded CTG repeats in DM1 and FECD REP have a common mechanism for producing splicing changes and that this mechanism is activated in Pre S tissue.
Notable genes that show changes in splicing include the splicing factors MBNL1 and MBNL2. Splicing of these genes was also changed in tissue from a myotonic dystrophy mouse model (35) and in a comparison of FECD REP and FECD NR tissue by Fautsch and colleagues (36,37). Changes in MBNL1 and MBNL2 splicing in Pre S (Supplementary Table S3) cells may trigger other splicing changes and eventually lead to the larger scale change that characterizes late stage disease.

Changes in alternative splicing are quantitatively similar in Pre S and FECD REP tissue
We experimentally validated changes in splicing for Pre S relative to control tissues ( Figure 5 and Supplementary Figure S8). Genes were chosen for validation based on the gene expression level and the potential biological role of the gene suggested by pathway analysis (Figure 8).
We used reverse-transcription PCR to evaluate changes in splicing for six genes: INF2, NUMA1, SORBS1, SYNE1, MBNL1 and MBNL2 (Figure 5). INF2 protein associates with microtubules and may affect cell shape. NUMA1 protein is a component of the nuclear matrix and may affect mitotic spindle organization and proper cell division. SORBS1 was chosen for its protein product's involvement in cell adhesion and extracellular matrix (ECM). SYNE1 encodes nesprin-1, which links the nuclear membrane to the actin cytoskeleton and may also affect cell morphology. MBNL1 and MBNL2 were chosen because potential alterations in their expression might feed back into even greater changes in splicing.
The changes in splicing predicted by global RNAseq analysis were confirmed by visual inspection of genes using Sashimi plots (Figure 5A) and validated by semi-quantitative reverse-transcriptase PCR (RT-PCR) (Figure 5B) using monolayer human corneal endothelial tissue. Pre S tissue showed the changes in splicing predicted by our RNAseq data (Figure 5). Both Sashimi plots and semi-quantitative RT-PCR analysis reveal that the absolute magnitude of splicing changes in Pre S tissue is substantial and similar to that observed in FECD REP late stage disease tissue.
We also carried out a differential alternative splicing analysis between Pre S and FECD REP tissues directly, rather than using the Control as the reference as described above. Although hundreds of significant skipped exon events were identified between Pre S and FECD REP, only five of the 132 skipped exon events shared between the Pre S/Control and FECD REP/Control comparisons are significantly different between Pre S and FECD REP (Supplementary Table S4). This means that the majority of shared skipped exon events have a similar magnitude of change in exon inclusion in both comparisons, suggesting that missplicing of key genes does not gradually increase as disease symptoms progress; rather, substantial missplicing is a leading indicator of disease.

Differential gene expression
We compared gene expression level changes in Pre S, FECD REP and FECD NR tissue cohorts relative to control tissue. To classify changes as significant, we used the adjusted P-value generated by Cuffdiff, one of the programs within the Cufflinks suite, as the determining metric. Changes with an adjusted P-value < 0.05 were deemed significant and included in our analysis. Analysis by DESeq2, an alternate program for evaluating RNAseq data, produced similar results when compared with the output of Cuffdiff (Supplementary Figure S9 and Supplementary Table S5).
All sample cohorts showed expression changes relative to control tissue (Pre S: 215; FECD REP: 1330; FECD NR: 696) (Figure 6A and B). The greater number of gene expression changes in the FECD REP and FECD NR tissues is consistent with the severe cellular phenotype observed in late stage disease. A total of 602 out of the 696 genes differentially expressed in the FECD NR tissues were also found in the FECD REP group, suggesting significant overlap of the common final molecular genetic mechanisms in the two forms of late-stage disease. Pre S tissue had 215 genes with significantly altered expression levels relative to control tissue. Only five changes in gene expression were uniquely shared between Pre S and FECD NR tissue, compared to 73 shared changes with FECD REP (Figure 6A). The closer relationship between Pre S and FECD REP is consistent with our splicing data (Figures 4 and 5) and supports the conclusion that patterns of gene expression in mutant expanded repeat cells are established long before symptoms or disease findings are observable.
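For orientation, the overlaps reported above can be expressed as fractions. This is simple arithmetic on the counts given in the text, not an additional analysis:

```latex
\[
\frac{602}{696} \approx 0.865
\quad\text{(FECD NR genes also changed in FECD REP)},
\qquad
\frac{73}{215} \approx 0.34,
\qquad
\frac{5}{215} \approx 0.02
\]
```

The last two values are the fractions of the 215 Pre S changes shared with FECD REP and uniquely shared with FECD NR, respectively.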
Volcano plots allow a global overview of individual gene expression changes. They are useful for visualizing patterns of changes and identifying 'outlier' genes that combine highly significant changes in gene expression with higher fold changes. The fold changes among the top genes are smaller in Pre S than in FECD REP or FECD NR, and the identities of the top genes differ (Figure 6B; Supplementary Tables S6-8). This finding is consistent with the severity of late stage disease and the disruption of many gene expression programs.
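Each point on a volcano plot is conventionally placed at the log2 fold change on the horizontal axis and the negative log10 of the P-value on the vertical axis. The sketch below computes those coordinates for one gene. The function name, the input values and the pseudocount (added here to avoid division by zero for unexpressed genes) are assumptions for illustration, not details of the analysis in this study.

```python
import math

def volcano_point(mean_control, mean_case, p_value, pseudocount=1.0):
    """Return (log2 fold change, -log10 P) for one gene.

    The pseudocount is an illustrative choice that keeps the ratio
    finite when expression in one group is zero.
    """
    log2_fc = math.log2((mean_case + pseudocount) / (mean_control + pseudocount))
    neg_log10_p = -math.log10(p_value)
    return log2_fc, neg_log10_p

# Hypothetical gene: expression rises from 3 to 15 units with P = 0.001.
x, y = volcano_point(3.0, 15.0, 0.001)
```

Genes far from the origin on both axes are the 'outliers' referred to above: large fold change combined with high statistical significance.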
Evaluation of Pre S tissue may provide a window to identify early gene drivers before symptom-driven secondary changes in gene expression overwhelm analysis. The top 20 differentially over-expressed genes identified in Pre S tissue ( Figure 6C) include genes involved in the ECM and its assembly, cochlin (COCH) (38,39), fibronectin (FN1) (40) and thrombospondin (THBS2) (41).

Quantitative analysis of gene expression changes
We used quantitative PCR (qPCR) to confirm the changes in the level of gene expression detected by RNAseq in both Pre S and FECD tissue (Figure 7). qPCR measurements of corneal endothelial tissues were challenging because of the limited amount of material available, but we could compare expression of eight genes identified in our RNAseq data: FN1, COL4A2, COCH, CTGF, MSI1, LUM, KDR and SOD3. Four of these genes, FN1, COL4A2, CTGF and KDR, are within the fibrosis pathway. COCH and LUM encode ECM proteins, MSI1 encodes an RNA-binding protein/splicing factor and SOD3 protein is related to oxidative stress. The observed changes in gene expression confirm our RNAseq results (Figure 6B and Supplementary Table S9).

Pathway analysis
To elucidate the potential impact of changes in the expression of individual genes on physiologic processes, we applied Ingenuity Pathway Analysis (IPA) to our RNAseq data. Overwhelmingly, the top common canonical pathway was hepatic fibrosis/hepatic stellate cell activation (42) (Figure 8A and Supplementary Table S9). Involvement of the hepatic fibrosis pathway genes in FECD REP and FECD NR is consistent with the observed accumulation of ECM in advanced FECD, with thickening of Descemet's membrane and focal excrescences (guttae), and with previously reported RNA measurements using tissue from late stage disease (43). Our FECD transcriptome data indicate robust ECM production in late-stage disease, possibly regulated by transforming growth factor-β (TGF-β), the most potent fibrogenic cytokine released by a number of cell populations in the body including the liver (42) (Supplementary Figure S11A and B).
Activation of the fibrosis pathway was also observed in Pre S tissue (Figures 8A and B and 9). Genes that showed statistically significant increases in expression include fibronectin FN1, one of the highest differentially expressed genes in Pre S tissue (Figure 7). Other genes include connective tissue growth factor (CTGF) and four members of the collagen alpha chain family including COL1A2, which is also abundant in liver fibrosis (44). Kinase insert domain receptor (KDR, also known as vascular endothelial growth factor receptor-2) showed decreased expression. Relevant to the fibrosis pathway, KDR protein interacts with VEGF to mediate vascular endothelial cell proliferation. These genes are also disturbed in advanced disease, another indication that gene expression programs associated with fibrosis are activated in pre-symptomatic carriers prior to observable symptoms of disease. The pathways underlying FECD REP and FECD NR advanced stage disease had significant overlap. There were numerous shared pathways implicating the immune system related to helper T-cell activation, signaling and neuroinflammation (Figure 8A and C). Marked overexpression of genes encoding proteins on the surface of antigen-presenting cells, including the B7 protein CD86 (>2000-fold increase) and class II major histocompatibility (MHC) proteins (>250-fold), both required for these cells to activate helper T cells, implicates the immune system in both FECD groups in late-stage disease (45) (Figure 8C and Supplementary Figure S11C-E).
A few molecular pathways were changed in FECD REP but not FECD NR (Figure 8D). The canonical pathway related to mitochondrial dysfunction showed a large difference, with P-values of 10⁻⁵ and zero, respectively, for FECD REP and FECD NR (Figure 8A and Supplementary Figure S11F-H). Decreased expression of oxidative phosphorylation genes was more pronounced in FECD REP than in FECD NR (Figure 8D).

DISCUSSION
Identifying the early drivers of late-onset disease is important for understanding disease progression and developing therapeutics. Studying early drivers, however, is often not practical because pre-symptomatic tissue is difficult to obtain. Because the expanded CTG repeat mutation within TCF4 intron 2 that causes FECD REP is so prevalent (3% of the Caucasian population), significant numbers of pre-symptomatic samples can be obtained from individual donors positive for the CTG expansion. These tissues, together with FECD REP, FECD NR and control tissues (Figures 1 and 2) provide an advantageous model for better investigating the early links between expanded trinucleotide repeat mutations and disease.
The goal of this study was to understand whether the expanded CUG RNA repeat was changing gene expression in Pre S tissue and how such changes might relate to the gene expression and phenotypic changes known to occur in late-stage FECD.

FECD is a disease of mutant RNA
FECD REP is caused by an expanded CTG trinucleotide repeat within mutant TCF4 intronic RNA (Figure 10) (15–18). Remarkably, FECD is also caused by the mutant CTG expanded repeat within the 3′-untranslated region of the DMPK gene, which is also associated with myotonic dystrophy (19–21). In contrast to other trinucleotide repeat diseases, where the contribution of mutant RNA is debated, the finding that CUG RNAs expressed from two different genes can each cause FECD offers strong support for RNA playing a central role in the molecular origins of the disease.
The expanded CUG RNA can be detected by fluorescent in situ hybridization (FISH) as RNA foci. These RNA foci are a hallmark of both Pre S and FECD REP corneal endothelium (Figure 10) (25). While Pre S and FECD REP tissue both possess the RNA trigger for FECD, Pre S tissue is visually indistinguishable from control tissue upon specular imaging. By contrast, FECD REP tissue is dramatically different from control tissue, with reduced cell density and the formation of focal collagen accumulations known as guttae.
[Figure 10 caption, beginning not recovered] Initially, the CTG expansion expresses the CUG repeat RNA. The expanded repeat mutation that causes FECD REP is a relatively common genetic mutation, making Pre S tissue available for analysis. While Pre S tissue appears normal upon clinical observation, foci can be detected, we observe changes in splicing and gene expression, the mutant intronic repeat is stabilized and early signs of fibrosis pathway activation are apparent. In late stage disease, more pronounced changes in splicing and gene expression accompany clinically observable symptoms and loss of vision.
Consistent with the published clinical FECD cohort from our institution (18), the mean ages of the individuals in this study with late-stage disease in the FECD REP and FECD NR groups were comparable at 66.5 and 68.8 years, respectively. The Pre S group had a mean age of 46.8 years, suggesting that at least two decades elapse between the onset of mutant CUG repeat RNA-mediated mis-splicing and altered expression of ECM genes and the development of late-stage findings. This interval is clinically compatible with the slow natural progression of this age-related disorder.
What does it take for an individual with a mutant TCF4 repeat expansion to become a FECD patient? We can only speculate at this stage. It is possible that the gene expression changes observed in pre-symptomatic individuals increase the likelihood of dysfunction in individual cells. Eventually, this dysfunction crosses a molecular threshold and triggers the fibrosis pathway in individual cells. As patients age, more cells are affected, producing the findings of FECD visible to the ophthalmologist and the symptoms related to loss of vision noticed by patients.
The FECD CUG repeat RNA is present at only a few (<10) copies per cell in disease tissue (26). It is likely that each 'focus' detected by FISH is a single RNA molecule. A low copy number for a disease-causing RNA has also been observed for myotonic dystrophy type 1 (DM1) (46) and C9orf72 ALS/FTD (47). We find that the CUG repeat expansion stabilizes TCF4 intronic RNA in corneal cells (Figure 3). This enhanced stability may contribute to the ability of a small number of RNA molecules to bind enough protein to affect overall function in cells and eventually produce the observable symptoms that characterize a delayed onset disease like FECD.
Although we observed stabilization of the mutant intronic RNA in two patient-derived corneal cell lines and not in skin fibroblasts harboring the expansion, additional studies in other cell types are warranted to test the hypothesis that there is a corneal cell-specific mechanism that stabilizes the expanded CUG repeat.
Previous work has suggested that the CTG repeat within intronic TCF4 and other microsatellite expansions are associated with intron retention (48). Our detection of nuclear CUG RNA foci by FISH, together with the elevated level of upstream intronic RNA attributable to its increased half-life, may be compatible with a disease model in which the TCF4 intron containing the expanded repeat is spliced out, forms a stable linear intronic sequence RNA, and undergoes preferential 3′ to 5′ exonuclease degradation in the nucleus.
There is no definitive mechanistic insight into how the relatively rare expanded CUG repeat RNA can cause FECD. How one or a few copies of RNA trigger widespread changes in gene expression and late onset disease remains a major unanswered question. However, studies of myotonic dystrophy have suggested that the CUG repeat binds members of the muscleblind-like (MBNL) protein family. MBNL proteins are splicing factors, and their sequestration reduces the concentration of free MBNL in cell nuclei and affects splicing. Overexpression of MBNL1 can reverse RNA missplicing and myotonia in a DM1 mouse model (34). MBNL is also associated with mutant CUG RNA in FECD cells and tissue (27,32,33). Blocking the CUG repeat region using antisense oligonucleotides can reverse missplicing in DM1 (caused by a CTG repeat within the DMPK gene) (49–52) and FECD (26,53) tissues. These reports support the hypothesis that the mutant RNA may act by binding to MBNL and affecting splicing (Figure 10).
We have recently reported that relatively little MBNL protein is in the nuclei of FECD REP cells and human tissue (54). Using quantitative protein titrations against a known standard, we calculated that there were 65,000 copies of MBNL1 and MBNL2 per cell and that fewer than 2000 copies were present in cell nuclei. Low copy numbers for MBNL in the nuclei of affected tissue are consistent with the hypothesis that even a small amount of mutant expanded CUG repeat RNA may be sufficient to affect the available pool of MBNL protein. A reduction in available MBNL protein would produce the alterations of splicing that are a hallmark of FECD REP disease (Figures 4 and 5). It is also possible that the MBNL:mutant RNA interaction may nucleate additional protein or RNA interactions to amplify disruptive effects on gene expression and changes in alternative splicing.
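To make the scale of these numbers concrete, the short calculation below uses only the copy numbers quoted above (65,000 MBNL1 and MBNL2 copies per cell, fewer than 2000 in nuclei, and fewer than 10 repeat RNA copies per cell). It is an illustrative back-of-the-envelope sketch, not an analysis performed in this study. In particular, the number of MBNL proteins that a single expanded RNA can bind is not reported here, so it is left as a free, hypothetical parameter, and all values are evaluated at the quoted bounds.

```python
# Copy numbers as quoted in the text (the last two are upper bounds).
TOTAL_MBNL_PER_CELL = 65_000   # MBNL1 and MBNL2 copies per cell
NUCLEAR_MBNL_MAX = 2_000       # fewer than 2000 copies in the nucleus
REPEAT_RNA_MAX = 10            # fewer than 10 mutant RNA copies per cell

# Upper bound on the fraction of cellular MBNL that is nuclear.
nuclear_fraction = NUCLEAR_MBNL_MAX / TOTAL_MBNL_PER_CELL


def fraction_sequestered(rna_copies: int, mbnl_per_rna: int,
                         nuclear_pool: int = NUCLEAR_MBNL_MAX) -> float:
    """Fraction of the nuclear MBNL pool bound by repeat RNA, capped at 1.0.

    mbnl_per_rna is a HYPOTHETICAL occupancy per RNA molecule; the text
    does not report this value.
    """
    bound = rna_copies * mbnl_per_rna
    return min(1.0, bound / nuclear_pool)


print(f"nuclear fraction of MBNL <= {nuclear_fraction:.1%}")
for occupancy in (10, 50, 100):  # hypothetical values, for illustration only
    share = fraction_sequestered(REPEAT_RNA_MAX, occupancy)
    print(f"{occupancy} MBNL per RNA -> {share:.1%} of the nuclear pool")
```

Under these assumptions, the fraction of the nuclear pool that is sequestered scales linearly with the assumed occupancy per RNA. A small nuclear pool therefore makes a handful of RNA molecules proportionally more consequential, which is the qualitative point made in the paragraph above.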

Expanded CUG mutant RNA causes splicing changes in presymptomatic tissue
Many of the alterations in splicing observed in FECD REP tissue also define gene expression in presymptomatic tissue, Pre S (Figures 4 and 5). The splicing factors MBNL1 and MBNL2 are among the genes showing altered splicing in Pre S samples. Altered splicing of MBNL1 and MBNL2 may decrease the reservoir of functional MBNL protein. Less functional MBNL protein, in addition to sequestration of MBNL protein by the expanded CUG repeat, may help push corneal endothelial cells toward full blown FECD.
The similarity of alternative splicing changes between symptomatic FECD REP and presymptomatic Pre S tissue is much greater than that between Pre S and FECD NR samples. The data suggest that there are fundamental differences in the origins of the two forms of FECD. While their origins differ, late-stage FECD REP and FECD NR converge at a common set of clinical findings.
We note that the FECD NR cohort shares gene splicing changes with both the Pre S (27) and FECD REP (456) cohorts. The 28 splicing events shared by Pre S and FECD NR samples are fewer than the 132 splicing changes shared by the FECD REP and FECD NR samples. Nevertheless, the commonality of splicing changes between FECD NR and the other cohorts may indicate the presence of an undiscovered trinucleotide repeat expansion that might trigger changes in gene splicing.
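Counts of shared splicing events such as those above amount to set intersections over event identifiers. The sketch below illustrates that bookkeeping only. The event identifiers and cohort contents are made-up placeholders, not data from this study, and the function name is illustrative rather than part of the analysis pipeline.

```python
from itertools import combinations

# HYPOTHETICAL significant splicing events per cohort (placeholder IDs).
events = {
    "Pre_S":    {"EVT_001", "EVT_002", "EVT_003", "EVT_004"},
    "FECD_REP": {"EVT_001", "EVT_002", "EVT_003", "EVT_005"},
    "FECD_NR":  {"EVT_003", "EVT_005", "EVT_006"},
}


def shared_counts(cohorts: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Number of splicing events shared by each pair of cohorts."""
    return {(a, b): len(cohorts[a] & cohorts[b])
            for a, b in combinations(cohorts, 2)}


counts = shared_counts(events)
for (a, b), n in counts.items():
    print(f"{a} and {b}: {n} shared events")
```

In this toy input, the two repeat-positive cohorts share more events with each other than either shares with the repeat-negative cohort. That pattern was built into the placeholder sets to mirror the comparison described in the text; it is not a result.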
The data also suggest that the splicing changes and perturbation of ECM genes seen in late stage FECD REP tissue begin long before symptoms are observed during standard clinical evaluation. The molecular trigger (mutant RNA) and early molecular changes (altered splicing and observable RNA foci) co-exist in cells that appear to have a normal phenotype. The magnitudes of the RNA splicing changes relative to control tissue are similar in Pre S and late stage FECD REP samples. This observation suggests that the mutant RNA triggers the splicing changes in key genes independent of the progression of disease.
The finding that splicing is an early trigger has important implications for the development of agents to treat FECD. It is reasonable to expect that such agents would be most effective when administered early in disease progression, prior to production of ECM with degeneration of the corneal endothelium and activation of the immune system. During drug development it should be possible to monitor the changes in alternative splicing and expression of ECM biomarkers caused by expression of the mutant trinucleotide repeat and to rank drug candidates by their ability to return splicing to a more normal state. Agents that reverse the splicing defect would be promising development candidates. Monitoring splicing of key genes would offer a rapid and definitive assay for screening compounds. We have previously shown that synthetic oligonucleotides complementary to the CUG repeat can reverse the splicing defect (25).
Recently, Fautsch and colleagues reported that some individuals without previously noted guttae who possessed the expanded repeat but developed 'non-FECD corneal edema' do not show changes in alternative splicing (36). These results might appear to be in conflict with our observation that all individuals with expanded CUG repeat RNAs exhibited substantial changes in alternative splicing, including many shared between Pre S and FECD REP individuals (Figure 4). We note that the expanded repeat positive individuals with corneal decompensation without findings of FECD were between 67 and 83 years old, much older than any individual in our Pre S cohort. As postulated by Fautsch and colleagues, it is possible that these individuals possessed protective mutations that might prevent splicing changes and block the molecular events leading to symptomatic FECD.
As seen in many other unstable microsatellite diseases, we reported a significant positive correlation between the clinical severity of FECD and the triplet repeat length in peripheral leukocytes (18). Previously, we observed a trend toward a positive correlation between the number of CUG repeat RNA foci in endothelial cells of surgical samples and triplet repeat allele length, but it did not reach statistical significance (26). However, our RNA-seq tissue cohort in this study was not adequately powered to test for a possible correlation of differential splicing and gene expression with repeat length.

The fibrosis pathway is activated in Pre S tissue
In the clinic, late stage FECD is a disease of cellular degeneration and aberrant ECM deposition (Figure 10). Previous studies of FECD REP tissue have supported activation of the fibrosis pathway as a primary cause for late stage disease pathology (33). We confirmed fibrosis as the highest ranked canonical pathway in both late stage FECD REP and FECD NR (Figure 8A). The fibrosis pathway is also activated in Pre S tissue (Figure 8A and B). These results demonstrate that changes in splicing, changes in gene expression and changes in a key disease pathway begin years before symptoms are observed.
Among the top ten genes in pre-symptomatic tissue versus control tissue are cochlin (COCH) and fibronectin (FN1), with >16- and >32-fold increases, respectively. Cochlin is a secretory ECM protein originally identified in the cochlear cells of the inner ear (38). COCH is also expressed by the endothelial cells within the trabecular meshwork of subjects with primary open-angle glaucoma (POAG), another common age-related degenerative disorder (39). There appears to be a convergence of two age-related disorders of the anterior segment of the eye mediated by COCH. Primary open-angle glaucoma may be more prevalent in patients with advanced FECD (55). Primary open-angle glaucoma is also a disease mediated by transforming growth factor-β, which increases aqueous humor outflow resistance by dysregulation of ECM genes in the endothelial cells lining the trabecular meshwork (56). One area for future research will be to determine whether COCH or other ECM genes with altered expression in Pre S tissue can serve as biomarkers for early detection of FECD or to facilitate the monitoring of clinical trials.
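Fold changes such as those quoted above are commonly compared on a log2 scale, where each unit corresponds to one doubling. As a minimal sketch, the conversion below is applied to the lower bounds on fold change quoted in this article for COCH, FN1, class II MHC and CD86. The function name is illustrative and is not part of the study's analysis pipeline.

```python
import math


def log2_fold_change(fold: float) -> float:
    """Convert a linear fold change (ratio to control) to log2 units."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold)


# Lower bounds on fold change quoted in the text.
quoted_bounds = {"COCH": 16, "FN1": 32, "MHC class II": 250, "CD86": 2000}

for gene, fold in quoted_bounds.items():
    print(f"{gene}: >{fold}-fold  ->  log2FC > {log2_fold_change(fold):.2f}")
```

On this scale the Pre S changes in COCH and FN1 correspond to more than 4 and 5 doublings, while the late-stage CD86 increase corresponds to nearly 11.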
All late onset diseases share a common question: 'What triggers disease onset and what factors influence when disease onset occurs?' Our data suggest that the CUG repeat expansion triggers changes in gene splicing. These changes are associated with gene expression changes, including changes in genes associated with fibrosis. We speculate that redundancy within cells and throughout the corneal endothelium delays the appearance of visible changes. Eventually, however, disruption of cellular processes becomes substantial enough to trigger outright fibrosis, which in turn leads to great disruption of corneal cells and tissues and visible symptoms of FECD.

Late stage FECD tissue is characterized by changes in immune cell-related and mitochondrial dysfunction pathways
Both FECD REP and FECD NR tissues show activation of immune system genes related to helper T-cell activation, signaling and neuroinflammation (Figures 8A and C and Figure 10). In both groups, we detected a 2000-fold increase in CD86 and marked upregulation of MHC genes required by antigen-presenting cells to activate helper T cells. These gene expression data, along with the recent observation of cells with a dendritic morphology and positive for the hematopoietic marker CD45 in the endothelial tissue keratoplasty specimens of patients (42), suggest an important role for antigen-presenting cells in late-stage disease in both forms of FECD.
The mitochondrial dysfunction pathway is activated in FECD REP tissue, with the expression of over twenty genes changed; little change was seen in FECD NR tissue (Figure 8D). Further research will be needed to determine whether this difference plays a role in disease progression or response to treatment.

CONCLUSION
FECD has many advantages as a model for understanding the origins of trinucleotide repeat disease because presymptomatic tissue is relatively accessible. Examination of Pre S tissue reveals changes in gene expression that preview the more extensive changes in late stage disease. In particular, there is early activation of key genes associated with the fibrosis pathway, the pathway that defines the primary phenotype observed during advanced disease. Splicing patterns and levels of expression for key genes change decades prior to observation of the clinical manifestations of FECD REP. Surprisingly, many changes in alternative splicing are similar in magnitude in Pre S and advanced stage FECD REP tissue. Many of the alternative splicing changes are shared with myotonic dystrophy, another disease caused by expanded CTG trinucleotide repeats, and it is possible that our findings will also be applicable to the genesis and temporal progression of other trinucleotide repeat diseases.

DATA AVAILABILITY
The Gene Expression Omnibus (GEO) accession number is GSE142538.