Fragile X-associated tremor/ataxia syndrome (FXTAS) is a late-onset neurodegenerative disorder in which patients carry premutation alleles of 55–200 CGG repeats in the FMR1 gene. To date, whether alterations in epigenetic regulation modulate FXTAS has gone unexplored. 5-Hydroxymethylcytosine (5hmC), which is converted from 5-methylcytosine (5mC) by the ten-eleven translocation (TET) family of proteins, has recently been found to play key roles in neuronal function. Here, we undertook genome-wide profiling of cerebellar 5hmC in a FXTAS mouse model (rCGG mice) and found that rCGG mice at 16 weeks showed reduced 5hmC levels genome-wide compared with age-matched wild-type littermates. However, we also observed gain-of-5hmC regions in repetitive elements, as well as in cerebellum-specific enhancers, but not in general enhancers. Genomic annotation and motif prediction of wild-type- and rCGG-specific differentially hydroxymethylated regions (DhMRs) revealed their high correlation with genes and transcription factors that are important in neuronal developmental and functional pathways. DhMR-associated genes partially overlapped with genes that were differentially associated with ribosomes in rCGG mice, as identified by bacTRAP ribosomal profiling. Taken together, our data strongly indicate a functional role for 5hmC-mediated epigenetic modulation in the etiology of FXTAS, possibly through the regulation of transcription.
The fragile X mental retardation-1 (FMR1) gene is located on the X chromosome and is highly conserved among species (1). The FMR1 gene product, fragile X mental retardation protein (FMRP), is a well-characterized RNA-binding protein essential for normal cognitive development (2,3). The 5′ untranslated region (UTR) of FMR1 contains a CGG trinucleotide repeat that ranges from 5 to 54 triplets in the general population (4,5). Expansion of this repeat to more than 200 triplets (the full mutation) can lead to hypermethylation and transcriptional silencing of FMR1, resulting in fragile X syndrome (FXS), whose symptoms include severe learning deficits or intellectual disability (6–9). More recently, premutation (PM) alleles (55–200 CGG repeats) have been linked to an inherited neurodegenerative disorder termed fragile X-associated tremor/ataxia syndrome (FXTAS) (10). Individuals bearing the PM usually show no obvious signs early in life, but they are at greater risk of developing severe symptoms, such as tremor and ataxia, as well as progressive cognitive decline from memory deficits to dementia, once they reach the fifth decade of life (5,11). Brain autopsy of symptomatic PM carriers indicates degeneration of the cerebellum, including Purkinje cell death, Bergmann gliosis and significant cerebral and cerebellar white matter disease (12).
Although FXS and FXTAS both involve expanded CGG repeats in the 5′UTR of FMR1, their molecular mechanisms of pathogenesis are distinct (5). In contrast to the complete transcriptional silencing of FMR1 in FXS, FMR1 mRNA levels are increased from alleles carrying the PM expansion (13), leading to the proposal that an RNA toxic gain of function is the primary disease mechanism (10,14). Subsequent studies revealed that CGG repeats can bind and sequester a number of RNA-binding proteins, including Purα, hnRNP A2B1 and Sam68, in both Drosophila and mammalian cells (15–18). Several mouse models have been established to study FXTAS, with partially overlapping phenotypes (19–21). Hashem et al. (21) demonstrated that CGG repeats expressed specifically in mouse Purkinje neurons, outside the context of the FMR1 gene, are sufficient to induce FXTAS phenotypes, supporting the RNA gain-of-function model first described in Drosophila. Although impressive progress has been made, the detailed molecular mechanisms underlying FXTAS require further study. In particular, one important question has gone unexplored: how alterations in epigenetic regulation, including cytosine modifications, might affect the transcriptional state in PM carriers.
Covalent cytosine modifications influence transcriptional states and ultimately cellular identity. DNA methylation (5mC), the best-characterized cytosine modification, occurs predominantly at CpG dinucleotides, which cluster in regions of high CpG density termed CpG islands (CGI), and is often associated with a repressive transcriptional environment (22–24). Another cytosine modification, 5-hydroxymethylcytosine (5hmC), has recently been proposed to play a significant biological role in vertebrates (25–28). Ten-eleven translocation 1 (TET1), a 2-oxoglutarate (2OG)- and Fe(II)-dependent enzyme, catalyzes the conversion of 5mC to 5hmC (29). Although 5hmC was initially thought to be only an intermediate in active DNA demethylation (30–33), it has since been found to be very stable in the mammalian genome and may play specific roles in regulating gene expression (34–38). Remarkably, the overall 5hmC level varies between tissues, with approximately 10 times more in brain, including in Purkinje neurons (34–38), than in other tissues (39–42). This differential distribution points to a possible functional importance of 5hmC in normal neuronal development and activity, and suggests that aberrant 5hmC patterns may contribute to severe neurological and neurodegenerative diseases. In fact, our earlier study showed that 5hmC levels are significantly higher in both the cerebellum and hippocampus of adult mice at 6 weeks and 1 year of age than in postnatal day 7 mice, suggesting an age-dependent acquisition of 5hmC and a role for it in the regulation of brain-specific gene expression. Importantly, our data from a Rett syndrome mouse model with methyl-CpG binding protein 2 (MeCP2) knockout revealed increased 5hmC levels, strongly supporting the importance of 5hmC homeostasis in normal neuronal activity (35). A very recent report has also shown a global loss of 5hmC in Huntington's disease (HD) mouse brain, linking 5hmC to neurodegenerative disease (43).
Because CGG repeat expression in Purkinje neurons can cause FXTAS phenotypes and 5hmC levels are high in Purkinje cells, we chose to investigate the role of 5hmC in the rCGG mouse model. Interestingly, recent evidence suggests that PM carriers can develop ‘pre-symptoms’ before the onset of FXTAS, indicating that the PM affects global cellular processes at an earlier stage (44,45). We therefore profiled 5hmC genome-wide in the cerebella of 16-week-old rCGG transgenic mice alongside age-matched wild-type littermates; rCGG mice do not display noticeable FXTAS phenotypes at this age (21). We observed a global reduction of 5hmC in rCGG mice, mainly in gene bodies and in regions annotated as CGI. However, we also found gain-of-5hmC loci in several repetitive elements, as well as in cerebellum-specific enhancers. Genomic annotation and motif prediction using wild-type- and rCGG-specific differentially hydroxymethylated regions (DhMRs) revealed that they are highly correlated with genes and transcription factors that functional pathway analysis identified as important in neuronal development. DhMR-associated genes partially overlapped with genes that were differentially associated with ribosomes in rCGG mice. Our data offer important insights into the potential role of 5hmC in the early stages of FXTAS onset, and may point the way to possible therapeutic approaches in the future.
Global reduction of 5hmC in the cerebellum of a FXTAS mouse model
To determine the genome-wide 5hmC distribution in FXTAS (rCGG) mice and their wild-type littermates, we employed a previously established chemical labeling and affinity purification method coupled with high-throughput sequencing (36). Three rCGG mice aged 16 weeks were sacrificed as independent biological replicates, along with three age-matched littermates serving as controls. 5hmC-containing DNA fragments were enriched from cerebellar total DNA, and high-throughput sequencing yielded approximately 4.5 to 13.6 million non-duplicated reads per biological replicate (Supplementary Material, Table S1). Chromosome-wide 5hmC read density mapping showed no visible differences between rCGG mice and wild-type controls (Supplementary Material, Fig. S1). 5hmC levels were depleted on the X chromosome, consistent with our previous observations (35). However, rCGG cerebella displayed noticeably lower global 5hmC levels than wild-type controls, as shown in the global 5hmC heat map (Fig. 1A). Genome-wide 5hmC patterns were also evaluated by counting 5hmC mapped reads in each 10 kb bin of the wild-type and rCGG genomes and normalizing to the total sequencing coverage. Most bins contained fewer 5hmC reads in rCGG than in wild-type mice, as the overall plot pattern shifted toward the wild-type side (Fig. 1B). Figure 1C shows the normalized 5hmC read densities (reads per kilobase per million) in wild-type and rCGG mice on annotated gene bodies and CGIs obtained from the UCSC Tables for NCBI37v1/mm9. Irrespective of genotype, 5hmC mapped reads were most enriched on intragenic CGIs, with a 5.5-fold enrichment relative to their fractional representation in the genome. In contrast, 5hmC reads were depleted (i.e. fold change less than one) on CGIs within 500 bp of transcriptional start sites (TSS; Fig. 1C).
Nonetheless, 5hmC levels on these defined genomic features were reduced overall in rCGG mice compared with wild-type controls (Fig. 1C).
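The two normalizations behind Fig. 1B and C can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: reads are assumed to be (chromosome, start) tuples of uniquely mapped reads, and the function names are hypothetical. The fold enrichment compares the share of 5hmC reads falling in a feature with the share of the genome the feature occupies, so a feature covering 2% of the genome but holding 11% of the reads has a fold enrichment of 5.5.

```python
from collections import Counter


def bin_counts_per_million(read_starts, bin_size=10_000):
    """Count reads in fixed-size genomic bins and scale to reads per million.

    read_starts: (chromosome, 0-based start) tuples for uniquely mapped reads
    (hypothetical input format). Returns {(chromosome, bin_index): RPM}.
    """
    counts = Counter((chrom, start // bin_size) for chrom, start in read_starts)
    total = len(read_starts)
    return {key: n * 1e6 / total for key, n in counts.items()}


def fold_enrichment(reads_in_feature, total_reads, feature_bp, genome_bp):
    """Fraction of 5hmC reads in a feature divided by the fraction of the genome it covers.

    Values above 1 indicate enrichment; values below 1 indicate depletion.
    """
    return (reads_in_feature / total_reads) / (feature_bp / genome_bp)
```

Plotting the wild-type RPM of each bin against the rCGG RPM of the same bin gives a scatter of the kind described for Fig. 1B.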
DNA hypermethylation of repetitive elements is believed to play a critical role in maintaining genomic stability (46). To investigate the genome-wide distribution of 5hmC on repetitive elements, we aligned the total 5hmC reads to the RepeatMasker tracks of NCBI37v1/mm9. Although overall 5hmC levels were reduced in rCGG mice, several repetitive classes, notably short interspersed nuclear elements (SINEs), long terminal repeats (LTRs) and simple repeats, showed a specific gain of 5hmC in rCGG mice (Fig. 1D). This gain of 5hmC on repetitive elements might reflect a loss of 5mC at these elements in rCGG mice, which could compromise genomic stability and contribute to the onset of FXTAS.
Differential 5hmC distribution in gene bodies, defined CGI, and enhancers in FXTAS mice
Since overall 5hmC was significantly reduced in rCGG mice, we sought to visualize its distribution on defined genomic features. Figure 2A shows 5hmC reads from wild-type and rCGG mice, together with non-enriched input DNA, mapped by ngsplot (48) to gene bodies, TSS and transcription end sites (TES), each with 5 kb upstream and downstream flanks. The mean and standard error of 5hmC reads in each bin were used for a group t-test, and the average P-value was 0.0001, indicating that intragenic 5hmC differs significantly between wild-type and rCGG mice. 5hmC levels in both wild-type and rCGG cerebella were largely depleted at TSS, consistent with previous results (34,35). 5hmC at TES showed the same depletion pattern as at TSS, but to a lesser extent. Intragenic 5hmC was significantly higher in wild-type than in rCGG mice, in contrast to the mild differences at TSS and TES. This was particularly interesting because our previous data suggested that intragenic 5hmC peaks correlate with brain-specific gene expression (35). To further explore the relationship between 5hmC and gene expression, the genes above were ranked in descending order of expression based on previously published wild-type mouse cerebellum RNA-seq data (see Materials and Methods). Interestingly, the strong 5hmC depletion at TSS and the enrichment on gene bodies were associated with genes that are highly expressed in the cerebellum. It is therefore reasonable to speculate that the altered 5hmC at TSS, gene bodies and TES in rCGG mice may contribute to early FXTAS pathogenesis by affecting genes that are highly expressed in the cerebellum.
Because DNA methylation occurs at CpG dinucleotides, which are concentrated in CGIs, we next examined the 5hmC distribution on defined CGIs. As expected, 5hmC levels in both wild-type and rCGG mice were depleted on promoter CGIs but enriched on intragenic CGIs (Fig. 2C and D), consistent with the observation in Figure 2A. However, a closer look at 5hmC on promoters with differing CpG content revealed that the 5hmC depletion was largely associated with high-CpG promoters (HCP), but not with intermediate- or low-CpG promoters (ICP and LCP), as defined previously (49) (Supplementary Material, Fig. S1C and D). In addition, overall 5hmC levels in rCGG mice were lower than in wild-type controls in all three promoter categories (Supplementary Material, Fig. S1E). It has been suggested previously that 5hmC is located in cis-regulatory elements such as enhancers (41,50). We therefore separately plotted wild-type and rCGG 5hmC on general enhancers and on cerebellum-specific enhancers (47). Remarkably, 5hmC in rCGG mice was noticeably higher than in wild-type mice on cerebellum-specific enhancers, but was indistinguishable on general enhancers in the cerebellum (Fig. 2E and F). This suggests specific roles for 5hmC in regulating cerebellar gene expression, and altered 5hmC on cerebellum-specific enhancers in rCGG mice may cause selective misregulation of cerebellum-specific genes. Although the overall 5hmC level was lower in rCGG mice, the gain of 5hmC on cerebellum-specific enhancers could be a distinctive feature that accounts for their molecular abnormalities at early ages.
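The promoter classes can be illustrated with a short sketch. We assume the widely used thresholds of Weber and colleagues: an HCP contains a 500 bp window with an observed/expected CpG ratio above 0.75 and GC content above 55%, an LCP has no 500 bp window with a CpG ratio above 0.48, and an ICP is neither. Whether ref. (49) applies exactly these values is an assumption, and the 5 bp window step is a hypothetical choice.

```python
def cpg_stats(seq):
    """Return (observed/expected CpG ratio, GC fraction) for one DNA window."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    ratio = cpg * n / (c * g) if c and g else 0.0
    return ratio, (c + g) / n


def classify_promoter(seq, window=500, step=5):
    """Classify a promoter sequence as HCP, ICP or LCP using sliding windows."""
    has_hcp_window = False
    has_icp_window = False
    for start in range(0, max(1, len(seq) - window + 1), step):
        ratio, gc = cpg_stats(seq[start:start + window])
        if ratio > 0.75 and gc > 0.55:
            has_hcp_window = True
        if ratio > 0.48:
            has_icp_window = True
    if has_hcp_window:
        return "HCP"
    if has_icp_window:
        return "ICP"
    return "LCP"
```

A real analysis would run this over a fixed window around each annotated TSS; the window placement is left to the caller here.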
Identification and characterization of differentially hydroxymethylated regions (DhMRs) in a FXTAS mouse model
Since our data suggested distinct 5hmC distribution patterns in wild-type and rCGG mice, we next identified and characterized wild-type- and rCGG-specific DhMRs genome-wide, aiming to pinpoint specific loci with altered 5hmC between wild-type and rCGG mice (35,43). A total of 8658 wild-type-specific DhMRs and 4311 rCGG-specific DhMRs were identified (Fig. 3A, B and Supplementary Material, Tables S2 and S3). The smaller number of rCGG-specific DhMRs is consistent with the global reduction of 5hmC in rCGG mice shown in Figure 1B. To determine the genomic features associated with the identified DhMRs, we annotated both wild-type- and rCGG-specific DhMRs against the mouse genome using HOMER (Hypergeometric Optimization of Motif EnRichment) software (51). HOMER annotation revealed that more wild-type-specific DhMRs were located in promoters and exons, whereas more rCGG-specific DhMRs were found in introns, repetitive elements and intergenic regions (Fig. 3C). To further explore the biological significance of these DhMRs, we used GREAT (Genomic Regions Enrichment of Annotations Tool) to perform gene ontology (GO) analyses of the wild-type- and rCGG-specific DhMRs (52). Remarkably, several GO biological processes associated with neuronal function and brain development were enriched (Fig. 3D and E). These included cerebellar Purkinje cell signaling for wild-type-specific DhMRs, as well as regulation of oligodendrocyte differentiation, negative regulation of glial cell differentiation and somatic motor neuron differentiation for rCGG-specific DhMRs (highlighted in red). In addition, mouse phenotype prediction recapitulated several phenotypes seen in rCGG mice at later disease stages, including the absence of Purkinje cell layers (Supplementary Material, Fig. S2).
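GREAT tests GO terms against genomic regions rather than genes, using a binomial test over the genome: if the regulatory domains of a term's genes cover a fraction p of the genome, about n·p of n input regions are expected to fall in them by chance. The sketch below assumes those domains and their total length are already known (it does not implement GREAT's own rules for building regulatory domains), and all names are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def great_binomial_test(n_regions, n_hits, term_domain_bp, genome_bp):
    """Region-based enrichment in the style of GREAT's binomial test.

    n_hits of n_regions input regions (e.g. DhMRs) fall inside the regulatory
    domains of genes annotated with one GO term; those domains cover
    term_domain_bp of a genome_bp-long genome. Returns (fold enrichment, P-value).
    """
    p = term_domain_bp / genome_bp
    expected = n_regions * p
    return n_hits / expected, binomial_upper_tail(n_hits, n_regions, p)
```

Because the test counts regions rather than genes, a gene with a large regulatory domain is not rewarded for collecting many DhMRs by chance.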
We next examined common DNA sequence motifs associated with the identified DhMRs, with the aim of predicting altered occupancy by DNA-binding proteins such as transcription factors. To this end, we used the Multiple EM for Motif Elicitation (MEME) suite to predict common motifs in wild-type- and rCGG-specific DhMRs (53). One common motif for wild-type-specific DhMRs and two major motifs for rCGG-specific DhMRs were predicted with relative certainty (Fig. 4A and B, left panel). The identified motifs were further analyzed with TFBIND, a software tool for searching transcription factor binding sites in given DNA motifs (54). Several putative transcription factors associated with either wild-type- or rCGG-specific DhMRs were identified (Fig. 4A and B, right panel). Many of these transcription factors, such as cyclic AMP response element binding protein 1 (CREBP1 or CREB2) (55,56), activator protein 1 (AP-1) (57,58), cAMP response element-binding protein (CREB) (59,60) and aryl hydrocarbon receptor (AHR) (61,62), have been strongly linked to neurogenesis and neurological activity (see Discussion). These findings suggest that 5hmC-mediated epigenetic modulation may play a role in the onset of FXTAS.
DhMR-associated genes partially overlap with altered ribosome-bound mRNAs in a FXTAS mouse model
We compiled detailed lists of genes associated with the identified DhMRs based on the HOMER annotation results. Genes with DhMRs located in their UTRs, exons, introns and promoters were included, as were genes whose TSS was closest to an intergenic DhMR. A total of 6026 and 2969 genes were associated with wild-type- and rCGG-specific DhMRs, respectively (Supplementary Material, Tables S4 and S5). Plotting 5hmC mapped reads over the two gene lists confirmed that the differential 5hmC distribution lay primarily on gene bodies (Fig. 5A and B). Many genes in these lists are known to play critical roles in neurological development and function, among them Gli1 (63) and Fmr1 (5) (Fig. 5C and D, see Discussion). Taken together, we have identified genes associated with differential 5hmC levels between rCGG and wild-type mice, suggesting their potential role in early FXTAS pathogenesis.
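The rule of assigning an intergenic DhMR to the gene with the closest TSS can be sketched as below. This is a simplified illustration, not HOMER's code: it assumes TSSs are supplied per chromosome as sorted (position, gene, strand) tuples, a hypothetical format, and reports distance relative to the direction of transcription, negative for upstream.

```python
import bisect


def nearest_tss(peak_center, tss_list):
    """Return (gene, signed distance) for the TSS closest to peak_center.

    tss_list: non-empty list of (position, gene, strand) tuples on one chromosome,
    sorted by position (hypothetical input format). Distance is oriented by the
    gene's strand: negative means upstream of the TSS, positive downstream.
    """
    positions = [pos for pos, _, _ in tss_list]
    i = bisect.bisect_left(positions, peak_center)
    # The closest TSS is either just before or just after the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_list)]
    best = min(candidates, key=lambda j: abs(positions[j] - peak_center))
    pos, gene, strand = tss_list[best]
    distance = peak_center - pos
    return gene, distance if strand == "+" else -distance


tss = [(1000, "GeneA", "+"), (5000, "GeneB", "-"), (9000, "GeneC", "+")]
print(nearest_tss(4000, tss))  # ('GeneB', 1000): 1 kb downstream of a minus-strand TSS
```

Binary search keeps each lookup logarithmic in the number of TSSs, which matters when thousands of DhMRs are annotated.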
Beyond this, we also applied translating ribosome affinity purification (TRAP) to investigate early translational changes in rCGG mice compared with wild-type littermates at 4 weeks of age (Galloway et al., co-submitted). A total of 498 genes showed altered ribosome association of their mRNAs in rCGG mice. We compared our DhMR-associated gene lists with these genes. Interestingly, 142 of the 498 genes identified in the TRAP experiments overlapped with wild-type DhMR-associated genes, and 70 of the 498 overlapped with rCGG DhMR-associated genes (Fig. 5E and F). Pearson's chi-squared test with Yates’ continuity correction indicated that these overlaps were significant (P = 3.784 × 10⁻⁵ and P = 0.007686, respectively). These data suggest that genes undergoing translational changes may also be regulated at the transcriptional level, possibly through altered 5hmC levels.
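The overlap test can be sketched as a 2×2 contingency table of genes (in or out of the TRAP list, in or out of a DhMR list). The function below applies the same continuity correction as R's chisq.test for 2×2 tables. The background gene count needed to fill the table is not stated in the text, so the 20,000 in the usage lines is a hypothetical placeholder, and the resulting P-value is not expected to reproduce the values reported above.

```python
import math


def yates_chisq_2x2(table):
    """Pearson's chi-squared test with Yates' continuity correction for a 2x2 table.

    Follows R's chisq.test(correct = TRUE): every |O - E| is reduced by
    min(0.5, smallest |O - E|) before squaring. Returns (statistic, P-value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = [[a, b], [c, d]]
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    cells = [(i, j) for i in range(2) for j in range(2)]
    yates = min(0.5, min(abs(observed[i][j] - expected[i][j]) for i, j in cells))
    stat = sum((abs(observed[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
               for i, j in cells)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical usage: the gene universe size is NOT given in the text (placeholder).
universe = 20000
overlap, trap_total, dhmr_total = 142, 498, 6026
table = [[overlap, trap_total - overlap],
         [dhmr_total - overlap, universe - trap_total - dhmr_total + overlap]]
stat, p = yates_chisq_2x2(table)
```

The choice of background strongly affects the expected overlap, so it should be fixed in advance (for example, all genes annotated in both datasets).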
In the present study, we have systematically characterized cerebellar 5hmC profiles in Fmr1 transgenic mice bearing PM-length CGG repeats in the 5′UTR, compared with wild-type littermates. We observed a global reduction of 5hmC in our rCGG FXTAS mouse model, primarily on annotated gene bodies, including exons, introns, UTRs and promoters. However, we also detected a gain of 5hmC on several repetitive classes, as well as on cerebellum-specific enhancers. Aberrant distribution of DNA methylation has long been linked to various neurological diseases. For example, Wu et al. (64) found that deletion of DNA methyltransferase 3a (Dnmt3a) impairs postnatal neurogenesis. Mechanistically, Dnmt3a methylates intergenic regions and gene bodies flanking proximal promoters, and its knockout leads to silencing of genes related to neurogenic activity. In FXS, the expanded CGG repeats can become hypermethylated and silence Fmr1 transcription (5). A recent report on the dynamics of TET proteins and 5hmC during neurogenesis found that 5hmC levels increase during neuronal differentiation from neural stem cells (65). During the preparation of this article, Wang et al. (43) published data demonstrating a global loss of 5hmC in HD. A genome-wide loss of 5hmC may therefore be a signature of a spectrum of neurodegenerative diseases. However, although global 5hmC loss was seen in HD, there were no significant differences on RefSeq genes or repetitive elements (43), whereas our data reveal a FXTAS-specific 5hmC feature that distinguishes it from other neurodegenerative diseases. Substantial 5hmC enrichment in gene body CGIs and depletion in TSS CGIs in normal mouse cerebellum have been described in our earlier report (41). The present data confirm this trend in both wild-type and rCGG cerebella.
The 5hmC promoter depletion and gene body enrichment are associated with gene expression profiles and promoter CpG density, as shown in Figure 2 and Supplementary Material, Figure S1, implying a strong relationship between 5hmC and transcriptional state. Furthermore, we observed a gain of 5hmC in specific genomic regions, such as cerebellum-specific enhancers but not general enhancers. 5hmC is known to occupy enhancer regions in both human and mouse embryonic stem cells (41,66,67). The 5hmC enrichment on cerebellum-specific enhancers suggests unique roles in regulating cerebellum-specific gene expression, and a possible role in the aberrant expression of key cerebellar genes in rCGG mice. In addition, our data indicate a gain of 5hmC on several repetitive classes in rCGG mice, including SINEs, LTRs and simple repeats. DNA methylation on repetitive elements is well known to maintain genomic stability (46); the increased 5hmC may therefore result from conversion of 5mC and render the rCGG genome vulnerable.
One key question remains to be resolved: what causes the genome-wide 5hmC alterations that follow the introduction of expanded CGG repeats in the 5′UTR of the Fmr1 transgene. Previous data from us and others point to an RNA gain-of-function model, based on the observation that a number of RNA-binding proteins are bound and sequestered by CGG repeats (5,68). It is therefore reasonable to speculate that the target specificity or enzymatic activity of TET proteins or their co-factors could be ectopically affected in rCGG mice, transcriptionally or post-transcriptionally. Proteomic analyses have revealed several TET-binding proteins, among them Ogt (69–71), Nanog (72), PARP1 (73) and SIN3A (74). Whether these proteins can be directly or indirectly sequestered by CGG repeats is not clear. Another possible cause of the loss of 5hmC in rCGG mice is altered availability of 5hmC- or 5mC-binding proteins. Overexpression of MeCP2 reduces global 5hmC levels (35), and MeCP2 was later shown to also bind 5hmC (34). These observations underline that maintaining proper 5hmC homeostasis is critical for normal neuronal function.
The identification of wild-type- and rCGG-specific DhMRs allows us to focus on unique genomic loci with altered 5hmC levels. Direct GO analysis of these loci revealed their strong correlation with various neuronal functions and brain development. Several transcription factors that recognize DhMR common motifs have been identified and found to play key roles in neurogenesis and neurological activity. For example, Lai et al. (55) showed that CREBP1 localizes to distal dendrites of rodent hippocampal neurons and serves as cargo for importin alpha-mediated translocation from distal synaptic sites to the nucleus. CREBP1 is also known to be important for memory and plasticity (56). Moreover, AP-1 is required for axonal regeneration and is predicted to play a central role in neuronal gene regulatory networks (57,58). GATA1 is best known for its role in erythroid development, but recent evidence also suggests a role in neuronal activity: Kang et al. (75) showed that expression of GATA1 in prefrontal cortex neurons caused loss of dendritic spines and dendrites, leading to depressive behavior in rat models of depression.
Other transcription factors, such as CREB, ATF, XBP1, AHR and USF1/2, are known to recognize rCGG-specific DhMR common motifs. Among these, CREB has been well characterized in neuronal plasticity and long-term memory formation (76), and changes in CREB function influence which specific populations of cells participate in particular memory or learning activities (59,60). XBP1 is a key transcription factor in the unfolded protein response, and recent data suggest a function for it in normal brain development, as well as in neurological diseases such as HD (77,78). AhR and Arnt proteins are present in mouse cerebellum from birth throughout postnatal development, and functional AhR can disrupt neural progenitor cell proliferation (61,62). Greenberg and colleagues (79) showed that upstream stimulatory factors (USF1/2) control neuronal gene expression, for example of brain-derived neurotrophic factor, in a calcium-dependent manner. Overall, these transcription factors participate in a wide range of neuronal activities and may lose or gain occupancy on these motifs when their binding sites are differentially hydroxymethylated, which could contribute to FXTAS pathogenesis.
We have also generated detailed lists of genes with DhMRs located either on the gene body or closest to the gene's TSS. Many of the genes with differential 5hmC levels in rCGG mice are involved in critical neuronal functions. For example, Gli1 shows reduced 5hmC on its gene body in rCGG mice compared with wild-type mice. Gli1 has been reported to play a critical role in modulating granule neuron precursor proliferation in the cerebellum by binding to hundreds of cerebellar gene promoters (63). In contrast, endogenous Fmr1 shows a gain of 5hmC on its gene body in rCGG mice. Fmr1 mRNA is overexpressed in FXTAS patients and shows increased ribosomal association in the TRAP experiments, and the gain of 5hmC is likely to play a role in this process. The precise relationship between 5hmC levels and gene expression needs further study.
The TRAP methodology allows researchers to purify tagged ribosomes together with the mRNAs they are translating for downstream high-throughput analysis (80). The Nelson laboratory has identified nearly 500 mRNA transcripts with altered ribosomal profiles in rCGG mice at only 4 weeks of age, suggesting an abnormal translational profile before any phenotype develops. Of these reported genes, ∼29% (142/498) and ∼14% (70/498) overlap with wild-type- and rCGG-specific DhMR-associated genes, respectively. These data could imply that rCGG mice display abnormal gene expression at both the transcriptional and translational levels, arguing that an ectopic condition has arisen even before there is a noticeable phenotype. Recently, vitamin C has been found to alter the global 5hmC profile by controlling TET activity (81). Thus, 5hmC may be a good target for future drug development to treat PM carriers or even FXTAS patients.
In summary, the data presented here offer the first systematic 5hmC profile in a FXTAS mouse model and point to genome-wide 5hmC changes as a possible mechanism of FXTAS pathogenesis. Our research also sheds light on possible therapeutic interventions that could be applied before PM carriers develop FXTAS phenotypes.
MATERIALS AND METHODS
The FXTAS mouse model has been described previously (21). Three 16-week-old rCGG transgene-positive mice were sacrificed at Baylor College of Medicine, together with three age-matched wild-type littermates. Whole brains were dissected and flash-frozen in liquid nitrogen. Samples were transferred following the procedure of the Emory Incoming Material Transfer Agreement. The cerebellum of each wild-type and rCGG mouse was dissected upon arrival and stored at −80°C. rCGG mice have previously been reported to show inclusions by 8 weeks of age and behavioral deficits at 20 weeks (21). We therefore investigated 5hmC distribution in rCGG mice at 16 weeks of age, before obvious phenotypes appear, to understand the impact of global 5hmC alteration before FXTAS onset.
Genomic DNA isolation and 5hmC-specific enrichment
Cerebellar samples were homogenized on ice using a 2 ml Dounce tissue grinder in lysis buffer [100 mM Tris–HCl (pH 8.5), 5 mM EDTA, 0.2% SDS (vol/vol), 200 mM NaCl]. Proteinase K was freshly added to the buffer, and samples were incubated at 55°C overnight. On the second day, an equal volume of phenol:chloroform:isoamyl alcohol [25:24:1, saturated with 10 mM Tris (pH 8.0) and 1 mM EDTA; P-3803, Sigma] was added, mixed thoroughly and centrifuged at maximum speed in a benchtop centrifuge for 10 min. The supernatant was then mixed with an equal volume of isopropanol to precipitate DNA. DNA pellets were washed with 70% ethanol and dissolved in 10 mM Tris–HCl (pH 8.0).
Chemical labeling-based 5hmC enrichment has been described previously (36). Briefly, DNA was sonicated to 100–500 bp and then mixed with 100 µl of a solution containing 50 mM HEPES buffer (pH 7.9), 25 mM MgCl2, 250 µM UDP-6-N3-Glu and 2.25 µM wild-type β-glucosyltransferase. Reactions were incubated for 1 h at 37°C. DNA substrates were purified with a Qiagen DNA purification kit. Dibenzocyclooctyne-modified biotin (150 µM) was then added to the purified DNA, and the labeling reaction was performed for 2 h at 37°C. The biotin-labeled DNA was enriched with streptavidin-coupled Dynabeads (Dynabeads® MyOne™ Streptavidin T1, Life Technologies) and purified with a Qiagen DNA purification kit.
Library preparation and high-throughput sequencing
5hmC-captured libraries were generated with the NEBNext ChIP-Seq Library Prep Reagent Set for Illumina according to the manufacturer's protocol. In short, 25 ng of input genomic DNA or 5hmC-captured DNA was used. DNA fragments between 150 and 300 bp were gel-purified after the adapter ligation step. An Agilent 2100 BioAnalyzer was used to quantify the amplified DNA, and libraries diluted to 20 pM were used for sequencing. We performed 38-cycle single-end sequencing using Version 4 Cluster Generation and Sequencing Kits (Part #15002739 and #15005236, respectively) and Version 7.0 recipes. Image processing and sequence extraction were performed using the standard Illumina Pipeline.
Sequence alignment and mapped reads annotation
FASTQ sequence files from biological replicates were aligned separately to the mouse NCBI37v1/mm9 reference using Bowtie 0.12.9. Uniquely mapped reads (.bed files) with no more than two mismatches in the first 25 bp were concatenated across replicates to obtain combined wild-type and rCGG 5hmC datasets. Chromosome-wide densities were calculated as reads per chromosome divided by the total number of reads in millions. Expected values were calculated by dividing 10⁶ by the total genome length and multiplying by the chromosome length. Mapped reads were associated with genomic features by overlapping read files with known genomic features obtained from the UCSC Tables for NCBI37v1/mm9, including 5′ UTR, exon, 3′ UTR, ±500 bp of TSS and CGI (±500 bp of CGI; intergenic/intragenic based on RefSeq Whole Gene).
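The chromosome-wide density and its uniform-coverage expectation can be written out as a short sketch. The dictionaries of per-chromosome read counts and lengths are a hypothetical input format, and taking the genome length as the sum of the listed chromosomes is an assumption.

```python
def chromosome_densities(reads_per_chrom, chrom_lengths):
    """Observed and expected reads per million for each chromosome.

    observed = reads on the chromosome / (total reads / 10^6)
    expected = 10^6 / genome length * chromosome length (uniform coverage)
    Returns {chromosome: (observed, expected, observed / expected)}.
    """
    total_millions = sum(reads_per_chrom.values()) / 1e6
    genome_len = sum(chrom_lengths.values())  # assumption: genome = listed chromosomes
    result = {}
    for chrom, length in chrom_lengths.items():
        observed = reads_per_chrom.get(chrom, 0) / total_millions
        expected = 1e6 / genome_len * length
        result[chrom] = (observed, expected, observed / expected)
    return result
```

An observed/expected ratio below 1, as for the X chromosome in these data, marks depletion relative to a uniform spread of reads.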
Uniquely mapped reads were aligned to the RepeatMasker track of NCBI37v1/mm9 using Bowtie with the -q --best parameters, allowing no more than two mismatches across the entire 38 bp read. Aligned reads were assigned to repeat classes as defined by RepeatMasker. 5hmC reads were counted in each repeat class and divided by the total number of reads aligned by Bowtie to obtain percentages.
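A minimal sketch of the repeat-class percentages, assuming each aligned read has already been assigned a RepeatMasker class (or None when it overlaps no repeat); the function names and input format are hypothetical. A positive shift marks a class with a larger share of 5hmC reads in rCGG than in wild-type mice, as reported for SINEs, LTRs and simple repeats.

```python
from collections import Counter


def repeat_class_percentages(read_classes, total_aligned):
    """Percentage of all Bowtie-aligned reads that fall in each RepeatMasker class.

    read_classes: one entry per aligned read, the repeat class or None
    (hypothetical input format).
    """
    counts = Counter(cls for cls in read_classes if cls is not None)
    return {cls: 100.0 * n / total_aligned for cls, n in counts.items()}


def class_shift(wt_pct, rcgg_pct):
    """rCGG minus wild-type percentage per class; positive values indicate a gain in rCGG."""
    classes = set(wt_pct) | set(rcgg_pct)
    return {cls: rcgg_pct.get(cls, 0.0) - wt_pct.get(cls, 0.0) for cls in classes}
```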
Mapped reads plotting to various genomic features and heat map generation
Uniquely mapped 5hmC reads were plotted over various genomic regions using the R-based program ngsplot (48). The regions 5 kb upstream and downstream of each genomic feature were divided into 100 bins; uniquely mapped reads were counted in each bin and normalized to reads per million. Gene bodies were defined as the region between TSS and TES; because gene lengths vary, relative positions within gene bodies were used. Genomic regions were defined by the ngsplot database or downloaded from the UCSC Table Browser (mm9). Cerebellum RNA-seq data, cerebellum-specific enhancers and general cerebellar enhancers were obtained from the ENCODE project (47). Genes were ranked in descending order of FPKM (fragments per kilobase of transcript per million mapped reads) in the cerebellum RNA-seq data, and the 5hmC reads mapped to each gene were plotted in this order. Heat maps were generated and visualized with ngsplot or Java TreeView.
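The binning and relative gene positions described above can be illustrated with a simplified metagene sketch. It is not ngsplot's implementation, which works on read coverage and offers many more options; it assumes single-chromosome read positions and (TSS, TES, strand) gene tuples, both hypothetical formats, and uses a plain double loop for clarity rather than speed.

```python
def meta_bin(pos, tss, tes, strand, flank=5000, n_bins=100):
    """Map a read position to one of 3 * n_bins metagene bins (upstream, body, downstream).

    Returns None if the read lies outside the gene body and both flanks.
    """
    if strand == "-":
        # Mirror coordinates so transcription always runs from low to high.
        pos, tss, tes = -pos, -tss, -tes
    if tss - flank <= pos < tss:
        frac, offset = (pos - (tss - flank)) / flank, 0
    elif tss <= pos < tes:
        # Relative position within the gene body, so genes of any length align.
        frac, offset = (pos - tss) / (tes - tss), n_bins
    elif tes <= pos < tes + flank:
        frac, offset = (pos - tes) / flank, 2 * n_bins
    else:
        return None
    return offset + min(int(frac * n_bins), n_bins - 1)


def metagene_profile(read_positions, genes, total_reads, flank=5000, n_bins=100):
    """Mean reads-per-million profile over all genes on one chromosome."""
    profile = [0.0] * (3 * n_bins)
    rpm_per_read = 1e6 / total_reads
    for tss, tes, strand in genes:
        for pos in read_positions:
            b = meta_bin(pos, tss, tes, strand, flank, n_bins)
            if b is not None:
                profile[b] += rpm_per_read
    return [value / len(genes) for value in profile]


def rank_genes_by_expression(fpkm):
    """Gene names sorted by descending FPKM, as used to order heat-map rows."""
    return sorted(fpkm, key=fpkm.get, reverse=True)
```

Scaling the gene body to relative positions is what lets genes of very different lengths share one x-axis; the flanks keep absolute distances.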
DhMR identification, annotation and motif analysis
We used Model-based Analysis of ChIP-Seq (MACS) software (82) to identify DhMRs by directly comparing wild-type and rCGG mice with each other, rather than with input. Parameters were: effective genome size = 1.87 × 10⁹, tag size = 38, bandwidth = 200, P-value cutoff = 1.00 × 10⁻⁵. Identified wild-type- and rCGG-specific DhMRs were annotated to genomic regions and associated genes with HOMER software (51), and analyzed with GREAT for GO enrichment. The MEME suite was used to predict common motifs in the top 20% of wild-type- and rCGG-specific DhMRs ranked by P-value (53). Motifs with an E-value < 10⁻³ were considered significant. Identified motifs were submitted to TFBIND, a software tool for searching transcription factor binding sites in given DNA motifs (54).
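The motif-input selection and significance filter can be sketched as below. The (id, value) pair format and the rounding of the 20% cut are assumptions. MACS output typically stores P-values as −10·log10(P), so such scores would need converting (or ranking in descending order) before being used this way.

```python
def top_fraction_by_pvalue(regions, fraction=0.2):
    """Keep the most significant fraction of regions (smallest P-values first).

    regions: (region_id, p_value) pairs (hypothetical input format).
    At least one region is kept when the input is non-empty.
    """
    if not regions:
        return []
    ranked = sorted(regions, key=lambda r: r[1])
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def significant_motifs(motifs, e_cutoff=1e-3):
    """Keep (motif_id, e_value) pairs whose E-value is below the cutoff."""
    return [m for m in motifs if m[1] < e_cutoff]
```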
Read coverage and visualization
Genomic views of read coverage were generated using the Integrative Genomics Viewer (IGV 1.4.2, http://www.broadinstitute.org/igv/), with a window size of 25 and a read extension of 200.
Pearson's χ² test with Yates’ continuity correction and group t-tests were performed using R software (http://www.r-project.org/).
Sequencing data have been deposited in GEO under accession number GSE49463.
This work was supported in part by the FRAXA Research Foundation and NIH grants NS051630 (P.J. and D.L.N.).
The authors would like to thank C. Strauss for critical reading of the article.
Conflict of Interest statement. None declared.