-
PDF
- Split View
-
Views
-
Cite
Cite
John F Fullard, Claudia Giambartolomei, Mads E Hauberg, Ke Xu, Georgios Voloudakis, Zhiping Shao, Christopher Bare, Joel T Dudley, Manuel Mattheisen, Nikolaos K Robakis, Vahram Haroutunian, Panos Roussos, Open chromatin profiling of human postmortem brain infers functional roles for non-coding schizophrenia loci, Human Molecular Genetics, Volume 26, Issue 10, 15 May 2017, Pages 1942–1951, https://doi.org/10.1093/hmg/ddx103
- Share Icon Share
Abstract
Open chromatin provides access to DNA-binding proteins for the correct spatiotemporal regulation of gene expression. Mapping chromatin accessibility has been widely used to identify the location of cis regulatory elements (CREs) including promoters and enhancers. CREs show tissue- and cell-type specificity and disease-associated variants are often enriched for CREs in the tissues and cells that pertain to a given disease. To better understand the role of CREs in neuropsychiatric disorders we applied the Assay for Transposase Accessible Chromatin followed by sequencing (ATAC-seq) to neuronal and non-neuronal nuclei isolated from frozen postmortem human brain by fluorescence-activated nuclear sorting (FANS). Most of the identified open chromatin regions (OCRs) are differentially accessible between neurons and non-neurons, and show enrichment with known cell type markers, promoters and enhancers. Relative to those of non-neurons, neuronal OCRs are more evolutionarily conserved and are enriched in distal regulatory elements. Transcription factor (TF) footprinting analysis identifies differences in the regulome between neuronal and non-neuronal cells and ascribes putative functional roles to a number of non-coding schizophrenia (SCZ) risk variants. Among the identified variants is a Single Nucleotide Polymorphism (SNP) proximal to the gene encoding SNX19. In vitro experiments reveal that this SNP leads to an increase in transcriptional activity. As elevated expression of SNX19 has been associated with SCZ, our data provide evidence that the identified SNP contributes to disease. These results represent the first analysis of OCRs and TF-binding sites in distinct populations of postmortem human brain cells and further our understanding of the regulome and the impact of neuropsychiatric disease-associated genetic risk variants.
Introduction
The 3-dimensional structure of chromatin plays a central role in the regulation of gene expression and is essential for the maintenance of cell identity and function (1). Specific patterns of open chromatin expose the repertoire of cis regulatory elements (CREs) required to regulate transcription in a given cell or cell-type. A large proportion of neuropsychiatric disease-associated loci are non-coding and exert their effects by disrupting CREs required for the correct spatiotemporal expression of genes (2–5). Importantly, cis regulation of gene expression is often specific for tissue and even cell type (3,4). Correspondingly, disease-associated loci have been shown to be enriched for CREs in tissues and cells relevant to the pathophysiology of disease (2,4–6).
Employing existing epigenome data to further our understanding of neuropsychiatric diseases is hindered by the fact that much of the epigenomic landscape remains unexplored in the relevant cells: The Encyclopedia of DNA Elements (ENCODE) consortium (4,7), Roadmap Epigenomics Mapping Consortium (REMC) (3,8), and FANTOM5 (9) all focused on actively dividing cells or used only homogenate brain tissue. A limitation to the latter approach is that the resultant data was derived from a mixture of markedly different cells such as neurons, microglia, oligodendrocytes, and astrocytes. Because CRE-mediated epigenetic regulation shows cell-type specificity (3,4), the study of mixed cell populations can fail to detect cell-type-specific signals. Furthermore, in studies using homogenized brain tissue, the proportion of each cell type contributing to the assay is undetermined, further increasing sample-to-sample variability.
Here we present, to our knowledge, the first cell-type-specific map of open chromatin regions (OCRs) of postmortem human frontopolar prefrontal cortex. We identify thousands of cell-type-specific OCRs from 2 broad cell types (neuronal (NeuN+) and non-neuronal (NeuN-)) isolated from frozen postmortem tissue by fluorescence-activated nuclear sorting (FANS). In addition, we examine transcription factor (TF) binding to interrogate cell-type differences in the regulation of gene expression. Finally, a number of the identified OCRs and TF-binding sites coincide with known schizophrenia (SCZ) risk loci, allowing us to pin-point the likely causative SNP therein and providing evidence for a putative mechanism by which these variants contribute to the etiology of SCZ.
Results
Profiling chromatin accessibility in neuronal and non-neuronal cells
Our aim was to catalog the OCRs in human brain tissue and examine differences among neuronal and non-neuronal cells. We assessed the landscape of OCRs in neuronal and non-neuronal nuclei extracted from 8 controls samples (Supplementary Material, Table S1) using ATAC-seq (Fig. 1A). We examined the effects of technical variability on ATAC-seq profiles and found a high concordance of read coverage among technical replicates (Pearson’s r = 0.97) (Supplementary Material, Fig. S1A). The average number of uniquely mapped and non-duplicated paired-end reads per sample was 54.8 million, with low mitochondrial DNA contamination (Supplementary Material, Table S2). After exploratory analysis (Supplementary Material, Fig. S1B), a total of 72,033 and 68,606 peaks were defined in neuronal and non-neuronal samples, respectively (Supplementary Material, Fig. S1C). For quantitative analysis of differences among neuronal and non-neuronal samples, we generated a matrix of mapped reads using a final consensus of 115,021 peaks. Unsupervised hierarchical clustering identified a clear distinction between neuronal and non-neuronal samples. (Fig. 1B). Comparison of neuronal vs. non-neuronal peaks identified more than 50% of OCRs that were significant after multiple testing corrections (60,653 differentially modified OCRs at FDR ≤ 0.01) (Fig. 1C, Supplementary Material, Table S3). Among these, 33,054 were neuronal and 27,599 were non-neuronal, with a moderate to large average fold change (FC) (median FC 2.19; range 1.46–15.86, Fig. 1D). For example, a neuron-specific regulatory region is positioned ∼29kb upstream of CADM3 (Cell adhesion molecule 3), a gene that is highly expressed in cortical pyramidal cells (10) (Fig. 1E). A non-neuronal region is positioned within the transcription start site of the shorter TGIF1 (TGFB-Induced Factor Homeobox 1) isoform, a gene with high expression in microglia (11) associated with Holoprosencephaly-4 (OMIM: 142946) (Fig. 1F). Overall, our analysis has identified significant differences in the epigenome landscape of OCRs among neuronal and non-neuronal populations.

Differential analysis of chromatin accessibility in neuronal and non-neuronal cells. (A) Schematic outline of study design. Briefly, nuclei isolated from frozen frontopolar prefrontal cortex specimens were separated in to NeuN + (neuronal) and NeuN- (non-neuronal) fractions by FANS. 50,000 nuclei from each cell population were then subjected to transposition reactions followed by library amplification and sequencing (50bp paired-end reads). (B) Unsupervised hierarchical clustering of ATAC-seq data. (C) Volcano plot showing the distribution of -log10 p-value and log2 fold-change of differential chromatin accessibility analysis across 115,021 OCRs. (D) Distribution of log2 fold-change of differential chromatin accessibility analysis for 33,054 neuronal and 27,599 non-neuronal OCRs. Averaged cell-type and gender-specific ATAC-seq signals at (E) CADM3 and (F) TGIF1 gene loci. Positive and negative logFC indicate neuronal and non-neuronal differential signals, respectively. Red box indicates the significant differentially accessible region in (E) neuronal and (F) non-neuronal cells.
Annotation of the cell-type-specific regulome in neuronal and non-neuronal cells
We first examined the location of OCRs in regards to distance from the transcription start site (TSS) and genic annotations. OCRs are in the vicinity of TSSs (Fig. 2A). Relative to non-neuronal OCRs, the differentially accessible regions of neurons are enriched for distal regulatory elements [10kb – 100kb downstream of TSSs (OR = 0.7, P = 9.8×10−39) and introns (OR =1.2, P = 6.5 × 10−130)], suggesting a more important role for long-range regulation of gene expression in neurons (Fig. 2B and C). We next examined if the identified OCR regions were under evolutionary constraint as evidenced by higher Genomic Evolutionary Rate Profiling (GERP) scores (12). High GERP score was found for both neuronal and non-neuronal OCRs (Fig. 2D). Interestingly, neuronal OCRs exhibit markedly higher GERP scores than non-neuronal OCRs (0.26 vs 0.14, P = 7.7 × 10−14 in a two-sided t test), which is more prominent in intergenic and promoter regions (two-sided t test P < 2.2 × 10−16 for both comparisons), respectively (Supplementary Material, Fig. S2). Overall, compared to those of non-neurons neuronal OCRs tend to be located distally from TSSs and are more evolutionarily conserved.
![Annotation of the neuronal and non-neuronal regulome. (A) Average read count frequency of OCRs in TSS regions. Confidence interval estimated by bootstrap method. (B) Distribution of all peaks and differential OCRs relative to TSS. (C) Distribution of genomic features of all and differential OCRs. (D) Average GERP score as a function of distance from the center of [-1000bp, 1000bp] of all peaks and differential OCRs. Curves and their 95% confidence intervals are calculated on a 50 bp sliding window. (E) Enrichment of all peaks and differential OCRs in various human prefrontal brain tissue chromatin states. (F) Gene Ontology terms for differential OCRs enriched in neuronal (dark red) and non-neuronal (dark green) samples. (G) Enrichment of differential OCRs for neuronal (dark red) and non-neuronal (dark green) cell-type-specific markers.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/hmg/26/10/10.1093_hmg_ddx103/4/m_ddx103f2.jpeg?Expires=1748790465&Signature=FqpmFhDAAZ3OLqth5JDs-VSn7RG~f~pWzxyi~sZMbRIKc-LeZwribPn7G-oJSRGWS-yphiGYrYbdfJsmaa6MmLkmsYzuuBtj-oF4zVKcAe8cnAzePFix4bDPCrmJLPQp6GXtNePpaUDkUYNOPbPJxf2gJFPgAWlEJ69xObo8s7g3fkFBbDE939yf2qRM~U44JAYqno3A4UAZHVxGQqSkDU6vX4YzbaRm5GkJwGrMVL1kVaH1593TbNSmo7UisS0mMvjXMQ6Did0LmeDLaCnX5xczqGsxFVyTqilTmEDCXJucM5hN3qmejOgcPh~wYeKjTy3WnuSt4Pz~cA2-CSgGKQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Annotation of the neuronal and non-neuronal regulome. (A) Average read count frequency of OCRs in TSS regions. Confidence interval estimated by bootstrap method. (B) Distribution of all peaks and differential OCRs relative to TSS. (C) Distribution of genomic features of all and differential OCRs. (D) Average GERP score as a function of distance from the center of [-1000bp, 1000bp] of all peaks and differential OCRs. Curves and their 95% confidence intervals are calculated on a 50 bp sliding window. (E) Enrichment of all peaks and differential OCRs in various human prefrontal brain tissue chromatin states. (F) Gene Ontology terms for differential OCRs enriched in neuronal (dark red) and non-neuronal (dark green) samples. (G) Enrichment of differential OCRs for neuronal (dark red) and non-neuronal (dark green) cell-type-specific markers.
OCRs are enriched for active and poised promoters and active and repressed enhancers derived from published histone modification data from homogenates of human prefrontal cortex (3) (Fig. 2E). Systematic comparison of cell-type-specific OCRs with published regulatory sequences across multiple tissues showed a strong enrichment in neuronal OCRs for brain-related enhancers, promoters and OCRs (Supplementary Material, Fig. S3). The non-neuronal OCRs show lower specificity for brain related CREs. To further explore the identified OCRs, we conducted pathway analysis by annotating peaks to proximal genes. In this analysis, neuronal peaks showed a significant enrichment for the terms synaptic transmission, cell-cell signaling, and cellular morphogenesis (Fig. 2F), whereas non-neuronal peaks were enriched for pathways related to protein kinase binding and telencephalon development. Next, we examined the overlap between genes proximal to the differentially accessible elements and cell type-specific markers (10,11). We detected a highly significant overlap between the set of predicted human neuronal genes and the set of cell type-specific genes for neurons, including pyramidal cells and interneurons (Fig. 2G). For non-neuronal OCRs, we observed a strong enrichment for oligodendrocytes and astrocytes. Overall, neuronal OCRs show stronger enrichment with published brain-related enhancers and are also enriched for molecular pathways related to neuronal function, including synaptic transmission and cell-cell signaling.
Cell-type-specific transcription factor binding and gene regulation
We performed TF-binding analysis based on a set of 432 motifs (13). Out of these motifs, we focus on 179 as they are enriched in the neuronal or non-neuronal OCRs compared to the genomic background. In addition to being enriched in motif-binding sites, the shape of the reads within ATAC-peaks can also be used to infer TF binding, where the actual binding site shows a “footprint” as the bound TF sterically protects the DNA from transposase integration (14,15) (Fig. 3A). In order to improve TF-binding prediction, we applied Protein Interaction Quantification (PIQ) (16), which, for each motif, learns the pattern of transposition insertion in the vicinity of potential binding sites. To measure the relative importance of each TF in the neuronal and non-neuronal samples, we used the output of PIQ to calculate motif scores and compared them against each other (Fig. 3B, Supplementary Materials, Tables S4 and S5). We note that similar results to the one described below could be obtained by comparing the enrichment of motifs in the ATAC-seq peaks regardless of footprinting (Supplementary Material, Fig. S4).

Transcription factor and gene regulome in neuronal and non-neuronal cells. (A) The average transposase insertion probability at all predicted binding CTCF- binding sites within neuronal and non-neuronal OCRs. (B) Motif scores in neuronal and non-neuronal cells. Based on the ratio of the motif scores between the two samples, the cell-type specificity of the motifs is categorized as probable (≥1.25) and definite (≥1.5). Motifs with an enrichment of matches within motifs compared to the genomic background of less than 1.25 were categorized with “no enrichment”. Based on the relative occurrence of motifs in promoter peaks compared to all peaks, the motifs were categorized as promoter depleted (≤1/1.25) or enriched (≥1.25). We highlight the most enriched motifs for all TFs (Left panel). A zoomed version (Gray box) is illustrated on the right panel. Here, the most cell-type-specific TFs are highlighted, listing only the most significant TF of each TF-family. (C) Two CTCF predicted sites near the 3’ and 5’ ends of the TRPM3 gene were detected only in neuronal cells. The exact location of the CTCF-binding motif for the 5’-CTCF and 3’-CTCF-binding sites is illustrated on the right. (D) The regulatory divergence of TF targets between the neuronal and non-neuronal samples. (E) The regulatory divergence of genes between the neuronal and non-neuronal samples. (F) Gene Ontology terms enriched in the top 1000 most divergently regulated genes.
The most prominent cell-type-specific TF differences were observed for promoter-depleted TFs. The TFs most specific for neurons were the FOS/JUN families (which together form the AP1-complex), showing a 2.7-fold higher motif score in neuronal cells. These TFs also showed greater than two-fold enrichment in neuronal peaks with, conversely, a slight depletion in peaks of non-neurons (Supplementary Material, Table S5). For non-neuronal cells, the two most specific motifs were ONECUT1/2/3 and PAX3/7, of which the latter has the most established role in brain function (17,18). We further note that the TF motifs specific to either the neuronal or non-neuronal cells appear to fall into distinct families. For instance, the Basic helix-loop-helix (bHLH) family seems to be primarily active in neuronal cells whereas the homeodomain/sox TF families seem to be predominantly non-neuronal (Supplementary Material, Table S4). On the other hand, several motifs of general TFs are identified mostly in promoter regions and show a high motif score and enrichment in both cell types (Fig. 3B). We provide an illustrative example of cell-type-specific TF-binding sites for TRPM3 (Transient Receptor Potential Cation Channel, Subfamily M, Member 3), where two CTCF sites are predicted only in neuronal samples (Fig. 3C). Overall, we identified a distinct repertoire of TFs for neuronal and non-neuronal cells, including AP1-complex and bHLH family which has an important role in neuronal function (19,20) and shows preferential activity for neurons in our data.
We next examined the likelihood that the TFs were regulating the same or different genes in the neuronal and non-neuronal samples by calculating the regulatory divergence (21). The two most divergent motifs jointly represented the neurogenin and neuroD TF families (Fig. 3D), which have an opposite effect on neuronal and non-neuronal function as they promote neurogenesis and inhibit gliogenesis (22). At the other end of the spectrum, the least divergent TFs consisted of many general TFs, including NFYB and NFYA (Supplementary Material, Table S5). As an alternative approach, we applied the concept of regulatory divergence to identify genes that showed either a similar or a dissimilar pattern of TF regulation in the two cell types (Fig. 3E). The most divergently regulated gene was SYT1, which encodes Synaptotagmin 1, a neuronally expressed protein known to mediate synaptic vesicle exocytosis (23). Pathway analysis of the top 1000 most divergently regulated genes showed significant enrichment for biological processes related to G-protein coupled receptor (GPCR) activity (Fig. 3F). This is consistent with recent findings demonstrating the cell-type-specific expression of GPCR genes in neuronal and non-neuronal cells (10).
Enrichment of SCZ risk loci for OCRs and TF-binding sites
As an initial analysis, we tested if the identified OCRs were enriched in risk loci for SCZ (24), Alzheimer’s disease (AD) (25) and non-neuropsychiatric diseases (26–29), using an empirical Bayes approach (30). This statistical approach leverages Genome Wide Association Study (GWAS) summary statistics and identifies functional annotations that are significantly enriched for risk loci. We found enrichment for SCZ in neuronal [log2 enrichment (95% Confidence Interval or 95CI) = 2.11 (1.21–2.73)] and non-neuronal [log2 enrichment (95CI) = 1.57 (0.01–2.38] OCRs (Supplementary Material, Fig. S5A). This is consistent with the tissue-specific enrichment of OCRs with relevant diseases (2,4). In addition, compared to non-neuronal OCRs, we found a stronger enrichment in neuronal OCRs, thereby providing additional support for neurons as the functional unit affected by SCZ susceptibility loci.
Given the significant enrichment of OCRs with SCZ loci, we next tested whether OCRs within specific genic annotations are more enriched in SCZ. The most enriched annotations were neuronal introns [log2 enrichment (95CI) = 2.95 (2.10–3.55)] and non-neuronal promoters [log2 enrichment (95CI) = 3.33 (2.38–3.98)] (Supplementary Material, Fig. S5B). These two annotations were used to build a combined model to identify the functional variant underlying each disease-associated locus. This approach incorporates functional data and reweighs GWAS SNPs, allowing for the identification of variants with a higher likelihood of being functional compared to other SNPs in the same locus. This yielded 29 SNPs that localize within 20 out of the 108 GWAS SCZ loci as the most likely candidates to be the causal polymorphisms in each region (Supplementary Material, Table S6 OCR model). For 19 out of the 20 GWAS SCZ loci, the functional SNP is not the GWAS index SNP. We found a substantial increase of ∼10 folds for the likelihood (estimated based on the fitted empirical prior probability) of a functional SNP (average prior = 0.38%) to be the causal polymorphism in this region compared to the index GWAS SNP (average prior = 0.036%). For 13 out of the 20 SCZ risk loci, we also identified an effect of the putative functional SNP on gene expression using expression quantitative trait analysis from the CommonMind Consortium (31).
SCZ causal variants might affect the binding sites for distinct classes of transcription factors in neuronal and non-neuronal cells. To examine this hypothesis, we performed enrichment analysis of SCZ GWAS summary statistics with TF-binding sites detected in neuronal and non-neuronal cells. This analysis was run by considering each TF separately (single TF model). We identified 15 TFs derived mostly from neuronal (11 TFs) compared to non-neuronal (4 TFs) cells that were enriched with SCZ genetic variants (Fig. 4A). In order to examine the combinational effect of the significant TFs in SCZ, we combined the 15 TFs in a joint model (combined TF model), using cross-validation to overcome overfitting in the model. Our best-fitting model included 4 TFs: ZSCAN10, NANOG/NANOGP1, CEBPZ and ZNF354C, all of which were specific to neuronal cells (Fig. 4A). The single and combined TF models were used to reweigh the SCZ GWAS and identified 7 SNPs that localize within 6 out of the 108 GWAS SCZ loci as the candidates most likely to be the causal polymorphisms in each region (Supplementary Material, Table S6 TF model). In this analysis, none of the functional SNPs were the GWAS index SNP. We found a substantial increase of ∼100 folds for likelihood (estimated based on the fitted empirical prior probability) of a functional SNP (average prior = 1.91%) to be the causal polymorphism in this region compared to the index GWAS SNP (average prior = 0.019%). For 3 out of the 6 SCZ risk loci, we also identified an effect of the putative functional SNP on gene expression of 4 transcripts (MAN2A1, SNAP91, MRAP2 and SNX19) using expression quantitative trait analysis from the CommonMind Consortium (31) (Supplementary Material, Table S6 TF model).

Integration of OCRs and TF-binding sites with SCZ risk variants and in vitro validation. (A) Enrichment of predicted TF sites within SCZ risk loci from neuronal and non-neuronal cells using a single TF model (top panel) and a joint model (bottom panel). The maximum-likelihood estimates and 95% confidence intervals of the enrichment parameter for each TF is illustrated. Annotations are ranked based on the improvement of the likelihood of the model (at the top are those that improved the likelihood the most). The number in parenthesis is the likelihood of the model for each TF (single model) or the combined model. (B) Regional plot surrounding the SNX19 locus. The top panel shows a plot of the r2 and P values for association with SCZ, including the index (rs10791097) and the functional (rs10750450) SNP. In the middle panel (Prior TF) is the fitted empirical prior probability based on the TF combined model and the positions of the TFs for this region. The overlap of the SNP with the binding sites of ZSCAN10 and ZNF354C are illustrated. In the lower panel (Prior OCR) is the fitted empirical prior probability and position of the OCR model. Note the difference in the fitted empirical prior probability between the TF and OCR model for the functional SNP (in gray box). (C) Examining the effect of rs10750450 risk-T and reference-G alleles and the binding motifs of ZNF354C and ZSCAN10 on transcriptional activity in in vitro experiments. Top panel is a graphical representation of the rs10750450 risk locus. All constructs result in increased luciferase activity in comparison to empty pGL4.24 vector. Compared to the reference G allele, the risk T allele enhances the luciferase expression, as does excision of the ZNF354C or ZSCAN10-binding site. Excision of the rs10750450 nucleotide gives rise to a further increase of luciferase expression and a similar effect is observed upon simultaneous excision of both ZNF354C and ZSCAN10-binding sites.
Supplementary Material, Figure S6 shows the distribution of the prior probability of functional SNPs across the 108 GWAS SCZ loci. Compared to the neuronal introns and non-neuronal promoters combined model, the TF models (single and combined) include multiple functional SNPs with high likelihood (priors > 1%) in SCZ loci (Supplementary Material, Fig. S6). A subset of variants has additional support for a functional role, affecting gene expression abundance of specific transcripts. Figure 4B shows an illustrative example for a locus near the SNX19 gene, where the combined TF model identified rs10750450 as the most likely causal variant in this region. This SNP has a P value of 2.19 × 10−12, which is close to the level of significance of the index SNP of that locus (rs10791097; P = 1.56 × 10−12). However, this SNP falls in the proximity of 2 TF-binding sites (ZSCAN10 and ZNF354C), leading the model to assign a prior (14.58%) that is almost three orders of magnitude higher than the prior of the index SNP (0.02%).
Functional validation of rs10750450 on transcriptional activity
To validate this finding, we examined the effect of rs10750450 risk-T and reference-G alleles and the binding motifs of ZNF354C and ZSCAN10 on transcriptional activity in in vitro experiments (Supplementary Material, Table S7). We found a significant effect of the different constructs on luciferase activity (ANCOVA: F = 42.73, df = 6, P < 2e-16) (Fig. 4C). All constructs resulted in increased luciferase activity ranging from 47% to 128% (all adjusted Ps < 0.0005) compared to empty pGL4.24 vector. Compared to the reference G allele, the risk T allele results in a 34% increase in luciferase activity (adjusted P = 1.4×10−4). rs10750450 is in proximity to binding sites for ZNF354C and ZSCAN10. Our data support a role for both TFs as transcriptional repressors, as excision of ZNF354C (34% increase, adjusted P = 1.2×10−4) or ZSCAN10 (36% increase, adjusted P = 6.4×10−5) binding sites result in increased luciferase activity compared to the reference G allele. We note that excision of both ZNF354C and ZSCAN10-binding sites resulted in a further increase of luciferase activity (47% increase, adjusted P = 1.5×10−7) and a similar effect was observed when we removed the rs10750450 nucleotide (56% increase, adjusted P = 1.0×10−9). Overall, our results indicate that, compared to the reference sequence, the schizophrenia risk allele leads to increased luciferase activity which is consistent with the association of the risk allele with increased SNX19 gene expression (32). Based on our data, this effect appears to be mediated through an allele-specific effect of the rs10750450-T allele on ZNF354C and ZSCAN10 binding, leading to de-repression of transcriptional activity.
Discussion
Recent genetic studies have implicated numerous common risk variants in SCZ (24). One of the next challenges is to further understand the biological mechanisms of the large number, and diversity, of genes that are associated with SCZ. To that end, we need to generate additional data capturing putative molecular processes that are relevant to the development of the disease. Most SCZ risk variants are found within non-coding regions of the genome and are predicted to disrupt the function of CREs. We explored SCZ genetic architecture by leveraging, for the first time to our knowledge, cell-type-specific OCRs mapped in human brain tissue. While SCZ-associated abnormalities have been demonstrated in neuronal and non-neuronal cells (2,33), here we demonstrate that SCZ risk variants show a higher enrichment in neuronal OCRs. This is consistent with genetic findings implicating genes that participate in neuronal function and synaptic transmission in the etiology of SCZ (24,34,35). It is worth noting that we did not find a significant association between risk loci for AD and OCRs of the brain. This is consistent with recent findings that suggest AD may have an immune cell component, as AD disease variants are enriched in enhancers of peripheral blood cells (3,36,37).
Comparison of neuronal and non-neuronal OCRs demonstrated the following interesting observations: First, although we observed an increased frequency of accessible elements for both cell-types in proximity to transcription start sites (TSSs), distal regulatory regions appear to play a more critical role in neurons when compared to non-neurons. This finding compliments a previous study, using an independent cohort of brain samples, which reported that, compared to non-neurons, neuron-specific methylated regions tend to be located distally from TSSs and are enriched within predicted enhancer elements (38). Second, while both neuronal and non-neuronal OCRs are evolutionary conserved, those of neurons appear to be under stronger functional constraint than other brain cells. Third, we also observe cell-type differences in the regulation of gene expression between the two cell types with promoter-depleted TFs showing the most marked differences between cells. Based on our analysis, the TFs with a higher binding affinity for neurons were among the FOS/JUN (39) and bHLH families (40) whereas PAX3/7 (41) and the homeodomain/sox (42) TF families display non-neuronal preference.
To further refine the OCRs, we performed TF digital footprinting analysis, which provides higher resolution of the functional genomic regions (from ∼1kb average OCR size to ∼10bp for TF-binding site) and assigns a potential functional role based on known TFs. Comparing TF to OCR data, we found a substantial increase of ∼5 folds for likelihood of a functional SNP being the causal polymorphism in certain SCZ risk regions. In addition, we identified a subset of TFs that are highly enriched in SCZ loci, including ZSCAN10, NANOG/NANOGP1, CEBPZ and ZNF354C, all of which were specific to neuronal cells. While little is known about the function of CEBPZ and ZNF354C, both NANOG and ZSCAN10 have been shown to play a role in the maintenance of pluripotency in embryonic stem cells (ESCs). NANOG is a downstream target of ZSCAN10 transcriptional activity (43) and NANOG expression is thought to be restricted to ESCs, becoming progressively down-regulated during differentiation and embryonic development (44,45). This raises the question as to why we detect TF footprints for developmental genes in neurons of the adult brain? While it is possible that NANOG and ZSCAN10 have heretofore unknown roles in postmitotic neurons, recent evidence suggests that pro-neural TF footprints can be maintained over time to ensure proper neuronal and glial differentiation (46). As such, we may be detecting the impression left by a protein on the structure of DNA, several decades after the fact. To that end, there is evidence supporting a role for ZSCAN10 in maintaining a multipotent progenitor cell population in mid-gestation embryos and adult organs (47). This is also supported by our data, where a large proportion of ZSCAN-binding sites (∼27%, see supplement) is present in active enhancers (defined based on H3K27ac ChIP-seq studies (48)) during adulthood.
SCZ risk loci are frequently large and often contain multiple implicated SNPs due to local linkage disequilibrium patterns. Below, we describe a number of genes identified in our analysis with particular emphasis on genes previously associated with increased risk for SCZ from the CommonMind Consortium (CMC) analysis (31).
We applied our TF footprinting models to reweigh the SCZ GWAS and identified the variants that have a putative functional role for 6 out of the 108 GWAS SCZ risk loci, none of which is the index SNP identified by GWAS. Of the 6 loci, 3 were identified as having an effect on gene expression using expression quantitative trait analysis from the CMC (31). One example is a locus adjacent to the SNX19 gene where the index SNP was identified as rs10791097. Our combined TF model identified a different SNP, rs10750450, as the most likely causal variant in this region, due to its proximity to binding sites for ZSCAN10 and ZNF354C. In vitro experiments confirmed that this SNP leads to an increase in transcriptional activity. In addition, rs10750450 is an expression quantitative trait locus (eQTL) for the SNX19 transcript in the human brain (31), but also in multiple tissues based on the GTEx data (http://www.gtexportal.org/). Furthermore, SNX19 was recently identified as a gene (out of a total of two genes) whose expression level was associated with SCZ (32). By locating the putative functional regulatory region for this gene within neurons, we add to the growing evidence that up-regulation of SNX19 increases the risk of SCZ.
Our approach has allowed us to identify additional, potentially functional, SNPs among regulatory regions proximal to a number of genes including; MAP3K, which has previously been implicated in SCZ, major depressive disorder (MDD) and Bipolar (BP) disorder (49,50), TOM1L2, a lipid pathway gene that resides in a locus associated with increased risk of dementia (51) and GATAD2A, a zinc-finger domain containing transcriptional repressor involved in methylation-dependent gene silencing (52). We also identify a locus adjacent to the gene encoding SNAP91. The GWAS index SNP in this instance is chr6_84280274_D, whereas our TF model identifies rs2022265 and rs2023569 as more likely causal SNPs given their proximity to MEF2A, MEF2B, MEF2C and MEF2D-binding sites. Both MEF2A and MEF2C are thought to play roles in neuronal differentiation (53). SNAP91 encodes a 91 kDa synaptosomal-associated protein, also known as Clathrin Assembly Protein 180 (AP180), that is highly expressed in neuronal presynaptic termini (54,55) where it is required for the formation and function of clathrin coated vesicles (56). SNAP91 has also been shown to control the growth of postmitotic neurons in embryonic hippocampus (57) and has proposed roles in calcium signaling and the Wnt pathway (58). In addition to its association with SCZ, SNAP91 has been identified in a GWAS for BP disorder risk loci (59).
Another SNP identified in our studies is rs4318227, adjacent to the gene encoding Double C2 Domain Alpha (DOC2A). DOC2A has been identified as an activity-dependent Ca(2+) sensor, active during the asynchronous phase of neurotransmitter release during synaptic transmission (60,61). In addition, our OCR model identified another SNP (rs6553440) adjacent to CLCN3, a voltage-gated chloride channel enriched in the glutamatergic synapses of the hippocampus. Clcn3-/- knockout mice have altered GABAergic function and display postnatal degeneration of both the hippocampus and the retina, indicating the critical role played by CLCN3 in normal neurotransmission in central neurons (62–64). More recently, CLCN3 has been shown to play a role in controlling glutamatergic synaptic strength by regulating neurotransmitter levels and synaptic vesicle (SV) release in cultured mouse hippocampal Clcn3-/- neurons (65).
Our study is not without limitations, including a relatively small sample size consisting of elderly non-schizophrenic controls. As such, our findings warrant further validation in future studies. As is the case with most postmortem studies, the current observations could be partially attributed to technical and clinical covariates, including postmortem interval and agonal state. In addition, we chose to focus on a discreet region of the brain and, due to the limitations of working with frozen postmortem tissue (in particular, the loss of cytoplasmic markers upon thawing), our analysis was restricted to the study of two broad populations of cells – neurons and non-neurons. However, both populations are themselves very heterogeneous, consisting of a variety of neural and non-neuronal sub-types. Because the isolation of non-neuronal nuclei relies on negative selection (NeuN negative), this population is more heterogeneous and includes several diverse cell types such as oligodendrocytes, astrocytes and microglia. This heterogeneity in the non-neuronal population will lead to less power to detect cell-type-specific OCRs (which are located more distally to TSS) and might explain some of the differences we observed in comparison to neurons. Studying different regions of the brain, coupled with the application of additional cell-type-specific nuclear markers, e.g. (66) would broaden the scope of our approach towards a more thorough understanding of the means by which CREs influence brain function and disease.
We have generated the first open chromatin maps of distinct populations of cells in human postmortem brain. This has allowed us to identify specific patterns of gene regulation in neuronal and non-neuronal cells. In addition, our approach provides a means to pin-point non-coding SCZ risk variants and to assign functional roles to these SNPs by identifying the genes and transcription factor pathways they disrupt.
Materials and Methods
Brain tissue specimens from the frontopolar prefrontal cortex (Brodmann area 10) of 8 controls were obtained from the NIH Brain and Tissue Repository (Supplementary Material, Table S1). Neuronal (NeuN+) and non-neuronal (NeuN-) nuclei were sorted using a FACSAria flow cytometer.
The Assay for Transposase Accessible Chromatin followed by sequencing (ATAC-seq) was performed using an established protocol (14) and sequenced on Hi-Seq2500 (Illumina) obtaining 2x50 paired-end reads. We used the edgeR package (67) to model the normalized read counts using negative binomial distributions including cell type, gender, age of death and PMI as covariates. P-values were adjusted for multiple hypothesis testing using false discovery rate (FDR) ≤ 1%. The protein interaction quantitation PIQ framework (16) was used to predict transcription factor binding sites from the genome sequence. To integrate functional annotations and GWAS results, we used the fGWAS software (30) and performed two different models that considered OCRs (OCR model) or TF-binding sites (TF model). Functional SNPs were further explored using expression quantitative trait loci (eQTLs) from prefrontal cortex (31). More details are provided in the Supplement Material.
Supplementary Material
Supplementary Material is available at HMG online.
Conflict of Interest statement. None declared.
Funding
National Institutes of Health [R01AG050986 to P.R., R01MH109677 to P.R.]; Brain Behavior Research Foundation [20540 to P.R.]; Alzheimer's Association [NIRG-340998 to P.R.]; and the Veterans Affairs [Merit grant BX002395 to P.R.].
References
- transcription, genetic
- autopsy
- cell nucleus
- cells
- chromatin
- dna-binding proteins
- fluorescence
- gene expression regulation
- genes
- neuroglia
- neurons
- oncogenes
- single nucleotide polymorphism
- schizophrenia
- transposase
- brain
- transcription factor
- cognitive impairment
- genetic risk
- atac trial
- brain cells
- enhancer of transcription
- sorting