Candice J. Coppola, Ryne C. Ramaker, Eric M. Mendenhall, Identification and function of enhancers in the human genome, Human Molecular Genetics, Volume 25, Issue R2, 1 October 2016, Pages R190–R197, https://doi.org/10.1093/hmg/ddw216
Abstract
The study of gene regulation has advanced rapidly by leveraging next-generation sequencing to identify and characterize the cis and trans elements that are critical for defining cell identity. These advances have paralleled a movement towards whole genome sequencing in clinics. The two tracks have increasingly converged, underscoring the importance of cis-regulatory elements in development and producing numerous studies that implicate these elements in human disease. Other studies have emphasized the clinical phenotypes associated with variation or mutations in trans factors, including non-coding RNAs and chromatin regulators. These studies highlight the importance of a comprehensive understanding of mammalian gene regulation for predicting the impact of genetic variation on patient phenotypes. Currently, our ability to examine these putative elements in the dynamic context of a developing organism lags behind the generation of vast datasets and annotations.
Introduction
Differential gene expression is essential to metazoan development. In humans, a single cell gives rise to several hundred different cell types with a broad range of morphological and functional variability, despite all of them containing the same genome. This remarkable diversity is made possible by the complex interaction of cis-regulatory elements, traditionally defined as promoters and enhancers, with trans-acting factors, most commonly the regulatory proteins known as transcription factors that bind these elements ( Figure 1 ). Clinical genomics has identified numerous examples of DNA sequence variation in cis and trans factors leading to human disease ( 1–7 ). Promoters have proven easier to identify, in large part because of their immediate proximity to protein-coding genes, whereas other regulatory regions, such as enhancers, have historically been more difficult to identify ( 8 ). Additionally, the distinction between the accepted classes of regulatory elements is becoming more nuanced with the discovery of enhancers that bind RNA polymerase and initiate transcription, as well as promoters that act as enhancers for other promoters ( 9 , 10 ). This updated view of enhancers and promoters suggests significant overlap in the functionality of these elements ( 11 , 12 ). We also note that insulators, boundary elements and distal repressors have been described and play critical roles in gene regulation ( Figure 1 ). However, we limit the focus of this review to distal enhancers as the critical cis-regulatory elements in establishing a gene regulatory network (GRN) that defines cell identity.

Figure 1. Classes of regulatory elements and trans factors in gene regulation. Shown are classes of proteins and non-coding RNAs that orchestrate gene regulation, along with estimates of their abundance in the human genome. These factors interact with distinct loci along the genome, including enhancers, promoters, insulators and distal repressors.
As the identification and functional characterization of enhancer elements have evolved, new nomenclature has emerged beyond the classical enhancer/promoter categories, with terms such as locus control regions, shadow enhancers, super enhancers, megatrans enhancers, pleiotropic enhancers and poised enhancers ( 13–18 ). Recently, even long-accepted notions of enhancer function, such as independence from DNA strand orientation, have been challenged. Furthermore, among the broad biochemical and DNA sequence attributes commonly used to characterize these regions, no single feature is present in all known enhancers. Consequently, enhancer elements and their functional states cannot be adequately predicted from DNA sequence or chromatin features alone. Irrespective of nomenclature and mechanistic models, enhancer elements and trans factors are crucial to development and cell homeostasis: each cell uses a small subset of cis and trans factors to create a core GRN that establishes its identity, and perturbations of these networks can lead to human disease ( Table 1 ).
Table 1. A selection of resources for learning about and understanding gene regulation at both the global and the locus-specific level.
Name | Link | Description |
---|---|---|
ENCODE ( 50 ) | genome.ucsc.edu/ENCODE/ | Database of over 25 assays on over 100 different cell lines, aiming to create an encyclopedia of functional elements in the genome |
HaploReg ( 51 ) | www.broadinstitute.org/mammals/haploreg/haploreg.php | Web tool for exploring regulatory annotations and LD information pertaining to SNPs and indels of interest |
RegulomeDB ( 52 ) | regulomedb.org | Web tool providing a scoring metric for the regulatory importance of a genomic region of interest and for exploring relevant annotations |
UCSC Genome Browser ( 53 ) | genome.ucsc.edu | Genome browser containing a large collection of genomes and hundreds of pertinent annotation tracks |
Roadmap Epigenomics ( 54 ) | www.roadmapepigenomics.org | Database of epigenome annotations (DNA methylation, histone modifications, chromatin accessibility) for over 100 cell types and tissues at different developmental stages |
Factorbook ( 55 ) | www.factorbook.org/human/chipseq/tf/ | Website integrating ENCODE transcription factor binding sites for each cell line with histone mark and nucleosome occupancy information |
Vista ( 56 ) | genome.lbl.gov/vista/index.shtml | Web tool for comparative analysis of genomic sequences across species |
Fantom ( 57 ) | fantom.gsc.riken.jp | Database of whole genome annotations in conjunction with the mouse encyclopedia project |
deltaSVM ( 58 ) | www.beerlab.org/deltasvm/ | Machine learning algorithm trained on DNA sequence associated with functional annotations, for predicting the impact of regulatory variants |
GTEx ( 59 ) | www.gtexportal.org/home/ | Database containing RNA-seq and eQTL data from over 51 tissue sites |
Immgen ( 60 ) | www.immgen.org/ | Database containing gene expression and genetic variation information pertinent to the regulation of the immune system |
Blueprint Epigenome | www.blueprint-epigenome.eu | Database aiming to provide reference epigenomes for healthy and diseased individuals |
DeepBlue | deepblue.mpi-inf.mpg.de | Central hub for epigenomic data from several consortia |
Histome | www.actrec.gov.in/histome/index.php | Database of histone modifications and their regulation |
MethylomeDB | www.neuroepigenomics.org/methylomedb/ | Database of genome-wide DNA methylation profiles for brain tissue from human and mouse |
3CDB ( 61 ) | 3cdb.big.ac.cn | Curated database of chromosome conformation capture (3C) data |
Noncode ( 62 ) | www.noncode.org | Database of noncoding RNAs for 16 species |
Classification of Human TFs | www.edgar-wingender.de/huTF_classification.html | Comprehensive relational database of human transcription factors |
CEEHRC | www.epigenomes.ca/ | Hub for sharing large genome and epigenome annotation datasets |
Identification of a Million Putative Enhancers in the Human Genome
The identification of putative enhancers and their locations within the genome has largely been accomplished through classical genetics, comparative genomics and biochemical methods. Biochemical approaches capitalize on distinctive chromatin features, including DNA accessibility, transcription factor occupancy and patterns of histone modifications, that are associated with functional non-coding regions; these are discussed further below. Although these features are useful for identifying candidate regions, the functional contribution of each feature remains unknown, so they provide correlative rather than causal evidence. Large consortia such as ENCODE have used these biochemical assays to produce catalogues that are currently approaching one million putative enhancers in the human genome ( 19–21 ). Of these one million predicted elements, only small subsets have ever been assayed in functional experiments, and many are not deeply conserved among mammalian species. This number raises an obvious question: are all of these elements truly functional ( 22 , 23 )? And indeed, what is “function” in the context of a regulatory element, and how do we define it universally?
Evolutionary approaches infer function from both sequence conservation and rapidly accelerated change, and they have had success in identifying putative enhancers and motifs ( 24 , 25 ). Although powerful, the presence or absence of conservation is not definitive proof of function. This has been demonstrated by genetic deletions of highly conserved regions that conferred no observable phenotype, and by functionally validated regulatory elements that show little to no sequence conservation ( 26–29 ). Conservation approaches are limited by the need for sequence alignments across species, which can themselves be plagued with issues. In addition, the conservation signal weakens rapidly as phylogenetic distance decreases, because closely related species have had too little time to diverge, which makes species-specific elements difficult to identify ( 30 , 31 ). Enhancers in particular pose a great challenge due to their rapid evolution ( 32 ). While a promoter’s evolutionary half-life is comparable to that of protein-coding regions, conservation analysis combined with biochemical assays across 20 mammalian species has indicated that an enhancer’s half-life is about one third as long, and a large portion of mammalian enhancers appear to be species specific ( 33 ). Thus, prioritizing or over-emphasizing conservation when analyzing putative enhancers is likely to be misleading.
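To make the "one third as long" comparison concrete, the following minimal sketch assumes that element turnover follows simple first-order (exponential) decay, which is what a half-life implies; the half-life values are hypothetical placeholders, not estimates from ( 33 ). Under that assumption, an element with one third of the half-life is retained at the cube of the slower element's retention.

```python
import math


def fraction_retained(t: float, half_life: float) -> float:
    """Expected fraction of elements still conserved after time t,
    assuming simple first-order (exponential) turnover."""
    return 2.0 ** (-t / half_life)


# Hypothetical values in arbitrary time units; NOT estimates from ( 33 ).
PROMOTER_HALF_LIFE = 300.0
ENHANCER_HALF_LIFE = PROMOTER_HALF_LIFE / 3.0  # "about one third as long"

for t in (50.0, 100.0, 300.0):
    p = fraction_retained(t, PROMOTER_HALF_LIFE)
    e = fraction_retained(t, ENHANCER_HALF_LIFE)
    # One third of the half-life means enhancer retention = promoter retention cubed.
    assert math.isclose(e, p ** 3)
    print(f"t={t:g}: promoters {p:.3f}, enhancers {e:.3f}")
```

Under these assumptions, after one promoter half-life half of the promoters but only one eighth of the enhancers would still be conserved, which illustrates why conservation-based filters can discard many genuine enhancers.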
Classical genetics relies on mutations with phenotypic consequences to define function. This is routine in many model organisms, but the low throughput of mammalian models such as mice and rats means that very few human-relevant enhancers have been identified by classical genetic mutation approaches. Although the ease and efficiency of targeted genetic modification are increasing rapidly, this approach is still limited by fairly modest throughput ( 34–38 ). Additionally, while genetic perturbations with an observed phenotype can positively identify function, the absence of an observable consequence cannot rule out function, owing to the highly contextual, and often redundant, nature of gene regulation.
More direct ways to identify functional enhancers have recently been developed, including large-scale multiplex reporter assays and direct targeting of putative enhancers with epigenome editing. These multiplex reporter assays commonly test tens of thousands of elements by transient expression, quantifying the RNA produced by each element with deep sequencing ( 39–42 ). Improvements upon these methods include testing millions of elements in an unbiased manner using STARR-seq, and integrating reporter elements into the genome to test them in the context of chromatin ( 43 , 44 ). Lastly, endogenous putative enhancers can be tested by using epigenome editing to silence the enhancer and identify its target gene ( 45–47 ).
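One common way to summarize multiplex reporter data, offered here as an assumption rather than as the pipeline used in ( 39–42 ), is to compare, for each element, the sequenced reporter RNA with the delivered reporter DNA after normalizing both to library depth. The element names and read counts below are hypothetical.

```python
import math
from typing import Dict


def mpra_activity(rna_counts: Dict[str, int], dna_counts: Dict[str, int],
                  pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-element activity as log2(normalized RNA / normalized DNA).

    DNA counts estimate how much of each reporter was delivered; RNA counts
    estimate how much transcript it produced. Dividing by library totals
    corrects for sequencing depth; the pseudocount avoids log(0).
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    scores = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)
        rna_frac = (rna + pseudocount) / rna_total
        dna_frac = (dna + pseudocount) / dna_total
        scores[element] = math.log2(rna_frac / dna_frac)
    return scores


# Hypothetical read counts for three candidate elements (illustrative only).
dna = {"elem_A": 100, "elem_B": 100, "elem_C": 100}
rna = {"elem_A": 400, "elem_B": 100, "elem_C": 10}
for name, score in sorted(mpra_activity(rna, dna).items(), key=lambda kv: -kv[1]):
    print(f"{name}\t{score:+.2f}")
```

Real analyses typically aggregate many barcodes per element and model replicate variability statistically; the ratio above only conveys the core idea that activity is RNA output relative to DNA input.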
Although there is significant overlap in predictions across evolutionary and biochemical methods, they still vary considerably, both in the fraction of the human genome predicted to be functional and in the specific elements they identify. There are two extreme positions from which one could approach the question of function. The first predicts that only a minority of putative enhancers are functional, and that all elements should be assumed non-functional unless proven otherwise through a defined functional assay ( 22 , 23 ); the opposite extreme presumes that putative elements are functional unless shown otherwise. The “non-functional until proven functional” approach is partially supported by the low conservation of these putative elements. However, there are numerous examples of functional but non-conserved elements ( 23 , 48 , 49 ). The gene regulation field has been fiercely debating how to assign function to these DNA elements. Genetic deletion of a functional enhancer should alter the transcriptional regulation of a nearby gene. However, if a genetic deletion shows no change in expression, a number of explanations exist, including compensation by a nearby enhancer or functional activity in a different cell type or cellular context. Thus, describing a putative enhancer as non-functional is a tenuous conclusion.
Key Regulatory Factors: Histones and Chromatin
Chromatin is a dynamic structure that regulates the accessibility of DNA through histone exchange and post-translational modifications (PTMs) of histones. This fluid system plays a significant role in maintaining, or buffering, the action of both cis and trans factors and can be critical in establishing appropriate gene regulation at a locus ( 11 , 12 ). Canonical histones are often replaced with histone variants that alter the biochemical properties of the nucleosome and affect PTMs, protein interactions and chromatin structure ( 63 ). The variants H2A.Z and H3.3 are enriched at active promoters as well as at active enhancers marked by H3K27ac, and certain transcription factors such as CTCF and Zzz3 prefer these epigenetic environments ( 64 , 65 ). These variants are generally less stable than their canonical counterparts and undergo a seemingly redundant cycle of removal and replacement. It has been suggested that this turnover facilitates initial transcription factor binding, leading to recruitment of TF-dependent chromatin remodeling complexes and nucleosome depletion ( 66 ). The resulting open chromatin has been exploited to identify potential active regulatory regions through assays such as DNase-seq and ATAC-seq.
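The basic logic of calling open chromatin from DNase-seq or ATAC-seq reads can be illustrated with a deliberately simplified sketch: count cut sites in fixed windows and flag windows whose counts are improbably high under a genome-wide Poisson background. The window size, threshold and toy data are assumptions; dedicated peak callers use local backgrounds, fragment-level modelling and multiple-testing correction.

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Adequate for the modest thresholds used here; extremely small tail
    probabilities round to 0.0.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i     # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(cut_sites: List[int], genome_length: int,
                     window: int = 200, alpha: float = 1e-5
                     ) -> List[Tuple[int, int, int, float]]:
    """Return (start, end, count, p) for fixed windows whose cut-site count
    exceeds a genome-wide Poisson background at significance alpha."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in cut_sites:
        counts[pos // window] += 1
    lam = len(cut_sites) * window / genome_length  # expected cuts per window
    hits = []
    for i, c in enumerate(counts):
        p = poisson_sf(c, lam)
        if p < alpha:
            hits.append((i * window, min((i + 1) * window, genome_length), c, p))
    return hits


# Toy data: one cut site per 100 bp on a 10 kb "genome", plus 40 extra
# cut sites clustered in a hypothetical accessible region.
background = list(range(0, 10_000, 100))
peak = [5_000 + 5 * i for i in range(40)]
for start, end, count, p in enriched_windows(background + peak, 10_000):
    print(f"{start}-{end}\t{count} cuts\tp={p:.2e}")
```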
Analyses of histone modification patterns identified by ChIP-seq have provided a more robust distinction between the classes of cis-regulatory elements and are used to generate genome-wide and cell-type-specific annotations. H3K4me1 was the first histone modification to be globally associated with distal enhancers ( 67 ). However, the modification is not exclusive to enhancers, and it often precedes H3K27ac and enhancer activity. Studies in both human and mouse ESCs have shown that enhancers governing development are marked by H3K4me1 before they become active ( 15 , 68 ). It was proposed that these enhancers are in a “poised” or primed state for later activation of nearby genes. This poised state prior to activation upon differentiation has subsequently been observed in multiple models ( 69–73 ). The presence of H3K27ac is widely used to distinguish active from poised enhancers. Although H3K4me1 and H3K27ac are the modifications most widely associated with enhancers, additional modifications such as H3K9ac, H3K18ac, histone crotonylation, H3K79me2 and H3K79me3 have been identified in enhancer regions but have yet to be globally mapped ( 70 , 74 , 75 ).
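The two-mark logic described above can be written as a simple decision rule. This is a sketch under stated assumptions: each region is given as hypothetical binary mark calls plus an assumed "distal" flag, and only H3K4me1 and H3K27ac are considered, whereas genome-wide annotations typically combine many marks in statistical chromatin-state models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    """Binary mark calls for one genomic region (hypothetical inputs)."""
    name: str
    distal: bool    # assumed flag: far from any annotated promoter
    h3k4me1: bool
    h3k27ac: bool


def enhancer_state(region: Region) -> Optional[str]:
    """Simplified two-mark rule following the text: H3K4me1 at a distal site
    marks a candidate enhancer; H3K27ac separates active from poised/primed."""
    if not (region.distal and region.h3k4me1):
        return None  # not called as a candidate distal enhancer by this rule
    return "active" if region.h3k27ac else "poised/primed"


regions = [
    Region("r1", distal=True, h3k4me1=True, h3k27ac=True),
    Region("r2", distal=True, h3k4me1=True, h3k27ac=False),
    Region("r3", distal=False, h3k4me1=True, h3k27ac=True),
]
for r in regions:
    print(r.name, enhancer_state(r))
```

Because H3K4me1 is not exclusive to enhancers and often precedes activity, a rule like this yields candidates and states, not proof of function.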
Key Regulatory Factors: Transcription Factors
Human cis-regulatory elements have proven resistant to computational classification and prediction. Within a cis-regulatory element 100–2000 base pairs long, the DNA sequences key to its establishment and regulatory function may be only a handful of motifs 6–8 base pairs long ( 76 ). This small minority of the total sequence directs the binding of transcription factors, which in turn recruit secondary effectors such as chromatin regulatory complexes that establish H3K4me and H3K27ac. Not all DNA motifs, and the transcription factors that bind them, are created equal. Transcription factors that can bind closed, inactive DNA are termed pioneer factors ( 77 ). Pioneer factors can establish open DNA, which is then accessible to a second set of TFs called settler factors ( 78 ). These settler factors are likely required both for the proper gene regulatory activity of the cis-regulatory element and for the recruitment of chromatin-modifying complexes that establish the function of that region.
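Short motifs of this kind are commonly represented as position weight matrices (PWMs) and located by scanning both DNA strands. The sketch below converts hypothetical per-position base counts into log2-odds scores against a uniform background and reports windows above a threshold; the 6-bp motif, the counts, the threshold and the test sequence are all made up for illustration and do not correspond to a specific transcription factor.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores versus a
    uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]], threshold: float
         ) -> List[Tuple[int, str, float]]:
    """Score every window on both strands; report (forward-strand start,
    strand, score) for windows at or above threshold."""
    width = len(pwm)
    hits = []
    revcomp = seq.translate(COMPLEMENT)[::-1]
    for strand, s in (("+", seq), ("-", revcomp)):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other symbols
            score = sum(col[b] for col, b in zip(pwm, window))
            if score >= threshold:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, score))
    return hits


# Hypothetical 6-bp motif: each position favours one base 8:1:1:1.
motif_counts = [{b: (8 if b == c else 1) for b in BASES} for c in "TGACTA"]
pwm = log_odds_pwm(motif_counts)
sequence = "CCTGACTAGGGGGTAGTCAGG"  # one forward and one reverse-strand site
for start, strand, score in scan(sequence, pwm, threshold=7.0):
    print(start, strand, round(score, 2))
```

Motif matches of this length also arise frequently by chance in long sequences, which is one reason why motif scanning alone predicts functional elements poorly.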
Key Regulatory Factors: Noncoding RNAs
Comprehensive analysis of the transcriptome has revealed that a majority of the genome is transcribed, whereas protein-coding exons account for only 1.5% of the genome ( 79 , 80 ). The impact of the remaining transcribed genome on gene regulation has been a major focus of investigation for the last two decades. Currently, ∼40,000–65,000 different RNAs derived from the transcription of enhancers (eRNAs) have been described in humans ( 57 ). Large-scale reporter assays indicate that eRNA transcription is a strong marker of enhancer activity: putative enhancers with higher levels of eRNA transcription are significantly more likely to show reporter activity ( 57 ). A recent study suggested a functional role for eRNAs in transcription factor trapping, demonstrating that tethered eRNAs significantly augmented the binding of the transcription factor YY1 to enhancers ( 81 ). Subsequent ablation of eRNA transcripts led to a concurrent disruption of target gene expression, confirming function ( 82 ).
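A claim such as "elements with higher eRNA transcription are significantly more likely to show reporter activity" is typically an association in a 2×2 table. The sketch below computes an odds ratio and a one-sided Fisher's exact test using only the standard library; the counts are hypothetical and are not data from ( 57 ), and the choice of Fisher's test is an assumption about how such an association might be assessed.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the top-left count under the
    hypergeometric null with all row and column totals fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite when b*c is zero."""
    return (a * d) / (b * c) if b * c else float("inf")


# Hypothetical counts (NOT data from ( 57 )):
# rows = high vs low eRNA transcription; columns = reporter-active vs inactive.
high_active, high_inactive = 30, 20
low_active, low_inactive = 10, 40
print("odds ratio:", odds_ratio(high_active, high_inactive, low_active, low_inactive))
print("one-sided p:", fisher_exact_greater(high_active, high_inactive, low_active, low_inactive))
```

The same test is available in SciPy as `scipy.stats.fisher_exact` with `alternative="greater"`; the standard-library version is shown only to keep the example self-contained.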
Long noncoding RNAs (lncRNAs) have also been shown to be a critical set of trans regulators of transcription. LncRNAs are a diverse class of transcripts longer than 200 nucleotides and represent a majority of the mammalian non-coding transcriptome ( 83 ). Foundational examples, such as the regulation of the HOX loci by the lncRNA HOTAIR and X-chromosome inactivation by the lncRNA XIST, demonstrated the role of lncRNAs in establishing or maintaining repressive chromatin modifications at specific genomic loci ( 84 , 85 ). Additionally, lncRNAs can act as direct regulators of transcription factor binding by competing with genomic target sites for transcription factor occupancy. The best-described transcription factor decoy, GAS5, binds the DNA-binding domain of the glucocorticoid receptor (GR) and inhibits regulation of GR-responsive genes, with important implications for autoimmune disease and cancer ( 86 , 87 ).
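The decoy mechanism can be made concrete with a textbook competitive-binding calculation. This is an illustrative model assumed here, not one given in ( 86 , 87 ): a factor binds either its DNA sites or a decoy RNA, never both, and both ligands are in large excess so that free concentrations approximate totals. All concentrations and affinities are hypothetical, not measured GR or GAS5 parameters.

```python
import math


def fraction_on_dna(dna_conc: float, kd_dna: float,
                    decoy_conc: float = 0.0, kd_decoy: float = 1.0) -> float:
    """Equilibrium fraction of a transcription factor bound to its DNA sites
    when a decoy RNA competes for the same DNA-binding domain.

    Assumes single-site, mutually exclusive binding and ligand excess, so
    free concentrations are approximated by total concentrations.
    """
    dna_term = dna_conc / kd_dna
    decoy_term = decoy_conc / kd_decoy
    return dna_term / (1.0 + dna_term + decoy_term)


# Hypothetical concentrations and dissociation constants in the same units.
without_decoy = fraction_on_dna(dna_conc=10.0, kd_dna=10.0)
with_decoy = fraction_on_dna(dna_conc=10.0, kd_dna=10.0, decoy_conc=30.0, kd_decoy=10.0)
print(f"TF on DNA: {without_decoy:.2f} without decoy, {with_decoy:.2f} with decoy")
```

In this toy setting, adding a decoy at three times its dissociation constant lowers DNA-bound factor from one half to one fifth, showing how sequestration alone can reduce target-gene regulation without any change to the DNA sites.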
Gene Regulation and Human Disease
Mutations in enhancers
The first human diseases directly attributed to sequence variation in a gene regulatory region were rare Mendelian disorders such as alpha and beta thalassemias ( 88–90 ). For a subset of patients lacking protein-coding mutations, extensive long-range mapping and DNA sequencing revealed that many globin chain imbalances were due to deletions or chromosomal rearrangements of enhancers required for normal globin gene expression. Since then, many examples of these ‘position effects’ have been uncovered in diseases such as Van Buchem disease, Leri-Weill dyschondrosteosis, Rieger syndrome and hypoparathyroidism. Additionally, there are examples of single-nucleotide changes in human enhancers acting as the direct cause of human disorders. The best-known example, preaxial polydactyly, involves the ZRS enhancer of SHH, which lies at the extreme distance of about 1 megabase from SHH.
Studies implicating regulatory elements in human disease, both Mendelian and complex, are rapidly increasing. DNA variation in enhancers, including de novo mutations, can result in lower, higher or ectopic transcription of a target gene and contribute to a broad spectrum of human phenotypes. Currently, HGMD lists 3414 disease-implicated mutations in “regulatory” regions ( 91 ), and more than one third of GWAS SNPs are predicted to be causal and non-coding. Mutations in enhancers have been associated with many complex diseases, including cardiac disease, type 1 diabetes, rheumatoid arthritis and multiple sclerosis ( 92 ). Although much focus has been on rarer variants, common SNPs have been shown to be strong factors in human pathology as well. Hirschsprung's disease was found to be linked to the RET locus in people lacking any accompanying functional RET coding mutations. An enhancer sequence located in an intron of RET was identified and found to contain a common variant contributing more than a 20-fold increased risk over rarer alleles ( 93 ). Although enhancer variation is clearly important in human disease risk, in many cases the variant is not sufficient to cause disease on its own, highlighting the complex nature of these alleles and complicating positive identification.
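A first step in connecting non-coding variants to candidate enhancers, which resources such as those in Table 1 support, is checking which variants fall inside annotated enhancer intervals. The sketch below assumes half-open, 0-based coordinates and non-overlapping (already merged) enhancer intervals per chromosome; all identifiers and coordinates are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Enhancer = Tuple[str, int, int, str]   # (chrom, start, end, name); half-open, 0-based
Variant = Tuple[str, str, int]         # (variant_id, chrom, 0-based position)


def annotate_variants(variants: List[Variant], enhancers: List[Enhancer]
                      ) -> Dict[str, Optional[str]]:
    """Map each variant ID to the enhancer containing it, or None.

    Assumes enhancer intervals on a chromosome do not overlap (e.g. they have
    already been merged), so at most one interval can contain a position.
    """
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = {}
    for chrom, start, end, name in sorted(enhancers):
        by_chrom.setdefault(chrom, []).append((start, end, name))
    starts = {chrom: [iv[0] for iv in ivs] for chrom, ivs in by_chrom.items()}

    result: Dict[str, Optional[str]] = {}
    for vid, chrom, pos in variants:
        ivs = by_chrom.get(chrom, [])
        # Index of the rightmost interval whose start is <= pos (-1 if none).
        i = bisect.bisect_right(starts.get(chrom, []), pos) - 1
        result[vid] = ivs[i][2] if i >= 0 and pos < ivs[i][1] else None
    return result


# Hypothetical annotations and variants; coordinates are made up.
enhancers = [("chr1", 1_000, 2_000, "enh_1"), ("chr2", 100, 200, "enh_2")]
variants = [("var_1", "chr1", 1_500), ("var_2", "chr1", 2_000), ("var_3", "chr2", 150)]
print(annotate_variants(variants, enhancers))
```

Overlap only nominates candidates: as the RET example shows, establishing that an enhancer variant contributes to disease still requires functional and genetic evidence.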
Mutations in chromatin regulators and transcription factors
Both inherited and de novo DNA variation in transcription factors and chromatin regulatory machinery can lead to a disease phenotype in humans. Mutations in the promoter-associated general transcriptional machinery were recently found to be causal for Cornelia de Lange syndrome and for the phenotypically similar CHOPS syndrome (Cognitive impairment and Coarse facies, Heart defects, Obesity, Pulmonary involvement, Short stature and Skeletal dysplasia) ( 94 , 95 ).
Mutations in many chromatin regulators produce multiple congenital abnormalities, but almost all cause some neurological disease, including autism spectrum disorders, microcephalies and developmental delay ( 96 ). A few specific examples include Rett syndrome (MECP2), Kabuki syndrome (MLL2), Weaver syndrome (EZH2) and MRD32 (KAT6A) ( 97 , 98 ). Why chromatin dysregulation leads to neurological disease is not precisely known, but because these proteins act globally and affect all GRNs, developing neurons could be more dependent on precisely controlled GRNs than other cell types. This was recently shown in a mouse model of X-linked intellectual disability ( 99 ).
Mutations in transcription factors likely perturb fewer GRNs than mutations in chromatin regulators and can produce a broader variety of disease phenotypes. A few recent examples include maturity-onset diabetes of the young (MODY), which can be caused by mutations in the HNF family of transcription factors, and common variable immunodeficiency (CVID), which can be caused by mutations in the TF Ikaros ( 100 ) {Conley:2016tt}.
Mutations in non-coding RNAs
The identification of genetic variation in lncRNAs with an impact on human disease has been a challenge to date because of the difficulty of interpreting the functional importance of the primary sequence of non-coding RNAs, which often form complex secondary structures ( 83 ). However, several large deletions or amplifications involving lncRNAs have been described in disease. A balanced translocation affecting DISC1 and the lncRNA DISC2 has been associated with schizophrenia. Although the exact role either gene plays in schizophrenia pathogenesis is unclear, a current hypothesis is that the translocation disrupts DISC2’s regulation of DISC1 ( 101 ). Another example is a large germline deletion at the 9p21 chromosomal locus harbouring the lncRNA ANRIL, which has been associated with hereditary cancer ( 102 ). Several GWAS SNPs near this locus have also been linked to cardiovascular disease and type 2 diabetes ( 102 ). Lastly, microsatellite expansions in ATXN8 and the antisense lncRNA ATXN8OS have been linked to spinocerebellar ataxia type 8, with nuclear lncRNA accumulations disrupting splicing of GABA-A transporter 4 and decreasing neuronal inhibitory signalling ( 103 , 104 ).
Conclusion
A complete understanding of gene regulation is necessary both for a basic biological understanding of cell state and for the interpretation of genetic variants discovered via clinical whole genome sequencing. DNA variation in non-coding regions is complex, and inferring its downstream phenotypic consequences remains difficult. Likewise, DNA variation in regulators such as transcription factors, chromatin regulators and non-coding RNAs is increasingly being found to be causal for neurological diseases, among others. Establishing a critical mass of functional studies will produce definitive and well-accepted models and, importantly, an expanded vocabulary for the key factors that control gene regulation.
Conflict of Interest statement. None declared.
References