Meenakshi Bagadia, Keerthivasan Raanin Chandradoss, Yachna Jain, Harpreet Singh, Mohan Lal, Kuljeet Singh Sandhu, Evolutionary Loss of Genomic Proximity to Conserved Noncoding Elements Impacted the Gene Expression Dynamics During Mammalian Brain Development, Genetics, Volume 211, Issue 4, 1 April 2019, Pages 1239–1254, https://doi.org/10.1534/genetics.119.301973
Abstract
Loss of linear proximity between a gene and its regulatory element can alter its expression. Bagadia and Chandradoss et al. report a significant loss of proximity between evolutionarily constrained non-coding elements and...
Conserved noncoding elements (CNEs) have a significant regulatory influence on their neighboring genes. Loss of proximity to CNEs through genomic rearrangements can, therefore, impact the transcriptional states of the cognate genes. Yet, the evolutionary implications of such chromosomal alterations have not been studied. Through genome-wide analysis of CNEs and the cognate genes of representative species from five different mammalian orders, we observed a significant loss of genes’ linear proximity to CNEs in the rat lineage. The CNEs and the genes losing proximity had a significant association with fetal, but not postnatal, brain development as assessed through ontology terms, developmental gene expression, chromatin marks, and genetic mutations. The loss of proximity to CNEs correlated with the independent evolutionary loss of fetus-specific upregulation of nearby genes in the rat brain. DNA breakpoints implicated in brain abnormalities of germline origin had significant representation between a CNE and the gene that exhibited loss of proximity, signifying the underlying developmental tolerance of genomic rearrangements that allowed the evolutionary splits of CNEs and the cognate genes in the rodent lineage. Our observations highlighted a nontrivial impact of chromosomal rearrangements in shaping the evolutionary dynamics of mammalian brain development and might explain the loss of brain traits, like cerebral folding of the cortex, in the rodent lineage.
Around 4–8% of the human genome is evolutionarily constrained, of which coding elements contribute only ∼1.5% while the rest is noncoding (Lindblad-Toh et al. 2011; Thurman et al. 2012; Rands et al. 2014). The massive amount of data produced by the ENCODE (the Encyclopedia of DNA Elements) and Epigenome Roadmap projects has confirmed that the majority of the evolutionarily constrained noncoding DNA serves as protein-binding sites (ENCODE Project Consortium 2012; Skipper et al. 2015). These conserved noncoding elements (CNEs) are interwoven with the protein-coding genes in a complex manner. Several lines of evidence converge to support the nontrivial regulatory impact of CNEs on proximal genes (Table 1). Around 200,000 human-anchored CNEs have been identified in mammals, which likely exhibit gene regulatory potential as measured through enhancer-associated chromatin marks (Roh et al. 2007; Seridi et al. 2014; Babarinde and Saitou 2016). Most CNEs cluster around developmental genes in relatively gene-poor regions of the genome (Woolfe et al. 2005; Akalin et al. 2009; Babarinde and Saitou 2016). These clusters, termed gene regulatory blocks, have constrained linear and spatial genome organization (Kikuta et al. 2007; Harmston et al. 2017; Polychronopoulos et al. 2017). Most CNEs contain clusters of overlapping binding sites of developmental transcription factors, which might explain their extreme conservation (Viturawong et al. 2013; Warnefors et al. 2016).
Table 1. Examples of loss-of-function mutations in CNEs
# | Gene | Disease/phenotype | Reference |
---|---|---|---|
1 | SOST | Van Buchem disease | Loots et al. (2005) |
2 | SHOX | Léri-Weill dyschondrosteosis syndrome | Sabherwal et al. (2007) |
3 | PAX6 | Aniridia | Plaisancié et al. (2018) |
4 | IGF2 | Beckwith-Wiedemann syndrome | Sparago et al. (2004) |
5 | α/β-globins | α/β-thalassemia | Driscoll et al. (1989), Hatton et al. (1990) |
6 | AR | Evolutionary loss of penile spines and sensory vibrissae in human | McLean et al. (2011) |
7 | ADGRL3 | Attention-deficit/hyperactivity disorder | Martinez et al. (2016) |
8 | MEIS1 | Restless legs syndrome | Spieler et al. (2014) |
9 | BMP2 | Brachydactyly type A2 | Dathe et al. (2009) |
10 | SOX9 | Brachydactyly-anonychia; Pierre Robin sequence | Kurth et al. (2009) |
11 | IHH | Syndactyly and craniosynostosis | Klopocki et al. (2011) |
12 | POU3F4 | X-linked deafness type 3 | de Kok et al. (1996) |
13 | FOXL2 | Blepharophimosis syndrome | Beysen et al. (2005) |
14 | SOX10 | Waardenburg syndrome type 4 | Bondurand et al. (2012) |
15 | GDF6 | Evolutionary loss of digit shortening of human feet | Indjeian et al. (2016) |
16 | IRF6 | Cleft lip | Rahimov et al. (2008) |
17 | ZIC2 | Holoprosencephaly | Roessler et al. (2012) |
18 | IRX | Familial idiopathic scoliosis and kyphosis | Justice et al. (2016) |
19 | RET | Hirschsprung disease | Emison et al. (2005) |
20 | CUX1, PTBP2, GPC4, CDKL5 | Autism spectrum disorders | Doan et al. (2016) |
21 | SHH | Preaxial polydactyly | Lettice et al. (2003) |
The table represents the genes affected by the genetic mutations in the proximal conserved noncoding regulatory elements, associated diseases or phenotypes, and the citations to the original studies.
Establishing genome-wide association between CNEs and phenotypes remains a daunting task. “Forward genetics” approaches, like genome-wide association studies (GWAS), and the “reverse genetics” approaches, like mouse mutagenesis, are notoriously difficult to scale up for high-throughput genotype–phenotype associations (Welter et al. 2014). With the availability of whole-genome sequences of multiple species, evolutionary methods are instrumental in deciphering genotype–phenotype associations at the genome scale. Through comprehensive multi-species comparison, it has been inferred that most CNEs are syntenic to the nearest gene in linear proximity and are likely to regulate the same (Naville et al. 2015; Babarinde and Saitou 2016). Attempts have been made to link evolutionary loss and sequence divergence of CNEs to lineage-specific traits, like the auditory system in echolocating mammals, adaptively morphed pectoral flippers in marine mammals, and the loss of penile spines and sensory vibrissae in humans (McLean et al. 2011; Davies et al. 2014; Marcovitz et al. 2016). In this study, we asked whether the lineage-specific evolutionary alterations in relative chromosomal positions of CNEs are associated with lineage-specific changes in gene expression. Through analysis of chromosomal positions of orthologous CNEs and genes from five different mammals, we observed that a significant number of genes had lost proximity to their adjacent CNEs independently in the rat lineage. This loss-of-proximity (LOP) was significantly associated with the downregulation of genes involved in neurogenesis and neuronal migration during fetal brain development, and coincided with the independent evolutionary loss of several brain traits in the rat lineage. The study suggested a significant contribution of chromosomal rearrangements in the evolutionary divergence of developmental gene expression trajectories in mammals.
Materials and Methods
Compilation of chromosomal position data
Human (hg19), rat (rn5), dog (canFam3), horse (equCab2), and cow (bosTau6) genome assemblies were used in the analysis. CNEs were taken from Marcovitz et al. (2016). Marcovitz et al. identified CNEs by filtering the 46-way phastCons track for the coding potential as per knownGene, Ensembl, the Mammalian Gene Collection, RefSeq, Exoniphy, the Vertebrate Genome Annotation database, the Yale Pseudogene database, the miRNA registry, and snoRNA-LBME-db. The minimum length of CNEs was set to 50 bp, and all the CNEs within 20 bp of each other were merged into longer ones. Only the CNEs that were present in at least 7 of the 19 mammals having genome sequences as well as morphological trait data from MorphoBank were kept in the final data set of 266,115 CNEs (∼1.5% of the human genome). Mean and maximum lengths of the final CNEs were 174 and 2191 bp, respectively. While phastCons-based identification of CNEs does not include percent identity criteria in the definition, the least and the average percent identities for human–rat CNE alignments were 61 and 85%, respectively, in the rat-LOP set. The least percent identity of 61% was higher than the expected value of 45% calculated as 1 − b, where b is the branch length (b = 0.55) between human and rat in the sequence alignment-driven phylogenetic tree.
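The length and merging criteria described above can be illustrated with a minimal sketch. It is not the pipeline of Marcovitz et al.; the function name `merge_and_filter`, the half-open coordinate convention, the inclusive 20-bp gap, and the order of operations (merge first, then apply the 50-bp minimum) are all assumptions made for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; this convention is an assumption.
Interval = Tuple[str, int, int]


def merge_and_filter(
    elements: List[Interval], max_gap: int = 20, min_len: int = 50
) -> List[Interval]:
    """Merge elements separated by at most `max_gap` bp, then drop short ones.

    Hypothetical helper: the merge-then-filter order and the inclusive gap
    are illustrative choices, not details confirmed by the text.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(elements):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev = merged[-1]
            # Extend the previous element to cover the current one.
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    # Keep only elements that reach the minimum length after merging.
    return [iv for iv in merged if iv[2] - iv[1] >= min_len]
```

In this sketch, two 40-bp elements lying 10 bp apart would be merged into one 90-bp element and retained, whereas an isolated 30-bp element would be discarded.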
Our choice of species and the CNE data set was constrained by the following considerations: (1) we wanted sufficient evolutionary depth in the analysis and Marcovitz et al. had considered the criteria of conservation in at least 7 of the 19 mammalian genomes when compiling the CNEs; (2) since our analysis considered the chromosomal positions of CNEs and the genes, we only considered the genomes for which complete chromosome assemblies were available (for example, chromosome assemblies for the orders Cetacea, Chiroptera, and Proboscidea are not presently available); (3) to obtain a sufficient number of orthologous genes across species, we restricted our analysis to fewer mammalian lineages, as considering multiple species would have compromised the total number of orthologous genes to start with; and (4) to assess the loss of phenotypes, it was important to consider the species for which morphological trait data were available in a structured form, like those in the MorphoBank database. Among the sequenced and extant rodents, trait data were only available for rat and not for mouse, partly due to the slightly bigger brain size of the rat compared to that of the mouse, which had allowed the systematic morphological dissection of rat brains in the past. Similarly, analysis of time course gene expression data for pre- and postnatal brain development was an important component of this study. Such a data set was available for the rat with multiple fetal and postnatal time points, but not for mouse.
We obtained the ortholog positions of human CNEs in query species using a standard approach of mapping through LiftOver (https://genome-store.ucsc.edu/) chains at 0.95 mapping coverage (Marcovitz et al. 2016). The “deleted,” “partially deleted,” and “duplicated” mappings were removed from the data set. Finally, we compiled 114,219 CNEs that had orthologous positions in all five species. We independently obtained the table of orthologous genes across five mammals from Ensembl (https://www.ensembl.org/index.html). Using CNE and gene tables, a list of the nearest genes that were within 1 Mb of the CNEs was obtained for humans. The positions of orthologous CNEs and genes in other mammals were assessed, and CNE–gene pairs were classified as syntenic if the distance between the two was < 1 Mb in all five species, and as LOP if the CNE and the gene were > 2 Mb apart, or were on different chromosomes, in one of the species and remained within 1 Mb in the rest of the species. If there were multiple orthologs for the same gene, we took the nearest gene to the CNE on the same chromosome to ensure that a syntenic pair would not be classified as LOP due to ortholog redundancy. The distance cutoff of 1 Mb was determined based on the distribution of the number of CNE–gene pairs at different distance cutoffs. At ∼1 Mb, the overall distribution approached a plateau and the numbers did not increase significantly after that (Figure 1B). The 2-Mb cutoff for LOP ensured that the CNE and the gene were distant by at least twofold in their LOP form when compared to their syntenic form. A larger distance cutoff was also likely to be robust against the annotation artifacts of gene coordinates. A flow chart illustrating the overall strategy is given in Supplemental Material, Figure S1. All the data are available in the supplemental data file.
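The classification rule described above can be sketched as follows. This is an illustrative reading of the stated thresholds rather than the authors' code; the names `separation` and `classify_pair`, the input layout, and the "unclassified" label for pairs that fit neither definition (for example, a 1–2 Mb separation in one species) are assumptions.

```python
from typing import Dict, Optional, Tuple

SYNTENY_MAX = 1_000_000  # pairs closer than 1 Mb count as proximal
LOP_MIN = 2_000_000      # pairs farther than 2 Mb count as having lost proximity

Position = Tuple[str, int]  # (chromosome, coordinate)


def separation(cne: Position, gene: Position) -> Optional[int]:
    """Distance in bp, or None when the two lie on different chromosomes."""
    if cne[0] != gene[0]:
        return None
    return abs(cne[1] - gene[1])


def classify_pair(
    pairs: Dict[str, Tuple[Position, Position]]
) -> Tuple[str, Optional[str]]:
    """Classify one CNE-gene pair from its (CNE, gene) positions per species.

    Returns ("syntenic", None), ("LOP", species), or ("unclassified", None).
    """
    proximal, lost = [], []
    for species, (cne, gene) in pairs.items():
        d = separation(cne, gene)
        if d is not None and d < SYNTENY_MAX:
            proximal.append(species)
        elif d is None or d > LOP_MIN:
            lost.append(species)
    if len(proximal) == len(pairs):
        return ("syntenic", None)
    # LOP requires loss in exactly one species and proximity in all others.
    if len(lost) == 1 and len(proximal) == len(pairs) - 1:
        return ("LOP", lost[0])
    return ("unclassified", None)
```

Keeping a gap between the two thresholds means that a pair drifting slightly beyond 1 Mb in one species is not counted as LOP, which matches the stated intent that LOP pairs are separated by at least twofold relative to their syntenic form.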

Figure 1. Conservation of proximity and lack thereof between CNEs and the nearest genes. (A) Illustration of the strategy to infer the synteny and the LOP between CNEs and the neighboring genes across five representative mammalian orders. CNE–gene pairs were classified as syntenic if they remained proximal (< 1 Mb) in all the five species, and as LOP if they departed by > 2 Mb or were on different chromosomes in one of the species while maintaining synteny in the other four species. (B) Probability distribution function (pdf) of all CNE–gene distances in the human genome. Most CNE–gene pairs were < 1 Mb apart and therefore a cutoff of 1 Mb was applied for CNE–gene synteny. (C) Sequence conservation, as measured through mammalian PhyloP scores, and (D) length distribution of CNEs in syntenic and LOP sets. P-values were calculated using a Mann–Whitney U-test. (E) Enrichment of retrotransposons ± 50 kb around syntenic and LOP CNEs. Asterisk indicates significant P-values (< 0.05) calculated using a Mann–Whitney U-test of enrichment values ± 10 kb around CNEs. CNE, conserved noncoding element; cons., conservation; LINE, long interspersed elements; LOP, loss-of-proximity; pdf, probability distribution function; SINE, short interspersed elements; Syn, syntenic.
To assess the genome assembly artifacts, we mapped the rat-LOP CNE–gene pairs to known problematic regions of the rat genome (https://github.com/shwetaramdas/rataccessibleregions/). Of 2711 CNEs and 245 genes, only three CNEs (0.1%) and three genes (ABCC6, FOS, and BNIP2; 1.2%), respectively, mapped to these regions. Exclusion of these regions was unlikely to change our claims. We further mapped the rat-LOP CNE–gene pairs of the rn5 rat assembly to the rn6 assembly. Out of 2711 CNE–gene pairs, 2667 pairs (98.4%) were successfully lifted over to rn6. In total, 2227 (83.5%) of the lifted pairs maintained rat-LOP status in rn6 as well (Figure S11A). Removing the ambiguous pairs did not alter the significance of brain association (Figure S11B). We also replaced the Ensembl ortholog information with other_refseq data in the above analysis to assess the correctness of ortholog mapping. The concordance of 83.5%, and the persistence of brain association, confirmed that the observations presented in the article were robust against the technical artifacts of genome assembly and gene orthology.
Analysis of genomic attributes
Chromosomal coordinates of repeat elements were downloaded from the UCSC table browser. Repeat elements were mapped ± 50 kb around syntenic and LOP CNEs, and the average values of enrichment in 2-kb bins were plotted. For conservation analysis, PhyloP scores of placental mammals (http://ccg.vital-it.ch/mga/hg19/phylop/phylop.html) were mapped ± 1 kb to CNEs.
Functional enrichment analysis
Normalized gene expression data for the developing cerebral cortex and heart of humans, rats, sheep, and mice were taken from BrainSpan (human cortex; http://www.brainspan.org/static/download.html), GSE71148 (human heart), Stead et al. (2006) (rat cortex), GSE53512 (rat heart), Clark et al. (sheep cortex), GSE66725 (sheep heart), and GSE63482 (mouse cortex). Average gene expression was plotted across a developmental time course. Null distributions were represented by the mean and 95% C.I.s of 500 random samples of the same size as the original gene sets.
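The random-sampling null described above can be sketched for a single developmental time point as follows. The text does not state how the 95% C.I. was derived, so the percentile interval used here is an assumption, as are the function name `null_band`, the input layout, and the fixed seed.

```python
import random
from statistics import mean
from typing import Dict, Tuple


def null_band(
    expression: Dict[str, float],
    set_size: int,
    n_samples: int = 500,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean and 95% percentile interval of average expression in random gene sets.

    `expression` maps gene name to normalized expression at one time point.
    Each random set has `set_size` genes, drawn without replacement.
    """
    rng = random.Random(seed)
    genes = sorted(expression)  # sorted for reproducibility across runs
    sample_means = sorted(
        mean(expression[g] for g in rng.sample(genes, set_size))
        for _ in range(n_samples)
    )
    # Percentile bounds (2.5th and 97.5th); an assumed definition of the C.I.
    lower = sample_means[int(0.025 * (n_samples - 1))]
    upper = sample_means[int(0.975 * (n_samples - 1))]
    return mean(sample_means), lower, upper
```

Repeating the call for every time point, with `set_size` equal to the size of the gene set of interest, would give a band against which the observed average expression trajectory can be compared.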
Enhancer analysis
The regulatory potential of CNEs was assessed by mapping ChromHMM data obtained from the Epigenome Roadmap (http://egg2.wustl.edu/roadmap/web_portal/imputed.html#chr_imp) and ENCODE (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeBroadHmm/) projects onto CNEs. Enhancer coordinates from FANTOM (http://enhancer.binf.ku.dk/presets/) were also mapped to CNEs. Cumulative overlap across the aforementioned three resources was calculated. Data sets for H3K4me1 (histone-3-lysine-4 monomethylation) for fetal and postnatal/adult human tissues were obtained from the Epigenome Roadmap (http://www.roadmapepigenomics.org/data/) with the following accession identifiers (IDs) and age groups: fetal brain (E081 and E082; 17GW), adult brain (E067, E068, E069, E071, E072, E073, and E074; pooled 73Yr/75Yr/81Yr), fetal muscle (E089 and E090; 15GW), postnatal muscle (E107; pooled 54Yr/72Yr), fetal thymus (E093; 15GW), postnatal thymus (E112; 3Yr), fetal heart (E083; 91 days), postnatal heart (E095, E104, and E105; pooled 3Yr/34Yr), fetal small intestine (E085; 15GW), and postnatal small intestine (E109; pooled 3Yr/30Yr). H3K4me1 data for fetal and adult mouse brain were obtained from ENCODE (ENCSR000CCZ and ENCSR000CAI for E14.5 embryos and 8-week-old adults, respectively). Fold-change over input DNA was used for aggregation plots. The Washington University (WashU) epigenome browser was used for visualization. Motif analysis was performed through the “peak-motifs” package from the Regulatory Sequence Analysis Tools (RSAT) (http://rsat.sb-roscoff.fr/peak-motifs_form.cgi) using JASPAR core matrices for vertebrate genomes. Syntenic CNEs were taken as background control sequences.
Mapping of proxy GWAS SNPs
Chromosome Conformation Capture data
Sequence Read Archive (SRA) files of high-throughput chromosome conformation capture (HiC) data sets for fetal and adult brains were obtained from GSE77565 and GSE87112. Data sets were processed using HiC User Pipeline (HiCUP) (https://www.bioinformatics.babraham.ac.uk/projects/hicup/), and contact maps were normalized using the iterative correction and eigenvector decomposition method (https://github.com/hiclib/iced). To generate interaction profiles similar to Circular Chromosome Conformation Capture (4C), the transcription start site (TSS) in each CNE–gene pair was taken as bait (reference point) and its intrachromosomal interactions were obtained from HiC matrices. A Loess regression line was fitted to the HiC counts as a function of genomic distance from the bait. Significant interactions with the bait were identified by applying a cutoff of a 3-SD distance from the regression line (Klein et al. 2015).
DNA breakpoint analysis
We obtained 552 germline breakpoints associated with congenital disorders having brain abnormalities and 68,018 somatic cancer breakpoints from van Heesch et al. (2014). The matching RNA-sequencing (RNA-Seq) data of peripheral blood of the patient and the mother were obtained from the European Nucleotide Archive (https://www.ebi.ac.uk/ena) with the accession IDs ERX358048 and ERX358046, respectively.
In total, 2061 evolutionary DNA breakpoints for rodents were taken from Bourque et al. (2004, 2006), Larkin et al. (2009), and Lemaitre et al. (2009). These breakpoints were then mapped onto interspacer regions between CNEs and the nearest gene-TSSs. The random null was obtained by picking CNE–gene pairs, of the same sample size and CNE–gene distance as that of the rat-LOP set, from the syntenic set 1000 times and mapping the breakpoints in the interspacer regions. The number of CNE–gene pairs with at least one breakpoint in between was counted for each sample. The distribution of these numbers was regarded as the random null. P-values were calculated using the equation used in the GWAS SNP analysis. The breakpoint distances from the mammalian ancestor were obtained from Luo et al. (2012).
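The breakpoint mapping and the distance-controlled null described above can be sketched as follows. Several details are assumptions made for illustration only: coordinates are taken to be in a single reference genome, distance matching uses 100-kb bins, sampling is done with replacement, and the empirical P-value formula in `empirical_p` is a common convention that stands in for the equation referred to in the text, which is not reproduced here. All function names are hypothetical.

```python
import bisect
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, int, int]  # (chromosome, CNE position, gene-TSS position)


def index_breakpoints(breakpoints: Sequence[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group breakpoint positions by chromosome and sort them for binary search."""
    index: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in breakpoints:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def has_breakpoint(pair: Pair, index: Dict[str, List[int]]) -> bool:
    """True if at least one breakpoint lies between the CNE and the gene-TSS."""
    chrom, cne, tss = pair
    lo, hi = min(cne, tss), max(cne, tss)
    positions = index.get(chrom, [])
    i = bisect.bisect_left(positions, lo)
    return i < len(positions) and positions[i] <= hi


def count_hits(pairs: Sequence[Pair], index: Dict[str, List[int]]) -> int:
    """Number of pairs with at least one breakpoint in the interspacer region."""
    return sum(has_breakpoint(p, index) for p in pairs)


def matched_sample(
    targets: Sequence[Pair],
    pool: Sequence[Pair],
    rng: random.Random,
    bin_size: int = 100_000,  # assumed bin width for distance matching
) -> List[Pair]:
    """Draw one pool pair per target from the same CNE-gene distance bin."""
    by_bin: Dict[int, List[Pair]] = defaultdict(list)
    for p in pool:
        by_bin[abs(p[1] - p[2]) // bin_size].append(p)
    sample = []
    for t in targets:
        candidates = by_bin.get(abs(t[1] - t[2]) // bin_size)
        if candidates:  # targets without a matching bin are skipped
            sample.append(rng.choice(candidates))
    return sample


def empirical_p(observed: int, null_counts: Sequence[int]) -> float:
    """One-sided empirical P-value with the usual +1 correction (an assumption)."""
    exceed = sum(c >= observed for c in null_counts)
    return (1 + exceed) / (1 + len(null_counts))
```

A null distribution in this sketch would be built by calling `matched_sample` and `count_hits` 1000 times with the syntenic set as `pool`, and then passing the resulting counts and the observed count to `empirical_p`.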
Mammalian traits
The statuses of morphological traits in five mammals were obtained from project ID P773 of the MorphoBank database (https://morphobank.org/). Traits that exhibited the same status in at least three of the mammals including rats, but that showed a different status in humans, were classified as independently modified traits in humans. Similarly, the traits that had the same status in at least three species including humans, but had changed status in rats, were denoted as independently modified traits in the rat.
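The trait rule above can be written as a small predicate. This is a sketch under assumptions: the function name `independently_modified`, the species labels, and the trait states in the usage note are hypothetical, and species with missing trait data are assumed to be left out of the input.

```python
from typing import Dict


def independently_modified(
    states: Dict[str, str], focal: str, anchor: str, min_support: int = 3
) -> bool:
    """True if the trait is independently modified in `focal`.

    The anchor's state must be shared by at least `min_support` species
    (the anchor included), and the focal species must differ from it.
    """
    anchor_state = states[anchor]
    support = sum(1 for state in states.values() if state == anchor_state)
    return support >= min_support and states[focal] != anchor_state
```

For a trait scored as "present" in human, dog, horse, and cow but "absent" in rat, `independently_modified(states, focal="rat", anchor="human")` would flag a rat-specific change, whereas swapping the roles (`focal="human"`, `anchor="rat"`) would not, because the rat state is shared by only one species.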
Data availability
The authors affirm that all data necessary for confirming the conclusions of this article are represented fully within the article, its tables and figures, and the associated supplemental material. Supplemental material available at Figshare: https://doi.org/10.25386/genetics.7750919.
Results
Evolutionary loss of genomic proximity between the CNE and the proximal gene
Using chromosomal position data of CNEs and representative primate (human), rodent (rat), carnivore (dog), perissodactyl (horse), and artiodactyl (cow) protein-coding genes, we obtained 51,434 syntenic CNE–gene pairs (4241 genes), wherein CNEs and the nearest gene-TSSs were < 1 Mb distant in all five species. There were 3579 LOP cases (334 genes) wherein the CNEs and the gene-TSSs were on different chromosomes, or were > 2 Mb apart, independently in only one of the species (Figure 1A and Figure S1, Materials and Methods). The rationale of the 1-Mb distance cutoff for synteny was based on the observation that the distribution of all CNE–gene distances saturated when they approached a 1-Mb range (Figure 1B). A similar approach has been used previously to infer enhancer–promoter linkage based on evolutionary synteny between the two in a 1-Mb range (Naville et al. 2015). The distance cutoff of 2 Mb for LOP ensured that the minimal expansion in CNE–gene distance would at least be twofold. To test if the CNEs in syntenic and LOP sets were comparable, we assessed their lengths and degree of conservation in mammalian genomes. Figure 1C showed an insignificant difference in the degree of sequence conservation of syntenic and LOP CNEs, suggesting that the sequence of LOP CNEs had not diverged among mammals as compared to that of syntenic CNEs. The length distribution of syntenic and LOP CNEs showed only a marginal difference toward slightly longer CNEs in the LOP set (Figure 1D). The density of CNEs per gene (within 1 Mb) was also not significantly different in the two sets (P = 0.15, Figure S2). However, the syntenic and LOP CNEs were located in the genomic domains of distinct sequence properties. We analyzed the enrichment of Short Interspersed Elements (SINE), Long Interspersed Elements (LINE), and Long Terminal Repeats (LTR), which cover ∼50% of the mammalian genome, around syntenic and LOP CNEs.
The syntenic CNEs were enriched in the regions of open chromatin, as signified through greater enrichment of SINE content (P < 7e−04) as compared with the LOP CNEs, and might have a more widespread role across different cell lineages (Figure 1E). In contrast, the LOP CNEs were located in the domains enriched with LTR repeats (P < 0.05), marking their susceptibility to genomic rearrangements through mechanisms like nonallelic homologous recombination (Roeder and Fink 1980; Chan and Kolodner 2011) (Figure 1E). LINE elements, in general, did not exhibit significant differences in the two sets (except in rat). These observations were largely consistent across species, marking an ancestral property (Figure 1E), except in rat wherein the LINE elements were enriched (P = 3.4e−04) around LOP CNEs. This exception can be explained by the fact that the rodents retained the least of the ancestral retrotransposons when compared to other mammals and instead accumulated novel retroelements (Buckley et al. 2017).
A relatively large number of syntenic CNE–gene pairs (93.5%) confirmed the widespread conservation of linear proximity between CNEs and their adjacent genes (Babarinde and Saitou 2016). Of the total 3579 LOP instances, a significant number (2711, 75%, P < 2.2e−16) were associated with the rat genome alone, in line with the significantly greater number of structural variations in the rodent clade (Luo et al. 2012) (Figure 2, A and B). Positive scaling (ρ = 0.53) between the number of LOP CNE–gene pairs and the breakpoint distances of species from the common ancestor signified that the CNE–gene proximity was an ancestral trait (Figure 2B). Due to the significant loss of CNE–gene proximity in the rat lineage as compared to others, we focused on rat instances in this study. By rat-LOP, we refer to the “loss of CNE-gene proximity in rat” from Figure 2C onward. To directly assess the association of rat-LOPs with structural variations in the genome, we analyzed the rodent-specific evolutionary breakpoints (Materials and Methods). We observed that 930 (34%) of all rat-LOP instances had at least one rodent-specific breakpoint between the gene-TSS and the CNE, as compared to 319 (11%) on average for the random null prepared through distance-controlled bootstrap sampling of syntenic CNE–gene pairs (P < 0.001, Figure 2C, Materials and Methods). This suggested that the rodent-specific genomic rearrangements nonrandomly coincided with the loss of CNE–gene proximity in the rat, but could not explain the entire repertoire of rat-LOP cases (Figure 2C). We argue that the sequence alignment-based annotations of evolutionary breakpoints might not represent the entire repertoire of genomic rearrangements and, therefore, analyzed the gene orthology on either side of LOP CNEs to map the various rearrangement scenarios through which CNE–gene proximity was lost. 
We found that the translocation-like scenario, as marked by “i” in Figure 2D, largely explained the interchromosomal (trans) splits of CNE and the adjacent gene. The scenario that reflected the mapping artifacts, as in panel iii, was underrepresented (5%, Figure 2D). Analysis of intrachromosomal (cis) splits suggested inversion-like events separating the CNE–gene pairs. Scenarios iv and v showed events wherein regions adjacent to CNEs (on the left side in scenario iv and right side in scenario v) underwent local rearrangements, of which 30 and 90% events, respectively, were confirmed as inversion-like events (red-colored bars in Figure 2D) by analyzing the changes in relative strand orientations of neighboring genes. Scenario vi represented 1.4% of the total cases wherein the in-between region of CNE and the gene was expanded without involving any apparent rearrangement. We illustrated examples of trans and cis splits of CNEs and genes in Figure 2E. Gene POU3F2 on human chromosome 6 was syntenic to a CNE, which was 45-kb upstream. The orthologous CNE and the gene in the rat were on chromosomes 8 and 5, respectively, marking the trans split of the CNE and the gene through translocation (Figure 2E). Another CNE was 18-kb upstream of gene ADAM23 on human chromosome 2. The rat orthologs were separated by 2.4 Mb on chromosome 9 through an inversion (Figure 2E).

Figure 2. Genomic rearrangements underlying the loss of proximity. (A) Number of CNEs and genes (in parentheses) exhibiting LOP independently in different mammals. The P-value for the LOP instances in rat and the next highest value (in dog) was calculated using Fisher’s exact test. (B) Scaling between the number of LOP CNE–gene pairs and the evolutionary breakpoint distance of species from the common mammalian ancestor. A breakpoint distance signifies adjacency of two genes that were together in one genome, but not in its neighbor in the breakpoint-based phylogenetic tree. Dashed line: least squares regression. (C) The number of rat-LOP CNE–gene pairs having at least one rodent-specific breakpoint between a CNE and a gene-TSS, overlaid onto the null distribution prepared from the syntenic set. (D) Distinct trans and cis chromosomal rearrangements as inferred from the analysis of gene orthology flanking the rat-LOP CNEs. Shown are the neighboring genes around CNEs. Red color represents the target gene and orange color represents the nearest gene on the other side of the CNE. White box represents the neighboring gene of the target gene. The bar plot represents the proportion of total rat-LOP CNE–gene pairs in each scenario. The red color in the bar plot marks the proportion for which inversion could be confirmed through analysis of gene orientations. (E) Two examples illustrating the loss of CNE–gene proximity in the rat. In the first example, a CNE was located 45 kb upstream of the gene POU3F2 in human, but the CNE and the gene were split onto different chromosomes in the rat. The second example shows that an inversion event had distanced the CNE and the proximal ADAM23 gene by up to 2.4 Mb in the rat genome. chr, chromosome; CNE, conserved noncoding element; LOP, loss-of-proximity; Syn, syntenic.
We concluded that the rodent-specific genomic rearrangements, as inferred from gene orthology approach, largely explained the loss of CNE-gene proximity in the rat.
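The strand-orientation check used to confirm inversion-like events can be illustrated with a minimal Python sketch. It assumes, as a simplification not spelled out in the text, that an event is called inversion-like when the relative orientation of two neighboring genes differs between human and rat. The function names and the tuple representation of strands are hypothetical.

```python
def relative_orientation(strand_a, strand_b):
    """Return 'same' if two neighboring genes lie on the same strand, else 'opposite'."""
    return "same" if strand_a == strand_b else "opposite"


def is_inversion_like(human_strands, rat_strands):
    """Flag a pair of neighboring genes as inversion-like.

    human_strands and rat_strands are (strand_of_gene_1, strand_of_gene_2)
    tuples using '+' and '-'. Assumption: an inversion that includes only one
    of the two genes flips their relative orientation, whereas an inversion
    that includes both (or neither) leaves it unchanged.
    """
    return relative_orientation(*human_strands) != relative_orientation(*rat_strands)
```

Under this assumption, a pair that is `("+", "+")` in human and `("+", "-")` in rat would be flagged, while a pair whose strands are both flipped, e.g., `("+", "-")` to `("-", "+")`, would not.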
Genes that had lost proximity to CNEs in rats were associated with fetal brain development
Significant differences in the genomic attributes around syntenic and rat-LOP CNEs hinted at their distinct functional roles. To assess their functions, syntenic and rat-LOP gene-lists were subjected to GO and MPO analyses. The analysis of GO terms revealed enrichment of general as well as various tissue-specific development-related terms in the syntenic set (P < 6e−06), while the rat-LOP set was specifically enriched with nervous system development-related terms (P < 0.03, Figure 3A). In MPO analysis, syntenic set exhibited enrichment of neonatal lethality and skeletal phenotypes (P < 9e−12), while the rat-LOP set was associated with brain morphology-related phenotypes (P < 0.04, Figure S3A). Species, other than rat, did not exhibit enrichment of any particular functional term, owing to smaller sample size.

Functional characterization of genes in syntenic and rat-LOP sets. (A) Enrichment of gene ontology terms among genes in syntenic and rat-LOP sets. P-values were calculated using a hypergeometric test and were corrected using the Benjamini–Hochberg method. (B) Tissue-specific expression analysis of genes in the syntenic and rat-LOP sets. Relative significance was plotted as negative of log10-transformed corrected P-values of Fisher’s exact test for the overlap with the tissue-specific genes at stringency score (pSI) < 0.05. The horizontal gray-colored line represents a P-value of 0.01. For the syntenic set, mean values and SE of significance for 100 random samples of 245 genes (the size of the rat-LOP set) from syntenic sets were plotted. (C) Expression specificity of genes in syntenic and rat-LOP sets across brain regions and across developmental stages. For the syntenic set, the sample that exhibited maximum significance for brain specificity in (B) was taken. Size of the nested hexagons represents the proportion of all genes specifically expressed in a particular tissue at a particular developmental stage. Hexagons are nested inward based on relative stringency of tissue-specificity scores (pSI = 0.05, 0.01, 0.001, and 0.0001, respectively). Color gradient represents the magnitude of corrected P-values of Fisher’s exact test. dev., development; LOP, loss-of-proximity; pos., possible; pSI, tissue-specificity index; reg., regulation; sys., system; transc., transcription.
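The enrichment statistics named in the legend, a hypergeometric test followed by Benjamini–Hochberg correction, can be sketched in plain Python as follows. This is an illustrative sketch rather than the pipeline used in the study; the function names and argument order are assumptions.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k): probability of seeing at least k annotated genes when
    N genes are drawn from a background of M genes, n of which carry the term."""
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In such a setup, one P-value per ontology term would be computed with `hypergeom_sf` and the full list passed to `benjamini_hochberg` before applying a significance cutoff.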
We further followed the above observations through tissue-specific gene expression analysis in humans. The syntenic set had a widespread representation of genes expressed in different cell-lineages and, therefore, did not exhibit significant tissue specificity, while genes in the rat-LOP set were specifically expressed in the brain (P = 0.001, Figure 3B). The brain-specific expression of genes in the rat-LOP set was also confirmed through enrichment analysis of anatomical terms from the Bgee database (Figure S3B). Within the brain, the rat-LOP set was enriched with the genes specifically expressed in the cerebral cortex during fetal, but not postnatal, development (P < 0.025, Figure 3C). In contrast, the genes in the syntenic set did not exhibit any specificity for brain tissues and developmental stages (Figure 3C). These observations highlighted fetal brain-specific roles of genes in the rat-LOP set.
CNEs exhibiting LOP to genes in rats function as fetal brain-specific enhancers
To test whether the differences between syntenic and rat-LOP sets observed through the functional analysis of genes were coherent with the associated CNEs, we tested the regulatory potential of CNEs by analyzing their epigenomic properties across tissues. Through analysis of enhancer-associated chromatin state annotations from the Epigenome Roadmap, and the ENCODE and FANTOM consortia (Materials and Methods), we observed that 74% of syntenic and 61% of rat-LOP CNEs overlapped with the enhancer-associated regulatory sites in at least one of the tissues or cell types, marking the enhancer potential of CNEs. The relatively lower representation of enhancers in the rat-LOP set might relate to their tissue or developmental stage-specific functions, a hypothesis that we examined further through detailed analysis of a histone modification associated with enhancers, namely H3K4me1. We chose this mark because of its strong association with the enhancer potential and the availability of genome-wide data sets for all the cell lineages that we were interested in. We observed that: (1) the CNEs in the syntenic set exhibited consistent H3K4me1 enrichment across several fetal and adult tissues like the thymus (endodermal), muscle (mesodermal), heart (mesodermal), intestine (endodermal), and brain (ectodermal) (Figure 4A and Figure S4, A and B); (2) H3K4me1 enrichment over CNEs in the rat-LOP set was specifically higher (comparable to that of syntenic CNEs) in the fetal, but not adult, brain (Pfetal = 0.03 vs. Padult = 2.8e−06 for comparison with the syntenic CNEs, Figure 4A). We further observed a significant enrichment, in rat-LOP CNEs as compared to syntenic CNEs, of binding sites of ectoderm-specific transcription factors that were specifically upregulated in the fetal brain (P < 0.05, Figure S5, Materials and Methods). These observations were largely coherent with our proposal that rat-LOP CNE–gene pairs were associated with fetal brain development.

Enhancer properties of CNEs in syntenic and rat-LOP sets. (A) H3K4me1 Chromatin immunoprecipitation enrichment (over input), on and around CNEs in syntenic and rat-LOP sets, in fetal and postnatal tissues. Shaded regions around the curves represent 95% C.I.s. P-values for the difference between syntenic and rat-LOP CNEs were calculated using the Mann–Whitney U-test of H3K4me1 enrichment values in 1-kb spanning windows on either side of the CNEs. (B) Virtual 4C analysis of CNE–gene interactions in fetal and adult human brains. The barplot shows the fetal-to-adult ratio of the proportion of CNE–gene pairs exhibiting significant (above 3σ distance from the Loess regression fit) physical chromatin interactions. P-value was calculated using Fisher’s exact test. (C) The examples of GPR85 and FEZF2 genes are shown for illustration. Vertical gray (reference point) and yellow bars represent the TSS and CNE positions, respectively. Red and gray curves show virtual 4C and H3K4me1 signals, respectively. The black smooth line represents the Loess fit of the 4C signal as a function of genomic distance from the reference point. The dotted line represents the 3σ distance from the Loess regression line. (D) The proportion of rat-LOP CNEs having at least one brain-associated SNP superimposed onto the null distribution obtained from syntenic CNEs. P-value was calculated using the bootstrap method by randomly sampling 2711 CNEs (size of the rat-LOP set) from the syntenic set 1000 times. (E) An example of the EPHA4 gene and its proximal CNE having a schizophrenia-associated GWAS SNP is shown. The tracks for PhyloP conservation score, H3K4me1, and RNA-sequencing data of fetal and postnatal brains are aligned accordingly. The orthologous CNE and gene were 7.2-Mb apart on chromosome 9 in the rat. 
CNE, conserved noncoding element; Cons., conservation; GWAS, genome-wide association studies; LOP, loss-of-proximity; RNA-seq, RNA-sequencing; Syn., syntenic; TSS, transcription start site; 4C, circular chromosome conformation capture.
To assess the physical association between CNEs and cognate genes, we generated virtual 4C data by processing available Hi-C data sets of fetal and adult human brains (Figure S4C). As shown in Figure 4B, the fetal-to-adult ratio of the proportion of CNE–gene pairs showing significant physical interactions was significantly higher (P = 0.0069) for the rat-LOP set than for the syntenic set. We illustrated the physical interactions between CNEs and genes through examples (Figure 4C and Figure S4D). TSSs (reference point) of the GPR85 and FEZF2 genes, both associated with neurological phenotypes (Matsumoto et al. 2008; Chen et al. 2011; Eckler et al. 2014), showed significant interaction frequency (above 3 SD from regression fit) to their cognate CNEs in human fetal brain, but not in the adult brain. The H3K4me1 signals at TSSs and CNEs were also significant in fetal brain as compared to the adult. Epigenomic analyses thus suggested that the majority of the rat-LOP CNEs exhibited enhancer-associated hallmarks in the fetal brain.
By mapping the trait/disease-associated SNPs from GWAS and the nearby SNPs (proxy) in the linkage disequilibrium based on 1000 genome data, we observed that 105 of the rat-LOP CNEs had at least one brain-related SNP (Figure 4D). This representation was statistically significant (P = 0.006) when compared with that of the syntenic set (Figure 4D). These observations represented genetic evidence of brain-specific roles of rat-LOP CNEs. We highlighted the example of the EPHA4 gene, which is required for radial neuron migration, and is involved in the pathways leading to lissencephaly and schizophrenia in humans (Sentürk et al. 2011; Steinecke et al. 2014). An upstream CNE to this gene had a schizophrenia-associated SNP. Fetal brain specificity of CNEs and gene expression were illustrated using H3K4me1 and RNA-Seq tracks of fetal and postnatal human brains (Figure 4E).
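The bootstrap comparison against the syntenic set can be sketched as below. This is a minimal illustration under stated assumptions: sampling is done without replacement, the empirical P-value is the fraction of random samples reaching the observed count, and the function name, argument names, and boolean encoding of "has at least one brain-related SNP" are hypothetical.

```python
import random


def bootstrap_pvalue(observed, background_flags, sample_size, n_iter=1000, seed=0):
    """Empirical one-sided P-value for an observed count.

    observed         -- count seen in the test set (e.g., rat-LOP CNEs with a SNP)
    background_flags -- list of booleans, one per background (syntenic) CNE
    sample_size      -- number of background CNEs drawn per iteration
    n_iter           -- number of random samples (1000 in the figure legends)
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(background_flags, sample_size)  # without replacement
        if sum(sample) >= observed:
            hits += 1
    return hits / n_iter
```

With 1000 iterations, a result of zero would correspond to a reported P < 0.001; some analyses instead report (hits + 1) / (n_iter + 1) to avoid a P-value of exactly zero.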
Therefore, our observations through enhancer data sets, epigenomic marks, differential motif enrichment analysis, and brain-associated SNPs concomitantly established that the rat-LOP CNEs were specific to fetal brain development in humans.
Developmental tolerance of LOP to CNEs
While we have shown that the genes and the CNEs that had lost proximity in the rat were associated with fetal brain development, whether or not CNE–gene proximity itself was linked with the brain-specific expression of the cognate gene remained to be addressed. Toward this, we assessed the representation of germline breakpoints associated with the congenital disorders exhibiting brain abnormalities, and the somatic cancer breakpoints between CNEs and gene-TSSs, in syntenic and rat-LOP sets. Since the observed germline breakpoints are the ones that had survived through germline and the embryonic development, their presence and absence between CNEs and the adjacent genes signifies developmental tolerance and intolerance, respectively, of loss of CNE–gene synteny. In contrast, the cancer breakpoints of somatic origin do not undergo such selection, and hence do not indicate the developmental tolerance or lack thereof. Figure 5A shows the number of rat-LOP CNE–gene pairs having at least one DNA breakpoint between the gene-TSS and the CNE. To test the significance, we prepared the random null from syntenic CNE–gene pairs of a similar distance distribution as that of the rat-LOP. We observed significantly greater representation (P < 0.001) of germline breakpoints in the rat-LOP set as compared to the syntenic set, while the representation of somatic breakpoints showed an insignificant difference (P = 0.92, Figure 5A). We interpreted that DNA breaks between rat-LOP CNEs and the proximal genes were developmentally tolerable, and possibly mediated the genomic rearrangements in the genome of the rodent ancestor, which consequently allowed the LOP between CNEs and genes in rats. We further elaborated an example of germline chromosomal rearrangements that had split the CNE and the adjacent gene in congenital disorders with brain abnormality (Figure 5B). The example in Figure 5B shows a chromothripsis event wherein an inversion has split a CNE–gene pair. 
The involved gene, BCL11A, regulates cortical neuron migration, and mutations therein associate with microcephaly and intellectual disability in humans (Wiegreffe et al. 2015). The BCL11A gene also exhibited a 3.6-fold loss-of-expression in the peripheral blood of the patient carrying the genomic rearrangement as compared with the unaffected mother of the patient.

Tolerance and intolerance of CNE–gene split. (A) The number of rat-LOP CNE–gene pairs flanking at least one germline breakpoint associated with the congenital disorders having brain abnormalities (left panel) and the somatic cancer breakpoint (right panel), superimposed onto null distributions obtained from the syntenic set of the same CNE–gene distance distribution as that of the rat-LOP set. P-values were calculated using the bootstrap method with 1000 random samplings. (B) An example illustrating a chromothripsis event breaking CNE–gene proximity in a congenital disorder with brain abnormalities. The right panel shows the difference in expression level of involved gene (BCL11A) in the patient having genomic rearrangement and in the normal mother. chr, chromosome; CNE, conserved noncoding element; LOP, loss-of-proximity; RPKM, reads per kilobase per million; syn., syntenic.
LOP to CNEs coincided with the loss of fetus-specific upregulation of genes in rat brains
An important question was whether or not the evolutionary loss of CNE–gene proximity in the rat was associated with a loss of gene expression. To assess the functional fate of associated genes, we analyzed their time course gene expression trajectories for the developing cerebral cortices of humans, rats, and sheep (as an out-group). Sheep were included in the analysis due to the availability of gene expression data sets for pre- and postnatal tissues. We found that 99.4% of CNE–gene pairs that had lost proximity in the rat were syntenic in sheep as well, again confirming the independent LOP in the rat lineage. We observed a relative loss of fetus-specific gene expression in the rat brain as compared to that of humans and sheep, suggesting that the LOP correlated with a loss of fetus-specific gene expression in the developing rat brain (Figure 6A). These observations were also robust against varying CNE–gene distances in the rat-LOP set (Figure S6). Enrichment of neurogenesis-related genes and downregulation thereof in the fetal brains of rats has implications in understanding the loss of brain traits in the rat lineage. Indeed, an analysis of mammalian trait data suggested that a relatively large number of brain traits were independently modified in rats, but were preserved in other mammals analyzed in this study (P = 0.07, Figure 6B).

Evolutionary dynamics of developmental gene expression associated with the loss of CNE–gene proximity. (A) Red curves in the plots represent the mean expression of genes in the rat-LOP set and black line with 95% C.I.s (gray bars) represents the random null prepared from the syntenic set. Fetal samples are highlighted in gray background. Asterisk (*) indicates significant (P < 0.01) expression of rat-LOP genes in the fetal stages of brain development as compared to postnatal. Left panel represents cerebral cortex and right panel is for heart data sets (control). Numbers of fetal and postnatal time points are indicated in parentheses at the bottom of each plot. P-values are calculated using a one-tailed Student’s t-test for the fetal upregulation of rat-LOP genes when compared to postnatal gene expression in the same species and data set. Note that the expression units are not comparable across plots. (B) Bar plot representing the rat-to-human ratio of the proportion of traits modified independently in humans and rats. Top three traits that were independently modified in humans and rats are marked. A greater number of brain traits were independently modulated in rats (#9) as compared to humans (#3) (P = 0.07, Fisher’s exact test). Brain traits independently modified in rats are listed on the right. CNE, conserved noncoding element; LOP, loss-of-proximity; RPKM, reads per kilobase per million; TPM, transcripts per million.
Taken together, our analysis suggested a strong association between evolutionary dynamics of chromosomal positions of gene regulatory elements and the gain or loss of gene expression, aligning to the notion of “position effect” through loss of enhancer-promoter associations. The tissue and developmental stage-specific impact of structural variations highlighted the possibility of their significant role in altering developmental dynamics toward evolutionary gain or loss of lineage-specific traits.
Discussion
It is not always the change in number and the sequence of protein-coding regions in the genome that leads to phenotype alteration in evolution; the dynamics of gene expression is equally relevant in this context. One of the ways the gene expression is altered is through position effect, i.e., the relative chromosomal position of the gene in the genome can alter its expression through altered proximities to regulatory elements, chromatin states in the neighborhood, and spatial localization to different subnuclear compartments. Position effect was first discovered through the observation that the chromosomal arrangement of duplicated copies of the bar gene in bar-mutant flies had an influence on its expression, and consequently caused a relative decrease in the number of eye facets (Sturtevant 1925, 1928). Similarly, the white gene, when localized near heterochromatin, gives a mottled eye phenotype with red and white patches in the Drosophila eye (Wallrath and Elgin 1995; Martin-Morris et al. 1997). Despite its significance, the role of position effect in the evolution of traits has not been investigated thoroughly. In this study, we analyzed a kind of position effect due to evolutionary loss of proximities of genes to the conserved noncoding regulatory elements in mammals. Through comparative genomic analysis, we showed that the CNE–gene pairs that were syntenic in most mammals, but lost their close linear proximity independently in rats, were associated with alterations in the transcriptional program during fetal brain development. We presented evidence regarding how the genomic rearrangements might have impacted the evolution of lineage-specific phenotypes by modulating the developmental trajectories in early stages.
Enhancers can function at distances longer than several Megabases and spatial synteny has been observed among genomic regions that have been rearranged during evolution (Véron et al. 2011; Bagadia et al. 2016). How then does the loss of linear proximity to CNEs downregulate the expression of genes? Position effect significantly alters the expression noise of the genes (Chen and Zhang 2016). Evidence also suggests that long-range or trans enhancer–promoter interactions occur at the cost of increased expression noise (Sandhu 2012; Singh et al. 2016; Kustatscher et al. 2017). As a result, the overall expression level in the tissue is expected to decline due to increased stochastic fluctuations in gene expression across cells. Therefore, we hypothesized that the loss of linear proximity between a CNE and a gene would have compromised the expression level of the gene by allowing stochastic variations in enhancer–promoter interactions. An alternate explanation can stem from the functional divergence of orthologous CNEs. While CNEs have sufficient depth in sequence conservation across lineages, their developmental expression pattern can vary in a lineage-specific manner (Polychronopoulos et al. 2017). Therefore, the CNEs losing proximity to their cognate genes in rats might have developmentally diverged functions elsewhere in the genome.
Enrichment of brain development-related genes in the rat-LOP set might relate to the developmental plasticity of the brain as compared to other tissues. Genomic alterations at the loci important for the development of the basic body plan and functioning would be embryonic lethal, which largely explains the significant representation of skeletal/heart development- and neonatal lethality-related genes in the syntenic set (Figure 3A and Figure S3A). The brain, despite having neurodevelopmental plasticity, exhibits the least genome-wide expression divergence across mammalian species (Khaitovich et al. 2006; Strand et al. 2007). However, within the space of a small rat-LOP gene set, expression divergence was observed. This suggested that the least expression divergence observed for brain is due to cellular functions that need to be precisely regulated to maintain the delicately shaped brain tissues of all mammals in general, while the ones that exhibit divergence would be implicated in developmental functions specific to the fetal brain. Our data showed that one of the ways that such expression divergence was modulated in evolution was through the alteration of genomic proximity between CNEs and neighboring genes. Fetal brain-specific downregulation of neurogenesis-related genes that had lost proximity to CNEs in rats aligned with the hypothesis that observed genomic alterations might link to brain traits that were lost in the rodent lineage. To test this hypothesis, we obtained the quantitative data of mammalian traits from MorphoBank. Through analysis of the trait data, we showed that among the species in this analysis, rats exhibited the greatest number of independently modified brain traits, including the ones directly associated with neurogenesis, like an absence of cerebral folding of the cortex, absence of claustrum separation from the cortex, and an absence of the lateral geniculate nucleus magnocellular layer etc. (Figure 6).
Among the rodent brain traits, loss of cerebral folding of the cortex, i.e., a lissencephalic or smooth brain phenotype, is the largest visible alteration in rodent brains. Folded or gyrencephalic brain, in general, is considered to be an adaptation in mammals with a greater encephalization quotient, intelligence, and complex behavioral traits (Prothero and Sundsten 1984; Toro et al. 2008). It can, therefore, be contended that CNE–gene proximity and associated fetal brain-specific expression was not lost in rats, but was rather gained in other mammals that had bigger, gyrencephalic brains. However, we argue that significant nonuniformity in the cerebral cortex has been observed across several different mammalian species (Herculano-Houzel et al. 2008), and the assumption that the common ancestor of placental mammals had a smaller and simpler brain has been challenged recently (Rowe et al. 2011; Lewitus et al. 2014). Evidence has supported the gyrencephalic brain of the eutherian ancestor and the subsequent loss of cortical gyration in rodents (Kelava et al. 2013). The enrichment of genes associated with brain morphology phenotypes (Figure S3), extracellular matrix–receptor interactions, and actin cytoskeleton regulation (ACTN, ITGA1, ITGA11, DCLK2, ASAP1, LAMB4, LZTS1, CD36, Frabin, TWISTNB, DDX11, SGCG, and SCIN etc.), as well as ones implicated in human cortical malformations (MYCN, NRXN1, RASA1, FEZF2, EFNA5, and GLI3 etc.) (Piñero et al. 2017), in the rat-LOP set further support the LOP to CNEs in rat rather than a gain-of-proximity in other mammals (Figure S7).
We also emphasized that the LOP in the rat was inferred by filtering the CNE–gene pairs, which were syntenic in all other species, and hence were evolutionarily constrained, except in rats. Assessing gain-of-proximity was difficult because a CNE–gene pair that was distant in all species except one cannot be considered as an evolutionarily constrained CNE–gene pair. We suspected that gain-of-proximity inferred in this flawed manner would not have shown any functional association. Indeed, this was observed through an independent analysis (Figure S7).
While our analysis was limited to the species for which morphological trait data were available, assessment of rat-LOPs in the mouse was warranted to infer if the LOP to CNEs was a property of the rodent ancestor. We included mice in the analysis for this purpose. The overlap of mouse-LOP with that of rat-LOP was small (62 genes and 596 CNEs, Figure S8). While rats and mice are phylogenetically close, their genomes were independently rearranged from the common rodent ancestor (Luo et al. 2012). Therefore, lower overlap was not completely unexpected. Though the genes that had commonly lost proximity to CNEs in rats and mice did not show significant enrichment of any ontology term owing to a smaller sample size, the genes exhibited fetal brain-specific gene expression in humans, but not in rats and mice (Figure S8). Several genes related to cytoskeleton regulation (DCLK2, TWISTNB, DDX11, FEZ1, Frabin, SGCG, and SCIN) commonly lost proximity to CNEs in rats and mice. Among these genes, the DCLK2, FEZ1, Frabin, TWISTNB, and DDX11 genes have known association with cortical malformations (Kosan and Kunz 2002; Chen et al. 2005; Nakanishi and Takai 2008; van der Lelij et al. 2010; Kang et al. 2011; Stouffer et al. 2016; Romero et al. 2018). Yet another explanation for the lower overlap of rat-LOP and mouse-LOP genes could be the limited number of orthologous mappings across six species. To expand the ortholog data, we performed an independent analysis of primates (humans and chimps) and rodents (mice and rats), and fetched the instances where genes were proximal to CNEs in both primates but lost proximity in both rodents. As shown in Figure S8, the rodent-LOPs were enriched with cell migration and actin filament regulation-related functions. The rodent-LOP genes obtained in this analysis also exhibited fetal brain-specific upregulation in humans, but not in mice and rats (Figure S8).
These observations suggested that the association of LOP with cell migration, cytoskeleton regulation, extracellular matrix organization, and fetal brain-specific upregulation of genes was common to the rodent ancestor. The CNEs in the rodent-LOP sets also showed significant enrichment of the H3K4me1 mark when compared to syntenic CNEs in human, but not in mouse, fetal brains (Figure S9). Therefore, the concurrently higher fetal brain-specific activities of LOP-CNEs and genes in a syntenic configuration (in human), and loss thereof in an LOP configuration (mouse), strongly supported our major claims.
We also assessed our observations with the genes that exhibited LOP to CNEs in rats but not in mice, and vice versa. In adherence to our hypothesis, the genes exhibited loss of fetal brain-specific expression only in the case of LOP to CNEs either in rats or in mice, ruling out the possibility that the lack of fetus-specific expression was a general property of all the genes in rodent brains (Figure S10).
It remains debatable whether or not the alterations in the brain traits in the rodent lineage represent adaptive selection, or were products of neutral drift. Some studies have suggested that smaller and lissencephalic brains were adaptively selected among mammals with distinct life history traits, like narrow habitats and smaller social groups, than those of gyrencephalic species (Lewitus et al. 2014). The distinct neurogenic potentials of gyrencephalic and lissencephalic species have been attributed to the observed differences (Lewitus et al. 2014). The increased proliferative potential of basal progenitor cells is necessary and sufficient to explain gyrencephalic brains (Lewitus et al. 2014). The loss of such proliferative potential, which was likely an ancestral trait, might have caused an inefficient neurogenic program in lissencephalic species. Our observation that the genes that had lost their proximity to CNEs in rats were involved in neuronal differentiation and were downregulated in the fetal rat brain is largely coherent with the above proposal.
Altogether, our observations highlight a link between genome order and the evolutionary dynamics of temporal gene expression patterns associated with mammalian brain development. The study also suggests that the genomic rearrangements, without any change in the genomic content, might have impacted the developmental trajectories and shaped the evolution of phenotypes.
Acknowledgments
K.S.S. received financial support from the Department of Science and Technology (EMR/2015/001681) and the Department of Biotechnology (BT/PR16366/BID/7/598/2016), India. M.B. and M.L. were financially supported by the Indian Institute of Science Education and Research Mohali. K.R.C., Y.J., and H.S. thank the University Grant Commission (UGC), Department of Biotechnology (DBT), and Science and Engineering Research Board (SERB) respectively for their fellowships.
Footnotes
Supplemental material available at Figshare: https://doi.org/10.25386/genetics.7750919.
Communicating editor: J. Birchler
Literature Cited
Author notes
These authors contributed equally to this work.