Meenakshi Bagadia, Arashdeep Singh, Kuljeet Singh Sandhu, Three Dimensional Organization of Genome Might Have Guided the Dynamics of Gene Order Evolution in Eukaryotes, Genome Biology and Evolution, Volume 8, Issue 3, March 2016, Pages 946–954, https://doi.org/10.1093/gbe/evw050
Abstract
In eukaryotes, genes are nonrandomly organized into short gene-dense regions or “gene-clusters” interspersed by long gene-poor regions. How these gene-clusters have evolved is not entirely clear. Gene duplication may not account for all the gene-clusters since the genes in most of the clusters do not exhibit significant sequence similarity. In this study, using genome-wide data sets from budding yeast, fruit-fly, and human, we show that: 1) long-range evolutionary repositioning of genes strongly associates with their spatial proximity in the nucleus; 2) the presence of evolutionary DNA break-points at the involved loci hints at their susceptibility to undergo long-range genomic rearrangements; and 3) correlated epigenetic and transcriptional states of the engaged genes highlight the underlying evolutionary constraints. The statistical significance of observations 1, 2, and 3 is greater for the instances of inferred evolutionary gain than for the instances of inferred loss of linear gene-clustering. These observations suggest that long-range genomic rearrangements guided through 3D genome organization might have contributed to the evolution of gene order. We further hypothesize that the evolution of linear gene-clusters in eukaryotic genomes might have been mediated through spatial interactions among distant loci in order to optimize co-ordinated regulation of genes. We simulate this hypothesis using a heuristic model of gene-order evolution.
Introduction
Multiple lines of evidence refute the presumption that eukaryotic genome organization is random, and it is no longer plausible to consider a gene as an autonomous transcriptional unit (Hurst et al. 2004; Kosak and Groudine 2004). The eukaryotic genome is often organized into short gene-rich regions or “gene-clusters” interrupted by long gene-poor regions (Lawrence 1999; Hurst et al. 2004). The linear clustering of genes is shown to be evolutionarily constrained in eukaryotic genomes (Blanco et al. 2008). What might have constrained such ordering of genes? Various studies on numerous organisms suggest that the genes in the gene clusters tend to coexpress (Cohen et al. 2000; Spellman and Rubin 2002), can be involved in the same metabolic pathway (Lee and Sonnhammer 2003) and may interact with each other at the protein level (Teichmann and Veitia 2004). The present understanding is that the deposition of similar chromatin marks and the concurrent opening and closing of chromatin might mediate coexpression of functionally similar genes, and that selection of such chromatin-level coordination might have been favored during evolution. This model is also supported by the existence of large domains of distinct chromatin types, like actively transcribing, weakly transcribing, poised, and repressed chromatin domains (de Wit et al. 2008; Filion et al. 2010). This idea of domain organization of the genome has important implications in developmental reprogramming, disease development, and pathogenicity due to position effect (Kleinjan and van Heyningen 2005; Kleinjan and Lettice 2008; van Heyningen and Bickmore 2013). A requirement for common cis-elements might impose another constraint that could favor the linear clustering of genes (Frasch et al. 1995; Flint et al. 2001; Ohlsson et al. 2001; Lomvardas et al. 2006; Splinter et al. 2006; Spitz and Duboule 2008; Sandhu et al. 2009; Vernimmen et al. 2009; Tena et al. 2011; Berlivet et al. 2013; Maeso et al. 2013).
Some studies have also suggested that the constraints to minimize the expression noise by keeping the essential genes in the regions of open chromatin can guide the linear clustering of genes (Batada and Hurst 2007).
The mechanistic basis that governed the evolution of linear gene clustering is not entirely clear. Gene-family clusters, like HIST, HOX, KRT, OR, etc., wherein neighboring genes share sequence similarity in addition to functional similarity, are argued to have evolved through duplication events (Ferrier and Holland 2001; Demuth et al. 2006). However, the genes in clusters other than gene-family clusters do not generally show sequence similarity, suggesting that tandem duplication alone cannot explain their evolutionary convergence. Understanding the underlying mechanisms of gene order change, therefore, could provide some insights into the evolution of gene clusters. One way the gene order can change is through segmental or whole-genome duplication followed by sequential loss of one copy of the duplicated genes (Fischer et al. 2001). Genomic rearrangements like inversions and translocations can also serve as a potential means to relocate genes, and selection can then favor or disfavor the reconfigured gene order. Indeed, islands of genes conferring adaptation to a changing environment have been proposed to be formed through genomic rearrangements (Yeaman 2013). However, there is no strong evidence supporting this model. Here, we present a few lines of evidence, through analysis of 3D genome architectures, evolutionary break-points, and functional data sets, that support the role of 3D genome organization in mediating the evolutionary dynamics of long-range alterations in gene order.
Materials and Methods
Data Sets
The details of all the data sets used in the study are given in the supplementary material, Supplementary Material online.
Resampling Procedure
The resampling procedure involves three quantities: the proportion of resampled pairs exhibiting trans chromatin interactions; the proportion of trans chromatin interactions among the observed pairs (supplementary fig. S1, Supplementary Material online); and the expected proportion of trans chromatin interactions in the genome (supplementary fig. S1, Supplementary Material online).
A similar approach was used for figures 2b, 3, and 4.
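A resampling comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: each gene-pair is reduced to a boolean flag (does the pair interact in trans?), the null distribution is built by drawing random sets of the same size from a genome-wide background, and the Z-score is taken as (observed proportion minus mean of the null) divided by the standard deviation of the null. The function name, the number of resamples, and the toy data are hypothetical.

```python
import random
import statistics

def resampling_z_score(observed_flags, background_flags, n_resamples=1000, seed=0):
    """Compare the observed proportion of trans-interacting pairs with a
    null distribution obtained by resampling same-sized sets of pairs."""
    rng = random.Random(seed)
    k = len(observed_flags)
    observed = sum(observed_flags) / k
    null = []
    for _ in range(n_resamples):
        sample = rng.sample(background_flags, k)  # draw k pairs without replacement
        null.append(sum(sample) / k)
    expected = statistics.mean(null)   # mean proportion over the resampled sets
    spread = statistics.stdev(null)    # standard deviation of the null
    return observed, expected, (observed - expected) / spread

# Toy data (hypothetical): 10% of background pairs interact in trans,
# whereas 30 of the 50 observed pairs do.
background = [True] * 100 + [False] * 900
observed_pairs = [True] * 30 + [False] * 20
obs, expected, z = resampling_z_score(observed_pairs, background)
```

Expressing the result as a Z-score rather than a raw proportion makes values comparable across scenarios with different numbers of gene-pairs.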
Heuristic Evolutionary Model of Gene Clustering
The population of reorganized genomes was then subjected to evolutionary selection using the roulette wheel method, while keeping the population size constant (n = 100). The best genome configuration was carried over directly to the next generation (elitism). After the selection step, the population was again subjected to random rearrangement, rewiring of interactions, fitness scoring, selection, and elitism, and the whole process was repeated for 3,000 generations. If the coregulated genes tended to cluster linearly, the correlation between the linear distance of genes and their coregulatory score would decrease gradually over time. All the processed data sets are provided in the supplementary material, Supplementary Material online.
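The selection step can be sketched as below. This is a minimal illustration of roulette wheel (fitness-proportionate) selection with elitism, not the code used in the study. The function name, the shifting of scores to keep sampling weights positive, and the toy fitness function are assumptions made for the example.

```python
import random

def select_next_generation(population, fitness, rng):
    """Roulette wheel selection with elitism; population size is preserved."""
    scores = [fitness(genome) for genome in population]
    best = population[scores.index(max(scores))]  # elite individual
    # Assumption: shift scores so that every sampling weight is positive.
    lowest = min(scores)
    weights = [s - lowest + 1e-9 for s in scores]
    # Fill the remaining slots by sampling in proportion to fitness.
    chosen = rng.choices(population, weights=weights, k=len(population) - 1)
    return [best] + chosen

# Toy usage (hypothetical): "genomes" are integers and fitness is the value itself.
rng = random.Random(1)
population = list(range(100))
next_gen = select_next_generation(population, fitness=lambda g: g, rng=rng)
```

Elitism guarantees that the best fitness in the population never decreases between generations, while the probabilistic wheel keeps less fit configurations in play.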
Results
We exploited the linear and the 3D genome organization data of budding yeast (Scer), Drosophila (Dmel), and human (Hsap) for our analyses. To infer the gene order in the hypothetical common ancestor of yeast and Drosophila, we obtained the genome assembly of a choanoflagellate, Monosiga brevicollis (Mbre) (King et al. 2008). Choanoflagellates are strategically well placed between fungi and metazoans in the phylogenetic tree (King et al. 2008) (fig. 1a). Similarly, to infer the common ancestor of human and Drosophila, we used the genome of Ciona intestinalis (Cint), an ascidian that shares many properties of the free-living ancestral chordate (Satoh et al. 2014). We then compiled four sets of gene-pairs as illustrated in figure 1b and supplementary figure S1, Supplementary Material online: 1) genes on different chromosomes (“split” organization) in yeast and Monosiga, but within 1 Mb on the same chromosome (“clustered” organization) in Drosophila; 2) genes split in Drosophila and Monosiga, but clustered in yeast (<100 kb); 3) genes split in Drosophila, but clustered in yeast and Monosiga (same scaffold); and 4) genes split in yeast, but clustered in Drosophila and Monosiga. We had the same four scenarios for the Drosophila–Ciona–Human comparison. For scenarios 1 and 3 in the Yeast–Monosiga–Drosophila comparison, we inferred that if the gene order (clustered or split) of a candidate gene-pair in the Monosiga genome was similar to that of yeast, the common ancestral genome of the eumetazoan and fungal clades would have had a similar gene order, as illustrated in figure 1b (scenarios 1 and 3). Similarly, in the Drosophila–Ciona–Human comparison, we inferred that if the gene order of a gene-pair in Ciona was similar to that of Drosophila, the common ancestor of Deuterostomia and Protostomia would also, most likely, have had a similar gene order (fig. 1b).
For scenarios 2 and 4, inference of gene order in the hypothetical common ancestral genome needed some additional support because Monosiga is closer to metazoans than to fungi, as shown in the phylogenetic tree in figure 1a. For example, if a candidate gene-pair was split in Drosophila and Monosiga, but clustered in yeast, we could not directly infer whether the gene-pair was split or clustered in the hypothetical common ancestor of metazoans and fungi. However, if the gene-pair was also split in Arabidopsis thaliana (Atha) in addition to Drosophila and Monosiga, it could be inferred that the gene-pair was most likely split in the common ancestor of metazoans and fungi. Similar inferences were drawn for scenarios 2 and 4 in the Drosophila–Ciona–Human comparison by comparing the gene order of human and Ciona with that of the Monosiga genome. Implementing the above strategy, we showed that 85% of the split gene-pairs of Monosiga and Drosophila in the Yeast–Monosiga–Drosophila comparison, which could be mapped onto Arabidopsis, were split in Arabidopsis too, suggesting that the clustered organization in yeast was independently acquired in the fungal clade, as shown in figure 1b. Similarly, 95% of the split gene-pairs of Ciona and human in the Drosophila–Ciona–Human comparison, which could be mapped onto Monosiga, were split in Monosiga too, suggesting that the clustered organization in Drosophila was independently acquired in the protostomal clade. There was insufficient mapping of gene-pairs in scenario 4 to the Arabidopsis and Monosiga genomes for the Yeast–Monosiga–Drosophila and Drosophila–Ciona–Human comparisons, respectively, and we could not test precisely, for example, whether the clustered gene-pairs (196 only) in the Dmel and Mbre genomes were also clustered in the Atha genome. Nevertheless, by using the above criteria we were mostly able to demarcate the instances where clustering was independently acquired or lost in the analyzed clades.
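The scenario assignment and the ancestral-state inference described above can be summarized in a short sketch. This is an illustrative reading of the text, not the authors' pipeline: the function names, the coordinate representation, and the example loci are hypothetical, and pairs on the same chromosome but beyond the distance threshold are simply left unclassified here.

```python
def pair_state(chrom1, pos1, chrom2, pos2, max_dist):
    """'split' if the genes lie on different chromosomes, 'clustered' if they
    lie within max_dist bp on the same chromosome, otherwise None."""
    if chrom1 != chrom2:
        return "split"
    if abs(pos1 - pos2) < max_dist:
        return "clustered"
    return None

def classify_scenario(first, intermediate, second):
    """Map the states in (first species, intermediate species, second species),
    e.g. (yeast, Monosiga, Drosophila), onto scenarios 1-4 of the text."""
    table = {
        ("split", "split", "clustered"): 1,      # clustering gained in second species
        ("clustered", "split", "split"): 2,      # clustering gained in first species
        ("clustered", "clustered", "split"): 3,  # clustering lost in second species
        ("split", "clustered", "clustered"): 4,  # clustering lost in first species
    }
    return table.get((first, intermediate, second))  # None for any other pattern

def infer_ancestral_state(first, intermediate, second, outgroup=None):
    """Scenarios 1 and 3 use the state shared by the first and intermediate
    species; scenarios 2 and 4 additionally need a supporting outgroup."""
    scenario = classify_scenario(first, intermediate, second)
    if scenario in (1, 3):
        return first
    if scenario in (2, 4) and outgroup == intermediate:
        return intermediate
    return None

# Hypothetical gene-pair: split in yeast and Monosiga, clustered in Drosophila.
yeast = pair_state("chrIV", 120_000, "chrXII", 400_000, 100_000)
monosiga = pair_state("scaffold_3", 10_000, "scaffold_9", 55_000, 100_000)
fly = pair_state("2L", 5_000_000, "2L", 5_600_000, 1_000_000)
scenario = classify_scenario(yeast, monosiga, fly)
```

In this sketch the outgroup argument stands for Arabidopsis in the Yeast–Monosiga–Drosophila comparison and for Monosiga in the Drosophila–Ciona–Human comparison.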

Fig. 1. (a) Relative positioning of yeast, Monosiga, Drosophila, Ciona, and Human in the phylogenetic tree. Dark dots represent the common ancestors of metazoans/fungi and mammals/arthropods. (b) Schematic representation of different scenarios of gene clustering and splitting instances. Smooth and dashed lines represent clustered and split organization of gene-pairs, respectively. Black line represents the inferred gain or loss of gene clustering independently in one of the clades. Scenarios 1 and 2 represent inferred gain of clustering and scenarios 3 and 4 represent inferred loss of gene clustering along one of the clades.
We assessed the spatial connectivity of genes in the species in which they were split, for all four scenarios shown in figure 1b. We had two key observations: 1) a significant number of the gene-pairs in each comparison exhibited spatial connectivity in their split form (fig. 2a) and 2) the statistical significance (measured as Z-score) of spatial connectivity of split gene-pairs was significantly greater (average of 9σ and 16σ increase in Yeast–Monosiga–Drosophila and Drosophila–Ciona–Human comparisons, respectively) in case of clustering as compared with splitting of genes, suggesting that significantly fewer gene-pairs remained spatially proximal after a split, as compared with the gene-pairs that became clustered (fig. 2a). These observations led us to hypothesize that the spatially proximal genes originating from different chromosomes in the ancestral genome might have undergone repositioning through long-range genomic rearrangements like translocations. Since the prerequisite to translocation is a DNA breakage event (Roukos and Misteli 2014), we tested whether the interacting loci were enriched with DNA break-points. A comparison with the null distribution obtained from all the trans interactions suggested a nonrandomly greater number of interacting loci with DNA break-points (on both loci) in most comparisons, highlighting the susceptibility of the engaged loci to long-range genomic rearrangements (fig. 2b). Again, it was notable that the cases of clustering exhibited greater statistical significance as compared with the cases of splitting (average of 6.3σ and 19σ increase in Yeast–Monosiga–Drosophila and Drosophila–Ciona–Human comparisons, respectively; fig. 2b). These results were suggestive of an evolutionary mechanism that enabled long-range reordering of genes, which might have, in part, guided the evolution of gene-clustering.

Fig. 2. (a) Spatial connectivity of loci that independently acquired or lost gene clustering along one of the clades. The species in which the spatial connectivity was assessed is marked in bold letters in each scenario. Observed values of spatial connectivity are represented by vertical bars overlaid upon the null distributions generated using the strategy given in the Materials and Methods section. (b) Number of interacting pairs of loci with DNA break-points overlaid upon the null distribution for the same. Z-score is plotted for the spatial connectivity and enrichment of DNA break-points in order to compare relative values across different scenarios. Change shown in each comparison is the average of Z-score change (1)–(4) and (2)–(3).
What would have constrained the spatial colocalization of the analyzed gene-pairs? To address this, we used genome-wide multidimensional data sets associated with the chromatin states, transcription, and function of genes (Materials and Methods). For each data set, we obtained the null distribution by randomly picking pairs of loci having interactions in trans and calculating Pearson’s correlation coefficient for those sets of gene-pairs (Materials and Methods). As shown in figure 3, epigenetic and transcriptional profiles of spatially proximal genes, in general, had significantly greater Pearson’s correlations when compared with the null distributions. However, they did not exhibit functional association as inferred from semantic similarity of gene ontology terms, protein–protein interactions, etc., suggesting that access to common chromatin/transcription factor foci for a coordinated transcriptional response, and not necessarily functional similarity of the engaged genes, might be the common requirement of spatially proximal genes. Further, the splitting instances in figure 3 showed relatively lower significance (often insignificant) of epigenetic and transcriptional attributes as compared with clustering events. We also confirmed that the genes remained transcriptionally correlated in their linearly clustered organization in gene-clustering instances, but not in gene-splitting instances, suggesting that the coregulation of the engaged genes might have served as a constraint favoring the evolutionary selection of linearly clustered organization of rearranged loci, which were spatially proximal in the ancestral genome (fig. 4). These results hinted at a selective evolutionary constraint favoring linear clustering of distant genes and not necessarily splitting of clustered genes.

Fig. 3. Epigenetic, transcriptional, and functional similarities of trans-interacting genes, which were on different chromosomes (split) in the highlighted (in bold letters) species in each comparison. The observed values of similarities are plotted as vertical bars overlaid upon the null distributions (colored dots superimposed over violin plots). Smooth and dashed lines in the cartoon of the phylogenetic tree represent clustered and split organization of gene-pairs, respectively. Black colored smooth and dashed lines in the tree cartoon represent inferred gain and loss of gene-clustering, respectively. Three boxes in each vertical panel represent epigenetic, transcriptional, and functional attributes, respectively. Change shown in each comparison is the average of Z-score change (1)–(4) and (2)–(3).

Fig. 4. Average coexpression (correlation of expression profiles) of clustered gene-pairs superimposed over corresponding null distributions for different scenarios. The species in which we assessed the coexpression is highlighted in bold letters. Changes shown on the right side are the changes in Z-scores. The gene expression data for Scer and Dmel were the “mega gene expression data set” and “time course embryonic development,” respectively. The interacting gene-pairs for the null distributions are taken from the same chromosome (within 1 Mb for Dmel and within 100 kb for Scer).
Our observations suggested that the recurrent events of long-range chromosomal rearrangements at spatially proximal and epigenetically correlated genomic sites might have served as one of the mechanisms that guided the evolution of gene order. We further explored whether the aforementioned mechanism of gene order change could explain the formation of pronounced gene-clusters in eukaryotes from ancestors that had relatively less pronounced clustering of genes. We simulated the evolutionary process computationally, details of which are given in the Materials and Methods section and in supplementary figure S2, Supplementary Material online. Briefly, a population of 100 hypothetical genomes, each having two different chromosomes with 50 equally spaced genes, was subjected to the process of interchromosomal rearrangement (translocation), induced by spatial proximity of the engaged loci, in a probabilistic manner. In figure 5, we applied a translocation frequency of 0.1, which was roughly equivalent to the maximal rate of gene order loss in the yeast lineage (Fischer et al. 2006). Varying this frequency from 0.01 to 0.2 did not impact the overall observations except that the convergence took more iterations for lower translocation frequencies (supplementary fig. S3, Supplementary Material online). The population of reconfigured genomes then underwent a probabilistic selection based on coregulation of spatially proximal genes. This process was iterated for 3,000 generations. As shown in figure 5, we observed a gradual decline in the correlation between genomic distance and the coregulation of neighboring genes, suggesting that if the maximization of coregulation was the evolutionarily favored strategy of the genome, then the genes tended to cluster linearly through long-range genomic rearrangements. Based on these observations, we hypothesized that our proposed mechanism of gene order change might account for the pronounced linear gene clustering in eukaryotes.
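The rearrangement step of such a simulation can be sketched as follows. The actual rules are given in the Materials and Methods section and supplementary figure S2 and are not reproduced here; this sketch assumes a reciprocal translocation that breaks both chromosomes next to a spatially contacting gene-pair and exchanges the downstream segments. The function names, the contact list, and the choice of breakpoints are hypothetical.

```python
import random

def reciprocal_translocation(chrom_a, chrom_b, i, j):
    """Exchange the segments downstream of breakpoints i and j."""
    return chrom_a[:i] + chrom_b[j:], chrom_b[:j] + chrom_a[i:]

def maybe_translocate(genome, contacts, rate, rng):
    """With probability `rate`, rearrange at a randomly chosen contacting
    gene-pair so that the two genes become linear neighbors."""
    chrom_a, chrom_b = genome
    if not contacts or rng.random() >= rate:
        return genome
    g1, g2 = rng.choice(contacts)
    if g1 in chrom_b and g2 in chrom_a:
        g1, g2 = g2, g1
    if g1 not in chrom_a or g2 not in chrom_b:
        return genome  # both genes already share a chromosome
    i = chrom_a.index(g1) + 1  # break just after g1
    j = chrom_b.index(g2)      # break just before g2
    return reciprocal_translocation(chrom_a, chrom_b, i, j)

# Toy genome: two chromosomes with 50 genes each, as in the text; the single
# contact between gene 10 and gene 80 is a made-up example.
genome = (list(range(50)), list(range(50, 100)))
new_a, new_b = maybe_translocate(genome, [(10, 80)], rate=1.0, rng=random.Random(0))
unchanged = maybe_translocate(genome, [(10, 80)], rate=0.0, rng=random.Random(0))
```

With a rate of 0.1, as applied in figure 5, roughly one call in ten would rearrange the genome; the example uses 1.0 only to make the outcome deterministic.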
Our results hinted that the convergence from spatial proximity to linear proximity might serve as one of the strategies to maximize the transcriptional coordination among genes, whereas the divergence of linearly clustered genes to distinct chromosomes might only occur for the gene-pairs which were not significantly constrained by their transcriptional coordination as illustrated in our results.

Fig. 5. Results obtained from the heuristic model of gene order evolution. (a) Average coregulatory fitness of interacting genes in a population for each generation. (b) Correlation between genomic distance and the coregulation between genes at each generation. (c) Chromosomal heatmaps depicting intergenic distances in a representative chromosome of a population at each generation.
The role of spatial proximity in mutagenic processes in cancer genomes has been proposed earlier (Lin et al. 2009; Mathas et al. 2009; Duan et al. 2010; Veron et al. 2011; Engreitz et al. 2012). Our observations suggested that the same mechanism might have been exploited during evolution to alter the gene order and select the one that was beneficial to optimize certain genomic functions. Given that the statistical significance levels for gene-clustering instances were consistently greater as compared with gene-splitting instances throughout our analyses, it can be inferred that the pronounced gene-clustering in eukaryotic genomes might have evolved, to some extent, through long-range mechanisms of gene order change, a hypothesis that we simulated using a heuristic model. It is noteworthy that the percentage of instances where we observed spatial proximity among split genes that acquired clustering along one of the clades varied from 7% to 69%, clearly suggesting that translocation events alone cannot explain all the instances of gene clustering in evolution, nor do we claim so. First of all, the mechanism of repositioning of genes might not necessarily be a translocation event. Long-range inversions can also give similar results for genes originating from distant positions on the same chromosome. An important point to be considered here is that inversion would also require physical proximity of distantly located genomic elements. Second, segmental or whole-genome duplications followed by sequential loss of one of the gene-copies serve as a potent mediator of gene order change in the eukaryotic genome (Fischer et al. 2001). It is notoriously difficult to map these events for distant species, and this possibility appears untestable in our hands at present. Evolution of gene clusters through tandem duplication alone is out of context here because we tested the gene-pairs that were present, either distantly or proximally, in all three species analyzed in each comparison. Moreover, these gene-pairs did not share sequence homology and each gene in the pair belongs to a distinct gene family, as observed through the EPGD database.
We further extrapolate that the evolutionary dynamics of linear gene-clustering might have been consequently implicated in the radial organization of gene clusters in the nuclear space based on relative gene-densities. The linear gene clusters would result in local attractors or “black-holes” sequestering most of the protein factors important for essential genomic functions like transcription and replication. As a consequence, the distal gene clusters need to be proximal in the nuclear space in order to access those factors and allow the efficient transcription/replication of genes. Therefore, if distinct gene clusters are considered analogous to “planets” and their affinity to bind to shared transcription factors is considered as “gravitational” attraction, the gene-clusters might naturally converge to “galaxy-like” structures, where gene-clusters with high gene density would converge toward the interior of the nucleus, whereas the ones with low gene density would locate toward the periphery. Though speculative at present, such a hypothesis can be tested using dynamical simulations in the future.
Conclusion
In summary, the study reports strong evidence supporting a rather underappreciated mechanism that could have guided the evolution of gene order in eukaryotes. The three-dimensional organization of the genome predisposes certain interacting loci to long-range genomic rearrangements, and the rearranged, linearly proximal loci that had correlated chromatin and transcriptional states would have been selected through evolution.
Acknowledgments
The authors acknowledge the financial support from the Ministry of Human Resource and Development (MHRD), India. M.B. thanks Mr Keerthivasan Raanin Chandradoss for technical help.
Literature Cited
Author notes
Associate editor: Partha Majumder