Miguel Garavís, Carlos González, Alfredo Villasante, On the Origin of the Eukaryotic Chromosome: The Role of Noncanonical DNA Structures in Telomere Evolution, Genome Biology and Evolution, Volume 5, Issue 6, June 2013, Pages 1142–1150, https://doi.org/10.1093/gbe/evt079
Abstract
The transition of an ancestral circular genome to multiple linear chromosomes was crucial for eukaryogenesis because it allowed rapid adaptive evolution through aneuploidy. Here, we propose that the ends of nascent linear chromosomes should have had a dual function in chromosome end protection (capping) and chromosome segregation to give rise to the “proto-telomeres.” Later on, proper centromeres evolved at subtelomeric regions. We also propose that both noncanonical structures based on guanine–guanine interactions and the end-protection proteins recruited by the emergent telomeric heterochromatin have been required for telomere maintenance through evolution. We further suggest that the origin of Drosophila telomeres may be reminiscent of how the first telomeres arose.
The Necessity of Both End Protection and Segregation Functions at the End of Nascent Linear Chromosomes
It has been hypothesized that after the endosymbiosis of an α-proteobacterium into an archaebacterial host, the massive invasion of the symbiont’s mobile group II introns into the circular genome of the host gave rise to spliceosomal introns (Koonin 2006) and that this was the driving force for the origin of the nucleus (fig. 1). The invention of the nuclear membrane was necessary to physically separate slow splicing from fast translation (Martin and Koonin 2006) (fig. 1). It has also been proposed that this invasion eventually led to the origin of linear eukaryotic chromosomes (Villasante, Abad, et al. 2007; Villasante, Méndez-Lago, et al. 2007). The host’s tolerance to the mobile element invasion and to other eukaryotic innovations could have been facilitated by a low effective population size and the consequent weak purifying selection (Lynch and Conery 2003; Lynch 2007; Koonin 2011).

Fig. 1. Schematic representation of a possible evolutionary scenario for the origin of eukaryotic chromosomes. The scheme shows the origin of spliceosomal introns and the origin of the nucleus as hypothesized (Koonin 2006; Martin and Koonin 2006) and also the proposed origin of the ancestral “proto-telomere” with capping and segregation properties. The eubacterial genome appears in blue and the archaebacterial genome in orange.
Mobile group II introns are retroelements of eubacterial origin that contain a catalytic RNA and a multifunctional protein with reverse transcriptase (RT) activity (Lambowitz and Zimmerly 2004). These retroelements are thought to be the ancestors of both spliceosomal introns and non-long terminal repeat (non-LTR) retrotransposons (Sharp 1985). The evolutionary relationship between group II introns and non-LTR retrotransposons is based on the similarity of their RT sequences (Xiong and Eickbush 1990; Blocker et al. 2005) and retrotransposition mechanisms (Luan et al. 1993; Zimmerly et al. 1995). After the initial proliferation of group II introns within the protoeukaryotic nuclear genome, their RNA domains degenerated and evolved into spliceosomal snRNAs that functioned in trans in a common splicing apparatus (Sharp 1991; Mohr et al. 2010). Although most group II introns evolved into eukaryotic introns, some lost their splicing capability and gave rise to non-LTR retrotransposons.
It is likely that the continuous breakage of the presumed circular chromosome activated all the mechanisms of DNA repair, including the one mediated by non-LTR retrotransposons (Moore and Haber 1996; Morrish et al. 2002). In this evolutionary scenario, it has been hypothesized that the repetitive capture of non-LTR retrotransposons, with a G/C strand bias, at the ends of DNA double-strand breaks (DSBs) could have eventually resulted in end protection (capping), instead of repair, giving rise to the “proto-telomeres” of the first linear chromosomes (fig. 1) (Villasante, Abad, et al. 2007). The biased distribution of guanine and cytosine between the two strands could have been selected because G-rich sequences have the intrinsic capacity to fold into noncanonical secondary structures that were utilized for capping or sequestering chromosome ends (Villasante, Abad, et al. 2007; Villasante, Méndez-Lago, et al. 2007). In addition, iterative transposition generated the first terminal repeats, which in turn allowed the elongation of chromosome ends by the existing mechanisms of homologous recombination (de Lange 2004). As will be described later, a similar situation occurs in Drosophila, where telomeres are maintained by retrotransposition of G/C strand-biased non-LTR elements and by terminal recombination (Mason et al. 2008; Villasante et al. 2008).
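The G/C strand bias invoked above can be made concrete with a simple skew statistic. The following is a minimal illustrative sketch, not a method taken from the cited studies: it assumes the commonly used definition skew = (G − C)/(G + C) computed on one strand read 5′ to 3′, and it uses the TTAGGG telomeric repeat discussed later in this review as a toy input. The function names `gc_skew` and `reverse_complement` are hypothetical helpers introduced only for this example.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a single strand read 5'->3'.

    A value near +1 means the strand is G-rich relative to C, a value
    near -1 means it is C-rich. Returns 0.0 if the strand has no G or C.
    """
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


# Toy example: four copies of the TTAGGG repeat and its complementary strand.
g_strand = "TTAGGG" * 4
c_strand = reverse_complement(g_strand)
print("G-strand skew:", gc_skew(g_strand))
print("C-strand skew:", gc_skew(c_strand))
```

By construction the two strands of a duplex have skews of equal magnitude and opposite sign, so a strong bias on one strand implies that only that strand carries the guanine runs needed for the structures discussed in this review.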
The universal 5′–3′ nuclease-mediated cleavage of DNA ends created 3′-single-stranded overhangs that were coated by an abundant single-stranded DNA (ssDNA)-binding protein (replication protein A or RPA) via its characteristic oligosaccharide/oligonucleotide-binding fold (OB fold) domains. This nonsequence-specific ssDNA-binding protein is required for multiple processes such as DNA replication, DNA repair, and DNA damage signaling. Therefore, to protect the DNA ends and to avoid their repair, a specialized sequence-specific ssDNA-binding protein had to evolve to coat the 3′-G-rich overhangs (Gelinas et al. 2009) and to promote their folding into non-B DNA conformations, which could be, but were not necessarily, quadruplex-like structures. Recent studies have shown that molecular crowding (Heddi and Phan 2011; Xu et al. 2011) or high viscosity conditions (Lannan et al. 2012) stabilize G-quadruplexes, suggesting that the cellular environment may facilitate the formation of quadruplex-like structures.
On the other hand, a cell-cycle-regulated switch of those ssDNA-binding proteins was needed to re-establish telomere capping after DNA replication, and it is now known that telomeric repeat-containing RNA (TERRA) contributes to inducing that switch (Flynn et al. 2011). TERRA is also required to organize a special chromatin structure: the telomeric heterochromatin (Azzalin et al. 2007; Schoeftner and Blasco 2008; Deng et al. 2009; Shpiz et al. 2011).
Here, it is fundamental to notice that “proto-telomeres” with dual function in capping and segregation were required to ensure accurate inheritance of the first linear eukaryotic chromosomes (fig. 1). Thus, the formation of the first heterochromatin at nascent chromosome ends should have facilitated both the recruitment of end-protection proteins and the attachment of spindle microtubules, most likely by means of ribonucleoprotein complexes (Villasante, Abad, et al. 2007; Villasante, Méndez-Lago, et al. 2007). Later on in eukaryogenesis, a mature segregation function evolved at subtelomeric regions. The mechanism of unequal exchange and gene conversion led inevitably to the divergence of the internal subtelomeric repeats, and the strand asymmetry of the repeats provided the potential to form the sequence-independent secondary structures that gave rise to the centromeres (Villasante, Abad, et al. 2007; Villasante, Méndez-Lago, et al. 2007). In this scenario, the recurrent appearance of unstable dicentric chromosomes, through the formation of new centromeres from telomeres, provided an additional mechanism of genome fragmentation. The birth of multiple eukaryotic linear chromosomes was the key innovation that allowed adaptive evolution by means of transient aneuploidy (chromosomal duplications) (Chen et al. 2012; Yona et al. 2012).
If primitive centromeres evolved from “proto-telomeres,” it would be reasonable to expect that telomeric regions may also have some centromere-like properties. Indeed, there are already results that seem to support this assertion.
1) In Schizosaccharomyces pombe, deletion of an endogenous centromere leads to neocentromere formation at subtelomeric regions (Ishii et al. 2008), and it has been shown that their centromeric and subtelomeric DNA sequences must possess particular features that promote the incorporation of the centromere-specific histone 3 variant (CENP-A) (Choi et al. 2012).
2) In Drosophila melanogaster, the centromere of the Y chromosome contains a large array of HeT-A- and TART-derived telomeric retrotransposons (Agudo et al. 1999), and the sequence of this satellite DNA has revealed that this centromeric region evolved from a telomere (Méndez-Lago et al. 2009). Furthermore, overexpression of the Drosophila CENP-A induces preferential formation of neocentromeres near telomeres (Heun et al. 2006; Olszak et al. 2011).
3) In some plants and animals, neocentromere activity appears at subtelomeric heterochromatin during meiosis (reviewed in Puertas and Villasante 2013).
4) The evolutionary history of chromosome 3 in primates shows at least three examples of telomere–centromere functional interchange (Ventura et al. 2004). Similarly, other telomere-to-centromere conversions have been described after the comparative analysis of eight mammalian genomes (Murphy et al. 2005). Because the subtelomeric repeats could have a role in these conversions, this chromosomal behavior could be due to the ancestral centromeric competence of a telomeric region.
Similarly, if primitive centromeres began at DSBs, one could wonder whether the dynamic chromatin formed around breakage sites could have centromere-like features. Here, too, there are results in favor of this consideration.
1) It has been shown that the centromeric proteins CENP-A, CENP-N, CENP-T, and CENP-U are rapidly recruited to DSBs, and it has been hypothesized that, under certain circumstances, this recruitment could generate a neocentromere (Zeitlin et al. 2009).
2) Strikingly, it had been previously noticed that several human neocentromeres were located near breakpoints, and it had been hypothesized that these breaks could induce the emergence of neocentromeres (Ventura et al. 2003; Marshall et al. 2008).
The previous hypothesis for the origin of the eukaryotic chromosome proposed that centromeres arose before telomeres and that they probably evolved from the origin of replication region of the bacterial chromosome (Cavalier-Smith 1981). More recently, Cavalier-Smith (2010) again suggested that centromeres arose first and proposed that they originated from the partitioning locus, a region proximal to the bacterial origin of replication implicated in bacterial chromosome partitioning/segregation. However, he did not explain how the fragmented prokaryotic genome could give rise to a centromere on each nascent linear chromosome or what hypothetical process led to the formation of regional centromeres containing repetitive DNA. In support of an ancestral regional centromere, a recent study in Saccharomyces cerevisiae has found centromere-like regions (without a specific DNA sequence) in close proximity to the native point centromere (Lefrançois et al. 2013). Because these small regions promote proper segregation, possibly through sequence-independent centromeric structures, they seem to be evolutionary remnants derived from a regional centromere rather than from a point centromere (Lefrançois et al. 2013).
To recapitulate, in this section, we have proposed that the origin of linear chromosomes (genomes in pieces) was a eukaryotic innovation generated by the mobilization of group II intron-derived retroelements as a response to endosymbiosis stress (McClintock 1984; Koonin 2011). Specifically, we have hypothesized that the repetitive capture of G/C strand-biased non-LTR retrotransposons at the ends of DSBs gave rise to “proto-telomeres,” a primitive terminal heterochromatic structure with a dual function: end protection (telomeric function) and segregation (centromeric function) (fig. 1).
Noncanonical DNA Structures Based on Guanine–Guanine Interactions Seem to Have Played a Role in Telomere Origin and Evolution
In most eukaryotic chromosomes, telomere DNA sequences are arrays of short guanine-rich repetitive sequences that terminate in a 3′-single-strand G-rich overhang (150–200 nucleotides). The G-rich strand is synthesized by a telomere-specific RT, called telomerase, using a small region of its RNA subunit as template and the 3′-OH on the end of the chromosome as a primer (Blackburn 1992). Found in animals, fungi, and Amoebozoa, TTAGGG was the telomeric simple repeat sequence present in the ancestral Unikont. Moreover, its occurrence in some species of the supergroups Plantae, Chromalveolata, Excavata, and Rhizaria suggests that TTAGGG could be the ancestral telomeric repeat sequence for eukaryotes (Fulnecková et al. 2013).
On the other hand, the use of prokaryotic retroelements to root a RT phylogenetic tree shows that the telomerase seems to have evolved from the RT of an ancestral non-LTR retrotransposon (Eickbush 1997). Furthermore, it is believed that the ability of non-LTR RTs to use the 3′-OH of chromosome ends to prime reverse transcription was crucial for the birth of early telomerases (Moore and Haber 1996; Morrish et al. 2002, 2007; Curcio and Belfort 2007).
Telomerase-based telomeres brought two principal advantages: facilitated telomere homeostasis and greater structural protection through the incorporation of simple G-rich repeats with the inherent ability to form G-quadruplex structures (Henderson et al. 1987; Arthanari and Bolton 2003; Teixeira and Gilson 2005). G-quadruplexes consist of stacked G-quartets, which are planar arrangements of four guanines held together by Hoogsteen hydrogen bonds (Neidle 2009) (fig. 2A). G-quadruplex formation may occur within the terminal G-rich 3′-overhang (fig. 2B) or when the overhang invades the adjacent double-stranded region of the telomere to form T-loop structures (fig. 2C) (Maizels 2006; Rhodes 2006; Xu et al. 2008; Bochman et al. 2012). Nevertheless, it has been hypothesized that after the appearance of telomerase, the maintenance of telomeres by the primitive T-loop-replication mechanism became less relevant (de Lange 2004). The first visualization of telomeric G-quadruplex formation in vivo was performed in the ciliate Stylonychia (Schaffitzel et al. 2001; Paeschke et al. 2005). Most recently, a highly specific DNA G-quadruplex antibody has been employed to visualize G-quadruplex structures at human telomeres (Biffi et al. 2013).

Fig. 2. Schematic diagrams of a G-quartet and two telomeric G-quadruplexes. (A) Four guanines assemble in a planar arrangement to form a G-quartet. Hydrogen bonds are in dashed lines. (B) Diagram of an intramolecular G-quadruplex at a telomere end. (C) Diagram of a G-quadruplex at a T-loop. The G-quadruplexes in the figure are composed of three stacked G-quartets (shaded squares).
It is important to point out that the putative ancestral telomerase-synthesized sequence, (TTAGGG)n, is not only capable of folding into a G-quadruplex structure but is the best one at doing so in vitro (Tran et al. 2011). In addition, recent biophysical studies on the folding of these telomeric G-quadruplexes have shown that structure formation occurs in milliseconds. These folding kinetics are biologically relevant because they are comparable to those of transcription and DNA replication (Zhang and Balasubramanian 2012).
During evolution, mutations in the telomerase RNA template have given rise to repeat variants with different lengths of guanine motifs (G2, G4, and more). Recent experiments have found that the G-quadruplexes formed by telomeric repeats with only two consecutive guanines (TTAGG in arthropods and TTAGGC in nematodes) are in equilibrium with G-hairpins and other noncanonical structures (Tran et al. 2011).
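The difference between repeats carrying three-guanine runs and those carrying only two can be illustrated at the sequence level. The sketch below is not the analysis performed in the cited experiments: it assumes a widely used sequence heuristic for putative intramolecular quadruplexes (four guanine runs of a minimum length separated by loops of 1 to 7 nucleotides), and the function names and default parameters are hypothetical choices made for this example. A match indicates only that a sequence has the potential to fold; it says nothing about stability or about the equilibrium with G-hairpins described above.

```python
import re


def putative_g4_pattern(min_run: int = 3, max_loop: int = 7) -> re.Pattern:
    """Build a regex for four G-runs (>= min_run) joined by 1..max_loop-nt loops."""
    run = f"G{{{min_run},}}"              # e.g. G{3,}
    loop = f"[ACGT]{{1,{max_loop}}}"      # e.g. [ACGT]{1,7}
    return re.compile(f"{run}(?:{loop}{run}){{3}}")


def has_putative_g4(seq: str, min_run: int = 3, max_loop: int = 7) -> bool:
    """Return True if the sequence contains at least one motif match."""
    return putative_g4_pattern(min_run, max_loop).search(seq.upper()) is not None


# Four copies of each telomeric repeat mentioned in the text.
repeats = {"TTAGGG": "TTAGGG" * 4, "TTAGG": "TTAGG" * 4, "TTAGGC": "TTAGGC" * 4}

for name, seq in repeats.items():
    three_quartet = has_putative_g4(seq, min_run=3)  # three stacked G-quartets
    two_quartet = has_putative_g4(seq, min_run=2)    # two stacked G-quartets
    print(f"{name}: 3-quartet motif={three_quartet}, 2-quartet motif={two_quartet}")
```

Under this heuristic, only the TTAGGG array satisfies the three-guanine criterion, whereas the TTAGG and TTAGGC arrays match only when the minimum run length is relaxed to two, which is consistent with their forming less stable two-quartet structures.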
In the silk moth Bombyx mori (Lepidoptera) and the flour beetle Tribolium castaneum (Coleoptera), the telomerase activity is weak, and telomere-specific non-LTR retroelements (TRAS and SART family elements in B. mori and SART family elements in T. castaneum) are inserted into the telomeric repeats in a specific manner (Fujiwara et al. 2005; Osanai et al. 2006) that preserves the G/C strand bias (fig. 3). The massive integration of these elements into the proximal regions of the TTAGG repeat arrays of B. mori and the TCAGG arrays of T. castaneum (an alternative telomere variant in insects) gives rise to huge telomeres with sizes larger than 200 kb.

Fig. 3. Distribution of telomeric sequences within Bilateria. Most eukaryotes have G-rich telomerase-synthesized repeats with adjacent complex subtelomeric repeats called telomere-associated sequences (TAS). In most arthropods, telomere-specific retrotransposons are inserted into telomerase-synthesized repeats. As can be seen in the diagram, TRAS elements insert in reverse orientation to that of the SART elements. In an ancestor of dipteran insects, the telomerase gene was lost (green line). In Chironomus tentans (lower Diptera), the telomeric sequences consist of complex tandem repeats maintained by recombination. However, Drosophila species have multiple telomere-specific retrotransposons (autonomous and nonautonomous) that transpose to chromosomal ends. The deletion event in the ancestral TAHRE element is shown with dashed lines.
The telomeres of the honey bee Apis mellifera (Hymenoptera) are exceptional among the arthropods because they do not have non-LTR elements inserted into their telomeric repeats. Instead, the telomere sequence consists of TTAGG repeat arrays (Robertson and Gordon 2006) interspersed with TCAGGCTGGG, TCAGGCTGGGTTGGG, and TCAGGCTGGGTGAGGATGGG higher order repeat arrays (Garavís M, Villasante A, unpublished results) (fig. 3). These higher order repeats arose by amplification of the mutated repeats present in proximal telomeric regions and the interspersed pattern developed by further amplifications of the 5-bp repeat arrays together with higher order repeat arrays. However, the TTAGG repeats of Acyrthosiphon pisum (Hemiptera) and Pediculus humanus (Phthiraptera) contain insertions of non-LTR retrotransposons of the TRAS and SART family, respectively (International Aphid Genomics Consortium 2010; Kirkness et al. 2010) (fig. 3). As Hemiptera and Phthiraptera are basal to Hymenoptera, Coleoptera, Lepidoptera, and Diptera (fig. 3), the telomeres of A. mellifera seem to represent a case where the TRAS and/or SART retrotransposons were lost at a later stage in evolution. It is tempting to speculate that the appearance of those higher order repeat arrays with propensity to form 3-quartet G-quadruplexes caused the decay, and eventual loss, of the telomeric retrotransposons.
Because the telomeres of the spider mite Tetranychus urticae (from the basal branch Chelicerata) are also a mosaic of short TTAGG repeats interrupted by non-LTR retrotransposons closely related to TRAS (Grbić et al. 2011) (fig. 3), the telomeres of the arthropods seem to be maintained by telomerase, by insertion of specific non-LTR retrotransposons into the TTAGG repeat array and by recombination. The same system of telomere maintenance has also been found in some nonarthropod species (Arkhipova and Morrison 2001; Yamamoto et al. 2003; Gladyshev and Arkhipova 2007; Starnes et al. 2012). It has not escaped our notice that the appearance of this apparently suboptimal mechanism of telomere maintenance, which might have created chromosome instability, seems to have coincided with the great arthropod radiation into Chelicerata and Mandibulata.
On the other hand, certain yeasts from the Ascomycota phylum have telomeric repeats that are diverse in terms of their sequence, length, and homogeneity (McEachern and Blackburn 1994). In these yeasts, the degenerate repeats result from the nonprocessivity of their telomerases (Prescott and Blackburn 1997). Importantly, these repeats, despite their TG-richness, are less prone to fold into G-quadruplexes (Tran et al. 2011), and it has been shown that in these organisms, the telomere-binding proteins are fast evolving (Teixeira and Gilson 2005). Therefore, it is possible that these yeasts are using an ancestral system of chromosome end protection where the ssDNA-binding proteins facilitate the folding of the 3′-overhangs into G-quadruplex-like structures. This yeast-capping mechanism likely arose de novo by convergent evolution.
Do Noncanonical Secondary Structures Have a Role in the Maintenance of Telomeres without Telomerase?
Once telomerase becomes completely dysfunctional, the gene encoding telomerase could be lost if telomeres are maintained by the ancestral alternative mechanism of homologous recombination. Apparently, this is what happened in the ancestor of Diptera about 260 Ma (Wiegmann et al. 2011). In the lower Diptera, Anopheles, Rhynchosciara, and Chironomus, long tandem repeats are present at chromosome ends, suggesting that telomere maintenance takes place by homologous recombination (Nielsen and Edstrom 1993; Biessmann et al. 1998; Madalena et al. 2010) (fig. 3). In Drosophila, however, telomere maintenance occurs primarily by transposition of telomere-specific retrotransposons to receding chromosome ends (fig. 3). In addition to retrotransposition, Drosophila telomeres are also maintained, as in any eukaryote, by recombination/gene conversion (Kahn et al. 2000).
In D. melanogaster, three telomeric retrotransposons, TART, TAHRE, and HeT-A (a nonautonomous element derived from an ancestral TAHRE that lost its RT), transpose occasionally to chromosome ends using the free 3′-OH at chromosome termini to prime reverse transcription (Biessmann et al. 1990, 1992; Sheen and Levis 1994; Abad et al. 2004a, 2004b). In agreement with this mechanism, the telomeric elements appear randomly mixed in head-to-tail arrangements and variably truncated at the 5′-end (Mason et al. 2008; Villasante et al. 2008; Pardue and DeBaryshe 2011) (fig. 3). It is noteworthy that deletion of the RT coding region of the telomeric elements has occurred recurrently during Drosophila evolution, and multiple nonautonomous elements appear at the telomeres of the Drosophila species examined. As an example, up to four nonautonomous elements along with their corresponding autonomous elements have been found in D. mojavensis telomeres (Villasante et al. 2007). Interestingly, a similar situation occurs with group II introns, whose RTs also act in trans to mobilize multiple deleted introns (Mohr et al. 2010).
Because Drosophila telomeres consist of retrotransposon arrays in constant flux, there is no specific terminal sequence, and their telomere-capping proteins (the “terminin” complex) have evolved to bind chromosome ends independently of the primary DNA sequence (Raffa et al. 2009, 2010). The “terminin” complex is functionally analogous to the “shelterin” complex (human telomere-capping proteins), but their components are not evolutionarily conserved (Palm and de Lange 2008; Raffa et al. 2009, 2010). Thus, Drosophila telomeres are made of rapidly evolving telomeric retrotransposons (Villasante et al. 2007) and telomere-capping proteins (Gao et al. 2010; Raffa et al. 2010). Moreover, as Verrocchio is a telomere-capping protein with one OB-fold domain and all telomeric proteins containing OB folds are 3′-overhang binding proteins, Drosophila telomeres also seem to have single-strand overhangs (Raffa et al. 2010).
It is noteworthy that, despite the complexity of telomeric sequences in the genera Drosophila and Chironomus, the Drosophila telomeric retrotransposon arrays and the Chironomus telomeric complex repeats also have the telomeric G/C strand bias (Nielsen and Edstrom 1993; Danilevskaya et al. 1998). The conservation of this G/C strand bias may indicate that telomere capping depends on the formation of noncanonical structures based on guanine–guanine interactions. In agreement with this idea, it has been shown that the 3′-untranslated region of the abundant D. melanogaster telomeric element HeT-A contains sequences with a propensity to form G-quadruplexes (Abad and Villasante 1999).
The structural and phylogenetic analyses of all Drosophila telomeric-specific retrotransposons show that they had a common ancestor and indicate that non-LTR retrotransposons have been recruited to perform the cellular function of telomere maintenance. Therefore, we propose that the recruitment of Drosophila telomeric elements may resemble the ancestral mechanism that led to the maintenance of the “proto-telomeres” of the first eukaryotic chromosomes.
On the other hand, it has been found that yeast cells lacking telomerase can survive telomere sequence loss through the formation of terminal blocks of heterochromatin. This happens by amplifying and rearranging either subtelomeric sequences in S. cerevisiae and S. pombe or rDNA sequences in S. pombe (Lundblad and Blackburn 1993; Jain et al. 2010). Significantly, the S. cerevisiae subtelomeric Y’ repeats also have purine/pyrimidine strand bias (Nickles and McEachern 2004), and the S. pombe end-protection protein POT1 (protection of telomeres 1) binds, in a nonsequence-specific manner, to the 3′-overhangs of G-rich rDNA (Jain et al. 2010). Interestingly, adaptive recombination-based mechanisms of telomere maintenance (called ALT for alternative lengthening of telomeres) also occur in tumor cells that lack telomerase (Bryan et al. 1995; Cesare and Reddel 2010).
To summarize, in species that have lost telomerase either during evolution (order Diptera) or through experimental manipulation, the data available suggest a role of structural DNA features in telomere maintenance, reveal the importance of telomeric heterochromatin (regardless of the underlying primary sequence) in the recruitment of end-binding proteins, and show how easily backup mechanisms may have been used to maintain telomeres during evolution.
Conclusions
We have discussed how genomes have exploited their noncoding structural potential to establish primordial innovations during eukaryogenesis. In particular, we have discussed how the highly polymorphic secondary structures based on guanine–guanine interactions have evolved in concert with proteins to allow the origin and evolution of telomeres. In addition, we have hypothesized that the first linear eukaryotic chromosomes arose by the appearance of “proto-telomeres” at DSBs. This ancestral terminal structure had the dual function of end protection and segregation. Furthermore, we have argued that the concomitant formation of “proto-telomere” heterochromatin was fundamental for this key evolutionary innovation. Once again, the study of a noncanonical mechanism, like the maintenance of Drosophila telomeres, has generated new insights into the evolution of eukaryotes.
Acknowledgments
The authors thank Jim Mason for discussions on the origin of Drosophila telomeres, Maria J. Puertas for discussions on the heterochromatin of meiotic neocentromeres, and Douglas V. Laurents for revision of the manuscript. They are grateful to the reviewers for their valuable comments and suggestions. They apologize to those whose work was not cited in this review. This work was supported by the FPI fellowship BES-2009-027909 from the Ministerio de Ciencia e Innovación to M.G., by the Ministerio de Ciencia e Innovación (CTQ2010-21567-C02-02) to C.G., the Ministerio de Economía y Competitividad (BFU2011-30295-C02-01) to A.V., and by an institutional grant from the Fundación Ramón Areces to the Centro de Biología Molecular “Severo Ochoa.”
Author notes
Associate editor: Bill Martin