The initiation of DNA replication is a very important and highly regulated step in the cell division cycle. It is of interest to compare different groups of eukaryotic organisms (a) to identify the essential molecular events that occur in all eukaryotes, (b) to start to identify higher-level regulatory mechanisms that are specific to particular groups and (c) to gain insights into the evolution of initiation mechanisms.
This review features a wide-ranging literature survey covering replication origins, origin recognition and usage, modification of origin usage (especially in response to plant hormones), assembly of the pre-replication complex, loading of the replisome, genomics, and the likely origin of these mechanisms and proteins in Archaea.
In all eukaryotes, chromatin is organized for DNA replication as multiple replicons. In each replicon, replication is initiated at an origin. With the exception of those in budding yeast, replication origins, including the only one to be isolated so far from a plant, do not appear to embody a specific sequence; rather, they are AT-rich, with short tracts of locally bent DNA. The proteins involved in initiation are remarkably similar across the range of eukaryotes. Nevertheless, their activity may be modified by plant-specific mechanisms, including regulation by plant hormones. The molecular features of initiation are seen in a much simpler form in the Archaea. In particular, where eukaryotes possess a number of closely related proteins that form ‘hetero-complexes’ (such as the origin recognition complex and the MCM complex), archaeans typically possess one type of protein (e.g. one MCM) that forms a homo-complex. This suggests that several eukaryotic initiation proteins have evolved from archaeal ancestors by gene duplication and divergence.
INTRODUCTION: THE INITIATOR–REPLICATOR MODEL
Observation of DNA replication in bacteria led to the development of the initiator–replicator model in which the replicator is a sequence of DNA that defines a starting point (origin) for replication and the initiator is a complex of proteins that mediate the various steps involved in replication. Electron microscopy revealed that the circular genomes of bacteria possess one origin of replication (one replicator) and we now have a clear and very detailed understanding of the role of the many different proteins involved in all steps from recognition of the origin to termination of replication. Description in detail of bacterial DNA replication lies outside the scope of this review except to note that the initiator–replicator model serves equally well as a basis for our understanding of the initiation of DNA replication in eukaryotes. In terms of basic biochemistry, DNA is DNA in whichever cell it is located and it is thus not surprising that proteins with functional homology to those in bacteria are present in eukaryotes. Thus, in respect of the main theme of this review, both bacteria and eukaryotes possess origin recognition proteins and proteins that prepare the template for the main phase of daughter-strand synthesis. However, in eukaryotes there are added complications. Firstly, the DNA molecules are much longer. This has led, in evolution, to the acquisition of multiple origins of replication. Secondly, the structure of eukaryotic chromatin, in which the DNA is extensively complexed with proteins, has implications for origin binding and for the progress of the replication forks; it also means that, for complete chromosome replication, a whole further set of chromatin proteins must be synthesized. Thirdly, in addition to controls and checkpoints operating within individual cell cycles there are regulatory mechanisms that are related to the ‘life-style’ of eukaryotic organisms.
The latter have, over the course of evolution, become increasingly complex as the regulation of the cell division cycle has been embedded in the development and on-going lives of the more complex multi-cellular eukaryotes. This is discussed briefly later in this review with specific focus on the effects of plant hormones on the initiation of replication.
ORGANIZATION OF PLANT DNA AS UNITS OF REPLICATION
Plants, in common with all eukaryotic organisms, organize their DNA for replication as multiple units known as replicons (reviewed by Bryant and Francis, 2008; Bryant, 2010). A replicon is defined as a tract of DNA replicated from one origin of replication (ori). Replication proceeds by the movement of the replication forks outwards from the origin in two directions (see Bryant et al., 2001). Fork movement stalls when the forks reach replication termini. In some replicons, termini may be specific sites (Hernández et al., 1988b) but this is probably not the general case (Bryant, 2010; Lee et al., 2010).
Measurement of ‘fork rates’, the rates at which daughter strands are synthesized (Francis et al., 1985), reveals two important features. Firstly, in plants, as in all eukaryotes, rates of DNA synthesis are very much lower than in bacteria (Van't Hof, 1988). This has been ascribed to the structure of chromatin; the replicative enzymes are dealing with DNA that is for the most part complexed with histones in the form of nucleosomes. Dissociation of DNA from the nucleosomes is transient, occurring just in front of the replication forks, probably one or two nucleosomes at a time (Bryant and Dunham, 1988); nucleosomes are then re-assembled behind the forks. The level of phosphorylation of the linker histone H1 is important here (Thiriet and Hayes, 2009). In Physarum polycephalum, phosphorylation leads to the transient loss of H1 from chromatin; nucleosome displacement can then occur (see also Kohn et al., 2008).
The second feature is that the replication of any individual replicon takes much less time than the whole period of genome replication (i.e. S-phase). Thus in pea (Pisum sativum) an individual replicon is replicated in approx. 2 h but the S-phase lasts for 8 h (Van't Hof and Bjerknes, 1981). Data of this type led to the concept of replicon ‘families’; each family has its own particular time within the S-phase during which it is active in replication. For example, the small genome of Arabidopsis thaliana has only two replicon families. One of these completes replication in the first 2 h of the S-phase. Initiation of replication in the second family is delayed until 35–40 min after the start of the S-phase; this family also takes about 2 h to complete replication (Van't Hof et al., 1978). Intriguingly, a more recent analysis of replication timing on chromosome 4 of A. thaliana also suggests that the replicons are organized into two groups (Lee et al., 2010).
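The timing argument for replicon families can be made explicit with a little arithmetic. The following is a minimal sketch, assuming that each family is described only by its delay after the onset of the S-phase and by the time it takes to complete replication; the class and function names are hypothetical and the figures are those quoted above for A. thaliana (the 35-min delay is the lower bound of the 35–40 min range).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepliconFamily:
    """One replicon family, described only by its timing (illustrative model)."""
    name: str
    start_min: float     # delay after the start of the S-phase, in minutes
    duration_min: float  # time taken to complete replication, in minutes


def s_phase_length(families):
    """S-phase length = time at which the last family finishes replicating."""
    return max(f.start_min + f.duration_min for f in families)


# Figures quoted in the text for Arabidopsis thaliana
arabidopsis = [
    RepliconFamily("first family", 0, 120),
    RepliconFamily("second family", 35, 120),  # 35 min: lower bound of 35-40 min
]

print(s_phase_length(arabidopsis))  # 155 (minutes)
```

Under these assumptions the S-phase is only a little longer than the replication time of a single family, because the two families overlap almost completely; in pea, by contrast, a 2-h replicon time within an 8-h S-phase implies that activation of the families is spread much more widely.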
In A. thaliana the two families ‘fire’ in the same order in all S-phases (Van't Hof et al., 1978). This implies that there is a specific temporal control to ensure that replicon families are replicated in a specific order, a concept that is supported by a recent analysis in unicellular eukaryotes (Raghuraman and Brewer, 2010). However, the general applicability of the idea has been challenged (see discussions in Bryant and Francis, 2008; Bryant, 2010). Nevertheless, Gilbert (2010) considers this to be an important general feature of the regulation of initiation in eukaryotic organisms.
CHARACTERIZATION OF REPLICATION ORIGINS
Visualization of replicating plant DNA by fibre autoradiography or by fluorescence labelling facilitates measurement of replicon lengths on individual DNA molecules (see Bryant et al., 2001). These measurements usually reveal an obvious mode of replicon length with some variation either side of the mode. Modal length varies between species as exemplified by pea (Pisum sativum), 54 kb (Van't Hof and Bjerknes, 1977), rye (Secale cereale), 60 kb (Francis and Bennett, 1982) and tobacco (Nicotiana tabacum), 30 kb (Quélo and Verbelen, 2004). However, although modal replicon lengths may be typical for a given species, there is also a great deal of flexibility in origin-to-origin spacing, examples of which are discussed below.
‘Seeing’ replication origins in fibre autoradiographs or in fluorescence micrographs is relatively straightforward, but isolating them from the bulk of the DNA is altogether more difficult. In unicellular eukaryotes such as budding yeast (Saccharomyces cerevisiae) and fission yeast (Schizosaccharomyces pombe) this problem has been partly solved in that origin sequences may be identified by their ability to confer autonomous replicative ability on plasmids which lack their own replication origin. In formal terms, these sequences are known as ars elements but they have been shown by several techniques to be bona fide sites of the initiation of chromosomal DNA replication (e.g. Brewer and Fangman, 1987; also see Bryant, 2010). In S. cerevisiae, the ars elements contain a conserved and essential 11-bp sequence (Marahrens and Stillman, 1992): (A/T)TTTAT(G/A)TTT(A/T). However, there is not a similar tight sequence requirement in S. pombe (Chuang and Kelly, 1999; Bell, 2002): it is simply the presence of several short AT tracts which is important. Indeed, the general consensus is that eukaryotes, apart from budding yeast, do not exhibit strict sequence requirements for origin function (see Gilbert, 2010). However, in view of the low number of bona fide origins that have been isolated [even with the newer range of analytical techniques and bioinformatics tools available (Nieduszynski et al., 2007; Cotterill and Kearsey, 2009; Bryant, 2010; Gilbert, 2010)], this generalization may be premature.
To date the only higher plant DNA replication origin to be isolated and characterized is that from the non-transcribed spacer (NTS) between the repeated genes that code for rRNA. Origins occur in the NTSs of many eukaryotes and the ‘replication bubbles’ associated with initiation at these sites are visible in electron micrographs of replicating DNA (Van't Hof, 1988). In pea (Pisum sativum), these genes are highly repeated, with approx. 4000 copies per haploid genome. This degree of amplification is one of the factors that enabled Van't Hof's group (Van't Hof et al., 1987a, b; Hernández et al., 1988a) to localize the site of initiation of replication to a 1500-bp region within the NTS and to characterize that 1500-bp fragment. Two-dimensional gel electrophoresis confirmed that the initiation of strand separation occurs within this region, which contains a very AT-rich domain that includes four good matches to the S. cerevisiae ars core sequence (Hernández et al., 1988a; Van't Hof and Lamm, 1992). However, in very AT-rich DNA, the probability that close matches to this sequence occur by chance is very high. So, unless or until a specific role for these ars-like sequences can be demonstrated, we cannot state whether or not they form a functional part of the replication origin. Replicating DNA isolated in very early S-phase in synchronized pea root meristems is enriched for AT-rich DNA (Bryant, 1994), again including matches to the ars core sequence. However, it was not demonstrated that these sequences actually contained any replication origins and, even if they did, the reservations about the possible function of the ars-like sequences would still apply.
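The point about chance matches in AT-rich DNA can be illustrated numerically. The sketch below is not an analysis of the pea sequence. It assumes a deliberately simple null model in which bases are independent, A and T are equally frequent, and G and C are equally frequent. It also assumes that the 11-bp ars core is read, in IUPAC notation, as WTTTATRTTTW (W = A or T, R = A or G). The function names are hypothetical. The code calculates the expected number of windows, on both strands of a fragment of given length, in which at least a chosen number of the 11 positions match the core.

```python
# IUPAC codes needed for the core (assumed reading of the consensus in the text)
IUPAC = {"A": "A", "T": "T", "W": "AT", "R": "AG"}
CORE = "WTTTATRTTTW"


def base_probs(at_fraction):
    """Base frequencies under the null model: A = T and G = C."""
    return {
        "A": at_fraction / 2, "T": at_fraction / 2,
        "G": (1 - at_fraction) / 2, "C": (1 - at_fraction) / 2,
    }


def position_match_probs(at_fraction, core=CORE):
    """Probability that a random base satisfies each position of the core."""
    p = base_probs(at_fraction)
    return [sum(p[b] for b in IUPAC[code]) for code in core]


def prob_at_least(k, probs):
    """P(at least k positions match), positions treated as independent."""
    dist = [1.0]  # dist[j] = P(exactly j matches among positions seen so far)
    for q in probs:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1 - q)
            new[j + 1] += pj * q
        dist = new
    return sum(dist[k:])


def expected_matches(length_bp, at_fraction, min_matching=11, core=CORE):
    """Expected number of matching windows on both strands of a fragment."""
    windows = max(length_bp - len(core) + 1, 0)
    per_window = prob_at_least(min_matching, position_match_probs(at_fraction, core))
    return 2 * windows * per_window  # factor 2: both strands (valid as A = T, G = C)


for at in (0.5, 0.65, 0.8):
    print(at, expected_matches(1500, at, 11), expected_matches(1500, at, 10))
```

Because the model ignores the runs of A and T that are typical of real AT-rich DNA, it probably underestimates the frequency of chance matches; even so, the expected number rises steeply with AT content and with any relaxation of the matching criterion, which is the reason for caution expressed above.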
Because of the scarcity of information on plant DNA replication origins, it is necessary to consider briefly the origins of other eukaryotes (see the review by Bryant, 2010). In general the initiation of replication in Metazoa has much ‘looser’ sequence requirements than in budding yeast (DePamphilis, 1993; Bogan et al., 2000; Bell, 2002; Antequera, 2004; Schwob, 2004; Cvetic and Walter, 2005; Bryant, 2010). Thus amongst mammals, from which over 20 different replication origins have now been characterized, no sequence common to them all has been identified. They too, however, share two general features: they are AT-rich and some contain tracts of bent or curved DNA; both these features facilitate strand separation (see Marilley et al., 2007). The origins of replication from which the chorion (egg-shell protein) genes in Drosophila are amplified are also AT-rich (Austin, 1999; Spradling, 1999). It is also quite common to find in mammalian replication origins that the AT-rich regions are located in the vicinity of CpG islands (Antequera, 2004; Paixao et al., 2004; Schwob, 2004), a feature that is also true of the origin from the NTS of the pea rRNA genes. Further, the region in which initiation actually occurs may be just a few hundred base pairs long, as in the origin associated with the mammalian lamin-B2 genes (Paixao et al., 2004), or may be spread over several kilobases with several possible initiation sites (Dijkwel et al., 2002; DePamphilis, 2003).
FLEXIBILITY OF ORIGIN USE AND THE EFFECTS OF PLANT HORMONES
Variation in origin-to-origin spacing and thus in replicon length may occur in plants in relation to different phases of development, in response to nutrients or hormones or as a response to experimental manipulation. The behaviour of the shoot meristem during the transition from the vegetative to the flowering state is a good example. In Sinapis alba, transition to flowering (following exposure to the floral stimulus) is associated with halving of the modal replicon length from 15 kb to 7·5 kb in the shoot meristem – i.e. twice as many origins are utilized. The S-phase is thus significantly shortened (Jacqmard and Houssa, 1988). Similar phenomena occur in the shoot meristems of Silene coeli-rosa and Pharbitis nil that have been induced to flower (Durdan et al., 1998).
Hormones can also affect the utilization of origins. Thus, in Sinapis, application of cytokinin to the shoot apex causes recruitment of extra origins in the absence of the floral stimulus (Houssa et al., 1990). This also occurs when cytokinin is applied to dividing cells in the vegetative shoot apex of a grass, Lolium temulentum and in the ovule of tomato (Solanum lycopersicum) (Houssa et al., 1994). The mechanism by which cytokinin does this is not clear. One could speculate that it causes up-regulation of (some of the) genes encoding initiation proteins but other possible effects certainly cannot be ruled out.
Further, hormones can also have a negative effect on origin recruitment. Thus, in the Sinapis alba shoot meristem, abscisic acid caused a halving of the number of active origins and therefore a doubling of replicon length from 15 kb to 30 kb (Jacqmard et al., 1995). Its action is thus directly opposite to that of cytokinin. Trigonelline was first identified as an inhibitor of the plant cell cycle over 25 years ago; cells treated with the inhibitor arrest in G2 (Evans and Tramontano, 1984). However, it also has an effect on DNA replication. Application of trigonelline to lettuce roots leads to a halving of the number of origins that are used: two out of every cluster of four remain inactive (Mazucca et al., 2000). It cannot be stated whether this is one of the reasons for the inhibition of the cell cycle or whether it is a symptom of an inhibition that is exerted at a more basal level of regulation.
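The relationship between modal replicon length and the number of active origins that underlies these statements is simple enough to write down. The sketch below assumes, purely for illustration, that origins are evenly spaced so that the number in use is inversely proportional to the modal replicon length; the function names are hypothetical and the lengths are those quoted above for Sinapis alba.

```python
def origins_per_mb(modal_replicon_kb):
    """Active origins per megabase if origins were evenly spaced (assumption)."""
    return 1000.0 / modal_replicon_kb


def fold_change_in_active_origins(before_kb, after_kb):
    """Ratio of origins in use after a treatment to those in use before it."""
    return before_kb / after_kb


# Floral stimulus or cytokinin in Sinapis alba: 15 kb -> 7.5 kb
print(fold_change_in_active_origins(15, 7.5))  # 2.0 (twice as many origins)

# Abscisic acid in Sinapis alba: 15 kb -> 30 kb
print(fold_change_in_active_origins(15, 30))   # 0.5 (half as many origins)
```

In reality replicon lengths are distributed around the mode, so these figures describe the typical spacing and not every replicon.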
One example of changes in the use of plant DNA replication origins that does not appear to involve hormones comes from rye–wheat hybrids. In diploid rye (Secale cereale) (2n = 2x = 14) the modal replicon length is, as noted earlier, 60 kb (Francis and Bennett, 1982); in bread wheat (Triticum aestivum) (2n = 6x = 42) it is 16·5 kb (Kidd et al., 1992). However, the allohexaploid hybrid triticale (2n rye × 4n wheat) exhibits a modal replicon length of 15 kb (Kidd et al., 1992) which is different from that of either parent. Origin spacing in both genomes is thus re-set in the hybrid by a mechanism that remains completely unknown (Bryant and Francis, 2008). Indeed, the question of which features of sequence and/or chromatin structure contribute to ‘strength’ or ‘weakness’ of origins remains unanswered (see below).
A very dramatic example in plants of recruitment of extra origins occurred when the DNA of synchronized pea (Pisum sativum) root meristems was cross-linked with psoralen (in an attempt to stall movement of the replication forks). Extra origins were recruited between the cross-links, dramatically shortening the origin-to-origin distance (Francis et al., 1985). These data led to the suggestion that eukaryotic DNA contains ‘strong’ and ‘weak’ origins (Francis et al., 1985), a suggestion that now meets with growing support. Observations of initiation of replication in S. pombe are consistent with this idea: clustered origins of replication demonstrate hierarchies of initiation frequencies (reviewed by Bryant, 2010). Indeed, Gilbert (2010) points out that in unicellular eukaryotes only approx. 50 % of potential origins may be used, while in mammals the proportion may be as low as 10 %. He summarizes our current understanding in the following way: “As a fail-safe, pre-replication complexes (pre-RCs) are assembled at many more potential origins than are used in any given cell cycle. A subset of these pre-RCs is chosen for initiation by as yet poorly understood mechanisms and the rest serve as dormant origins: ‘back-ups’ used if the cell experiences problems completing replication.” It is noted here that cross-linking by psoralen certainly causes problems in the completion of replication and it is good to see a view that we expressed 25 years ago being strongly supported by recent data (see also Woodward et al., 2006).
Data from yeasts and Metazoa (reviewed by Bryant and Francis, 2008) and now from one plant, A. thaliana (Lee et al., 2010), show that features of chromatin related to transcription also have an effect on origin distribution and/or usage. Origins in transcriptionally active regions of chromatin tend to initiate earlier in the S-phase than origins in less-active regions (Gomez and Brockdorff, 2004). There is also a correlation between hierarchies of origin activation and transcription patterns in budding yeast (Donato et al., 2006). Associated with this is the finding that regions of chromatin in which genes are rather sparsely distributed also contain few origins. Consistent with these observations, histone acetylation, a feature of transcriptionally active chromatin, is associated with origin activity in mammalian cells (Vogelauer et al., 2002). Further, it has been shown very recently that on chromosome 4 of Arabidopsis thaliana, acetylation of histone H3 at Lys36 is associated with the early origins (Lee et al., 2010). In contrast to histone acetylation, DNA methylation reduces transcriptional activity and, in Xenopus, also appears to block the activation of origins (Harvey and Newport, 2003).
INITIATION OF REPLICATION AT THE REPLICATION ORIGINS
Four phases may be discerned in the overall process of initiation: recognition of the replication origins, assembly of the pre-replication complexes, activation of the replicative DNA helicase(s) and loading of the replicative enzymes. These processes involve an array of proteins that is much more extensive than was thought probable 20 years ago. Because of the complexities involved in the recruitment of some of these proteins, it is often convenient to break down these four main phases into sub-phases (as in a recent review by Bryant, 2010). Our knowledge of the proteins has largely been built up from work with budding yeast, fission yeast and Metazoa; the amount of work on plant systems has been relatively small. Only recently has a more complete picture started to emerge from data obtained with plants (Shultz et al., 2007; see also the review by Bryant, 2010). It is clear that DNA replication starts in essentially the same way in all these eukaryotes with only minor variations in the complement of proteins involved. These proteins and their functioning in plants have been recently reviewed in detail (Shultz et al., 2007; Bryant, 2010) and therefore here we present a summary of the key points.
Origin recognition is, not surprisingly, mediated by the origin recognition complex or ORC (Bell and Stillman, 1992). In plants as in most other eukaryotes, there are six proteins in the complex (Gavin et al., 1995; Witmer et al., 2003; Collinge et al., 2004; Mori et al., 2005; Shultz et al., 2007). There is currently no information on the features of DNA required for binding of replication origins by plant ORCs. However, based on the general pattern exhibited in most other eukaryotes (budding yeast being an exception because it shows a strict sequence requirement) it is expected that the key features will be AT-richness, possibly including several oligo-A tracts (which are associated with DNA curvature) and possibly also including strand asymmetry (see Bryant, 2010).
Formation of the pre-replication complex (pre-RC) starts with the recruitment to the origin of the ORC-like CDC6 protein. This in turn recruits CDT1 (Nishitani et al., 2000) together with MCM2-7. The ORC and CDC6 then act together to hydrolyse ATP (Randell et al., 2006) which enables at least two MCM2-7 ring-shaped complexes to bind around the DNA (Waga and Zembutsu, 2006). At the same time, CDT1 is thought to be released from the origin, although there is some disagreement about this (see Bryant and Francis, 2008). The array of proteins consisting of the ORC, CDC6 and MCM2-7 is known as the ‘pre-replication complex’ or ‘pre-RC’ and it forms prior to the S-phase, in late mitosis or the G1 phase of the cell cycle. It should be noted that there is an excess of MCM2-7 over the other proteins in the pre-RC and, indeed, several MCM2-7 hexamers may be loaded on to the DNA at or in the vicinity of a single ORC; it is possible that under some circumstances these act as full pre-RCs (Woodward et al., 2006; Ibarra et al., 2008; cf. Francis et al., 1985) while in other circumstances some of the pre-RCs remain inactive (as discussed by Gilbert, 2010).
Activation of MCM2-7 is triggered by two cell cycle-regulated protein kinases, CDC7-DBF4 (DBF4-dependent kinase or DDK) and G1/S-phase cyclin-dependent kinase (CDK), leading to the displacement and destabilization of CDC6 and recruitment of another protein, MCM10. The specific function of MCM10 in plants has not been established, although its gene is certainly present (Shultz et al., 2007). In budding yeast and fission yeast it is essential for DNA replication and appears to have several functions (reviewed by Moore and Aves, 2008). Firstly, it stimulates the phosphorylation of MCM2-7 by CDC7-DBF4. It then participates, with MCM2-7, in recruiting the DNA polymerase-α loading factor CDC45, after which DNA polymerase-α-primase is loaded at the origin (for details, see reviews by Shultz et al., 2007; Bryant, 2010). Finally, MCM10 remains associated with the replisome during the S-phase (Gambus et al., 2006). CDK acts, at least in yeasts, by phosphorylating the SLD2 and SLD3 proteins, causing them to form a complex with DPB11 (TopBP1). DPB11–SLD2–SLD3 then associates with origins and is required for CDC45 and DNA polymerase loading but, unlike MCM10 and CDC45, does not progress with the replisome (Aves, 2009).
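The order of events described so far can be summarized as a set of dependencies. The following sketch is a reading aid, not a mechanistic model. Each step is listed with the steps that the text places before it, and one consistent order is derived with the standard-library graphlib module (Python 3.9 or later). The step labels are informal. Where the text gives no explicit order (for example, between recruitment of MCM10 and association of DPB11–SLD2–SLD3 with the origin), the sketch does not impose one.

```python
from graphlib import TopologicalSorter

# step -> set of steps that the text describes as preceding it
PREREQUISITES = {
    "ORC binds origin": set(),
    "CDC6 recruited": {"ORC binds origin"},
    "CDT1 and MCM2-7 recruited": {"CDC6 recruited"},
    "MCM2-7 loaded around DNA (pre-RC)": {"CDT1 and MCM2-7 recruited"},
    "DDK and CDK act": {"MCM2-7 loaded around DNA (pre-RC)"},
    "MCM10 recruited": {"DDK and CDK act"},
    "DPB11-SLD2-SLD3 at origin": {"DDK and CDK act"},
    "CDC45 recruited": {"MCM10 recruited", "DPB11-SLD2-SLD3 at origin"},
    "DNA polymerase-alpha-primase loaded": {"CDC45 recruited"},
}


def assembly_order(prerequisites=PREREQUISITES):
    """Return one order of steps in which every prerequisite comes first."""
    return list(TopologicalSorter(prerequisites).static_order())


for number, step in enumerate(assembly_order(), start=1):
    print(number, step)
```

Any order produced in this way places origin recognition first and polymerase loading last; the relative order of the two steps that depend only on kinase action is arbitrary, which reflects the limits of the description, not a biological conclusion.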
Also essential for melting of the DNA double helix and initiation of replication are the four proteins of the GINS complex (PSF1-3 and SLD5; see Shultz et al., 2007), although their specific roles remain to be confirmed. The CMG complex of CDC45, MCM2-7 hexamer and the GINS proteins functions as a helicase (Moyer et al., 2006; Pacek et al., 2006) to separate the two DNA strands and form a replication ‘bubble’ (which may be up to several hundred base pairs in overall length: Bryant, 2008), away from which the two replication forks move. Replication is underway; other proteins required for bulk DNA synthesis and interaction with chromatin can now be recruited to form the full replisome complex at each replication fork (Ricke and Bielinsky, 2004; Gambus et al., 2006; Walter and Araki, 2006; Chilkova et al., 2007; Bryant, 2008). The overall process is shown in Fig. 1.
Although the complete picture has been built up mainly from work with the yeasts, genes encoding virtually all the relevant proteins are known from Arabidopsis and rice (Oryza sativa) and in some instances from several other species too (reviewed by Shultz et al., 2007). We do not have detailed information of the protein–protein interactions nor of the specific steps involved in building the pre-replication and initiation complex in plants but, given the conservation of the essential coding sequences, there is no reason to suppose that these processes in plants differ significantly from what is described here. Further, that view is confirmed by those events for which we do have specific information. Where plants differ from other organisms is not in the essential biochemistry but in the plant-specific aspects of regulation (Dambrauskas et al., 2003; Francis, 2007; Menges and Murray, 2008).
In our discussion of the initiation of DNA replication the main players become clearly recognizable: ORC proteins, CDC6 and CDT1 (instrumental in loading of the helicase), MCM2-7 proteins (the helicase core hexamer), CDC45 and the GINS proteins (essential helicase accessory factors), MCM10, and the DPB11–SLD2–SLD3 proteins. With the possible exceptions of SLD2 and SLD3, which are poorly conserved at sequence level (Fu and Walter, 2010), these main players are identifiable across the range of eukaryotes, raising the question of when, during the course of evolution, they appeared.
To consider this question we go back to the origin of the eukaryotes. Life on earth, according to current estimates, began at least 3·5 billion (3·5 × 10⁹) years ago. Although early evolutionary relationships are the subject of debate it is probable that, relatively early in the history of life, the prokaryotic lineage split to give rise to the Bacteria and the Archaea (Dagan et al., 2010). These lineages of prokaryotes were the only life-forms for at least 1·6 billion years. The cyanobacteria diverged from the main bacterial lineage about 2·8 billion years ago and their photosynthetic activity led to the great oxidation event (approx. 2·4 billion years ago). The scene is thus set for the emergence of the eukaryotes. The timing of this event is strongly disputed. At one end of the scale there are those who hold to a ‘date’ of 800–900 million years before present, basing their view on the ‘snowball earth’ hypothesis. At the other end of the scale, many paleobiologists, using mainly ‘molecular clock’ data, place the origin of eukaryotes at 1·9–2·0 billion years before present.
For the present discussion, the timing does not actually matter. What does matter is that the evidence points very strongly to an endosymbiont origin for the eukaryotes (Margulis, 1981), in which one cell was engulfed by another, the engulfed cell eventually giving rise to mitochondria. But what types of prokaryotic cell were involved in this engulfment? Genomics, proteomics and a number of cellular features point to the host cell being an archaean while the engulfed cell was a bacterium (see Gross and Bhattacharya, 2010). This is examined in more detail immediately below. In the meantime it is noted that eukaryotes relatively rapidly split into about six ‘supergroups’. Some of these were unikonts (with one flagellum; the Greek word kontos actually means barge-pole or punt-pole and is the origin of the English word quant), which gave rise to animals and fungi; others were bikonts (with two flagella). It was amongst the latter that a second engulfment, this time of a cyanobacterium, gave rise to photosynthetic eukaryotes and eventually to chloroplasts as we know them today.
So, if the host cell in the engulfment event leading to the formation of the primal eukaryote was an archaean, do extant archaeans exhibit any eukaryotic features? In respect of DNA replication the question can be answered very positively. Firstly it is noted that archaeal DNA is packaged in the form of nucleosomes in which 80 bp of DNA are complexed with a tetramer of histones (Sandman et al., 2001). DNA compaction is thus achieved in a similar way to that observed in eukaryotes and has similar implications for the access of enzymes to the DNA. Secondly there are some members of the Archaea that possess more than one (typically three) origins of replication in their single chromosome (Lundgren and Bernander, 2005; Robinson and Bell, 2005). Origins are very AT-rich (up to 80 %), including oligo-A and oligo-AT tracts. They possess several ‘origin boxes’ in which the actual sequence is important, including not only the oligo-A and oligo-AT tracts but also individual G residues (Gaudier et al., 2007; Majernik and Chong, 2008). Origin box sequences may vary between species such that a box from one species is not recognized by the relevant protein from another species (Majernik and Chong, 2008).
The proteins themselves represent the essential core compared with the eukaryotic plenitude. It has already been noted that in eukaryotes, CDC6 has sequence similarities to ORC1. In the Archaea, although the number of ORC proteins varies widely, in many species the roles of ORC1-6 and CDC6 are performed by just a single protein. At the origins, the ORC/CDC6 binds to the origin boxes (Lundgren and Bernander, 2005; Gaudier et al., 2007); as with many eukaryotic ORCs, a C-terminal winged helix and an N-terminal ‘triple-A’ ATPase motif are features of the protein structure important for binding to DNA (Gaudier et al., 2007).
The presence of ORC/CDC6 at the origin permits the binding of the MCM helicase (De Felice et al., 2004) (just as ORC1-6 and CDC6 facilitate the binding of MCM2-7 in eukaryotes). However, in the Archaea, the MCM complex consists not of six different MCMs but most commonly of a hexamer of one type of MCM protein (Sakakibara et al., 2009). Over the course of evolution of eukaryotes, six copies of one protein have been replaced by one copy of each of six proteins. Likewise, in eukaryotes the four GINS proteins are each essential helicase accessory factors, whereas many Archaea appear to have just a single GINS protein (Yoshimochi et al., 2008).
The picture that emerges from these observations is that individual proteins, e.g. ORC/CDC6, MCM and GINS, in the Archaea are now represented by multiple versions in eukaryotes. The most obvious mechanism for this is gene duplication and subsequent divergence. Data which are clearly consistent with this idea have been obtained in a recent detailed bioinformatic study of MCM proteins across a range of eukaryotes (Liu et al., 2009). It is suggested that MCMs 2–9 arose by seven gene duplication events and that, because they are known in all the principal groups of extant eukaryotes, these gene duplications occurred before the last common ancestor of the eukaryotes. Further, in respect to MCM2-7, they all remain essential for replication presumably because, as sequences have diverged, so have specific functions, albeit in a very subtle way (see Liu et al., 2009).
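The count of duplication events follows from simple bookkeeping: if a family of n paralogues descends from one ancestral gene and no duplicate has been lost, at least n − 1 duplications are required. The sketch below makes that explicit; the function name is hypothetical and the no-loss assumption is ours, made only for illustration.

```python
def min_duplications(n_paralogues):
    """Minimum duplication events needed to derive n paralogues from one gene.

    Assumes a single ancestral gene and no subsequent gene loss.
    """
    if n_paralogues < 1:
        raise ValueError("a gene family must contain at least one member")
    return n_paralogues - 1


mcm_family = [f"MCM{i}" for i in range(2, 10)]  # MCM2 ... MCM9
print(len(mcm_family), min_duplications(len(mcm_family)))  # 8 7

gins_family = ["PSF1", "PSF2", "PSF3", "SLD5"]
print(len(gins_family), min_duplications(len(gins_family)))  # 4 3
```

For MCM2–9 this gives the seven events suggested by Liu et al. (2009); the figure of three for the GINS proteins is simply the same minimum applied to a family of four and is not taken from a published analysis.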
The situation with GINS and the ORC proteins is very similar. Eukaryotes in all six supergroups possess four GINS proteins, again consistent with an early set of gene duplication events, as with the MCM proteins. Sequence data for ORC1-5 and for CDC6 are mostly consistent with an early set of gene duplication events, although members of the supergroup Excavata lack a separate CDC6, which could be explained by divergence of this lineage prior to the ORC1/CDC6 gene duplication. Furthermore, within the Excavata, the Trypanosoma each have only a single ORC/CDC6 protein (Godoy et al., 2009) despite having the full set of MCM2-7 and GINS proteins; whether this single ORC/CDC6 gene represents a primitive state as in the Archaea, or gene loss in this lineage, remains to be determined.
So eukaryotic replication initiation proteins are both more complex than those in Archaea and highly conserved. The key origin binding and helicase functions are represented by families of paralogous proteins (ORC/CDC6, MCM, GINS) which arose early in eukaryotic evolution by gene duplication and divergence events from their lone archaeal ancestors. In addition, eukaryotes possess other initiation proteins which appear to have no archaeal homologues: CDT1, MCM10, DPB11, SLD2 and SLD3. These mostly have regulatory roles which reflect the need to ensure that the initiation of DNA replication at origins is tightly coupled to the complex cell cycle in eukaryotes. The high level of conservation of these DNA replication initiation proteins suggests that similar core regulatory mechanisms operate in plants as in animals and yeasts. Specific plant regulatory pathways are likely to feed into the core process either at the cell cycle level via regulation of plant CDK or CDC7-DBF4 kinases (Francis, 2007; Menges and Murray, 2008), or via specific plant factors which interact directly with replication initiation proteins. One such candidate could be GEM, identified as a CDT1-interacting protein and which is implicated in controlling decisions between cell proliferation and cell fate; in several respects it is reminiscent of the well studied but unrelated animal-specific CDT1 inhibitor geminin (Caro and Gutierrez, 2007; Caro et al., 2007). Finally, angiosperms possess paralogues of many core replication initiation proteins that are normally unique in other eukaryotes (e.g. Arabidopsis has two of each of ORC1, CDC6, CDT1, TOPBP1 and GINS protein PSF3; Shultz et al., 2007). It is possible that these duplications may also have regulatory implications.
The main features of the initiation of DNA replication are very similar across the range of eukaryotic organisms, including plants. These features include organization of chromatin as multiple replicons, assembly of pre-replication complexes (pre-RCs) at replication origins, cell cycle kinase activation of pre-RCs, and assembly of the replisomes. Origins are bound by the origin recognition complex (ORC1-6) proteins, which form the platform for loading of MCM2-7 heterohexamers by CDC6 and CDT1 to form the pre-RC. Cell cycle kinases and the DPB11, SLD2, SLD3, MCM10, CDC45 and GINS proteins activate pre-RCs, leading to origin ‘melting’ and assembly of two replisomes per activated origin. Differences between eukaryotic groups lie in the higher-level regulatory mechanisms that are related to the life-style of the organisms in question. Thus, for higher plants there is clear evidence that the hormones involved in the regulation of many facets of growth and development act on, amongst many other things, the initiation of DNA replication. This is seen, for example, in the recruitment of replication origins as a means of controlling the rate of DNA replication and thus the length of the S-phase.
The universal occurrence amongst eukaryotes of the same organizational features and the same proteins for initiating DNA replication raises questions about their evolutionary origins. Both the organizational features and the proteins are reflected in the Archaea. Thus, for example, archaeal chromatin is organized as nucleosomes (albeit with only 80 bp of DNA per nucleosome), and some archaeans have more than one replication origin. The key difference is that, whereas eukaryotes have an array of initiation proteins, archaeans typically have very few. So, although some archaeal lineages have paralogues, typically there is only one ORC/CDC6 protein in contrast to the seven in most eukaryotes, one type of MCM protein which functions as a homo-hexamer in contrast to the hetero-hexamer in eukaryotes, and one GINS protein rather than four. This suggests that the array of eukaryotic proteins arose by gene duplication and divergence events, a suggestion that is well supported by study of both the MCM and the ORC–CDC6 groups of proteins. Further, other eukaryotic replication initiation proteins such as CDT1, MCM10, DPB11 (TOPBP1), SLD2 and SLD3 have no obvious archaeal homologues. Because this plenitude of proteins occurs universally among eukaryotes, it appears that the events giving rise to it occurred before the divergence of the eukaryotic lineage into the six supergroups that are recognized today.
ACKNOWLEDGEMENTS
We thank all our postdoctoral and postgraduate researchers who have worked with us on the initiation of DNA replication, especially Sara Burton, Gerardas Dambrauskas, Nanette Davies, Elizabeth Hart, Yuan Liu and Karen Moore. We also acknowledge with thanks the contributions of our collaborators in this research area, Dennis Francis, Tom Richards, Hilary Rogers and Jack Van't Hof.