CryoEM structures of human CMG–ATPγS–DNA and CMG–AND-1 complexes

Abstract DNA unwinding in eukaryotic replication is performed by the Cdc45–MCM–GINS (CMG) helicase. Although the CMG architecture has been elucidated, its mechanism of DNA unwinding and replisome interactions remain poorly understood. Here we report the cryoEM structure at 3.3 Å of human CMG bound to fork DNA and the ATP-analogue ATPγS. Eleven nucleotides of single-stranded (ss) DNA are bound within the C-tier of MCM2–7 AAA+ ATPase domains. All MCM subunits contact DNA, from MCM2 at the 5′-end to MCM5 at the 3′-end of the DNA spiral, but only MCM6, 4, 7 and 3 make a full set of interactions. DNA binding correlates with nucleotide occupancy: five MCM subunits are bound to either ATPγS or ADP, whereas the apo MCM2-5 interface remains open. We further report the cryoEM structure of human CMG bound to the replisome hub AND-1 (CMGA). The AND-1 trimer uses one β-propeller domain of its trimerisation region to dock onto the side of the helicase assembly formed by Cdc45 and GINS. In the resulting CMGA architecture, the AND-1 trimer is closely positioned to the fork DNA while its CIP (Ctf4-interacting peptide)-binding helical domains remain available to recruit partner proteins.


INTRODUCTION
Accurate and faithful duplication of our chromosomal DNA in preparation for mitosis is essential for cellular life (1). DNA synthesis in S-phase is a highly complex biochemical process carried out by the replisome, a large and dynamic multi-protein assembly of about thirty core components (2). The replisome contains all necessary enzymatic activities for copying the genetic information encoded in the parental DNA, as well as non-enzymatic factors that guarantee efficient DNA synthesis under normal conditions and during replicative stress.
Central to the replication process is the physical separation of the parental strands of DNA, to allow the templated polymerisation of new leading and lagging strands, according to the semi-discontinuous model of DNA replication. Replicative DNA helicases form hexameric rings that thread single-stranded DNA through their ring channel and achieve unwinding of double-stranded (ds) DNA by a process of strand exclusion (3,4). Each helicase subunit consists of an N-terminal domain and a C-terminal ATPase domain, which form a double stack of N-tier and C-tier rings. ATP binding takes place at the subunit interface and its hydrolysis requires residues from both subunits. Processive strand separation results from the allosteric coupling of ATP hydrolysis to concerted movements of the DNA-binding elements that line the ring pore in each subunit, as first shown for the replicative viral E1 DNA helicase (5) and the hexameric Rho RNA helicase (6).
Unwinding of parental DNA in eukaryotic cells is performed by the 11-subunit Cdc45-MCM-GINS assembly or CMG (7). The six MCM proteins, MCM2-7, belong to the AAA+ family of ATPases and form a hetero-hexameric ring that translocates on single-stranded DNA (8). In the absence of DNA substrate, MCM2-7 adopts predominantly an open spiral conformation; the co-factors Cdc45 and GINS (a hetero-tetramer of Psf1-3 and Sld5) bind to the MCM5-2 interface and lock MCM2-7 into the closed-ring conformation required for robust DNA unwinding (9). In vivo, a multi-step process operates to assemble and activate the eukaryotic CMG DNA helicase at the start of S-phase. In the model system budding yeast, CMG activation involves loading of the MCM2-7 proteins at origin DNA as an inactive double hexamer, which is then activated by phosphorylation-dependent recruitment of the Cdc45 and GINS cofactors, intervention of MCM10, and ATP hydrolysis (10). Activation yields two CMG assemblies that segregate on opposite template DNA strands and move past each other to establish two independent replication forks (11). The translocating CMG tracks along the leading-strand template in the 3′-to-5′ direction (12), with the N-tier ring of MCM2-7 at the leading edge of the advancing helicase (11). Strand separation is proposed to be achieved by a modified version of steric exclusion, whereby the lagging strand penetrates the N-tier of the CMG before separation (13).
The mechanism of translocation by which the CMG couples ATP hydrolysis to processive DNA unwinding is the current focus of intense research efforts. Based on structural analysis of bacteriophage, viral and bacterial systems (5,14-16), a consensus has emerged for a sequential rotary mechanism of DNA unwinding by replicative DNA helicases. In this mechanism, ATP is sequentially hydrolysed by successive ring subunits so that each ring position cycles through ATP, ADP and apo states. In turn, the ATP state determines allosterically the position of the DNA-binding loops, which adopt a staircase arrangement matching the DNA spiral bound within the ring pore. The sequential hydrolysis of ATP around the ring causes the coordinated motion of the DNA-binding loops, resulting in translocation of the DNA substrate through the ring.
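As a purely illustrative toy (not a kinetic model, and not taken from the cited studies), the sequential rotary scheme can be sketched as a pattern of nucleotide states that advances by one ring position per hydrolysis step. The starting pattern, the direction of rotation and the helper name `rotate_states` are assumptions made only for this illustration.

```python
from collections import deque


def rotate_states(states, steps=1):
    """Return the nucleotide-state pattern advanced by `steps` ring positions."""
    ring = deque(states)
    ring.rotate(steps)
    return list(ring)


# Hypothetical starting pattern for a six-subunit ring (an assumption,
# chosen so that ATP, ADP and apo states are all represented).
start = ["ATP", "ATP", "ATP", "ADP", "ADP", "apo"]

# One full turn of the ring: six steps bring the pattern back to the start.
trajectory = [rotate_states(start, k) for k in range(7)]
for step, pattern in enumerate(trajectory):
    print(step, pattern)
```

In this toy, every ring position passes through all three states over one full turn, which is the only property of the mechanism the sketch is meant to convey.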
A complicating feature when trying to analyse CMG translocation is that, unlike the homo-hexameric helicases of simpler organisms, the MCM2-7 motor of the CMG is a hetero-hexamer of six related but distinct subunits (17). Indeed, biochemical measures of DNA unwinding by purified fly CMG showed that ATP binding and hydrolysis are not equally important at all MCM ring interfaces (18,19). Furthermore, biological evidence in yeast shows that the importance of DNA binding is different among MCM subunits (20,21). Recent cryoEM analyses of yeast CMG have led to the proposal of alternative translocation mechanisms, based on 'pumpjack' or 'inchworm' movements of the N- and C-tier of the MCM ring (22,23). A recent structural study of the fly CMG in conditions of DNA-fork unwinding (19) imaged four distinct states of the helicase; the states formed the basis for an asymmetric model of DNA unwinding that accounted for the different roles of the MCM2-7 subunits in translocation.
The critical insights provided by these initial landmark studies have not been sufficient to settle the important issue of the mechanism of DNA translocation by the CMG, and therefore further structural investigations are needed. It is especially important to obtain high-resolution cryoEM maps that will allow the determination of accurate atomic models of the helicase bound to fork DNA substrates, to elucidate unambiguously key aspects of the mechanism of translocation on DNA such as the protein-DNA interface and the geometry of the ATP-binding sites. Equally important is to obtain high-resolution information on the interactions of the CMG with other core replisome components. Furthermore, published structural analyses focused on CMGs from simpler model systems such as yeast or Drosophila, and no structural evidence is currently available for vertebrate CMG.
Here, we report the cryoEM structure at 3.3 Å of human CMG bound to a fork DNA substrate in the presence of ATPγS. We also present the cryoEM structure of human CMG bound to AND-1, a core replisome component that acts as a platform for recruitment of replisome components to the replication fork. Unique features captured in our structures provide insights into DNA translocation and formation of larger replisome assemblies by the human CMG helicase.
ORFs were codon optimised for overexpression in human cells and designed with flanking restriction sites for insertion into the ACEMam1 and 2 vectors of the MultiMam transient system (24). MCM4 was encoded with an N-terminal His8-TEV tag, while Psf2 was encoded with a C-terminal TEV-2xStrepII tag. Multi-cassette constructs encoding MCM4-6-7, MCM2-3-5 and Cdc45-Psf1-Psf2-Psf3-Sld5 were generated making use of the I-CeuI/BstXI sites in the MultiMam vectors. For expression of human CMG-AND-1 (CMGA), the full-length ORF of human AND-1 (IMAGE cDNA clone 6514641) was cloned with an N-terminal 2xStrepII-TEV tag into the ACEMam2 construct expressing MCM2-3-5, while the C-terminal Strep tag fused to the Psf2 gene was removed.
For each flask, a separate transfection mixture was prepared as follows: a total of 480 µg recombinant plasmid DNA encoding CMG or CMGA was added to 40 ml fresh media. Equimolar ratios of the three relevant recombinant plasmids were used. 960 µg PEI (Polyethylenimine, linear, MW ∼25 000; Polysciences Inc, #23966) was added from a sterile 1 mg/ml stock prepared in 50 mM HEPES-KOH pH 7.5. The resulting transfection mixture was vortexed vigorously for 10 s, then incubated at room temperature for 15 min before being decanted into the cell culture.
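The quantities in the transfection mixture can be cross-checked with a few lines of arithmetic. This is a minimal sketch that reads the DNA and PEI masses as micrograms (the unit prefix is an assumption where it is not printed); the variable names are illustrative.

```python
# Per-flask quantities, read from the text (masses assumed to be in µg).
dna_ug = 480.0               # total recombinant plasmid DNA
pei_ug = 960.0               # linear PEI
media_ml = 40.0              # fresh media receiving the DNA
pei_stock_mg_per_ml = 1.0    # sterile PEI stock concentration

# Mass ratio of PEI to DNA in the mixture.
pei_to_dna_ratio = pei_ug / dna_ug

# Volume of PEI stock needed to deliver the stated mass (µg -> mg, then / conc).
pei_stock_volume_ml = (pei_ug / 1000.0) / pei_stock_mg_per_ml

# DNA concentration in the media before PEI addition.
dna_ug_per_ml = dna_ug / media_ml

print(pei_to_dna_ratio, pei_stock_volume_ml, dna_ug_per_ml)
```

Under these assumptions the mixture has a 2:1 PEI:DNA mass ratio and requires just under 1 ml of PEI stock per flask.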
Three hours post-transfection, 4 mM valproic acid (Sigma, #P4543) was added to the cultures, to enhance protein expression. Cultures were returned to shaking incubation for four days, before being harvested by centrifugation at 500 × g for 10 min at 10 °C. Cell pellets from ∼1.2 L cell culture were subsequently resuspended in 40 ml of chilled, sterile PBS supplemented with SIGMAFAST EDTA-free protease inhibitor cocktail (Sigma, #S8830). Washed cells were again harvested by centrifugation, then snap-frozen in 50 ml centrifuge tubes using liquid nitrogen, and stored at −80 °C.
Cell lysate was clarified by centrifugation at 45 000 × g for 1 hour at 4 °C, then filtered through a 5 µm Acrodisc syringe filter (PALL, #4650), a 0.45 µm HV Durapore vacuum filter (Millipore, #SCHVU01RE) and finally a 0.2 µm Minisart syringe filter (Sartorius, #16534-K). The filtered sample was applied to a 5-ml HisTrap HP column (GE Healthcare, #17-5248-02), pre-equilibrated with Buffer N and 40 mM imidazole using a peristaltic pump. All subsequent chromatography steps were performed on an ÄKTA Purifier (GE Healthcare). The loaded column was washed twice with 5 column volumes of Buffer N with 40 mM imidazole and bound protein was eluted in reverse flow using Buffer N with 300 mM imidazole. 2 ml fractions corresponding to ∼10 ml of the elution peak were pooled and applied to a 1-ml StrepTrap column (GE Healthcare, #28-9075-46) pre-equilibrated with Buffer N with 2 mM DTT. After sample loading, the column was washed with 10 column volumes of Buffer N with 2 mM DTT and 4 column volumes of Buffer S (25 mM HEPES pH 7.5, 50 mM potassium chloride, 5 mM magnesium acetate, 2 mM DTT). The CMG was eluted in reverse flow using Buffer S with 15 mM D-Desthiobiotin (Sigma, #D1411); 0.5 ml fractions corresponding to the ∼1.5 ml elution peak were analysed by SDS-PAGE and stored at 4 °C.
To purify the CMG-ATPγS-DNA complex, the CMG purification protocol was followed until the StrepTrap column loading step. After sample loading, the column was washed for 5 column volumes using Buffer N with 2 mM DTT, followed by 5 column volumes of Buffer S. To form the CMG-ATPγS-DNA complex, ∼900 µl of 4 µM forked DNA duplex and 100 µM ATPγS (Jena Bioscience, #NU-406-50) in Buffer S was applied to the column at 0.05 ml/min. The column was first washed in reverse flow for 4 column volumes with Buffer S and 100 µM ATPγS, and the CMG-ATPγS-DNA complex was eluted in reverse flow using Buffer S, 100 µM ATPγS and 15 mM D-Desthiobiotin. 0.5 ml fractions corresponding to the ∼1.5 ml elution peak were analysed by SDS-PAGE and stored overnight at 4 °C. The peak fraction was used for grid preparation.
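The on-column loading step implies specific amounts of DNA and nucleotide and a defined contact time. The sketch below works these out, assuming the volume is in microlitres and the concentrations are micromolar; the variable names are illustrative and the volume is the approximate value given in the text.

```python
# On-column complex formation, as read from the text (units assumed: µl, µM).
load_volume_ul = 900.0     # approximate volume applied to the StrepTrap column
dna_conc_uM = 4.0          # forked DNA duplex
atpgs_conc_uM = 100.0      # ATPγS
flow_ml_per_min = 0.05     # loading flow rate

load_volume_ml = load_volume_ul / 1000.0

# µM x ml gives nmol.
dna_nmol = dna_conc_uM * load_volume_ml
atpgs_nmol = atpgs_conc_uM * load_volume_ml

# Time needed to pass the whole volume over the column at the stated flow rate.
loading_time_min = load_volume_ml / flow_ml_per_min

print(dna_nmol, atpgs_nmol, loading_time_min)
```

With these readings, roughly 3.6 nmol of fork DNA and 90 nmol of ATPγS pass over the bound CMG during about 18 minutes of loading.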
To purify the CMGA complex, the CMG purification protocol was followed using cell cultures that had been transfected with the ACEMam2 construct expressing AND-1 and MCM2-3-5.

Preparation of fork DNA
A fork DNA substrate was generated using two 70 bp oligonucleotides, based on a design by Petojevic et al. (25). Oligonucleotides were supplied PAGE-purified by IDT and resuspended to 100 µM in TE buffer pH 8 (Invitrogen, #AM9849). Equimolar amounts were mixed in 200 µl aliquots at 40 µM in TE supplemented with 50 mM NaCl, prior to annealing. The resulting fork DNA consisted of a 40 bp duplex region and a 30 nt fork comprising a 3′ poly-dT tail for CMG loading in the correct orientation for unwinding and a 5′ GC-rich tail that does not bind CMG (25).

CryoEM sample preparation and data collection
UltrAuFoil R1.2/R1.3 300 mesh gold grids (Quantifoil Micro Tools GmbH, #N1-A14nAu30-01) were glow-discharged twice (once on each side) for 1 min using a PELCO easiGlow system (0.4 mbar, 30 mA, negative polarity). Grid samples were then prepared using a Vitrobot Mark IV robot (FEI), set to 100% humidity, 4 °C, 2.5 s blot time and -10 blot force. 3 µl of CMG-ATPγS-DNA complex was applied to both sides of the grid prior to vitrification in liquid ethane.
CryoEM grids of CMGA complex were prepared as above, with two variations. Firstly, ATPγS was added to the protein sample at a final concentration of 1 mM approximately 1 hour before grid preparation. Secondly, a 5% glutaraldehyde cross-linking solution was added in 1:10 volume ratio to the protein sample ∼10 min before grid preparation. The protein sample was incubated on ice at all times.
Grid samples were initially screened on a Talos Arctica (FEI) operating at 200 keV. High-resolution data were subsequently acquired for a single grid on a Titan Krios (FEI) operating at 300 keV. Automated data collection for Single Particle Analysis (SPA) was performed using the EPU package (FEI). Grid preparation, screening and data collection were performed at the CryoEM facility in the Department of Biochemistry.

CryoEM data processing
Statistics for data collection and processing are reported in Supplementary Table S1.
CMG-ATPγS-DNA. Data processing was performed on the Cambridge Service for Data-Driven Discovery (CSD3) high-performance computer cluster, using RELION-3 (26). Motion correction for all 3694 movies was performed using 5 × 5 patch alignment in MotionCor2 (27) with 'InFm-Motion' activated to take into account frame motion blurring. CTF correction was performed using GCTF (28) against non-dose weighted averages, with 'equiphase averaging' activated. Micrographs yielding resolution limit estimates of >6 Å were discarded. Laplacian-of-Gaussian (LoG)-based auto-picking with a default threshold of -0.1 was used to pick a total of 845,596 particles across 3619 micrographs. This particle set was used for all downstream processing without the use of template-driven picking procedures. Particles were initially extracted 4× binned with a box size of 90 pixels (4.28 Å/pixel). For iterative rounds of 2D classification, the option to 'Ignore CTFs until first peak' was activated. This yielded a final set of 365,202 high-resolution CMG particles, which was then subjected to 3D classification using an internally generated initial model. Particles from three out of four resulting classes were pooled (297,183 total), re-extracted without binning (360 pixels, 1.07 Å/pixel) and refined against a suitably re-scaled initial model. Subsequent soft mask generation and re-refinement with solvent flattened FSCs yielded a first high-resolution map of CMG at 3.50 Å, with clear density for ssDNA in the pore. CTF refinement, Bayesian particle polishing and additional masked refinement of the polished particles using solvent flattened FSCs improved the resolution to 3.24 Å.
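The binned and unbinned extraction parameters quoted above are related by simple scaling, which the following sketch makes explicit. The variable names are illustrative; the Nyquist limit is computed from the standard relation of twice the pixel size and is not a value quoted in the text.

```python
# Extraction parameters as quoted in the text.
unbinned_pixel_A = 1.07    # Å/pixel for unbinned particles
unbinned_box_px = 360      # box size for unbinned particles (pixels)
binning = 4                # binning factor used for initial extraction

# Binning multiplies the pixel size and divides the box size by the same factor.
binned_pixel_A = unbinned_pixel_A * binning
binned_box_px = unbinned_box_px // binning

# Physical edge length of the box is the same before and after binning.
box_edge_A = unbinned_box_px * unbinned_pixel_A

# Best resolution representable at a given sampling (Nyquist limit).
nyquist_binned_A = 2 * binned_pixel_A
nyquist_unbinned_A = 2 * unbinned_pixel_A

print(binned_pixel_A, binned_box_px, box_edge_A)
print(nyquist_binned_A, nyquist_unbinned_A)
```

The calculation reproduces the quoted 90-pixel box at 4.28 Å/pixel and shows why re-extraction without binning is needed before refinement to better than about 8.6 Å.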
Inspection of the polished map identified some unresolved heterogeneity in the structure, particularly in the C-tier ATPase domains of the MCM ring. The unbinned polished particle set was therefore subjected to a further round of 3D classification (3 classes) without alignments. The majority of the particles (213,527) formed a single DNA-bound CMG class, which was used for further processing. A second DNA-bound CMG class (30,696 particles) was deemed poor quality and discarded, while a third CMG class (52,960 particles) did not have bound DNA.
The high-resolution DNA-bound CMG class was subjected to refinement with solvent flattened FSCs, yielding a final resolution of 3.29 Å and providing clearer definition of the C-terminal ATPase domains. This model was used for the majority of model building. However, ATPase domains for MCM2 and 5 still exhibited some heterogeneity. To resolve this, all particles were first shifted to the centre of mass of the MCM2-7 ATPase ring and density outside this ring was subtracted from the shifted particles, facilitating focussed 3D classification (without alignment) of the MCM2-7 C-tier. The majority of particles (181,401, 85%) populated the first of two classes. This presented improved definition for the ATPase domains of MCM2 and 5, and also revealed good density for the C-terminal domains of MCM2 and 6. Masked, solvent-flattened refinement of this class resulted in a C-tier map extending to 3.41 Å that was used to complete model building.
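The particle counts reported for the CMG-ATPγS-DNA dataset can be checked for internal consistency with a short bookkeeping sketch. The stage labels and variable names are illustrative; all counts are taken from the text above.

```python
# Particle counts at successive processing stages, as quoted in the text.
stages = [
    ("auto-picked", 845_596),
    ("after 2D classification", 365_202),
    ("pooled after first 3D classification", 297_183),
    ("DNA-bound class used for the main map", 213_527),
    ("focussed C-tier class", 181_401),
]

# The three classes of the second 3D classification should account for
# every particle in the pooled, polished set.
second_3d_classes = {
    "DNA-bound, used": 213_527,
    "DNA-bound, poor quality": 30_696,
    "no bound DNA": 52_960,
}
second_3d_total = sum(second_3d_classes.values())

# Fraction of particles retained at each step relative to the previous one.
retention = [
    (name, count, count / previous)
    for (name, count), (_, previous) in zip(stages[1:], stages[:-1])
]
for name, count, fraction in retention:
    print(f"{name}: {count} ({fraction:.1%} of previous stage)")
```

The three class sizes sum to the 297,183 pooled particles, and the focussed C-tier class corresponds to the quoted 85% of the DNA-bound set.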
CMGA. Data processing was performed using Warp (29) for motion correction, CTF estimation, automated particle picking and particle extraction, and cryoSPARC (30) for all subsequent steps. A total of 140,128 particles were automatically picked and extracted from 1844 micrographs. An initial round of 2D classification identified 6 classes that presented sensible densities; contributing particles were used to generate two distinct initial models that were characterised as CMG (18,439 particles) and AND-1 (9600 particles). The CMG initial model exhibited weak additional density adjacent to the GINS/Cdc45 subunits, suggesting a mixed population of CMG and CMGA in the data. Accordingly, the CMG initial model and particle set were subjected to hetero-refinement for 2 classes, resulting in a CMG-only particle set (7860 particles) and a CMGA particle set (10,579 particles). The latter was again subjected to hetero-refinement for two classes to remove any remaining CMG-only particles, resulting in an improved CMGA map derived from 8888 particles.
The highest quality CMG, CMGA and AND-1 models derived from the initial 2D classification were then used to drive iterative rounds of hetero-refinement against the entire 140,128-particle dataset. Once suspected AND-1 particles had been eliminated, CMG particles were gradually removed from the CMGA dataset until a clean set of 15,393 particles produced a map for CMGA with continuous density for the trimeric AND-1 SepB domain. This set was finally subjected to non-uniform refinement to deliver a final map with a resolution of 6.77 Å.

Model building and refinement
The crystal structures of human GINS (31) (PDB ID: 2E9X) and Cdc45 (32) (5DGO) were docked into the 3.29 Å CMG-DNA map using UCSF Chimera (33), and manually edited in Coot (34). Homology models for full-length human MCM2-7 were generated using PHYRE2 (35), based on the cryoEM structure of the yeast MCM2-7 double hexamer (3JA8) (36) and docked into the map. Both N-terminal and C-terminal domains of the MCM homology models required extensive rebuilding, which was performed with a combination of remodelling using the Namdinator server (37) and manual rebuilding. Single-stranded DNA was built in Coot (34). Models for the ATPase domains of MCM2 and 5, and for the C-terminal domains of MCM2 and 6, were completed using the focussed C-terminal map. The complete CMG-ATPγS-DNA model was subsequently refined using phenix.real_space_refine (38) with bond-length and angle restraints for bound ATPγS, magnesium and zinc ions.
For the CMGA structure, the CMG structure and the crystal structure of the AND-1 trimer (5OGS) (32) were docked in the map and subjected to rigid-body refinement in phenix.real_space_refine (38). The N-tier ring of MCM proteins was treated as a single rigid body, whereas individual ATPase domains in the C-tier were allowed to move independently.
Statistics for real space refinement are reported in Supplementary Table S2. Figures were prepared using UCSF Chimera (33) and ChimeraX (39).

Expression and purification of human CMG-ATPγS-DNA
To maximise our chances of producing correctly-assembled human CMG, we used transient transfection of serum-free suspension HEK293 cells with a plasmid system encoding all 11 subunits of the CMG assembly. After co-expression of MCM2-7, Cdc45 and GINS, human CMG was purified by Ni2+- and Streptactin-affinity chromatography (Supplementary Figure S1A). A large endogenous protein that co-purified at sub-stoichiometric levels with the CMG over the two-step purification was identified as AND-1, a known replisome component. AND-1 co-purification indicates a tight constitutive association with the CMG in the human replisome, in agreement with the known association of AND-1's orthologue Ctf4 with the yeast CMG (40).
To capture a high-resolution snapshot of human CMG poised to translocate on a fork DNA substrate, we decided to use the ATP analogue ATPγS. Streptactin-bound CMG was incubated with buffer containing a fork DNA substrate and ATPγS before elution with desthiobiotin (Supplementary Figure S1B). The DNA consisted of a 40 bp duplex region with 30 nt tails and resembled closely a fork DNA that had been designed to measure CMG's helicase activity, with a 3′ poly-dT tail for helicase loading in the correct orientation for fork unwinding and a 5′ GC-rich tail that inhibits helicase binding (25).
CryoEM data were collected on a Titan Krios operating at 300 keV using a K2 Summit detector and processed with RELION-3 (41). After 2D and 3D classification and refinement, we obtained a 3.29 Å map of CMG-ATPγS-DNA from a set of 213,527 particles, and a 3.41 Å map of the CMG C-tier, comprising the ring of AAA+ ATPase domains, after masking of the N-tier (Supplementary Figures S2 and S3). Both maps were used to build a molecular model of CMG-ATPγS-DNA. The excellent quality and high resolution of the map allowed an accurate description at atomic level of the protein-DNA interface and ATP-binding sites of the human CMG (Figure 1A).

Overall structure
The 11-subunit assembly of the human CMG shows the familiar architecture first demonstrated for the yeast and Drosophila CMG (9,11): a two-tiered ring of MCM2-7 proteins, with the Cdc45 and GINS coactivators bound together to the N-tier portion of MCM2, MCM5 and MCM3, so that Cdc45 faces the MCM2-5 interface of ATPase domains (Figure 1B). Each AAA+ ATPase domain contains a nucleotide-binding site at the subunit interface in the C-tier ring, and interacts with DNA via two β-hairpin loops named pre-sensor-1 (PS1) and helix-2 insert (H2I) (42) that line the pore of the C-tier ring (Supplementary Figure S4A). Flexible anchorage between MCM subunits is provided by a domain-swapped helix in each ATPase domain, which tethers each MCM to its neighbour subunit (Supplementary Figure S4B).
A continuous chain of eleven thymidine nucleotides is bound in a right-handed B-form spiral within the C-tier channel of ATPase domains. The single-stranded (ss) DNA contacts all six MCM subunits, from MCM2 with its 5′-end to MCM5 with the 3′-end, and thus traverses all MCM interfaces except MCM2-5 (Figure 2A, B). No clear density is visible within the N-tier of the CMG for either single-stranded DNA or the double-stranded portion of our fork DNA substrate. The likely explanation for this observation is that the slowly-hydrolysable ATPγS nucleotide has permitted the engagement of the CMG helicase with the leading-strand portion of the fork DNA substrate but prevented its translocation to the ss-dsDNA nexus.
Three of the six MCM interfaces in the ring (MCM6-4, MCM4-7 and MCM7-3) are bound to ATPγS, whereas the MCM3-5 and MCM2-6 interfaces contain ADP as a product of ATPγS hydrolysis, and the MCM2-5 interface is empty (Figure 2B). In accordance with the apo status of MCM5, the MCM2-5 gate remains ajar and the MCM2-7 C-tier ring adopts a shallow right-handed spiral conformation (Supplementary Figure S5A, B).
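The nucleotide occupancy described above can be written down as a small lookup table, which makes the ring order and the counts easy to verify. This is only a bookkeeping sketch: the dictionary layout and the use of `None` for the empty site are illustrative choices, while the interface names and states follow the text.

```python
from collections import Counter

# Nucleotide state at each MCM interface, listed in ring order.
# None marks the empty (apo) MCM2-5 interface.
interfaces = {
    ("MCM2", "MCM6"): "ADP",
    ("MCM6", "MCM4"): "ATPγS",
    ("MCM4", "MCM7"): "ATPγS",
    ("MCM7", "MCM3"): "ATPγS",
    ("MCM3", "MCM5"): "ADP",
    ("MCM5", "MCM2"): None,
}

# Tally of states around the ring.
counts = Counter(interfaces.values())
occupied = sum(1 for state in interfaces.values() if state is not None)

# Check that consecutive interfaces share a subunit, so the list closes a ring.
pairs = list(interfaces)
ring_is_closed = all(
    pairs[i][1] == pairs[(i + 1) % len(pairs)][0] for i in range(len(pairs))
)

print(counts, occupied, ring_is_closed)
```

The tally gives three ATPγS sites, two ADP sites and one empty site, that is, five occupied interfaces out of six.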
In addition to their N- and C-tier domains, each MCM subunit contains a smaller C-terminal winged helix (WH) domain. The WH domains of MCM2 and MCM6 were well resolved in the focused C-tier map and could therefore be modelled in the density (Supplementary Figure S5C). Density for the WH domain of MCM5 could be identified in the lumen of the C-tier pore, but was not of sufficient quality to allow modelling. The similar MCM2 and MCM6 WH domains sit on the rim of the C-tier and interact with each other with approximate 2-fold symmetry. The first 14 proline-rich amino acids of the MCM3 isoform used in our study bind at the interface between the MCM3 N-tier and the GINS Psf3, likely extending and stabilising the GINS-MCM ring interface (Supplementary Figure S6).

DNA binding
In the structure, the ssDNA is embedded within the pore of the C-tier ring (Figure 2A). Nine of the eleven nucleotides from the 3′-end of the ssDNA adopt a right-handed spiral conformation that follows closely that of B-form DNA. Four MCM subunits, MCM6, 4, 7 and 3, make an identical set of contacts with DNA, involving both PS1 and H2I loops. MCM6, 4, 7 and 3 interact with four nucleotides each, with a two-nucleotide offset between contiguous subunits (Figure 2B). The DNA-binding loops are arranged in a staircase matching the DNA spiral, from MCM2 at the top of the staircase (5′-end of the DNA) to MCM5 at the bottom (3′-end) (Figure 2C). DNA binding correlates with nucleotide occupancy, as nucleotide-bound MCM6, 4, 7 and 3 make a full set of interactions with DNA.
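The footprint geometry described above (four nucleotides per subunit, offset by two nucleotides between neighbours) can be laid out explicitly. The sketch below uses relative nucleotide numbering that starts at 0 for the first nucleotide contacted by MCM6; this numbering, and the placement of MCM6 nearest the 5′-end following the staircase order, are assumptions for illustration rather than residue-level assignments from the model.

```python
# Subunits making a full set of DNA contacts, ordered from the 5' side
# to the 3' side of the staircase.
full_binders = ["MCM6", "MCM4", "MCM7", "MCM3"]
footprint_nt = 4    # nucleotides contacted per subunit
offset_nt = 2       # shift between contiguous subunits

# Relative nucleotide indices contacted by each subunit (0 = first MCM6 contact).
footprints = {
    name: set(range(i * offset_nt, i * offset_nt + footprint_nt))
    for i, name in enumerate(full_binders)
}

# Total stretch of ssDNA covered by the four full footprints.
covered = set().union(*footprints.values())

# Nucleotides shared between each pair of contiguous subunits.
shared = [
    len(footprints[a] & footprints[b])
    for a, b in zip(full_binders, full_binders[1:])
]

print(sorted(covered), shared)
```

Under these assumptions the four full footprints span ten consecutive nucleotides, with two nucleotides shared between each pair of neighbouring subunits.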
Within each four-nucleotide footprint, an invariant serine at the start of loop H2I (MCM6 S425, MCM4 S539, MCM7 S410, MCM3 S419) is hydrogen bonded to the 5′-terminal phosphate, whilst an invariant lysine in loop PS1 (MCM6 K486, MCM4 K600, MCM7 K471, MCM3 K480) is ion paired to the subsequent phosphate (Figure 3 and Supplementary Figures S7, S8A). The role of the invariant PS1 lysine is remarkably similar to that of K506 of the E1 papillomavirus replicative DNA helicase (5). The ssDNA is kept in close contact with each MCM subunit by two hydrogen bonds between phosphates of the second and third nucleotide in each binding site and main-chain nitrogens of the PS1 residue after the invariant lysine and of a first-strand residue in the H2I hairpin (Figure 3).
Besides these polar contacts, the protein-DNA interface has substantial hydrophobic character: small aliphatic side chains of valine and alanine in both H2I and PS1 loops pack against the ribose-phosphate backbone of the DNA, creating a continuous hydrophobic surface in the C-tier pore that matches the spiral of the DNA (Figure 3). In addition to making extensive contacts with the DNA backbone, the MCM subunits use the H2I loop to interact with the bases: a pair of conserved H2I residues, consisting of a basic and an aromatic/hydrophobic amino acid six residues apart
Nucleic Acids Research, 2020, Vol. 48
Correct positioning of the invariant serine at the start of the H2I loop for interaction with the phosphate backbone of the DNA requires adoption of a helical conformation by the five residues succeeding the serine (H2I αN; Figure 3 and Supplementary Figure S7). This local helical folding is driven by anti-parallel β-strand pairing of the two residues preceding the serine with the second β-strand of the PS1 loop in the preceding MCM subunit. This inter-subunit interaction helps merge the H2I and PS1 loops of individual MCMs into a continuous DNA-binding staircase that extends around the pore in the C-tier, as noted recently for the archaeal homo-hexameric MCM ring (44). As expected for the MCM subunit at the bottom of the staircase, H2I αN is disordered in MCM5.

ATP binding and hydrolysis
In the structure, five of the six ATP-binding sites in the MCM2-7 ring are occupied by a nucleotide (Figure 4 and Supplementary Figure S8B). The ATP-binding sites of MCM6, MCM4 and MCM7 contain ATPγS, with a Mg2+ ion coordinated between β and γ phosphates. Interestingly, the cryoEM map shows clearly that MCM2 and MCM3 have hydrolysed their ATPγS to ADP. Furthermore, the MCM5 nucleotide-binding site is empty. It is possible that MCM5 had hydrolysed ATPγS and released ADP, or alternatively that ATPγS was never bound: both possibilities are compatible with the observed open state of the MCM5-2 interface. All residues previously identified as involved in ATP binding and hydrolysis engage as expected with the ATPγS moieties bound at the three MCM7-3, MCM4-7 and MCM6-4 interfaces, including Walker A and B residues and sensor-1 asparagine of the P-loop subunit, and arginine finger, sensor-2 arginine and sensor-3 histidine residues in the contiguous 'sensor' subunit (Figure 4 and Supplementary Figure S9). The two ADP-bound interfaces between MCM3-5 and MCM2-6 show a very similar set of contacts, except that the arginine fingers in MCM5 (R513) and MCM6 (R529) are partially disordered, likely as a consequence of the absence of the γ phosphate. A noteworthy feature of ATP binding by MCM2-7 is the extensive range of hydrophobic interactions that shield the aromatic base of the nucleotide from solvent (Figure 4). These interactions include a 'sandwich' interaction made by an invariant isoleucine on one side of the base and two aliphatic residues in the domain-swapped helix on the other side (Figure 4).
In addition, the adenine base engages in Watson-Crick like hydrogen bonding with the main-chain nitrogen and carbonyl moieties of a residue preceding a conserved 'positive φ' glycine in the linker between helices 2 and 3 of the ATPase domain (Supplementary Figure S10).
An unresolved question is whether ATP-coupled conformational changes during DNA translocation are limited to movements of the DNA-binding loops or involve the entire ATPase domain. Structural superposition of the ATPase domains shows that the DNA-binding loops of nucleotide-bound MCM6, 4, 7 and 3 are in a similar position relative to their ATPase domains (Figure 5A, B); in contrast, the H2I loop of apo MCM5 at the 3′-end of the DNA occupies a lower position and its H2I αN is disordered (Figure 5C). These observations suggest that DNA translocation might be achieved by a composite mechanism of whole-domain movements during the ATP-coupled translocation cycle, and rearrangement of the H2I loop at the end of the cycle, as H2I detaches itself from the staircase and its ATPase domain re-engages DNA at the 5′-end.
The ATP status of the ADP-bound MCM2 subunit appears anomalous given its position at the top of the ring staircase. Several indicators point to MCM2 acting as a 'seam subunit' (19) that has only partially engaged with the rest of the C-tier ring: the smaller interface area with MCM6 (1391 Å², instead of ∼2000 Å² for the other nucleotide-occupied MCM interfaces), the disordered conformation of its DNA-binding element H2I αN (Supplementary Figure S11A), and the higher B value attained during real-space refinement. The ADP moiety of MCM2 is also unusual in the way it sits in the P-loop, as the nucleotide is shifted so that its β-phosphate occupies the position occupied by the γ-phosphate in the ATPγS-bound interfaces (Supplementary Figure S11B).
Overall, the observations on ATP status and DNA binding in the MCM C-tier ring are consistent with a sequential rotary mechanism of ATP hydrolysis as the basis for translocation of the human CMG. The structural integrity of the C-tier during translocation is maintained by a domain-swapped helix (Supplementary Figure S4B) that provides a flexible tether between contiguous MCM subunits, in a fashion similar to that recently described for the replicative gp4 DNA helicase (16).

Interaction with AND-1
Analysis of purified human CMG overexpressed in HEK293 cells revealed the co-purification of substoichiometric amounts of endogenous AND-1, a known replisome factor and the human orthologue of yeast Ctf4 (Supplementary Figure S1). Our previous work had shown that yeast Ctf4 acts as a recruitment hub for replisome proteins, tethering multiple factors at the fork via its trimeric structure (45,46). AND-1 shares its oligomeric nature with Ctf4, although it appears to have a distinct mechanism of binding to its replisome partner, Pol ␣/primase (47).
Co-purification of endogenous AND-1 indicated a strong constitutive interaction with the human CMG complex. We therefore decided to co-express AND-1 together with the components of the human CMG, and succeeded in purifying a 14-subunit CMG assembly which we refer to as CMGA (Supplementary Figure S12). CryoEM analysis of CMGA using Warp for particle picking (29) and CryoSparc for image reconstruction (30) yielded a 6.77Å map that was readily interpretable and permitted the unambiguous docking of the high-resolution structure of human CMG and the crystal structure of the AND-1 trimer that we reported earlier (47) (Supplementary Figure S13).
The structure shows that the disk-shaped AND-1 trimer docks edge-on at a near perpendicular angle onto the leading face of the CMG (Figure 6). Despite AND-1 being full-length, only the SepB trimerisation domain (47,48) of AND-1 is visible in the map, indicating that both the N-terminal WD-repeat domain and the extended C-terminal region spanning the HMG box are flexibly oriented in the trimeric structure. Relative to the CMG, the trimeric AND-1 disk is arranged so that its N-terminal segments are located ahead of the fork and in proximity of the parental double-stranded DNA. In contrast, the helical structure of the SepB domain and the C-terminal extensions project away from the CMG (Figure 6).
AND-1 binds at the perimeter of the CMG, engaging both Cdc45 and GINS with the β-propeller of one of its SepB-like domains (Figure 7). The CMG-AND-1 interface is formed by the B-domain of Psf2 and the helical portion of Cdc45 linking its two DHH domains, which together occupy the concave surface formed by blades 1 and 6 of the SepB β-propeller (Figure 7). The interface buries only 1087 Å², a surprisingly small area for a constitutive interaction. The limited resolution of our structure is insufficient for unambiguous identification of interface amino acids. However, we can determine that the interface is of mixed hydrophobic and hydrophilic nature, and that the tight binding of AND-1 to the CMG despite the relatively limited interface might be driven by the presence of charge-charge interactions that become solvent-excluded upon CMGA formation.

DISCUSSION
In this paper, we have used cryoEM to capture a high-resolution view of the human CMG bound to a fork DNA substrate in the presence of ATPγS. Our map of human CMG-ATPγS-ssDNA allowed us to visualise unambiguously critical features of the complex, such as its protein-DNA interface and the nucleotide-binding sites, and to represent them in an accurate atomic model. We have also reported an intermediate-resolution structure of the CMGA assembly, which described the mode of interaction of CMG with the core replisome factor AND-1.

DNA binding
In our structure, all six MCM subunits contact ssDNA, spanning a total of 11 nucleotides. The footprint of an MCM subunit on ssDNA covers four nucleotides, rather than two as previously reported (19,49), with two overlapping nucleotides between neighbouring subunits (Figure 3). We believe that the difference in footprint relative to previous reports for the yeast and Drosophila CMGs (19,49) is due to the improved resolution of our study. As the MCM C-tier amino acids at the protein-ssDNA interface are invariant in yeast and Drosophila MCMs (Supplementary Figure S7), our description of the protein-DNA footprint is likely to be generally applicable.
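The footprint arithmetic described above can be illustrated with a minimal sketch. It assumes, as a simplification that is not stated in the text, that footprints are perfectly regular along the ssDNA; the function name `span_nt` and its parameters are hypothetical and introduced only for illustration.

```python
def span_nt(n_subunits: int, footprint: int = 4, overlap: int = 2) -> int:
    """Nucleotides covered by n consecutive subunits with regular footprints.

    ASSUMPTION: every subunit covers `footprint` nucleotides and shares
    `overlap` of them with its neighbour (values taken from the text).
    """
    if n_subunits <= 0:
        return 0
    # The first subunit contributes a full footprint; each further subunit
    # adds only the nucleotides it does not share with the previous one.
    return footprint + (n_subunits - 1) * (footprint - overlap)


OBSERVED_TOTAL_NT = 11  # total ssDNA span reported for the structure

full_contact_span = span_nt(4)  # MCM6, 4, 7 and 3 (full set of contacts)
six_subunit_span = span_nt(6)   # hypothetical case: all six with full footprints

print(full_contact_span, six_subunit_span, OBSERVED_TOTAL_NT - full_contact_span)
```

Under these assumptions, the four fully engaged subunits would account for most of the observed 11-nucleotide span, with the remainder attributable to the partial contacts made by the subunits at the two ends of the staircase. This is an arithmetic illustration only, not an analysis of the atomic model.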
Interaction of MCM6, 4, 7 and 3 with DNA takes place via an identical set of contacts mediated by both PS1 and H2I DNA-binding loops (Figure 3). A significant difference is that MCM6 and MCM2 use aromatic residues at a conserved H2I position to unstack the nucleotides at the 5′-end of the DNA and disrupt the B-form DNA. These aromatic residues intermesh with consecutive bases much as the teeth of a cogwheel; such contacts appear well suited to avoid slippage and transmit torque when an MCM subunit engages the leading DNA strand emerging from the N-tier, at the top of the staircase. Overall, the PS1 and H2I loops within the C-tier ring follow closely the DNA spiral, lowering steadily their vertical reach from MCM2 at the top of the binding staircase to MCM5 at the bottom (Figure 2).
Earlier structural work on yeast and fly CMG had shown that DNA can be bound via two different sets of MCM subunits: MCM6, 4 and 7 (22) or MCM2, 3, 5 and 6 (11,50). A recent cryoEM analysis of fly CMG in the act of translocating on DNA revealed the existence of several different conformational states, which appear to encompass and extend the previously described DNA-bound states of the CMG (19). In light of this analysis, our CMG-DNA structure would most likely correspond to state 2B, in which MCM6, 4, 7 and 3 contact DNA and are bound to ATP, with MCM2 in the process of exchanging ADP for ATP and re-engaging with DNA at the 5′-end. Thus, the emerging evidence from this wealth of structural data strongly indicates multiple modes of asymmetric DNA binding in the C-tier ring as a key feature of DNA translocation by the CMG.

ATP site occupancy and hydrolysis
The site occupancy and hydrolysis status of ATPγS in our CMG-ATPγS-DNA structure is in general agreement with the sequential rotary model of ATP utilisation but also reveals some unexpected findings. Nucleotide occupancy is known to correlate with DNA binding, and in our structure ATPγS-bound MCM6, 4 and 7 interact with DNA. In the model, ring subunits at or near the bottom of the DNA-binding staircase have hydrolysed ATP; accordingly, the MCM3-5 interface has converted the slowly hydrolysable ATPγS to ADP (Supplementary Figure S8E), whereas MCM5 is in the apo state. These observations are in line with the sequential rotary model; however, they represent an apparent discrepancy with the model of Eickhoff and colleagues (19), in which ATP binding by MCM3, but not its hydrolysis, is important for asymmetric translocation mediated by an MCM3-5 dimer. Interestingly though, the ADP-bound MCM3-5 interface remains as extensive as the three ATPγS interfaces of MCM6-4, MCM4-7 and MCM7-3 (2078 Å² of buried surface area versus an average value of 2034 Å² for the ATPγS interfaces). Although we cannot exclude that ATPγS hydrolysis occurred before DNA binding, we consider it highly unlikely, for two reasons: the nucleotide states around the C-tier ring are mostly in line with the DNA-binding state, as observed previously; and ATPγS was added at the same time as DNA, making it improbable that the CMG might have hydrolyzed ATPγS before DNA binding.
Furthermore, in our structure the MCM5-2 interface is devoid of nucleotide and the MCM2-5 gate is open. Given the finding that MCM3 has hydrolyzed ATPγS, MCM5 might be one step ahead in the ATP cycle and may have released its ADP, in preparation for re-joining the C-tier ring at the top of the staircase. That MCM3 and MCM5 of all six subunits should have hydrolyzed ATPγS is in agreement with the effect of Walker A K-to-A mutations in fly CMG, showing that loss of ATP binding by MCM3 and MCM5 caused the largest decrease in ATP hydrolysis rates (18), and with a four-fold reduction in DNA unwinding caused by an 'arginine finger'-to-alanine mutation in fly MCM5 (19). Whether the open state of the apo MCM2-5 interface represents an intermediate state in the translocation cycle, or rather a stalled or paused state of the helicase, remains to be established.

DNA translocation
Our structure captures a high-resolution snapshot of the CMG trapped on fork DNA with ATPγS and does not provide conclusive evidence concerning models of asymmetric translocation. The staircase arrangement of the DNA-binding loops and the ATP-hydrolysis status in the MCM ring both support a sequential rotary mechanism of DNA translocation for the human CMG. Differences from the proposed model of asymmetric translocation (19), such as the observed ATP-hydrolysis status of MCM3, remain to be explained and might be species-specific.
The ADP-bound status of MCM2, at the top of the staircase and interacting with the 5′-end of the ssDNA, is apparently inconsistent with a sequential rotary model of ATP hydrolysis. We have already described several lines of evidence indicating that MCM2 appears to behave as a 'seam' subunit (19). The observed nucleotide hydrolysis by MCM2 might have been prompted by prolonged idling of the helicase upon incubation with the slowly hydrolyzable ATPγS. An intriguing possibility is that this might represent a clue to the CMG's ability to translocate in reverse: evidence that the CMG can backtrack on ssDNA has been provided by recent single-molecule studies (51).
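The comparison between the observed nucleotide states and a sequential rotary scheme can be made explicit with a toy sketch. It is not the analysis used in this study: the idealised pattern `IDEAL_STATES` (ATP at the top of the staircase, ADP and then apo at the bottom) is an assumption made for illustration, ATPγS is treated as an ATP mimic, and all names are hypothetical.

```python
from collections import deque

# Subunit order from the top (MCM2) to the bottom (MCM5) of the DNA staircase,
# following the interfaces named in the text (MCM2-6, 6-4, 4-7, 7-3, 3-5, 5-2).
STAIRCASE = ["MCM2", "MCM6", "MCM4", "MCM7", "MCM3", "MCM5"]

# ASSUMPTION: idealised nucleotide state at each staircase position, top to bottom.
IDEAL_STATES = ["ATP", "ATP", "ATP", "ATP", "ADP", "apo"]

# Nucleotide states reported for the structure (ATPgammaS recorded here as "ATP").
OBSERVED = {
    "MCM2": "ADP", "MCM6": "ATP", "MCM4": "ATP",
    "MCM7": "ATP", "MCM3": "ADP", "MCM5": "apo",
}


def rotary_step(staircase):
    """One toy translocation step: the bottom subunit rejoins at the top."""
    ring = deque(staircase)
    ring.rotate(1)  # the last element becomes the first
    return list(ring)


def mismatches(staircase, observed):
    """Subunits whose observed state differs from the idealised state at their position."""
    return [s for s, ideal in zip(staircase, IDEAL_STATES) if observed[s] != ideal]


print(mismatches(STAIRCASE, OBSERVED))  # subunits deviating from the idealised pattern
print(rotary_step(STAIRCASE))           # staircase order after one toy step
```

In this simplified picture, MCM2 is the only subunit whose reported state departs from the idealised pattern, and a single step places MCM5 at the top of the staircase, in line with the suggestion that apo MCM5 is preparing to re-join the ring there.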
Insight into asymmetry in MCM2-7 behavior comes from analysis of the solvation free-energy for formation of the interfaces between contiguous ATPase domains (ΔⁱG) in the C-tier ring, using the EBI PISA server (52). The analysis shows striking differences in ΔⁱG values among MCM interfaces. Although the three ATPγS interfaces, as well as the ADP-bound MCM3-5 interface, bury similar surface areas (between 1975 Å² and 2140 Å²), formation of the MCM4-7 ATPγS and MCM6-4 ATPγS interfaces shows more than 2-fold higher energy gains and 7-fold lower P-values than for the MCM7-3 ATPγS and the MCM3-5 ADP interfaces (Supplementary Figure S14). Thus, the ΔⁱG analysis indicates that the contacts binding together the MCM4-7 and 6-4 interfaces are much tighter and more specific than in the other MCM interfaces. These observations might provide a structural basis for the finding that human MCM4, 6 and 7 can be recovered as a stable heterotrimeric complex from HeLa cells (53). They further suggest that MCM subunits 4, 6 and 7 might behave as a rigid body in the asymmetric mode of DNA translocation. This would be in agreement with biochemical evidence that loss-of-ATP-binding K-to-A mutations in fly MCM6 and 4 cause only modest reductions in DNA unwinding (18), and that loss of ATP hydrolysis at the MCM6-4 and MCM4-7 interfaces caused by alanine mutation of the 'arginine finger' in MCM4 and 7 has equally modest effects on unwinding (19).
The evolutionary invariance of catalytic residues in all six MCM2-7 ATPases represents a challenge for models of asymmetric translocation that consider ATP binding and hydrolysis to be important for only a subset of MCM subunits. Symmetric translocation is adequate for DNA replication by homo-hexameric DNA helicases in viruses, bacteria and archaea, implying that it represents the evolutionary consensus for DNA translocation. Differentiation of the eukaryotic MCM into six distinct proteins may have evolved to endow the CMG with its unique mode of loading and activation and possibly termination. Asymmetric translocation might therefore represent an adaptation to cope with MCM sequence diversification rather than a process of optimisation. Consequently, the eukaryotic CMG might be capable of considerable mechanistic flexibility with regard to the role played by its MCM subunits during translocation.

CMGA
Our cryoEM structure of the human CMGA elucidates the interaction mechanism of the replisome component AND-1 with the CMG. AND-1 does not contact the MCM proteins and binds to the side of the CMG where Cdc45 and GINS are located. Thus, an important role of Cdc45 and GINS, in addition to activating the CMG helicase activity, is to mediate CMG's interaction with AND-1. Only the trimerization region of AND-1 represented by its central SepB domain is visible in the map, indicating that its N-terminal WD repeat and extended C-terminus are flexibly arranged relative to its central trimeric structure. The AND-1 trimer is docked onto the CMG like a rigid body, with a single β-propeller wedged in between Cdc45 and GINS. This mode of CMG binding is similar to the mode of interaction that was recently described for yeast Ctf4 with the CMG (54), suggesting that the resulting architecture of the CMGA is functionally important and has been evolutionarily preserved.
Because of the high interaction angle of the disk-like AND-1 trimer relative to the plane containing the leading edge of the CMG, the N-terminal WD repeat domains of AND-1, not visible in our structure, would be placed in the trajectory of the parental DNA. This striking feature of the CMGA architecture indicates that AND-1/Ctf4 can in principle contact the parental DNA ahead of the CMG's leading edge of translocation. Alternatively, AND-1 could help recruit protein factors that respond to replication stress to a position in front of the helicase. In yeast, the N-terminal WD40 domain of AND-1's orthologue Ctf4 recruits the Rtt101 E3 ubiquitin ligase to promote replisome progression through damaged DNA (55). Conceptually, it makes sense to deal with physical impediments to DNA synthesis while they are still ahead of the fork, to avoid a collision and possible disassembly of the helicase. These findings point to further discoveries and unexpected observations that await future structural studies of the eukaryotic replisome.
We had originally reported that Ctf4 can interact with the CMG via a CIP (Ctf4-Interacting Peptide) motif present in the N-tail of yeast Sld5 (45), as well as in Ctf4's multiple protein partners (46). In the light of our current observations, we believe that the mode of AND-1 interaction with the CMG described here and reported for yeast Ctf4 represents the principal mode of AND-1/Ctf4 recruitment to the replisome. Binding by the CIP of yeast Sld5 might further secure the association of Ctf4 with the yeast CMG, and might also act as a safety mechanism for keeping Ctf4 anchored to the fork when its primary interaction site with the CMG is disrupted. A CMG-binding motif equivalent to the yeast CIP is not found in human Sld5 or in any other human CMG component, suggesting that the Ctf4-Sld5 CIP interaction might be specific to yeast.