Substrate conformational dynamics drive structure-specific recognition of gapped DNA by DNA polymerase

Abstract
DNA-binding proteins utilise different recognition mechanisms to locate their DNA targets. Some proteins recognise specific nucleotide sequences, while many DNA repair proteins interact with specific (often bent) DNA structures. While sequence-specific DNA binding mechanisms have been studied extensively, structure-specific mechanisms remain unclear. Here, we study structure-specific DNA recognition by examining the structure and dynamics of DNA polymerase I (Pol) substrates both alone and in Pol-DNA complexes. Using a rigid-body docking approach based on a network of 73 distance restraints collected using single-molecule FRET, we determined a novel solution structure of the single-nucleotide-gapped DNA-Pol binary complex. The structure was highly consistent with previous crystal structures with regard to the downstream primer-template DNA substrate. Our structure also showed a previously unobserved sharp bend (~120°) in the DNA substrate, and we showed that this pronounced bending of the substrate is present in living bacteria. All-atom molecular dynamics simulations and single-molecule quenching assays revealed that 4-5 nt of downstream gap-proximal DNA are unwound in the binary complex. Coarse-grained simulations on free gapped substrates reproduced our experimental FRET values with remarkable accuracy (<∆FRET> = -0.0025 across 34 independent distances) and revealed that the one-nucleotide-gapped DNA frequently adopted highly bent conformations similar to those in the Pol-bound state (ΔG < 4 kT); such conformations were much less accessible to nicked (> 7 kT) or duplex (>> 10 kT) DNA. Our results suggest a mechanism by which Pol and other structure-specific DNA-binding proteins locate their DNA targets through sensing of the conformational dynamics of DNA substrates.

Significance Statement
Most genetic processes, including DNA replication, repair and transcription, rely on DNA-binding proteins locating specific sites on DNA; some sites contain a specific sequence, whereas others present a specific structure. While sequence-specific recognition has a clear physical basis, structure-specific recognition mechanisms remain obscure. Here, we use single-molecule FRET and computer simulations to show that the conformational dynamics of an important repair intermediate (1-nt gapped DNA) act as central recognition signals for structure-specific binding by DNA polymerase I (Pol). Our conclusion is strongly supported by a novel solution structure of the Pol-DNA complex wherein the gapped DNA is significantly bent. Our iterative approach combining precise single-molecule measurements with molecular modelling is general and can elucidate the structure and dynamics of many large biomachines.
Protein machines functioning on chromosomes and plasmids utilise different mechanisms to locate their targets on DNA. Sequence-specific DNA-binding proteins, such as restriction enzymes and transcription factors, recognize a particular nucleotide sequence via a combination of direct and indirect readouts (1). Whereas direct readout involves specific interactions between the DNA bases and protein amino acid side chains, indirect readout senses sequence-dependent structural and mechanical features, such as major or minor groove width, and conformational flexibility (2). In contrast, structure-specific proteins have no sequence specificity; instead, they interact with particular DNA structures (e.g., gapped duplexes, and 5' or 3' overhangs). While sequence-specific DNA binding mechanisms have been studied extensively, structure-specific mechanisms remain unclear.
Many enzymes involved in DNA repair and replication are necessarily structure-specific, and have been shown to interact with bent DNA substrates; examples include Flap endonuclease 1 (3-5), DNA Polymerase β (6), XPF (7), and MutS (8,9). Although catalytic reasons for substrate distortion have been suggested for individual systems, it is unclear whether the structure and dynamics of bent states serve as general recognition signals for binding or substrate selectivity. A key related question is whether these proteins induce DNA bending upon binding (via an "induced fit" mechanism), recognise a pre-bent state adopted by the DNA prior to protein binding (a "conformational selection" mechanism), or use a combination of both.
Here we studied the E. coli DNA polymerase I (Klenow Fragment; Pol), a structure-specific protein responsible for Okazaki fragment processing in lagging-strand DNA replication, as well as for DNA synthesis during DNA repair. In both roles, the polymerase recognizes and binds to a gapped DNA substrate and polymerizes across the gap. After gap filling, strand-displacement synthesis may follow; this is important for Okazaki fragment processing, as the polymerase continues to synthesize DNA whilst displacing an RNA primer, which is subsequently excised.
Attempts to understand the DNA binding and recognition mechanism for Pol are complicated by the absence of crystal structures of DNA-Pol binary complexes containing downstream duplex DNA, by the heterogeneity of Pol-DNA complexes (10,11), and by the conformational mobility of the free DNA substrate. As a result, there are many open questions regarding the mechanisms of strand-displacement DNA synthesis and substrate recognition.
We investigated the mechanism of structure-specific recognition by Pol via a combination of single-molecule Förster resonance energy transfer (smFRET) and molecular modelling. The single-molecule nature of our work addressed the issues of conformational and compositional heterogeneity, and led to a FRET-restrained solution structure of the binary complex, Pol bound to 1-nt gapped DNA. This structure revealed a substantial bend in the DNA substrate (which was also supported by complementary FRET experiments in living bacteria), and provided insight into protein and DNA structural features crucial for strand-displacement synthesis. The structure also served as the starting point for atomistic molecular dynamics simulations, which revealed the dynamic nature of the binary complex and the specific interactions between the protein and DNA.
Experimental smFRET measurements and coarse-grained modelling allowed us to characterise the conformational ensemble of the free substrate and propose a mechanism for substrate recognition and binding by DNA polymerase I, which is likely to apply to many other structure-specific DNA-binding proteins.

Structural analysis of multiple species in dynamic equilibrium.
To analyse the structure of the binary complex of Pol with a 1-nt gapped DNA in solution, we determined numerous DNA-DNA and protein-DNA distance restraints within freely diffusing Pol-DNA complexes using single-molecule confocal fluorescence microscopy combined with alternating-laser excitation (12-14). We measured DNA-DNA distances between labelled sites in the upstream and downstream duplex regions of gapped DNA containing a 3'-dideoxy nucleotide (to prevent any chemistry occurring; Fig 1A). We also measured DNA-protein distances between a FRET donor dye attached to one of three Pol residues (K550C, L744C, C907; Fig 1B) and a FRET acceptor dye attached to one of 13 labelling sites on gapped DNA. Pol activities were not significantly affected by the presence of the dyes (10,15).
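As a rough illustration of how a FRET efficiency maps onto a distance restraint, the sketch below uses the textbook Förster relation E = 1 / (1 + (R/R0)^6). The Förster radius `r0_nm = 6.0` is a hypothetical placeholder, not a value from this work, and the function names are illustrative; the restraints reported here rely on additional dye-position modelling that this sketch does not attempt.

```python
def distance_to_fret(distance_nm, r0_nm=6.0):
    """FRET efficiency for a donor-acceptor distance (Förster relation).

    r0_nm is a hypothetical Förster radius chosen for illustration only.
    """
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def fret_to_distance(efficiency, r0_nm=6.0):
    """Invert the Förster relation; only defined for 0 < E < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    for e in (0.2, 0.5, 0.8):
        print(f"E = {e:.1f} -> R = {fret_to_distance(e):.2f} nm")
```

Because of the sixth-power dependence, efficiencies near 0 or 1 constrain the distance only weakly, which is one reason a large network of restraints is helpful.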
We determined the Pol concentration required to form binary complexes by monitoring FRET between upstream and downstream sites on the DNA at increasing Pol concentrations. Three distinct FRET states were observed during the titration, indicating three different conformations of the gapped DNA (Fig 1C): a low-FRET state corresponding to free DNA; a high-FRET state corresponding to a (Pol)2-DNA ternary complex; and a mid-FRET state corresponding to two distinct species, the Pol-DNA binary complex and a (Pol)2-DNA ternary complex, with indistinguishable FRET signals (see Methods; Fig S1A-B). The increased FRET for the Pol-DNA binary complex vs. free DNA suggested substantial DNA bending in the binary complex. Using global fitting (Methods), we recovered dissociation constants of K_D1 = 360 ± 60 pM for the binary complex, and K_D2 = 9 ± 4 nM for formation of the mid-FRET (Pol)2-DNA dimer species (Fig S1B).
To characterize the structure of the DNA in the binary complex, we set the Pol concentration to 1 nM, thereby maximising the population of the binary complex (Fig 1D), and measured 34 DNA-DNA FRET distances (Table S1 and Fig S1).
We also obtained 39 protein-DNA distance restraints for the binary complex, measuring FRET between donor-labelled Pol and acceptor-labelled DNA (Table S1 and Fig 1E). Single-molecule confocal measurements are generally limited to a maximum concentration of fluorescent species of ~100 pM; however, because of the low dissociation constant of the binary complex, a detectable quantity of the complex was present even at 100 pM dye-labelled Pol (Fig 1E).
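To give a feel for these concentrations, the following minimal sketch estimates species populations from the two reported dissociation constants (0.36 nM and 9 nM). It assumes a simple sequential binding scheme, treats free Pol as equal to total Pol (DNA far below K_D), and lumps all (Pol)2-DNA species together; these simplifications and the function name are ours, and this is not the global fit described in the Methods.

```python
def species_fractions(pol_nM, kd1_nM=0.36, kd2_nM=9.0):
    """Fractions of free DNA, Pol-DNA and (Pol)2-DNA at a given Pol concentration.

    Simplified sequential model (an assumption, not the authors' full scheme):
    DNA + Pol <-> Pol-DNA (kd1), Pol-DNA + Pol <-> (Pol)2-DNA (kd2).
    """
    w_binary = pol_nM / kd1_nM              # statistical weight of Pol-DNA
    w_ternary = w_binary * pol_nM / kd2_nM  # statistical weight of (Pol)2-DNA
    total = 1.0 + w_binary + w_ternary
    return 1.0 / total, w_binary / total, w_ternary / total


if __name__ == "__main__":
    for pol in (0.1, 1.0):
        free, binary, ternary = species_fractions(pol)
        print(f"{pol:4.1f} nM Pol: free={free:.2f} "
              f"binary={binary:.2f} ternary={ternary:.2f}")
```

Under these assumptions the binary complex is the majority species at 1 nM Pol and still accounts for roughly a fifth of the DNA at 100 pM, in line with the qualitative statements above.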
The DNA substrate is bent by 120° in the Pol-DNA binary structure.
To obtain structural models of the Pol-DNA complex, we used our 73 distance restraints to perform rigid-body docking using Pol and two shortened DNA helices representing the upstream and downstream DNA (Fig 2A). We generated 32 refined structures and ranked them according to their fit to the measured distances (see Methods). A single model (Fig 2B) emerged with a significantly better fit than the other structures (Fig S2A). In this model, the position of the upstream DNA agreed very well with the position of a DNA fragment in a crystal structure of a Pol-DNA binary complex (16) (RMSD = 2.9 Å; Fig S2B), demonstrating the accuracy of our structural model. To test the robustness of our model, we generated 100 'bootstrapped' structures by randomly perturbing the 73 distance restraints in proportion to their experimental errors, repeating the docking calculations, and calculating the RMSD for each DNA backbone phosphate atom across all bootstrapped structures (average RMSD = 3.8 Å; Fig S2E).
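The bookkeeping around such a bootstrap can be sketched as follows. The docking step itself is not reproduced; the Gaussian noise model, the choice of measuring each atom's RMSD about its mean position, and all function names are assumptions made for illustration.

```python
import math
import random


def perturb_restraints(distances, errors, rng):
    """Return one perturbed restraint set.

    Each distance receives Gaussian noise whose standard deviation equals
    its experimental error (an assumed noise model).
    """
    return [rng.gauss(d, e) for d, e in zip(distances, errors)]


def per_atom_rmsd(models):
    """RMSD of each atom about its mean position across a set of models.

    models: list of structures, each a list of (x, y, z) tuples in the
    same atom order (for example, DNA backbone phosphorus atoms).
    """
    n_models = len(models)
    rmsds = []
    for atom_index in range(len(models[0])):
        coords = [model[atom_index] for model in models]
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        msd = sum(
            sum((c[k] - mean[k]) ** 2 for k in range(3)) for c in coords
        ) / n_models
        rmsds.append(math.sqrt(msd))
    return rmsds


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical restraints (in Å) and errors, for illustration only.
    print(perturb_restraints([50.0, 62.0, 71.0], [2.0, 3.0, 2.5], rng))
```

In a full workflow each perturbed restraint set would be passed to the docking program, and `per_atom_rmsd` would then be applied to the resulting structures.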
Having established the accuracy and precision of our model, we inspected it for insights into the DNA-binding and strand-displacement mechanisms. The most striking feature was the significant kink in the DNA substrate, a ~120° bend compared to straight duplex (Fig 2B). Further, the downstream DNA is positioned close to the fingers subdomain (Fig 2C), with the helical axis aligned with Y719 (Bst numbering used as default; this corresponds to residue F771 in E. coli).
Substitution of this residue with alanine was previously shown to significantly impair strand-displacement synthesis by Pol (17). Extending the downstream DNA to its full length, by modelling the previously deleted base pairs proximal to the gap, resulted in a clash between the additional DNA and the Pol fingers (Fig 2D), indicating that the gap-proximal downstream DNA may be partially melted in the binary complex.
We also used 21 FRET restraints (Table S1) to obtain the relative orientation of the upstream and downstream DNA in the high-FRET (Pol) 2 -DNA ternary complex and generated a low-resolution model for this complex in which the DNA was more severely bent (by ~140°; Fig S2F; RMSD = 12 Å, Fig S2G).

All-atom MD simulations give structural insights into the strand displacement mechanism.
To probe the exact position of DNA in the binary complex, its dynamics and any specific contacts with Pol, we carried out all-atom MD simulations. We generated five different starting models by combining the DNA from our FRET-restrained structure with the short DNA fragment present in the 4BDP Bst X-ray structure (18) (see Methods and Fig S3B), and performed two unconstrained 100-ns simulations from each model.
Whereas the DNA fragment present in the X-ray structure remained stably bound to Pol (RMSD 2.8 ± 0.8 Å), the upstream and downstream segments flanking this DNA were much more mobile (RMSD 11.1 ± 4.4 Å and 19.2 ± 8.8 Å, respectively; Fig S3C), with the end-to-end DNA distance ranging from 24 to 144 Å (Fig 3A and Fig S3D). The first six nucleotides of the downstream, non-template DNA (nucleotides T(+1) to T(+6), termed the non-template flap) also displayed appreciable dynamics (RMSD 8.7 ± 3.7 Å; Fig 3B and Fig S3E), and did not dock in a particular conformation. To study the extent of DNA melting in this region, we counted the hydrogen bonds formed between the six nucleotides of the non-template flap and the template strand: for most of the simulation time, 2 or 5 hydrogen bonds were present (Fig 3B), corresponding either to a single A-T pair or to an A-T plus a G-C pair, respectively, consistent with base-pairing of the two nucleotides at the base of the flap. Hence, in most conformations, 4 or 5 nucleotides of the flap were melted. Contacts between the flap and Pol were transient and diverse in terms of the residues involved; the most consistent interactions were sequence-unspecific, being formed between DNA phosphates and positively charged residues (mainly R729 and K730).
Many of the protein-DNA interactions in the active site (Y714, S717, Y719 and R789, all contacting the template strand) were similar to those observed in the X-ray structure (18). Further, in our simulations, the conserved three-helix bundle (O, O1 and O2 helices in the fingers subdomain), and especially residue Y719 (F771 in E. coli), were consistently positioned between the downstream template and non-template strands (Fig 3D); Y719 was typically positioned perpendicular to bases B(+1) and B(+2) of the template strand (Fig 3D, upper panel), and occasionally stacked against them (Fig S3F). The position of Y719 is consistent with a previously suggested mechanism in which Y719 acts as a "wedge", separating the non-template strand from its template counterpart (17). Despite the intrinsic dynamics of the non-template strand, the stable positioning of Y719 against the template strand likely prevents re-pairing during catalysis.
Finally, we observed interactions between downstream DNA and the polymerase, which consistently involved positively charged residues on the Pol surface and the negatively charged phosphate groups of the DNA backbone.
These interactions occurred in two regions: the first involved R779 (S831) and R784 (R836) that contact the duplex region of downstream DNA (Fig 3E), and the second featured K549 (K601) of the thumb region interacting with the unpaired template strand (Fig 3F). Whilst any individual nitrogen-phosphate interaction was transient, each residue contacted up to 6 phosphate groups, resulting in Pol-downstream DNA interactions persisting for most of the simulation time. The dynamic nature of these interactions likely reflects the need for rapid Pol movement along its DNA substrate during DNA synthesis.
Downstream DNA is melted in the DNA-Pol binary complex.
To study the melting of the downstream non-template strand predicted by both our docked binary complex model (Fig 2D) and MD simulations (Fig 3B), we used quenchable FRET (quFRET), a single-molecule assay able to detect local DNA unwinding (19-21). In quFRET, when the donor (Cy3B) and acceptor (Atto647N) are in close proximity (< 2 nm), their emission is quenched, yielding only a few events with intermediate stoichiometry (0.4 < S < 0.8) (see Methods). Upon local DNA melting, the two dyes move further apart and the quenching is reduced, leading to a large increase in both the number and the proportion of events with intermediate stoichiometry (mostly occurring at high FRET efficiencies, as the inter-dye distance remains short).
We studied a 1-nt gapped DNA substrate labelled with donor and acceptor dyes at positions T(+1) and B(+4), respectively. In the absence of Pol, the dyes are in very close proximity; as a result, we detected few intermediate-S events (Fig 4), comprising only ~25% of all acceptor-containing molecules (Fig S4A). On addition of Pol, we observed a ~4.5-fold increase in the number of such events per measurement, with a peak at high FRET (E* > 0.9; Fig 4), now comprising ~75% of all acceptor-containing molecules (Fig S4B). These results demonstrate an increase in dye separation and reduced quenching, consistent with the presence of local melting at the 5'-end of the downstream non-template strand in the binary complex.
To monitor the extent of melting along the downstream DNA, we tested a substrate with donor and acceptor dyes at B(+9) and T(+8) respectively. For these labelling positions, we observed similar quenching in both the absence and presence of Pol (Fig S4D-F [...] Table S2). Corrected 'ES' histograms showed a single FRET peak, consistent with either a single DNA conformation or rapid (sub-millisecond) conformational averaging (Fig S5A-C). Rigid-body docking of the two duplex portions of the substrate failed to produce a unique structural model; instead, five structures of the gapped DNA emerged, with angles between the duplex arms ranging from 8° to 25° (Fig S5D). This approach assumed that a single static structure was responsible for the experimental FRET distances; however, the substrate is expected to be highly dynamic (22, 23). To take these dynamics into account, we conducted coarse-grained molecular modelling on the gapped-DNA substrate using the oxDNA model, which allows rapid and efficient conformational sampling, and has been shown to describe well the structural, thermodynamic and dynamic properties of many DNA systems (Fig S5E) (24-27).
Using an adapted AV approach (see Methods and Fig S5G) (28, 29), we calculated the FRET efficiency arising from each dye pair at regular simulation intervals (Fig 5C). Transitions between different configurations were rapid (sub-microsecond) and much faster than the temporal resolution of the smFRET experiments (~1 ms; see Methods). Therefore, the average FRET efficiencies from the simulations are expected to agree with those measured experimentally.
Indeed, we see excellent agreement between the experimental and modelled FRET efficiencies across all 34 measured FRET pairs (RMSD = 0.026, <∆FRET> = -0.004; Fig 5B and Table S2; cf. the estimated experimental FRET error of ± 0.025, see Methods). The fit to the experimental data was significantly worse for the best of the five static structures obtained from rigid-body docking (RMSD = 0.054), suggesting that the coarse-grained simulations better describe the experimental conformational ensemble and dynamics for these highly dynamic substrates.
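The comparison metrics quoted above can be sketched as follows. The sign convention for the mean deviation (modelled minus experimental) is an assumption, the function names are illustrative, and the example values are invented rather than taken from Table S2.

```python
import math


def time_averaged_fret(per_frame_efficiencies):
    """Average FRET over simulation frames.

    Appropriate when conformational exchange is much faster than the
    experimental time resolution, as argued in the text.
    """
    return sum(per_frame_efficiencies) / len(per_frame_efficiencies)


def fret_agreement(experimental, modelled):
    """Return (RMSD, mean signed deviation) between paired FRET values.

    The deviation is taken as modelled minus experimental (assumed sign).
    """
    diffs = [m - e for m, e in zip(modelled, experimental)]
    mean_dev = sum(diffs) / len(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmsd, mean_dev


if __name__ == "__main__":
    # Invented values for three hypothetical dye pairs.
    experimental = [0.42, 0.65, 0.18]
    modelled = [0.40, 0.68, 0.17]
    rmsd, mean_dev = fret_agreement(experimental, modelled)
    print(f"RMSD = {rmsd:.3f}, mean deviation = {mean_dev:+.3f}")
```

Reporting both numbers is useful because a small mean deviation alone could hide large errors of opposite sign, whereas the RMSD cannot.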
The simulations of the free substrate identified two classes of structures: in ~80% of configurations, both stacking interactions between the three nucleotides opposite the gap were maintained, resulting in a straighter geometry (Fig 5D, top). In ~20% of configurations, at least one of these stacking interactions was broken, resulting in a state in which the system can explore a wide variety of bend angles (Fig 5D, bottom). FRET efficiencies were typically larger for [...] (Table S2). This finding strongly suggests that bent states are present in the experimental ensemble for the gapped substrate. Bent states were also detected in all-atom MD simulations on the gapped substrate (Fig S5H).
Encouraged by the excellent agreement between the computational and experimental results for the free substrate, we calculated free-energy landscapes from the relative abundance of conformations with specific bend angles (Fig 5E).
The landscapes show that the angle seen in the Pol-bound state of the gapped substrate (~120°) is accessible to the unbound gapped substrate (free-energy difference of < 4 kT); this bend angle is achievable only upon breaking at least one of the stacking interactions. However, once the stacking is broken, the substrate can freely explore a relatively flat landscape (Fig 5E, dashed lines), where the gap acts as a hinge. Simulations on nicked and duplex DNAs showed that it is harder for these substrates to adopt bend angles of 120° (> 7 kT for nicked and >> 10 kT for duplex DNA; Fig 5E), due to the increased energetic cost of breaking an additional stacking interaction and, in the case of the duplex, the extra chain connectivity constraints.
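A free-energy profile of this kind can be obtained by Boltzmann inversion of a bend-angle histogram, ΔG(θ)/kT = -ln[p(θ)/p_max]. The sketch below is a minimal version under our own assumptions: the 10° bin width, the reference to the most populated bin, and the absence of any geometric (Jacobian) correction are illustrative choices, not details taken from this work.

```python
import math
from collections import Counter


def free_energy_profile(bend_angles_deg, bin_width=10.0):
    """Free energy (in units of kT) per bend-angle bin.

    Values are relative to the most populated bin. Bins that were never
    visited are omitted, since their free energy is unbounded.
    """
    counts = Counter(int(angle // bin_width) for angle in bend_angles_deg)
    max_count = max(counts.values())
    return {
        bin_index * bin_width: -math.log(count / max_count)
        for bin_index, count in sorted(counts.items())
    }


if __name__ == "__main__":
    # Invented sample: mostly near-straight frames plus a few bent ones.
    angles = [5.0] * 100 + [125.0] * 2
    for lower_edge, g in free_energy_profile(angles).items():
        print(f"{lower_edge:5.0f}-{lower_edge + 10:3.0f} deg: {g:.2f} kT")
```

With this invented sample, a state visited in 2% as many frames as the most populated one sits about 3.9 kT higher, which gives a sense of the population ratios implied by the values quoted above.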
We also inspected the coarse-grained simulations for evidence of melting of the downstream duplex DNA in the gapped substrate alone. In 28% of all configurations, we observed melting of the A-T base pair (fraying) at the downstream site immediately adjacent to the gap (Fig S5F); when looking only at unstacked configurations, this fraction increased to 35%, partly due to the loss of a stabilising cross-stacking interaction (Fig 5D). The second nucleotide was [...]

We first internalised the T(-12)T(+8) gapped substrate, as it showed a large FRET change upon Pol binding in vitro (Fig S6A-B). In live cells, we observed a bimodal FRET distribution consistent with the existence of both unbound (80%, E = 0.40) and bound (20%, E = 0.83) populations (Fig 6B). In contrast, a duplex control showed only a single, low-FRET peak (Fig 6C; cf. the in vitro data, Fig S6C). The absence of a high-FRET population for this construct, which is not a substrate for the polymerase, is consistent with the interpretation that the high-FRET population observed with the gapped-DNA construct is a result of bending induced by the endogenous full-length Pol.
Whilst the labelling scheme above discriminated well between the FRET signals arising from unbound and bound DNA, we could not resolve the smaller FRET difference between the binary complex and high-FRET ternary complex seen in our in vitro work ( Fig S6B). We thus internalised the T(-18)T(+15) gapped substrate, which showed in vitro a larger FRET difference between the binary complex and the high-FRET ternary complex (Fig S6D-E). The resulting FRET histogram from live cells lacked a significant high-FRET peak, but did exhibit two low-FRET peaks, consistent with the presence of unbound DNA, and DNA in the binary complex (Fig S6F), and suggesting that little high-FRET ternary complex was present in vivo.

Discussion
The combination of single-molecule FRET with both coarse-grained and all-atom molecular simulations has provided substantial mechanistic and structural insight into the recognition and binding of DNA substrates by Pol. We have characterised the structure and dynamics of multiple species present in solution: the substrate alone, the binary complex and the high-FRET ternary complex.
Further, we have obtained evidence for the in vivo relevance of the bent binary complex, detecting its FRET signature in live cells.
We obtained a unique, solution-based, high-precision structure (RMSD = 3.8 Å) of Pol bound to a gapped-DNA substrate, containing upstream and downstream duplex DNA flanking a 1-nt gap (Fig 2B and 3A). Previous structural efforts lacked any downstream duplex DNA and so its position and the conformation of the substrate were unknown. Gapped DNA in the binary complex structure adopted a 120° bend (discussed further below).
The location of the upstream DNA in the docked structure agrees very well with existing co-crystal structures containing primer-template substrates. This supports our rigid-body docking approach, and the accuracy of our positioning of the downstream DNA on the fingers subdomain. This positioning conclusively rejects early propositions that the DNA might be channelled through the cleft formed by the fingers and thumb subdomains (33, 34). Our structure served as a starting point for all-atom MD simulations, which showed DNA dynamics in the binary complex, and identified transient DNA interactions with specific Pol residues. Some of these interactions involved residues implicated in previous biochemical studies, e.g. Y719 (17), providing a structural and mechanistic explanation for the experimental data; other residues (e.g. K549) revealed novel interactions that will merit further study.
Our docked structure showed that the downstream DNA was positioned very close to Y719 (Fig 3D), confirming its involvement in strand displacement. DNA Pol I shares a three-helix bundle (O, O1 and O2) structural motif with T7 RNA polymerase (35). This motif participates in DNA binding and strand separation (36), and includes conserved residues Y719, S717 and R789 in Bst (F771, S769 and R841 in E. coli), which have been shown to be important for strand displacement by Pol (17). This role for Y719 was further supported in our simulations, which showed the three-helix bundle (and particularly Y719) to be positioned between the template and non-template strands of the downstream DNA. The exact position of Y719 close to bases B(+1) and B(+2) on the downstream-template DNA is consistent with cross-linking data (37, 38).
We also identified residues that interacted with the downstream DNA (R779 and R784; Fig 3E). These residues are highly conserved, with published sequence [...]

The binary complex structure from rigid-body docking suggested that the downstream DNA cannot be fully base-paired proximal to the Pol fingers (Fig 2D). This idea was supported by our MD simulations, in which 4-5 nt of the downstream DNA remained single-stranded for the majority of the simulation time (Fig 3B). Our quenchable FRET assay confirmed that the downstream DNA is indeed melted when bound by Pol (Fig 4). When carrying out Okazaki fragment processing or long-patch base excision repair, Pol must perform strand-displacement DNA synthesis, replacing the RNA primer or damaged DNA with newly polymerized DNA. Our data suggest that the strand-displacement process starts before any DNA synthesis, with up to seven nucleotides being melted upon Pol binding to the substrate.

Gapped DNA in the binary complex structure exhibited a 120° bend (Fig 2B and 3A). DNA bending was also observed in the crystal structure of the mammalian gap-filling DNA polymerase β, where the ~90° bend observed was suggested to be important for the mechanisms of polymerisation and fidelity (6). Our data support the idea that bending may be a necessary mechanistic step for gap-filling polymerases, exposing more of the template base for interrogation by the incoming nucleotide. However, we propose that bending may also play a role in substrate recognition and selectivity.
Our coarse-grained simulations on the free gapped DNA showed remarkable agreement with the smFRET data (Fig 5B) and have important implications for the binding mechanism of Pol. Since the breaking of the stacking interactions opposite the gap increases DNA bendability, unstacking will likely occur as a step on the path to Pol binding. In addition, the high flexibility of the unstacked DNA suggests that the substrate can adopt a close-to-final bent conformation even prior to Pol complex formation. The simulations also provide an explanation for Pol substrate specificity, specifically its increasing binding preference for gapped over nicked DNA, previously observed by gel shift assays (41) [...]

Based on our results, we propose the following model for recognition and binding of a gapped DNA substrate by Pol, involving conformational capture followed by an 'on-protein' rearrangement (Fig 7). The DNA substrate rapidly interconverts between stacked and unstacked states; the unstacked conformations are generally more bent and show increased fraying of 1-2 nt around the gap. Pol initially interacts with the upstream DNA while the substrate is in an unstacked state (conformational capture). This upstream region of the substrate resembles a primer-template structure, which is known to bind tightly to Pol (K_D < 1 nM; Turner, Grindley and Joyce, 2003), forming a sufficiently stable complex for crystallization (16, 18). This conformational selection step does not necessarily require the substrate to adopt the precise 120° bend angle seen in the binary complex; rather, the DNA conformational flexibility helps to avoid blocking binding through steric clashes. Once the upstream duplex is bound, the downstream duplex is free to sample conformational space (as seen in the MD simulations on the binary complex; Fig 3A), docking to the protein and fraying the additional 3-4 nt, resulting in the complete binding of the gapped DNA (K_D = 0.4 nM; Fig S1A).
This proposed two-step binding mechanism comprises an initial conformational selection step in which the substrate is bound, followed by an 'on-protein' conformational search, in which the DNA and the protein both search conformational space [...]

Other structure-specific DNA binding proteins which have been shown to interact with bent DNA (e.g. FEN1, Pol β) are also likely to exploit the conformational dynamics of their substrates for recognition and binding. Thus, the mechanism we propose of an initial conformational selection step, sensing the increased flexibility of the substrate DNA, followed by an 'on-protein' rearrangement, may be generally applicable to many structure-specific DNA binding enzymes. It is an attractive model for how these enzymes operate during DNA repair, where vast regions of undamaged DNA are searched rapidly to identify sites that need to be repaired to stop accumulation of toxic intermediates and mutations, and ensure normal cellular function.

Fig 7. Gapped DNA is dynamic, adopting bent and frayed states (orange haze). Pol can bind to the upstream DNA when the downstream DNA conformation is not impeding the Pol (conformational capture of slightly bent states). Following binding of the upstream DNA, the downstream DNA docks and is further melted, beginning the process of strand displacement.

The total DNA substrate concentration is: [...]

We used a global fitting approach to fit the variation in the fractional populations of the three FRET states simultaneously as a function of Pol concentration, and to determine the equilibrium constants K_1, K_2, and K_3. From the equilibrium constants and their standard errors, we calculated the dissociation constants.

METHODS
Accurate FRET corrections. The apparent FRET efficiency, E*, was calculated from the DA and DD photon streams:

$$E^{*} = \frac{DA}{DD + DA} \tag{S9}$$

Similarly, the apparent stoichiometry, S*, was calculated using:

$$S^{*} = \frac{DD + DA}{DD + DA + AA} \tag{S10}$$

To obtain the accurate FRET efficiency, the raw photon streams were sequentially corrected for background counts, cross talk, and gamma / beta factors (which take into account the different detection efficiencies, quantum yields and excitation cross sections of the two dyes), as described (12, 13).
First, the three photon streams were corrected for background, which arises from impurities, Raman scattering from the solvent and dark counts in the detectors. For each burst, the corrected counts were calculated from the raw counts by subtracting the background count rate, multiplied by the length of the burst. Typical background count rates were 1-3 photons per ms.
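The per-burst background subtraction described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the measurements; the function name, argument names and the example counts are hypothetical.

```python
import numpy as np

def subtract_background(raw_counts, burst_durations_ms, bg_rate_per_ms):
    """Return background-corrected photon counts for each burst.

    corrected = raw - (background rate) * (burst duration)
    """
    raw = np.asarray(raw_counts, dtype=float)
    durations = np.asarray(burst_durations_ms, dtype=float)
    return raw - bg_rate_per_ms * durations

# Hypothetical example: three bursts in one photon stream (e.g. DD),
# with an assumed background rate of 2 photons per ms.
dd_raw = [60, 45, 80]
durations_ms = [2.0, 1.5, 3.0]
dd_corrected = subtract_background(dd_raw, durations_ms, bg_rate_per_ms=2.0)
```

The same function would be applied separately to the DD, DA and AA streams, each with its own background rate.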
After the background correction, the leakage fraction of the donor emission into the acceptor detection channel and the direct excitation of the acceptor by the donor-excitation laser were obtained. The correction factor for leakage (lk) was determined from the FRET efficiency of the donor-only population, E_don-only:

$$lk = \frac{1}{1/E_{\text{don-only}} - 1} \tag{S11}$$

The correction factor for direct excitation (dir) was determined from the apparent stoichiometry value of the acceptor-only population, S_acc-only:

$$dir = \frac{1}{1/S_{\text{acc-only}} - 1} \tag{S12}$$

The DA intensities and the FRET efficiency and stoichiometry were then corrected as follows:

$$DA_{\text{corr}} = DA - DD \cdot lk - AA \cdot dir \tag{S13}$$

$$E_{PR} = \frac{DA_{\text{corr}}}{DA_{\text{corr}} + DD} \tag{S14}$$

$$S_{PR} = \frac{DA_{\text{corr}} + DD}{DA_{\text{corr}} + DD + AA} \tag{S15}$$

Finally, the gamma and beta parameters were obtained from a linear fit to a plot of 1/S_PR vs E_PR:

$$\frac{1}{S_{PR}} = \text{slope} \cdot E_{PR} + \text{intercept} \tag{S16}$$

$$\beta = \text{intercept} + \text{slope} - 1 \tag{S17}$$

$$\gamma = \frac{\text{intercept} - 1}{\text{intercept} + \text{slope} - 1} \tag{S18}$$

The fully corrected accurate FRET efficiencies and stoichiometries, E and S, are given by:

$$E = \frac{DA_{\text{corr}}}{DA_{\text{corr}} + \gamma \cdot DD} \tag{S19}$$

$$S = \frac{DA_{\text{corr}} + \gamma \cdot DD}{DA_{\text{corr}} + \gamma \cdot DD + AA/\beta} \tag{S20}$$

Accurate determination and application of all correction parameters was checked visually on the ES histograms, as all FRET populations should be located at S~0.5.
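The correction chain of Eqs. S12-S20 can be sketched in a few lines. This is an illustrative sketch operating on background-corrected counts, not the code used for the measurements; all function names are hypothetical. The leakage expression is an assumption, written by analogy with Eq. S12 (lk = E/(1 - E) for the donor-only population).

```python
import numpy as np

def leakage_factor(e_don_only):
    # Assumed form, by analogy with Eq. S12: lk = E / (1 - E)
    return e_don_only / (1.0 - e_don_only)

def direct_excitation_factor(s_acc_only):
    # Eq. S12: dir = 1 / (1/S - 1)
    return 1.0 / (1.0 / s_acc_only - 1.0)

def proximity_ratios(dd, da, aa, lk, dir_):
    # Eqs. S13-S15: cross-talk-corrected DA, then E_PR and S_PR
    da_corr = da - dd * lk - aa * dir_
    e_pr = da_corr / (da_corr + dd)
    s_pr = (da_corr + dd) / (da_corr + dd + aa)
    return da_corr, e_pr, s_pr

def gamma_beta(e_pr, s_pr):
    # Eqs. S16-S18: linear fit of 1/S_PR against E_PR
    slope, intercept = np.polyfit(e_pr, 1.0 / np.asarray(s_pr, dtype=float), 1)
    beta = intercept + slope - 1.0
    gamma = (intercept - 1.0) / beta
    return gamma, beta

def accurate_e_s(dd, da_corr, aa, gamma, beta):
    # Eqs. S19-S20: fully corrected efficiency and stoichiometry
    e = da_corr / (da_corr + gamma * dd)
    s = (da_corr + gamma * dd) / (da_corr + gamma * dd + aa / beta)
    return e, s
```

In practice, `e_pr` and `s_pr` passed to `gamma_beta` would be the mean values of several FRET populations spanning a range of efficiencies, since the fit needs at least two distinct points.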
Gamma and beta factors were determined separately for DNA-DNA and DNA-Pol measurements. This was necessary because of the significant difference in the quantum yield of the donor when attached to DNA or protein (see below and Table S4).
Conversion of accurate FRET to distance. The accurate FRET efficiency, E, was converted to distance, R, according to the equation:

$$R = R_0 \left(\frac{1}{E} - 1\right)^{1/6}$$

using experimentally determined values for the Förster radius, R_0, which were calculated according to the equation:

$$R_0^6 = \frac{9000\,(\ln 10)\,\kappa^2\,Q_D\,J}{128\,\pi^5\,N_A\,n^4}$$

where Q_D is the quantum yield of the donor (which must be measured; see below), N_A is Avogadro's number, and n is the refractive index of the medium. The term κ² describes the relative orientation of the transition dipoles of the donor and acceptor. Its value lies in the range of 0-4, and it is often assumed to be equal to 2/3, which is the case when both fluorophores have unrestricted rotational freedom (46). The overlap integral J is a measure of the degree of overlap between the donor emission and acceptor excitation spectra (47), and can be calculated according to:

$$J = \int F_D(\lambda)\,\varepsilon_A(\lambda)\,\lambda^4\,d\lambda$$

where F_D is the corrected donor fluorescence intensity at a particular wavelength λ, with the total intensity normalized to unity, and ε_A is the extinction coefficient of the acceptor at the same wavelength.

We used the accessible volume (AV) algorithm of the FPS software (29) to model the

Simulation conditions. All simulations were carried out using Gromacs 4.6 (58). The X-ray structure control simulations and complex simulations were done using explicit solvent (TIP3P) in a triclinic box, with a minimum 10-Å solvent edge, in the presence of 10 mM MgCl_2. The system was neutralized by the addition of magnesium ions, and energy-minimized using steepest descent minimization. In order to stabilize the temperature of the system, equilibration was performed in the NVT ensemble for 100 ps, with the temperature of 298 K maintained using a Berendsen thermostat (59). Next, the pressure of the system was stabilized by equilibration in the NPT ensemble for 1 ns, with the temperature of 298 K and the pressure of 1 bar maintained using a V-rescale thermostat (60) and a Berendsen barostat (59), respectively.
During equilibration, DNA, protein heavy atoms and the catalytic magnesium ion were position-restrained with a force constant of 1,000 kJ mol⁻¹ nm⁻². DNA was equilibrated for an additional 10 ns with protein heavy atoms restrained, under the NPT conditions. Atom velocities were preserved between the equilibration steps, and between equilibration and production steps. Unrestrained production was finally allowed to run for 100 ns, with the temperature of 298 K and the pressure of 1 bar maintained by the V-rescale thermostat and a Parrinello-Rahman barostat (61). Periodic boundary conditions and the Verlet cutoff scheme were used, and long-range electrostatic interactions were accounted for by the Particle-Mesh Ewald method (62). All bonds were treated as constraints with the LINCS algorithm, allowing a time step of 2 fs. Coordinates were saved to an output trajectory every 5 ps. Repeat simulations were carried out using different random seeds, generating different initial atom velocities each time.
In the case of full-length DNA-only simulations, the conditions were the same except that a square box was used, with dimensions equal to the length of the DNA plus a 10-Å solvent edge. The NVT and NPT equilibration steps were performed, and the production times were 20 ns. In the case of high-temperature DNA simulations carried out as part of model preparation, the conditions were the same as for the complex simulations except that the temperature during the equilibration and production runs was 400 K, and the production times were 2 ns. All DNA heavy atoms were position-restrained during the production runs, except for the 6 base pairs in the protein-proximal, downstream part of the DNA, which were unpaired in the starting configuration.
Analysis. All analysis was carried out using Gromacs 4.6 or 5.0, and VMD (63).
Trajectories were repaired for periodic boundary conditions, and processed to include

The bend angle was calculated from vectors placed along the midlines of the two helical segments, as described previously (68). The relative free energies were calculated from the MD trajectories as follows:

$$\Delta G(|\theta|) = -k_B T \ln\frac{P(|\theta|)}{P(|\theta_0|)}$$

where ΔG(|θ|) is the free energy, k_B is the Boltzmann constant, T is the temperature, P(|θ|) is the observed probability density for the DNA adopting a bend angle |θ|, and |θ_0| is the reference bend angle, for which ΔG(|θ_0|) = 0.
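The conversion of a bend-angle distribution into relative free energies can be sketched as a Boltzmann inversion of a histogram. This is a minimal sketch, not the analysis code used here: the function name, the bin settings and the default choice of reference angle (the most populated bin) are assumptions, and energies are returned in units of kT.

```python
import numpy as np

def bend_free_energy(bend_angles_deg, bins=36, angle_range=(0.0, 180.0),
                     ref_angle_deg=None):
    """Relative free energy (in units of kT) as a function of bend angle."""
    density, edges = np.histogram(bend_angles_deg, bins=bins,
                                  range=angle_range, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    if ref_angle_deg is None:
        ref = int(np.argmax(density))  # assumed default: most populated bin
    else:
        ref = int(np.argmin(np.abs(centers - ref_angle_deg)))
    # Unsampled bins have zero density and are assigned +inf free energy.
    with np.errstate(divide="ignore"):
        dg = -np.log(density / density[ref])
    return centers, dg
```

The reference bin must be populated; otherwise the ratio is undefined. Free-energy differences between, for example, a bent and a straight conformation are then read off as differences between the corresponding entries of `dg`.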
FRET efficiencies were calculated from the molecular dynamics trajectories by adapting the accessible volume (AV) model for dye positions detailed above. Briefly, a grid of points was produced around the DNA base attached to the dye, with the spacing between grid points set to half the smallest dye dimension (see table above). Points were excluded if their distance to a base or backbone site was smaller than the sum of the dye radius and the excluded volume radius of the base or backbone, respectively.
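The grid construction and clash test described above can be sketched as follows. This is a simplified illustration, not the implementation used for the simulations: the function and argument names are hypothetical, and limiting the grid to a sphere of radius `linker_length` around the attachment point is an assumption borrowed from standard AV modelling. No linker connectivity check is performed.

```python
import numpy as np

def accessible_points(attach_xyz, linker_length, dye_radius,
                      site_xyz, site_radii, spacing):
    """Grid points near the attachment site that do not clash with the DNA.

    A point is excluded if its distance to any base or backbone site is
    smaller than dye_radius + the excluded-volume radius of that site.
    """
    # Cubic grid centred on the attachment point, clipped to a sphere
    # (assumed reach of the dye linker).
    n = int(np.ceil(linker_length / spacing))
    axis = np.arange(-n, n + 1) * spacing
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    offsets = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
    offsets = offsets[np.linalg.norm(offsets, axis=1) <= linker_length]
    points = np.asarray(attach_xyz, dtype=float) + offsets

    # Distances from every grid point to every base/backbone site.
    sites = np.asarray(site_xyz, dtype=float)
    radii = np.asarray(site_radii, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - sites[None, :, :], axis=2)
    clash = (dist < dye_radius + radii[None, :]).any(axis=1)
    return points[~clash]
```

Calling the function once per dye radius and concatenating the results gives a combined cloud in which positions that accommodate several radii appear several times.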
This overlap check was repeated with the three different dye radii and the resulting AV clouds were combined (Fig S5G). A position that could accommodate all three dye radii was therefore weighted three times more than a position that could only accommodate one. The FRET efficiencies were averaged over all dye distances for each configuration and then again over all configurations in our molecular dynamics trajectories (of length ~15 μs).

One-dimensional histograms of E* were produced from projections of the mid-S data onto the E* axis. The quFRET assay offers two related readouts for DNA melting: an increase in the absolute number of mid-S bursts, and an increase in the relative proportion of mid-S bursts compared with low-S bursts; the latter is a more robust measure, being independent of sample concentration and measurement time.