Function of the N-terminal segment of the RecA-dependent nuclease Ref

The bacteriophage P1 Ref (recombination enhancement function) protein is a RecA-dependent, HNH endonuclease. It can be directed to create targeted double-strand breaks within a displacement loop formed by RecA. The 76 amino acid N-terminal region of Ref is positively charged (25/76 amino acid residues) and inherently unstructured in solution. Our investigation of N-terminal truncation variants shows this region is required for DNA binding, contains a Cys involved in incidental dimerization and is necessary for efficient Ref-mediated DNA cleavage. Specifically, Ref N-terminal truncation variants lacking between 21 and 47 amino acids are more effective RecA-mediated targeting nucleases. We propose a more refined set of options for the Ref-mediated cleavage mechanism, featuring the N-terminal region as an anchor for at least one of the DNA strand cleavage events.


INTRODUCTION
Encoded by bacteriophage P1, the recombination enhancement function (Ref) protein is a 21-kDa RecA-dependent HNH endonuclease that can be targeted to produce double-strand breaks (DSBs) at any desired DNA sequence. Early reports showed that expression of Ref enhanced recombination events in a RecA- and RecBCD-dependent manner in Escherichia coli (1-3). Deletion of the ref gene in bacteriophage P1 has no measurable effect on lysogeny or lytic cycles, and an in vivo role for ref has not yet been elucidated (3).
Unlike many other temperate phages, the P1 prophage is maintained as a low copy number, autonomous plasmid. The linear double-stranded (linear ds) genome is cyclized upon infection. Typically, this process occurs via the phage-encoded Cre-lox site-specific recombination system, but it can also occur by RecA-mediated homologous recombination (4). Ref may play a role in this RecA-dependent cyclization of the genome (5). Some phage HNH proteins partner with terminases in the genome packaging reaction (6). However, Ref exhibits no homology to the terminase-associated HNH protein nucleases.
An effort to characterize the bacteriophage P1 Ref protein as a suspected RecA-regulator protein revealed that Ref is actually a RecA-dependent, HNH-family endonuclease (7). Ref has the novel property of cleaving only DNA to which RecA protein is bound. In the absence of any cofactors or proteins, Ref will bind to both single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA), but no DNA cleavage occurs. In the presence of RecA, ATP and Mg2+, Ref produces extensive degradation of ssDNA (7). The ATP and Mg2+ are required for an active RecA nucleoprotein filament, and this in turn is necessary for Ref nuclease activity. More importantly, Ref will create DSBs in a small target area within a RecA-formed displacement loop (D-loop). RecA forms a nucleoprotein filament on an oligonucleotide (100-150 nucleotides (nt) in length) and initiates strand invasion in the homologous region of a dsDNA. In strand invasion, the RecA-bound oligonucleotide base pairs to the complementary strand, and the other strand of the duplex is displaced. Ref then cleaves the paired and displaced strands of the targeted duplex DNA sequentially within the D-loop region (8). The initial cleavage of the paired strand is relatively fast, does not require ATP hydrolysis by RecA and is promoted by the Ref active site mutant H153A (8). The second cut on the displaced strand is produced at a much slower rate, requires RecA-mediated ATP hydrolysis and is not promoted by Ref H153A (8). The two cleavage events thus appear to be mechanistically distinct.
The Ref protein (21 kDa; 186 amino acids) consists of a 76 residue amino-terminal domain and a 110 residue C-terminal globular domain. The structure of the C-terminal globular domain of Ref has been determined to 1.4-Å resolution (PDB ID: 3PLW) (7). The asymmetric unit of Ref contains a monomer and two stably bound Zn2+ ions. The structure features an HNH domain defined by a ββα metal-binding core. Outside of this core element, HNH nucleases are structurally and catalytically diverse, including group I and II homing endonucleases, transposases, restriction endonucleases and bacterial colicins (9). Colicins digest dsDNA non-specifically, while homing endonucleases create nicks or double-stranded breaks at specific DNA sequences.
Ref does not exhibit any dominant sequence specificity; however, there is a preference for a phosphodiester bond on the 5′ side of a pyrimidine base (8).
In contrast, there is limited characterization of the N-terminal domain. Electron density for the N-terminal region was absent in the crystal structure (7), suggesting disorder. In the past decade, attention has been placed on intrinsically disordered proteins or parts of proteins (10-12). In many cases, protein domains that are intrinsically disordered in solution become ordered upon binding to a ligand. The sequence of the Ref N-terminal domain exhibits recognized hallmarks of an intrinsically disordered region (10-13), particularly in its high concentration of amino acid side chains that should be charged at neutral pH, with 25 positively charged and 9 negatively charged among the 76.
Here, we characterize key aspects of structure-function in the Ref protein, with a focus on the active oligomeric form and the function of the disordered N-terminal 76 amino acids. Although a complete N-terminal domain deletion still retains nuclease activity, a 100-fold higher concentration is required to reach wild-type (WT) cleavage levels (7). This early work also found that when the N-terminal domain was absent, the Ref protein no longer bound to DNA. Visual inspection of the sequence revealed a possible signature charge distribution motif (imperfectly repeated), which was used to guide the design of N-terminal deletion variants. We demonstrate that partial removal of the N-terminal region leads to an enhancement of targeted Ref cleavage at D-loops. An understanding of N-terminal domain function is critical to further development of the Ref system in biotechnology applications, and that is the primary focus of the current study.

DNA substrates
The M13mp18 circular ssDNA (7249 nt) was prepared as described previously (14,15). The M13mp18 circular dsDNA was prepared as described previously (14-16). All DNA concentrations are given in micromolar nt (μM nt). Oligonucleotides were purchased from Integrated DNA Technologies. Sequences of oligonucleotides used in this study can be found in Supplementary Materials and Methods.
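Because concentrations are reported in total nucleotides, converting to a concentration of DNA molecules requires dividing by the substrate length. The following is a minimal sketch of that arithmetic, using the 7249-nt length of M13mp18 circular ssDNA stated above. The function name and the example concentration (4 μM nt) are illustrative assumptions and are not values taken from this study.

```python
M13MP18_LENGTH_NT = 7249  # length of M13mp18 circular ssDNA, as stated in the text


def nt_to_molecule_conc(conc_um_nt: float, length_nt: int) -> float:
    """Convert a concentration in uM nucleotides to nM DNA molecules."""
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")
    # uM nt / (nt per molecule) gives uM molecules; multiply by 1000 for nM.
    return conc_um_nt / length_nt * 1000.0


# Hypothetical example: 4 uM nt of M13mp18 circular ssDNA (about 0.55 nM circles).
example_nm = nt_to_molecule_conc(4.0, M13MP18_LENGTH_NT)
```

For a double-stranded substrate, the nucleotide count per molecule is twice the length in base pairs, so the same function applies with `length_nt` doubled.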

Cloning ref and ref variants
The WT ref gene from bacteriophage W39 was not available, as stocks of this phage no longer exist. The

Proteins
The E. coli RecA E38K protein was purified as described previously (Ronayne, 2014). The P1 WT Ref and P1 N76 proteins were purified as described previously (7). All Ref variants were purified in their native form using standard chromatography procedures, and quantified, as described in the Supplementary Materials and Methods. The mass of all variants was confirmed by mass spectrometry. All proteins were stringently tested for exonuclease and endonuclease contamination and all were free from detectable nuclease activity in the absence of RecA protein.

Fluorescence polarization DNA binding assay
Ref proteins at 0.5-10 000 nM were incubated with 2 nM AJM25 71mer ssDNA or AJM25 annealed with AJG52 71mer to make a linear 71 bp dsDNA substrate in 25 mM Tris-acetate (pH 8.5), 3 mM potassium glutamate, 15 mM magnesium acetate and 5% w/v glycerol at room temperature for 15 min. Fluorescence anisotropy (FA) was measured at 25 °C, using a Tecan Infinite M1000 instrument with 470-nm excitation and 535-nm emission wavelengths for at least three replicates. The average FA values were plotted with one standard deviation of the mean shown as error. Prism software was used to convert FA values to the percent of DNA bound, and apparent dissociation constants were determined using one-site, specific binding with Hill coefficient.
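The analysis described above can be sketched in a few lines. This is only an illustration of the general form of a one-site specific binding model with a Hill slope, not the Prism procedure itself. The linear rescaling of anisotropy to fraction bound assumes that fluorescence intensity does not change on binding, and all function and parameter names are hypothetical.

```python
def fraction_bound(fa: float, fa_free: float, fa_bound: float) -> float:
    """Linearly rescale an anisotropy reading to the fraction of DNA bound (0 to 1).

    Assumes fa_free and fa_bound are the anisotropies of fully free and fully
    bound DNA, and that fluorescence intensity is unchanged by binding.
    """
    if fa_bound == fa_free:
        raise ValueError("fa_bound and fa_free must differ")
    return (fa - fa_free) / (fa_bound - fa_free)


def hill_binding(conc_nm: float, kd_app_nm: float,
                 hill: float = 1.0, b_max: float = 100.0) -> float:
    """One-site specific binding with Hill slope, returned as percent DNA bound.

    Y = Bmax * X^h / (Kd^h + X^h)
    """
    if conc_nm < 0 or kd_app_nm <= 0:
        raise ValueError("conc_nm must be >= 0 and kd_app_nm must be > 0")
    return b_max * conc_nm ** hill / (kd_app_nm ** hill + conc_nm ** hill)


# At a protein concentration equal to Kd,app, half of the DNA is bound,
# regardless of the Hill coefficient.
half_point = hill_binding(14.0, 14.0)
```

In practice the model would be fitted to the averaged data by non-linear regression; the sketch only evaluates the curve for given parameters.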

Circular ssDNA nuclease assay
Ref proteins were tested for nuclease activity as previously described (7). See details in Supplementary Materials and Methods.

Nuclease site-specific targeting assay
Ref proteins were tested for targeted nuclease activity in assays as previously described (8). See details in Supplementary Materials and Methods.

Non-reducing sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE)
All Ref proteins were dialyzed into 20 mM sodium phosphate (pH 7.5), 200 mM sodium chloride and 10% glycerol to remove the dithiothreitol (DTT) present from purification. Each reaction contained 6.5 μg Ref protein, 50 mM sodium phosphate (pH 7.5), 75 mM sodium chloride and, in indicated reactions, 10 mM DTT. Non-reducing cracking buffer (80 mM Tris-HCl pH 6.8, 2% SDS, 10% glycerol, 0.2% bromophenol blue) was added to each sample, and the samples were then run on a 4-15% gradient SDS-PAGE gel.
The P1 and W39 Ref enzymes are 95% identical in amino acid sequence. One difference, at residue 11, ultimately helped define oligomeric properties. The two proteins are used interchangeably in this study, and the N-terminal deletion variants were constructed in W39 Ref.

A repeating pattern in the charge distribution in the N-terminal domain
The P7 Ref protein has 13 amino acid changes in comparison to P1 WT Ref as well as an additional 30 amino acids on the C-terminus. It was insoluble in several conditions tested (unpublished data). As we were unable to locate viable stocks of bacteriophage P7 (the P7 ref gene was reconstructed based on the published sequence), this protein was not pursued.
The W39 WT Ref has 10 amino acid changes in comparison to the P1 Ref (6 of these in the unstructured N-terminus), but no additional C-terminal amino acid residues (Figure 1A). Stocks of W39 bacteriophage were also unavailable. Therefore, the reported sequence of  Figure 1A). Of the four amino acid residues in the C-terminal core domain that differ between the two proteins, the TMO hybrid has only two of the changes characteristic of the conversion of P1 Ref to W39 Ref, N107E and A115S (Figure 1A). The TMO hybrid protein also lacks 47 amino acid residues from the N-terminus, due to a precise but apparently spontaneous protein degradation event that occurred during purification. Protease inhibitors were used in subsequent purification trials to prevent any further spontaneous degradation, and molecular weights of all purified proteins were verified by mass spectrometry.
Examining the N-terminal region more closely, we identified a potential signature charge pattern (consensus: ++ + +-+) repeated imperfectly three times, as shown in Figure 1B and Supplementary Figure S1. This pattern guided the design of a series of specific truncation variants with one or more of these putative motifs removed (Figure 1B). The Ref protein truncations purified included W39 N21, W39 N47, W39 N59, W39 N66 and W39 N74 (Figure 1C). The P1 N76 Ref that was previously purified (7) was included in many assays for purposes of comparison. We attempted, unsuccessfully, to purify a variant including only the N-terminal 76 amino acid residues of P1 Ref (Ref C110). The highly charged nature of the N-terminal region renders the protein largely insoluble.
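A charge pattern of this kind can be made visible by reducing a sequence to its charged positions. The sketch below is a hypothetical illustration of that idea and is not the procedure used in this study. It assumes Lys and Arg are positive, Asp and Glu are negative, and His is neutral at neutral pH; the example peptide is invented and is not the Ref sequence. The residue counts in the last line (25 positive, 9 negative) are those stated for the 76-residue N-terminal region.

```python
POSITIVE = set("KR")  # Lys, Arg; His is treated as neutral here (an assumption)
NEGATIVE = set("DE")  # Asp, Glu


def charge_string(seq: str) -> str:
    """Map each residue to '+', '-' or '.' (uncharged) at neutral pH."""
    symbols = []
    for aa in seq.upper():
        if aa in POSITIVE:
            symbols.append("+")
        elif aa in NEGATIVE:
            symbols.append("-")
        else:
            symbols.append(".")
    return "".join(symbols)


def net_charge(n_positive: int, n_negative: int) -> int:
    """Net charge from counts of positively and negatively charged residues."""
    return n_positive - n_negative


# Invented peptide, for illustration only.
pattern = charge_string("MKRTAEK")

# Counts stated in the text for the N-terminal 76 residues of Ref.
n_term_net = net_charge(25, 9)
```

Aligning such strings across the region is one simple way to look for imperfect repeats by eye.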

Removing charge motifs in the N-terminal domain decreases DNA-binding affinity
The P1 N76 Ref no longer exhibits DNA-binding activity (7), indicating the N-terminus is essential for DNA binding. We employed fluorescence polarization with labeled DNA to determine apparent dissociation constants (Kd,app), thereby measuring the DNA binding function of this domain. One key parameter needed to determine true Kd values, the DNA binding site size of Ref, is currently unknown and may change with each truncation variant. Our data analysis was thus limited to reporting apparent (Kd,app) values that provide a measure of relative DNA binding affinity.
When assayed with a labeled 71mer ssDNA (AJM25), the full-length P1 WT and W39 WT Ref proteins exhibited similar Kd,app values (14 ± 1 nM and 28 ± 2 nM, respectively) (Figure 2A and C). It is possible that the small difference in DNA binding between the two proteins is due to the presence of a Glu instead of a Lys residue at position 13 in the W39 WT Ref, which alters the first of the putative charge distribution motifs (Figure 1 and Supplementary Figure S1). However, upon removing one putative charge motif, the W39 N21 and W39 N47 Ref proteins exhibited a 6-fold reduction in binding (119 ± 9 nM and 113 ± 7 nM, respectively) (Figure 2A and C). Removal of the second charge motif in the W39 N59 resulted in a 40-fold reduction in binding affinity. The W39 N66, which has a partial deletion of the third motif, exhibited further reduction in ssDNA binding, and DNA saturation was not attainable. Upon removal of the third charge motif in W39 N74 and P1 N76 Ref proteins, DNA binding was no longer detectable (Figure 2C). Data for the P1 N76 Ref protein are consistent with Electrophoretic Mobility Shift Assay (EMSA) data from the earlier study (7). Our results suggest that each charge motif (or sequence features in or near these putative motifs) contributes to ssDNA binding. The results also suggest that the entire N-terminal 76 amino acid residue region contributes more or less additively to DNA binding.
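The fold reductions quoted above are ratios of Kd,app values, and the ratio depends on which full-length protein is taken as the reference. The following minimal sketch computes those ratios from the ssDNA values given in the text. The reference used for the reported fold changes is not stated here, so both full-length proteins are shown; the function name is an illustrative assumption.

```python
def fold_change(kd_variant_nm: float, kd_reference_nm: float) -> float:
    """Fold reduction in apparent affinity: ratio of variant to reference Kd,app."""
    if kd_reference_nm <= 0:
        raise ValueError("kd_reference_nm must be positive")
    return kd_variant_nm / kd_reference_nm


# ssDNA Kd,app values (nM) as reported in the text.
KD_P1_WT = 14.0
KD_W39_WT = 28.0
KD_W39_N21 = 119.0
KD_W39_N47 = 113.0

n21_vs_w39 = fold_change(KD_W39_N21, KD_W39_WT)  # 4.25
n21_vs_p1 = fold_change(KD_W39_N21, KD_P1_WT)    # 8.5
n47_vs_w39 = fold_change(KD_W39_N47, KD_W39_WT)
n47_vs_p1 = fold_change(KD_W39_N47, KD_P1_WT)
```

The two references bracket the reported value of roughly 6-fold for these variants.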
The truncations were also assayed with a 71mer dsDNA substrate made by annealing the labeled 71mer oligonucleotide (AJM25) with the complementary 71mer oligonucleotide (AJG52). The P1 WT Ref and W39 WT Ref proteins exhibited very similar Kd,app values (14 ± 1 nM and 24 ± 1 nM, respectively) (Figure 2B and C). Binding affinity for dsDNA was also reduced as the charge motifs were removed. However, W39 N21 Ref (56 ± 3 nM) and W39 N47 Ref (143 ± 12 nM) did not have similar Kd,app values, in contrast to the results with ssDNA. This suggests a contribution to DNA binding from the region that separates the first and second of the putative charge distribution motifs. As the second motif and part of the third motif were removed in the W39 N59 and W39 N66, dsDNA binding was further reduced. As with the ssDNA, W39 N74 and P1 N76 Ref had no observable dsDNA binding. Overall, we provide evidence that the N-terminal region of Ref is necessary for DNA binding, and the putative charge motifs may contribute collectively (but probably not exclusively) to that function.
N47 and W39 Ref N74 were incubated in the presence or absence of the 150 nt targeting oligonucleotide and plus and minus the cross-linking reagent DSG (Supplementary Figure S5). Cross-linking was quite modest in all cases, and the inclusion of ssDNA in the reaction produced no effect. Because the Ref protein is replete with primary amine residues, the DSG should trap any higher oligomeric states. However, it is possible that higher order oligomeric states are not cross-linked in the presence of DSG.

The N-terminus of Ref is necessary for ssDNA nuclease activity
P1 Ref is a RecA-dependent nuclease on ssDNA (7). In the presence of M13mp18 circular ssDNA, RecA protein, ATP and Mg2+ there is evident degradation of the DNA within 20 min (Figure 4A), although the actual number of cleavage events per ssDNA is small (7). To test whether the truncations had any effect on nuclease activity, this assay was carried out using all truncation variants at the same concentration (24 nM Figure 4B). However, there was a clear decrease in nuclease activity as the truncations progressed to W39 N59, W39 N66 and W39 N74 Ref (Figure 4B). This gradient of decreasing activity is consistent with the removal of the putative charge motifs. When concentrations of W39 N59, W39 N66 and W39 N74 Ref were increased 10-fold, levels of degradation similar to those produced by WT Ref were observed (Figure 4B). This result suggests that the N-terminal domain does not alter the nuclease activity, but is necessary for Ref localization to the DNA and/or RecA nucleoprotein filament. The dramatic deficiency in the activity of W39 N66 and W39 N74 Ref coincides well with the deficiency in DNA binding affinity.

Partial N-terminal Ref truncation enhances targeted double-strand cleavage
Using an assay established previously (8) we examined the efficiency of targeted dsDNA cleavage. Ronayne et al. (8) determined that the strand paired to the targeting oligonucleotide is cleaved first. This is followed by a mechanistically distinct cleavage of the displaced strand resulting in a linearization of the circular dsDNA (8). This work also showed that the targeting oligonucleotide was cut, but to a much lesser extent than the targeted circular dsDNA. The cleavage pattern and the degree to which the targeting oligonucleotide was cleaved was similar in the presence and absence of the circular dsDNA target (8). This suggests that the very limited cleavage of the oligonucleotide occurs independently of the targeting reaction (8).
In the dsDNA targeting reaction, supercoiled M13mp18 circular dsDNA is incubated with a complementary 150mer oligonucleotide and the RecA protein, which pairs the complementary oligonucleotide and duplex DNA (Figure 5A) to form a D-loop. Within the RecA-created D-loop, Ref cleaves the strand paired to the oligonucleotide rapidly, followed by a much slower cleavage in the displaced strand to create a 7.25-kb linear dsDNA (Figure 5A) (8). A representative agarose gel of the products formed in the presence of W39 WT Ref is displayed in Figure 5B. The supercoiled circular dsDNA (Supplementary Figure S6), nicked circular dsDNA (Figure 5C) and linear dsDNA (Figure 5D) were measured as a percentage of the total DNA in each lane and followed over time.
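Expressing each DNA species as a percentage of the total DNA in its lane is a simple normalization. The sketch below illustrates it under stated assumptions: the band intensities are invented numbers in arbitrary units, the function name is hypothetical, and no correction for differences in stain binding between supercoiled and relaxed forms is applied.

```python
from typing import Dict


def lane_percentages(intensities: Dict[str, float]) -> Dict[str, float]:
    """Express each band as a percentage of the summed intensity in one lane."""
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return {name: 100.0 * value / total for name, value in intensities.items()}


# Invented band intensities for a single lane at one time point.
lane = {"supercoiled": 200.0, "nicked": 300.0, "linear": 500.0}
percentages = lane_percentages(lane)
```

Repeating the calculation for each time point gives the curves for each DNA species over the course of the reaction.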
The P1 WT, W39 WT and W39 N59 proteins exhibited identical reaction kinetics with about 63% of supercoiled DNA being linearized (Figure 5D). Strikingly and reproducibly, the W39 N21, W39 N47 and TMO hybrid Ref increase linear dsDNA product formation by 15-20% (Figure 5D). The W39 N21, W39 N47 and TMO hybrid Ref differed in kinetics from the WT Refs by exhibiting a greater decrease in nicked DNA at 30 min (Figure 5C). This decrease in nicked DNA corresponded to the second cleavage event, producing linear dsDNA.
In contrast, the W39 N66 and W39 N74 exhibited decreased efficiency in the targeting assay. This corresponded with a deficiency in DNA binding (Figure 2) and nuclease activity on circular ssDNA (Figure 4). The W39 N66 Ref had a 2-fold reduction in targeted nuclease activity, indicating that the removal of the 7 amino acids following the second charge motif is detrimental to Ref activity. Upon removal of the remaining third putative charge motif, the W39 N74 Ref generated only 4% linear dsDNA product in 3 h. Both the W39 N66 and W39 N74 are capable of producing reduced but significant levels of nicked DNA, suggesting that the second cut needed to produce linear dsDNA is especially compromised.

DISCUSSION
The present study provides an initial insight into the function of the intrinsically disordered N-terminal region of the  (20,21). Since the Ref protein does not display sequence-specific nuclease activity, the putative charge motifs we have identified may be involved in non-specific DNA interactions, or in making contacts within the RecA nucleoprotein filament groove or with DNA in the filament. We are uncertain whether the charge motifs are necessary simply for direct electrostatic interactions with DNA/RecA or whether, when bound to DNA and/or RecA filaments, they take on a structure necessary for Ref nuclease function.
Based on preliminary ultracentrifugation data on the P1 WT Ref (dimer-active) and P1 N76 (monomer-limited activity), we previously proposed that Ref functioned as a dimer. Although we now postulate that Ref is functionally monomeric, we cannot exclude the possibility that transient dimer formation occurs upon association with RecA or DNA. However, limited proteolysis in the absence or presence of DNA (data not shown) did not reveal a change in the protease sensitivity pattern. Cross-linking patterns with DSG are also not altered by the presence of DNA (Supplementary Figure S5). This suggests Ref does not change its oligomeric state in the presence of DNA.
Combining the present results with previous work (8), we present an updated model for the production of targeted DSBs by Ref (Figure 6). The Ref protein cleaves the paired strand to create cut one and subsequently cleaves the displaced strand to produce a targeted DSB. As demonstrated previously (8), the first cleavage step does not require RecA-mediated ATP hydrolysis and can be promoted by the Ref active site mutant H153A. This suggests that the first cleavage of the paired DNA strand occurs within the RecA filament groove, and that residue H153 is not involved (8). Based on the present work, we suggest that the highly charged N-terminus of Ref acts as an anchor or tether, targeting the protein to the RecA protein groove. Access to the various DNA strands would then be modulated by the RecA protein structure within the D-loop. The slower, second cleavage event requires ATP hydrolysis and is not promoted by Ref H153A (8). This indicates that it may occur only during or after RecA filament disassembly (8). It also indicates that H153 is involved in this second cleavage event, suggesting that the active site is positioned somewhat differently than it is for the first cleavage event. Since this second DNA cleavage is as dependent on RecA as the first, we presume that Ref protein remains in contact with the RecA filament when carrying out this second cleavage. The Ref protein responsible could be the same one that promotes the first cleavage (after some type of rearrangement that may be reflected in the effects of the H153A mutation) or it could be a different Ref monomer that is also associated with the RecA filament. Once RecA filament disassembly occurs in proximity to the Ref protein, we hypothesize that the Ref monomer cuts the displaced strand (Figure 6).
In Figure 6, we show one of these possibilities, involving a rearrangement of a single Ref monomer to allow separate consecutive cleavage events, although the involvement of two different Ref monomers is equally likely.
The two-domain structure of Ref, with an unstructured N-terminal DNA binding domain and globular nuclease domain, is similar to the HNH and GIY-YIG homing endonucleases I-HmuI and I-BmoI, respectively (22,23). We envision a cleavage mechanism similar to that of the GIY-YIG homing endonuclease, I-BmoI (22). I-BmoI functions as a monomer to create DSBs through two sequential strand cleavage steps (22). The DNA-binding domain could act as a molecular tether attached through a flexible linker to the GIY-YIG domain, permitting rotation of the nuclease domain to sequentially nick the DNA substrate on each strand, if indeed one monomer carries out both cleavage events.
There have been significant advances in the resources available for genome editing, but there is room for expansion of this toolbox. The CRISPR/Cas9 genome editing system has seen abundant application in human cell lines, mice, zebrafish embryos, Drosophila, yeast and other model organisms (24-28). The genomic sequence of interest is targeted with a chimeric guide RNA (gRNA) (29,30) that directs genomic DNA cleavage by the endonuclease, Cas9 (29,30). The gRNA complementary sequence in the genome needs to be preceded by an NGG sequence, termed a protospacer adjacent motif (30). This modestly restricts sequences that can be targeted for editing. In addition, significant off-target cleavage has been observed in human cells, limiting some therapeutic and research applications (31). At present, the Ref system is limited by the need to introduce three components for cleavage (RecA, Ref and a targeting oligonucleotide). However, the Ref system has no target DNA sequence constraints. Due to the length of the oligonucleotide employed, non-specific cleavage of off-target sites can in principle be controlled.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.