Altering the sequence specificity of HaeIII methyltransferase by directed evolution using in vitro compartmentalization

Engineering the specificity of DNA-modifying enzymes has proven extremely challenging, as sequence recognition by these enzymes is poorly understood. Here we used directed evolution to generate a variant of HaeIII methyltransferase that efficiently methylates a novel target site. M.HaeIII methylates the internal cytosine of the canonical sequence GGCC, but there is promiscuous methylation of a variety of non-canonical sites, notably AGCC, at a reduced rate. Using in vitro compartmentalization (IVC), libraries of M.HaeIII genes were selected for the ability to efficiently methylate AGCC. A two-step mutagenesis strategy, involving initial randomization of DNA-contacting residues followed by randomization of the loop that lies behind these residues, yielded a mutant with a 670-fold improvement in catalytic efficiency (k(cat)/K(m)(DNA)) using AGCC and a preference for AGCC over GGCC. The mutant methylates three sites efficiently (AGCC, CGCC and GGCC). Indeed, it methylates CGCC slightly more efficiently than AGCC. However, the mutant discriminates against other non-canonical sites, including TGCC, as effectively as the wild-type enzyme. This study provides a rare example of a laboratory-evolved enzyme whose catalytic efficiency surpasses that of the wild-type enzyme with the principal substrate.


Introduction
Restriction–modification (R–M) systems are widespread in bacteria, with over 240 DNA specificities known (Roberts et al., 2003). Each R–M system consists of a DNA methyltransferase, which methylates a specific sequence, usually 4–8 bp long, and a restriction enzyme that digests unmethylated (foreign) DNA, providing the bacteria with a defence against phage infection.
The structure of the cytosine-C5 methyltransferase M.HaeIII was solved in complex with DNA (Reinisch et al., 1995). The protein comprises two domains separated by a large cleft that holds the DNA. The target base is flipped out of the helix into the catalytic site, as first observed in the structure of the M.HhaI:DNA complex (Klimasauskas et al., 1994). One domain contains the catalytic site and the cofactor (S-adenosyl-L-methionine; AdoMet) binding site. The structure of this domain is conserved in all DNA methyltransferases and in the majority of AdoMet-dependent methyltransferases (Schluckebier et al., 1995; Cheng and Blumenthal, 1999; Martin and McMillan, 2002). The second domain is responsible for DNA recognition and varies widely in sequence, size and structure from one enzyme to another.
Despite the wealth of natural R–M systems, there is a continuing effort to engineer systems with novel specificities, as both methyltransferases and restriction endonucleases are extremely valuable tools in molecular biology. Alteration of the sequence specificity of DNA methyltransferases by protein engineering has, however, proved very challenging and has been limited to the exchange of entire target recognition domains (TRDs) between methyltransferases, thus conferring the specificity of the new TRD on the recipient enzyme (Gann et al., 1987; Klimasauskas et al., 1991; Gubler et al., 1992; Mi and Roberts, 1992; Walter et al., 1992; Trautner et al., 1996). An attempt to alter the specificity of a bacteriophage methyltransferase by the fusion of two TRDs with different DNA specificities led to an enzyme with a degenerate 'relaxed' specificity that was not predicted from the specificities of the two parent TRDs (Lange et al., 1996).
An experiment to alter the specificity for the methylated base from adenine to cytosine demonstrated the separate roles of the catalytic and target recognition domains (Roth and Jeltsch, 2001). The M.EcoRV variant, which had mutations in the conserved catalytic domain, displayed a 22-fold preference for cytosine as the target base, but only when located in a CT mismatch and with no increase in catalytic efficiency compared with methylation of the same substrate by the wild-type enzyme. In fact, the designed mutations abolish methylation of adenine without improving methylation of cytosine.
An attractive alternative to rational design is selection (Griffiths and Tawfik, 2000; Brannigan and Wilkinson, 2002). An in vivo selection system, based on the principle of methylated DNA being rendered resistant to cleavage by a restriction endonuclease, has been used to clone methyltransferases from genomic DNA and to select active variants from libraries (Szomolanyi et al., 1980; Kiss et al., 2001; Vilkaitis et al., 2002). A library of mutants of M.SinI, target site GGWCC (W = A or T), was selected for the ability to methylate the degenerate site, GGNCC (N = A, C, G or T) (Kiss et al., 2001). The best variant isolated by this method showed a more relaxed specificity at the central base, although the original target site was still methylated twice as efficiently as the new site and the methylation activity on both sites was 5- to 10-fold lower than the activity of the wild-type enzyme on the canonical site.
Redesigning restriction endonucleases to create new sequence specificities has also proved difficult. Based on the crystal structure of EcoRV in complex with various DNA substrates, attempts have been made to lengthen the 6 bp recognition sequence to 8 bp (Schottler et al., 1998; Lanio et al., 2000). As with DNA methyltransferases, the most promising results have arisen by selection from libraries of variants. A semi-rational approach was used to select a mutant of EcoRV with 25-fold higher activity on sites flanked by AT rather than GC, although several other sites were cut only slightly less efficiently (Lanio et al., 1998). More recently, an experiment in directed evolution produced a mutant of BstYI (specificity RGATCY, where R = A or G and Y = C or T) with a 12-fold preference for AGATCT over AGATCC or GGATCT and no detectable cleavage of GGATCC (Samuelson and Xu, 2002). Also, random libraries of the bifunctional restriction endonuclease Eco57I, in which the restriction endonuclease activity had been inactivated by point mutation, were selected for the ability to methylate and protect the sequence CTGGAG from cleavage by GsuI (the wild-type Eco57I target sequence is CTGAAG). This led to the isolation of a mutant which could methylate both of the above sequences. Restoration of the cleavage activity yielded a mutant restriction endonuclease with the specificity CTGRAG (Rimseliene et al., 2003).
Despite the difficulties that have been encountered when trying to generate R–M systems of novel specificity in the laboratory, the abundance of different specificities that occur in bacteria and archaea indicates that this process must have occurred many times in nature. Non-canonical activity against star sites (sites that differ by one base pair from the target sequence) or other DNA sites has been reported both for restriction endonucleases and methyltransferases. It has been proposed that many enzymes have promiscuous activity against molecules that are not their principal substrate and that new enzymes may evolve by improvements in the enzyme's ability to catalyse the conversion of one of these poor substrates (Jensen, 1976; O'Brien and Herschlag, 1999; Copley, 2003; James and Tawfik, 2003). The enhancement of an existing promiscuous activity has been proposed as a possible route for the evolution of R–M systems with novel specificity (Beck et al., 2001; Cohen et al., 2002). This is also a strategy that has been successfully used for directed evolution of enzymes with altered substrate specificity, for example the evolution of a β-glucuronidase with β-galactosidase activity (Matsumura and Ellington, 2001) and the alteration of the site specificity of Cre recombinase (Buchholz and Stewart, 2001).
M.HaeIII methylates a range of other sites in addition to the target site GGCC, albeit with reduced efficiency, the most frequently methylated star site being AGCC (Cohen et al., 2002). In this study we used in vitro compartmentalization (IVC) to evolve a variant of M.HaeIII with novel sequence specificity by enhancing this activity towards AGCC.
IVC is a completely in vitro system which allows activity-based selection of DNA methyltransferases (Tawfik and Griffiths, 1998; Lee et al., 2002), in addition to other enzymes (Ghadessy et al., 2001; Griffiths and Tawfik, 2003) and ligands (Doi and Yanagawa, 1999; Sepp et al., 2002). In nature, genotype–phenotype linkage results from compartmentalization of the genome by the cell membrane. In IVC, this linkage is achieved by compartmentalizing single genes, not in cells, but in the aqueous compartments of a water-in-oil emulsion (Figure 1).
Using IVC to select M.HaeIII libraries for the ability to methylate AGCC, we isolated a variant with a 670-fold improvement in catalytic efficiency (k(cat)/K(m)(DNA)) on this site.

Oligonucleotides
See supplementary material (available at PEDS online).

Plasmids
Complementary oligonucleotides Mon1 and Mon2 were annealed as described (Cohen et al., 2002), phosphorylated by T4 polynucleotide kinase (New England Biolabs) and ligated into pIVEX2.2bNde (Roche), which had been digested with BamHI and treated with calf-intestinal phosphatase (Roche), to create the vector pIVEX.1s. pIVEX.3s was created in a similar manner by ligation of the annealed oligonucleotides 5SN1 and 5SN2 into pIVEX2.2bNde cut with BamHI and SacI.
The single mutant R225A was synthesized via the strategy of PCR and ligation described previously (Griffiths and Tawfik, 2003). The template for the first PCR was pIVEX.1s.MHaeIII. Oligonucleotides T1.1R and LMB2.1 were used to amplify the N-terminal fragment and T1.5 and pIV-B1bio for the C-terminal fragment. Digestion with BsmBI, ligation and capture via biotin were performed as described. The full-length gene was amplified by PCR using primers LMB2.2 and pIV-B2bio, digested with NcoI and SacI, and ligated into pIVEX.3s to create pIVEX.3s.R225A.
Figure 1. An in vitro transcription/translation reaction mixture containing a library of genes encoding mutant methyltransferases (MT), each with restriction–methylation (R–M) sites appended to the gene, is dispersed to form a water-in-oil emulsion with typically one gene per aqueous compartment (1). The genes are transcribed and translated within their compartments (2). Proteins with methyltransferase activity methylate the R–M sites (3). Compartmentalization prevents the methylation of genes in other compartments. The emulsion is broken, all reactions are stopped and the aqueous compartments combined (4). Digestion with the cognate restriction enzyme results in the digestion of unmethylated genes (which do not encode active methyltransferases) (5) and the survival of methylated genes (which encode active methyltransferases) (6). The surviving genes can be amplified using the polymerase chain reaction and compartmentalized for further rounds of selection (7).

Library synthesis
Libraries were synthesized using oligonucleotides to encode the diversified residues, as described (Griffiths and Tawfik, 2003). Library A was prepared by PCR amplification from pIVEX.1s.MHaeIII. Primers LMB2.1 and TRD1Fo were used to amplify the N-terminal fragment and pIV-B1bio and TRD1Ba for the C-terminal fragment. Library B was prepared by PCR amplification from pIVEX.3s.R225A using LMB2.1 and TRD1.3R for the N-terminal fragment and pIV-B1bio and TRD1.3 for the C-terminal fragment. Each library was amplified by PCR with primers pIV-B2bio and LMB2.2.

Library selection by IVC
Selection by IVC was performed essentially as described (Tawfik and Griffiths, 1998) with the following modifications. The EcoPro T7 in vitro transcription/translation system (Novagen) was used and the oil mix was as follows: light mineral oil (Sigma) containing 4.5% (w/w) Span 80 (Fluka) and 0.5% (w/w) Triton X-100 (Fischer). DNA was added to the reaction mix to a final concentration of 0.1 nM. Glycerol increases the activity of M.HaeIII against non-canonical sites (Cohen et al., 2002) and was added to the reaction at a final concentration of 16% by volume. Emulsions were incubated at 25°C for 4 h. The restriction endonuclease NheI (New England Biolabs) was used to digest unmethylated genes.
After each round of selection the emulsion was broken, the genes captured on streptavidin-coated beads and treated with NheI. Undigested genes were amplified by PCR using primers LMB2.11 and pIV-B11. To avoid PCR artefacts and to reappend the NheI restriction sites before the next round, the DNA was excised using NcoI and SacI, ligated into pIVEX.1s (library A) or pIVEX.3s (library B) and re-amplified with primers LMB2.1 and pIV-B1bio, which anneal in the vector sequence outside the annealing sites of the primers used in the previous PCR (Griffiths and Tawfik, 2003). Selected libraries were cloned into pIVEX.1s or pIVEX.3s for initial characterization by in vitro expression and sequencing.

In vitro translation and assay of selected genes
Libraries, selected libraries or cloned genes were assayed for methylation activity by a digoxygenin–biotin ELISA-based method essentially as described (Tawfik and Griffiths, 1998). PCR-amplified DNA was added to the EcoPro T7 transcription/translation system to 2 nM and the reaction mixture was incubated at 25°C for 90 min. A 2 µl volume of in vitro transcription/translation reaction mixture was added to the methylation reaction containing 10 nM methylation substrate, 10 mM EDTA, 80 µM AdoMet, 50 mM Tris–HCl, pH 7.4, 50 mM NaCl, 1 mM dithiothreitol in a final volume of 20 µl. Methylation reaction mixtures were incubated at 25°C and aliquots removed after 2 min to 16 h, for quenching as described (Tawfik and Griffiths, 1998). Methylated DNA was bound to streptavidin-coated plates, digested by either HaeIII or NheI in NEB buffer 2 (50 mM NaCl, 10 mM Tris–HCl, pH 7.9, 10 mM MgCl2, 1 mM dithiothreitol) (New England Biolabs) and undigested DNA was detected as described.

Expression and purification of methyltransferases
The genes for M.HaeIII and the mutant T29 were sub-cloned from pIVEX derivatives into pTrc99 (Amersham Biosciences) using NcoI and SacI restriction sites and transformed into [...]. Cultures were grown at 37°C to OD 0.6, induced by the addition of 1 mM isopropyl-β-D-thiogalactoside and transferred to 20°C for 4 h. Methyltransferases were purified by ion exchange using S-Sepharose as described (Chen et al., 1991) and stored at −80°C. The concentration of active sites was determined by an electrophoretic mobility shift assay (EMSA) (Vilkaitis et al., 2001) (see supplementary material).

Kinetic assays of methyltransferase activity
Purified wild-type M.HaeIII and the T29 mutant were tested for the ability to protect NheIsub and HaeIIIsub from digestion by NheI and HaeIII, respectively, using a digoxygenin–biotin ELISA-based method as above (Tawfik and Griffiths, 1998). Each 20 µl methylation reaction contained 0.5 nM active sites of enzyme and 10 nM substrate DNA. Methylation reaction mixtures were incubated at 25°C and aliquots removed and quenched after 2–30 min.
Five double-stranded 30 bp DNA substrates were used to determine the sequence specificity of M.HaeIII and the T29 mutant. Each contains a single hemi-methylated HaeIII site (GGCC) or star site (AGCC, CGCC, TGCC or GGCT). Methylation reactions were performed in M.HaeIII reaction buffer supplemented with 5–6000 nM DNA, 10 mM EDTA and 1 µM S-adenosyl-L-[methyl-3H]methionine (20 Ci/mmol) (Amersham). For each enzyme and each DNA substrate the DNA concentration range was within the range 0.2–10.0 K(m) (see supplementary material). Reactions were pre-heated to 37°C and started by the addition of 1–5 nM active sites of purified enzyme. At least four aliquots of 20 µl were taken within the linear range of the reaction (between 1 and 20 min) and quenched by the addition of 45 µl of 160 µM unlabelled S-adenosyl-L-methionine (AdoMet) in 300 mM sodium acetate, pH 5.0. Triplicate 20 µl samples were spotted on to Multiscreen DE plates (Millipore). The plates were washed three times with 200 µl of 0.2 M ammonium hydrogen carbonate and once with 100 µl of ethanol and allowed to air dry. The filters were punched out from the plates and counted in 1 ml of scintillant (National Diagnostics) in a Beckman LS6000SC scintillation counter. The initial rate was calculated from measurements taken within the first 5% of the reaction, to avoid product inhibition by S-adenosylhomocysteine.
Kinetic constants were determined by fitting plots of the initial reaction velocity (v0) versus DNA concentration [S] to the Michaelis–Menten model, $v_0 = k_{cat}[E]_0[S]/(K_m^{DNA} + [S])$, using the Levenberg–Marquardt algorithm, as implemented in Kaleidagraph (Synergy Software, Reading, PA) (see supplementary material). [E]0 is the initial concentration of enzyme active sites (determined by EMSA). k(cat) is the apparent turnover number and K(m)(DNA) is the apparent Michaelis–Menten constant of the enzyme for DNA, as measured at 1 µM AdoMet. This concentration is above the K(m)(AdoMet) of the wild-type enzyme and T29 mutant. Previously, M.EcoRI was tested with a similar variety of DNA substrates with little variation in the K(m)(AdoMet) (Reich et al., 1992), leading to the assumption that the K(m)(AdoMet) of M.HaeIII (and mutants) is similar for all the DNA substrates used.
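The fitting step can be illustrated with a minimal sketch. This is not the Kaleidagraph procedure used in the study; it is a small, pure-Python damped least-squares (Levenberg–Marquardt style) loop written for illustration. The function names, starting guesses and the synthetic data in the usage note are all hypothetical.

```python
def mm_rate(s, kcat, km, e0):
    """Michaelis-Menten initial velocity: v0 = kcat*[E]0*[S] / (Km + [S])."""
    return kcat * e0 * s / (km + s)


def fit_michaelis_menten(s_vals, v_vals, e0, kcat=1.0, km=100.0, iters=200):
    """Fit kcat and Km to (S, v0) pairs by damped least squares.

    Illustrative only: starting guesses (kcat, km) are arbitrary assumptions.
    """
    def sse(kc, k):
        # Sum of squared residuals for a candidate parameter pair.
        return sum((v - mm_rate(s, kc, k, e0)) ** 2
                   for s, v in zip(s_vals, v_vals))

    lam = 1e-3               # damping factor
    err = sse(kcat, km)
    for _ in range(iters):
        # Accumulate J^T J (a11, a12, a22) and J^T r (g1, g2).
        a11 = a12 = a22 = g1 = g2 = 0.0
        for s, v in zip(s_vals, v_vals):
            r = v - mm_rate(s, kcat, km, e0)
            d_kcat = e0 * s / (km + s)                # dv/dkcat
            d_km = -kcat * e0 * s / (km + s) ** 2     # dv/dKm
            a11 += d_kcat * d_kcat
            a12 += d_kcat * d_km
            a22 += d_km * d_km
            g1 += d_kcat * r
            g2 += d_km * r
        # Marquardt damping scales the diagonal of J^T J.
        m11 = a11 * (1.0 + lam)
        m22 = a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 damped normal equations for the parameter step.
        step_kcat = (g1 * m22 - a12 * g2) / det
        step_km = (m11 * g2 - a12 * g1) / det
        new_kcat, new_km = kcat + step_kcat, km + step_km
        new_err = sse(new_kcat, new_km) if new_km > 0 else float("inf")
        if new_err < err:
            # Accept the step and trust the local model more.
            kcat, km, err = new_kcat, new_km, new_err
            lam /= 10.0
        else:
            # Reject the step and increase damping.
            lam *= 10.0
    return kcat, km
```

As a usage example, noise-free synthetic velocities generated with `mm_rate` for hypothetical values (k(cat) = 0.5 per min, K(m) = 50 nM, [E]0 = 1 nM) at DNA concentrations of 10–500 nM can be passed to `fit_michaelis_menten`, which should recover the generating parameters. Real data would also need weighting and error estimates, which this sketch omits.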
K(m)(AdoMet) was measured by methyltransferase assays containing 1–5 nM enzyme, saturating concentrations of DNA (2 µM of the substrate containing a hemi-methylated GGCC site) and 50–3000 nM S-adenosyl-L-[methyl-3H]methionine in M.HaeIII buffer. Reactions were carried out as above and plots of initial reaction velocity versus AdoMet concentration were fitted to the Michaelis–Menten model, where [S] is the initial concentration of AdoMet and K(m) is the apparent Michaelis–Menten constant of the enzyme for AdoMet.

Selection strategy for methylation at star sites
The strategy for selecting active methyltransferases by IVC requires a restriction enzyme to digest the genes encoding inactive enzymes, leaving the methylated genes encoding active enzymes intact (Figure 1A). There are no known restriction enzymes with the sequence specificity AGCC, but restriction sites may be protected from digestion if they overlap methylation sites. NheI is blocked by C5 methylation of any of the cytosines in its recognition site (GCTAGC) (Roberts et al., 2003). By appending NheI sites that overlap with the M.HaeIII star site AGCC, genes encoding methyltransferases which methylate this star site can be selected (Figure 1B). Plasmids pIVEX.1s and pIVEX.3s, which contain one or three such NheI sites, respectively, were used for library construction and selection. This strategy can also potentially lead to selection for C5 methylation of GGCT.
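The overlap requirement can be made concrete with a short sketch. It scans one strand of a sequence for NheI sites (GCTAGC) in which the internal cytosine of an overlapping AGCC falls inside the NheI recognition sequence, as in GCTAGCC. The function name and the example sequences are hypothetical, and the opposite strand (where the overlapping site reads GGCT on this strand) is deliberately not analysed.

```python
NHEI_SITE = "GCTAGC"   # NheI recognition sequence, as given in the text
STAR_SITE = "AGCC"     # M.HaeIII star site targeted by the selection
TARGET_C_OFFSET = 2    # internal cytosine = third base of AGCC (0-based offset 2)


def protected_nhei_sites(seq):
    """Return start indices of NheI sites that an AGCC methylation would hit.

    A site counts as protectable when an AGCC on the same strand overlaps it
    so that the methylated cytosine lies within the NheI recognition sequence.
    """
    seq = seq.upper()
    hits = []
    start = seq.find(NHEI_SITE)
    while start != -1:
        nhei_end = start + len(NHEI_SITE)
        # Try every AGCC start position that could place its target C in the site.
        for star in range(max(0, start - TARGET_C_OFFSET), nhei_end):
            if seq.startswith(STAR_SITE, star):
                c_index = star + TARGET_C_OFFSET
                if start <= c_index < nhei_end:
                    hits.append(start)
                    break
        start = seq.find(NHEI_SITE, start + 1)
    return hits
```

For example, `protected_nhei_sites("TTGCTAGCCTT")` reports the NheI site at index 2, because the following C creates an overlapping AGCC, whereas the same site followed by A (`"TTGCTAGCATT"`) is not reported.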

Selection of library A for methylation of AGCC
Crystallographic studies show that the TRD of M.HaeIII contains two target recognition loops (loops I and II) which contact the target DNA sequence. These two loops are thought to be responsible for almost all the base-specific interactions with the target site (Reinisch et al., 1995). Loop I residues interact with the first three base pairs of the target DNA whereas loop II residues interact with the rest of the target site.
Library A was synthesized by introducing diversity at five codons in loop I: Arg225, which is the only residue thought to make a contact to the first base pair (to G1; there are apparently no amino acid interactions with the complementary base C1′); Ser224, which is thought to interact with both Arg225 and C2′; and Ser217, Asn232 and Glu233, which differ between M.HaeIII and its closest homologue M.FnuDII, which also methylates GGCC (Figure 2). The template for this library was the M.HaeIII gene cloned in pIVEX.1s with an N-terminal FLAG tag (Chiang and Roeder, 1993). All library members therefore contained a single NheI site that could be protected from digestion by methylation of the central cytosine of AGCC (Figure 1B).
Library A was selected for methylation activity on these star sites by IVC as described (Tawfik and Griffiths, 1998; Lee et al., 2002), using NheI restriction endonuclease to digest unmethylated DNA (Figure 1). The enrichment of genes encoding active methyltransferases was monitored by in vitro translation of the DNA amplified after each round of selection and assaying the ability of the in vitro translated protein to protect the DNA fragment NheIsub from digestion by NheI. NheIsub is a 593 bp PCR product containing a single NheI site overlapped by AGCC sites. Unmethylated NheIsub was cut to completion but substrate DNA methylated at these star sites remained uncut. In a 16 h reaction, in vitro translated (unselected) library A protected 2% of the DNA from digestion. After the fourth round of IVC, 26% of the DNA was protected (Figure 3).
The selected genes were sub-cloned and individual clones screened by in vitro translation and assayed using NheIsub. Out of 20 clones with detectable activity, nine methylated NheIsub significantly faster than wild-type M.HaeIII and were sequenced. These genes all contained the mutation R225A but no other mutations were common to all clones. Ser224 was not mutated in any of the sequenced genes. The single mutant R225A was constructed and assayed as above and displayed the same level of activity as the fastest selected mutant, a 20-fold increase in the rate of methylation of NheIsub compared with the wild-type enzyme (Table I).

Selection of library B for methylation of AGCC
A second library, library B, was synthesized using the single mutant R225A in pIVEX.3s. All genes therefore contained three NheI sites, all of which must be methylated for any gene to be selected. Three residues were randomized, Asn260, Leu261 and Asn262, which are not thought to make direct DNA contacts but are adjacent to Arg225 in the crystal structure of wild-type M.HaeIII (Reinisch et al., 1995) (Figures 2 and 6). Residues 260–262 lie behind loop I and [...]. This library was put through three rounds of selection by IVC and the enrichment of the genes encoding active methyltransferases monitored as above. The length of methylation assays was reduced from 16 to 1 h, owing to the higher activity of the pool of genes. Some 4.7% of the DNA remained uncut after methylation by in vitro translated, unselected library B, suggesting the presence of a high frequency of active methyltransferase genes in this library even before selection. After one or two rounds of selection, 7.0 and 41% of the substrate DNA, respectively, remained uncut (Figure 3). A third round of selection, however, did not increase the methyltransferase activity of the library, probably because, after round 2, the library already predominantly comprised clones with sufficient activity to methylate all three AGCC sites during the 4 h incubation.
Selected genes were sub-cloned after the second and third rounds of IVC. Individual genes were translated in vitro and assayed for the ability to protect NheIsub and HaeIIIsub from digestion by NheI or HaeIII, respectively (HaeIIIsub is a 350 bp PCR product containing a single HaeIII site). The number of variants showing detectable activity on NheIsub after the second round of selection was 10 out of 22 tested and after the third round 37 out of 56 tested. Thirteen of the 78 variants were more efficient at methylating AGCC (and hence protecting NheIsub) than methylating GGCC (protecting HaeIIIsub). These genes were sequenced and the methylation activities on NheIsub and HaeIIIsub are listed in Table I. Seven of these variants methylated NheIsub faster than the single mutant R225A.

Activity and sequence specificity of the improved variants
The fastest mutant, T29, which has five amino acid changes (D−6G, R225A, N260L, L261M, N262W), was found to methylate NheIsub over 600 times faster than M.HaeIII (Table I). This mutant protein and the wild-type M.HaeIII, each with an N-terminal FLAG tag, were purified and tested for the ability to protect NheIsub (593 bp) and HaeIIIsub (350 bp) from digestion by NheI (detecting methylation of AGCC) and HaeIII (detecting methylation of GGCC), respectively (Figure 4). Using 10 nM substrate DNA, the initial rate of methylation of HaeIIIsub (GGCC) by the wild-type enzyme is 0.17 M/min/M enzyme and there was no detectable methylation of NheIsub (AGCC) after 30 min. In contrast, the T29 mutant methylates NheIsub (AGCC) very efficiently, with an [...]
Footnote a: Amino acid numbering is the same as in the untagged protein, with the FLAG residues numbered −7 to 0.
The sequence specificity of the wild-type enzyme and the T29 mutant was investigated further using several different 30 bp substrates, each containing a single hemi-methylated target site for either wild-type M.HaeIII (GGCC), the selected specificity (AGCC) or other star sites of M.HaeIII (CGCC, TGCC or GGCT). The star sites are asymmetric (in contrast to the palindromic wild-type site GGCC) and therefore contain a second potential target site for methylation on the complementary strand. For example, an AGCC substrate also has a potential GGCT substrate on the complementary strand. The use of hemi-methylated DNA ensures that methylation of just one target sequence can be assayed (Friedrich et al., 2000). The apparent K(m)(AdoMet) was 620 ± 60 nM for M.HaeIII and 440 ± 90 nM for T29. The apparent K(m)(DNA) and k(cat) of the wild-type and mutant T29 are listed in Table II (also see supplementary material).
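The asymmetry of the star sites can be checked with a few lines of code. The sketch below takes the reverse complement of each site named in the text to show which sequence appears on the complementary strand; the helper names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(site):
    """Return the sequence read 5'->3' on the complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(site.upper()))


def opposite_strand_sites(sites):
    """Map each site to the sequence found on the complementary strand."""
    return {site: reverse_complement(site) for site in sites}


if __name__ == "__main__":
    for site, partner in opposite_strand_sites(
            ["GGCC", "AGCC", "CGCC", "TGCC", "GGCT"]).items():
        label = "palindromic" if site == partner else "asymmetric"
        print(site, partner, label)
```

Only GGCC is its own reverse complement. AGCC pairs with GGCT, which is why an AGCC substrate carries a potential GGCT site on the other strand and why hemi-methylated substrates are needed to assay one site at a time.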
Methylation of the star sites by wild-type M.HaeIII was 10- to 27-fold less efficient (in terms of k(cat)/K(m)(DNA)) than methylation of GGCC, with the preference for GGCC being mainly due to the high K(m)(DNA) for other substrates (Table II). Previously, we had observed that AGCC is methylated more frequently than any other non-canonical site, both in vitro and in vivo (Cohen et al., 2002), but this was not the case in this study. This difference may be due to a number of factors: the use of hemi-methylated substrates, the sequence context of each site or the use of short oligonucleotides in place of an 832 bp PCR fragment or a 6426 bp plasmid. The sequence context of R–M sites is known to influence the efficiency of digestion by restriction endonucleases (Thomas and Davis, 1975). The length of the DNA substrate may also be significant, as DNA methyltransferases are thought to scan DNA and methylate target sites processively, over distances of several hundred bases (Urig et al., 2002), and significant differences in the activity of DNA methyltransferases on short and long substrates have been reported previously (Reich and Mashhoon, 1991; Cheng and Blumenthal, 1999). However, previous studies have shown that the kinetic parameters of bacterial DNA methyltransferases are almost identical when measured on otherwise identical unmethylated and hemi-methylated substrates (Dryden, 1999; Lindstrom et al., 2000).
Wild-type M.HaeIII shows a 22-fold preference (in terms of k(cat)/K(m)(DNA)) for the canonical site over AGCC (Table III, Figure 5). T29 shows a dramatic change in both catalytic efficiency and sequence specificity. The k(cat) for methylation of AGCC is increased 4.6-fold compared with the wild-type and the K(m)(DNA) is 140-fold lower than that of M.HaeIII (Table II). Hence mutant T29 has a 670-fold improvement in catalytic efficiency (k(cat)/K(m)(DNA)) on AGCC compared with wild-type M.HaeIII and shows a 3-fold preference for methylation of AGCC over GGCC (in terms of k(cat)/K(m)(DNA)) (Table III, Figure 5). T29 also efficiently methylates CGCC, but the star sites TGCC and GGCT are methylated more slowly.
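As a quick consistency check, the fold change in catalytic efficiency factors into the fold change in k(cat) and the inverse fold change in K(m)(DNA). The worked relation below uses only the rounded factors quoted in the text; the underlying values are in Table II, which is not reproduced here.

```latex
\frac{\left(k_{cat}/K_m^{DNA}\right)_{T29}}{\left(k_{cat}/K_m^{DNA}\right)_{WT}}
  = \frac{k_{cat}^{T29}}{k_{cat}^{WT}} \times \frac{K_m^{DNA,\,WT}}{K_m^{DNA,\,T29}}
  \approx 4.6 \times 140 \approx 640
```

The product of the two rounded factors is close to the reported 670-fold improvement; the small gap is what one would expect if both factors were rounded before being quoted.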
We cannot rule out the possibility that flanking sequence may affect the observed sequence specificity; however, large increases in rates of AGCC methylation (relative to wild-type) were observed with the T29 mutant using both short oligonucleotide and long DNA substrates in which the DNA was in a completely different sequence context.

Discussion
Eighteen amino acids of M.HaeIII are thought to interact with the DNA target bases or phosphate backbone, based on the crystal structure (Reinisch et al., 1995). These interactions include multiple hydrogen bonds to the bases, and a network of water-mediated hydrogen bonds, similar to that seen in the high-resolution structure of the M.HhaI:DNA complex, is also likely to exist (Klimasauskas et al., 1994). Even with the benefit of the crystal structure of the M.HaeIII:DNA complex, any attempt to alter the sequence specificity by rational design would be very challenging as the crystal structure reveals only one of several states in the multistep pathway of DNA binding and recognition (Klimasauskas et al., 1998). Also, the lack of structural and sequence conservation between TRDs of DNA methyltransferases with different specificities and the lack of any known methyltransferase with the desired specificity (AGCC) preclude the use of sequence homology to predict the effects of mutations in the TRD.
The normal target for methylation by M.HaeIII is the sequence GGCC, but methylation of other DNA substrates also occurs with lower efficiency. We set out to take advantage of this promiscuity to evolve an enzyme which efficiently methylates AGCC, as we had previously observed that this site is methylated more frequently than any other non-canonical site, both in vitro and in vivo (Cohen et al., 2002).
To isolate efficient enzymes which methylate the new target AGCC we used a two-stage strategy of mutation and selection using IVC. The first library (library A) contained diversity at residues in target recognition loop I. The best variants selected all contained the R225A mutation, which was then used as the starting point for a second library (library B) with mutations in the residues which lie behind loop I. Selection of this second library led to the isolation of a number of mutants with further improvements in activity. The fastest variant, T29, has a 670-fold improvement in catalytic efficiency (k(cat)/K(m)(DNA)) on the 30 bp substrate containing AGCC. This is due to a decrease in K(m)(DNA) of 140-fold and an increase in k(cat) of 4.6-fold (Table II). In nature and in the laboratory there is selection pressure for the evolution of enzymes with K(m) values that match the substrate concentration (Fersht, 1999; Griffiths and Tawfik, 2000). Assuming a mean droplet diameter of 2.6 µm in the emulsion (Tawfik and Griffiths, 1998), the concentration of a single gene inside a droplet is 0.18 nM. This low concentration of DNA may have driven the evolution of enzymes with low K(m)(DNA) values.
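The single-gene concentration quoted above follows from the droplet volume. The sketch below reproduces the arithmetic, assuming a spherical droplet with a diameter in micrometres; the function name is illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_molar(n_molecules, droplet_diameter_um):
    """Molar concentration of n molecules in a spherical droplet.

    The diameter is given in micrometres; 1 um = 1e-5 dm and 1 dm^3 = 1 L.
    """
    radius_dm = (droplet_diameter_um / 2.0) * 1e-5
    volume_l = (4.0 / 3.0) * math.pi * radius_dm ** 3
    return n_molecules / (AVOGADRO * volume_l)


if __name__ == "__main__":
    conc_nm = molecules_to_molar(1, 2.6) * 1e9
    print(f"{conc_nm:.2f} nM")  # one gene in a 2.6 um droplet
```

A 2.6 µm droplet has a volume of roughly 9 fl, so one DNA molecule corresponds to about 0.18 nM, in agreement with the figure in the text.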
Many aspects of enzyme function have been modified by directed evolution, but improvements in catalytic efficiency of the principal reaction and alterations to the catalytic mechanism and the substrate specificity remain a challenge (Brannigan and Wilkinson, 2002). Directed evolution of a phosphotriesterase by IVC led to a 63-fold increase in k(cat) for the principal reaction, but k(cat)/K(m) of this fast mutant was only 2-fold higher than that of the wild-type (Griffiths and Tawfik, 2003). Although laboratory-evolved enzymes may show dramatic switches in substrate specificity, the catalytic efficiency with the new substrate is usually orders of magnitude lower than that of the wild-type enzyme using the principal substrate; for example, aspartate aminotransferase was selected for the ability to use the non-cognate substrate valine (Yano et al., 1998). The best variant had an impressive 10^5-fold increase in k(cat)/K(m) with valine, but the catalytic efficiency of 110 s^-1 M^-1 was 1000-fold lower than that of the wild-type with the principal substrate, aspartate (120 000 s^-1 M^-1). Our results provide a rare example of a laboratory-evolved enzyme where the catalytic efficiency with the new substrate surpasses that of the wild-type enzyme with its normal substrate.
The figure was prepared with the program SETOR (Evans, 1993) using the published crystal structure (Reinisch et al., 1995).
Interestingly, the mutant T29 was more efficient than wild-type M.HaeIII with all DNA substrates tested and used CGCC slightly more efficiently than AGCC (Table III, Figure 5). Although three substrates, AGCC, CGCC and GGCC, are methylated with high efficiency, T29 has not simply lost the ability to discriminate the first base pair and shows greater selectivity against thymine in position 1 than does the wild-type enzyme. T29 also shows better discrimination than the wild-type against thymine in position 4. Overall, the sequence specificity of T29 is VGCC (V = A, C or G). Wild-type M.HaeIII exhibits a 22-fold preference for GGCC over AGCC whereas T29 has a 3.3-fold preference for AGCC over GGCC, representing a 70-fold switch in specificity.
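The size of the specificity switch can be reconstructed from the two preferences quoted above. In the relation below, the symbol P is introduced here only for illustration and denotes the ratio of catalytic efficiencies on AGCC and GGCC for a given enzyme.

```latex
P = \frac{\left(k_{cat}/K_m^{DNA}\right)_{AGCC}}{\left(k_{cat}/K_m^{DNA}\right)_{GGCC}},
\qquad
P_{WT} \approx \frac{1}{22},
\qquad
P_{T29} \approx 3.3,
\qquad
\frac{P_{T29}}{P_{WT}} \approx 3.3 \times 22 \approx 73
```

This agrees with the approximately 70-fold switch reported in the text, allowing for rounding of the two preference values.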
The improvement in methylation of CGCC and GGCC is surprising, but not completely unexpected, as using these substrates carried neither a selective advantage nor a disadvantage. 'Negative selection' against unwanted activities is probably important for the development of a new substrate specificity, as demonstrated in the directed evolution of Cre recombinase and tRNA synthetase variants with altered specificity (Buchholz and Stewart, 2001; Santoro and Schultz, 2002; Santoro et al., 2002). IVC may also be used for simultaneous positive selection for methyltransferases with the desired sequence specificity in parallel with negative selection against enzymes with an alternative or degenerate specificity. After breaking the emulsion and capturing the DNA on beads, the DNA is digested first with NheI, so that only those genes methylated at AGCC remain. Digestion with HaeIII then releases those genes that are unmethylated at GGCC from the beads. However, a single round of selection of library B in this way did not improve the specificity of the pool of genes (data not shown).
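The logic of the combined positive and negative selection can be written as a simple filter. The Python sketch below is a deliberately idealized boolean model, assuming complete digestion and all-or-nothing methylation; the `Gene` class and the example pool are hypothetical and only illustrate the two digestion steps described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: which sites its encoded enzyme methylated."""
    name: str
    agcc_methylated: bool
    ggcc_methylated: bool


def survives_nhei(gene: Gene) -> bool:
    # Positive selection: NheI digestion is blocked only if AGCC is methylated.
    return gene.agcc_methylated


def released_by_haeiii(gene: Gene) -> bool:
    # Negative selection: HaeIII cuts (and releases from the beads)
    # only genes whose GGCC site is unmethylated.
    return not gene.ggcc_methylated


def select(pool: list[Gene]) -> list[Gene]:
    """Genes that survive NheI and are then released by HaeIII."""
    return [g for g in pool if survives_nhei(g) and released_by_haeiii(g)]


pool = [
    Gene("inactive", False, False),
    Gene("GGCC-only", False, True),
    Gene("AGCC-specific", True, False),
    Gene("degenerate", True, True),
]
selected = [g.name for g in select(pool)]
print(selected)
```

In this idealized model only the AGCC-specific gene is recovered. Real enzymes methylate each site at some rate rather than all-or-nothing, which may be one reason a single round did not measurably shift the pool.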
The permutations of mutations that confer improvements in activity towards AGCC are shown in Table I. The consensus that emerged from the randomized loops over the course of selection provides clues to the importance of each residue in sequence recognition. Arg225 is replaced by alanine in every improved variant selected from library A and the R225A mutation alone confers a 20-fold increase in activity using AGCC (Table I). At least some of the additional mutations present in the T29 mutant must significantly contribute to increased activity on AGCC as T29 is 34 times more active than R225A on AGCC (Table I). It seems likely that the N260L mutation is important as this mutation was found in 12 out of 13 mutants selected from library B. Likewise, the N262W mutation is probably significant as it was found in 10 out of 13 mutants selected from library B. In contrast, L261 was mutated to a wide range of different sequences in the selected clones (Table I). Despite this, the L261M mutation found in T29 may still be important for optimizing activity as 10 other clones also contain the R225A, N260L and N262W mutations yet have lower activity than T29 on AGCC (Table I). None of the amino acid substitutions in the TRD of T29 would be likely to occur in a library generated by error-prone PCR, as each requires more than one base change per codon.
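The number of base changes needed for a given substitution can be checked against the standard genetic code. The Python sketch below computes, for every synonymous wild-type codon, the minimum number of nucleotide changes needed to reach any codon for the new amino acid. The wild-type codons of M.HaeIII are not given in this section, so the code reports the result for all possible codons rather than assuming one; all names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def base_changes(wt_aa: str, new_aa: str) -> dict[str, int]:
    """Map each codon for wt_aa to the fewest base changes reaching new_aa."""
    targets = [c for c, aa in CODON_TABLE.items() if aa == new_aa]
    return {
        wt: min(sum(a != b for a, b in zip(wt, t)) for t in targets)
        for wt, aa in CODON_TABLE.items()
        if aa == wt_aa
    }


# The four TRD substitutions in T29, in one-letter code.
for label, (wt_aa, new_aa) in {
    "R225A": ("R", "A"),
    "N260L": ("N", "L"),
    "L261M": ("L", "M"),
    "N262W": ("N", "W"),
}.items():
    print(label, base_changes(wt_aa, new_aa))
```

Under the standard code, R225A and N260L need two changes and N262W needs three, whichever synonymous codon the wild-type gene uses. L261M needs two changes from CTT, CTC, CTA or TTA but only one from CTG or TTG, so for that position the statement in the text depends on the wild-type codon.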
Nine sequences of GGCC-specific C5 methyltransferases (including M.HaeIII) are known (Roberts et al., 2003). In all cases except for M.HaeIII, the residue corresponding to Asn262 is aromatic (usually tryptophan). It is interesting that Asn262 is replaced by tryptophan in the mutant T29. It is possible that this tryptophan makes a hydrophobic interaction with the side chain of residue 264 (phenylalanine in all GGCC-specific C5 methyltransferases, including M.HaeIII). The side chain of Arg225 is thought to form hydrogen bonds with the first base, G1, and with the side chain of Asn260. In all nine enzymes Arg225 is conserved and the amino acid at position 260 is Asn or Asp. In T29 the substitution of Asn260 with Leu may compensate for the mutation R225A by removing a polar side chain from the core of the TRD.
The four amino acids mutated in the TRD of T29 are highlighted in Figure 6. The crystal structure of the M.HaeIII:DNA complex suggests that recognition of the first base pair is achieved by hydrogen bonds between the side chain of Arg225 and the first base, G1, and between Arg229 and a non-bridging phosphoryl oxygen in the backbone immediately 5′ to G1. No hydrogen bonds to the complementary cytosine, C1′, are predicted (Reinisch et al., 1995). In T29, the putative hydrogen bonding interaction between Arg225 and G1 certainly cannot be replaced by Ala225. Removal of this base-specific interaction might be expected to increase the Km(DNA), decrease the catalytic efficiency and cause the complete loss of specificity towards the first base pair of the target site (leading to the specificity NGCC), but these changes were not observed. This is a surprising result which illustrates the power of directed evolution by IVC to solve the problem of sequence recognition in unexpected ways, creating enzymes with sequences that would never have been predicted by rational design.
It is generally thought that direct contacts to the DNA bases are the primary determinant of sequence specificity in DNA binding proteins (Garvie and Wolberger, 2001). However, the DNA sequence also affects the conformation of the phosphate backbone (el Hassan and Calladine, 1996). Interactions with the phosphate groups of the backbone (indirect readout) can play a role in sequence recognition, as observed in DNA recognition by the restriction enzyme EcoRV (Winkler et al., 1993), the trp repressor (Otwinowski et al., 1988) and the glucocorticoid receptor (Luisi et al., 1991). In the case of M.HaeIII, 11 amino acids are thought to interact with the DNA phosphate backbone and discrimination of the first base pair by mutant T29 might be entirely due to backbone contacts.
Alternatively, the mutations in T29 may cause conformational changes in the TRD. It is also possible that some of the residues responsible for recognizing the first base pair were not identified in the crystal structure of the M.HaeIII:DNA complex, as this structure reveals only one of several states in the multistep pathway of DNA binding and recognition (Klimasauskas et al., 1998).
These results illustrate the power of directed evolution for protein engineering problems that are too complex to solve by rational design and demonstrate the usefulness of IVC for the rapid selection of enzymes with novel properties. IVC offers many advantages over in vivo screening or selection methods, including flexibility in the length of reactions, the contents of the compartments and the ability to work with substrates or enzymes that show in vivo toxicity. In this case the methylation reaction took place over 4 h in a solution of 16% glycerol. This increases the likelihood of non-canonical methylation, makes the system very sensitive and allows the selection of enzymes with very low activities, which is highly beneficial when attempting to select a novel enzyme activity. The use of PCR-generated libraries is also extremely advantageous, as many alternative substrates could be incorporated, for example mismatched or unnatural bases, to adapt this procedure for the selection of enzymes with alternative desirable activities.

Fig. 1. Selection of DNA methyltransferases by in vitro compartmentalization (IVC). (A) Schematic representation of the selection procedure. An in vitro transcription/translation reaction mixture containing a library of genes encoding mutant methyltransferases (MT), each with restriction–methylation (R–M) sites appended to the gene, is dispersed to form a water-in-oil emulsion with typically one gene per aqueous compartment (1). The genes are transcribed and translated within their compartments (2). Proteins with methyltransferase activity methylate the R–M sites (3). Compartmentalization prevents the methylation of genes in other compartments. The emulsion is broken, all reactions are stopped and the aqueous compartments combined (4). Digestion with the cognate restriction enzyme results in the digestion of unmethylated genes (which do not encode active methyltransferases) (5) and the survival of methylated genes (which encode active methyltransferases) (6). The surviving genes can be amplified using the polymerase chain reaction and compartmentalized for further rounds of selection (7). (B) The sequence of the R–M site. Library A has a single NheI site (encoded in pIVEX.1s) and library B three NheI sites (encoded in pIVEX.3s). The 6 bp recognition site of NheI is shaded grey. Digestion by NheI is blocked by C5 methylation at any of the cytosines in bold type. The two AGCC sites, which overlap each end of the NheI site, are underlined.
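The overlap between AGCC and the NheI site described in (B) can be illustrated with a short strand scan. The Python sketch below assumes the minimal 8 bp sequence GGCTAGCC, which is inferred here from the NheI recognition sequence (GCTAGC) and the caption; the actual flanking sequence in pIVEX.1s and pIVEX.3s is not given in this text, and the function names are illustrative.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, motif: str) -> tuple[list[int], list[int]]:
    """0-based top-strand start positions of motif on the top and bottom strands."""
    width = len(motif)
    windows = range(len(seq) - width + 1)
    top = [i for i in windows if seq[i:i + width] == motif]
    # A bottom-strand motif appears on the top strand as its reverse complement.
    bottom = [i for i in windows if seq[i:i + width] == revcomp(motif)]
    return top, bottom


SITE = "GGCTAGCC"  # assumed minimal R-M site: NheI site plus one flanking base pair each side

nhei_top, _ = find_sites(SITE, "GCTAGC")
agcc_top, agcc_bottom = find_sites(SITE, "AGCC")
print("NheI site starts at:", nhei_top)
print("AGCC on top strand starts at:", agcc_top)
print("AGCC on bottom strand covers top-strand positions from:", agcc_bottom)
```

With this assumed sequence, one AGCC lies on each strand and each shares three base pairs with one end of the palindromic NheI site, consistent with the two underlined sites in the caption.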

Fig. 2. Library design. The regions of M.HaeIII that are mutated in library A and library B are shown. Amino acid numbering is the same as in the untagged protein. Amino acids in bold type were diversified as follows. Library A: codon 217 is replaced with WSW, encoding CRST; codon 224 replaced with RVC, encoding ADGNST; codon 225 replaced with NNS, encoding all 20 amino acids; codon 232 replaced with RAW, encoding DENK; and codon 233 replaced with SAA, encoding E and N. In library B codons 260, 261 and 262 are each replaced by NNS, fully randomizing the amino acids at all three positions. N = A, C, G or T, R = A or G, S = C or G, V = A, C or G, W = A or T.
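The degenerate codons in the caption can be expanded mechanically using the IUPAC definitions it lists and the standard genetic code. The Python sketch below does this and also counts the DNA-level diversity of each library; it is an illustration with made-up function names, not the authors' design procedure.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC codes used in the caption.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "S": "CG", "V": "ACG", "W": "AT", "N": "ACGT"}


def expand(degenerate: str) -> list[str]:
    """All concrete codons matched by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def encoded(degenerate: str) -> str:
    """Sorted one-letter set of residues (and '*' for stop) that are encoded."""
    return "".join(sorted({CODON_TABLE[c] for c in expand(degenerate)}))


LIBRARY_A = {217: "WSW", 224: "RVC", 225: "NNS", 232: "RAW", 233: "SAA"}
LIBRARY_B = {260: "NNS", 261: "NNS", 262: "NNS"}

for position, codon in LIBRARY_A.items():
    print(position, codon, encoded(codon))

size_a = prod(len(expand(c)) for c in LIBRARY_A.values())
size_b = prod(len(expand(c)) for c in LIBRARY_B.values())
print("library A DNA variants:", size_a)
print("library B DNA variants:", size_b)
```

The expansion agrees with the caption for RVC, NNS and RAW. It also shows that WSW and NNS each include one stop codon (TGA and TAG, respectively), and that SAA expands to CAA (Gln) and GAA (Glu), so the "E and N" in the caption may be a slip for "E and Q". The DNA-level diversities are 12 288 for library A and 32 768 for library B.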
b PCR-amplified genes were translated in vitro and tested for the ability to protect NheIsub and HaeIIIsub from digestion by NheI and HaeIII, respectively. NheIsub contains a single NheI site, overlapped by the site AGCC. HaeIIIsub contains a single GGCC site. Values indicate the rate of DNA protection in fmol/min/ml IVT.
c The R225A mutation was identified from the selection of library A for AGCC methylation and then served as the starting point for library B from which all the subsequent clones were derived (see text).

Fig. 3. AGCC methylation activity of the unselected and selected libraries. AGCC methylation was determined by measuring the rate of protection of the 593 bp DNA NheIsub against digestion by the restriction endonuclease NheI.

Fig. 4. Rate of methylation of long DNA substrates. The percentage of 10 nM substrate DNA protected from digestion by either HaeIII restriction endonuclease (GGCC) or NheI restriction endonuclease (AGCC) is plotted against time. 0.5 nM wild-type (wt) M.HaeIII or T29 mutant was used to methylate either the 350 bp substrate DNA HaeIIIsub (GGCC) or the 593 bp substrate DNA NheIsub (AGCC).

Fig. 6. Model of the interaction of M.HaeIII and the first base of the DNA target GGCC. The residues mutated in T29 are shown with atoms coloured: grey, carbon; red, oxygen; blue, nitrogen. DNA strands are coloured purple (methylation target) and green (complementary strand). The figure was prepared with the program SETOR (Evans, 1993) using the published crystal structure (Reinisch et al., 1995).

Table I. Sequence and methylation activity of selected clones

Table II. Apparent kcat and Km(DNA) values for methylation of 30 bp, hemi-methylated DNA substrates by M.HaeIII and the T29 mutant

Table III. Improvements in catalytic efficiency on 30 bp, hemi-methylated DNA substrates. kcat/Km(DNA) values of wild-type M.HaeIII and T29, normalized so that kcat/Km(DNA) of M.HaeIII with GGCC = 1.