We have determined the structure of PvuII methyltransferase (M.PvuII) complexed with S-adenosyl-L-methionine (AdoMet) by multiwavelength anomalous diffraction, using a crystal of the selenomethionine-substituted protein. M.PvuII catalyzes transfer of the methyl group from AdoMet to the exocyclic amino (N4) nitrogen of the central cytosine in its recognition sequence 5′-CAGCTG-3′. The protein is dominated by an open α/β-sheet structure with a prominent V-shaped cleft: AdoMet and catalytic amino acids are located at the bottom of this cleft. The size and the basic nature of the cleft are consistent with duplex DNA binding. The target (methylatable) cytosine, if flipped out of the double helical DNA as seen for DNA methyltransferases that generate 5-methylcytosine, would fit into the concave active site next to the AdoMet. This M.PvuII α/β-sheet structure is very similar to those of M.HhaI (a cytosine C5 methyltransferase) and M.TaqI (an adenine N6 methyltransferase), consistent with a model predicting that DNA methyltransferases share a common structural fold while having the major functional regions permuted into three distinct linear orders. The main feature of the common fold is a seven-stranded β-sheet (6↓ 7↑ 5↓ 4↓ 1↓ 2↓ 3↓) formed by five parallel β-strands and an antiparallel β-hairpin. The β-sheet is flanked by six parallel α-helices, three on each side. The AdoMet binding site is located at the C-terminal ends of strands β1 and β2 and the active site is at the C-terminal ends of strands β4 and β5 and the N-terminal end of strand β7. The AdoMet-protein interactions are almost identical among M.PvuII, M.HhaI and M.TaqI, as well as in an RNA methyltransferase and at least one small molecule methyltransferase.
The structural similarity among the active sites of M.PvuII, M.TaqI and M.HhaI reveals that catalytic amino acids essential for cytosine N4 and adenine N6 methylation coincide spatially with those for cytosine C5 methylation, suggesting a mechanism for amino methylation.
DNA methyltransferases (Mtases) transfer a methyl group from S-adenosyl-L-methionine (AdoMet) to a given position of a particular DNA base within a specific DNA sequence. The resulting methylation can protect the DNA from a cognate restriction endonuclease or can have epigenetic effects on gene expression. The DNA Mtases belong to two families: one methylates C5, a ring carbon of cytosine, yielding 5-methylcytosine (5mC), while the second family methylates the exocyclic amino group (NH2) of cytosine or adenine, yielding N4-methylcytosine (N4mC) or N6-methyladenine (N6mA) respectively. Two of the 5mC Mtases have been structurally characterized as covalent reaction intermediate complexes with their DNA substrates (1,2); one of these, M.HhaI, has been characterized in complexes with structural analogs of DNA in three different methylation states: unmethylated, hemimethylated and fully methylated (3,4).
The primary sequences of the 5mC Mtases share a set of conserved motifs (I-X) in a constant linear order (5–9). The majority of these motifs are responsible for three basic functions of the 5mC Mtases: AdoMet binding, sequence-specific DNA binding and catalysis of methyl transfer. In contrast, the amino-Mtases (which generate N6mA or N4mC) belong to three groups characterized by distinct linear orders for the conserved motifs (10). The three groups are named α (including Mtases such as Dam), β (including Mtases such as M.PvuII) and γ (including Mtases such as M.TaqI). To date only one DNA amino-Mtase has been structurally characterized, the group γ N6mA Mtase M.TaqI (11).
While the M.TaqI structure has been determined only in the absence of DNA, it is sufficient to allow general structural comparison with the 5mC Mtases. Both M.HhaI and M.TaqI are bilobal structures: one lobe contains a catalytic domain with both the active site for methyl transfer and the AdoMet binding site and the other lobe contains a target (DNA) recognition domain (TRD). The catalytic domains of the two proteins exhibit very similar three-dimensional folding (12). This folding pattern is also present in M.HaeIII, another 5mC Mtase, in catechol O-Mtase, a single domain small molecule AdoMet-dependent Mtase, in VP39, an mRNA cap-specific RNA 2′-O-Mtase and in glycine N-Mtase (2,13–15). The folding similarity includes the positions of conserved amino acid side chains involved in either AdoMet binding or catalysis; only the binding of AdoMet reported for glycine N-Mtase differs from the consensus pattern (15). Guided by this common catalytic domain structure, sequence alignment of amino-Mtases suggests that for all amino-Mtases to fit the consensus M.HhaI/M.TaqI catalytic domain structure, despite having different motif orders, different sets of topological connections would be required for the three DNA amino-Mtase groups (10).
Determining the structure of PvuII methyltransferase (M.PvuII), a group β N4mC Mtase, would thus address two important questions about DNA Mtases. First, do the N4mC Mtases in fact match the consensus catalytic domain structure seen between M.TaqI and M.HhaI (12)? Second, are the major structural elements of amino-Mtases connected in three different orders, as suggested by their primary sequences (10)? M.TaqI itself did not provide a strong test for this model because the group γ and 5mC Mtases have essentially the same motif order; they differ only in the position of motif X (10). No Mtase from group α or β has been structurally characterized before this report.
M.PvuII, part of the restriction-modification system from the Gram-negative bacterium Proteus vulgaris (16), modifies the internal cytosine of the recognition sequence 5′-CAGCTG-3′ (17) to generate N4mC (18). PvuII endonuclease, which cleaves duplex DNA at the center of the same recognition sequence to generate blunt-ended products, was structurally characterized earlier (19,20). With this report, the PvuII restriction-modification system becomes the first system for which the structures of the cognate endonuclease and methyltransferase have both been determined.
Materials and Methods
Overexpression and crystallization
Overexpression and purification of M.PvuII, and incorporation of selenomethionine (SeMet) into the protein, have been described previously (21). To crystallize the M.PvuII-AdoMet binary complex, 0.2 mM AdoMet was added to the pre-purified protein (∼5 µM) and the mixture was further purified by cation exchange chromatography (21). M.PvuII and selenomethionyl M.PvuII, complexed with AdoMet, both crystallized in the monoclinic space group P21 with unit cell dimensions of a = 48.8 Å, b = 112.4 Å, c = 59.3 Å and β = 109.2° (21). There are two molecules per crystallographic asymmetric unit, termed molecules A and B. X-ray diffraction data were collected using a MarResearch imaging plate detector on beamline X12-C at the National Synchrotron Light Source, Brookhaven National Laboratory, and processed using the HKL software package (22). Multiwavelength anomalous diffraction (MAD; 23) data to 3.3 Å resolution (Table 1) were collected on a single frozen SeMet crystal at three different wavelengths, corresponding to the inflection point λ1 (minimum Δf′) and the peak λ2 (maximum Δf″) of the Se-containing crystal absorption spectrum and a third wavelength (λ3) remote from the peak position (21). A higher resolution data set (up to 2.8 Å) used for final model refinement was collected from a native crystal (λ4 = 1.072 Å, 180° rotation, 1.5° increment, 90 s exposure).
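As a rough consistency check on the crystal parameters quoted above, the following minimal sketch estimates the unit cell volume, the Matthews coefficient and the solvent fraction. It is not part of the original analysis. The average residue mass of 110 Da and the protein partial specific volume factor of 1.23 Å^3/Da are assumed textbook approximations, and the count of four molecules per cell assumes the two general positions of P21 with two molecules per asymmetric unit.

```python
import math

# Assumptions (not from the paper): average residue mass and the
# 1.23 A^3/Da factor commonly used in Matthews-type estimates.
ASSUMED_RESIDUE_MASS_DA = 110.0
ASSUMED_PROTEIN_VOLUME_PER_DA = 1.23


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (A^3) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(volume, molecules_per_cell, mol_weight_da):
    """Crystal volume per dalton of protein (A^3/Da)."""
    return volume / (molecules_per_cell * mol_weight_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - ASSUMED_PROTEIN_VOLUME_PER_DA / vm


# Cell dimensions and chain length as quoted in the text.
volume = monoclinic_cell_volume(48.8, 112.4, 59.3, 109.2)
molecules_per_cell = 2 * 2  # 2 general positions in P21 x 2 molecules each
mol_weight = ASSUMED_RESIDUE_MASS_DA * 323  # 323-residue polypeptide

vm = matthews_coefficient(volume, molecules_per_cell, mol_weight)
print(f"cell volume      = {volume:.0f} A^3")
print(f"Matthews coeff.  = {vm:.2f} A^3/Da")
print(f"solvent fraction = {solvent_fraction(vm):.2f}")
```

Under these assumptions the estimate falls close to the 40% solvent content used for solvent leveling; a different assumed molecular weight would shift the value slightly.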
SeMet MAD phasing
There are a total of 18 possible Se sites per asymmetric unit (nine per molecule). To locate the Se positions, we calculated the anomalous and isomorphous difference Patterson maps at the Harker section (v = 1/2) among data sets collected at wavelengths λ1, λ2 and λ3. A number of peaks were observed, which corresponded to possible Se sites and the cross vectors between them (21). Five Se sites were first manually determined from the Patterson maps. These five sites were used to calculate initial estimates of phases, to compute the difference and Bijvoet difference Fourier synthesis and to search for additional Se sites. Finally, a total of 12 Se sites were determined and confirmed by the two-fold non-crystallographic symmetry (NCS) operator, revealed by a self-rotation function (21).
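The reason the v = 1/2 section is informative in space group P21 can be illustrated with a short sketch. Assuming the standard setting with the b axis unique, the two equivalent positions are (x, y, z) and (−x, y + 1/2, −z), so the self-vector between symmetry mates of one Se site always lies at (2x, 1/2, 2z). The fractional coordinates used below are hypothetical and are not the refined Se positions.

```python
def harker_vector_p21(x, y, z):
    """Harker vector (u, v, w) for a site in P21 (b axis unique).

    The vector between (x, y, z) and its symmetry mate (-x, y + 1/2, -z)
    is (2x, -1/2, 2z), which is equivalent to (2x, 1/2, 2z) modulo 1.
    Note that y drops out: the origin along b is arbitrary in P21.
    """
    u = (2.0 * x) % 1.0
    w = (2.0 * z) % 1.0
    return (u, 0.5, w)


def cross_vector(site_1, site_2):
    """Interatomic (cross) vector between two sites, wrapped into [0, 1)."""
    return tuple((p - q) % 1.0 for p, q in zip(site_1, site_2))


# Hypothetical fractional coordinates, for illustration only.
se_1 = (0.10, 0.30, 0.20)
se_2 = (0.40, 0.55, 0.85)

print("Harker peak for site 1:", harker_vector_p21(*se_1))
print("Harker peak for site 2:", harker_vector_p21(*se_2))
print("Cross vector 1 -> 2   :", cross_vector(se_2, se_1))
```

Peaks on the Harker section therefore give the x and z coordinates of individual sites, while cross vectors found elsewhere in the Patterson map are needed to fix the relative y coordinates.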
These 12 Se positions were used for MAD phasing by treating the data from each wavelength as a multiple isomorphous replacement experiment with the inclusion of anomalous scattering (MIRAS): native with native anomalous scattering (λ3), derivative isomorphous (λ1) and derivative isomorphous with anomalous scattering (λ2; Table 2). The MAD-MIRAS phases were improved using 40% solvent content by four, four and eight cycles of solvent leveling (24) following each of three envelope determinations (25). The solvent-leveled map was used to refine the NCS operator and to construct the averaging mask. The phases were further improved using 16 rounds of Furey's averaging protocols (25). The electron density was averaged within the mask, the density for each molecule was replaced with the average and the ‘averaged’ density map was inverted to obtain new phases. The resulting phases were combined with the original solvent-leveled MAD-MIRAS phases. The process was cycled until convergence was obtained (16 cycles).
The starting Cα backbone for molecule A was traced using the skeleton in program O (26) with reference to three maps at 3.3 Å resolution: MAD-MIRAS, solvent flattened and density averaged. The Se positions also provided markers for selenomethionine in the polypeptide chain tracing. The atomic coordinates for molecule B were generated by the two-fold NCS operator. After the initial model building, the atomic model was subjected to refinement against 2.8 Å resolution data from a native crystal (Table 3). Initially, a strict NCS was invoked, assuming that the two NCS-related molecules are strictly identical. The two models were refined by simulated annealing and least squares minimizations using the X-PLOR program suite (27). Seven rounds of refinement and model rebuilding brought the crystallographic R factor to 0.22. The model was further refined with restrained NCS, with NCS-related atoms restrained to their average positions. An additional five rounds of refinement, refitting and placing ordered water molecules brought the R factor to 0.19 and Rfree to 0.28 (Table 3).
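For readers less familiar with the refinement statistics quoted above, the sketch below shows the conventional definition of the crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. Rfree uses the same formula over a subset of reflections excluded from refinement. The amplitudes are invented toy values, and the scale factor between observed and calculated amplitudes, which refinement programs normally apply, is omitted for simplicity.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_obs and f_calc are already on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Toy amplitudes (hypothetical, not measured data).
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 30.0, 70.0]

# Pretend the last reflection was set aside as the "free" test set.
r_work = r_factor(f_obs[:3], f_calc[:3])
r_free = r_factor(f_obs[3:], f_calc[3:])
print(f"R(work) = {r_work:.3f}, R(free) = {r_free:.3f}")
```

Because the free reflections are never used to adjust the model, a gap between R and Rfree, such as the 0.19 versus 0.28 reported here, gives a cross-validated indication of how much the model has been fitted to the working data.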
M.PvuII is produced in two forms, resulting from translation initiators 13 codons apart (17). The shorter form of M.PvuII, starting from the internal translation initiator at Met14, was overexpressed in Escherichia coli and purified both in native and selenomethionine-substituted forms (21). The M.PvuII polypeptide chain is 323 amino acids long (numbered 14-336). Diffraction data (Table 1) were collected at three X-ray wavelengths from a crystal of the selenomethionyl M.PvuII-AdoMet complex, so that MAD could be used to extract the phases (23). Following the suggestion of Ramakrishnan (28,29), multiwavelength data were treated as if they were from a conventional MIRAS experiment. A total of 12 (out of 18 possible) Se sites per asymmetric unit were determined from Patterson maps and were used for MAD phasing with a figure of merit of 0.62 at 3.3 Å resolution. The MAD-MIRAS map, coupled with two-fold non-crystallographic symmetry averaging, was accurate enough to permit an initial interpretation (21) and the model was finally refined to 2.8 Å resolution with a crystallographic R factor of 0.19 and an Rfree value of 0.28.
Overview of the M.PvuII structure
The polypeptide chain folds into a structure with a V-shaped cleft, big enough to accommodate duplex DNA (Fig. 1). The V-shaped cleft is formed by three loops on one side and a three-helix bundle on the other side. The methyl donor AdoMet binds at the bottom of the cleft, which consists of a twisted 10 stranded β-sheet around which six α-helices are arranged on both sides.
Figure 2 shows the topology diagrams of M.PvuII, M.HhaI and M.TaqI. For clarity and convenience, we retain the nomenclature of Schluckebier et al. (12) for the secondary structure assignment and of Posfai et al. (5) for the conserved motifs. Loops or turns are designated by their flanking secondary structures; two of them are termed the glycine-containing G loop (loop 1-A) and the proline-containing P loop (loop 4-D) (10). The catalytic domains of the three structures are all of the α/β type with a central β-sheet sandwiched between two layers of α-helices: helices αC, αD and αE located on one side and helices αZ, αA and αB on the opposite side of the sheet (Fig. 2a). The β-sheets in the three structures all contain five central adjacent parallel β-strands with strand order 5, 4, 1, 2, 3 and one antiparallel hairpin (β6 and β7) next to strand β5. The order of parallel strands is reversed once between β4 and β1. The majority of the active amino acids from conserved motifs (circled in Fig. 2) are located at the carboxyl ends or in loop regions outside the carboxyl ends of these parallel β-strands. In all three structures the AdoMet binding site is located at the carboxyl ends of strands β1 and β2 and the amino end of helix αC; and the active site at the carboxyl ends of strands β4 and β5 and the amino end of the strand β7 (see below). The N- and C-termini of the folded polypeptide are within the AdoMet binding region in all three structures: located in the region between helix αZ and strand β1 (M.HhaI in Fig. 2b), prior to helix αZ (M.TaqI in Fig. 2c) and between helix αB and strand β3 (M.PvuII in Fig. 2d).
M.PvuII fits the consensus fold for AdoMet-dependent Mtases
The TRD, which is associated with sequence-specific DNA recognition, lies in the smaller domain of the three bilobal Mtases discussed above. In the current structure of M.PvuII, this domain comprises only one helix (αF) and its associated loops (Fig. 2d). It is interesting how the TRD is connected to the catalytic domain in the three Mtases. In 5mC Mtases such as M.HhaI, helix αZ (motif X, part of the catalytic domain) is folded from the C-terminus, following the TRD. Thus, there are two connections between the catalytic domain and TRD (Fig. 2b). In group γ N6mA Mtase M.TaqI, helix αZ originates from the N-terminus and the TRD is linked to the catalytic domain through β9 only (Fig. 2c). Thus, in both M.TaqI and M.HhaI the functional regions are in the order (amino→carboxyl) AdoMet binding region, active site region and TRD, the major difference between them being that helix αZ is moved from the N-terminus in M.TaqI to the C-terminus in M.HhaI.
As predicted (10), the most pronounced difference in topology between M.PvuII and both M.HhaI and M.TaqI is the connection between the AdoMet binding and active site regions: the two regions are connected via the putative TRD (helix αF) in the order (amino→carboxyl) active site region, TRD and AdoMet binding region (Fig. 2d). The active site and AdoMet binding regions of M.PvuII fit the consensus structure of M.HhaI/M.TaqI/M.HaeIII/catechol O-Mtase, regardless of the motif order in the primary sequence. We call this common catalytic domain structure the AdoMet-dependent methylase fold (Fig. 2a). This fold has also been observed in the RNA Mtase VP39, though helix αE is replaced by a β-strand (14).
We had predicted the folding of group β amino-Mtases, including M.PvuII (Fig. 2e), based on structure-guided sequence analysis (10). Overall, the prediction is quite accurate, though there are some significant differences between the prediction and the current model. Unexpectedly, part of the AdoMet binding region (β3-αC or motif III) is located upstream of the active site region, near the N-terminus of the polypeptide. This arrangement preserves the crossover between strands β3 and β4, but splits the coding for the AdoMet binding region into two distant parts of the gene. The β3-αC secondary structure (motif III) was predicted to originate from the C-terminus as a contiguous part of the AdoMet binding region. This prediction, which would result in no crossover connection between strands β3 and β4, was made in part because of the very short distance between the N-terminus of another group β Mtase (M.BamHII) and its strand β4 (Fig. 3). However, this crossover has been observed in all currently available DNA Mtase structures (5mC Mtases M.HhaI and M.HaeIII, group β Mtase M.PvuII and group γ Mtase M.TaqI), as well as in catechol O-Mtase, glycine N-Mtase and the RNA Mtase VP39, and is predicted to occur in group α amino-Mtase structures, with the crossover connection in a separate domain comprising the TRD (10).
Such a crossover is necessary to generate a so-called topological switch point (30), at which the strand order is reversed and loops connected to the carboxyl ends of the two adjacent strands (β1 and β4 in Fig. 2a) go in opposite directions. The positions of concave active sites can be predicted from such switch points in different types of α/β twisted open sheet structures, including arabinose binding protein (31), carboxypeptidase (32) and tyrosyl-tRNA synthetase (33).
As noted above, there are two molecules per crystallographic asymmetric unit, termed molecules A and B. The current model of molecule A contains residues a16–a178, a217–a335 and one AdoMet, while molecule B contains residues b16–b56, b69–b178, b215–b335 and one AdoMet. The r.m.s. deviation between 269 common Cα atoms of the two final refined molecules is 0.6 Å. In both molecules ∼40 amino acids (Pro179-Gly216), located immediately after strand β7 and before strand β8, were not modeled in the current structure because of poor electron density. This poor density suggests that these amino acids are very flexible. Consistent with this flexibility, four out of five preferred trypsin cleavage sites are within this 40 amino acid region: the primary cleavages occur on the carboxyl sides of Arg183 and Lys186 and are followed by slower cleavages carboxyl of Lys198, Lys208 and Arg323 (34). In fact, SDS-PAGE analysis of dissolved crystals indicates that some M.PvuII crystals contained limited amounts of protein that had been cleaved in this region. It is noteworthy in this regard that some 5mC Mtases are naturally made as two separate polypeptides that associate in the cell to form active enzyme (35,36).
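The r.m.s. deviation quoted above is the root of the mean squared distance between paired Cα atoms. The following minimal sketch computes it for two coordinate lists that are assumed to be already superimposed and listed in matching residue order. It does not perform the least-squares superposition itself, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between paired atoms.

    Assumes the two coordinate sets are already superimposed and that
    atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions (in A) for two short matched segments.
molecule_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
molecule_b = [(0.0, 0.0, 0.6), (3.8, 0.0, 0.6), (7.6, 0.0, 0.6)]

print(f"r.m.s. deviation = {rmsd(molecule_a, molecule_b):.2f} A")
```

In this toy case every atom is displaced by the same 0.6 Å, so the r.m.s. deviation equals that displacement; for real structures the value summarizes a distribution of per-residue differences.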
In molecule B part of the catalytic P loop (amino acids 57–68) was also not modeled due to poor electron density. However the corresponding P loop in molecule A was modeled, though Leu58-Asn66 (red in Fig. 1) possessed the highest crystallographic thermal factors in the current refined structure. This flexibility may be due to the absence of the DNA in the crystal and suggests a potential conformational change upon DNA binding. Similarly, the catalytic P loop in M.HhaI, which contains the key catalytic amino acids Pro80-Cys81, undergoes a massive conformational change upon binding DNA, moving ∼25 Å toward the corresponding DNA binding cleft of the protein (1).
The binding site for AdoMet is adjacent to the carboxyl ends of strands β1, β2, the amino end of helix αC and the loop prior to helix αZ, regions that contain conserved motifs I, II, III and X respectively (Fig. 4). The interactions between AdoMet and M.PvuII are almost identical to those between AdoMet and M.HhaI (1), M.TaqI (11), catechol O-Mtase (13) and VP39 (14). Amino acid side chains interacting with AdoMet are found in spatially equivalent positions, except that Phe273 of M.PvuII and Phe18 of M.HhaI are in the G loop, while the corresponding Phe146 of M.TaqI is in helix αD (Fig. 2).
In motif I of group β Mtases (Asp-X-Phe-X-Gly), the amino acids Asp and Gly are invariant. In M.PvuII these correspond to Asp271 and Gly275 (Fig. 3). The side chain carboxylate of Asp271 (β1) makes two hydrogen bonds to the main chain amide group of Phe273 (G loop) and the side chain hydroxyl of Thr279 (αA) and these bonds stereochemically constrain the β1-loop-αA structure. A negatively charged amino acid corresponding to Asp271 has been found in the same position of motif I in all DNA Mtases sequenced so far, including Asp16 of M.HhaI and Glu45 of M.TaqI (9,10). The main chain amide group of Gly275 (G loop) hydrogen bonds to the side chain carboxylate of Glu294 (β2), which is another conserved negatively charged amino acid (motif II) that interacts with the ribose hydroxyls of AdoMet. Comparable backbone-side chain interactions occur in M.HhaI (Gly20-Glu40) and M.TaqI (Ala49-Glu71).
In M.PvuII the AdoMet binding α/β cluster (αZ→β1→αA→β2→αB) is further stabilized by the interactions of Arg288 (an invariant arginine among group β Mtases located prior to strand β2; Fig. 3) with the side chain of Thr263 (loop Z-1) and backbone carbonyls of both Thr263 (loop Z-1) and Glu286 (loop A-2). Only three structurally characterized AdoMet binding proteins interact with AdoMet in substantially different ways from the nucleic acid Mtases and catechol O-Mtase. One of these proteins is the E.coli MetJ repressor (37), for which AdoMet is a co-repressor and not a substrate; another is the reactivation domain of E.coli methionine synthase (38), which uses AdoMet in a flavodoxin-coupled reductive methylation of cobalamin. The third is glycine N-Mtase, which does have a region structurally very similar to the consensus AdoMet-binding regions, though that is not where AdoMet was bound in the reported structure (15).
AdoMet binding and target base binding sites are structurally similar to one another
The M.PvuII protein has approximate two-fold pseudo-symmetry around the center of the cleft, due in part to the structural similarity of the AdoMet binding site to the active site. These sites are each dominated by comparable α/β clusters, αZ→β1→αA→β2→αB and αC→β4→αD→β5→αE; the former includes motifs I, II and X and forms the bulk of the AdoMet binding region and the latter includes motifs IV-VI and forms the bulk of the active site region. The two α/β clusters can be superimposed by rotating strands β1 and β2 onto strands β4 and β5 (Fig. 4b). This yields an r.m.s. deviation of 0.7 Å for the Cα atoms of these β-strands. Similar superimposability has also been observed for the α/β clusters of the 5mC Mtases M.HhaI and M.HaeIII and the N6mA Mtase M.TaqI (10). This observation has led to the suggestion that the original Mtases arose after gene duplication converted an AdoMet binding protein into a protein that bound two molecules of AdoMet (see also 39–42) and that the two halves then diverged (10). Regardless of the evolutionary model, the M.PvuII structure suggests that this internal structural repeat is a feature common to most AdoMet-dependent Mtases. Only the reactivation domain of E.coli methionine synthase does not fit this pattern (38).
Predicted DNA binding and base flipping
It is very likely that the V-shaped cleft of the protein is where DNA binds. In the absence of large scale protein conformational changes, the cleft is large enough to accommodate double-stranded DNA without steric hindrance (Fig. 1b). Positively charged groups, capable of interacting with the DNA phosphate backbone, are prominent on the surface of the cleft from the P loop (Arg60-Lys-Lys62), loop 5-E (Lys103 and Arg108) and loops 6-7 (Lys138, Lys148-Arg-Lys150, Arg152 and Lys154). We have docked a 13mer B-DNA duplex containing the PvuII recognition sequence, taken from the R.PvuII-DNA structure (19), against the basic face of the cleft (Fig. 1b). The docked DNA fits the cleft well, with the protein spanning a distance of ∼37 Å along the axis of the double helix, which suggests that M.PvuII intimately contacts a stretch of ∼10 nt that includes the 6 nt recognition sequence.
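The step from a ∼37 Å span along the helix axis to a footprint of about 10 nt can be reproduced with simple arithmetic, sketched below. The axial rise of 3.4 Å per base pair is the canonical B-DNA value and is an assumption here, since the rise in the docked 13mer is not stated. The even split of flanking positions on either side of the recognition sequence is likewise only an illustrative assumption.

```python
# Assumed canonical axial rise per base pair for B-form DNA (in A).
ASSUMED_B_DNA_RISE = 3.4


def base_pairs_spanned(length_angstrom, rise=ASSUMED_B_DNA_RISE):
    """Number of base pairs covered by a given length along the helix axis."""
    return length_angstrom / rise


def flanking_per_side(footprint_bp, site_bp):
    """Flanking base pairs per side if the site is centered in the footprint."""
    return (footprint_bp - site_bp) / 2.0


span = base_pairs_spanned(37.0)
print(f"37 A corresponds to about {span:.1f} bp of B-DNA")
print(f"flanking bp per side for a 10 bp footprint: {flanking_per_side(10, 6):.0f}")
```

With these assumptions the protein would cover the 6 bp recognition sequence plus roughly two base pairs on either side, in keeping with the estimate given in the text.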
The M.HhaI-DNA structure provided the first example of base flipping (1). Several other types of enzymes are now also known or believed to use this approach (43,44), including the DNA repair enzymes T4 endonuclease V and human uracil-DNA glycosylase (45,46). The M.PvuII structure is consistent with a base flipping mechanism. Base flipping is a process by which an enzyme can rotate a DNA nucleotide out of the double helix, breaking only the base pairing hydrogen bonds and trapping it in a protein binding pocket. In our docking model the DNA is positioned such that the target cytosine is in the helix and the NH2 group to be methylated is far from the active CH3 group of AdoMet (Fig. 1b). Thus it is likely that M.PvuII (an amino-Mtase) flips the cytosine out of the DNA helix to access the target amino group (Fig. 1c), in a manner similar to that employed by 5mC Mtases, M.HhaI and M.HaeIII (1–4). The structure of M.TaqI and spectroscopic data for M.EcoRI suggest that these two amino-Mtases flip the target adenine out of DNA (47,48).
Although it is possible to predict where DNA binds, we cannot identify any known DNA binding motifs in the current structure that might be responsible for DNA sequence specificity. Furthermore, there is no obvious similarity between M.PvuII and the structures of R.PvuII or MyoD in complex with DNA (19,49); both are homodimeric proteins recognizing the same DNA sequence, CAGCTG. R.PvuII uses a β-ribbon motif to interact with nucleotides in the DNA major groove, while the myogenic transcription factor MyoD is a basic helix-loop-helix protein. The lack of obvious similarity may reflect the disparate roles of these three CAGCTG-recognizing proteins. DNA Mtases carry out base flipping (within specific nucleotide sequences) so they can access the atom to be methylated on the target nucleotide. Such a mechanism is not required for other sequence-specific proteins, such as transcription factors (for which specific binding is the main role) and restriction endonucleases (which only act on the readily accessible DNA phosphate backbone).
As mentioned before, only two 5mC Mtases, M.HhaI and M.HaeIII, have been structurally characterized in complex with their DNA substrates. The protein-base contacts in the recognized sequence are expected to differ between M.HhaI and M.HaeIII due to their different specificity and, indeed, the folding of the corresponding TRDs is different (2). However, both TRDs contain a shared feature: two recognition loops (1,2,44). In the M.PvuII structure, two loops (prior to and after helix αF) on the other side of the V-shaped cleft could easily fit into the concave face of the major or minor groove of B-form DNA. These two loops, which may correspond to the two 5mC recognition loops, are held in place through scaffolding made up of three helices, αF, αB and αB1. A similar pair of recognition loops has also been proposed for M.TaqI (47). The reason for such conservation may be that sequence recognition is a part of the base flipping mechanism and loops, instead of the more rigid structures of α-helix or β-strand, are used for discriminating DNA sequences flexibly and effectively.
Predicted catalytic mechanism for DNA amino methylation
What we call the catalytic P loop of the amino-Mtases was found in early sequence comparisons and called an ‘Asp-Pro-Pro-Tyr motif’ based on its sequence (50,51). A later comparison suggested it might correspond to Pro-Cys (motif IV) in 5mC Mtases, even though the reaction mechanisms of the two families of Mtases appear to be quite distinct (52). The structural comparison of M.HhaI and M.TaqI has confirmed that the Pro-Cys and Asn-Pro-Pro-Tyr motifs of these two enzymes are spatially equivalent (12) and thus, by analogy, both are referred to as motif IV (10). Motif IV has the consensus sequence Ser-Pro-Pro-Tyr for N4mC Mtases, Asp-Pro-Pro-Tyr for groups α and β N6mA Mtases and Asn-Pro-Pro-Tyr for group γ N6mA Mtases (10,53,54). However, as we discuss below, the Ser→Asp→Asn variation does not appear to represent an essential functional difference. We note that these consensus sequences are not absolute and it remains difficult to distinguish N4mC from N6mA Mtases on the basis of amino acid sequence alone (see Fig. 3).
The flipped cytosine, taken from the M.HhaI-DNA structure, can be docked surprisingly well into the M.PvuII active site, located at the bottom of the V-shaped cleft. By superimposition of the common α/β-sheet structures, the active site amino acids in M.HhaI from the catalytic P loop and strands β5 and β7 overlap the corresponding amino acids in M.PvuII: Gly78-Phe-Pro-Cys81 onto Ser53-Pro-Pro-Phe56 (P loop), Glu119 onto Asp96 (β5) and Arg165 onto Asn158 (β7) (Fig. 5a). In M.HhaI these amino acids interact with the target cytosine: Arg165 interacts with O2, Glu119 with N3 and N4, the main chain carbonyl of Phe79 with N4 and Cys81 covalently bonds to C6. Though M.PvuII also interacts with cytosine, we do not observe the identical amino acids in the same structural elements in M.PvuII. However, as noted above, different amino acids are spatially equivalent in the two enzymes.
One can easily model the interactions between the polar edge of the flipped cytosine and M.PvuII (shown in brown in Fig. 5a). The target atom, cytosine N4, would have two possible hydrogen bond partners: the hydroxyl group of Ser53 and the main chain carbonyl of Pro54 (the first two amino acids of the highly conserved motif IV). Also from this conserved motif, the phenyl ring of Phe56 could make van der Waals contacts with the cytosine ring; Phe56 occupies a position similar to Cys81 of M.HhaI. Asn158 (β7), which does not appear to be conserved among the N4mC Mtases, might hydrogen bond to cytosine O2.
Asp96 (Asn in most of the other N4mC Mtases) may hydrogen bond with and activate the Ser53 hydroxyl group (Asp96:Oδ2… Ser53:Oγ = 2.7 Å), thereby facilitating proton transfer from the cytosine amino group through the Ser and eventually to the Asp (Fig. 6a). If this occurs, the protonated Asp96 might then hydrogen bond to the N3 of the cytosine. Ser53 and Asp96 thus appear to belong to a charge relay system analogous to that seen in the serine proteinases (55).
Most importantly, the distance of the AdoMet methyl group to the cytosine N4 is ∼4 Å in our docking model, sufficiently close to permit methyl group transfer. For comparison, in the structures of M.HhaI-DNA complexes the substrate cytosine C5-AdoMet methyl distance is ∼2.9 Å (56); the product 5mC methyl-AdoHcy sulfur is also ∼2.9 Å (4). Thus, our model suggests that methylation of the exocyclic amino group results from a direct attack of the activated cytosine N4 on the AdoMet methyl group, in analogy with the previously proposed mechanism for DNA adenine methylation (12,57,58).
In the group β N4mC Mtases, Ser53 of M.PvuII is conserved except in M.BamHI, which has Asp at this position (Fig. 3); a conserved Asp is present in the same place in the group β N6mA Mtases, as well as in the group α N6mA Mtases (10). Modeling suggests that Asp in this position of the P loop could interact with cytosine N4 and N3 (M.BamHI) or adenine N6 and N1 (Fig. 6b). The Asp carboxyl group could hydrogen bond with cytosine N4 (NH2) or adenine N6 (NH2), thereby increasing the nucleophilicity of the nitrogen and serving as a trap for the amino-leaving proton, when the methyl group transfers to the nitrogen from AdoMet. In that case the protonated carboxyl group could hydrogen bond with cytosine N3 or adenine N1. If this is correct, the conserved Asp in M.BamHI and the N6mA Mtases may be functionally comparable to Asp96 in M.PvuII. Ser53 in M.PvuII may compensate for the fact that Asp96 is too far from the cytosine N4 for direct interaction (Figs 5a and 6a), but this does not explain why most N4mC Mtases do not simply have Asp in place of Ser, as is seen in M.BamHI.
When the structures of M.PvuII and M.TaqI, a group γ N6mA Mtase, are superimposed at their common α/β-sheet structures, Asn105 of Asn-Pro-Pro-Tyr (P loop) in M.TaqI is present in place of Ser53 of Ser-Pro-Pro-Phe in M.PvuII; and two hydrophobic amino acids, Phe196 (loops 6–7) and Val163 (β5), of M.TaqI replace the positions of two polar/charged groups (Asn158 and Asp96) of M.PvuII. These hydrophobic amino acids, particularly Phe196, are likely to make van der Waals contacts with the target nucleotide (12). The carboxamide of M.TaqI Asn105 could interact with both adenine N6 and N1 (Fig. 6c), similar to the role Asn229 of thymidylate synthase plays in hydrogen bonding to dUMP (see figure 3 of 59). However, Asn229 of thymidylate synthase plays a contributory but non-essential role in catalysis (60). In contrast to Asp or Ser, it is unlikely that Asn can accept a proton. Therefore, the only role obvious at the present time played by the Asn of the group γ N6mA Mtases is in positioning the substrate adenine, while the methylation would result from a direct attack of the AdoMet methyl group on the adenine N6 with a general base (possibly a highly ordered water molecule) assisting the proton transfer that occurs at N6.
A second AdoMet molecule
Some Mtases, including M.PvuII, appear to bind two molecules of AdoMet (34,61,62), one of which affects the selectivity of the protein towards substrate and non-specific DNA sequences (for M.EcoDam; 61,62). Extra electron density (from 2Fo - Fc, Fo - Fc and initial MAD-MIRAS maps) was found near the first AdoMet in molecule A. This may be a second AdoMet, as the density can be fitted well to an AdoMet adenosyl moiety with the methionine moiety extending into the solvent. This second AdoMet binding site is formed by the first AdoMet molecule at the bottom, Tyr299 (αB) and His246-Pro247 (loop F-Z) on one side and the main chains of Pro55 and Phe56 (P loop) on the other (Fig. 5b). The adenine sits above the ribose ring of the first AdoMet. Most interestingly, the second AdoMet ribose oxygens interact with the side chain of Glu37 of crystallographic symmetry-related molecule B. This interaction, analogous to the first AdoMet-Glu294 (β2, motif II), may stabilize the second AdoMet in molecule A, due to the different crystal packing environment.
However, despite the structural similarity of the AdoMet binding and active sites (Fig. 4b), this second AdoMet molecule does not occupy the active site (Fig. 5b). Instead, the second AdoMet occupies a space equivalent to the solvent channel in the M.HhaI-DNA structure, where a network of well-ordered water molecules, including that proposed as the general base for eliminating the C5 proton, mediates contacts between the target cytosine, AdoHcy and M.HhaI (see figure 3 of 4).
Evolutionary relationships among the DNA Mtases
As noted above, the structure of M.PvuII confirms two predicted features of DNA Mtase structure. First, all DNA Mtases structurally characterized to date have AdoMet adenine binding pockets that are superimposable onto their methylatable base binding pockets (10; see Fig. 4b). Second, all DNA Mtases structurally characterized to date share a common α/β architecture for their catalytic domains, making different topological connections to accommodate the permuted linear orders of functional regions in their genes (see Fig. 2).
These two features have implications for models of the evolutionary relationships among DNA Mtases. The internal symmetry provided by the two binding pockets, each formed by a comparable set of α helices and β strands, is suggestive of evolution by gene duplication (10). Subsequent gene fusion could have converted the resulting small molecule Mtase to a DNA Mtase by adding a TRD. Some DNA Mtases are still produced in two separate pieces that associate to form active enzyme; in these, one piece is essentially the TRD while the other is the catalytic domain (35,36).
The second feature, common structure despite permuted gene orders, raises a question. Do the four groups of DNA Mtases (α, β, γ and 5mC) represent divergence from a common ancestor or convergence from separate Mtase lineages? Matthews et al. (63) have proposed a set of six criteria for distinguishing divergence from convergence: the DNA sequences of the genes should be similar, the amino acid sequences of the proteins should be similar, the three-dimensional structures should be similar, the enzyme-substrate interactions should be similar, the catalytic mechanisms should be similar and ‘…those segments of the polypeptide chain that are critical for catalysis are in the same sequence in the respective proteins (i.e. insertions and deletions are allowed, but not transpositions)’. There is as yet no structure for a Mtase of the α group, but Mtases from the other three groups (where the information is known) satisfy all except the last criterion (Fig. 2): the DNA Mtase groups have the major functional regions in three permuted gene orders (10). We can only note that several proteins have been found to remain structurally and functionally intact following circular permutation of their genes (64–68) and that genetic mechanisms for gene permutation have been proposed (69). Whether convergence or divergence describes the relationship between the DNA Mtases, it is clear that the N4mC Mtases such as M.PvuII do not represent a separate subfamily of enzymes.
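The last criterion, and the distinction between a general transposition and a circular permutation, can be made concrete with a short sketch. The Python below is illustrative only: the function names are hypothetical, and the three region orders are coarse labels assumed here for the permutation model (10), not data taken from this section. Fig. 2 remains the authoritative description.

```python
# Minimal sketch: compare the linear order of functional regions between groups.
# ASSUMPTION: the region labels and orders below are illustrative placeholders
# for the three permuted orders of the model (10); consult Fig. 2 for the real ones.

def same_linear_order(a, b):
    """True if the two proteins list their regions in the same order
    (the sixth criterion: no transpositions)."""
    return list(a) == list(b)

def is_circular_permutation(a, b):
    """True if order b can be obtained from order a by moving a block of
    regions from one end of the chain to the other (a rotation)."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    return any(a[i:] + a[:i] == b for i in range(len(a)))

# Hypothetical coarse region orders, N-terminus to C-terminus.
orders = {
    "alpha": ["AdoMet-binding", "TRD", "catalytic"],
    "beta": ["catalytic", "TRD", "AdoMet-binding"],
    "gamma": ["AdoMet-binding", "catalytic", "TRD"],
}

if __name__ == "__main__":
    names = sorted(orders)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y,
                  "same order:", same_linear_order(orders[x], orders[y]),
                  "circular permutation:",
                  is_circular_permutation(orders[x], orders[y]))
```

Under these assumed labels, no two groups share the same linear order, so the sixth criterion fails for every pair, while only some pairs are related by a simple rotation of the region order. Real comparisons would need the motif-level assignments rather than three coarse labels.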
Note Added in Proof
Since acceptance of this paper, the structure of an AdoMet-dependent protein methyltransferase has been published (72). The Salmonella typhimurium CheR protein matches the consensus Mtase structure very well, including the binding of AdoHcy in the expected AdoMet pocket.
Acknowledgements
We acknowledge Robert M.Sweet for help with X-ray data collection at the Brookhaven National Laboratory, in the Biology Department single-crystal diffraction facility, at beamline X12-C in the National Synchrotron Light Source. That facility is supported by the US Department of Energy, Office of Health and Environmental Research and by the National Science Foundation. We thank Thomas Malone for preparing Figures 2, 3 and 6, Rowena G.Matthews for critical discussions, Richard J.Roberts and David T.F.Dryden for comments on the manuscript and Kim Gernert for preparing the cover figure. This report is partially funded by a National Institutes of Health fellowship (GM17052 to M.O'G.), the National Science Foundation (MCB-9631137 to R.M.B.) and the National Institutes of Health (GM49245 and GM/OD52117 to X.C.).