Structures of Escherichia coli DNA mismatch repair enzyme MutS in complex with different mismatches: a common recognition mode for diverse substrates

We have refined a series of isomorphous crystal structures of the Escherichia coli DNA mismatch repair enzyme MutS in complex with G:T, A:A, C:A and G:G mismatches and also with a single unpaired thymidine. In all these structures, the DNA is kinked by ~60° upon protein binding. Two residues widely conserved in the MutS family are involved in mismatch recognition. The phenylalanine, Phe 36, is seen stacking on one of the mismatched bases. The same base is also seen forming a hydrogen bond to the glutamate Glu 38. This hydrogen bond involves the N7 if the base stacking on Phe 36 is a purine and the N3 if it is a pyrimidine (thymine). Thus, MutS uses a common binding mode to recognize a wide range of mismatches.


INTRODUCTION
Genomic integrity in organisms is maintained by a number of important DNA repair pathways. The DNA mismatch repair (MMR) pathway repairs mismatches and short insertion or deletion loops (IDLs). In addition, MMR helps to prevent recombination between homologous but diverged DNA sequences (1–3). The fundamental mechanisms of MMR are similar in all organisms ranging from Escherichia coli to humans. In E.coli, MMR is initiated when the enzyme MutS recognizes and binds to mismatches or IDLs. This is followed by the uptake of ATP by MutS and the formation of a complex between MutS and the enzyme MutL. This complex initiates a number of events, leading to the recognition of the daughter strand, followed by its removal and resynthesis. In humans, the role of MutS is played by its homologs, the heterodimers MSH2/MSH6, which binds mismatches and IDLs, and MSH2/MSH3, which binds longer loops (1,2). The role of MutL in humans is played by the heterodimer MLH1/PMS2. Mutations in the genes that encode MMR proteins lead to mutator phenotypes in bacteria and, in humans, cause a predisposition to cancer known as hereditary non-polyposis colorectal carcinoma (HNPCC) (4).
Two structures of MutS–DNA complexes have been reported already, the Thermus aquaticus enzyme in complex with a single unpaired thymidine (5), and the E.coli enzyme in complex with a G:T mismatch (6). The striking feature in both these structures is a sharp 60° kink in the DNA at the site of the mismatch–protein interaction. Mismatch recognition by MutS involves a phenylalanine, widely conserved in the MutS family (Phe 36 in E.coli, Phe 39 in Taq), which is seen stacking on one of the mismatched bases. The same base is also seen forming a hydrogen bond to a widely conserved glutamic acid (Glu 38 in E.coli, Glu 41 in Taq). In both these structures, the conserved phenylalanine stacks on the thymidine. To find out how other mismatches are recognized, we have solved the structures of E.coli MutS in complex with A:A, G:G, C:A mismatches and an unpaired thymidine. Our results show that all these different lesions are recognized in a similar way, indicating a common binding mode for all mismatches.

Protein expression and purification
ΔC800, an 800 residue C-terminal deletion construct of E.coli MutS (853 residues) in a pET3d vector (6), derived from the pMQ372 plasmid (7), was used to transform B834 (DE3) pLysS cells. A colony was picked, inoculated into 10 ml of minimal medium (8) and allowed to grow overnight at 30°C. This culture was diluted into 1 l minimal medium and grown at 30°C until it reached an OD (600 nm) of ~0.7. The temperature was then lowered to 23°C and the culture induced with IPTG (final concentration 1 mM) for 4 h. The cells were harvested, suspended in 25 ml of lysis buffer [50 mM HEPES pH 7.5, 200 mM NaCl, 10 mM β-mercaptoethanol, 5 mM EDTA, 1 mM PMSF and two protease inhibitor tablets (Roche)] and lysed by sonication. Following centrifugation at 39 000 r.c.f. for 50 min at 4°C, the cleared lysate was subjected to streptomycin precipitation. The volume of the lysate was first measured and streptomycin sulphate solution (25% w/v in water) added drop by drop while stirring on ice (9). The volume of streptomycin solution added was equal to 25% of the initial volume of the lysate. This was further stirred on ice for 15 min and centrifuged at 3000 r.c.f. for 25 min at 4°C. The supernatant from this step was subjected to ammonium sulphate precipitation by adding saturated ammonium sulphate solution, drop by drop, while stirring on ice continuously (9). The volume of the ammonium sulphate solution added was equal to 62% of the volume of the supernatant. This was stirred further on ice for 25 min and cleared by centrifugation (3000 r.c.f. for 25 min at 4°C). The pellet obtained was resuspended in GF1 buffer (25 mM HEPES pH 7.5, 150 mM NaCl, 10 mM β-mercaptoethanol, 5 mM EDTA and 0.1 mM PMSF) and applied on a Superdex 200 gel filtration column (Pharmacia) pre-equilibrated with the same buffer.
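The two precipitation steps above are specified as volume fractions, so the amounts to add and the resulting concentrations follow from simple arithmetic. The sketch below works this through in Python. The lysate volume used is a hypothetical placeholder (the text gives only the 25 ml of lysis buffer, not the measured lysate volume), the function names are illustrative, and volumes are assumed to be additive on mixing.

```python
# Worked arithmetic for the streptomycin and ammonium sulphate steps.
# Assumption (not stated in the text): volumes are additive on mixing.

def added_volume(initial_ml: float, fraction: float) -> float:
    """Volume of stock to add, given as a fraction of the initial volume."""
    return initial_ml * fraction

def fraction_of_stock_after_mixing(fraction: float) -> float:
    """Fraction of the stock concentration that remains after mixing."""
    return fraction / (1.0 + fraction)

# Hypothetical measured lysate volume, for illustration only.
lysate_ml = 25.0

# Streptomycin sulphate: 25% (w/v) stock, added at 25% of the lysate volume.
strep_ml = added_volume(lysate_ml, 0.25)
strep_final_pct = 25.0 * fraction_of_stock_after_mixing(0.25)

# Ammonium sulphate: saturated stock, added at 62% of the supernatant volume.
as_saturation_pct = 100.0 * fraction_of_stock_after_mixing(0.62)

print(f"streptomycin stock to add: {strep_ml:.2f} ml")
print(f"final streptomycin: {strep_final_pct:.1f}% (w/v)")
print(f"final ammonium sulphate: {as_saturation_pct:.1f}% saturation")
```

Under these assumptions the streptomycin ends up at about 5% (w/v) and the ammonium sulphate at roughly 38% saturation; the real values depend on the measured volumes at each step.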
The peak corresponding to the dimer (160 kDa) was pooled and applied on a Mono-Q HR 10/10 ion-exchange column (Pharmacia) using buffers A (25 mM HEPES pH 7.5, 10 mM β-mercaptoethanol, 5 mM EDTA and 0.1 mM PMSF) and B (A with 1 M NaCl). The protein was eluted using a gradient running from 10 to 50% (buffer B) over 10 column volumes. The peak eluting between 20 and 42% of buffer B was pooled and its salt concentration was adjusted to ~150 mM by adding buffer A. This was then applied on a HiTrap Heparin HP column (Pharmacia) (4 × 5 ml), which used the same buffers A and B as the Mono-Q column. The protein was eluted using a gradient of 16–100% (buffer B) over 9 column volumes, with the protein coming off between 52 and 70%. The final purification step was a second gel filtration using a Superdex 200 column pre-equilibrated with GF2 buffer (25 mM HEPES pH 7.5, 250 mM NaCl, 10 mM β-mercaptoethanol). The fractions corresponding to the peak were pooled and concentrated to ~14 mg/ml using a Centriprep concentrator (Millipore). Aliquots were then flash frozen in dry ice-ethanol and stored at −80°C.
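Because buffer B is buffer A plus 1 M NaCl, each gradient percentage quoted above maps directly onto a NaCl concentration, and the dilution to ~150 mM can be estimated from the pool's salt content. The Python sketch below illustrates this. The pool volume and the use of the window midpoint as the pool's salt concentration are hypothetical choices made only for the example, and the function names are illustrative.

```python
# Convert %B to NaCl concentration and estimate the buffer A dilution.
# Buffer A contains no NaCl; buffer B is buffer A with 1 M NaCl (from the text).
BUFFER_B_NACL_MM = 1000.0

def nacl_mM(percent_b: float) -> float:
    """NaCl concentration (mM) at a given percentage of buffer B."""
    return BUFFER_B_NACL_MM * percent_b / 100.0

def buffer_a_to_add(pool_ml: float, pool_mM: float, target_mM: float) -> float:
    """Volume of salt-free buffer A needed to dilute a pool to target_mM."""
    return pool_ml * (pool_mM / target_mM - 1.0)

mono_q_window = (nacl_mM(20), nacl_mM(42))    # elution window on Mono-Q
heparin_window = (nacl_mM(52), nacl_mM(70))   # elution window on heparin

# Hypothetical pool: 10 ml at the midpoint of the Mono-Q window.
pool_ml = 10.0
pool_mM = sum(mono_q_window) / 2.0
dilution_ml = buffer_a_to_add(pool_ml, pool_mM, 150.0)

print(f"Mono-Q elution: {mono_q_window[0]:.0f}-{mono_q_window[1]:.0f} mM NaCl")
print(f"Heparin elution: {heparin_window[0]:.0f}-{heparin_window[1]:.0f} mM NaCl")
print(f"buffer A to add to reach 150 mM: {dilution_ml:.1f} ml")
```

In practice the salt concentration of the pooled peak would be measured (for example by conductivity) rather than taken as the window midpoint.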

DNA substrates
The two single strands of DNA, purified by the reverse-phase cartridge purification method (Sigma-Genosys), were dissolved in 10 mM Tris–HCl pH 7.5, 1 mM MgCl2 and annealed on a heat block. The purity of the final double stranded product was checked using a 20% native polyacrylamide gel stained with ethidium bromide. The sequence of the top strand was 5′-AGC TGC CAM GCA CCA GTG TCA GCG TCC TAT and that of the lower strand was 5′-ATA GGA CGC TGA CAC
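As a consistency check on the two sequences quoted above, the visible part of the lower strand can be compared with the reverse complement of the top strand. The Python sketch below does only that. It treats the symbol M in the top strand as an unspecified position (an assumption; the text does not define it here), makes no claim about the part of the lower strand that is not shown, and uses illustrative names throughout.

```python
# Check that the visible 5' end of the lower strand pairs with the
# 3' end of the top strand. Only the quoted portion is compared.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str, unknown: str = "N") -> str:
    """Reverse complement; symbols other than A/C/G/T map to `unknown`."""
    return "".join(COMPLEMENT.get(base, unknown) for base in reversed(seq))

top = "AGC TGC CAM GCA CCA GTG TCA GCG TCC TAT".replace(" ", "")
lower_visible = "ATA GGA CGC TGA CAC".replace(" ", "")

# Compare only as many positions as are quoted for the lower strand.
paired_region = reverse_complement(top)[: len(lower_visible)]
visible_matches = paired_region == lower_visible

print(f"top strand length: {len(top)} nt")
print(f"quoted lower-strand positions: {len(lower_visible)} nt")
print(f"quoted lower strand is Watson-Crick paired to top 3' end: {visible_matches}")
```

The comparison covers only the quoted positions, which lie away from the M position of the top strand.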

Crystallization
The MutS–DNA substrates were mixed in a ratio of 2.8 (MutS monomer) to 1 (double stranded DNA) and crystallized using the hanging drop technique. Microseeding was done to improve the crystal quality. The quality of the crystals improved further upon addition of 0.1 mM ADP to the protein–DNA mixture. All crystals grew in the same space group, from a well solution containing 11–14% PEG 6000, 350–750 mM NaCl, 10 mM MgCl2 and 25 mM HEPES pH 7.5. Prior to data collection, cryobuffer (30% PEG 6000, 15% glycerol, 300 mM NaCl, 10 mM HEPES pH 7.5) was gradually added into the crystallization drop. The crystals were then removed, soaked in a drop of pure cryobuffer and frozen in liquid nitrogen.

Data collection, structure solution and refinement
All data collection was done either at the ESRF in Grenoble, France or at the EMBL outstation at DESY, Hamburg, Germany, and the data processed using the HKL suite (9) (Table 1). The structure of the MutS–G:T complex (6) was used as a model for structure solution. Unless otherwise specified, all refinement jobs were carried out using REFMAC5 (10) in the CCP4 suite (11). The waters, DNA and ligands (ADP-Mg) were first removed and rigid body refinement was carried out using the two protein monomers as rigid domains. This was followed by 20 cycles of rigid body refinement using the individual domains of the protein as rigid bodies. After this, restrained refinement was done and the first electron density maps were generated. The DNA with the corresponding mismatch and the ADP-Mg were then built into the difference density using the program O (12). Torsion angle refinement using CNS (13) for the lower resolution structures (MutS–C:A, MutS–unpaired T) and TLS refinement using REFMAC5 (14) for all the structures were then performed, which led to improved Rfree values (Table 2). The domains used in the rigid body refinement were used as TLS groups during the TLS refinement. Waters were built into the structures using ARP/wARP (15). All the structures were refined to good stereochemistry (Table 2), with >99% of the residues in the allowed and additionally allowed regions and none in the disallowed regions of the Ramachandran plot. Stereochemical checks on all structures were carried out using WHATCHECK (16) and the DNAs were analyzed using the program 3DNA (17). All figures except 1C and 1D were generated using MolScript (18) and Raster3D (19).

Overall structures of the MutS–mismatch complexes
The overall structures of the MutS complexes are very similar to the published structure of the MutS–G:T complex (6) (Fig. 1A). Since the attempts to crystallize the protein in complex with a 16 bp oligo have been unsuccessful, the remaining bases could play a role in stabilizing the crystal packing or prevent alternative packing modes. Several contacts between the protein and DNA are seen, which are generally conserved in all the structures. The protein–DNA interface in the complexes is extensive, comprising many hydrogen bonds, salt bridges and van der Waals interactions (Fig. 1B, C and D). The surface area of the protein–DNA interface is ~1850 Å². The mismatch binding domain (residues 1–115) of monomer A accounts for more than half of this area (970 Å²), with several residues from it forming both hydrophobic and hydrophilic contacts to the DNA (Fig. 1B, C and D). The other three domains in contact with the DNA have much smaller interfaces. They are the clamp domain (residues 450–512) of monomer B (525 Å²), the clamp domain of monomer A (285 Å²) and the mismatch binding domain of monomer B (105 Å²). The contacts from these domains are predominantly hydrophilic (Fig. 1C and D).
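The interface areas quoted above can be tabulated to show how the contributions compare. The Python sketch below uses only the numbers given in the text; the dictionary keys and variable names are illustrative. The per-domain values sum to slightly more than the quoted approximate total, and since the text does not describe how the areas were calculated, the small difference is reported but not interpreted.

```python
# Per-domain protein-DNA interface areas (in square angstroms) from the text.
interface_area_A2 = {
    "mismatch binding domain, monomer A": 970,
    "clamp domain, monomer B": 525,
    "clamp domain, monomer A": 285,
    "mismatch binding domain, monomer B": 105,
}
quoted_total_A2 = 1850  # approximate total quoted in the text

domain_sum_A2 = sum(interface_area_A2.values())
share_of_quoted_total = {
    name: area / quoted_total_A2 for name, area in interface_area_A2.items()
}

for name, share in share_of_quoted_total.items():
    print(f"{name}: {interface_area_A2[name]} A^2 ({share:.0%} of quoted total)")
print(f"sum of domain values: {domain_sum_A2} A^2 (quoted total ~{quoted_total_A2})")
```

The mismatch binding domain of monomer A comes out at just over half of the quoted total, in line with the statement in the text.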

Mismatch binding by MutS
The most striking feature of the DNA in all the complexes is a sharp kink of ~60°, with the mismatched bases located at the vertex of the kink (Fig. 1B). This kinking causes a widening of the minor groove around the mismatch. The distance between the backbone phosphates (P–P distance) of the mismatched base pairs increases to 21–22 Å from an average of 11–12 Å for the rest of the base pairs in the minor groove. Mismatch binding by MutS involves the stacking of a phenylalanine residue, Phe 36 of one of the monomers, onto one of the mismatched bases. The same base is reoriented such that a particular nitrogen on it is brought into proximity to the glutamate, Glu 38. This enables the formation of a hydrogen bond between the carboxyl oxygen (OE2) of the glutamate and the nitrogen of the base. In the structures of the MutS–G:T and the MutS–unpaired T complexes, Phe 36 stacks over the thymidines, with their N3s forming the hydrogen bonds to Glu 38 (Figs 2B and 3A). In the structures of the MutS–C:A and MutS–A:A complexes, Phe 36 stacks on the adenosines and their N7s form hydrogen bonds to Glu 38 (Fig. 2D and H). The same is seen in the MutS–G:G complex (Fig. 2F), where the N7 of the guanosine is seen in this conformation. The purine bases on which Phe 36 stacks are in the syn orientation about the glycosyl bond, in contrast to the thymidines in the G:T mismatch and unpaired thymidine complexes, which are in the anti orientation.
In the structure of the MutS–unpaired T, more severe unstacking and disruptions in the base pairs adjacent to the unpaired thymidine, Thy 22, are seen (Fig. 3A and B). Phe 36 stacks on the unpaired Thy 22, which is seen forming a G:T base pair with the Gua 9, with significant rearrangements taking place in the Gua 9:Cyt 21 and Gua 10:Cyt 20 Watson–Crick base pairs. The contacts between the protein and the DNA are preserved (Fig. 1C and D) and are identical to those of the MutS–mismatch complexes. A small difference is the conformation of the loop between Ala 60 and Gly 63, which causes a change in the orientation of the side chain of Arg 58 (Fig. 1C and D). Since this loop is not visible in the other monomer due to lack of density, it seems to be highly mobile and takes on a different conformation in each of the structures (Fig. 2G).

DISCUSSION
Mismatch recognition by MutS and the effect of mismatches in DNA have been widely studied, biochemically and structurally. In our structures, MutS binds to DNA containing single G:T, A:A, C:A, G:G mismatches and an unpaired thymidine in a very similar way. Most protein–DNA interactions are conserved (Fig. 1C and D) among the complexes. Although the packing of these molecules is similar, it is clear that rearrangements of loops and side chains would be possible. In fact there is substantial rearrangement of the DNA in the complex with the unpaired thymidine. However, the interface between the DNA and protein is remarkably similar in all the five structures of the E.coli MutS, while the A similar kink in the DNA has also been seen in the structure of Taq MutS in complex with an unpaired thymidine. This seems to be an important requirement as a straight piece of DNA would lead to severe van der Waals clashes with the mismatch binding domains (Fig. 1B). Phe 36, which is widely conserved in the MutS family, is seen stacking on to one of the mismatched bases. It has been shown that mutating this phenylalanine to an alanine eliminates both DNA binding and MMR by MutS (23).
Comparison of the mismatches bound to MutS to those in crystal structures of free oligos (Fig. 2A and B The widening of the minor groove upon kinking of the DNA probably gives the protein enough room to reorient these bases to achieve this. In the MutS–G:T structure, the protein only has to shift the thymidine from its unbound position (Fig. 2A and B) to expose the N3 to Glu 38. In the MutS–G:G structure, a similar rearrangement of the syn guanosine (Fig. 2E and F) is enough to expose the N7 to Glu 38. In contrast, in the MutS–C:A structure, the adenosine is rotated around its C1′–N9 bond, from its anti orientation in the unbound state (Fig. 2C) to the syn orientation (Fig. 2D). In the MutS–unpaired T structure (Fig. 3A and B) and in the Taq MutS–unpaired T complex (5) (Fig. 3C and D) the N3s of the thymidine stacking on to Phe 36/Phe 39 form this hydrogen bond. These data suggest a scenario where the stacking of Phe 36 on any pyrimidine would lead to the N3 forming a hydrogen bond to Glu 38 while the stacking on a purine would involve its N7 forming the same hydrogen bond.
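The scenario proposed above amounts to a simple rule keyed on the class of the base that Phe 36 stacks on. The Python sketch below encodes that rule and applies it to the five E.coli complexes described in the text. The function and variable names are illustrative, and the rule for cytosine is an extrapolation of the proposed scenario, since thymine is the only pyrimidine observed stacking on Phe 36 in these structures.

```python
# Proposed rule: which base nitrogen hydrogen bonds to Glu 38, depending on
# the base stacked on Phe 36. Purines use N7; pyrimidines use N3.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}  # only T is observed in the structures; C is extrapolated

def glu38_contact_atom(stacked_base: str) -> str:
    """Return the base nitrogen expected to contact Glu 38 under the rule."""
    base = stacked_base.upper()
    if base in PURINES:
        return "N7"
    if base in PYRIMIDINES:
        return "N3"
    raise ValueError(f"unrecognized base: {stacked_base!r}")

# Base stacked on Phe 36 in each E.coli MutS complex, as described in the text.
stacked_on_phe36 = {
    "G:T": "T",
    "unpaired T": "T",
    "C:A": "A",
    "A:A": "A",
    "G:G": "G",
}

predicted = {name: glu38_contact_atom(b) for name, b in stacked_on_phe36.items()}
for name, atom in predicted.items():
    print(f"MutS-{name}: {stacked_on_phe36[name]} stacks on Phe 36, {atom} faces Glu 38")
```

For all five complexes the rule reproduces the contacts reported in the text: N3 for the two thymidine cases and N7 for the three purine cases.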
Glu 38 is a widely conserved residue in the MutS family of proteins. Besides forming the hydrogen bond to the base stacked upon by Phe 36, the role played by Glu 38 in mismatch recognition remains unclear. It has been shown that mutating Glu 38 to an alanine destroys MMR activity in MutS and increases the affinity of the protein towards homoduplex DNA (24). The requirement of a hydrogen bonding donor/acceptor for this residue in the base stacking on Phe 36 has also been demonstrated (24). Removal of the N3 of the thymidine by replacing it with difluorotoluene, which lacks the N3, leads to an 8-fold decrease in mismatch binding affinity by MutS (24). Replacement of the adenosine with 4-methylbenzimidazole, which lacks the N6, N1 and N3, also shows a similar effect. Although in the MutS–C:A and MutS–A:A structures the N7s of the adenosines form hydrogen bonds with Glu 38, the N3s, N6s and N1s are involved in stabilizing the complex by forming base pairing hydrogen bonds (Fig. 2D and H). Thus, the disruption of any of these sites can affect the complex formation with MutS.
The purine N7–Glu 38 (OE2) hydrogen bond in the MutS–A:A, MutS–C:A and MutS–G:G structures is unexpected since neither of the atoms involved is protonated. Therefore, either (28) have shown that mutating Glu 38 into a glutamine improves homoduplex DNA binding relative to mismatch binding, thereby eliminating MMR completely. This glutamine would be able to make the same hydrogen bond and so the role of this residue in mismatch recognition seems complex. It has been suggested that the acidity of the glutamate plays a role in kinking the DNA during mismatch recognition (28). More evidence on the protonation of Glu 38 or the tautomerization of the purines awaits biochemical testing.
The extensive contacts between the protein and DNA play an important role in the stabilization of the protein–DNA complexes, as it has been seen that the mismatch binding and the clamp domains are disordered in the absence of the DNA (5). An interesting observation is that many other residues widely conserved in the MutS family, besides Phe 36 and Glu 38, are involved in DNA binding. In the mismatch binding domain, the Arg 108 and Gln 95 side chains seem to be important as they are conserved not only in Taq MutS, where they form hydrogen bonds to the DNA (5), but also in eukaryotic MSH3 and MSH6. Since the eukaryotic MSH2–MSH6 complex is involved in the recognition of mismatches and short IDLs and MSH2–MSH3 recognizes longer IDLs (1,2), these residues could play a role in mismatch recognition. Conserved residues are also seen making contacts to the DNA in the clamp domain. Of these, Lys 496 and Arg 500 are conserved in Taq MutS, where they are involved in hydrogen bond formation to the DNA. Lys 496 is also conserved in eukaryotic MSH2, MSH3 and MSH6 and so may play an important role in DNA binding, while Arg 500, conserved in MSH3 and MSH6, may play some role in mismatch recognition.
Although several crystallographic studies have shown that the presence of a mismatch in DNA does not change its structure dramatically (20–22,29), mismatches destabilize DNA. This can be seen by the reduction in melting temperatures of DNA upon incorporation of a mismatch (30,31). A mismatch binding enzyme like MutS could be making use of this local weakening to detect the presence of mismatches and (24,32,33) and that it remains localized on the chromosomes in cells (34) suggests that it stays on DNA all the time, constantly scanning for mismatches.
A comparison of MutS–mismatch complexes with the MutS–unpaired T complex reveals only a few differences. The protein–DNA contacts are largely the same, with the exception of Arg 58 of the mismatch binding monomer adopting a different conformation in the MutS–unpaired T complex (Figs 1C, D and 2G). The main difference in DNA binding between the Taq MutS–unpaired T (Fig. 3C and D) and our E.coli MutS–unpaired T complex is the significant rearrangement in the Watson–Crick base pairs adjacent to the unpaired T in the E.coli MutS complex (Fig. 3A and B). In fact, the E.coli enzyme appears to recognize this as a G:T base pair with the unpaired T seemingly base paired to Gua 9. An interesting parallel in the difference between two similar enzymes binding the same substrate is seen in the structures of two methyltransferases in complex with DNA. While the structure of the HaeIII methyltransferase–DNA complex (35) shows similar rearrangements of adjacent Watson–Crick base pairs upon recognition of the target cytosine, that of the HhaI methyltransferase–DNA complex (36) does not. Apparently such rearrangements are possible but not essential features of substrate binding by these enzymes. However, such rearrangements in DNA have so far only been observed in G:C base pairs (35,37). In both the HaeIII methyltransferase (35) and our E.coli MutS–unpaired thymidine structures, the rearrangement involves G:C base pairs adjacent to the base being recognized (Fig. 3B). The HaeIII methyltransferase has only one G:C base pair being rearranged while our MutS structure has two (Fig. 3B). An explanation for the rearrangements occurring in G:C base pairs could be that there are more possibilities for creating new hydrogen bonds compared to A:T base pairs. So the energetically unfavourable rearrangement of the base pairs by the protein is at least partially compensated by the formation of new stabilizing hydrogen bonds.
This can be seen in our MutS–unpaired T structure where the rearranged Gua 9:Cyt 21 and the Gua 10:Cyt 20 bp form an extensive network of hydrogen bonds, also involving the protein and the unpaired thymidine (Fig. 3B). Further, since mismatch binding by MutS is also known to be influenced by sequence context (32,38), the involvement of neighbouring Watson–Crick base pairs is significant. It suggests that the protein, in addition to kinking the DNA and rearranging the base pairs of the mismatch itself, may use the rearrangement of the adjacent base pairs in order to obtain the common binding mode.

CONCLUSION
We have shown that MutS binds to A:A, C:A and G:G mismatches by stacking Phe 36 over the purine and either keeping it in or bringing it into the syn orientation to expose the N7 to Glu 38 for hydrogen bonding. In the G:T and unpaired T complexes, MutS binds in such a way that the N3 of the thymidine forms this hydrogen bond. We have also shown that MutS rearranges the mismatched base pairs from their positions in unbound DNA to achieve this. This is indicative of a common binding mode for all mismatches.
In our structures, we see the protein interacting with 12 DNA base pairs other than the mismatch itself. This is suggestive of an ability of MutS to scan such regions of DNA, looking for mismatches. Thus, MutS uses the local weakening due to the mismatch to locate it and binds to it by rearranging the base pairs to the conformation defined by the common mismatch binding mode.