Excision of 5-hydroxymethyluracil and 5-carboxylcytosine by the thymine DNA glycosylase domain: its structural basis and implications for active DNA demethylation

The mammalian thymine DNA glycosylase (TDG) is implicated in active DNA demethylation via the base excision repair pathway. TDG excises the mismatched base from G:X mismatches, where X is uracil, thymine or 5-hydroxymethyluracil (5hmU). These are, respectively, the deamination products of cytosine, 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). In addition, TDG excises the Tet protein products 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) but not 5hmC and 5mC, when paired with a guanine. Here we present a post-reactive complex structure of the human TDG domain with a 28-base pair DNA containing a G:5hmU mismatch. TDG flips the target nucleotide from the double-stranded DNA, cleaves the N-glycosidic bond and leaves the C1′ hydrolyzed abasic sugar in the flipped state. The cleaved 5hmU base remains in a binding pocket of the enzyme. TDG allows hydrogen-bonding interactions to both T/U-based (5hmU) and C-based (5caC) modifications, thus enabling its activity on a wider range of substrates. We further show that the TDG catalytic domain has higher activity for 5caC at a lower pH (5.5) as compared to the activities at higher pH (7.5 and 8.0) and that the structurally related Escherichia coli mismatch uracil glycosylase can excise 5caC as well. We discuss several possible mechanisms, including the amino-imino tautomerization of the substrate base that may explain how TDG discriminates against 5hmC and 5mC.

in space group P6₅ with cell dimensions of a = b = 162 Å and c = 56 Å. Crystals were cryoprotected by soaking in mother liquor supplemented with 20% ethylene glycol. A low-resolution data set at 4.0 Å resolution was collected at beamline 22ID-D of the Advanced Photon Source at Argonne National Laboratory. The P6₅ structure contains two TDG molecules (green and grey) and one DNA molecule (in stick model).
(b) Omit electron density map at 4.0 Å resolution, contoured at 3.5σ above the mean, calculated with 5caC omitted. The residues surrounding the 5caC are shown.
(c) The active site taken from PDB 3UO7, a structure determined from diffraction patterns that exhibited strong anisotropy between 3 and 4 Å resolution (17). We suggest that rotating the side chain χ2 torsion angle of Asn191 would allow the side chain amino group to form a hydrogen bond with the proton-deficit N3 atom of 5caC.
(c) The hydroxyl group of 5hmU forms a hydrogen bond with the main chain amide nitrogen of Gly142 (in blue) and water-mediated contacts with Ala145 (in green and further away from the reader) and the 5′ phosphate group of the abasic site.
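
As a rough consistency check on the cell dimensions quoted for the hexagonal crystal form (a = b = 162 Å, c = 56 Å), the unit cell volume can be computed from the general cell-volume formula. The sketch below assumes the ideal hexagonal angles (α = β = 90°, γ = 120°), which the caption does not state explicitly, and uses the fact that space group P6₅ has six symmetry-equivalent positions per cell. The function and variable names are illustrative only.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume (in cubic angstroms) of a general unit cell from its six parameters."""
    cos_a = math.cos(math.radians(alpha_deg))
    cos_b = math.cos(math.radians(beta_deg))
    cos_g = math.cos(math.radians(gamma_deg))
    # General (triclinic) formula; it reduces to a*a*c*sin(120 deg) for a hexagonal cell.
    factor = math.sqrt(1 - cos_a**2 - cos_b**2 - cos_g**2 + 2 * cos_a * cos_b * cos_g)
    return a * b * c * factor


# Cell dimensions from the caption; the angles are assumed (ideal hexagonal setting).
volume = unit_cell_volume(162.0, 162.0, 56.0, 90.0, 90.0, 120.0)

# P6_5 has six symmetry operations, so the asymmetric unit is one sixth of the cell.
asymmetric_unit_volume = volume / 6

print(f"unit cell volume:       {volume:,.0f} cubic angstroms")
print(f"asymmetric unit volume: {asymmetric_unit_volume:,.0f} cubic angstroms")
```

The asymmetric unit volume obtained this way is the space available to the two TDG molecules and one DNA duplex described in the caption, together with solvent.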

Supplementary
(d) A 5caC base can be modeled into the same binding site with an optimum hydrogen bond between the main chain amide nitrogen atom (NH) and the proton-deficit ring N3 atom.
Rotating the side chain χ2 torsion angle of Asn191 would allow the side chain carbonyl oxygen atom to form a hydrogen bond with the N4 amino group (NH₂) of 5caC.
(b) Superimposition of hTDG (colored) and hUNG (grey) indicates that the P-G-S-K loop of hTDG occupies the position corresponding to the thymine-interacting loop of hUNG (involving the main chain carbonyl atom of His212). His148 of hUNG corresponds to Leu143 in hTDG.
(c) The structure of human UNG bound to DNA containing a thymine (PDB 2OXM). The thymine is rotated from the base stack by about 30°, which is only one-sixth of the 180° rotation required to fully flip uracil into the active site pocket. Interactions with the thymine base involve the side chain of His148 and the main chain carbonyl oxygen of His212, which contact the polar edge atoms but not the methyl group of the thymine.
(d) The S200A (pXC1120), K201A (pXC1102) and PGSK-to-AAAA (pXC1123) mutations do not affect TDG activity under the conditions tested: double-stranded 32-bp oligonucleotides bearing a single CpG dinucleotide were incubated with equal amounts of the mutant TDG catalytic domain proteins at 37 °C for 30 minutes. In hUNG, it is the main chain atoms of the corresponding loop that interact with a partially flipped thymine. It remains possible that the specific side chain interactions come from different locations in hTDG.

We followed the published purification procedure for ROS1 (47) with some modifications. Briefly, the full-length ROS1 (1393 residues) was expressed in BL21(DE3) dcm− Codon Plus cells (Stratagene) as a 6xHis fusion in a pET28a vector (Novagen). A 1 ml aliquot of the overnight culture was inoculated into 1 L of Luria-Bertani medium containing kanamycin (50 µg/ml) and chloramphenicol (25 µg/ml), and incubated at 37 °C, 250 rpm, until the OD600 reached approximately 0.1.
The temperature was then lowered to 23 °C, and incubation continued at 250 rpm for approximately 90 min before adding 5 mM betaine, 5 mM Na-glutamate and 500 mM NaCl. When the OD600 reached 0.7, protein expression was induced for 2 h with 1 mM isopropyl-1-thio-β-D-galactopyranoside. The stored pellet was thawed and resuspended in buffer (20 mM Tris-HCl pH 8.0, 500 mM NaCl, 5% glycerol and 1 mM dithiothreitol) with 5 mM imidazole. Cells were lysed by sonication (10 min: 1 s on and 3 s off), and the lysate was clarified by centrifugation. The fusion protein was isolated on a nickel-charged HisTrap HP affinity column after two washes (150 ml with 5 mM imidazole and 50 ml with 100 mM imidazole, respectively) and eluted with 15 ml of 300 mM imidazole. The protein was loaded onto tandem HiTrap Q and HiTrap SP columns in the same buffer with 500 mM NaCl and eluted from the HiTrap SP column in a 30-ml step elution in the presence of 800 mM NaCl. Protein concentration was estimated by Coomassie staining using bovine serum albumin (BSA) as the standard.
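
The inoculation and sonication steps above involve simple arithmetic that is easy to misread. The sketch below works through both. The text does not say whether the 10 min of sonication refers to total elapsed time or to cumulative pulse-on time, so both readings are computed; the function names are hypothetical and not part of the published procedure.

```python
def dilution_fold(inoculum_ml, medium_ml):
    """Fold dilution when an inoculum is added to fresh medium."""
    return (inoculum_ml + medium_ml) / inoculum_ml


def pulsed_sonication(quoted_min, on_s, off_s):
    """Duty cycle of a pulsed program, plus both readings of the quoted duration."""
    duty_cycle = on_s / (on_s + off_s)
    quoted_s = quoted_min * 60
    return {
        "duty_cycle": duty_cycle,
        # Reading 1: the quoted time is total elapsed time.
        "on_time_s_if_quoted_is_elapsed": quoted_s * duty_cycle,
        # Reading 2: the quoted time is cumulative pulse-on time.
        "elapsed_s_if_quoted_is_on_time": quoted_s / duty_cycle,
    }


# 1 ml of overnight culture into 1 L (1000 ml) of medium.
fold = dilution_fold(1.0, 1000.0)

# 10 min program with 1 s on and 3 s off.
program = pulsed_sonication(10, 1, 3)

print(f"inoculum dilution: about 1:{fold:.0f}")
for name, value in program.items():
    print(f"{name}: {value}")
```

Under the first reading the sample receives 150 s of actual sonication; under the second the program runs for 40 min in total, so the distinction matters when reproducing the lysis step.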