The presence of the methylated nucleobase 5Me dC in CpG islands is a key factor that determines gene silencing. Aberrant methylation patterns are responsible for deteriorated cellular development and are a hallmark of many cancers. Today genes can be sequenced for the content of 5Me dC only with the help of the bisulfite reagent, which is based exclusively on chemical reactivity differences established by the additional methyl group. Despite intensive optimization of the bisulfite protocol, the method still has specificity problems. Most importantly, ∼95% of the DNA analyte is degraded during the analysis procedure. We discovered that the reagent O-allylhydroxylamine is able to discriminate between dC and 5Me dC. The reagent, in contrast to bisulfite, does not exploit reactivity differences but directly gives different reaction products. The reagent forms a stable mutagenic adduct with dC, which can exist in two states (E versus Z). In the case of dC the allylhydroxylamine adduct switches into the E-isomeric form, which generates dC to dT transition mutations that can easily be detected by established methods. Significantly, the 5Me dC-adduct adopts exclusively the Z-isomeric form, which causes the polymerase to stop. O-allylhydroxylamine thus allows differentiation between dC and 5Me dC with high accuracy, leading towards a novel and mild chemistry for methylation analysis.
DNA methylation is an epigenetic mechanism for transcriptional regulation ( 1–2 ). The process controls cellular differentiation and is defective in many diseases including cancer ( 3–5 ). DNA methylation involves substitution of the H atom at C5 of the cytosine base (dC) by a methyl group to give 5-methylcytosine ( 5Me dC) ( 6 ). These methylated 5Me dC bases trigger processes that finally lead to gene silencing. Today, the detection of 5Me dC bases in a given DNA sequence is possible with restriction enzymes, which cleave selectively unmethylated DNA ( 7 ), or with antibodies, which recognize methylated DNA ( 8–10 ). The current gold standard, however, is bisulfite sequencing. This reagent selectively deaminates dC to dU but does not affect 5Me dC ( 11 , 12 ). The position of the newly formed dU base is then detected with standard sequencing methods after PCR amplification of the treated DNA material ( 13 ). Bisulfite sequencing is currently in widespread use ( 14–18 ). However, during bisulfite analysis ∼95% of the DNA material is degraded, which limits 5Me dC detection if only a small amount of sample is available ( 19–21 ). This drawback fuels current research to find alternative 5Me dC detection and sequencing methods ( 22 ). Research in this direction has furnished the result that Br + ions add selectively to 5Me dC to give cleavable DNA sites ( 23 ). OsO 4 and reagents derived thereof were found to cis- hydroxylate dC bases with some selectivity ( 24 , 25 ). Furthermore photochemical methods were reported ( 26 , 27 ) that allow some discrimination and finally even compounds were reported that are added during bisulfite sequencing to trap intermediates of the bisulfite reaction ( 28 ). All reported approaches target the reactivity differences between dC and 5Me dC imposed by the additional methyl group. 
However, as this difference is small, none of the new procedures can currently distinguish between dC and 5Me dC efficiently, or their general approach interferes with current sequencing techniques.
Here we report a novel concept that does not rely on reactivity differences but in contrast allows detection of 5Me dC directly based on the reaction product. Incubation of dC and 5Me dC with hydroxylamine derivatives gives E - and Z -configured oxime-type adducts ( Z / E - 1 and Z / E - 2 , see Scheme 1 ) ( 29 , 30 ). The E - and Z -configured products are in equilibrium via the amino isomeric forms A- 1 and A- 2 . Each of these isomers ( Z , A and E ) features different H-bonding characteristics. Whereas the amino tautomers (A) base pair like dC, the E -isomers are expected to code like dT. The Z- isomers likely cause stalling of polymerases during replication ( Scheme 1 ) because they interfere with normal base pairing. NMR studies have shown that the pyrimidine oximes like 1 and 2 are present mainly in the imino tautomer and that the Z -isomers are the most stable forms ( 31 ). In the case of 5Me dC, we expected that the isomer E - 2 would be disfavored because of the steric strain imposed between the methyl group and the O -allyl chain. The 5Me dC-adduct 2 should therefore prefer Z-configuration ( Z - 2 ) or exist in the amino tautomeric form (A- 2 ). As such, we expected that this product would prefer base pairing with dG. In the case of the dC adduct 1 steric strain with the hydrogen atom at C5 is much lower and we hoped that this adduct would in contrast adopt at least to some extent the E –configuration, which would allow base pairing with dATP. The resulting dC to dT transition mutation we thought could be exploited as the desired signal that allows dC to be distinguished from 5Me dC. As our approach is based on mutagenicity, it is sequence independent and can also detect methylated cytosines that are not in a CpG context as found, for instance, in plant genomes. This is an advantage over restriction enzyme based methods that are limited by the specificity of the enzyme.
MATERIALS AND METHODS
Incubation of DNA strands
In a thermocycler (Eppendorf Mastercycler personal) with a heatable lid a single-stranded oligodeoxynucleotide ODN (10 µM) (Metabion, Martinsried, Germany) was incubated with NH 2 OAllyl (1 M, pH 5.2 with HNEt 2 , Fluka, Buchs, Switzerland) for 4 h at 60°C. For incubation kinetics 200 pmol of the DNA were directly injected into an HPLC without prior desalting (for detailed incubation kinetics please refer to the Supplementary Data ). If the sample was to be sequenced it was desalted using BioSpin 6 Tris columns (BioRad, Hercules, CA, USA) that were previously equilibrated three times with 500 µl ddH 2 O. The filtrate was concentrated in a SpeedVac (Christ RVC-2-33 IR), re-dissolved in NH 3 (28% in H 2 O, Fluka, Buchs, Switzerland) and heated to 60°C for 90 min in a thermocycler with a heatable lid. The solution was again desalted using BioSpin 6 Tris columns. The concentration of the solution was determined with a NanoDrop photometer (PeqLab ND-1000). The treatment with ammonium hydroxide is optional but proved beneficial for the subsequent sequencing process because it converts the bis-hydroxylamine adduct ( Supplementary Data ), which is formed to a small extent, into the desired compound 1 in a base-promoted elimination reaction.
Hybridization of DNA
An amount of 1.2 µM ODN, 1 µM primer and a suitable buffer (NEBuffer 2, NEB, Frankfurt am Main, Germany) were mixed and heated to 95°C for 4 min in a thermocycler with a heatable lid. The solution was allowed to cool at a rate of 1°C per minute and stored at 4°C in the dark.
An amount of 10 pmol dsDNA in 25 µl Annealing Buffer (Qiagen, Hilden, Germany) were sequenced in a PyroMark Q24 Pyrosequencer. All bases except the lesion generated by incubation were sequenced using standard conditions. Because the incorporation of nucleotides opposite the incubated dC or 5Me dC is slower than opposite canonical bases, addition of the nucleotide was repeated 10 times. The data was analyzed by the software provided by the manufacturer. The software sums up the peak heights and determines the ratio of dATPαS/dGTP incorporation. The last methylation site cannot be detected, because the last position in a strand generally yields no results in pyrosequencing. Please note that due to the special pyrosequencing procedure dATPαS instead of dATP was used.
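The summation described above can be illustrated with a minimal Python sketch. It is not the manufacturer's software; the function name and the example peak heights are hypothetical, and the only assumption taken from the text is that peak heights from the repeated dispensations are added before the dATPαS/dGTP ratio is formed.

```python
def summed_incorporation(datp_peaks, dgtp_peaks):
    """Sum peak heights over repeated dispensations at one template position.

    Returns (ratio, fraction):
      ratio    = summed dATPalphaS signal / summed dGTP signal
      fraction = summed dATPalphaS signal / (summed dATPalphaS + summed dGTP)
    """
    a_total = sum(datp_peaks)
    g_total = sum(dgtp_peaks)
    if a_total + g_total <= 0:
        raise ValueError("no incorporation signal at this position")
    # Guard against division by zero when no dGTP is incorporated at all.
    ratio = a_total / g_total if g_total > 0 else float("inf")
    fraction = a_total / (a_total + g_total)
    return ratio, fraction


# Hypothetical peak heights (arbitrary units) from repeated dispensations.
ratio, fraction = summed_incorporation([2.0, 1.0, 1.0], [3.0, 1.0])
```

Reporting the fraction alongside the ratio keeps the value bounded between 0 and 1, which is convenient when comparing positions.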
Primer extension based sequencing
An amount of 10 pmol of the incubated sample was hybridized with a biotinylated primer. 0.1 U/µl Klenow(Exo-) polymerase, 500 µM dNTPs and 1 µM dsDNA in 20 µl 1× NEB Buffer 2 were incubated for 10–60 min at 60°C. To the solution were added 2 µl Streptavidin Sepharose beads (GE Healthcare, Uppsala, Sweden), 40 µl Binding Buffer (Qiagen) and 18 µl ddH 2 O. After agitation at 1400 rpm for 15 min the beads were captured with a Vacuum Prep Tool (Qiagen) and washed with 70% EtOH, 0.1 M NaOH and Washing Buffer (Qiagen). The beads were resuspended in 25 µl Annealing Buffer (Qiagen) containing 10 pmol sequencing primer (Metabion). Pyrosequencing was performed on a PyroMark Q24 Pyrosequencer using standard conditions (Qiagen). The data was analyzed by the software provided by the manufacturer. The average dTTP incorporation of all experiments was calculated and the data plotted relative to this value. Negative deviations from the average correspond to 5Me dC, positive values to dC. The data displayed are the average of three measurements.
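The normalization to the average dTTP incorporation can be sketched in Python as follows. This is an illustration only: the function names and input values are hypothetical, and the average is assumed to be taken over all values supplied to the function.

```python
def deviation_from_mean(dttp_by_position):
    """Return dTTP incorporation at each position relative to the overall mean."""
    if not dttp_by_position:
        raise ValueError("no dTTP measurements supplied")
    mean = sum(dttp_by_position.values()) / len(dttp_by_position)
    return {pos: value - mean for pos, value in dttp_by_position.items()}


def call_site(deviation):
    """Positive deviation is read as dC, negative deviation as 5Me dC."""
    if deviation > 0:
        return "dC"
    if deviation < 0:
        return "5Me dC"
    return "undetermined"


# Hypothetical dTTP incorporation values (percent) keyed by template position.
deviations = deviation_from_mean({23: 30.0, 34: 10.0, 57: 20.0})
calls = {pos: call_site(dev) for pos, dev in deviations.items()}
```

A value exactly at the mean is reported as undetermined here; how close to the average a site may lie before it is called is a choice the text does not specify.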
Real-time primer extension based quantification
ODN1C and ODN1M were mixed at different ratios, incubated with NH 2 Oallyl and hybridized with ODN2 as described earlier. In a 96-well plate (NUNC, Roskilde, Denmark) were mixed per well: 10 pmol dsDNA (10 µl), 0.15 U ATP sulfurylase (0.5 µl), 1 U Klenow(exo–) polymerase (0.2 µl), 500 pmol APS (1 µl), 78.3 µl ENLITEN luciferase/luciferin reagent (Promega, Madison, WI, USA). In a 96-well reader (TECAN Genios Pro) at 30°C 10 µl dATPαS (20 µM) were injected at 200 µl/s. The plate was shaken for 5 s (amplitude 5 mm) and the luminescence measured every 250 ms for 400 cycles. The data was collected by the Microsoft Excel macro provided by the manufacturer. The average of four blank measurements (enzymes without DNA) was subtracted from the average of four measurements of the sample. The data for dATPαS incorporation opposite undamaged dC was taken as baseline. In the first 15–20 s the curves are unstable due to mixing effects, so the data is shown only from 20 s onwards. The slope of the curves reflects the rate of dATPαS incorporation, which corresponds to the fraction of dC in the sample.
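The blank subtraction and slope determination can be sketched in Python as shown below. This is a hedged illustration, not the plate reader macro: the function names are hypothetical, the first reading is assumed to be at t = 0, and an ordinary least-squares line is assumed as the slope estimator, which the text does not specify. The baseline correction against undamaged dC is omitted.

```python
def mean_trace(traces):
    """Average several equally long luminescence traces point by point."""
    return [sum(points) / len(points) for points in zip(*traces)]


def blank_corrected(sample_traces, blank_traces):
    """Subtract the averaged blank trace from the averaged sample trace."""
    sample = mean_trace(sample_traces)
    blank = mean_trace(blank_traces)
    return [s - b for s, b in zip(sample, blank)]


def slope_after(signal, dt=0.25, t_start=20.0):
    """Least-squares slope of signal versus time, using points at t >= t_start.

    dt is the sampling interval in seconds (250 ms in the protocol);
    t_start skips the initial period disturbed by mixing.
    """
    points = [(i * dt, y) for i, y in enumerate(signal) if i * dt >= t_start]
    if len(points) < 2:
        raise ValueError("not enough points after t_start")
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    s_tt = sum((t - t_mean) ** 2 for t, _ in points)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in points)
    return s_ty / s_tt


# Synthetic example: four identical linear sample traces and four flat blanks.
sample = [[2.0 * (i * 0.25) + 1.0 for i in range(400)] for _ in range(4)]
blank = [[1.0] * 400 for _ in range(4)]
rate = slope_after(blank_corrected(sample, blank))
```

With 400 readings at 250 ms intervals the trace spans 100 s, so discarding the first 20 s still leaves 320 points for the fit.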
The polymerase subunit of DNA polymerase I from Geobacillus stearothermophilus (DSM No. 22) was amplified via PCR and subcloned in pENTR-TEV-D-TOPO (Invitrogen, Karlsbad, USA) to introduce an N-terminal TEV-protease recognition site. The fragment was then transferred to the expression plasmid pDEST007 ( 32 ), yielding a cleavable N-terminal Strep-tag II. The protein was expressed in Escherichia coli BL21 at 37°C by induction at OD 600 = 1 with 0.2 mg/l anhydrotetracycline for 2 h. Cells were resuspended in 100 mM Tris–Cl pH 7.5, 150 mM NaCl, 10 mM ß-ME and Complete protease inhibitor (Roche), lysed by French-press and heated to 50°C for 10 min to denature E. coli proteins. 0.1% NP-40 and Tween-20 were added to the lysate prior to heat-treatment and maintained during Strep-tag purification. The cleared lysate was loaded on a Streptactin column (IBA) and eluted with desthiobiotin. Subsequently, the tag was cleaved by incubation on ice with 10 U/mg AcTEV protease (Invitrogen) over night. Further purification via Heparin affinity chromatography and crystallization was performed as described in ( 33 ).
For co-crystallization the template containing lesion 1 was annealed to the corresponding primer (for sequences see Supplementary Data ) in the protein storage buffer (10 mM Na–cacodylate pH 7, 50 mM NaCl, 0.5 mM EDTA, 10 mM MgSO 4 ). Prior to crystallization protein and DNA were mixed in a 1 to 3 molar ratio. The final concentrations of Pol I and dsDNA were 5 mg/ml and 0.5 mM, respectively. Crystals were grown by mixing an equal volume of protein-DNA complex with 47.5–51.0% (NH 4 ) 2 SO 4 , 3.0–3.5% MPD and 100 mM MES pH 5.8, using the hanging-drop vapor diffusion method. The crystallization plates were incubated at 18°C and crystals appeared after 1–2 days. Crystals were frozen in 24% sucrose, 55% (NH 4 ) 2 SO 4 , 3.0–3.5% MPD, 100 mM MES and stored in liquid nitrogen for data collection. The best crystals were obtained with 49.5% (NH 4 ) 2 SO 4 and 3% MPD.
Collection and processing of X-ray diffraction data, phase determination and structure refinement
Data were collected at the beamline PXIII [Swiss Light Source (SLS), Villigen, Switzerland]. The data were processed with the programs XDS ( 34 ) and SCALA ( 35 , 36 ). Structure solution was carried out by molecular replacement with Phaser ( 37 ) using the coordinates of 1U45 ( 38 ). In order to reduce model bias, the temperature factors were reset to 20 for main chain and 40 for side chain and DNA atoms, respectively. Prior to model building in COOT ( 39 ) a simulated annealing omit map, removing the area around the lesion, was calculated with PHENIX ( 40 ). Restrained refinement was carried out with REFMAC5 ( 41 ). Data processing and refinement statistics are summarized in Supplementary Table S1 .
What are the coding properties of the DNA lesions 1 and 2?
In order to investigate whether the dC-adduct 1 and 5Me dC-adduct 2 have different coding properties we reacted the oligodeoxynucleotide ODN 1 ( Figure 1 A) containing either a dC or a 5Me dC (ODN 1C and ODN 1M ) with various hydroxylamine derivatives (10 µM DNA, 1 M NH 2 OR, pH = 5.2, 4 h, 60°C). The best results were finally obtained with O –allylhydroxylamine, which furnished lesions 1 and 2 cleanly and in high yields ( Supplementary Data ). We observed a small reactivity difference between dC and 5Me dC, with the latter reacting three to five times slower ( 42 ). Most importantly, no DNA degradation or reaction with other DNA bases was observed.
To investigate the coding properties of the reaction products 1 and 2 we performed primer extension reactions with the primer ODN 2 , Klenow(exo-) polymerase and ODN 1C or ODN 1M ( Figure 1 A1–3). For both oligonucleotides we observed full extension before and after treatment with O -allylhydroxylamine. This proves that the polymerase can indeed read through the hydroxylamine adduct 1 . When dGTP was removed from the primer extension assay ( Figure 1 A3), we observed that after incubation with O –allylhydroxylamine only unmethylated ODN 1C but not methylated ODN 1M was copied. This shows that adduct 1 is replicatively bypassed and that it is base-paired with a triphosphate other than dGTP. Further studies, depicted in Figure 1 B, showed that the polymerase is able to pair 1 with dATP. In order to investigate the relative incorporation efficiency of dATP and dGTP opposite 1 the primer extension studies were repeated with dGTP replaced by ddGTP ( Figure 1 A2). The data show that Klenow(exo-) pairs 1 with dGTP and dATP with similar efficiencies. Other polymerases tested including low fidelity polymerases such as Pol η and Pol κ showed the same behavior. In the case of ODN 1M we observed full extension only when all four triphosphates are present ( Figure 1 A1). No elongation was detected in the absence of dGTP, showing that adduct 2 either only allows incorporation of dGTP or blocks the polymerase. Further studies, shown in Figure 1 C, with ODN 3 in which we synthetically inserted the 5Me dC adduct 2 , confirmed that bypass of adduct 2 is indeed difficult and not possible with dATP.
In summary, the results show that oxime 1 instructs a polymerase to introduce dG and dA into the primer. Adduct 2 in contrast hinders replication and instructs the polymerase to incorporate a dG. Adducts 1 and 2 consequently differ dramatically in their coding characteristics, and only the adduct derived from dC gives rise to dC to dT transition mutations, providing the required readout for epigenetic sequencing.
What is the structural basis of the different coding properties?
In order to prove that the formation of an E - 1 :dA base pair is the reason for the mutational bypass we crystallized a duplex containing the base pair with the protein B.st Pol I ( 38 ). The full structure is depicted in Figure 2 . It shows the E - 1 :dA base pair determined at a resolution of 2.9 Å. The O–allyl–hydroxylamine-dC adduct 1 is indeed present in the E –configuration and it base pairs with dA via two Watson–Crick type H–bonds as expected.
Can the coding properties be exploited for a direct readout of the methylation status?
In order to study whether the coding difference of the dC and 5Me dC oxime adducts 1 and 2 can be exploited for epigenetic sequencing, we analyzed the promoter sequence of the cyclin-dependent kinase inhibitor gene p15 , known to be aberrantly methylated in acute myeloid leukemia patients ( 43–45 ). For the study we used ODN 5 ( Figure 3 A) containing two either methylated or unmethylated CpG sites and one dC site that does not neighbor a dG. After incubation of ODN 5 with O –allylhydroxylamine followed by hybridization with a sequencing primer ODN 6 the three DNA constructs were directly subjected to pyrosequencing, which visualizes elongation of the primer strand (ODN 6 ) by a luminescence signal ( 46 ). The results for positions 23 and 26 are depicted in Figure 3 B. As expected, both dC and 5Me dC in untreated oligonucleotides direct the addition of dGTP to the primer. The treated DNA strands, in contrast, showed significant incorporation of dATP opposite the dC. Opposite 5Me dC this misincorporation was not detected in any case (<5%, which is the error range of the instrument, Figure 3 B). Importantly, the O –allylhydroxylamine modified dC base 1 instructs incorporation of ∼50% dATPαS regardless of the position of the dC base, showing that no sequence effects occur.
Does the novel method allow sequencing of DNA fragments usually employed in pyrosequencing?
We extended the new epigenetic sequencing approach to the analysis of a longer promoter sequence. For this experiment we chose an 84-bp fragment with three either methylated or unmethylated CpG sites (ODN 7 , CpG at positions 23, 34 and 57, for sequence see Supplementary Data ) of the p15 promoter. Such a promoter sequence is a typical target of bisulfite sequencing. For the experiment ( Figure 4 A) we first incubated the different 84–mer oligonucleotides with O –allylhydroxylamine and subsequently hybridized the treated strands with the biotinylated primer ODN 8 (for sequence see Supplementary Data ). We next performed the primer extension reaction with the Klenow(exo-) polymerase and isolated the product DNA strand via streptavidin-coated sepharose beads. Following hybridization with the sequencing primer (ODN 9 , for sequence see Supplementary Data ) the obtained oligonucleotides were finally analyzed via pyrosequencing. The results are depicted in Figure 4 B.
The sequencing data clearly furnished a dT signal at all original dC positions in the promoter. The signal was in all cases at least 15–20% above background. The same signal intensities were obtained at all dC positions, showing once more that sequence effects are negligible. The O –allylhydroxylamine based epigenetic sequencing method in the reported form is consequently limited only by the pyrosequencing method, which currently restricts sequencing to oligonucleotides shorter than 100–150 bases. As the result provides the same final readout as bisulfite sequencing, it is fully compatible with current epigenetic sequencing equipment.
Can the ratio of dC to 5Me dC be determined in a sample?
Generally, the extent of the methylation can be determined by comparing the relative heights of the dT and dC signals as done in bisulfite sequencing. Additionally, we aimed to develop an alternative approach that is based on a real time primer extension reaction. By coupling a polymerase to a luciferase, elongation of a DNA strand can be monitored by light. In the case of 5Me dC no light upon addition of dATPαS can be detected. If a mixture of dC and 5Me dC is present the amount of luminescence represents the fraction of dC in the sample. As shown in Figure 5 the extent of methylation of a specific CpG site can be readily determined after calibration.
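One way such a calibration could be carried out is sketched below in Python. The text states only that calibration is required; the linear relation between luminescence signal and dC fraction, the function names and the calibration values are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired calibration points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


def estimate_methylation(signal, slope, intercept):
    """Invert the calibration line and return the estimated 5Me dC fraction."""
    dc_fraction = (signal - intercept) / slope
    # Clamp to the physically meaningful range before converting.
    dc_fraction = min(1.0, max(0.0, dc_fraction))
    return 1.0 - dc_fraction


# Hypothetical calibration: known dC fractions versus measured signal.
slope, intercept = fit_line([0.0, 0.5, 1.0], [0.0, 1.0, 2.0])
methylation = estimate_methylation(0.5, slope, intercept)
```

The calibration mixtures correspond to the defined ratios of unmethylated and methylated template; if the measured response turned out to be non-linear, a different calibration function would be needed.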
Can the method distinguish between 5Me dC and 5HOMe dC?
Recently, 5-hydroxymethylcytosine ( 5HOMe dC) has been found as a second post-replicatively formed DNA base ( 47 , 48 ). Newly developed methods allow for precise quantification of 5HOMe dC levels in tissues, but do not yield sequence-specific information ( 49 ). A method that can distinguish 5Me dC and 5HOMe dC at single-base resolution is highly desirable. Unfortunately, as the detection principle presented in this work is based on sterics and the new base 5HOMe dC imposes an even larger steric strain, it is not possible to distinguish 5Me dC and 5HOMe dC after incubation with O –allylhydroxylamine.
In conclusion, we show that the E / Z equilibrium of O –allylhydroxylamine dC/ 5Me dC adducts is influenced by the presence of a methyl group at C5. The lesion generated from dC after O –allylhydroxylamine treatment is bypassed by base pairing with dA and dG. In contrast, after incubation of 5Me dC only incorporation of dG can be observed. This allows, for the first time, detection of the methylation status of a cytosine based not only on reaction rates but on different reaction products. We used this observation to establish the first bisulfite independent 5Me dC sequencing principle. The selectivity of the method is controlled by two chemical filters. First, dC reacts faster with NH 2 Oallyl; second, only the dC adduct 1 but not the 5Me dC adduct 2 can induce dC to dT transition mutations, which can be detected by standard sequencing methods. As this is the same readout as for the bisulfite method, our system is fully compatible with current sequencing equipment. The next focus in our research will be to couple our novel detection principle to modern sequencing-by-synthesis methods that could allow our method to be routinely applied for the methylation analysis of small amounts of DNA. Alternatively, one could imagine PCR based approaches like those employed in bisulfite sequencing, or exploiting the mutagenic potential of hydroxylamines by subcloning in bacteria.
Supplementary Data are available at NAR Online. The crystal structure can be accessed under pdb code 2xo7.
SFB 646 and the Center for Integrated Protein Science, Munich (CIPSM); Verband der chemischen Industrie (VCI) for a Kekulé Fellowship. Funding for open access charge: Center for Integrated Protein Science Munich (CIPSM).
Conflict of interest statement . None declared.
The authors thank Claudia Szeibert and Andrea Kneuttinger for their help during the project, Sabine Schneider for her help solving the structure and the beamline staff at the SLS for technical support. M. Münzel would like to thank the Verband der chemischen Industrie (VCI) for a Kekulé Fellowship.