Intense interest in the biological roles of DNA methylation, particularly in eukaryotes, has produced at least eight different methods for identifying 5-methylcytosine and related modifications in DNA genomes. However, the utility of each method depends not only on its simplicity but on its specificity, resolution, sensitivity and potential artifacts. Since these parameters affect the interpretation of data, they should be considered in any application. Therefore, we have outlined the principles and applications of each method, quantitatively evaluated their specificity, resolution and sensitivity, identified potential artifacts and suggested solutions, and discussed a paradox in the distribution of m5C in mammalian genomes that illustrates how methodological limitations can affect interpretation of data. Hopefully, the information and analysis provided here will guide new investigators entering this exciting field.
At least seven different covalent base modifications have been identified at significant levels in the DNA genomes of prokaryotic and eukaryotic cells, bacteriophage and viruses, resulting in an intense effort to identify their biological significance (1–4). For example, 5-methylcytosine (m5C), the most abundant covalently modified base in the genomes of eukaryotic cells, plays a role in the regulation of gene transcription, X chromosome inactivation, genomic imprinting, cell differentiation and tumorigenesis. Unicellular eukaryotes also contain N6-methyladenine (m6A) and 5-hydroxymethyluracil (hm5U) (5,6). m6A and N7-methylguanine (m7G) have been reported in insects and humans (7), but their biological significance is not yet evident. In prokaryotic cells, the most prevalent covalent modifications are m6A and m5C, although N4-methylcytosine (m4C) also has been detected (8). All of these modifications are involved in restriction and modification of DNA, and m6A is involved specifically in the regulation of DNA replication and in DNA repair.
These and other modified bases can constitute a large part of the genome. For example, all of the cytosines in bacteriophage T4 DNA are either 5-hydroxymethylcytosine (hm5C) or glucosylated hm5C, and up to 50% of the cytosines in plant DNA are m5C (9,10). Modified DNA bases also can represent a minor fraction of the genome, but still exhibit strong biological effects. For example, only 3–10% (11) of cytosines in mammalian genomes are m5C, but they generally repress transcription when clustered at the 5′-ends of genes (3,12). In general, m5C is nearly universal among eukaryotes with genomes >10^8 bp (e.g. mammals and plants), but rare among eukaryotes with smaller genomes (e.g. yeast, flies and nematodes) (13–15).
The goal of this review is to provide a guide for identifying m5C and related covalent base modifications in DNA genomes. At least eight different methods, along with several variations, have been developed over the past three decades for characterizing covalently modified bases in DNA genomes. Each method has advantages and disadvantages, most of which become evident by examining their specificity, resolution, sensitivity and potential artifacts. Specificity is the ability to distinguish a particular covalent base modification either from another modification or from the unmodified base. Horizontal resolution is the ability to identify the position of a modified base along a DNA strand. Some methods, such as total base composition and nearest neighbor analysis, can determine the occurrence and abundance of virtually any modified base within an entire genome. Other methods, such as differential base modification by bisulfite, hydrazine or permanganate followed by DNA sequencing, can detect only m5C, but these methods can map m5C to a precise nucleotide position in virtually any stretch of DNA. Thus, the two methods that identify the widest range of specific base modifications also have the lowest horizontal resolution, while the three methods that have the highest horizontal resolution are specific only for m5C. This is not a problem when analyzing the genomes of animals or plants where the exclusive presence of m5C has been established by analyses of total base composition. However, for genomes with other modifications, these methods are of limited use. These methods also are too laborious for screening large amounts of DNA sequence. Both problems can be addressed to some extent using modification-sensitive restriction endonucleases (MSREs). 
MSREs provide a relatively simple method for mapping methylated cytosines and adenines to specific DNA sites, and this strategy has been adapted to large scale screening procedures that can map the genomic locations of modified bases in specific restriction endonuclease recognition sequences. However, horizontal resolution is reduced, because the number of sites that can be examined is limited by the number of sequences recognized by MSREs and by the availability of the appropriate enzyme. Another method, immunochemical analysis, may also help address these problems, because in principle, antibodies can be developed against most, if not all, covalently modified bases. However, horizontal resolution is reduced even further.
Three other important parameters in selecting a method are the minimum amount of DNA required to detect a particular base modification (sensitivity), the ability to identify a base modification when it is contained in only a fraction of the population (vertical resolution), and potential artifacts. For example, the hydrazine and permanganate methods compare the relative strengths of cytosine bands in a genomic sequence with those from an unmodified control sequence. While methods such as ligation-mediated PCR can improve their sensitivity (less DNA is required to see the genomic sequence), their vertical resolution (the ability to measure the relative amount of two signals) remains essentially the same, because the signal to noise ratio remains essentially the same. Some methods are subject to potential artifacts that affect interpretation of data. For example, failure to completely denature the DNA and improper design of PCR primers in the bisulfite method can lead to false identification of m5C, and resistance of DNA to cleavage by a MSRE can result from factors other than methylated cytosines and adenines.
In an effort to help the new investigator select the methods most appropriate for a particular application, we have listed them in the order they are frequently applied to a genome with unknown DNA modifications. The experimental protocols are found in the literature cited. Wherever possible, we have provided quantitative evaluations of the sensitivity and resolution of each method, based on our own experience and on published information. In addition, we have described the principle of each method, its potential artifacts, and its simplicity of application. Together, these parameters determine a method's utility and accuracy. Finally, we have attempted to illustrate problems that may be encountered when applying these methods by examining the distribution of m5C in mammalian genomes.
Identifying Covalently Modified Bases and Their Dinucleotide Composition
Identifying the covalently modified bases present in a particular genome and determining their abundance are prerequisites to mapping their locations in specific DNA sequences. The methods available for sequence analyses can identify only a limited number of modified bases, and with the exception of MSREs, are too labor intensive to screen more than a few kilobases of sequence. To obtain the total base composition of a genome, ∼10 µg of DNA has to be hydrolyzed to completion. Originally, this was achieved by chemical means, but this method produced a complicated array of products that made detection and quantification of specific adducts difficult. These problems can be avoided by enzymatic hydrolysis using calf spleen phosphodiesterase and micrococcal nuclease to produce 3′-phosphorylated mononucleotides, or pancreatic DNase I and snake venom phosphodiesterase to produce 5′-phosphorylated mononucleotides (16,17). Following hydrolysis, 3′- and 5′-phosphates are removed with alkaline phosphatase (18), and the products are fractionated using standard chromatographic and electrophoretic techniques along with external standards of known modified bases (19–23). The occurrence of m5C and m6A has been quantified using high pressure liquid chromatography (HPLC) (24,25). This method has a sensitivity of ≤10 µg DNA (24) and a vertical resolution of 0.04% (24,25) to 0.005% (26). Mass spectrometry also has been applied to quantify the content of m5C in DNA (27), detecting as little as one m5C in 10 kb using 1–10 µg of DNA. However, total genome composition is devoid of sequence information.
Limited sequence information in the form of dinucleotide composition is easily obtained by nearest-neighbor analysis. About 5 µg of DNA is labeled with one of the four [α-32P]dNTP by nick translation at randomly generated single-stranded breaks or by DNA replication in a cell lysate (9,28). The DNA is digested completely to 3′-dNMPs using micrococcal endonuclease and calf spleen phosphodiesterase exonuclease in order to transfer 32P from the 5′-position of the labeled nucleotide to the 3′-position of its neighbor. The resulting [3′-32P]dNMPs are fractionated by chromatography or HPLC and quantified by comparison to internal standards (18). Vertical resolution is ∼0.01% (29).
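The logic of nearest-neighbor analysis can be illustrated in silico. The following is a minimal sketch, not a laboratory protocol: it assumes a known sequence and simply counts which base lies immediately 5′ of each occurrence of the labeled nucleotide, which is the information carried by the transferred 32P. The function name `nearest_neighbors` and the example sequence are hypothetical.

```python
from collections import Counter

def nearest_neighbors(seq, labeled):
    """Frequency of each base found immediately 5' of the labeled nucleotide.

    In the experiment, 32P enters on the 5'-side of the labeled nucleotide and,
    after digestion to 3'-dNMPs, ends up on the 3'-position of its 5' neighbor.
    """
    seq = seq.upper()
    counts = Counter(seq[i - 1] for i in range(1, len(seq)) if seq[i] == labeled)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}

# Hypothetical sequence; labeling with dGTP reports the NpG dinucleotides.
freqs = nearest_neighbors("ACGTCGGACGCG", "G")
print(freqs)
```

In a real experiment, m5C would appear as an additional labeled 3′-dNMP species, which is why a standard of the modified base is needed for comparison.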
Alternatively, genomic DNA can be labeled extensively at m5C-positions in vivo with [3H-CH3]methionine for detecting the nearest neighbors of m5C. Dinucleotides OH-NpN-OH are generated from the labeled DNA by limited digestion with DNase I and 5′-dephosphorylation with alkaline phosphatase. Dinucleotides containing 3H-labeled m5C are isolated by chromatography and paper electrophoresis, and the two isomers (OH-m5CpN-OH and OH-Np-m5C-OH) are distinguished by treatment with snake venom phosphodiesterase which generates OH-m5C-OH and pN-OH from the first isomer, and OH-N-OH and pm5C-OH from the second isomer. With this method, 2.5–5% m5C with a specific dinucleotide composition can be detected in 100 ng of DNA (30).
These two methods are specific for all known covalently modified bases, provided that a sample of the modified base is available as a standard for comparison. Although vertical resolution is excellent, horizontal resolution is limited to dinucleotide frequencies. The amount of DNA required for these methods limits their application to tissues or cell cultures that can provide ∼2 million cells. Since neither method localizes a modified base within a specific DNA sequence, neither method can distinguish genomic DNA from contamination by either RNA or foreign DNA from viruses, mycoplasmas and other endoparasites. This potential artifact can lead to false identification of base modifications in small samples of genomic DNA.
Mapping Methylated Cytosines and Adenines at Specific DNA Sites
There are currently four methods available that can map specific covalent base modifications to specific DNA sites: modification-sensitive restriction endonucleases (MSRE) and differential base modification by either bisulfite (HSO3−), hydrazine (N2H4) or permanganate (MnO4−) followed by DNA sequence analysis. Each method has its own advantages and limitations such that a single method alone is often inadequate to address all problems. However, when used in conjunction with one another, these four methods can provide unambiguous identification of m5C, map its location with nucleotide resolution in any DNA sequence, and quantify the frequency it appears at a specific DNA site. Therefore, we have summarized their characteristics in Table 1, and outlined their protocols in Figures 1–3.
Modification-sensitive restriction endonucleases (MSREs)
MSREs provide the most convenient and rapid method for identifying modified bases at specific restriction endonuclease sites in virtually any genomic region. This method requires only a map of restriction sites in the region of interest, and can be used to survey large regions of DNA. While MSREs provide a broader range of specificity than the differential base modification methods described below, their limitation to specific restriction sites reduces their horizontal resolution. Therefore, the MSRE method is frequently applied first in an effort to ascertain whether or not modified bases are likely to be found at biologically interesting sites such as transcriptionally active genes, replication origins and recombinational hot-spots.
More than 320 restriction endonucleases are sensitive to base modifications that lie within the enzyme's DNA recognition site (31). Most of these will not cut DNA if their cleavage site contains a methylated base, and in general, do not discriminate between m5C, m4C (32), hm5C (33) or glucosylated hm5C (33). However, there are some exceptions. MvaI, BstNI, RsaI, KpnI and BstYI (31,34) do not cut DNA if their recognition sequence contains m4C, but sites containing m5C remain sensitive (31). Similarly, CviSIII is inhibited by hm5C, but not by m5C. Alternatively, some enzymes cut their sites only when they contain m5C or hm5C. For example, PvuRts1I only cuts DNA containing hm5C (35). McrBC cuts DNA at multiple sites within RC(N40–80)RC only when the outer cytosines are m5C, m4C or hm5C (36). Thus, the usefulness of McrBC decreases as the genomic density of m5C increases. Other enzymes either require m6A in order to cut DNA, or are inhibited by this modification (31). It should be noted that only a few restriction endonucleases have been tested for their sensitivity to modified bases that lie outside their recognition sites, and some of these are inhibited (31).
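To make the McrBC requirement concrete, the sketch below scans a sequence for pairs of methylated RC half-sites separated by 40–80 nt. It is only an illustration under stated assumptions: the spacer is counted from the end of the first RC to the start of the second, only one strand is considered, and the function name `mcrbc_site_pairs` and the test sequence are hypothetical.

```python
def mcrbc_site_pairs(seq, methylated):
    """Return (i, j) start positions of RC half-sites that could form an McrBC site.

    seq        -- DNA sequence (one strand, 5'->3')
    methylated -- set of 0-based positions of modified cytosines
    A half-site is R (A or G) followed by a modified C; two half-sites
    qualify when 40-80 nt lie between them (assumed spacer definition).
    """
    half_sites = [
        i for i in range(len(seq) - 1)
        if seq[i] in "AG" and seq[i + 1] == "C" and (i + 1) in methylated
    ]
    return [(a, b) for a in half_sites for b in half_sites if 40 <= b - (a + 2) <= 80]

# Hypothetical sequence: GC, a 50 nt spacer, then AC.
seq = "GC" + "T" * 50 + "AC" + "T" * 10
print(mcrbc_site_pairs(seq, {1, 53}))   # both outer cytosines modified
print(mcrbc_site_pairs(seq, {1}))       # only one modified: no site
```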
Two methods are commonly used to determine the extent of DNA digestion. The most direct method is to fractionate the DNA digestion products by gel electrophoresis (preferably agarose, because DNA may migrate aberrantly in polyacrylamide; 37) and then identify specific genomic sites by Southern blotting-hybridization using standard protocols (38; Fig. 1). In the case of an endonuclease that cannot cut a methylated site, absence of the expected DNA cleavage product indicates methylation at one or both of the endonuclease cleavage sites that mark the ends of the DNA fragment. The fraction of resistant DNA equals the fraction of cells that contain this modified site. Vertical resolution depends on the hybridization background (usually 10–20% of signal). Thus, if ≥10% of the DNA population is not modified at two consecutive restriction endonuclease sites, the corresponding DNA fragment will be detected. If ≥10% of the DNA population is modified at one or both sites, then a larger DNA fragment will appear as a result of cleavage events outside the region of interest (Fig. 1). A single copy locus can be detected in ∼10 µg of mammalian genomic DNA (∼2 million cells) using 32P-labeled oligonucleotide probes (39).
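The relationship between site methylation and fragment size can be sketched as follows. This is a simplified model that assumes digestion goes to completion at every unmodified site and is fully blocked at every modified site; the function name `digest` and the coordinates are hypothetical.

```python
def digest(length, sites, methylated=()):
    """Fragment sizes from a linear DNA of the given length.

    sites      -- positions of the MSRE recognition sites
    methylated -- positions that are modified and therefore not cut
    """
    blocked = set(methylated)
    cuts = sorted(s for s in sites if s not in blocked)
    edges = [0] + cuts + [length]
    return [b - a for a, b in zip(edges, edges[1:])]

# Hypothetical 1000 bp region with three sites.
print(digest(1000, [200, 500, 800]))          # unmodified DNA
print(digest(1000, [200, 500, 800], [500]))   # middle site modified
```

When the middle site is modified, the two 300 bp fragments are replaced by a single 600 bp fragment, which is the larger band expected on the Southern blot.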
The sensitivity of this assay can be increased ∼1000-fold and the vertical resolution ∼100-fold using the DNA polymerase chain reaction (PCR) to amplify specific DNA segments before fractionating them by gel electrophoresis so that the DNA products can be visualized by ethidium bromide staining (40). This approach requires sequence information in order to design PCR primers that flank the restriction site in question and thereby amplify the region of interest (Fig. 1). The PCR product is compared to a DNA fragment produced by amplification of an uncut control DNA. However, quantification (i.e. vertical resolution) now depends on obtaining PCR conditions that produce a linear response between input DNA and amplified product. This problem could be solved by using competitive PCR (41) where the same PCR primers simultaneously amplify both the genomic target and a homologous competitor. Alternatively, ligation-mediated PCR (LM-PCR; 42,43) could be used where the same PCR primers are employed to visualize simultaneously the relative extent of cleavage at both an MSRE site and a modification-insensitive restriction endonuclease site (44,45). This method requires only 0.6 ng of DNA (100 cells) for detection and 50 ng for quantification (45).
When the DNA digestion products are fractionated by 2D gel electrophoresis, the MSRE method can be used to visualize simultaneously and quantitatively the methylation status of a large number of genomic loci (‘restriction landmark genomic scanning of methylation sites’; 46). In fact, an entire genome can be screened for sites where DNA methylation patterns change as cells undergo differentiation, carcinogenesis and genomic imprinting (47,48), and then these sites can be cloned. A similar end-labeling method has been developed specifically for rare DNA modifications (49).
Potential artifacts in the MSRE method
The principal artifact that can affect the MSRE method is inefficient DNA cleavage, resulting in the false conclusion that some or all of the restriction sites contain a covalently modified base. This problem generally results from impure DNA and can be corrected by its repurification. Careful extraction of genomic DNA in the presence of proteinase K and SDS followed by phenol extraction and extensive dialysis (no ethanol precipitation) usually results in complete DNA cleavage (39). If not, addition of 1 mM spermidine often alleviates the problem (39). Accessibility of DNA to cleavage can be checked using a modification-insensitive restriction endonuclease, ideally an isoschizomer of the modification-sensitive enzyme under identical reaction conditions. However, two enzymes can be affected differently by impurities in the DNA sample. For example, an unidentified, dialyzable inhibitor of the MSRE AluI can prevent this enzyme from completely cutting genomic DNA under conditions where the methylation-insensitive restriction enzyme control, MboII, can digest DNA to completion (39). Therefore, one should include in the same reaction mixture another DNA fragment, best radioactively labeled, with the same restriction site, but whose size or sequence allows it to be distinguished from the genomic target. In this way, the extent of cleavage by the diagnostic enzyme can be monitored in the same reaction simply by monitoring cleavage of the control fragment (39). Unfortunately, even this control is not foolproof. There have been reports of proteins that bind so tenaciously to DNA that they could not be removed during DNA purification, and the ubiquitous topoisomerases form covalent intermediates with DNA that can be trapped during DNA purification. Therefore, if the modified base is thought to be m5C, a prudent investigator will confirm its presence by one of the three methods described below.
Differential base modification by bisulfite
The bisulfite method has four advantages over the other two differential base modification methods (described below) for detection and mapping of m5C in any sequence of a complex genome (Table 1). First, the bisulfite method is the easiest to apply. Second, it is the most sensitive, requiring no more than 10 ng of genomic DNA (∼2000 cells). In one case, 10 pg of genomic DNA (∼2 cells) was sufficient (50), although the number of integrated target copies per cell in this transfection experiment was unknown. Third, both C and m5C residues appear during DNA sequencing (positive display of data). This makes it easy to quantify the fraction of m5C, which can be detected in as little as 5% of the total PCR product (51). In principle, vertical resolution is limited only by the number of cloned PCR products one is willing to sequence. Fourth, the methylation status of a specific genomic site can be determined both for the total cell population and for individual cells, depending on whether the total PCR DNA product is sequenced directly or individual cloned molecules from the PCR product are sequenced. This allows detection of subtle variations in cytosine methylation at specific genomic sites that may occur within a population of cells as they undergo differentiation and development (52), thus permitting detection of rare events, and analysis of partial methylation patterns. For example, if two cytosines at nearby positions are found to be methylated to an overall extent of 50% by sequencing the total PCR product, the following question arises: do both m5Cs sit on the same molecule (e.g. when the two m5Cs are only on one homologous chromosome of all cells, or on both homologous chromosomes in only 50% of the cells), or are they distributed on different molecules (e.g. when the two m5Cs are distributed on the two homologous chromosomes in all cells, or when one m5C is found on both homologous chromosomes in one half of the cells and on the other in the other half of the cells)? This question can be approached by the bisulfite technique, after cloning the PCR product. This information can be critical in determining whether or not m5C plays a significant role in regulating the activity of a specific promoter, replication origin, transposable element or imprinting element.
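The distinction between these two situations can be made explicit with a small calculation on cloned sequences. The sketch below uses invented clone data, with each clone recorded as a pair of 0/1 methylation calls at the two cytosines; the function name `summarize` is hypothetical.

```python
def summarize(clones):
    """Return (fraction methylated at site A, at site B, at both on one molecule)."""
    n = len(clones)
    site_a = sum(a for a, _ in clones) / n
    site_b = sum(b for _, b in clones) / n
    both = sum(1 for a, b in clones if a and b) / n
    return site_a, site_b, both

# Hypothetical data: ten sequenced clones in each scenario.
same_molecule = [(1, 1)] * 5 + [(0, 0)] * 5        # both m5Cs on the same molecules
different_molecules = [(1, 0)] * 5 + [(0, 1)] * 5  # m5Cs on different molecules

print(summarize(same_molecule))
print(summarize(different_molecules))
```

Both populations give 50% methylation at each site when the total PCR product is sequenced, but only the clone-level data reveal whether the two m5Cs occur together.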
The bisulfite method is considered specific for identification of m5C. Only cytosines that are present in single-stranded DNA or in ‘distorted double-stranded DNA’ regions can be converted to uracil by bisulfite; properly paired cytosines are not affected (53). Bisulfite sulfonation at C-6 of a susceptible cytosine facilitates spontaneous hydrolysis of the amino group at C-4 to produce sulfonated uracil (53). Uracil is recovered by removing unreacted bisulfite and then desulfonating under alkaline conditions. m5C is not converted. The reactivity of glucosylated hm5C and m4C with bisulfite remains to be determined. hm5C probably cannot be detected by the bisulfite method, because bisulfite converts hm5C into a stable product (cytosine 5-methylenesulfonate; 54) that, by analogy to the inhibitory effect of pyrimidine adducts with permanganate or osmium tetroxide on DNA polymerase elongation (e.g. 55), should inhibit PCR amplification.
In the standard method (Fig. 2A), genomic DNA is denatured and treated with bisulfite (56). The region of interest is then amplified by PCR to produce specific double-stranded DNA fragments. Since the DNA strands are no longer complementary after bisulfite conversion, each strand must be analyzed separately using appropriate PCR primers. PCR amplification results in conversion of U (previously C) to T, and of m5C to C. The former PCR amplified product will thus contain T:A base pairs in place of C:G base pairs. The PCR product can be isolated by gel electrophoresis and the conversions detected by standard DNA sequencing protocols (51; Fig. 2). Unreacted C (presumed to be m5C) is seen as a positive band in the C lane, while reacted C appears as a band in the T lane. These data represent the average methylated state of particular cytosines within the DNA population (Fig. 2B). Alternatively, the PCR products can be cloned into a plasmid vector and individual clones sequenced (56).
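The sequence changes described above can be reproduced in silico. The following minimal sketch assumes complete conversion of every unmethylated C and no loss of m5C, which real reactions only approximate; the function names and the 8 nt example are hypothetical.

```python
def bisulfite_read(strand, methylated=()):
    """Sequence read after bisulfite treatment and PCR: C reads as T unless methylated."""
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(strand)
    )

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

top = "ACGTCCGA"                 # hypothetical target, 5'->3'
bottom = revcomp(top)            # complementary strand, 5'->3'

converted_top = bisulfite_read(top, methylated={1})   # m5C at position 1
converted_bottom = bisulfite_read(bottom)             # no m5C on this strand

print(converted_top)             # unconverted C marks the m5C
print(revcomp(converted_bottom) == converted_top)     # strands no longer pair
```

Because the two converted strands are no longer complementary, a separate primer pair is needed for each, as noted in the text.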
Innovations and variations in the bisulfite method
A number of technical innovations can improve the speed and versatility of the bisulfite method. Biotinylated primers can be used for PCR amplification so that PCR products can be purified using streptavidin coated magnetic beads and DNA sequencing can be automated (57). This procedure has been combined with the specialized sequencing analysis called GENESCAN™ to quantify the extent of methylation at each cytosine position within a sequence (58). In addition, the methylation status can be rapidly screened at new restriction endonuclease sites that are created by conversion of cytosines to thymines (59).
Methylation-specific PCR can be used to determine rapidly the methylation status of CpG islands with a vertical resolution of 0.1% (60). The high density of CpG dinucleotides in these islands allows PCR primers to be designed that specifically amplify either m5CpG DNA or CpG DNA by making them complementary either to a sequence containing several C residues that are presumed to remain unconverted, or to a sequence in which several cytosines are presumed to have been converted to uracils (60). Primer specificity is verified by cutting the PCR amplification product with restriction endonucleases, because the DNA cleavage products will differ for DNA products amplified from methylated versus unmethylated DNA. However, since PCR products are not sequenced, analysis is not at the nucleotide level.
A similar strategy has been used to rapidly quantify methylation at specific sites in any DNA sample. ‘Combined bisulfite restriction analysis’ (COBRA; 61) uses restriction enzymes to cut the PCR product from bisulfite converted DNA. For example, bisulfite will convert the BstUI cleavage site (CGCG) in unmethylated DNA to TGTG, but not in methylated DNA. Therefore, the percentage of PCR amplified molecules that are cut by BstUI reflects the percentage of methylated molecules. COBRA is fast, sensitive and quantitative, but quantification relies on complete enzyme digestion and analysis is confined to restriction endonuclease cleavage sites in DNA.
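The COBRA readout can be sketched with the BstUI example from the text. This is an idealized model that assumes complete bisulfite conversion and complete enzyme digestion; the amplicon sequence, the 30:70 mixture and the function names are invented for illustration.

```python
SITE = "CGCG"  # BstUI recognition sequence, as given in the text

def bisulfite_read(seq, methylated):
    """Sequence after bisulfite treatment and PCR: C reads as T unless methylated."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def percent_cut(reads):
    """Percentage of amplified molecules that still carry the BstUI site."""
    return 100.0 * sum(SITE in r for r in reads) / len(reads)

amplicon = "TTAGCGCGATT"   # hypothetical target; cytosines at positions 4 and 6
methylated_read = bisulfite_read(amplicon, {4, 6})
unmethylated_read = bisulfite_read(amplicon, set())

# Hypothetical population: 30 methylated and 70 unmethylated molecules.
population = [methylated_read] * 30 + [unmethylated_read] * 70
print(percent_cut(population))
```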
An alternative approach is ‘methylation-sensitive single nucleotide primer extension’ (62). Genomic DNA is treated with bisulfite and then amplified by PCR. Methylation at a specific cytosine is analyzed by extension of an internal primer, located immediately 5′ to that cytosine, with a single, radioactive nucleotide. The ratio of the extension product using dCTP (reflecting unconverted m5C) to the extension product using dTTP (reflecting converted C) measures the extent of methylation at a specific site. This method does not rely on restriction enzymes and, in principle, can be applied to any sequence. However, this method will be difficult to apply to regions rich in CpG dinucleotides, because the primers will invariably contain CpGs of unknown or mixed methylation status.
Potential artifacts in the bisulfite method
There are four potential artifacts that can affect the reliability of the bisulfite method. The following information will help to avoid them.
Conversion of C to U can be incomplete. Since conversion of C to U requires that the DNA substrate be single-stranded, incomplete denaturation of genomic DNA or its partial renaturation during bisulfite treatment can result in C residues that fail to react with bisulfite (51). Unfortunately, the high salt molarity in the reaction favors renaturation. In addition, bisulfite treatment must be exhaustive in order to ensure complete conversion of C to U in single-stranded DNA. Incomplete conversion would appear as a partially methylated site. In practice, complete bisulfite conversion is not feasible—even if the DNA is kept single-stranded—because the DNA substrate undergoes degradation with time (see below). Note that in studies of secondary structure of nucleic acids, usually only a fraction of the C residues in a single-stranded region are converted to U due to incomplete deamination by bisulfite (63). Incomplete conversion is not a problem when the total PCR product is sequenced, because the occasionally unconverted Cs will not significantly affect detection of the average extent of methylation at a specific site. It can become a major problem, however, if unconverted DNA is selected for amplification by a poor choice of PCR primers. Primers that anneal to sequences containing several C or m5C residues are selecting for regions of unconverted DNA. It can also be a problem when individual clones are sequenced, because an insignificant event may appear significant if it shows up in a relatively small number of molecules. Some plasmid vectors appear prone to this artifact (64). In fact, in three examples where all C residues were thought to be methylated on both strands, regardless of their dinucleotide composition (65,66), subsequent analyses by four independent methods demonstrated that all of the non-CpG methylation events in these regions represented cytosines that had failed to react with bisulfite (39,51).
Incomplete conversion can be dealt with in several ways (51). The extent of conversion of C to U can be checked by amplifying the target DNA with PCR primers designed to hybridize to a region devoid of C. These primers will not select for either converted or unconverted DNA. Alternatively, one can selectively amplify DNA that has reacted efficiently with bisulfite by designing PCR primers to anneal to sequences containing U in place of C, and by choosing sequences that originally contained several Cs. The presence of non-methylated cytosines interspersed with methylated cytosines indicates that this region of DNA was accessible to bisulfite during the reaction. The efficiency of the bisulfite reaction can be checked using DNA whose methylation status is known (e.g. plasmids or PCR generated fragments). Whenever possible, this control should be run in the same reaction tube, and the copy number of the control DNA should be comparable to that of test DNA sequences; otherwise its rate of renaturation could be artificially high. Finally, other methods that specifically rely upon partial modification of DNA, such as the hydrazine or permanganate methods discussed below can be independently applied to analyze the same region of DNA in order to confirm the presence of m5C.
To eliminate the problem of incomplete conversion, a protocol has been developed that ensures ≥95% conversion of C to U with limited DNA degradation (51) by cleaving genomic DNA into ∼1 kb fragments in order to reduce the likelihood of renaturation, and treating the DNA in bisulfite with repeated cycles of heating to 95°C in a thermocycler for only 5 h. Several changes to the original protocol have been reported (56) to reduce the chances of renaturation. For example, ethanol precipitation has been omitted after initial denaturation with NaOH (64,67). The DNA sample also has been diluted (67), embedded in agarose (68), incubated at 0°C instead of at elevated temperature (67) during the bisulfite conversion, and the bisulfite concentration has been increased (69).
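The extent of conversion in an unmethylated control DNA can be estimated directly from its sequence after bisulfite treatment. The sketch below is a minimal illustration with invented sequences; the function name `conversion_efficiency` is hypothetical, and the 95% threshold is taken from the figure quoted above.

```python
def conversion_efficiency(reference, read):
    """Fraction of reference cytosines that read as T after bisulfite treatment.

    reference -- known sequence of an unmethylated control DNA
    read      -- aligned sequence of the same strand after bisulfite and PCR
    """
    c_positions = [i for i, base in enumerate(reference) if base == "C"]
    converted = sum(1 for i in c_positions if read[i] == "T")
    return converted / len(c_positions)

reference = "ACCGTCAC"   # hypothetical unmethylated control
read = "ATTGTCAT"        # hypothetical read: one of four cytosines unconverted

efficiency = conversion_efficiency(reference, read)
print(efficiency, efficiency >= 0.95)
```

An unconverted cytosine in such a control cannot be m5C, so any shortfall from complete conversion estimates the rate at which unconverted C would be misread as m5C in the test DNA.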
Conversion of m5C to T. From 2 to 3% of m5Cs can be converted to T (56,70). This results in an underestimate of the number of m5Cs when individual DNA clones are analyzed, but assuming that this loss occurs randomly, it should not be significant when the total PCR product is sequenced directly.
Loss of DNA due to fragmentation. Incubation of DNA at a slightly acidic pH can generate apurinic sites (71). Since the bisulfite treatment must be performed for a long time at pH 5, apurinic sites will be introduced into the DNA substrate. These apurinic sites will then be broken in a classical Maxam and Gilbert reaction during desulfonation of deaminated cytosines at the required basic pH (71). Protocols have been developed that minimize this problem (51,67,68,72).
Biased PCR amplification. Quantification of methylated alleles can be jeopardized by biased PCR amplification (73). Surprisingly, the same primer pair can preferentially amplify either the methylated or unmethylated sequence, even though the sequence to which the primers anneal and the lengths of the PCR products are identical. The bias is less serious with the Stoffel fragment polymerase than with Taq polymerase, but the most reliable solution is to establish standard curves for each primer pair by measuring the ratio of products from known mixtures of methylated and unmethylated DNA standards in control reactions.
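A standard curve of this kind can be applied by simple interpolation. The sketch below assumes that the observed signal increases monotonically with the true methylated fraction and interpolates linearly between neighboring standards; the calibration values are invented and the function name is hypothetical.

```python
def correct_with_standard_curve(observed, standards):
    """Estimate the true methylated fraction from an observed fraction.

    standards -- list of (true_fraction, observed_fraction) pairs measured
                 from known mixtures of methylated and unmethylated DNA
    """
    points = sorted(standards, key=lambda p: p[1])
    for (t0, o0), (t1, o1) in zip(points, points[1:]):
        if o0 <= observed <= o1:
            return t0 + (t1 - t0) * (observed - o0) / (o1 - o0)
    raise ValueError("observed value lies outside the standard curve")

# Hypothetical calibration for one primer pair that favors unmethylated DNA.
standards = [(0.0, 0.0), (0.25, 0.10), (0.5, 0.30), (0.75, 0.60), (1.0, 1.0)]

print(correct_with_standard_curve(0.30, standards))
print(correct_with_standard_curve(0.45, standards))
```

With this invented curve, an apparent methylation of 30% corresponds to a true value of 50%, which illustrates how large the uncorrected error can be.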
Differential base modification by hydrazine
With genomes of low sequence complexity such as those of viruses, the hydrazine and permanganate methods are easier to apply than the bisulfite method, because they employ simple primer extension or end-labeling to sequence the differentially modified DNA directly. In contrast, the hydrazine and permanganate methods are more difficult to apply to genomes of high sequence complexity such as those of mammals, because they require sequencing the differentially modified DNA using LM-PCR (74,75). This generally leads to lower sensitivity and higher background with its attendant lower vertical resolution (Table 1). Moreover, it often proves difficult to identify primers that are effective in LM-PCR. However, once LM-PCR is established at a sequence of interest, the same primers can also be used in methods designed to explore protein/DNA interactions such as in vivo footprinting. In addition, the hydrazine and permanganate methods provide independent confirmation of results in cases where artifacts are suspected using the MSRE or bisulfite methods described above.
Hydrazine reacts specifically with C and T in either double-stranded or single-stranded DNA. Hydrazine also reacts with m4C (34) but not with m5C (76,77). Presumably hydrazine also does not react with hm5C or glucosylated hm5C, but it is not clear that this has been examined. Hydrazine-modified nucleotides are usually cleaved by piperidine (71), and the cleavage products are detected by a variety of direct genomic sequencing techniques (discussed below). An unmethylated sequence, obtained either by DNA cloning or by PCR amplification of the genomic region of interest, is analyzed in parallel in order to locate the positions of all cytosines. m5C is identified by the absence of a cytosine band from the genomic DNA sequence (Fig. 3). Sequencing the complementary DNA strand ensures that the absence of a band results from m5C rather than from some unidentified artifact. Hydrazine reaction with T is normally suppressed by addition of 1.5 M NaCl to facilitate identification of C (76). However, in regions where all of the cytosines appear to be methylated, T reactivity at low salt concentrations provides a useful internal standard for monitoring the efficiency of the hydrazine reaction (39).
Differential base modification by permanganate
The same direct DNA sequencing methods used to display hydrazine/piperidine cleavage sites are also used to display permanganate/piperidine cleavage sites (Fig. 3A). However, while hydrazine produces a negative data display [bands disappear from the sequencing gel at the positions of m5C (Fig. 3A and B, lane 2)], permanganate produces a positive data display [bands appear in the sequencing gel at the positions of m5C (Fig. 3A and B, lane 4)]. Therefore, the permanganate method provides the perfect complement to the hydrazine method for detection of m5C. At pH 4.1, permanganate reacts strongly with m5C and T in single-stranded DNA (78), but not with C (39,79) (Fig. 3B, lane 4). Permanganate occasionally reacts with purines (G>>A) as a result of either direct oxidation or acidic depurination of DNA creating piperidine-labile sites (79). Permanganate, like bisulfite, reacts only with single-stranded DNA. However, this is not a problem, because oxidation of T by permanganate can serve as an internal control to ensure that the sequences displayed were single-stranded.
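The complementary logic of the two displays can be sketched in a few lines of Python. This is an illustrative simplification, not a published analysis procedure: the function name `call_m5c`, the representation of gel lanes as sets of 0-based band positions, and the 'ambiguous' category are assumptions introduced here. Bands at T or purine positions are simply ignored.

```python
def call_m5c(reference, hydrazine_bands, permanganate_bands):
    """Classify each cytosine of `reference` from two cleavage patterns.

    reference: unmethylated sequence of the region (A/C/G/T string),
        used to locate all cytosines.
    hydrazine_bands: set of 0-based positions showing a band after
        hydrazine/piperidine cleavage (negative display for m5C).
    permanganate_bands: set of 0-based positions showing a band after
        permanganate/piperidine cleavage (positive display for m5C).
    Returns {position: call} for every C in the reference.
    """
    calls = {}
    for pos, base in enumerate(reference.upper()):
        if base != "C":
            continue  # bands at T, G or A are not used for the call
        hz = pos in hydrazine_bands        # band present: C reacted
        mn = pos in permanganate_bands     # band present: m5C reacted
        if mn and not hz:
            calls[pos] = "m5C"             # both displays agree on m5C
        elif hz and not mn:
            calls[pos] = "C"               # both displays agree on C
        else:
            calls[pos] = "ambiguous"       # displays disagree; needs follow-up
    return calls
```

For the invented example `call_m5c("ACGTCGAC", {4, 7}, {1, 3})`, the cytosine at position 1 is called m5C, while those at positions 4 and 7 are called C. A site scored as ambiguous could reflect a mixed population of methylated and unmethylated molecules or a technical artifact, and the sketch deliberately does not try to distinguish these.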
LM-PCR analysis of genomic DNA cleaved by permanganate/piperidine resulted in nearly equivalent signals from m5C, T and G (Fig. 3B). This unexpected G reactivity relative to an end-labeled DNA fragment that was not subjected to LM-PCR (39) may have resulted from enhanced depurination caused by the initial heating at 95°C that was necessary in LM-PCR. Nevertheless, permanganate did not react with any C residues in this region, but did react with all T and m5C residues (Fig. 3B).
Direct genomic sequencing employs either linear or exponential amplification to visualize both hydrazine- and permanganate-reacted bases. In general, linear amplification requires large amounts of DNA (∼200 µg; 80–82). LM-PCR, which employs exponential amplification (42,43), allows detection of m5C in 1–2 µg of genomic DNA (∼3 × 10⁵ cells). When both C and m5C are present at the same site but on different DNA molecules, at least 25% of that site must be m5C to be detected by the hydrazine method. However, as a result of a positive data display, only 10% must be m5C for detection by the permanganate method (39). Both permanganate and hydrazine modifications of genomic DNA could be detected by genomic sequencing techniques that do not require piperidine cleavage of DNA (55,81–83), because both modifications inhibit elongation by DNA polymerases.
No serious artifacts have been reported for the hydrazine and permanganate methods, although minor problems can lead to ambiguity. These include background cleavage events, closely spaced bands on the sequencing gel, and DNA concentration-dependent suppression of C-bands that have been reported for the hydrazine method (84). However, they can be eliminated by including sequencing controls to verify the positions of all cytosines.
Methods with Special Applications
Antibodies specific for methylated bases could be used as an independent method for confirming that resistance to cleavage by an MSRE is due to DNA methylation and not to artifactual resistance. For example, antibodies raised against m5C have been shown to identify that portion of the DNA that was resistant to HpaII, an MSRE that only cuts unmethylated CCGG sites (85). These antibodies can detect m5C with a vertical resolution of ≤0.008 mol% (85). They can react specifically with m5C in mammalian DNA bound to nitrocellulose paper (15,86), detecting ∼1 mol% m5C in the human genome (15), consistent with estimates from total genome and nearest neighbor analyses. Similar experiments could be carried out with antibodies against m6A and m7G that have been reported to react with DNA from human, Drosophila and mealybugs (7). Immunofluorescence also has been used to determine chromosomal regions with a high frequency of m5C (87). Therefore, it might be possible to use antibodies to map the locations of m5CpG clusters in cellular chromosomes relative to the locations of specific chromatin-associated proteins, genes (88), replication origins, centromeres and telomeres using a second antibody or fluorescence in situ hybridization.
Differential base modification by UV radiation
UV radiation causes formation of pyrimidine (6-4) pyrimidone photoadducts between two adjacent pyrimidine bases (pyrimidine dimers) that are sensitive to cleavage with piperidine. In mammalian genomes, these cleavage sites have been mapped using LM-PCR to sequence the DNA (89). TpC and CpC dinucleotides produce a high frequency of pyrimidine dimers, but Tpm5C and Cpm5C dinucleotides do not (89). Therefore, wherever a m5CpG is preceded by a T or C, a pyrimidine dimer is not formed, leading to the absence of a band in the sequencing gel that indicates the presence of Tm5CG or Cm5CG, respectively. Thus, this method detects only a subset of the m5C sites in the genome. Sensitivity and vertical resolution have not been explored in detail, but since LM-PCR is used, they probably are the same as with the hydrazine method (Table 1). Although differential base modification by UV radiation has been used rarely so far and requires the more sophisticated LM-PCR technology, it should be quite useful in determining the effects of DNA methylation on UV-induced DNA damage and repair.
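The sequence-context restriction of the UV method can be made concrete with a short Python sketch. It only enumerates, on a single strand, the CpG cytosines that are preceded by T or C and are therefore open to analysis; the function names `all_cpg_sites` and `uv_detectable_cpg_sites` and the example sequence are hypothetical.

```python
def all_cpg_sites(sequence):
    """0-based positions of every cytosine that is followed by G."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def uv_detectable_cpg_sites(sequence):
    """0-based positions of CpG cytosines preceded by T or C.

    Only these cytosines form the 3' half of a TpC or CpC dinucleotide,
    so only here can the loss of a UV-induced band report Tm5CG or Cm5CG.
    """
    seq = sequence.upper()
    return [i for i in all_cpg_sites(seq) if i > 0 and seq[i - 1] in ("T", "C")]
```

For the invented sequence `ACGTCGCCGGCG`, CpG cytosines occur at positions 1, 4, 7 and 10, but only those at positions 4 (TCG) and 7 (CCG) are accessible to the UV method on this strand. The CpG cytosines at positions 1 and 10 follow a purine and would be missed.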
How Methodological Limitations Can Affect Interpretation of Data
Since each method has its own limitations and potential artifacts, the results from different methods have not always led to the same conclusion. This problem is illustrated by analysis of the m5C distribution in mammalian genomes. The only known mammalian DNA methyltransferase is specific for maintaining m5CpG dinucleotides (90). Consistent with the properties of this enzyme, only two m5C residues have ever been reported in non-CpG dinucleotides in genomic DNA using either the hydrazine or permanganate methods (91). This amount is negligible compared to the thousands of nucleotides of genomic DNA to which the hydrazine method has been applied and in which it has so far detected only m5CpG dinucleotides (84). However, some nearest neighbor analyses of genomic DNA suggest that up to 54.5% of all m5C are found at non-CpG sites (18,92,93). This conclusion is supported by some genomic sequence analyses in which the bisulfite or MSRE methods have detected m5C in both CpG and non-CpG dinucleotides (50,65,66,94,95). Moreover, mammalian cells have the ability to maintain m5CpNpG sites integrated into the genome by transfection (50).
How might this paradox be resolved? First, non-CpG methylation activity has been detected in some mammalian extracts (30,92), but not in others (96–98), suggesting that the specificity of mammalian DNA methyltransferase may be altered by cofactors or limited proteolysis (99). In addition, some analyses may have overestimated the amounts of non-CpG methylation events. For example, reports of ‘densely methylated islands’ (DMIs) in which all cytosines within ∼100 to ∼500 bp regions were methylated, regardless of their dinucleotide composition (65,66), were later shown by stringent application of the bisulfite, MSRE, hydrazine and permanganate methods to be incorrect (39,51). Thus, other reports in which the bisulfite method produced similar results (100,101) should be considered with caution.
Another explanation is that the bulk of non-CpG methylation detected by total genome analysis is clustered at genomic sites that have not yet been examined by sequence analyses. Alternatively, cell populations might be heterogeneous in their non-CpG methylation. If methylation at a unique non-CpG dinucleotide did not occur in more than 25% of the cells, it would not be detected by the hydrazine method, but would be detected by the bisulfite method because of the difference in their vertical resolutions (Table 1). In contrast, nearest-neighbor analysis collects all non-CpG dinucleotides of the same type into a single pool. Thus, nearest-neighbor analysis could observe significant non-CpG methylation events under conditions where methods with low vertical resolution (e.g. hydrazine) could not.
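The heterogeneity argument can be illustrated numerically. The Python sketch below uses invented site counts and methylation fractions, and its function names are hypothetical; the only values taken from the text are the detection limits of 25% (hydrazine) and 10% (permanganate). It assumes that a pooled analysis simply sums the methylated fraction over all sites of a given type.

```python
from math import fsum

HYDRAZINE_MIN_FRACTION = 0.25     # per-site limit stated for the hydrazine method
PERMANGANATE_MIN_FRACTION = 0.10  # per-site limit stated for the permanganate method


def detectable_sites(site_fractions, min_fraction):
    """Indices of sites whose methylated fraction reaches the detection limit."""
    return [i for i, f in enumerate(site_fractions) if f >= min_fraction]


def expected_m5c(site_fractions):
    """Average number of m5C residues per genome copy, summed over all sites."""
    return fsum(site_fractions)


def non_cpg_share(cpg_fractions, non_cpg_fractions):
    """Share of all m5C that a pooled analysis would assign to non-CpG sites."""
    non_cpg = expected_m5c(non_cpg_fractions)
    return non_cpg / (non_cpg + expected_m5c(cpg_fractions))


# Invented scenario: 50 fully methylated CpG sites and 1000 non-CpG sites,
# each of which is methylated in only 5% of the cells.
cpg = [1.0] * 50
non_cpg = [0.05] * 1000
```

In this scenario no individual non-CpG site reaches either per-site limit, so sequencing by hydrazine or permanganate would report none of them, yet the pooled calculation attributes about half of all m5C to non-CpG sites. The numbers are chosen only to show that the two kinds of measurement need not contradict each other.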
In conclusion, current methodology can map the distribution of m5C in any genome at nucleotide resolution, although a prudent investigator will apply more than one method to the same problem. New methods for mapping other covalent base modifications in higher eukaryotes will be necessary only when such modifications are discovered in eukaryotes. In prokaryotes, where base modifications other than m5C are important, the MSRE method can map most, if not all, of them, because they all appear to be involved in restriction and modification.
T.R. was a postdoctoral fellow of the Deutsche Forschungsgemeinschaft. This work was supported by research grant Zo 59-2/2 of the Deutsche Forschungsgemeinschaft awarded to H.Z. We thank Prof. E.-L.Winnacker for his support, and D.Natale, B.Howard, N.Martin and the reviewers for their helpful advice and information.