The biological significance of 5-methylcytosine was in doubt for many years, but is no longer. Through targeted mutagenesis in mice it has been learnt that every protein shown by biochemical tests to be involved in the establishment, maintenance or interpretation of genomic methylation patterns is encoded by an essential gene. A human genetic disorder (ICF syndrome) has recently been shown to be caused by mutations in the DNA methyltransferase 3B (DNMT3B) gene. A second human disorder (Rett syndrome) has been found to result from mutations in the MECP2 gene, which encodes a protein that binds to methylated DNA. Global genome demethylation caused by targeted mutations in the DNA methyltransferase-1 (Dnmt1) gene has shown that cytosine methylation plays essential roles in X-inactivation, genomic imprinting and genome stabilization. The majority of genomic 5-methylcytosine is now known to enforce the transcriptional silence of the enormous burden of transposons and retroviruses that have accumulated in the mammalian genome. It has also become clear that programmed changes in methylation patterns are less important in the regulation of mammalian development than was previously believed. Although a number of outstanding questions have yet to be answered (one of these questions involves the nature of the cues that designate sites for methylation at particular stages of gametogenesis and early development), studies of DNA methyltransferases are likely to provide further insights into the biological functions of genomic methylation patterns.
Received 22 June 2000; Accepted 13 July 2000.
FORM AND FUNCTION OF GENOMIC METHYLATION PATTERNS
The mammalian genome contains ∼3 × 10⁷ residues of 5-methylcytosine (m5C), mostly within 5′-m5CG-3′ dinucleotides. Cytosine methylation raises the coding capacity of the genome and could, in principle, play any number of roles, although the real functions have been elusive and the subject of warm controversy over the past 25 years. It is still popularly believed that reversible methylation and demethylation regulate the normal development of mammals, although no single tissue-specific gene has been proven to be regulated in this way. The promoters of tissue-specific genes that are contained within CpG islands are largely unmethylated in both expressing and non-expressing tissues under normal conditions (except in the case of certain imprinted genes and genes on the inactive X chromosomes in females). The promoters of genes that lack CpG islands (and are therefore less sensitive to the inhibitory effects of methylation) are not heavily methylated in non-expressing tissues (1). The mere binding of certain transcription factors, or even the Escherichia coli lac repressor in cells transfected with the lac operator, can drive the loss of methylation from flanking CpG dinucleotides in dividing cells (2,3). The demethylation that is sometimes observed to accompany the onset of transcription of such genes is therefore more likely to represent a consequence, and not a cause, of gene activation (1,2). Several additional lines of evidence conflict with the methylation-development hypothesis; these have been discussed elsewhere (1). It has become increasingly difficult to maintain that reversible cytosine methylation has a major role in the regulation of mammalian development, which instead depends on well-conserved regulatory networks that operate even in organisms that do not modify their DNA.
Cytosine methylation is certainly involved in irreversible promoter silencing of many imprinted genes and genes subject to X-inactivation and of the promoters of transposons and endogenous retroviruses. The potential for expression of imprinted genes is irreversibly set in the germline of the preceding generation and that of X-linked genes in early post-implantation embryos. Irreversible promoter silencing appears to be restricted to organisms whose genomes contain modified bases, although a subset of imprinted genes have been reported to show methylation-independent imprinted expression (4,5). It may be useful to think of cytosine methylation as an additional regulatory signal that endows organisms that have modified bases with new abilities, rather than as an ancillary of conserved regulatory networks that operate in all metazoa, including those (such as Caenorhabditis elegans and Drosophila) which lack DNA modification.
Outcrossing sexual reproduction favors the evolution of aggressive and harmful transposons, which in turn selects for the evolution of host functions that repress those transposons (6). Imprinted genes and genes subject to X-inactivation (in females) account for <10% of the m5C in the genome; the large majority is actually in transposons (7), which are abundant [>10⁶ elements, and >40% of the genome (8)] and relatively rich in CpG dinucleotides. Most cellular genes contain multiple transposons within introns, where their transcriptional activation would be expected to interfere with regulated expression of the host gene (7). Transposons also destabilize the genome by insertional mutagenesis and by favoring rearrangements via recombination between non-allelic repeats (7). A large and expanding body of evidence confirms that the role of the majority of the m5C in the mammalian genome is host defense against transposons. This provides short-term protection through the strong repressive effects of cytosine methylation, and permanent inactivation through the accumulation of C→T mutations that occur at high frequency at methylated sites (7). Transposons are heavily methylated in all cell types (8); this includes germ cells of both sexes (with the exception of the primordial germ cell, which is short-lived and in which alternative repressive mechanisms operate). A direct test of the host-defense hypothesis showed that demethylation in DNA methyltransferase-deficient mouse embryos caused fulminating transcription of a class of retroposon that is still capable of transposition in mice (9). Mechanisms that the cell may employ to identify and silence transposons have been described (10).
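The permanent inactivation described above leaves a measurable trace: as methylated CpG sites undergo C→T transitions, the element loses CpG dinucleotides relative to what its base composition would predict. The following is a minimal sketch of that bookkeeping, not a method taken from the studies cited here. It assumes the commonly used observed/expected CpG ratio, (number of CG × length) / (number of C × number of G). The function names and the toy sequence are hypothetical, and the conversion is applied to one strand only and to every CpG at once, which is a deliberate oversimplification.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since the
    expected count is then zero and the ratio is undefined.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cg = s.count("CG")  # CG cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cg * n / (c * g)


def deaminate_methylated_cpgs(seq: str) -> str:
    """Model the end point of C->T transitions at every CpG (CG -> TG).

    Assumption: all CpGs are methylated and all of them mutate; only the
    strand shown is converted.
    """
    return seq.upper().replace("CG", "TG")


# Hypothetical toy element, not a real transposon sequence.
young = "ACGTCGACGGTCGATCGA"
old = deaminate_methylated_cpgs(young)

print("young element:", young, cpg_obs_exp(young))
print("aged element: ", old, cpg_obs_exp(old))
```

In real genomes the decay is gradual and incomplete, so an aged element would be expected to show a ratio well below 1 rather than exactly zero.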
Disruption of global methylation patterns is lethal to mammals (11). Even focal demethylation or hypermethylation at imprinted loci can cause developmental abnormalities (12), and demethylation of classical satellite DNA in ICF (immunodeficiency, centromere instability and facial anomalies) syndrome can cause chromosome instability and fatal immunodeficiency (13). Ectopic de novo methylation of tumor suppressor genes may also contribute to oncogenesis (12). However, rather little is known of the cues that trigger de novo methylation or of the identity of the factors that respond to these cues. Much of what has been learned of the biological roles of genomic methylation patterns has come from studies of the DNA (cytosine-5)-methyltransferases themselves and the phenotypes that result from mutations in DNA methyltransferase genes.
THE UNUSUAL CATALYTIC MECHANISM OF DNA (CYTOSINE-5)-METHYLTRANSFERASES
The 5 position of cytosine is relatively unreactive, and its methylation in neutral aqueous solution has been called a ‘chemically improbable’ reaction (14). The catalytic mechanism of DNA (cytosine-5)-methyltransferases is correspondingly unusual (Fig. 1a). Santi et al. (15) proposed that the DNA cytosine-methyltransferases might use a mechanism similar to that of thymidylate synthetase, in which an enzyme cysteine thiolate adds covalently to the 6 position, thereby pushing electrons to the 5 position to make the carbanion, which could then attack the methyl group of N⁵,N¹⁰-methylenetetrahydrofolate. After methyl transfer, abstraction of a proton from the 5 position could allow reformation of the 5–6 double bond and release of enzyme by β-elimination (Fig. 1a). A similar mechanism could be used by DNA methyltransferases, except that the substrate is cytosine in DNA (rather than free dUMP) and the methyl donor is S-adenosyl-L-methionine (AdoMet). Erlanson et al. (16) pointed out that the approach trajectories to both the 5 and 6 positions of cytosine in DNA were occluded by neighboring nucleotides and suggested that the target base was extrahelical during the methyl transfer reaction; they also suggested that covalent addition of enzyme created a reactive 4–5 enamine rather than a 5-carbanion, which is too high in energy to exist under physiological conditions. The steric embarrassments were relieved by the remarkable DNA–DNA methyltransferase co-crystal structures of Klimasauskas et al. (17), who found that the target cytosine is everted from the DNA helix and inserted deep into the active site of the enzyme (Fig. 1b).
All enzymes that modify the 5 position of pyrimidines appear to use a variant of the reaction mechanism described above, and most [including all known DNA (cytosine-5)-methyltransferases] have a conserved prolylcysteinyl active site dipeptide that provides the cysteine thiolate (18). The DNA cytosine-methyltransferases bear ten characteristic sequence motifs (19,20), six of which are strongly conserved. Motifs I and X fold together to form most of the AdoMet binding site, motif IV contains the prolylcysteinyl dipeptide that provides the thiolate at the active site, motif VI contains the glutamyl residue that protonates the 3 position of the target cytosine, and motif IX has a role in maintaining the structure of the target recognition domain (usually located between motifs VIII and IX) that makes base-specific contacts in the major groove (18). All or most of these motifs are discernable in all DNA cytosine-methyltransferases of bacteria, fungi, plants and mammals, and a number of DNA methyltransferases have been identified from searches of anonymous expressed sequence tags (ESTs). With few exceptions, the set of motifs has proven to be a reliable diagnostic template in EST searches.
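The use of conserved motifs as a search template can be illustrated in its simplest form. The sketch below looks only for the prolylcysteinyl (Pro-Cys) dipeptide of motif IV in a translated sequence; it is an illustration under stated assumptions and not the procedure used in the EST searches described here. A Pro-Cys dipeptide alone occurs far too often by chance to be diagnostic, and a real search would score all ten motifs and their order with profile-based methods. The function name and the toy peptide are hypothetical.

```python
import re

# Assumption: only the Pro-Cys dipeptide named in the text is searched for.
# A realistic template would cover all ten motifs and their spacing.
PC_DIPEPTIDE = re.compile(r"PC")


def find_pc_sites(protein: str) -> list[int]:
    """Return the 1-based position of each Pro-Cys dipeptide in a protein."""
    return [m.start() + 1 for m in PC_DIPEPTIDE.finditer(protein.upper())]


# Hypothetical toy translation of an EST; not a real methyltransferase.
toy_translation = "MKLGSPCQGFSAANR"

print(find_pc_sites(toy_translation))
```

A hit from such a scan would only nominate a candidate; the flanking residues and the presence of the AdoMet-binding motifs I and X would still have to be checked.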
STRUCTURE AND FUNCTION OF Dnmt1
The wide conservation of DNA cytosine-methyltransferases was first revealed by the purification and cloning of the first eukaryotic DNA methyltransferase (21), which remains the sole mammalian DNA methyltransferase to have been identified by biochemical assay. This enzyme is now properly termed Dnmt1 (OMIM 126375), although many other names have been invented by various laboratories. This protein contains 1620 amino acids (an interesting form that lacks 118 N-terminal amino acids is found in oocytes and will be discussed later). Dnmt1 has a 5- to 30-fold preference for hemimethylated substrates (22), and as a result has been assigned a function in the maintenance of methylation patterns. However, this assignment was made largely to satisfy predictions made in the mid 1970s, and there is no direct evidence that Dnmt1 is not also involved in certain types of de novo methylation. Dnmt1 exerts the overwhelming majority of de novo methylation activity in embryo lysates and has little sequence specificity beyond the CpG dinucleotide (22). Homologs of Dnmt1 have been found in nearly all eukaryotes whose DNA bears m5C, but not in those that lack it. A report of a Dnmt1 homolog in Drosophila (23) may not have been completely accurate, as the genome sequence contains no evidence of such a homolog.
As shown in Figure 2a, Dnmt1 has a C-terminal domain that is related to bacterial restriction methyltransferases (21); the C-terminal domain is in fact more closely related to many of the bacterial enzymes than to mammalian DNA methyltransferases of the Dnmt2 and Dnmt3 families (Fig. 3). A large N-terminal domain has accreted multiple domains that provide functions specialized to eukaryotes; these functions include import into nuclei, the co-ordination of replication and methylation during S-phase, and the partial suppression of de novo methylation (18). The domain that targets Dnmt1 to replication foci (24) mediates a dramatic redistribution of Dnmt1 during S-phase: a uniform nucleoplasmic distribution in G1-phase is followed in S-phase by a coalescence into discrete foci that are organized around aggregations of γ satellite DNA in mouse fibroblast nuclei (Fig. 2b and c). During mid and late S-phase these large toroidal foci are the major sites of DNA replication; early in S phase there are many small replication foci throughout the nucleus (24).
Targeted mutations of the Dnmt1 gene (11,25) are recessive lethals that produce a number of unique phenotypes in mice. First, the Dnmt1 mutation produces a lethal differentiation phenotype in which homozygous mutant embryonic stem (ES) cells grow normally with severely demethylated genomes but undergo cell-autonomous apoptosis when induced to differentiate (11). Second, embryos homozygous for mutations at Dnmt1 show biallelic expression of several (but not all) imprinted genes (25,26). Third, homozygous Dnmt1-null embryos show transient ectopic expression of all copies of Xist and evidence of at least transient inactivation of all X chromosomes (27). Demethylation in ES cells has also been reported to cause an increased frequency of deletion and rearrangement mutations (28), probably through an increased rate of homologous recombination among demethylated and unmasked repeated sequences. Trace amounts of m5C persist in the genomes of Dnmt1-null ES cells and the capacity to methylate newly integrated retroviral DNA is partially retained, which requires the existence of one or more additional DNA methyltransferases (29).
SEX-SPECIFIC PROMOTERS AND EXONS AT THE Dnmt1 LOCUS
The Dnmt1 gene is unique in that expression is driven by sex-specific promoters and 5′ exons (Fig. 4). The 5′-most promoter introduces an oocyte-specific 5′ exon (exon 1o) which causes translation to initiate at an ATG codon in exon 4; the resulting protein is shorter than the somatic form by 118 N-terminal amino acids (30). This truncated oocyte-specific form of Dnmt1 (Dnmt1o) is enzymatically active and accumulates to very high levels in the oocyte; it is nuclear only at the earliest stages of oocyte growth, and just prior to ovulation comes to be localized in a cytoplasmic shell just within the oocyte cortex (31). Dnmt1o protein is cytoplasmic in pre-implantation embryos, but specifically enters and then exits nuclei at the 8-cell stage (30,31), and does not become fully nuclear until after implantation, where it is soon replaced by the full-length somatic form. Dnmt1 is localized largely or exclusively to nuclei of all somatic cells examined and is cytoplasmic only in the oocyte and pre-implantation embryo. The biological function of the elaborate and unprecedented nuclear–cytoplasmic trafficking of Dnmt1 during oogenesis and early development is currently unknown; the brief entry into and exit from nuclei at the 8-cell stage is especially intriguing. It should also be noted that somatic nuclei contain relatively large amounts of a form of Dnmt1 that is absent from the oocyte and pre-implantation embryo, and early development in the presence of this ectopic Dnmt1 may contribute to the poor success rates and developmental abnormalities commonly seen in offspring derived by transplantation of somatic nuclei into ooplasts.
A promoter and exon (exon 1s) that are active in all somatic cells are located ∼7 kb 3′ of exon 1o (Fig. 4). Promoter 1s functions as the housekeeping promoter (22), and exon 1s contains the ATG codon that initiates the full-length Dnmt1 in somatic cells (30). Promoter 1s is activated shortly after implantation and by post-coitum day 7 all detectable Dnmt1 protein is the full-length form (30). Promoter 1s is active in all cycling cells but is downregulated under conditions of growth arrest. There were early reports of massive overexpression of Dnmt1 in tumor cells (32), but later reports made it clear that expression in tumor cells is at most only slightly elevated over that of non-transformed cells (33–35). It has also been reported that the Dnmt1 gene contains >11 transcriptional start sites spread over many kilobases (36), and that some of these promoters respond to the products of oncogenes (37,38). However, the promoters that were reported to respond to c-Jun and RB1 are just 5′ of exon 4, and several kilobases 3′ of the initiation codon in exon 1s. Transcription initiation at the putative oncogene-sensitive sites could only yield the truncated protein that is found in oocytes. This form of Dnmt1 has not been observed in somatic cells. For this and other reasons, the identification of the many transcriptional start sites described by Bigey et al. may have been in error (36). Most data indicate that there is a single transcriptional start site in adult somatic cells and the mode of regulation of the somatic promoter of Dnmt1 is consistent with that of other genes whose products are associated with DNA replication (39). The more remarkable attributes ascribed to the somatic Dnmt1 promoter have been difficult to confirm. Repression of DNMT1 has been suggested as a new therapy for certain cancers, but DNMT1 is not significantly overexpressed in tumors, and the loss of DNMT1 function is lethal to normal cells (29). These findings greatly reduce the promise of DNMT1 as a target of repression in the clinical management of cancer.
Just 3′ of exon 1s lies promoter and exon 1p (Fig. 4), which is active only in the pachytene spermatocyte, where it gives rise to the major or sole transcript (30). Exon 1p contains multiple short open reading frames which would be expected to interfere with translation of the Dnmt1 open reading frame (which in this mRNA is the same as that of the oocyte-specific mRNA). In keeping with this expectation, the pachytene spermatocyte does not contain detectable Dnmt1 protein and the abundant mRNA that contains exon 1p is not associated with polyribosomes (30,40). It is not known why the spermatocyte should have evolved a combined transcriptional and post-transcriptional mechanism for the downregulation of Dnmt1 protein at the pachytene stage. It should be noted that Dnmt1 protein is absent from both oocytes and spermatocytes at the time of meiotic recombination (30). Dnmt1 is produced to high levels after this stage of oogenesis but does not reappear after recombination during spermatogenesis (30).
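The argument that short open reading frames in exon 1p interfere with translation of the downstream Dnmt1 reading frame rests on finding ATG codons in the leader that are followed closely by in-frame stop codons. A minimal sketch of such a scan is given below. It is illustrative only: the function name, the length cutoff and the toy leader sequence are all hypothetical, and the real exon 1p sequence is not reproduced here.

```python
STOPS = {"TAA", "TAG", "TGA"}


def short_orfs(leader: str, max_codons: int = 30) -> list[tuple[int, int]]:
    """Find short ORFs in a 5' leader sequence.

    Returns (start, end) pairs, 0-based and half-open, for every ATG that
    reaches an in-frame stop codon with at most max_codons codons
    (ATG included) preceding the stop. The stop codon is inside the span.
    """
    s = leader.upper()
    found = []
    for i in range(len(s) - 2):
        if s[i:i + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for j in range(i + 3, len(s) - 2, 3):
            if (j - i) // 3 > max_codons:
                break
            if s[j:j + 3] in STOPS:
                found.append((i, j + 3))
                break
    return found


# Hypothetical toy leader, not the exon 1p sequence.
toy_leader = "GGATGAAATAGCCATGTGACC"

for start, end in short_orfs(toy_leader):
    print(start, end, toy_leader[start:end])
```

Several such hits upstream of the authentic initiation codon would be consistent with, but would not by themselves prove, the translational block inferred from the absence of polyribosome association.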
The relationship between Dnmt1 protein and mRNA levels in germ cells is unusual. Dnmt1 protein is present at very high levels in mature oocytes and pre-implantation embryos, but mRNA levels are low at these stages. Conversely, Dnmt1 mRNA levels in the pachytene spermatocyte are high, but protein levels are low (30). Analysis of mRNA levels therefore gives a large underestimate of protein levels in oocytes and early embryos and a large overestimate in pachytene spermatocytes.
THE ENIGMATIC Dnmt2 FAMILY
For a decade Dnmt1 was the only DNA methyltransferase to have been identified in a mammal. New candidate DNA methyltransferases identified by searches of EST databases were reported in 1998. The first of these encodes Dnmt2 (41), which is most similar to pmt1p (42) of Schizosaccharomyces pombe, an organism not known to methylate its DNA (Fig. 3b). Disruption of the pmt1+ gene in S.pombe gave no discernible phenotype, and transmethylation activity could not be detected when recombinant pmt1p was subjected to biochemical assays (42). Disruption of Dnmt2, the mouse homolog of pmt1+, had no obvious effect on genomic methylation patterns in embryonic stem cells, nor did it affect the ability of such cells to methylate newly integrated retroviral DNA (43). There are well-conserved Dnmt2 homologs in plants, vertebrates, D.melanogaster and S.pombe, but no related sequence is found in the genomes of Saccharomyces cerevisiae or C.elegans (none of the latter four species are known to methylate their DNA and none have other DNA methyltransferase homologs). The surprising phylogenetic distribution of Dnmt2 homologs might provide a hint as to biological role: centromere structure and function is conserved among the organisms that contain Dnmt2 homologs, but is quite different in the organisms that lack them. Saccharomyces cerevisiae has compact centromeres very different from those of other eukaryotes, and C.elegans has holocentric chromosomes without discrete centromeres. Although a biological role remains to be demonstrated for any member of the Dnmt2 family in any species, a role in some aspects of centromere function is a possibility.
THE Dnmt3 FAMILY
Additional DNA methyltransferases soon appeared in EST databases. These enzymes [Dnmt3A and Dnmt3B (44)] are distantly related to the Dnmt1 and Dnmt2 families (Fig. 3) and, in fact, are most closely related to the multispecific DNA methyltransferases encoded by bacteriophages that infect Bacillus species (Fig. 3). Both DNMT3A and DNMT3B had been mapped by the Unigene consortium via polymorphisms in 3′-untranslated region sequences. DNMT3B mapped to the region of chromosome 20q that contains the trait for ICF syndrome (45). This syndrome presents with variable combined immunodeficiency, mild facial anomalies and extravagant cytogenetic abnormalities which largely affect the pericentric regions of chromosomes 1, 9 and 16. These pericentric regions contain a type of satellite DNA termed classical satellite, or satellites 2 and 3. It is normally heavily methylated, but is nearly completely unmethylated in DNA of ICF patients (46). It was soon found that ICF patients had mutations in the C-terminal DNA methyltransferase domain of DNMT3B (13). Although classical satellite sequences were completely demethylated, none of the patients were homozygous for null alleles of DNMT3B (13). This suggested that null alleles might be lethal for reasons other than their loss of DNA methyltransferase activity, which may explain the lethality of targeted null alleles of Dnmt3B in mice (47). In addition to classical satellite, demethylation of DNA in ICF patients is also seen at CpG islands on the inactive X chromosome in females and at two repeat families, one of which (D4Z4) has been tied to facioscapulohumeral muscular dystrophy (48). ICF patients have not been observed to suffer from this condition and the lack of methylation of CpG islands on the inactive X chromosome does not cause the symptoms of ICF syndrome to differ notably between male and female patients (49).
Whereas inactivation of Dnmt1 causes global demethylation of the genome (11), DNMT3B appears to be specialized for the methylation of a particular compartment of the genome; loss of DNMT3B activity in ICF syndrome causes demethylation of only specific families of repeated sequences and CpG islands on the inactive X chromosome. Classical satellite DNA and CpG islands on the inactive X normally undergo de novo methylation soon after implantation (6), at which time DNMT3B may be especially active. DNMT3B also remains the only DNA methyltransferase shown to be mutated in a human disease. Disruption of Dnmt3A is also lethal to mice and the Dnmt3A/Dnmt3B double mutant has been reported to be unable to methylate newly integrated retroviral DNA, whereas each single mutant retains this ability (47). The locations of DNA methyltransferase genes, classical satellite tracts and genes currently known to influence methylation patterns are shown in Figure 5.
As mentioned previously, it is strongly held in some quarters that maintenance and de novo methylation must be performed by separate enzymes. In this model, sequence-specific de novo methyltransferases act at specific stages of gametogenesis and early development to establish methylation patterns, which would then be maintained during cell division by sequence-independent DNA methyltransferases that can methylate only hemimethylated substrates. In order to satisfy this expectation, Dnmt3A and Dnmt3B have been assigned the former role, and Dnmt1 the latter (47). However, the real situation is not nearly so simple. No mammalian DNA methyltransferase has been shown to be sequence specific. The preference of Dnmt1 for hemimethylated substrates is not large (22) and the specific activity of Dnmt1 on unmethylated DNA substrates is much greater than that of Dnmt3A or Dnmt3B. Dnmt1 is also present at much higher levels than either of the latter enzymes. Furthermore, Dnmt3A and Dnmt3B are present in somatic cells and (if they are dedicated de novo DNA methyltransferases responsible for the establishment of methylation patterns) would be expected to eliminate allele-specific methylation patterns at imprinted loci during development. It is not unlikely that methylation imprints are established by as-yet undiscovered DNA methyltransferases. Methylation patterns are established in males by de novo methylation in prospermatogonia at 14–20 days post-coitum and in females during growth of dictyate oocytes at >5 days post-partum (9). It is not clear that EST libraries enriched in the relevant cell types have been prepared, but it is not unlikely that when examined such libraries will be found to contain new and possibly sex-specific DNA methyltransferases. The features that designate a particular region for de novo methylation, and the factors that respond to these cues, have yet to be identified.
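The two-enzyme model summarized above can be stated compactly in code, which also makes its idealizations explicit. The sketch below is a toy rendering of that classical model only, and the text argues that the real situation is less tidy. It assumes a perfectly faithful maintenance activity that acts exclusively on hemimethylated CpG sites. All names are hypothetical, and each duplex is represented as a pair of boolean lists giving the methylation state of the same CpG sites on the two strands.

```python
def replicate(duplex):
    """Semi-conservative replication.

    Each daughter duplex keeps one parental strand and pairs it with a
    newly synthesized, fully unmethylated strand.
    """
    top, bottom = duplex
    blank = [False] * len(top)
    return (list(top), list(blank)), (list(blank), list(bottom))


def maintain(duplex):
    """Idealized maintenance methylation.

    A CpG is methylated on the new strand only where the partner strand
    already carries a methyl group (a hemimethylated site). Sites that
    are unmethylated on both strands are left alone.
    """
    top, bottom = duplex
    merged = [t or b for t, b in zip(top, bottom)]
    return (merged, list(merged))


def methylated_fraction(duplex):
    """Fraction of CpG cytosines methylated, counted over both strands."""
    top, bottom = duplex
    return (sum(top) + sum(bottom)) / (2 * len(top))


# Hypothetical pattern over five CpG sites, symmetric on both strands.
pattern = [True, False, True, True, False]
parent = (list(pattern), list(pattern))

daughter_a, daughter_b = replicate(parent)
print("parent:               ", methylated_fraction(parent))
print("daughter, no enzyme:  ", methylated_fraction(daughter_a))
print("daughter, maintained: ", methylated_fraction(maintain(daughter_a)))
```

Without the maintenance step each round of replication halves the methylation carried by a duplex, which is the passive loss in dividing cells mentioned earlier; with it the pattern is copied exactly, and nothing in the model can create methylation at a site that was unmethylated on both strands.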
A few years ago the biological significance of cytosine methylation was widely doubted and among the small group of believers a role in tissue-specific gene expression was most often invoked. It is now clear that perturbations of genomic methylation patterns can have diverse and severe effects on phenotype. Genome stability, allele-specific expression of imprinted genes and those subject to X-inactivation, the transcriptional silencing and masking of transposons and the assembly of higher-order chromatin structures on classical satellite DNA are all clearly dependent on cytosine methylation. A direct role in developmental gene control has come to seem increasingly unlikely. The current rate of progress is quite rapid and we can expect more surprises (and perhaps more controversy) as experimental studies of cytosine methylation and other aspects of epigenetic phenomena in mammals continue to gain momentum.
I apologize to those authors whose work could not be cited due to length limitations. I thank M. Goll for comments on the manuscript and R. Chaillet, X. Cheng, J. Trasler and M. Yanagida for discussions. This work was supported by grants GM59377 and HD37687 from the NIH and by a grant from the Leukemia and Lymphoma Society.