DNA methylation is a type of epigenetic marking that strongly influences chromatin structure and gene expression in plants and mammals. Over the past decade, DNA methylation has been intensively investigated in order to elucidate its control mechanisms. These studies have shown that small RNAs are involved in the induction of DNA methylation, that there is a relationship between DNA methylation and histone methylation, and that the base excision repair pathway has an important role in DNA demethylation. Some aspects of DNA methylation have also been shown to be shared with mammals, suggesting that the regulatory pathways are, in part at least, evolutionarily conserved. Considerable progress has been made in elucidating the mechanisms that control DNA methylation; however, many aspects of the mechanisms that read the information encoded by DNA methylation and mediate this into downstream regulation remain uncertain, although some candidate proteins have been identified. DNA methylation has a vital role in the inactivation of transposons, suggesting that DNA methylation is a key factor in the evolution and adaptation of plants.
Gene expression in eukaryotes is highly influenced by chromatin structure. For example, gene expression is inactivated in compact chromatin which consists of highly packed nucleosomes (heterochromatin) but is active in less condensed chromatin (euchromatin). Although the mechanism(s) that defines chromatin status is still elusive, some of the chemical modifications of DNA and histone proteins, known as epigenetic marks, that may influence chromatin structure have been identified. Methylation of the fifth carbon of cytosine residues is one of the most extensively studied epigenetic modifications in both plants and mammals. While cytosine methylation is mostly limited to CG dinucleotide sequence contexts in mammalian genomes, except for embryonic stem cells in which 25% of cytosine methylations are found in a non-CG context (Lister et al. 2009), plants additionally harbor methylated cytosines in CHG and CHH sequence contexts (H = A, C or T) throughout their genomes (Cokus et al. 2008). DNA methylation is relatively stable and can be maintained during DNA replication. This stability is also important for plant defenses against transposons (mobile elements) that can induce mutations and chromosome instability. The proper regulation of DNA methylation is crucial for mammalian development, and abnormalities in the patterns of methylation are closely associated with human diseases (Bird 2002, Robertson 2005). DNA methylation also plays pivotal roles in the development and environmental responses of plants. Recent studies have suggested that dominance relationships in self-incompatibility, sex determination and shoot regeneration involve DNA methylation for their proper regulation, as well as the genomic imprinting described by Ikeda et al. in this issue (Shiba et al. 2006, Martin et al. 2009, Li et al. 2011). 
In addition, the dynamics of the DNA methylation pattern associated with developmental processes have been unveiled through studies of gametophyte development (see Dickinson and Gutierrez-Marcos in this issue). On the other hand, a decrease in the DNA methylation level influences the rates of infection by bacterial pathogens (Pavet et al. 2006). In this issue, Mittelsten Scheid and Pecinka have reviewed epigenetic regulation, including DNA methylation, under stress adaptation. In the last decade, the mechanisms that regulate DNA methylation have been gradually elucidated in plants. Many of the advances in this field have derived from studies using Arabidopsis mutants, which have enabled identification of many factors involved in DNA methylation and have demonstrated that some mechanisms are conserved and shared with mammals. In this review, we briefly describe the achievements of these studies and outline the regulatory mechanisms of DNA methylation. In addition, we discuss the current understanding of the roles of DNA methylation in transposon regulation.
DNA Methyltransferases in Plants
In Arabidopsis, two maintenance-type DNA methyltransferases and one de novo-type DNA methyltransferase have been identified. DNA METHYLTRANSFERASE 1 (MET1), a homolog of the mammalian Dnmt1 methyltransferase, catalyzes methylation at CG dinucleotide sites, whereas CHROMOMETHYLASE 3 (CMT3), a plant-specific DNA methyltransferase, catalyzes methylation at CHG sites. Both CG and CHG sites are symmetric sequences; therefore, the methylation patterns at these sites can be transmitted to and maintained in the newly synthesized strand during DNA replication by the activities of MET1 and CMT3, respectively. Asymmetric CHH methylation is maintained by de novo methylation that is catalyzed by CMT3 and DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2), an ortholog of the mammalian Dnmt3a/b de novo methyltransferase (Cao et al. 2003, Chan et al. 2005). A recent study showed that DRM3, a catalytically mutated DNA methyltransferase paralog, also plays an important role in establishing de novo cytosine methylation in all sequence contexts in the process of RNA-directed DNA methylation (RdDM) by stimulating the activity of DRM2 (Henderson et al. 2010). In mammals, it has been shown that the interaction between Dnmt3a and the catalytically mutated DNA methyltransferase Dnmt3L is required for establishing de novo DNA methylation during germline development (Jia et al. 2007). Although phylogenetic analyses indicate that DRM2/DRM3 and Dnmt3a/Dnmt3L evolved independently in plants and animals, it seems likely that both lineages acquired a similar mechanism for de novo DNA methylation. It will be important to investigate the relationship between DRM2 and DRM3 biochemically for an in-depth understanding of the mechanism of de novo DNA methylation in plants.
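The three sequence contexts referred to above (CG, CHG and CHH, where H = A, C or T) can be made concrete with a short sketch. The Python function below is illustrative only: the function name and the decision to skip cytosines that lie too close to the 3′ end to be classified are assumptions made here, not part of any published pipeline. It reads a single strand in the 5′ to 3′ direction and assigns each cytosine to one context.

```python
def cytosine_contexts(seq):
    """Return (position, context) for every classifiable cytosine in `seq`.

    `seq` is one DNA strand written 5'->3'. The context is 'CG', 'CHG' or
    'CHH' (H = A, C or T). Cytosines without enough downstream sequence, or
    followed by ambiguous bases such as N, are skipped (an assumption of
    this sketch).
    """
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C":
            continue
        downstream = seq[i + 1:i + 3]  # up to two bases 3' of the cytosine
        if downstream[:1] == "G":
            contexts.append((i, "CG"))
        elif len(downstream) == 2 and downstream[0] in "ACT":
            if downstream[1] == "G":
                contexts.append((i, "CHG"))
            elif downstream[1] in "ACT":
                contexts.append((i, "CHH"))
    return contexts


# Example: one cytosine in each context, plus an unclassifiable 3'-terminal C.
print(cytosine_contexts("CGCAGCTTC"))
# -> [(0, 'CG'), (2, 'CHG'), (5, 'CHH')]
```

Only one strand is examined here; an analysis of both strands would apply the same function to the reverse complement.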
RNA-Directed DNA Methylation
Atypical DNA-dependent RNA polymerases
RdDM was first described by Wassenegger et al. (1994) and refers to an RNA interference (RNAi)-based chromatin modification process in which 24 nucleotide (nt) small interfering RNAs (siRNAs) induce de novo cytosine methylation in all sequence contexts (CG, CHG and CHH) in DNA regions with sequence complementarity to the siRNAs (Law and Jacobsen 2010, Haag and Pikaard 2011, X. J. He et al. 2011, Kanno and Habu 2011). In mammals, 25–30 nt PIWI-interacting RNA (piRNA) is required for the control of transposable elements via DNA methylation during male gametogenesis (Thomson and Lin 2009). In Arabidopsis, RdDM is involved in several biological processes, such as biotic/abiotic stress responses and plant development, in addition to maintaining the inert state of transposons/repetitive sequences (Law and Jacobsen 2010, Haag and Pikaard 2011, X. J. He et al. 2011, Kanno and Habu 2011). The molecular mechanism for RdDM involves two steps: first, biogenesis of the 24 nt siRNAs; and second, the conversion of siRNA signals into de novo DNA methylation at the target DNA regions. Each step requires a plant-specific DNA-dependent RNA polymerase, abbreviated as Pol IV and Pol V, respectively.
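The logic of the two steps can be illustrated with a deliberately simplified sketch. The Python code below is a toy model, not a description of the actual biochemistry: all function names are hypothetical, dicing is assumed to produce consecutive, non-overlapping 24 nt fragments, and targeting is reduced to a search for perfect complementarity on a single DNA strand. It does not model the polymerases, the AGO4 effector complex or any other protein factor.

```python
SIRNA_LENGTH = 24  # nt, the siRNA size class described in the text

# Watson-Crick partner (as a DNA base) of each RNA base.
_RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dice(rna, length=SIRNA_LENGTH):
    """Cut an RNA string into consecutive, non-overlapping fragments.

    A 3' remainder shorter than `length` is discarded (toy simplification).
    """
    return [rna[i:i + length] for i in range(0, len(rna) - length + 1, length)]


def complementary_dna(sirna):
    """Return the DNA sequence (5'->3') that pairs perfectly with `sirna`."""
    return "".join(_RNA_TO_DNA_COMPLEMENT[base] for base in reversed(sirna))


def target_positions(dna_strand, sirna):
    """Return 0-based start positions on `dna_strand` (5'->3') where the
    siRNA could pair with perfect complementarity."""
    site = complementary_dna(sirna)
    positions = []
    start = dna_strand.find(site)
    while start != -1:
        positions.append(start)
        start = dna_strand.find(site, start + 1)
    return positions


# Example with an artificial repeat sequence (hypothetical data).
precursor = "AUGC" * 12                  # 48 nt RNA precursor
sirnas = dice(precursor)                 # two 24 nt fragments
locus = "TT" + "GCAT" * 6 + "TT"         # DNA strand containing one target site
print(len(sirnas), target_positions(locus, sirnas[0]))
```

In this sketch a match identifies only the region that would be subject to de novo methylation; which cytosines within it become methylated is outside the scope of the model.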
Phylogenetic analyses of NRPD1 and NRPE1, the largest subunits of Pol IV and Pol V, respectively, showed that these atypical plant-specific polymerases evolved from Pol II for their specialized function in the RdDM pathway. In agreement with this conclusion, three mass spectrometric analyses have reported that both Pol IV and Pol V complexes consist of 12 subunits and that half of these subunits (the sixth, and 8th–12th subunits) are identical to those of Pol II (Huang et al. 2009, Ream et al. 2009, Law et al. 2011). The subunit compositions of each polymerase are summarized in Table 1. The second largest subunit, NRPD2/NRPE2, and the fourth subunit, NRPD4/NRPE4, are identical in Pol IV and Pol V but are different from their Pol II counterparts. There are two highly homologous proteins in the third subunit in Arabidopsis: Pol II harbors NRPB3/NRPD3A/NRPE3A, while Pol IV and Pol V can harbor either NRPB3/NRPD3A/NRPE3A or NRPD3B/NRPE3B. In the fifth subunit, Pol II contains NRPB5/NRPD5 and Pol V has NRPE5, while Pol IV has either NRPB5/NRPD5 or NRPD5B/NRPE5B, a homolog of NRPE5 (Huang et al. 2009, Law et al. 2011). With respect to the seventh subunit, the Arabidopsis genome contains NRPB7 of Pol II and three other genes, NRPD7A, NRPD7B/NRPE7B and NRPB7-like. To date, NRPD7A has only been identified in Pol IV, while NRPD7B/NRPE7B has been found in a subset of Pol IV and Pol V (Ream et al. 2009, Law et al. 2011). It remains unclear how the particular subunit compositions of Pol IV/Pol V affect the specific functions of each polymerase, and resolving this will be an important issue for future studies.
Table 1. Summary of the mass spectrometric analysis data from Pikaard's (Ream et al. 2009), Baulcombe's (Huang et al. 2009) and Jacobsen's laboratories (Law et al. 2011) on the subunit compositions of Pol II, Pol IV and Pol V in Arabidopsis. The confirmed subunit of each polymerase complex is indicated by an orange box. The light green boxes indicate the subunit that could not be distinguished in related proteins due to the absence of unique peptides.
Small RNA biogenesis in RdDM
Although the RNA polymerase activity of Pol IV and its transcripts have not been confirmed either in vivo or in vitro, Pol IV is essential for production of siRNAs. Thus, Pol IV is considered to be an active polymerase for transcription of the target region of RdDM (Fig. 1A; Herr et al. 2005, Pontes et al. 2006, Zhang et al. 2007a). As a consequence of a physical interaction between Pol IV and RNA-DEPENDENT RNA POLYMERASE 2 (RDR2) (Law et al. 2011), the Pol IV transcript appears to be converted immediately into double-stranded RNA (dsRNA). CLASSY 1 (CLSY1), a SWI2/SNF2-like chromatin remodeling factor, may have a function at some point in the early stages of siRNA biogenesis (Smith et al. 2007). This dsRNA is then chopped into 24 nt siRNAs by DICER-LIKE 3 (DCL3) (Xie et al. 2004); in turn, these siRNAs undergo a ‘maturation’ process mediated by HUA ENHANCER 1 (HEN1) in which a methyl group is added to their 3′ end (Yu et al. 2005). The mature siRNAs are loaded onto ARGONAUTE 4 (AGO4) to form a silencing effector complex. A recent study identified SAWADEE HOMEODOMAIN HOMOLOG 1 (SHH1), which physically interacts with NRPD1, as a novel component of siRNA biogenesis. Although the biochemical function of the protein in siRNA biogenesis is still unclear, the authors proposed that it might work as a transcription factor for recruiting Pol IV to the target DNA region. This proposal was based on the potential DNA binding ability of SHH1, which is characterized by a cryptic homeodomain and a SAWADEE domain (Law et al. 2011).
Molecular mechanism of de novo DNA methylation in RdDM
Forward and reverse genetic approaches have identified a large number of proteins that are associated with the establishment of de novo methylation at sites homologous to the siRNAs. These proteins include the subunits of Pol V (Kanno et al. 2005, Onodera et al. 2005, Pontier et al. 2005, He et al. 2009a, Lahmy et al. 2009), SUPPRESSOR OF TY INSERTION 5-LIKE (SPT5L)/KOW DOMAIN-CONTAINING TRANSCRIPTION FACTOR 1 (KTF1) (Bies-Etheve et al. 2009, He et al. 2009a, Huang et al. 2009), DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1 (DRD1) (Kanno et al. 2004), DEFECTIVE IN MERISTEM SILENCING 3 (DMS3)/INVOLVED IN DE NOVO 1 (IDN1) (Kanno et al. 2008, Ausin et al. 2009), RNA-DIRECTED DNA METHYLATION 1 (RDM1) (Gao et al. 2010), DEFECTIVE IN MERISTEM SILENCING 4 (DMS4)/RNA-DIRECTED DNA METHYLATION 4 (RDM4) (He et al. 2009b, Kanno et al. 2010), INVOLVED IN DE NOVO 2 (IDN2)/RNA-DIRECTED DNA METHYLATION 12 (RDM12) (Ausin et al. 2009, Zheng et al. 2010), SUVH2 or SUVH9 (Johnson et al. 2008), DRM2 and DRM3 (Cao and Jacobsen 2002, Henderson et al. 2010, Naumann et al. 2011) (Fig. 1B).
Pol V-dependent non-coding RNAs, which carry a 5′ modification such as a triphosphate or a 7meG cap structure but lack a poly(A) tail, accumulate at several RdDM target loci. In addition, AGO4 can interact with the C-terminal domain of Pol V via the WG/GW motif. These observations suggest that the Pol V transcript serves as a scaffold RNA and/or that Pol V recruits the siRNA–AGO4 complex to the methylation target region through sequence complementarity between the siRNA and the Pol V transcript (El-Shami et al. 2007, Wierzbicki et al. 2008, Wierzbicki et al. 2009). A recent study showed that non-coding RNAs are transcribed by Pol II for a subset of RdDM target loci, and suggested that they are required to recruit Pol IV and/or Pol V to the target region (Zheng et al. 2009). The identities of the factors that provide locus specificity among the RdDM targets have yet to be determined. SPT5L/KTF1, a homolog of the yeast transcription elongation factor Spt5, binds to non-coding RNA and also to AGO4 via the WG/GW motif in its C-terminus. SPT5L/KTF1 may contribute to the stabilization of the association between the siRNA–AGO4 complex and scaffold RNA (Bies-Etheve et al. 2009, He et al. 2009a).
IDN2/RDM12 shares some homology with SUPPRESSOR OF GENE SILENCING 3 (SGS3), which is involved in post-transcriptional gene silencing. IDN2/RDM12 possibly interacts with dsRNA via its XS domain; the levels of accumulation of siRNAs are largely unaffected in the idn2 mutant (Ausin et al. 2009), suggesting that IDN2/RDM12 acts downstream of the siRNA biogenesis pathway. DMS3/IDN1, which has a hinge domain region similar to that of the structural maintenance of chromosome (SMC) proteins, is expected to form a homodimer and may function as a clamp to stabilize homologous nucleic acids (Kanno et al. 2008, Ausin et al. 2009). IDN2/RDM12 and DMS3/IDN1 might contribute to the stabilization of base pairing between siRNA and scaffold RNA. RDM1 can interact with both AGO4 and DRM2, and can possibly bind to methylated single-stranded DNA through its unique fold that has a pocket-like structure (Gao et al. 2010). These characteristics of RDM1 suggest that it might work as a bridge between a silencing effector complex and a methylated/methylating target DNA region to recruit DRM2 together with DRM3.
Johnson et al. (2008) reported that SUVH2 and SUVH9, which have SET and SRA (SET and RING finger-associated) domains, are required for DRM2-dependent DNA methylation in a locus-specific manner. Although some SET domain proteins mediate histone methylation (see the following section ‘Histone methylation and DNA methylation’), no change in the histone methylation pattern was observed in heterochromatin in suvh2 and suvh9 mutants. On the other hand, it has been demonstrated that the SRA domain, which is considered to be a methylcytosine-binding domain (see the section ‘The downstream mechanisms of DNA methylation’), of SUVH2 and SUVH9 can bind to methylated cytosines in the CG context and to methylated cytosines in the CHH context in vitro. A model has been proposed in which binding of SUVH2 or SUVH9 to methylated cytosine is required to recruit or retain DRM2 at the RdDM target locus (Johnson et al. 2008).
DRD1, a SWI2/SNF2-like chromatin remodeling factor, is required for the accumulation of non-coding RNAs transcribed by both Pol V and Pol II (Zheng et al. 2009), and was found to form a protein complex, DDR, with DMS3/IDN1 and RDM1 (Law et al. 2010). The proposed function of the DDR complex is to recruit Pol V to the templates and/or to activate Pol V for the transcription of non-coding RNAs. Notably, it has been shown that DMS3 is not essential for production of Pol II-dependent non-coding RNA but that DRD1 is indispensable (Zheng et al. 2009), indicating that the DDR complex is present only in some RdDM target loci. DMS4/RDM4 has homology to the yeast protein Interacts With RNA polymerase II 1 (IWR1), a dissociable factor of the core transcription machinery of Pol II (He et al. 2009b, Kanno et al. 2010). The Arabidopsis mutant dms4/rdm4 shows pleiotropic morphological defects in addition to reactivation of RdDM targets. Many thousands of Pol II transcripts show altered levels of accumulation in the dms4/rdm4 mutant, and there is evidence that DMS4/RDM4 can physically interact with Pol II, Pol IV and Pol V in vivo (He et al. 2009b, Kanno et al. 2010, Law et al. 2011). Recently, two different biological functions for yeast IWR1 were identified: first, IWR1 is required for importing the Pol II complex from the cytoplasm into the nucleus (Czeko et al. 2011); and, secondly, IWR1 is important for recruitment of the TATA-binding protein (TBP) to loci transcribed by Pol I, Pol II and Pol III (Esberg et al. 2011). By analogy with the yeast data, DMS4/RDM4 may bind to Pol II, Pol IV and Pol V to aid import of the polymerases into the nucleus or to facilitate polymerase activity at the RdDM target loci.
Histone Methylation and DNA Methylation
Histone methylation and DNA methylation in eukaryotic organisms other than plants
In addition to the RNAi-based mechanisms overviewed in the previous sections, covalent modifications of histone proteins play an important role in the establishment and maintenance of DNA methylation. In particular, methylation of histone H3 lysine 9 (H3K9) is essential for the formation of the heterochromatic state (Grewal and Jia 2007). Although some species, including the eukaryotic model organisms Drosophila melanogaster, Caenorhabditis elegans and Schizosaccharomyces pombe, have lost the capacity for cytosine methylation at some stage of evolution (Zemach et al. 2010), they have retained the H3K9 methylation and RNAi systems. In the filamentous fungus Neurospora crassa, DNA methylation is solely dependent on H3K9 methylation mediated by the histone methyltransferase Dim-5, and RNAi has little or no effective role in DNA methylation (Tamaru and Selker 2001, Freitag et al. 2004). In mammals, H3K9 methylation is mediated by SET domain proteins including G9a and SUV39H1 (Peters et al. 2001, Tachibana et al. 2002). H3K9 methylation is recognized by the chromodomain of Heterochromatin Protein 1 (HP1), which further recruits Dnmt1 (Fuks et al. 2003a), indicating a link between DNA methylation and H3K9 methylation. Indeed, loss of H3K9 methyltransferase function results in the reduction of DNA methylation in repetitive major satellite DNA, or at promoters of endogenous genes (Lehnertz et al. 2003, Tachibana et al. 2008). It is well established that the H3K9 methyltransferase Clr4 of the fission yeast S. pombe is recruited by an siRNA-containing complex called RITS (Grewal and Jia 2007), whereas the role of siRNAs in recruitment of the H3K9 methyltransferases to target loci in other organisms is still not well understood. 
Recent studies have demonstrated that chromatin-modifying enzymes, such as the polycomb group proteins (PcG) and G9a in mammals, can bind to long non-coding RNAs that recruit modifiers to target loci and induce transcriptional repression (Nagano et al. 2008, Pandey et al. 2008). It has been reported that a cis-acting non-coding RNA functions similarly in Arabidopsis to recruit PcG to target loci (Heo and Sung 2011), suggesting that targeting mechanisms analogous to those of animal species are also present in plants.
Histone modifications and DNA methylation in plants
In plants, the histone modification status influences DNA methylation. As in other organisms, H3K9 methylation in plants is mediated by SET domain proteins, which are essential for transposon silencing and proper plant development (Jackson et al. 2002, Malagnac et al. 2002, Ding et al. 2007). In plant genomes, H3K9 methylation is enriched at repeat sequences and is associated with both DNA methylation and small RNAs, which are essential for the formation of constitutive heterochromatin (Bernatavichute et al. 2008, Lister et al. 2008, Zhou et al. 2010). In Arabidopsis, 15 protein homologs of the H3K9 methyltransferase SU(VAR)39 have been identified (Baumbusch et al. 2001). Among these, KRYPTONITE/SU(VAR) HOMOLOG 4 (KYP/SUVH4) primarily methylates H3K9 at repeat sequences in a redundant fashion with the homologs SUVH5 and SUVH6 (Jackson et al. 2002, Malagnac et al. 2002, Ebbs and Bender 2006, Bernatavichute et al. 2008). KYP/SUVH4 catalyzes mono- and dimethylation of K9 of H3 (Jackson et al. 2004). On the other hand, a recent study showed that the SET domain protein SUVR4 can convert monomethylated H3K9 to the trimethylated state especially in the presence of ubiquitin, and this activity affects transcriptional silencing of transposons (Veiseth et al. 2011).
Despite the conservation of SU(VAR)39-like histone methyltransferases and H3K9 methylation in plants, the readout mechanism for the H3K9 methylation signal on chromatin and its effect on DNA methylation seem to differ between animals and plants. For example, there is only one homolog of HP1 in Arabidopsis, named LHP1/TFL2, and it binds to methylated H3K27 rather than methylated H3K9 (Zhang et al. 2007b). In plants, H3K9 methylation is recognized by a plant-specific DNA methyltransferase named CHROMOMETHYLASE 3 (CMT3) (Lindroth et al. 2004). CMT3 mediates DNA methylation at non-CG sites, with a preference for CHG sites (Bartee et al. 2001, Lindroth et al. 2001). The SUVH family (SUVH1–SUVH10) in Arabidopsis has an SRA domain in the N-terminus, which preferentially binds to methylated cytosine by flipping out 5-methylcytosine from the DNA strand (Arita et al. 2008, Rajakumara et al. 2011). Thus, the histone H3K9 methyltransferases are recruited by DNA methylation via the SRA domain, and probably add H3K9 methylation marks to these loci. H3K9 methylation marks are in turn bound by the chromodomain of CMT3, thus forming a self-reinforcing loop of repressive epigenetic marks at heterochromatic loci (Johnson et al. 2007) (Fig. 2).
As mentioned above, the mechanisms that establish H3K9 methylation/CHG methylation at target loci are still not well understood. Although the RNAi-based pathway might act upstream of H3K9 methylation as in S. pombe, several lines of evidence suggest that the two pathways act independently in plants. For example, inactivation of both the RdDM and the H3K9 methylation pathways in Arabidopsis results in pleiotropic developmental defects that are not observed in mutants of either pathway (Chan et al. 2006). The developmental defects are due to the ectopic expression of an F-box gene named SUPPRESSOR OF drm1 drm2 cmt3 (SDC) (Henderson and Jacobsen 2008). In wild-type somatic cells, SDC is transcriptionally silenced due to dense DNA methylation at seven tandem repeats in the promoter; this methylation is redundantly maintained by the RdDM and the H3K9 methylation pathways. Additional evidence for the independence of H3K9 methylation from RdDM comes from the fact that increases in H3K9 methylation can occur in the absence of RdDM factors. In Arabidopsis, H3K9 methylation is antagonized by a jumonji-C type histone demethylase INCREASED BONSAI METHYLATION 1 (IBM1) (Saze et al. 2008, Miura et al. 2009, Inagaki et al. 2010). In the ibm1 mutant, >4,000 actively transcribed genes become hypermethylated, particularly in CHG contexts, which concomitantly accumulate H3K9 methylation in their gene body (Miura et al. 2009, Inagaki et al. 2010). Hypermethylation is nearly completely suppressed by mutation of cmt3 or kyp (Saze et al. 2008, Inagaki et al. 2010), implying that H3K9 methylation/CHG methylation might occur more frequently in euchromatic regions than previously thought, although this methylation is eventually masked due to its active removal by IBM1 (Fig. 2). 
Importantly, ibm1-induced DNA hypermethylation does not depend on the RdDM pathway, since mutations of factors in the RdDM pathway, including the de novo methyltransferase DRM2, cannot suppress the effect of the mutation (Inagaki et al. 2010). The accumulation of CHG methylation peaks mainly in the middle of transcribed genes suggests a link between H3K9 methylation and transcription (Miura et al. 2009).
In addition to H3K9 methylation and histone methyltransferases, other histone modifications and chromatin modifiers can influence the maintenance of DNA methylation. In particular, pathways involved in the removal of active epigenetic marks from heterochromatin are important for maintenance of the repressive state. One example is monoubiquitination of histone H2B (H2Bub), which is associated with transcriptional activation, and is required for further deposition of active histone marks such as H3K4 and H3K36 methylation (Schmitz et al. 2009) (see Tamada et al. in this issue). In Arabidopsis, loss of ubiquitin protease UBP26 function, which can remove monoubiquitination at Lys143 of H2B, results in the reduction of DNA methylation and an accumulation of H3K4 methylation at silenced transgenes and transposons (Sridhar et al. 2007). In addition, direct removal of H3K4 methylation by distinct classes of histone demethylase is also required for the maintenance of DNA methylation. For example, defects in Arabidopsis H3K4 demethylase homologs of Lysine Specific Demethylase 1 (LSD1) cause the accumulation of H3K4 methylation and a loss of DNA methylation at the promoter of the FWA gene (Jiang et al. 2007). Similarly, jumonji-C domain-containing protein JMJ14 probably maintains non-CG methylation through removal of H3K4 trimethylation downstream of the RdDM pathway (Deleris et al. 2010, Searle et al. 2010). Hence, histone modifications are balanced and coordinated by positive and negative regulators, which can define the functional chromatin domains along the chromosomes (Roudier et al. 2011).
Genome-wide and locus-specific DNA demethylation are important processes during development and in environmental responses in mammals (Zhu 2009, Wu and Zhang 2010). In plants, allele-specific activation of imprinted genes is known to be regulated by DNA demethylation (see the review by Ikeda et al. in this issue). DNA demethylation can be achieved by both active and passive mechanisms. Active mechanisms remove methylation marks from DNA, whereas passive demethylation results from the inhibition of maintenance activity for DNA methylation during DNA replication. One such passive mechanism is the repression of MET1 expression during female gametogenesis, which results in a decrease in the DNA methylation level in some imprinted genes (Jullien et al. 2008). Certain demethylation processes in plants and mammals occur rapidly and thus cannot be achieved by a passive mechanism, which would need several rounds of DNA replication (Zhu 2009, Wu and Zhang 2010).
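The reason why a passive mechanism needs several rounds of replication can be shown with simple arithmetic. The Python sketch below assumes an idealized case that is not taken from any of the cited studies: a locus starts fully methylated on both strands, maintenance and de novo methylation are completely absent, and every replication adds one unmethylated strand per parental strand. Under these assumptions the fraction of methylated strands halves with each round. The function names are hypothetical.

```python
def methylated_strand_fraction(rounds):
    """Fraction of DNA strands still methylated after `rounds` replications.

    Idealized model: the starting duplex is methylated on both strands and
    no maintenance or de novo methylation occurs, so the number of
    methylated strands stays constant while the total number of strands
    doubles at each round.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return 0.5 ** rounds


def rounds_to_reach(threshold):
    """Smallest number of replication rounds after which the methylated
    strand fraction is at or below `threshold` (0 < threshold <= 1)."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in the interval (0, 1]")
    rounds = 0
    while methylated_strand_fraction(rounds) > threshold:
        rounds += 1
    return rounds


# Example: how many rounds are needed to fall to 10% or less?
print(methylated_strand_fraction(3))  # -> 0.125
print(rounds_to_reach(0.1))           # -> 4
```

Even in this best case for passive loss, reducing methylation to a tenth of its starting level takes four cell divisions, which is consistent with the argument that rapid demethylation requires an active mechanism.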
The identities of active mechanisms have been investigated and models for their behavior have been suggested for both plants and animals. The ROS1, DME, DML2 and DML3 proteins of Arabidopsis, which belong to a small family of DNA glycosylases involved in DNA base excision repair (BER), may be the main components of an active demethylation mechanism. The proteins are bifunctional enzymes and have an apyrimidinic lyase activity for cleaving phosphodiester bonds in the DNA backbone at abasic sites (Agius et al. 2006, Gehring et al. 2006, Morales-Ruiz et al. 2006, Penterman et al. 2007). The proteins can also specifically recognize 5-methylcytosine residues and, in vitro, attack the glycosidic bond to remove the methylcytosine base (Agius et al. 2006, Gehring et al. 2006, Morales-Ruiz et al. 2006, Penterman et al. 2007). An unmethylated cytosine may then be inserted in a reaction mediated by an unknown DNA polymerase and DNA ligase in a manner similar to BER. Ectopic hypermethylation at specific loci was observed in mutants of these DNA glycosylase genes, which supports the hypothesis that they are active DNA demethylase enzymes (Choi et al. 2002, Gong et al. 2002, Penterman et al. 2007, Zhu et al. 2007).
Although no orthologs of DNA glycosylases of the ROS1/DME family have yet been identified in animals, it is likely that BER is an important component of their mechanisms for DNA demethylation. Chick thymine DNA glycosylase and chick and human methyl-CpG-binding domain protein 4 (MBD4) have a DNA glycosylase activity for 5-methylcytosine, but it is much lower than the activity of thymine glycosylase for T/G mismatch repair (Jost et al. 1995, Zhu et al. 2000a, Zhu et al. 2000b). Therefore, deamination of 5-methylcytosine to thymine has been suggested to be an important step in the DNA demethylation process in animals (Kangaspeska et al. 2008, Métivier et al. 2008). In addition to deamination, oxidation of 5-methylcytosine may play a role in demethylation. TET1 oxidase converts 5-methylcytosine to 5-hydroxymethylcytosine, which can be a substrate for deamination enzymes, and the resultant hydroxymethyluracil may be replaced by unmethylated cytosine through the thymine DNA glycosylase and BER pathways (Kriaucionis and Heintz 2009, Tahiliani et al. 2009, Cortellino et al. 2011, Guo et al. 2011). Recently it was reported that TET1 oxidase catalyzes iterative oxidation to yield 5-formylcytosine and, subsequently, 5-carboxylcytosine, which can be a substrate for thymine DNA glycosylase and/or as yet unknown decarboxylases (Y.-F. He et al. 2011, Ito et al. 2011). Hydroxymethylcytosine, however, cannot be recognized by maintenance mechanisms for DNA methylation, suggesting a role in passive DNA demethylation (Valinluck and Sowers 2007). In plants, TET1 and 5-hydroxymethylcytosine have not been reported to date. Their absence may be associated with the evolution of DNA glycosylases that excise 5-methylcytosine directly.
Currently, little is known about the targeting mechanism that regulates DNA demethylation in plants and animals. However, in Arabidopsis, mutations in the DNA demethylation pathway have been shown to change DNA methylation levels in a small set of genes, indicating the existence of a targeting mechanism for the DNA demethylation machinery that leads to demethylation at specific loci (Choi et al. 2002, Gong et al. 2002, Penterman et al. 2007, Zhu et al. 2007). Arabidopsis ROS3 was isolated as one of the factors involved in the ROS1 demethylation pathway (Zheng et al. 2008). The ros3 mutant has a similar phenotype to that of ros1, and the ROS3 protein has been shown to co-localize with the ROS1 protein in nucleoplasmic foci and in the nucleolus. ROS3 has an RNA recognition motif which can bind small, single-stranded RNAs with specific sequences. It will be interesting to determine whether small RNAs are associated with the DNA demethylation machinery and are involved in targeting demethylation to specific loci, as siRNAs are in RdDM.
The Downstream Mechanisms of DNA Methylation
How are signals from DNA methylation mediated into downstream actions, such as changes in chromatin structure and regulation of gene expression? This is one of the most intriguing questions with respect to understanding the roles of DNA methylation. Although our understanding of the regulation of DNA methylation has increased (see the above sections), the downstream effects of DNA methylation remain less clear. Several models have been suggested for mechanisms that might read the information encoded by DNA methylation (Joulie et al. 2010). One possible mechanism involves the alteration of chromatin structure. DNA methylation modifies the flexibility of DNA, which may then influence nucleosome positioning (Segal and Widom 2009). Changes in nucleosome positioning can influence the accessibility of the DNA to the various factors involved in transcriptional regulation. A second possibility suggests a more direct effect in which transcription factors are prevented from binding to their target sequences. In mammals, inhibition of the binding of CTCF, a chromatin insulator, by DNA methylation is crucial for the regulation of imprinted genes (Bell and Felsenfeld 2000, Hark et al. 2000). Additionally, the binding of various transcription factors, such as CREB or E2F, to their target DNA sequences can be blocked by DNA methylation (Iguchi-Ariga and Schaffner 1989, Campanero et al. 2000). The plant homologs of these transcription factors are also prevented from binding to their targets by DNA methylation (Inamdar et al. 1991, Scebba et al. 2003), although the physiological roles of this inhibition by DNA methylation are unclear.
The third and most studied model proposes that methyl-CpG-binding proteins (MBPs) recognize methylated DNA and mediate downstream signals (Clouaire and Stancheva 2008, Sasai and Defossez 2009). MBPs are classified into three families according to the motifs they use to recognize DNA methylation. The first identified motif was the MBD (Lewis et al. 1992, Hendrich and Bird 1998). Mammalian MBD proteins are known to be components of protein complexes associated with histone deacetylase, histone methyltransferase and/or DNA methyltransferase activities that function in transcriptional repression (Nan et al. 1997, Nan et al. 1998, Jones et al. 1998, Ng et al. 1999, Zhang et al. 1999, Ng et al. 2000, Fuks et al. 2003b, Fujita et al. 2003, Kimura et al. 2003, Sarraf and Stancheva 2004, Kondo et al. 2005, Nan et al. 2007), which is consistent with the silencing activity of DNA methylation in the promoter region. MBD proteins have also been found in plants. The Arabidopsis genome has 13 genes encoding proteins with an MBD motif (Zemach and Grafi 2007). At least three of these MBD proteins, AtMBD5, AtMBD6 and AtMBD7, can bind to CG sequences in a methylation-dependent manner in vitro, while AtMBD5 also binds to methylated sites in a CHH context (Ito et al. 2003, Zemach and Grafi 2003, Scebba et al. 2003). AtMBD5, AtMBD6 and AtMBD7 are localized to the highly methylated chromocenters (Scebba et al. 2003, Zemach et al. 2005). Two of the proteins, AtMBD5 and AtMBD6, are preferentially localized in a region adjacent to rDNA gene clusters, suggesting that they act on rDNA silencing (Zemach et al. 2005). In a similar manner to mammalian MBD proteins, AtMBD6 interacts with protein complexes with histone deacetylase activity (Zemach and Grafi 2003). However, down-regulation of AtMBD6 and AtMBD7 by T-DNA insertion does not produce an overt phenotype (Zemach and Grafi 2007).
This is also the case in mammals; for example, mice with a knockout of an MBD gene do not show a severe phenotype (Chen et al. 2001, Guy et al. 2001, Hendrich et al. 2001, Millar et al. 2002, Zhao et al. 2003). In contrast, plant met1 and mammalian Dnmt1 mutants, in which DNA methylation is decreased because the DNA methyltransferase is defective, show severe developmental defects (Li et al. 1992, Mathieu et al. 2007). Studies of these mutants suggest that MBD proteins are functionally redundant or that other proteins have more pivotal roles in reading DNA methylation.
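The sequence contexts mentioned above (CG and CHH, together with CHG, where H = A, C or T) can be made concrete with a short sketch. The function below is purely illustrative: the name `cytosine_context` and the example sequence are assumptions introduced here, and only the strand as written is inspected.

```python
def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at index i of seq as 'CG', 'CHG' or 'CHH'.

    Returns 'NA' when too few downstream bases are available or when a
    non-ACGT base is encountered. Only the given strand is considered.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    downstream = seq[i + 1 : i + 3]  # up to two bases 3' of the cytosine
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return "NA"


# Hypothetical example sequence, not taken from any real locus.
example = "ACGTCAGCTTC"
contexts = {i: cytosine_context(example, i)
            for i, base in enumerate(example) if base == "C"}
```

In this sketch the last cytosine is reported as 'NA' because its context cannot be determined without the bases that follow it.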
AtMBD9, one of the MBD proteins in Arabidopsis, is unusual in that it works as a transcriptional activator. Mutants of AtMBD9 show an early flowering phenotype caused by transcriptional repression of FLC, a major flowering repressor (Peng et al. 2006). A decrease in histone acetylation and an increase in DNA methylation were observed at the FLC locus in the mutants, which is consistent with the in vitro histone acetylation activity of recombinant AtMBD9 protein (Yaish et al. 2009). AtMBD9 is the largest member of the Arabidopsis MBD protein family and possesses multiple chromatin-associated domains such as a PHD finger (Peng et al. 2006). The study of AtMBD9 indicates functional diversity among MBD proteins, although its methyl DNA-binding activity has not yet been demonstrated. In mammals, MBD1 and MeCP2 are known to have specific functions in neurogenesis, which also indicates functional diversity in this protein family (Chen et al. 2001, Guy et al. 2001, Zhao et al. 2003).
The second class of MBPs includes proteins with an SRA domain, which is responsible for recognizing DNA methylation. As mentioned above, the plant histone methyltransferase family, SUVH, has an SRA domain, suggesting that proteins in this family translate DNA methylation into histone methylation (Johnson et al. 2007). Plant VIM/ORTH and mammalian UHRF1 proteins have an SRA domain together with PHD and RING fingers. Mutations of VIM/ORTH and UHRF1 proteins produce similar phenotypes to those displayed by met1 and Dnmt1 mutants, respectively (Sharif et al. 2007, Woo et al. 2007, Woo et al. 2008). However, VIM/ORTH and UHRF1 proteins appear to work in maintaining DNA methylation levels rather than in reading and interpreting methylation signals, because the DNA methylation level is decreased in the mutants. Consistent with this, UHRF1 preferentially binds to hemimethylated DNA and associates with Dnmt1 during replication (Bostick et al. 2007). UHRF1 also associates with histone methyltransferase and deacetylase (Unoki et al. 2004, Karagianni et al. 2008), while the PHD finger can bind to H3K9 methylase (Sharif et al. 2007), suggesting a role in the maintenance of histone modifications.
A third class of MBPs includes proteins with zinc finger domains that bind to methylated DNA. Mammalian Kaiso, ZBTB4 and ZBTB38 are known to belong to this protein group (Prokhortchouk et al. 2001, Filion et al. 2006); however, no zinc finger proteins that bind to methylated DNA have yet been reported in plants. Kaiso interacts with a protein complex that has histone deacetylase activity and works as a transcriptional repressor (Prokhortchouk et al. 2001). The mouse knockout of Kaiso does not show any developmental defects, and even the additional knockout of two MBD proteins, MeCP2 and MBD2, does not cause any greater effect, suggesting that Kaiso is dispensable in development (Prokhortchouk et al. 2006, Caballero et al. 2009).
MBPs are expected to be the main regulators that translate DNA methylation signals into changes in histone modification status and gene expression. However, other as yet unknown downstream mechanisms for translating DNA methylation signals cannot be excluded. There may be other classes of MBPs and other downstream mediators of DNA methylation signaling in mammals and plants. Genetic studies using plant mutants have identified a candidate factor that might be involved in the downstream mechanisms for interpretation of DNA methylation signals. Mutations in Arabidopsis MOM1, a plant relative of mammalian CHD3 chromatin remodeling factors, cause the release of silencing of highly methylated repeats including pericentromeric and inactive 5S rDNA, transposons and transgenic loci (Amedeo et al. 2000, Steimer et al. 2000, Habu et al. 2006, Vaillant et al. 2006, Čaikovski et al. 2008, Numa et al. 2010, Yokthongwattana et al. 2010). However, the chromatin status, including DNA methylation, is not changed in mom1, whereas mutations in other silencing factors influence gene expression by changing the DNA methylation status (Amedeo et al. 2000, Probst et al. 2003, Habu et al. 2006, Vaillant et al. 2006). These features of mom1 imply that MOM1 acts in a pathway that is independent of and/or downstream of DNA methylation. Genetic analyses with RdDM factors showed that MOM1 acts either redundantly or cooperatively with the RdDM pathway in gene silencing, depending on the locus (Numa et al. 2010, Yokthongwattana et al. 2010). In addition, MOM1 is suggested to translate DNA methylation signals into H3K9 methylation at the SDC locus, which is a target of RdDM (Numa et al. 2010). The molecular action of MOM1 is still unknown, but elucidating how MOM1 works may be the key to understanding how DNA methylation affects gene silencing.
Genome-wide analyses of DNA methylation have shown that in eukaryotes the coding regions of transcribed genes are highly methylated, a phenomenon known as gene body methylation (Suzuki and Bird 2008). Its correlation with active gene expression suggests that it may be read by mechanisms different from those that mediate gene repression in promoter regions. Common factors may interpret DNA methylation signals in both the promoter and gene body, but the output from these factors could be different. The roles of gene body methylation are still elusive, but it tends to be observed in long and functionally important genes (Takuno and Gaut 2011), implying that it is functionally significant. Although studies on the roles of gene body methylation have only just started, it is likely that they will reveal how DNA methylation influences the status of both chromatin and gene expression.
DNA Methylation and Transposon Regulation
Transposons as major components of genomes
Transposons were discovered in maize by Barbara McClintock (McClintock 1965). Analysis of eukaryotic genome sequences revealed that a large fraction of the genome consists of transposons. In Arabidopsis, rice and maize, transposons occupy 12, 40 and 85% of the genomes, respectively (Arabidopsis Genome Initiative 2000, International Rice Genome Sequencing Project 2005, Schnable et al. 2009). Transposons are divided into two major classes based on their structure and transposition mechanism (Fig. 3). Retrotransposons (class I) transpose via an RNA intermediate that is reverse transcribed, and the transposition results in an increase in the copy number. DNA transposons (class II) translocate from the integrated site to a new site in the genome, and the copy number remains stable except when the element moves in the S or G2 phase of the cell cycle (Engels et al. 1990). While an autonomous element encodes the transposase protein required for transposition, a non-autonomous element does not encode a transposase. Thus, non-autonomous elements rely on transposase supplied by an autonomous element for transposition. Transposons can cause mutation by disruption of functional genes and can also induce chromosomal instability. Indeed, McClintock first identified transposable elements in a study of chromosome breaks at specific sites in maize, which she showed were caused by the transposition of a non-autonomous element, Dissociation (Ds), under the control of an autonomous element, Activator (Ac) (McClintock 1951). Since the original investigation, it has become clear that organisms have evolved epigenetic regulatory systems to inactivate transposition activity, because of its potential to damage the host genome (Slotkin and Martienssen 2007, Lisch 2009). In several plants, genome-wide analysis of DNA methylation showed that transposon sequences are heavily methylated (Liu et al. 2009, Zemach et al. 2010).
Here, we describe the characteristic features of representative transposons and the regulation of transposition by DNA methylation.
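The difference in copy-number behavior between the two transposon classes described above can be illustrated with a deliberately simplified sketch. The function name `copy_number_after` and its parameters are hypothetical, and the model assumes that every class I event adds exactly one copy and that a class II event adds a copy only when it is counted as occurring in the S or G2 phase.

```python
def copy_number_after(initial: int, events: int, element_class: str,
                      s_or_g2_events: int = 0) -> int:
    """Copy number after a number of transposition events (toy model).

    Class I (retrotransposons): the donor copy stays in place and a new
    copy is inserted, so each event adds one copy.
    Class II (DNA transposons): the element is excised and reinserted,
    so the copy number is unchanged, except for events assumed to occur
    in the S or G2 phase, each of which is modeled as adding one copy.
    """
    if events < 0 or not 0 <= s_or_g2_events <= events:
        raise ValueError("event counts are inconsistent")
    if element_class == "I":
        return initial + events
    if element_class == "II":
        return initial + s_or_g2_events
    raise ValueError("element_class must be 'I' or 'II'")


# Starting from a single copy and five transposition events:
class_i = copy_number_after(1, 5, "I")
class_ii = copy_number_after(1, 5, "II")
class_ii_with_s_phase = copy_number_after(1, 5, "II", s_or_g2_events=2)
```

The sketch ignores element loss, silencing and selection; it only captures why class I elements tend to accumulate while class II elements do not necessarily do so.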
Epigenetic regulation of DNA transposons
The correlation between cytosine DNA methylation and transposon activity was examined in the transposon superfamilies, CACTA, hAT and Mutator, using methylation-sensitive restriction enzymes (Chandler and Walbot 1986, Chomet et al. 1987, Banks et al. 1988). The Enhancer/Suppressor-mutator (En/Spm) transposon belonging to the CACTA superfamily generates several mature transcripts (TnpA–TnpD) from a primary transcript by alternative splicing; two of these transcripts, TnpA and TnpD, are required for transposition. In addition to its role in transposition, the TnpA protein mediates demethylation of its promoter region (Fedoroff et al. 1995). It is suggested that TnpA induces DNA demethylation by binding to the newly replicated hemimethylated promoter regions (Cui and Fedoroff 2002). Ac/Ds is a member of the hAT superfamily and shows high transposition activity shortly after DNA replication in the unmethylated state (Ros and Kunze 2001). When Ac/Ds or its modified systems are used for transgenic gene tagging lines in various plants, inactivation of elements that were active in earlier generations has been reported (Izawa et al. 1997, Szeverenyi et al. 2006). Conversely, previously inactive Dart1 elements belonging to the hAT superfamily in rice can translocate from integrated sites in transgenic Arabidopsis as a result of demethylation associated with the cloning process (Shimatani et al. 2009). The transposition of non-autonomous nDart1-0 is induced by treatment with 5-azacytidine, a DNA methylation inhibitor, in a rice line carrying the pale yellow leaves mutation without the active autonomous element (Tsugane et al. 2006) (Fig. 4). Mutator (MuDR/Mu) elements exhibit a mutation frequency of 10⁻³ to 10⁻⁵ per locus per generation in maize and are characterized by high transposon activity (Walbot and Rudenko 2002). Alterations of methylation status in Mu elements are observed at various developmental stages in maize.
Immature ears show increased methylation of the Mu elements compared with young leaves, and, as a consequence of this higher level, germinal transposition is restricted, resulting in transmission of a stable genome to the next generation (Li et al. 2010). Mu killer (Muk), which is a natural derivative of an inverted duplication of autonomous MuDR, is a strong repressor of the Mu element. Muk produces hairpin transcripts that are processed into small RNAs by DCL3. The small RNAs homologous to MuDR induce DNA methylation and post-transcriptional gene silencing of MuDR (Slotkin et al. 2005, Lisch 2009).
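To give a feel for the Mutator mutation frequency quoted above (10⁻³ to 10⁻⁵ per locus per generation), the following sketch computes the probability that a given locus is hit at least once over several generations. It assumes, as a simplification not made in the text, that the frequency is constant and that generations are independent; the function name is hypothetical.

```python
import math


def prob_at_least_one_hit(freq_per_generation: float, generations: int) -> float:
    """Probability of at least one mutation at a locus over n generations.

    Uses 1 - (1 - mu)^n, evaluated with log1p/expm1 so that very small
    frequencies do not lose precision.
    """
    if not 0.0 <= freq_per_generation < 1.0 or generations < 0:
        raise ValueError("invalid frequency or generation count")
    return -math.expm1(generations * math.log1p(-freq_per_generation))


# Upper and lower ends of the quoted range, over 100 generations.
high = prob_at_least_one_hit(1e-3, 100)
low = prob_at_least_one_hit(1e-5, 100)
```

Under these assumptions, the upper end of the range gives a probability of a little under 0.1 after 100 generations, and the lower end gives roughly 0.001.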
Epigenetic regulation of retrotransposons
Retrotransposons are subdivided into long terminal repeat (LTR) elements and non-LTR elements. LTR retrotransposons are the most abundant transposable elements in plant genomes, and their transposition mechanism exhibits significant similarity to the replication process of retroviruses. Tos17, a well-characterized Ty1/copia LTR retrotransposon in rice, is heavily methylated and immobilized under normal plant growth conditions (Cheng et al. 2006). However, Tos17 is activated in calli and its copy number gradually increases during prolonged callus culture (Hirochika et al. 1996). Several recent studies have shown that the transposition of Tos17 in calli or in a DNA glycosylase/lyase-modified line is associated with DNA hypomethylation (Cheng et al. 2006, Ding et al. 2007, La et al. 2011). An EVD element with a copia-type structure was activated in Arabidopsis in a recombinant inbred line whose genome was maintained over several generations at a low methylation status by the met1 mutation. It was found that the transposon activity of the EVD element was suppressed not only transcriptionally but also post-transcriptionally in the wild type (Mirouze et al. 2009). The Arabidopsis Athila elements, which belong to the LTR family Ty3/gypsy, are active only in vegetative nurse cells, which do not transmit genetic information to the next generation. The 21 nt siRNA from the Athila elements accumulates in pollen. These findings suggest that the siRNA suppresses the activation of Athila elements in the two sperm cells (Slotkin et al. 2009, Slotkin 2010). Short interspersed elements (SINEs), tRNA-derived retrotransposons, were originally discovered and studied in mammalian systems (Shedlock and Okada 2000). In the human genome, the Alu sequences (a SINE retrotransposon) are involved in chromosome rearrangements that confer new functions (Mills et al. 2006).
In the ddm1 mutant background, an epigenetically controlled SINE-related sequence was identified in the promoter region of the FWA gene (Kinoshita et al. 2004), which was originally identified as the gene responsible for a late-flowering phenotype (Koornneef et al. 1991). The FWA gene, an imprinted gene, is suppressed in the apical meristem and expressed in the endosperm (Kinoshita et al. 2004). The direct tandem repeat sequences of the SINE-related sequence in the promoter region of the FWA gene appear to be the critical cis-element for the transcription start site (Kinoshita et al. 2007). The methylation level of the transcription start site controls the expression of FWA in Arabidopsis species (Fujimoto et al. 2011). Transposition of long interspersed nuclear elements (LINEs) is also regulated by cytosine methylation. LINEs and SINEs are less abundant than LTR retrotransposons in plant species, in sharp contrast to the situation in mammals. Transposition of Karma (a LINE element) in rice is affected by two factors. First, the induction of transcription from Karma requires hypomethylation in the callus. Secondly, for integration into the genome, a prolonged culture process is needed for the release of post-transcriptional regulation (Komatsu et al. 2003). Studies on active transposons show that they are regulated at each developmental stage by the host genome.
Transpositions induced by ddm1 mutation
The Arabidopsis genome has a low transposon content compared with other plant genomes. Although the reasons for the compact genome of Arabidopsis have not been resolved, Zhang and Wessler (2004) proposed that inhibition of an increase in transposon copy number rather than a large genomic deletion was responsible. In the global low DNA methylation state caused by the ddm1 mutation, transposition of various DNA transposons (Mutator superfamily, VANDAL21; and CACTA superfamily, CACTA1) and retrotransposons (gypsy family, ATGP3; and copia family, ATCOPIA13, ATCOPIA21 and ATCOPIA93) is observed (Miura et al. 2001, Tsukahara et al. 2009). Interestingly, following its activation by the ddm1 mutation, CACTA1 remained mobile even in the absence of the mutation. Thus, restoring DDM1 function was not sufficient to re-inactivate the transposon (Kato et al. 2004). While most transposons are located in the centromeric regions, new transposon insertions in the Arabidopsis ddm1 mutant were not restricted to the heterochromatin.
Transposons contribute to genome evolution and diversity. However, host genomes appear to have developed epigenetic mechanisms to suppress bursts of transposition activity that could be disadvantageous to the host organism. Despite the existence of a variety of inhibitory mechanisms, active movement of the DNA transposons mPing and nDart1-0 results in an increased copy number under natural growth conditions (Naito et al. 2009, Hayashi-Tsugane et al. 2011). DNA methylation is a key event associated with transposon activity. Several studies on transposons have shown changes in the methylation status during plant development (Banks and Fedoroff 1989, Slotkin et al. 2009, Li et al. 2010). This phenomenon may also be an effective strategy by the host to suppress a burst of transposon activity. A fuller understanding of epigenetic regulation and the DNA methylation status of transposons could provide important clues to plant evolution and adaptation.
Recent advances in the methodologies for analysis of plant genomes have provided a better understanding of the mechanisms that regulate DNA methylation. In particular, they have enabled the identification of new components of regulation such as the RdDM pathway; however, the molecular functions of the various factors and their interrelationships are still largely unknown. To remedy this lack of knowledge, it will be necessary to make much greater use of biochemical approaches in addition to genetic methods. Genome-wide analyses of mutants have provided insights into new aspects of the regulation of DNA methylation. The identification of locus-dependent methylation and demethylation using genome-wide analyses has stimulated a search for previously unknown mechanisms and cellular machineries that recruit regulatory proteins to their target loci. In addition, the methylation of transcribed genes, discovered by genome-wide analyses, implies a targeting mechanism and function that had not been previously anticipated. There are still many questions in this field, and most of these have not been addressed even by investigations in mammals. Studies using plants have some advantages with regard to genetic investigations and, in combination with the rapid development of genomic technologies, will undoubtedly help to elucidate the answers to many of these key questions. In some cases, the achievements of plant research will precede those from mammals.
This work was supported by the Japan Science and Technology Agency [PRESTO program (to H.S., T.K. and T.N.)]; the Ministry of Education, Culture, Sports, Science, and Technology of Japan (grant No. 22780007 to K.T.).
We are grateful to the editor and anonymous reviewers whose comments have greatly improved this paper.
BER, DNA base excision repair
H3K36, histone H3 lysine 36
H3K4, histone H3 lysine 4
H3K9, histone H3 lysine 9
LINE, long interspersed nuclear element
LTR, long terminal repeat
PcG, polycomb group protein
Pol, DNA-dependent RNA polymerase
RdDM, RNA-directed DNA methylation
siRNA, small interfering RNA
SRA, SET and RING finger-associated