On the accuracy of the epigenetic copy machine: comprehensive specificity analysis of the DNMT1 DNA methyltransferase

Abstract The specificity of DNMT1 for hemimethylated DNA is a central feature for the inheritance of DNA methylation. We investigated this property in competitive methylation kinetics using hemimethylated (HM), hemihydroxymethylated (OH) and unmethylated (UM) substrates with single CpG sites in a randomized sequence context. DNMT1 shows a strong flanking sequence dependent HM/UM specificity of 80-fold on average, which is slightly enhanced on long hemimethylated DNA substrates. To explain this strong effect of a single methyl group, we propose a novel model in which the presence of the 5mC methyl group changes the conformation of the DNMT1-DNA complex into an active conformation by steric repulsion. The HM/OH preference is flanking sequence dependent and on average only 13-fold, indicating that passive DNA demethylation by 5hmC generation is not efficient in many flanking contexts. The CXXC domain of DNMT1 has a moderate flanking sequence dependent contribution to HM/UM specificity during DNA association to DNMT1, but not if DNMT1 methylates long DNA molecules in processive methylation mode. Comparison of genomic methylation patterns from mouse ES cell lines with various deletions of DNMTs and TETs with our data revealed that the UM specificity profile is most related to cellular methylation patterns, indicating that de novo methylation activity of DNMT1 shapes the DNA methylome in these cells.


INTRODUCTION
In multicellular organisms, cell fate and cellular phenotypes are determined by epigenetic mechanisms that are heritable through cell divisions and function without changing the DNA sequence (1). DNA methylation is a key epigenetic process (2-4) which is conserved in most higher eukaryotes (5) and has essential roles in mammalian development and human disease (6, 7). In mammals, DNA methylation mainly occurs at the C5 position of cytosine residues, primarily in CpG dinucleotides where most often both DNA strands are methylated in a symmetrical manner (2, 8). However, only certain CpG sites are methylated, resulting in the generation of a tissue and cell type-specific methylation pattern which contains epigenetic information. There are 56 million CpG sites in the diploid human genome, about 60-80% of which are methylated, corresponding to 4-6% of all cytosines. Methylation levels and patterns vary with cell types and even between alleles, with the largest deviations seen in embryonic stem cells (6, 9). 5-methylcytosine can be oxidized to 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxycytosine by dioxygenases of the TET family (10). While 5-hydroxymethylcytosine may also play signaling roles, 5-formylcytosine and 5-carboxycytosine are excised from DNA by Thymine-DNA glycosylase, which initiates active DNA demethylation (11).
Moreover, it is widely discussed that reduced activity of DNMT1 on substrates with hemihydroxymethylated CpG sites (CG/5hmCG, OH) could have an important role in passive DNA demethylation (10, 21, 22). However, published data of DNMT1 activity on OH substrates varied between <5% (14, 23), 5-10% (24) and 20-30% activity (18, 25) when compared with HM substrates. Hence, the actual level of the HM/OH specificity, a critical parameter underlying this concept of passive DNA demethylation, is not settled.
This apparent lack of precise information about the relative activity of DNMT1 on UM and OH substrates results from different critical technical limitations which are related to four key aspects.
• First, the flanking sequence dependence of the specificity of DNMT1 has not been considered in most studies. The activity of DNMT1 has been shown to change strongly depending on the flanking sequence of the CpG site (26). However, it has not been studied if and to which extent flanking sequences affect the HM/UM and HM/OH specificity of DNMT1, because all biochemical studies mentioned above were based on only few CpG sites.
• Second, many of the studies listed above provided only a very sparse resolution of the reaction progress curve. This is important, because ratios of fast and slow methylation rates can only be determined accurately if the reaction progress is sufficiently sampled by data points. At early phases of the reaction, relative rates of disfavored substrates cannot be determined accurately, because of too little turnover. Conversely, at late time points, favored substrates are almost completely turned over, again precluding the determination of accurate reaction rates.
• Third, many studies were conducted using one substrate after the other. However, this is an unrealistic setting, as it does not allow the enzyme to choose between substrates, which may lead to an artificial boost of activity at unfavored substrates.
[...] (27). This effect could be mediated by an auto-inhibitory loop located between the CXXC and the BAH1 domains that was shown to inhibit DNMT1 after DNA binding to the CXXC domain (27). In contrast, another study investigated DNA methylation by a DNMT1 mutant in which DNA binding of the CXXC domain was inactivated, but it did not detect changes of the HM/UM specificity (19).
Here, we investigated the specificity of DNMT1 using a recently developed Deep Enzymology approach (28). In these experiments, libraries of single CpG site substrates with different sequences and modification states are mixed and methylated by DNMT1, then a hairpin linker is ligated and the DNA is bisulfite converted. This is followed by next generation sequencing to determine the sequence of individual product molecules together with their methylation state. Based on the sequence and methylation information of very many individual product molecules, the activity of DNMT1 on substrates with different flanking sequences and modification states can be determined. Experiments were conducted using unmethylated (UM), hemihydroxymethylated (OH) and hemimethylated (HM) substrates together in one reaction mixture, thus allowing methylation rates to be determined in competitive settings. This gives the enzyme the full choice of substrates and ensures that technical parameters like enzyme or AdoMet concentrations are identical for all substrates. In the next step of our work, we systematically studied the methylation of long DNA substrates which were either unmethylated, hemimethylated or contained a pattern of un- and hemimethylated CpG sites, thus mimicking biological substrates in cells after DNA replication. Similar experiments were conducted with a DNMT1 mutant with inactivated CXXC domain (19) to uncover the potential influence of this domain on the specificity of DNMT1.
Our data demonstrate that DNMT1 has a strong but flanking sequence dependent HM/UM specificity of 80-fold on average on single-site substrates, which is slightly enhanced on long DNA substrates with methylation patterns. The HM/OH preference of DNMT1 is flanking sequence dependent as well but on average only 13-fold, indicating that passive loss of DNA methylation by 5hmC generation and inhibition of DNMT1 is not an efficient process in many flanking contexts. Our data show that the DNMT1 CXXC domain has a moderate flanking sequence dependent contribution to the HM/UM specificity of DNMT1 during the binding process of DNA to DNMT1, but not if DNMT1 slides along the DNA molecules and methylates several CpG sites in a processive methylation mode. Flanking sequence preferences on the single-site substrates were similar for HM and OH substrates, while differences were observed for UM substrates. Comparison of genomic methylation patterns from mouse ES cell lines with various deletions of DNMTs and TETs with our data revealed that the UM specificity profile is most related to cellular methylation patterns, indicating that de novo methylation activity of DNMT1 shapes the methylome in these cells.

Synthesis of the long DNA substrates and methylation reactions with them
The sequence of the 349 bp substrate with 44 CpG sites was taken from Adam et al., 2020 (26). Three different substrates were used that contained internal 3-nucleotide sequence tags to distinguish them after mixing in the sequencing analysis (Supplemental Figure S1). The sequence differences were at least 4 base pairs away from the nearest CpG site. In case of the unmethylated substrate, two additional CpG sites were generated that were not included in the analysis. Generation of the substrates and the methylation reactions were conducted as described (Supplemental Figure S2) (26). In brief, for the generation of hemimethylated substrates, the unmethylated DNA was methylated in vitro by M.SssI (purified as described (26)) to introduce methylation at all CpG sites, or by M.HhaI (NEB) together with M.HpaII (NEB) to introduce methylation at GCGC and CCGG sites. For the synthesis of the two substrates containing hemimethylated CpG sites, the upper strand of the methylated substrate was digested with lambda exonuclease, the single-stranded DNA was purified and finally double-stranded hemimethylated DNA was generated by primer extension using Phusion® HF DNA Polymerase (Thermo). Methylation reactions were conducted using mixtures of UM, fully hemimethylated and patterned substrate (total DNA concentration 200 ng in 20 µl) in methylation buffer (100 mM HEPES, 1 mM EDTA, 0.5 mM DTT, 0.1 mg ml⁻¹ BSA, pH 7.2 adjusted with KOH) containing 1 mM AdoMet. DNMT1 concentrations and incubation times are indicated in the text. Methylation was followed by bisulfite conversion using the EZ DNA Methylation-Lightning™ Kit (ZYMO RESEARCH) followed by library generation and Illumina paired-end sequencing (Novogene).

Flanking sequence preference analysis with randomized single-site substrates
Methylation reactions of the randomized substrates with DNMT1 were performed similarly as described (26, 29). Briefly, single-stranded oligonucleotides containing a methylated, hydroxymethylated or unmethylated CpG site embedded in a 10 nucleotide random context on either side were obtained from IDT and used for generation of 67 bp long double-stranded DNA substrates by primer extension (Supplemental Figure S3). An additional barcode was included outside of the randomized part to distinguish the three substrates after mixing in the sequencing analysis (Supplemental Figure S4). Pools of these randomized substrates were then mixed in different combinations, and methylated by DNMT1 in methylation buffer (100 mM HEPES, 1 mM EDTA, 0.5 mM DTT, 0.1 mg ml⁻¹ BSA, pH 7.2 adjusted with KOH) containing 1 mM AdoMet. DNMT1 concentrations and incubation times are indicated in the text. Methylation was followed by hairpin ligation, then bisulfite conversion was performed using the EZ DNA Methylation-Lightning™ Kit (ZYMO RESEARCH) followed by library generation and Illumina paired-end sequencing (Novogene).

Bioinformatics analysis
NGS data sets were bioinformatically analyzed using a local instance of the Galaxy server (30) basically as described (26, 31, 32). In brief, for the long substrate, reads were trimmed, filtered by quality, mapped against the reference sequence and demultiplexed using substrate type and experiment-specific barcodes. Afterwards, methylation information was assigned and retrieved by home-made scripts. For the randomized substrate, reads were trimmed and filtered according to the expected DNA size. The original DNA sequence was then reconstituted based on the bisulfite converted upper and lower strands to investigate the average methylation state of both CpG sites and the NNCGNN flanks using home-made scripts. Methylation rates of 256 NNCGNN sequence contexts in the competitive methylation experiments with the mixed single-site substrates were determined by fitting to monoexponential reaction progress curves with variable time points with MATLAB scripts as described (33). Pearson correlation factors were calculated with Excel using the correl function.
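The monoexponential fit mentioned above can be illustrated with a minimal sketch. It is not the MATLAB code used in this work; it assumes the simple first-order model m(t) = 1 - exp(-k t) with a single rate constant, ignores enzyme concentration, and uses a hypothetical time course. The function names `progress`, `sse` and `fit_rate` are illustrative only.

```python
import math


def progress(k, t):
    """Fraction of sites methylated after time t for a first-order reaction."""
    return 1.0 - math.exp(-k * t)


def sse(k, times, fractions):
    """Sum of squared residuals between the model and the measured fractions."""
    return sum((progress(k, t) - f) ** 2 for t, f in zip(times, fractions))


def fit_rate(times, fractions, k_min=1e-6, k_max=1e2, grid=400, iters=80):
    """Least-squares estimate of k: coarse log-grid scan, then golden-section refinement."""
    log_lo, log_hi = math.log(k_min), math.log(k_max)
    ks = [math.exp(log_lo + i * (log_hi - log_lo) / (grid - 1)) for i in range(grid)]
    best = min(range(grid), key=lambda i: sse(ks[i], times, fractions))
    # Bracket the minimum between the neighbours of the best grid point.
    a = ks[max(best - 1, 0)]
    b = ks[min(best + 1, grid - 1)]
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c, times, fractions) < sse(d, times, fractions):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


# Hypothetical time course (minutes), generated with k = 0.05 per minute.
times = [5.0, 15.0, 30.0, 60.0]
fractions = [progress(0.05, t) for t in times]
k_fit = fit_rate(times, fractions)
```

In practice one such fit would be run for each of the 256 NNCGNN contexts and each substrate type; a library optimizer could replace the hand-written search.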

Radioactive DNA methylation kinetics
Experimental validation of the determined flanking sequence preferences of DNMT1 was carried out using an avidin-biotin methylation plate assay as described (26) using oligonucleotides containing a single CpG site in UM, OH or HM state (Supplemental Table S1).

Analysis of genomic DNA methylation patterns
DNA methylation in wildtype murine ES cells, as well as multiple DNMT and TET KO cell lines, was investigated using whole genome bisulfite data published by Li et al., 2015 (GEO accession number GSE61457, data sets: GSM1505240-43) (34) which were processed as described (26). Data from Wang et al., 2020 (GEO accession number GSE116482, data sets: GSM3239875, GSM3239876, GSM3239884, GSM4809269) (35) were filtered for coverage >4 only using the upper DNA strand. Average methylation levels in all NNCGNN flanks were determined with a home-written script. Correlations of genomic NNCGNN methylation patterns with DNMT1 preferences were based on the Pearson r-value determined with the Excel correl function. For statistical analysis of the significance of correlations, genomic methylation profiles were randomized 20 times and the correlation analysis was repeated. In these randomizations, the measured genomic methylation levels were randomly assigned to NNCGNN sequences. Based on the average r-value of the randomized data sets and its standard deviation, P-values for the significance of the original correlations were determined by Z-statistics.
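The randomization procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the scripts used in this work: the P-value is taken as the one-sided upper tail of a normal distribution, the sample standard deviation is used, and the two 256-value profiles are synthetic. The names `pearson` and `randomization_z_test` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def randomization_z_test(genomic, enzyme, n_shuffles=20, seed=0):
    """Compare the observed r with r-values from shuffled genomic profiles."""
    rng = random.Random(seed)
    r_obs = pearson(genomic, enzyme)
    shuffled_r = []
    for _ in range(n_shuffles):
        perm = list(genomic)
        rng.shuffle(perm)  # randomly reassign methylation levels to NNCGNN sites
        shuffled_r.append(pearson(perm, enzyme))
    mean_r = sum(shuffled_r) / n_shuffles
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in shuffled_r) / (n_shuffles - 1))
    z = (r_obs - mean_r) / sd_r
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # assumed one-sided upper-tail P-value
    return r_obs, z, p


# Synthetic profiles for 256 NNCGNN sites: the genomic profile is the enzyme profile plus noise.
_rng = random.Random(1)
enzyme_profile = [_rng.random() for _ in range(256)]
genomic_profile = [e + 0.1 * _rng.random() for e in enzyme_profile]
r_obs, z_score, p_value = randomization_z_test(genomic_profile, enzyme_profile)
```

With only 20 shuffles the standard deviation of the null r-values is itself an estimate, so very small P-values obtained this way are best read as orders of magnitude.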

Flanking sequence preferences of DNMT1 for UM, OH and HM methylation
In previous studies, strong flanking sequence preferences of DNMT1 on HM substrates were discovered (26), as well as flanking sequence preferences of DNMT3A (28, 29), DNMT3B (29, 32) and TET enzymes (33). However, the effect of flanking sequences on the specificity of DNMT1 has not yet been investigated. Therefore, we employed oligonucleotides containing one unmethylated (UM), hemihydroxymethylated (OH) or hemimethylated (HM) CpG site in a context of 10 randomized bases on either side as methylation substrates. Methylation reactions were conducted with HM/OH, HM/UM and OH/UM binary substrate mixtures as well as a HM/OH/UM ternary mixture. Using these 4 different settings, different DNMT1 concentrations and time points, in total 24 methylation data points comprising approximately 1766 million sequence reads were generated (Supplemental Table S2, Supplemental Figures S3 and S5). Control reactions without enzyme indicated very low levels of false methylation detection. In addition, reactions with hemimethylated randomized substrates conducted before (26) were included in the analysis. To obtain a first overview of the trends in the data sets, all methylated and unmethylated reads of each substrate type (HM, OH, or UM) were pooled and the distribution of A, T, G and C at the -8 to +8 flanking base pairs in methylated and unmethylated sequences was determined and compared (Figure 1A, Supplemental Figure S6). As observed in our previous study on HM substrates (26), the largest effects of the flanking sequences were detected between the -2 and +2 sites. In general, similar observed/expected (o/e) patterns of bases were found with all three substrates (Table 1). On the 5′ side of the CpG, T(-2) was preferred and C(-2) disfavored, C(-1) was preferred and G(-1) disfavored. On the 3′ side, profiles differed in some details between HM/OH and UM. For OH and HM, T(+1) was preferred and A(+1) disfavored. In the case of UM, G(+1) was preferred and C(+1) disfavored. At the +2 site, HM/OH preferred A or T and disfavored G or C, while UM showed a stronger preference for A(+2). The HM profiles are compatible with our previous findings, and the similarity of all profiles indicates that the influences of the flanking base sequence on base flipping and DNMT1 conformations are to a large extent independent of the readout of the mC/hmC/C in the complementary strand.
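A toy version of the observed/expected analysis is sketched below. It assumes, for illustration only, that o/e is the frequency of a base at a flank position among methylated reads divided by its frequency among all reads of that substrate type; the exact definition used for Figure 1A may differ. The reads and the helper names `flank_index` and `oe_profile` are hypothetical.

```python
from collections import Counter


def flank_index(pos, flank):
    """String index of flank position pos (-flank..-1, +1..+flank) around the central CG."""
    return flank + pos if pos < 0 else flank + 1 + pos


def oe_profile(reads, flank=2):
    """reads: list of (sequence, is_methylated); sequence = flank bases + 'CG' + flank bases."""
    result = {}
    for pos in [p for p in range(-flank, flank + 1) if p != 0]:
        idx = flank_index(pos, flank)
        all_counts = Counter(seq[idx] for seq, _ in reads)
        meth_counts = Counter(seq[idx] for seq, is_meth in reads if is_meth)
        n_all = sum(all_counts.values())
        n_meth = sum(meth_counts.values())
        # Observed frequency in methylated reads over expected frequency in the whole pool.
        result[pos] = {
            base: (meth_counts[base] / n_meth) / (all_counts[base] / n_all)
            for base in "ACGT"
            if all_counts[base] > 0
        }
    return result


# Hypothetical NNCGNN reads: two methylated and two unmethylated molecules.
toy_reads = [
    ("TCCGTA", True),
    ("TCCGTA", True),
    ("CGCGAC", False),
    ("CGCGAC", False),
]
toy_profile = oe_profile(toy_reads)
```

Values above 1 indicate a base enriched among methylated molecules at that position, values below 1 a disfavored base.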

Analysis of the methylation rates of NNCGNN substrates
For a more detailed quantitative analysis and to determine real methylation rates, the average methylation levels of NNCGNN sites were determined for all different substrate types in all experiments. Next, for each mixture the different time points and enzyme concentrations were used for a fit of the methylation reaction progress of each individual NNCGNN site to a characteristic first order rate constant (k_NNCGNN). Supplemental Figure S7A shows the pairwise correlation of the NNCGNN methylation profiles derived in the individual experiments (each of them based on methylation kinetics over several time points). Comparison of HM profiles derived from HM only, HM/UM, HM/OH, or HM/OH/UM reveals, for example, correlation factors between 0.89 and 0.92; OH profiles from different experiments show correlations of 0.96 and 0.97, and UM profiles 0.87 and 0.92. These high pairwise correlations of profiles obtained for the same substrate type (HM, OH or UM) indicate that the independent rate fittings were reproducible and the results are reliable. Moreover, individual HM and OH profiles were also highly correlated, while the correlation of HM and OH profiles to UM profiles was lower (Supplemental Figure S7A). Next, the different methylation kinetics were scaled using the average methylation rates of the HM, OH and UM substrates in the individual reactions (Supplemental Figure S7B) and, using these scaling factors, the rate constants determined in the different experiments were averaged. After normalization of the pairwise data, the individual rates of all 256 NNCGNN sites determined in the different pairwise settings were also compared. As shown in Supplemental Figure S7C, the SEM values of these average rates are small, around ±5% for the HM substrate and ±10% for OH and UM, indicating that they are well-defined. The slight increase in relative SEM is due to the fact that methylation rates drop when going from HM to OH and UM.
The combined methylation rates of all 256 NNCGNN sequences in HM, OH and UM context, as well as their corresponding SEM values, are provided in Data Set 1. The methylation rates of the HM substrate varied over a range of about 10-fold, whereas the ranges of the OH and UM methylation rates were larger, about 25-fold (Figure 1B). Exemplary methylation rates of favored and disfavored substrates were validated by radioactive kinetics using oligonucleotide substrates with defined sequences (Supplemental Figure S8).
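The scaling and averaging of rate profiles from different experiments can be sketched as below. This is a simplified illustration, assuming that each experiment's profile is rescaled so that its mean matches the grand mean over experiments before per-site averaging; the scaling actually applied (Supplemental Figure S7B) may differ in detail. The two-site profiles and the function name `scale_and_average` are hypothetical.

```python
import math


def scale_and_average(profiles):
    """profiles: list of dicts mapping NNCGNN site -> rate, one dict per experiment.

    Returns a dict mapping site -> (mean rate, SEM) after rescaling each experiment.
    """
    sites = sorted(profiles[0])
    exp_means = [sum(p[s] for s in sites) / len(sites) for p in profiles]
    grand_mean = sum(exp_means) / len(exp_means)
    # Rescale every experiment so that its average rate equals the grand mean.
    scaled = [{s: p[s] * grand_mean / m for s in sites} for p, m in zip(profiles, exp_means)]
    n = len(scaled)
    combined = {}
    for s in sites:
        values = [p[s] for p in scaled]
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
        combined[s] = (mean, sd / math.sqrt(n))
    return combined


# Two hypothetical experiments that differ only by a global factor of two.
experiment_a = {"ACCGGA": 1.0, "GGCGAC": 3.0}
experiment_b = {"ACCGGA": 2.0, "GGCGAC": 6.0}
combined_rates = scale_and_average([experiment_a, experiment_b])
```

Because the two toy experiments differ only by a global factor, the scaled profiles coincide and the SEM is zero; real data would leave a residual per-site scatter.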
Comparison of the averaged HM, OH and UM methylation rates of corresponding NNCGNN substrates (Figure 1C) showed high correlation of HM and OH profiles, while correlation to UM was weaker (Figure 1D), as expected from the primary data correlations (Supplemental Figure S7A). We extracted from these quantitative data preferred and disfavored sequences in all sequence contexts (Figure 2A), revealing no big differences in the preferences in the HM and OH context, except a slightly enhanced disfavor for A(+1) on the OH substrate. In contrast, the UM profile showed more pronounced differences from HM, viz. a disfavor for G(-2), preference for G(+1), disfavor for C(+1), and disfavor for C(+2) and G(+2). These new specificity profiles are in good agreement with the published observation that UM activity of DNMT1 is high on CCGG substrates (16). Next, we were interested to compare the pairwise preferences of DNMT1 for individual substrates and calculated the pairwise ratio of the 256 k_NNCGNN for the HM, OH and UM substrates. As shown in Figure 2B, on average HM substrates are methylated 80-times faster than UM substrates.
The HM/UM specificities range from >300-fold for GGCGAC, the substrate with the highest HM/UM ratio, to 29-fold for ACCGGA, the substrate with the lowest HM/UM specificity. However, even at this target site DNMT1 showed a decent, almost 30-fold preference for HM. This is also illustrated by the ranges of k_NNCGNN rates shown in Figure 1B, where the methylation rates of the most disfavored HM substrates are well separated from the rates observed at the best UM substrates.
Comparison of the OH and HM methylation rates revealed smaller differences, because OH substrates, on average, are methylated only 13-fold more slowly than HM (range 30-fold to 7-fold) (Figure 2B). In this case it is noticeable that the methylation rates of the most disfavored HM substrates are similar to the methylation rates of the most preferred OH substrates (Figure 1B). Hence, the global conclusion that DNMT1 is inactive on OH substrates cannot be made. This is also illustrated by the fact that OH substrates were

HM / UM specificity of DNMT1 in the methylation of long DNA molecules
Next, we were interested to investigate the specificity of DNMT1 on a long DNA substrate with 44 CpG sites, which mimics natural DNA substrates appearing in cells after DNA replication (Supplemental Figure S2). The substrate was used in an unmethylated form, completely hemimethylated at all CpG sites, and with a pattern of 18 hemimethylated CpG sites (all sites in CCGG and GCGC context) and 26 unmethylated sites (all other sequence contexts). We then methylated a mixture of all three long DNA substrates with DNMT1 and followed the kinetics of methylation at two enzyme concentrations over 4 time points by bisulfite conversion and NGS (Figure 3A, Supplemental Table S3, Supplemental Figure S9). The obtained data clearly showed a very efficient global methylation of the hemimethylated DNA, weak methylation of the unmethylated DNA and specific methylation of the patterned DNA at the hemimethylated CpG sites (Figure 3B). We extracted the methylation levels of the HM, UM and patterned substrate, the latter separated for hemimethylated CpG sites and unmethylated CpG sites, and fitted the individual data points to a reaction progress curve to determine the respective average methylation rates. As shown in Figure 3B, the methylation rates of hemimethylated sites on the HM and patterned long DNA substrates were almost identical. Similarly, the methylation rates of unmethylated CpG sites on the UM and patterned DNA were comparable. The overall ratio of the methylation of sites on the HM and UM substrates was about 180, and the HM/UM ratio on the patterned substrate (145) was similar. Based on the flanking sequences of the methylation sites, a preference of 126 would have been expected for HM versus UM and 102 for the methylation of the sites on the patterned substrate. This result shows that the specificity of DNMT1 is slightly enhanced on the longer DNA substrate with more target sites than on single-site substrates.
Next, we wanted to compare the relative specificities observed on the fully HM vs. fully UM substrates with the methylation of HM and UM sites on the patterned substrate. When considering the expected specificity ratio, the relative specificities observed in these two reactions are identical (observed: HM/UM ratio = 180, patterned HM/UM ratio = 145, ratio 180/145 = 1.241; expected HM/UM specificities for both substrates 126/102 = 1.235). This finding indicates that the relative HM/UM specificity of DNMT1 is influenced by neither the presence of additional CpG sites on the same DNA molecule nor their methylation state.
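The arithmetic behind this comparison can be retraced in a few lines. The four input values are the ratios quoted in the text; the variable names and the additional observed/expected enhancement factors are illustrative and not part of the original analysis.

```python
# HM/UM ratios quoted in the text for the long substrates.
observed_separate = 180.0   # fully HM versus fully UM substrate
observed_patterned = 145.0  # HM versus UM sites on the patterned substrate
expected_separate = 126.0   # expectation from single-site flanking preferences
expected_patterned = 102.0  # same expectation for the sites of the patterned substrate

# Ratio of the two observed specificities versus ratio of the two expected ones.
observed_ratio = observed_separate / observed_patterned
expected_ratio = expected_separate / expected_patterned

# Enhancement of specificity on long DNA relative to the single-site expectation.
enhancement_separate = observed_separate / expected_separate
enhancement_patterned = observed_patterned / expected_patterned

print(f"observed ratio {observed_ratio:.3f}, expected ratio {expected_ratio:.3f}")
print(f"enhancement {enhancement_separate:.2f} (separate), {enhancement_patterned:.2f} (patterned)")
```

Both substrate settings show roughly the same enhancement of about 1.4-fold over the single-site expectation, which is the sense in which the relative specificities are identical.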

Role of the DNMT1 CXXC domain for HM / UM specificity
Next, we aimed to dissect the role of the CXXC domain in the HM/UM specificity of DNMT1. For this, we used a DNMT1 mutant containing 4 amino acid exchanges in the CXXC domain that completely abrogate DNA binding of this domain (19). To test the effect of the CXXC domain in all possible flanking sequence contexts, unmethylated and hemimethylated substrates containing a single CpG site in randomized sequence context were mixed and methylated by the CXXC mutant (Supplemental Figure S10, Supplemental Table S4). The data were analyzed as described above regarding the methylation levels of all NNCGNN sites and compared with the HM and UM data of the WT DNMT1 experiments. The direct comparison of the mutant and WT methylation rates at all 256 sites revealed a highly significant (P-value 8.66 × 10⁻⁶⁷, based on two-sided t-test of paired data) but mild, about 2.5-fold reduced HM/UM specificity of the CXXC mutant (Figure 4A). However, the effect was highly sequence dependent, with most sites showing an about 2-fold effect, but at some sites no effect was observed while others showed up to 6-fold changes (Figure 4B).
It is noticeable that the CpG substrate used in the paper which reported a strong effect of the CXXC domain on the HM/UM specificity of DNMT1 (27) contained a CpG site with flanks among the most strongly affected ones. In contrast, the flanks of the CpG site in the substrate used in the paper which did not report strong effects of the CXXC domain (19) were among the moderately affected ones (Supplemental Figure S11). Hence, the different substrates used in both studies can partially explain their diverging findings. Another difference between both studies was that a truncated DNMT1 was used in one of them and full-length DNMT1 in the other, which may affect the domain movements.
Furthermore, we investigated the specificity of the CXXC mutant on the patterned long substrate. Methylation kinetics were determined (Supplemental Table S4, Supplemental Figure S12A) and the methylation rates of HM and UM sites were extracted and compared with the corresponding [...] Figure S9). The overall methylation rates of HM and UM sites on the patterned substrate revealed a 132-fold ratio of HM/UM for the CXXC mutant (Figure 4C), which is almost identical to the value observed with WT DNMT1. Hence, while the CXXC mutant DNMT1 showed a significant, about 2.5-fold reduced HM/UM specificity on single-site substrates (Figure 4A), no noticeable effect was detected on the long substrate with methylation pattern (ratio HM/UM 145 for WT and 132 for CXXC).

Comparison of DNMT1 specificity profiles with cellular DNA methylation levels
Finally, we aimed to investigate if the newly determined DNMT1 methylation preferences affect cellular DNA methylation patterns. For this, we re-analyzed published genome-wide DNA methylation data from various mouse ES cell lines with deletions of DNMTs and TETs (Figure 5A) (34, 35). In our previous work, we showed that the NNCGNN methylation profiles of the WT and DKO cells are strongly correlated with DNMT1 HM flanking sequence preferences (26). We now extended this analysis to the UM profile and included the following mouse ES cell lines [...]. Strikingly, in all data sets, the activity profiles of DNMT1 were strongly correlated with methylation patterns of cell lines containing DNMT1 (Figure 5B and C). Statistical analysis based on 20 data sets with randomized genomic methylation profiles revealed P-values of <5 × 10⁻¹² in each case for the correlation of the genomic methylation profiles with the DNMT1 HM and UM profiles occurring by chance. Interestingly, in all these cases the correlation was stronger with the UM profile (Figure 5B). Statistical analysis using the same approach revealed P-values for the better correlation of cellular methylation patterns with the DNMT1 UM profile (when compared with HM) of 0.021, 4.4 × 10⁻³, 8.5 × 10⁻³ and 7.7 × 10⁻³ for WT, DKO, 5KO and 6KO cells, respectively. The high significance of this effect is also

DISCUSSION
The specificity of DNMT1 for hemimethylated DNA is a central feature of the inheritance of DNA methylation and its function as a heritable epigenetic signal. We investigated the flanking sequence dependence of this specificity in a comprehensive manner using Deep Enzymology experiments with HM, OH and UM substrates investigating CpG sites in all possible NNCGNN flanks. Methylation kinetics of different substrates were conducted in a competitive setting and they provided sufficient kinetic resolution. Experiments were conducted in independent repeats using different incubation times, enzyme concentrations and also mixtures to extract a set of 768 rate constants describing the activity of DNMT1 on all 256 NNCGNN sites in a UM, OH and HM context. Flanking sequence preferences of DNMT1 on HM and UM substrates are related, but distinct from preferences of other DNMT and TET enzymes (Supplemental Figure S13, based on additional data taken from (29, 31-33)). Our data show that the HM/UM specificity of DNMT1 is around 100-fold, but it varies about 10-fold with the flanking sequence. Our data are in general agreement with previous findings (12-20), but they provide a global picture including the differences in HM/UM specificity between different CpG sites.
This specificity is comparable on single CpG site substrates and long DNA molecules containing patterns of methylated and unmethylated CpG sites. The relative HM/UM specificity of DNMT1 was not influenced by the presence of additional CpG sites on the same DNA molecule or their methylation state. However, it was slightly enhanced on the longer DNA substrate with more target sites than on single-site substrates. This observation can be explained as follows: in the methylation of single-site substrates, product dissociation in multiple turnovers affects the overall rates, but this step does not contribute to the HM specificity. In contrast, on long multi-site substrates DNMT1 can slide along the DNA without dissociation after one methylation event, hence the overall specificity can be higher.
A 100-fold preference for HM DNA corresponds to a lowering of the transition state energy of the methylation reaction (ΔG#) by about 11 kJ/mol. This is a remarkable property given that the preferred target site is defined by the mere presence of a single methyl group. How can this strong discrimination be explained? The solvent accessible surface of a pyrimidine C5-methyl group in DNA is 30.3 Å² (36) and the energy associated with the burial of hydrophobic surface area is around 60.8 J/mol per Å² (37). Hence, the interaction of the C5-methyl group with the hydrophobic pocket in the active site of DNMT1 is expected to contribute about 1.9 kJ/mol to ΔG#, far too little to explain the 100-fold difference in reaction rates. The only mechanism strong enough to translate the presence of a single methyl group into rate enhancements of 100-fold is steric repulsion. Hence, we propose a model in which DNMT1 binds to UM substrates in an inactive conformation. The presence of the 5mC methyl group creates a steric overlap and pushes the conformation of the complex into an active state from which base flipping and catalysis occur. Future studies will show if this hypothetical inactive binding mode of UM substrates can be structurally identified and characterized.
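The energetic estimate above can be retraced with a short calculation based on the relation ΔΔG# = RT ln(rate ratio). The temperature of 298.15 K is an assumption, since no temperature is stated for this estimate; the surface area and the burial energy are the values quoted in the text, and all variable names are illustrative.

```python
import math

R = 8.314462618          # gas constant in J/(mol K)
T = 298.15               # assumed temperature in K (not stated in the text)

# Transition state stabilisation needed for a 100-fold rate preference.
rate_ratio = 100.0
ddg_required_kj = R * T * math.log(rate_ratio) / 1000.0

# Hydrophobic contribution of burying one C5-methyl group (values quoted in the text).
methyl_surface_a2 = 30.3      # solvent accessible surface in square angstrom
burial_energy_j = 60.8        # J/mol per square angstrom
ddg_hydrophobic_kj = methyl_surface_a2 * burial_energy_j / 1000.0

# Rate enhancement that the hydrophobic contribution alone could account for.
rate_from_hydrophobic = math.exp(ddg_hydrophobic_kj * 1000.0 / (R * T))

print(f"required: {ddg_required_kj:.1f} kJ/mol")
print(f"hydrophobic burial: {ddg_hydrophobic_kj:.2f} kJ/mol")
print(f"rate factor from burial alone: {rate_from_hydrophobic:.1f}-fold")
```

Under these assumptions the burial of the methyl group accounts for only about a 2-fold rate difference, compared with the roughly 100-fold preference observed, which illustrates why an additional mechanism is proposed.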
An HM/UM specificity of 100-fold means that on average every hundredth unmethylated CpG site the enzyme encounters will aberrantly be methylated. We show that this preference is sufficient to copy an existing methylation pattern on a long DNA molecule with fairly high accuracy in vitro. In vivo, DNMT1 may transfer only very few methyl groups into a typical unmethylated CGI, which afterwards can be easily removed by TET enzymes found at CGIs (22) to keep them unmethylated. However, the 100-fold HM/UM preference is not sufficient to copy the exact methylation state of all 56 million CpG sites in the diploid human genome. Hence, the maintenance of the DNA methylation pattern also relies on crosstalk with other chromatin modifications as shown previously (2, 38-40).
The role of the DNMT1 CXXC domain in the specificity of DNMT1 was unclear, due to conflicting experimental data (19, 27). We show here that the CXXC domain has a moderate, flanking sequence dependent contribution to HM/UM specificity on single CpG site substrates, which is about 2.5-fold on average. Flanking sequence effects could partially resolve the diverging literature data. However, the effect of the CXXC domain on the HM/UM specificity was only detectable with the single CpG substrates, but not on long DNA substrates with multiple CpG sites. This finding suggests a model in which DNA entering the DNA binding tunnel of DNMT1 encounters the CXXC domain, allowing CXXC to prevent the binding of DNA containing UM CpG sites. After the initial binding and the conformational changes towards a closed conformation (26), DNMT1 can move along the DNA and methylate several sites in a processive manner (16, 17, 26). During this reaction, the DNA presumably stays in the central binding tunnel and does not come in contact with the CXXC domain again, explaining the lack of influence of the CXXC domain on the specificity of DNMT1 on long multi-site DNA substrates.
Inhibition of DNMT1 on DNA substrates containing hemihydroxymethylated CpG sites has been discussed as a potential mechanism for passive DNA demethylation. To uncover the fundamental parameters determining this process, we also determined the HM/OH preference of DNMT1. We observed that it is flanking sequence dependent and on average only 13-fold. In fact, the most preferred OH sites are methylated at similar rates as the least preferred HM sites. This suggests that the efficiency of passive DNA demethylation of hydroxymethylated DNA by inhibition of DNMT1 depends on the sequence context, and in many sequence contexts it is expected not to be very efficient. This finding suggests that mCpG/hmCpG dyads should exist in genomic DNA. Regarding the pathways of DNA demethylation, based on our data, TET mediated oxidation of 5-methylcytosine to higher oxidation forms followed by their removal catalyzed by TDG (11) is likely to play a more important role than passive loss of DNA methylation by inhibition of DNMT1 on hemihydroxymethylated CpG dyads in many sequence contexts.
Finally, we observed that flanking sequence preferences of DNMT1 on unmethylated substrates partially differ from preferences on HM and OH substrates. Comparison of genomic methylation patterns from mouse ES cell lines with various deletions of DNMTs and TETs with our data revealed that the UM specificity profile of DNMT1 is most related to cellular methylation patterns of cells containing DNMT1 as the only active DNMT. We observed in three unrelated cell lines in which DNMT3A and DNMT3B are deleted but DNMT1 is still present that their residual DNA methylation is more correlated with the DNMT1 UM than HM profile, indicating that de novo activity of DNMT1 is needed to compensate for the loss of methylation caused by the KO of the DNMT3 enzymes. Similar effects, albeit to a weaker extent, were also observed in WT ES cells. These observations suggest that the de novo methylation activity of DNMT1 on unmethylated DNA is the limiting reaction for DNA methylation in DNMT1-only cells, but it also affects cellular methylation patterns in WT cells, in agreement with recent papers providing evidence for this type of activity in mouse ES cells (35, 41). Based on our data, this effect represents a fundamental principle affecting cellular methylation patterns, but its detailed consequences require further study.
Our findings illustrate how detailed flanking sequence preference analysis can reveal footprints of DNMT activity, providing novel information about the function of DNMTs in living cells. In general, the rate of DNA methylation loss at particular CpG sites in a defined genomic locus during cell division is expected to depend on the sequence preferences of the DNMTs and TETs and their local activities at the genomic locus, which are determined by their global expression levels, regulation, and the local targeting efficiency of each enzyme. Further studies will be needed to determine the local activities of all DNMTs and TETs at given genomic loci, which should ultimately allow DNA methylation dynamics to be modeled at CpG site resolution.
Sequences extracted from the NGS kinetic raw data generated in this study are available at DaRUS under https://doi.org/10.18419/darus-3334. Data Set 1, compiling the methylation rates of all 256 NNCGNN sequences in HM, OH and UM context as well as their corresponding SEM values, is available as an attachment to this paper and at DaRUS under https://doi.org/10.18419/darus-3334. The biochemical data underlying this article are available in the article and in its online supplementary material. All other data are available from the corresponding author upon reasonable request.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.