Chemoenzymatic labeling of DNA methylation patterns for single-molecule epigenetic mapping

Abstract DNA methylation, specifically, methylation of cytosine (C) nucleotides at the 5-carbon position (5-mC), is the most studied and significant epigenetic modification. Here we developed a chemoenzymatic procedure to fluorescently label non-methylated cytosines in CpG context, allowing epigenetic profiling of single DNA molecules spanning hundreds of thousands of base pairs. We used a CpG methyltransferase with a synthetic S-adenosyl-l-methionine cofactor analog to transfer an azide to cytosines instead of the natural methyl group. A fluorophore was then clicked onto the DNA, reporting on the amount and position of non-methylated CpGs. We found that labeling efficiency was increased up to 2-fold by the addition of a nucleosidase, presumably by degrading the inactive by-product of the cofactor after labeling, preventing its inhibitory effect. We used the method to determine the decline in global DNA methylation in a chronic lymphocytic leukemia patient and then performed whole-genome methylation mapping of the model plant Arabidopsis thaliana. Our genome maps show high concordance with published bisulfite sequencing methylation maps. Although mapping resolution is limited by optical detection to 500–1000 bp, the labeled DNA molecules produced by this approach are hundreds of thousands of base pairs long, allowing access to long repetitive and structurally variable genomic regions.


INTRODUCTION
DNA methylation is an epigenetic mark that plays a major regulatory role in transcription, gene regulation, and disease. 5-methylcytosine (5-mC) is conserved among plants and mammals, and its precise genomic patterns are crucial for development. In mammals, 5-mC occurs most often in the context of CpG dinucleotides and, in the human genome, most CpGs are highly methylated (between 70 and 80%). However, in specific locations, mainly high-density CpG islands (CGI), they remain mostly nonmethylated. The majority of CGIs (∼70%) are located at gene promoters and the methylation status of these regions is known to regulate gene expression (1)(2)(3). In cancer, largescale changes in methylation levels are observed and can include both genome-wide hypomethylation as well as more localized hypermethylation, mainly of tumor suppressor genes that are often silenced in cancer (4)(5)(6)(7)(8).
Methylation is established by a diverse family of methyltransferases (MTases). These enzymes utilize S-adenosyl-Lmethionine (AdoMet) as the methyl donor, forming methylated DNA and S-adenosyl-L-homocysteine (AdoHcy) as a by-product (9,10). Various approaches have been developed to specifically profile cytosine DNA methylation, both on a global level and at base-pair resolution. The gold standard technique is bisulfite sequencing (called BSseq (11)) or methylC-seq (12) depending on the methods used), which relies on chemical conversion of unmodified cytosines to uracil, while leaving methylated cytosines unconverted (11,12). After PCR amplification and sequencing, the originally methylated cytosines are read as C while the unmodified cytosines are read as T. However, this method can suffer from several limitations, such as high DNA degradation and potential biases due to the amplification process. Short read sequencing also adds several drawbacks and mainly the limitation in characterizing large variable and repetitive regions, as well as population averaging of the data which masks cell-to-cell variation (13). Singlecell bisulfite sequencing may profile individual methylomes (14,15) but is still limited in characterizing large variable and repetitive regions.
Whole-genome epigenetic profiling by optical genome mapping (Bionano Genomics Inc.) is a new addition to the epigenetic mapping toolbox. This technology is based on stretching long genomic DNA fragments for optical imaging. The DNA is labeled with two colors, one is used for aligning the molecules to the reference genome, and the other reports on the epigenetic content of the molecules (22)(23)(24). This technique outputs extremely long, singlemolecule data (N50-200 kb) that allows profiling of large variable and repetitive regions (25).
Our group developed Reduced representation Optical Methylation mapping (ROM), as a method for the labeling and detection of non-methylated CpG sites using the bacterial MTase, M.TaqI, and a cofactor analogue (23). The non-methylation labeling scheme relies on the transfer of a fluorescent molecule from a synthetically modified form of the native methylation cofactor AdoMet. The enzyme is 'tricked' into transferring the extended chemical group with a fluorophore instead of the natural methyl group, resulting in fluorescence at non-methylated CpG sites. The long-read methylation data allowed us to assemble high coverage human methylomes and analyze the methylation status of promoters and their distal enhancers, simultaneously imaged on the same long DNA molecules (26). However, M.TaqI (recognition site: TCGA) only samples about 6% of human CpG sites and thus can fail to capture vital epigenetic information.
Here, we present a chemoenzymatic labeling approach for the detection of all non-methylated CpGs as a means for global methylation quantification and for genomic methylation mapping. This is achieved by utilizing the CpGspecific MTase M.SssI from the bacteria Spiroplasma sp. strain MQ-1 that naturally transfers a methyl group to the fifth position of cytosines in CpG dinucleotides (27,28). While M.SssI has the potential to label all non-methylated CpGs, it is not able to utilize modified AdoMet cofactors. However, an engineered variant of M.SssI (Q142A/N370A) is able to transfer an azide group to non-methylated CpG sites by processing an azide-modified cofactor AdoY-nAzide (Ado-6-azide). This engineered M.SssI (eM.SssI) was used by Kriukiene et al. for non-methylation sequencing (29). However, the affinity capture on streptavidinmodified beads showed a 20-30% capture efficiency, indicating poor labeling yield which may prove problematic for single-molecule optical mapping. To increase labeling efficiency, we employed the enzyme 5-methylthioadenosine/S-adenosylhomocysteine nucleosidase (MTAN, EC 3.2.2.9) that catalyzes the hydrolysis of the glycosidic bond in AdoHcy. Thus, MTAN degrades the inactive cofactor by-product, AdoHcy, formed from the natural cofactor AdoMet or synthetic cofactor analogues, like AdoYnAzide ( Figure 1A). In general, AdoHcy is a well-known product inhibitor for MTases (30). Lowering its concentration by the addition of MTAN effectively drives the reaction towards increased labeling efficiency ( Figure 1A). The azide-modified DNA can then be covalently labeled by a dibenzocyclooctyne-cy5 (DBCO-cy5) fluorophore in a strain-promoted azide-alkyne cycloaddition (SPAAC, copper-free click chemistry reaction), to specifically label non-methylated CpG sites. Thus, the increased efficiency of the eM.SssI/MTAN combination was used to generate optical maps that present the genome-wide epigenetic status of CpG sites with single-molecule sensitivity.
Here, we apply this labeling strategy to human peripheral blood mononuclear cells (PBMCs), emphasizing the ability to quantify the methylation status of CpG sites in healthy vs. cancer patients with high sensitivity. We further demonstrate whole-genome methylation mapping of the model plant A. thaliana. Our approach allows simultaneous genetic and epigenetic characterization of long stretches of DNA at the single-molecule level, and potentially permits studying variations between single cells (26).

Human subjects
The healthy donor sample used in this study was collected with informed consent for research use and approved by the Tel-Aviv University and Meir medical center ethical Review Boards, in accordance with the declaration of Helsinki. The PBMCs from a CLL donor were purchased from BioServe.

High-molecular-weight DNA extraction
Plants. High molecular weight (HMW) DNA was extracted for optical mapping from 1 g of 6-8 weeks old whole plant tissue, before flowering using the BioNano Genomics Fix'n'Chop protocol with some modifications. Briefly, no formaldehyde was used, chopping was done only with a lab blender and no razor blade was used, 7.5% of 2-mercaptoethanol was used. Following the final wash, the nuclei pellet was resuspended in cell suspension buffer (CHEF mammalian DNA extraction kit, Bio-Rad) and incubated at 43 • C for 10 min. 2% low melting agarose (Clean-Cut agarose, Bio-Rad) was melted at 70 • C followed by incubation at 43 • C for 10 min. Melted agarose was added to the resuspended cells at a final concentration of 0.75% and mixed gently. The mixture was immediately cast to a plug mold and plugs were incubated at 4 • C until solidified.
Human subjects. For human peripheral blood mononuclear cells (PBMCs), peripheral blood of a healthy donor was isolated by density gradient centrifugation using Ficoll Paque Plus (GE Healthcare) according to manufacturer's instructions. Plugs were prepared according to Plug Lysis protocol (BioNano Genomics). CLL sample was purchased as separated PBMCs from a CLL patient and was thawed at 37 • C water bath immediately prior to use. For both healthy and CLL samples, 1 × 10 6 PBMCs were washed twice with PBS and mixed with 2% low melting agarose as described above.
All plugs, from human or plant origin, were incubated twice (2 h incubation followed by an overnight incubation) at 50 • C with freshly prepared 167 l Proteinase K (Qiagen) in 2.5 ml lysis buffer (BioNano Genomics) with occasional shaking. Next, plugs were incubated with 50 l RNase (Qiagen) in 2.5 ml TE (pH 8) for 1 hour at 37 • C with occasional shaking. Plugs were washed three times by adding 10 ml wash buffer (10 mM Tris, pH 8, 50 mM EDTA), manually shaking for 10 seconds, and discarding the wash buffer before adding the next wash. Plugs were then washed four times by adding 10 ml wash buffer and shaking for 15 min on a horizontal platform mixer at 180 rpm at room temperature. Following washes, plugs were stored at 4 • C in wash buffer or used for labeling. In order to extract high molecular weight DNA, plugs washed three times in TE pH 8 with shaking on a horizontal platform as explained, were melted for 2 min at 70 • C, followed by 5 min incubation at 43 • C. Next, 0.4 units of Gelase (Thermo Fisher) were added and the mixture was incubated for 45 min. Viscous DNA was gently pipetted and incubated at room temperature overnight in order to achieve homogeneity. DNA concentration was determined using Qubit BR dsDNS assay.

Genetic barcoding and non-methylated CpG labeling
Genetic barcodes were produced by nick translation, a modified version of Irys prep NLRS protocol (Bionano Genomics). Briefly, 300 ng DNA were nicked with 10 units of Nt.BspQI (NEB) at specific sequence motifs (GCTCTTCN ∧ ) in the presence of 1 l 10× buffer 3.1 (NEB) for 2 h at 50 • C and ultra-pure water to a volume of 10 l. For DNA labeling, 15 units of Taq DNA polymerase were incubated with 200 nM of dATP dGTP dCTP (Sigma) and atto-532-dUTP (Jena Bioscience) in the presence of 1.5 l of 10× thermopol buffer and ultra-pure water to a volume of 15 l, the reaction was incubated at 72 • C for 1 h. Nicks were then ligated for 30 min at 37 • C using 4 units of Taq DNA ligase (NEB) in the presence of 0.5 l thermopol buffer, 20 M dNTPs (Sigma) 1 mM NAD+ (NEB) and water to a volume of 20 l. For nonmethylated CpG labeling (mTAG labeling) 300 ng of nicklabeled DNA were incubated overnight at 37 • C with 3.2 M of the CpG-specific DNA cytosine-C5 MTase M.SssI double mutant Q142A/N370A (eM.SssI) (29) in the presence of 80 M cofactor analogue AdoYnAzide (31) and 3 l 10X M.SssI reaction buffer (Thermo Fisher/10 mM Tris-HCl, 50 mM NaCl, 1 mM DTT, pH 7.9), in a total reaction volume of 30 l. The reaction was supplemented with 2 M 5 -methylthioadenosine/S-adenosylhomocysteine nucleosidase (MTAN) and after 2 h of incubation, a spikein of 0.8 M eM.SssI and 2 M MTAN (32, 33) was added to the reaction to further improve labeling efficiency. Next, 60 g of Proteinase K (PK) were added and incubated for 2 h at 45 • C. Following PK digestion, 700 M of dibenzocyclooctyne-cy5 (DBCO-cy5) were added and incubated overnight at 37 • C. The Dual-labeled DNA was then embedded in low melting agarose gel plugs and washed in 10 ml wash buffer (20 mM Tris-HCl, 50 mM EDTA, pH 8) for 15 min 5 times and twice with TE pH 8. Plugs were melted at 70 • C for 2 min and incubated at 43 • C for 5 min. 2 l of betaagarase (Thermo Fisher Scientific) were added for agarose digestion and the reaction was incubated at 43 • C for 45 min. Dual-labeled DNA was stained with DNA stain (BioNano Genomics) according to the Irys Prep NLRS protocol with the addition of 25 mM Tris pH 8 and 30 mM NaCl. DNA concentration was measured by Qubit HS dsDNA assay.
To test non-methylation labeling efficiency, DNA was labelled with eM.SssI as described above, with or without MTAN. All DNA samples were purified from excess fluorophores, then applied on a custom epoxy-covered multiwell slide and imaged using a commercial slide scanner, as described in the work of Margalit et al. (34). and incubation was continued at 37 • C for 1 h. The reaction mixtures were treated with proteinase K (1 l, 20 g/l), incubated at 53 • C for 30 min and 6× loading buffer (10 l, 0.25% bromphenol blue, 30% glycerol) was added. Samples (6 l) were analyzed by agarose gel (1%) electrophoresis in the presence of GelRed (0.1 l stock solution/ml gel). DNA bands were visualized with a UV transilluminator (312 nm) and documented with a CCD camera equipped with a filter (540 ± 50 nm). . The reaction mixtures were incubated at 37 • C for 1 h. Proteinase K (6 l, 20 g/l) was added to both reaction mixtures and incubation continued at 45 • C for 2 h, followed by heat inactivation at 80 • C for 3 h.
All reaction mixtures were purified using NAP-5 columns following the instructions of the manufacturer. Equilibration (10 ml) and elution (500 l) was done with nuclease P1 buffer (0.2 mM ZnSO 4 , 20 mM NaOAc, pH 5.3). Purified duplex ODN were supplemented with nuclease P1 (3 l, 10 U/l) and incubated at 37 • C for 2 h. CIAP (1.5 l, 0.6 U/l), CIAP buffer (60 l, 500 mM Tris-HCl, 10 mM MgCl 2 , pH 9.0) and deionized water were added to a total volume of 600 l and samples incubated at 37 • C overnight. Nucleosides were separated by reverse-phase HPLC (Prontosil C-18 AQ column, 250 × 4. Peaks of the nucleosides at 260 nm were integrated using Empower 2 software and areas were normalized by dividing them by the respective extinction coefficients at 260 nm (dC: 7009 mol l -1 cm -1 ; dG: 11715 mol l -1 cm -1 ; dT: 8902 mol l -1 cm -1 ; dA: 15663 mol l -1 cm -1 ; 5-azido-dC = 5mdC: 5435 mol l -1 cm -1 ). Normalized areas of dT were divided by the number of dT residues in the duplex ODN (20 residues) to obtain a normalized area for one nucleoside. Normalized areas for the other nucleosides were divided by this value to give experimental amounts for all nucleosides (see Supplementary Table S1). Yields for 5-azido-dC were calculated by dividing the amount of 5-azido-dC by the sum of dC and 5azido-dC multiplied by 2 (four dC/two target dC). Yields for 5-Cy5-dC were calculated from the decrease of 5-azido-dC.
Glass slide preparation and quality-control imaging 22 × 22 mm 2 glass cover-slips were cleaned for at least 7 hours to overnight by incubation in a freshly made 2:1 (v/v) mixture of 70% nitric acid and 37% hydrochloric acid. After extensive washing with ultrapure water (18 M ) and then with ethanol, coverslips were dried under a stream of nitrogen. Dry cover-slips were immersed in a premixed solution containing 750 l N-trimethoxysilylpropyl-N,N,N-trimethylammonium chloride and 200 l of vinyltrimethoxysilane in 300 ml ultrapure water and incubated overnight at 65 • C. After incubation, cover-slips were thoroughly washed with ultrapure water and ethanol and stored at 4 • C in ethanol. The silane solution was freshly made and thoroughly mixed before the cover-slips were introduced into the mixture. Stored coverslips were normally used within 2 weeks.
For quality control, samples were imaged to evaluate DNA length and degree of labeling. Samples were diluted 1:100-1:300 in TE with DTT buffer (10 mM Tris pH 8, 1mM EDTA, 200 mM DTT, Sigma) and were stained with 130 nM YOYO-1 (Invitrogen) DNA intercalating dye. DNA molecules were stretched by placing 8 l of the solution at the interface of an activated coverslip placed on a standard microscope slide. The extended DNA molecules were imaged with a fluorescence microscope (TILL Photonics GmbH) using an Olympus UPlanApo 100× 1.3 NA oil immersion objective. Each image was composed of three colors, the YOYO-1, atto-532 and the Cy5 fluorophores, and was therefore imaged with the appropriate filters (485/20, 537/26 and 650/13 bandpass excitation filters, 525/30, 578/16 and 684/24 bandpass emission filters, for YOYO-1 and Cy5, respectively). Images were acquired by a DU888 EMCCD (Andor technologies) with an EM gain setting of 300 and exposure times of 100 for YOYO-1 and 2000 ms for atto-532 and Cy5.

Global methylation quantification
Differences in global fluorescence intensity following labeling with and without MTAN, and between genomic samples were analyzed by an in house software (35,36) (https:// github.com/ebensteinLab/Tiff16 Analyzer). The code measures the total length of each molecule as well as the number and intensity of CpG labels detected along the DNA molecules. Quantification of non-methylated CpG was done by dividing the total intensity of the CpG label by the total length of the DNA (calculated as intensity of methylation signal/total DNA length in kbp). Similar analysis was performed on images of DNA stretched in nanochannels from CLL patient and healthy donor. In total we sampled 5.585 Gb from CLL patient and 330 Mb from the healthy donor and a total of 9840 field of view (FOV) from each donor. The two samples have a ∼17-fold difference in the amount of sampled DNA and therefore we have confirmed that this fact does not create a bias in the interpretation of the results. A running average was calculated for each sample by random selection of molecules from the dataset (Supplementary Figure S7). We confirm that the average methylation level remains stable even when sampling only 100 Mbp of genomic DNA, and therefore both samples were sufficiently oversampled to provide reliable global methylation levels.

Optical mapping and analysis
Loading of DNA in nanochannels and imaging were performed using an Irys instrument (BioNano Genomics). Detection of imaged molecules and fluorescent labels along each molecule was performed by AutoDetect (version 2.1.4, BioNano Genomics). Alignment to the reference genome was performed using IrysView software (version 2.3, BioNano Genomics). Arabidopsis accession col-0 was run on a single chip on the Irys platform (BioNano Genomics), for up to 120 cycles to collect up to 89 Gb of quality filtered data. Molecule files (.bnx) were loaded into IrysView, quality filtered (>100 kb length, >2.75 signal/noise ratio) and aligned to the reference TAIR10 (https://www.arabidopsis. org/download/index-auto.jsp?dir=%2Fdownload files% 2FGenes%2FTAIR10 genome release). Reference genome was converted into .cmap format and used to align the detected molecules using standardized parameters (-nosplit -maxEnd 50 -stdout -stderr -usecolor 1). Alignments were analyzed for genome coverage, and gaps using the .xmap output file.
Following alignment, data was uploaded to IrysExtract (37), (https://www.nanobiophotonix.sites.tau.ac.il) to generate an average intensity profile of fluorescence intensity across the genome. We used all molecules aligned with a confidence equal or higher than 8 (P ≤ 10 -8 ) and that had at least 70% of each molecule aligned to the reference. All regions in the genome with less than 15× coverage (3% of the genome), were excluded from the data. The analysis pipeline results in a genome-wide intensity bedgraph file at 500 bp resolution. The file allows further analysis and visualization in a genome browser as seen in Figures 4 and 5.

Comparison of optical methylation mapping to bisulfite sequencing
Genome-wide bisulfite sequencing data (GSE43857) (38) was processed to generate a bedgraph track of nonmethylated CpGs represented as a "methylation score." To calculate the methylation score, the non-methylated CpGs were padded ±500 bp and summed over 1 kb bins to generate the bedgraph file. The score in each bin was then divided by the maximum score, setting the data between 0-1. The genome was then divided to 10 kb windows and the mean score from each method (methylation score for the bisulfite data and relative intensity for the optical mapping data) was calculated for each window, K-S test was performed to test normality and since the data do not normally distribute, Spearman correlation analysis was performed using SPSS (IBM Corp. Released 2016. IBM SPSS Statistics for Windows, Version 24.0), to determine the linear dependence between bisulfite sequencing and methylation mapping.

Analysis of CpG methylation distribution across gene body and siRNA
pol-iv-dependent 24 nt short interfering RNAs (siRNA) was retrieved from the previously published data (39). miRNA primary transcript loci were extracted from miR-Base ath.gff3 file (http://www.mirbase.org/ftp.shtml). Arabidopsis TAIR10 TAS loci were obtained from the ta-siRNA database (40). The bed files of the 24nt siRNA loci, miRNA precursor loci, and TAS loci were used to measure the non-methylated CpGs from optical mapping and genome-wide bisulfite sequencing as detailed below (Supplementary File 5).
First, a file containing all CpGs was created from TAIR10 FASTA by knickers (version 1.5.5 Bionano Genomics). Then, a non-methylated CpG track from genomewide bisulfite sequencing data of A. thaliana was created using BEDTools (version 2.25.0) subtract (41 in the non-methylated CpG track was expanded by 500 bp on both sides to adjust to the optical mapping resolution. Genome coverage of non-methylated CpG was generated by BEDTools genomecov to receive a non-methylated CpG bedgraph. Finally, this file was uploaded to deepTools (42) in addition to the average label intensity from optical mapping and the clusters bed file. Using computeMatrix in reference point mode, we computed the mean value for each bin from all regions 10 kb upstream and downstream from the clusters. All clusters were set to position 0, and median coverage for the neighboring regions in 1 kb bins was plotted.
For gene body analysis, we compared the average label intensity of non-methylated CpGs from optical mapping and genome-wide bisulfite sequencing data with a bed file containing all A. thaliana genes (TAIR10) (supplementary file 6). Using computeMatrix in scale-region mode, all genes were scaled to the same length in 100 bp bins. We added 1500 bp upstream and downstream to the TSS and TES (respectively) and computed the mean value for each bin.

Efficient labeling of non-methylated CpGs
CpG methylation is a binary state and on the singlemolecule level a site can be either methylated or non methylated. Neverthless, methylation level is reported on a scale of 0-1 to account for the average methylation of a CpG locus in a population of cells. In order to accurately report on such average population levels, the methylation status of each CpG must be sampled over multiple DNA molecules containing this locus. Single-molecule epigenomic mapping requires highly efficient labeling in order to extract reliable information from the studied sample. We hypothesized that during DNA alkylation by eM.SssI, the cofactor byproduct AdoHcy could reduce the overall DNA alkylation efficiency by occupying the cofactor binding pocket and preventing additional alkylation cycles ( Figure 1A). Thus, we sought to enzymatically degrade this by-product by the addition of recombinant MTAN (43). To test the effect of MTAN on the overall labeling efficiency, we labeled non-methylated DNA using eM.SssI, AdoYnAzide, and DBCO-cy5, with or without MTAN ( Figure 1B and Supplementary Figure S1). Figure 1C shows the relative fluorescence intensity of DNA labeled with or without MTAN. Average labeling intensity increased two-fold with the addition of MTAN to the reaction ( Figure 1C, Supplementary Figure S2). The relative labeling efficiency was further tested and quantified with and without MTAN by a multi-well slide fluorescence assay (Supplementary Figure  S3) as well as a modification-restriction assay (Supplementary Figure S4) which showed a similar improvement in labeling in the presence of MTAN. A quantitative HPLC efficiency assay could only be performed on a short duplex DNA. In our case, a 24 bp ODN containing one CpG site on each strand (Supplementary Figure S5, Supplementary  Table S1). The alkylation efficiency of the short duplex was measured to be 67% and increased to 80% with addition of MTAN. The high initial alkylation efficiency could result from the higher modification yields with short duplex DNA compared to native genomic DNA leading to a smaller increase with MTAN. We note that the short ODN contains much higher density of labeling sites (1 to 24 bp in the ODN vs. 1 to ∼150 bp for genomic DNA).
Since the optical resolution (∼500-1000 bp) is lower than the density of CpG sites in CpG-rich regions in the genome, we verified that the fluorescence intensity increases linearily with the increase in labeled CpG sites ( Supplementary Figure S6). Thus, although a single fluorescent spot may contain several CpG sites that cannot be resolved, the intensity of the spot reports on the density of labeled CpG sites.

Global methylation quantification
One of the hallmarks of cancer is a global reduction in genomic methylation levels (44). As a first step in evaluating our optical mapping technique for epigenetic profiling, we quantified the genome-wide methylation level of PBMCs from a single patient diagnosed with chronic lymphocytic leukemia (CLL) and a healthy donor. Both healthy and patient DNA was stretched in nanochannel array chips, imaged, and the fluorescence from the non-methylated CpG labels was detected (Figure 2A). Fluorescence intensity along the labeled DNA molecules was automatically measured to determine the relative difference in labeling (36). The relative global methylation level was measured by summing the overall red intensity along the DNA molecules (i.e. non-methylated cytosines in CpG context), normalized to the total DNA length measured for each sample ( Figure  2B). When comparing genomic DNA from a patient and a healthy donor, the cancer cells show a global reduction in methylation levels indicated by a higher fluorescence signal along the DNA ( Figure 2B). Specifically, the signal intensity from non-methylated CpGs increased by 1.7-fold in the CLL sample compared to the healthy control, indicating global hypomethylation in the CLL sample as previously reported (Supplementary Figure S7, Supplementary Files 2 and 3) (45). Incorporation of our novel chemoenzymatic labeling method into single-molecule optical mapping allows us to distinguish global DNA methylation differences between samples, opening up the potential application of this technique for DNA methylation research and medical diagnostics.

Genomic mapping of DNA methylation
We next attempted to map the CpG methylation profile in the well-characterized genome of the model plant, Arabidopsis thaliana. This genome is composed of five chromosomes with a total size of ∼135 Mb. The relatively small genome allowed us to test mapping feasibility while avoiding the cost and computational complexity of analyzing the human genome. In A. thaliana DNA methylation commonly occurs in all sequence contexts (CG, CHG and CHH, where H = A, T or C) (10,12) and the presence of methylation at transposons and repeats, as well as some protein-coding genes, is associated with gene silencing and genome stability (12,38,46). A. thaliana DNA was extracted, non-methylated CpGs were labeled using our modified M.SssI system, and genomic mapping was performed by using nickase labeling to create a barcode of specific sequence motifs. The dual-labeled DNA molecules were then electro-kinetically forced into nanochannels and imaged on the Bionano Genomics (Inc) Irys system, allowing the detection of fluorescent labeling patterns ( Figure 3A and B). We detected 34 866 DNA molecules longer than 100 kb and 28,917 were successfully aligned to the TAIR10 A. thaliana reference genome (Supplementary Figure S8). The analysis resulted in an average mapping depth of 47×, covering ∼97% of the genome with a minimum coverage of 15 molecules. The intensity profile of the epigenetic labels for each molecule was created by Irys Extract (37), followed by calculation of normalized average intensity across all molecules aligned to the same region ( Figure 3C and Supplementary Figure S9).

Optical methylation mapping is well correlated with bisulfite sequencing
To further evaluate methylation mapping, a genome-wide correlation of publicly available bisulfite sequencing data (38) with epigenetic optical mapping was calculated. Since our method captures signal from non-methylated CpGs, we present the genome-wide bisulfite sequencing data as the non-methylated fraction of CpGs. To overcome scaling and resolution differences between the two methods, each non-methylated CpG from the bisulfite sequencing data was padded ±500 bp and summed across the genome in 10 kb bins to generate a 'methylation score' that corresponds to the amount of non-methylated CpG sites in each bin (i.e. a higher score equates to more non-methylated CpG sites,). This binned data was then compared to the average intensity profile of optical epigenome mapping. First, optical data were normalized for coverage variations by dividing the summed local intensity by the local coverage to obtain a fluorescence intensity score in 1 kb bins across the genome. The normalized intensity score in each bin was then divided by the global maximum intensity (e.g. the 1 kb bin with the highest fluorescence intensity score), setting the relative intensity data between 0-1. As presented in Figure 4A, the optical mapping data nicely correlate with the genome-wide bisulfite sequencing data with a Spearman correlation coefficient of 0.745 (Supplementary File 4). In addition, a global view of the A. thaliana genome showing the levels of non-methylated CpG sites illustrates the correlation between the optical mapping and genome-wide bisulfite sequencing data ( Figure 4B). The centromeres of the A. thaliana genome are enriched in methylated CpGs and are thus seen as dips in the optical graphs at the centromeric positions (low signal denotes high methylation, Figure 4B). The lower panel of Figure 4B shows a zoomedin region illustrating the positive correlation between the two methods. The finer details in the blue track highlight the higher resolution of the genome-wide bisulfite sequencing data. Nevertheless, the long molecules observed by optical mapping (>100 kb) allowed us to assemble a reliable and highly contiguous epigenetic profile of the A. thaliana genome.

Optical patterns correlate with genome-wide bisulfite sequencing at siRNA clusters and gene bodies
In A. thaliana, DNA methylation at transposons and repeats is guided by short interfering RNAs (siRNAs) that recruit the DNA cytosine MTase, Domains Rrearranged Methyltransferase 2 (DRM2), via protein/RNA complexes (47,48). Since the 24nt siRNAs guide where DNA methylation occurs in the plant genome, including methylation in the CpG context, we checked if the CpG methylation patterns from bisulfite sequencing data correlate as expected with the optical data at sites that produce 24nt siRNAs (i.e. sites producing 24nt siRNAs should have  lower levels of non-methylated CpG sites compared to surrounding regions). Methylation scores from genome-wide bisulfite sequencing data (Supplementary File 5) and relative intensity values from optical mapping data were calculated 10 kbp upstream and downstream relative to the 24nt siRNA clusters (position 0) (39). These data were combined into a single plot representing the methylation profile around these regions ( Figure 5A). The methylation score from bisulfite sequencing data (blue) shows a sharp dip around position 0, indicating highly methylated DNA around 24nt siRNA clusters. The relative intensity values from the optical mapping data (red) correlate well with this trend, although at a lower resolution, as expected from the optical diffraction limit. As negative controls, we examined optical mapping data over primary miRNA loci and several trans-acting siRNA loci (TAS loci) which are not related to DNA methylation (Supplementary Figure S10, Supplementary File 5). As expected, the intensity profile across the miRNA precursor and TAS loci did not have a specific pattern around the cluster center.
In addition to DNA methylation in all sequence contexts at transposons and repeats, some gene bodies are methylated specifically in CpG context in the A. thaliana genome and this methylation is distributed away from the 5 region of genes (11). Thus, we also compared the methylation profiles across gene bodies from genomewide bisulfite sequencing and optical mapping (Supplementary File 6). Figure 5B illustrates the profiles across the gene bodies and 1.5 kb upstream of the transcription start site (TSS) and downstream from the transcription end site (TES). Again, the two datasets show the expected modulation in methylation along the gene, with sequencing data displaying higher resolution (as seen downstream of the TES). While lacking in resolution, optical mapping has the advantage of visualizing intact genes as well as extremely long DNA molecules that are able to capture both large-scale methylation patterns such as promoters and their distant enhancers (26), and methylation across highly variable regions that are difficult to map by short-read sequencing such as macrostelite arrays (23).

CONCLUSIONS
In this study, we use the DNA methyltransferase eM.SssI and a synthetic cofactor together with the nucleosidase MTAN to fluorescently label non-methylated CpG dinucleotides. The addition of MTAN to the reaction significantly increased the labeling efficiency of eM.SssI, making the method compatible with single-molecule analysis. Traditional bisulfite sequencing experiments require sampling a CpG site at least 30 times to accurately report on its methylation status. In order to reproduce such methylation calls by single molecule mapping, each CpG must be assessed on at leased 30 individual molecules. With a given experimental coverage of about 50× with optical mapping, we require a minimum labeling efficiency of ∼60% to accurately call methylation status. Thus, the labeling efficiency without MTAN was measured to be too low for an accurate methylation call in genomic DNA in a single experiment, while the increased efficiency with the addition of MTAN is sufficient. Application of this labeling approach to optical genome mapping enables genome-wide methylome profiling, offering long-range information that can span long variable regions and repeats (22,23). We note that this con-e92 Nucleic Acids Research, 2022, Vol. 50, No. 16 PAGE 10 OF 11 cept may be similarly utilized to map other genomic features such as other epigenetic modifications and DNA damage lesions (22,23,(49)(50)(51)(52).
Global methylation changes can be quantified by integrating the optical methylation signal for a given amount of DNA (Figure 2). Assessment of whole genome methylation levels may serve as a diagnostic tool that can distinguish between patients with healthy or malignant cells. It is long known that patients with B-cell chronic lymphocytic leukemia (CLL) experience hypermethylation in specific genes (53) while their global methylation levels decrease (45). This hypomethylation can be detected by the presented optical labeling scheme with very high sensitivity, emphasized by the distinct differences in labeling between the healthy and CLL patient ( Figure 2B).
The generation of whole-genome methylation maps by means of optical mapping was demonstrated here on the model plant A. thaliana and found to correlate well with genome-wide bisulfite sequencing data. As many plant genomes are highly repetitive, optical mapping is becoming a routine method for genomic characterization in plants (54)(55)(56). By utilizing this new approach, it is now possible to access methylation information in repetitive or otherwise challenging regions. While optical mapping resolution is limited by the optical diffraction limit to around 1 kb, we increased the effective resolving power of the technique by using fluorescence intensity in addition to genomic location. Thus, CpG-rich regions, where more than one CpG site resides within the optical spot, can now be characterized by signal intensity that reports on labeling density, i.e., the actual number of labeled CpG sites at a given fluorescent spot. Future extension of the method to the human genome may facilitate research on long range epigenetic regulation and for early diagnosis of malignant transformations related to DNA methylation.