Highly efficient single-stranded DNA ligation technique improves low-input whole-genome bisulfite sequencing by post-bisulfite adaptor tagging

Abstract
Whole-genome bisulfite sequencing (WGBS) is the current gold standard of methylome analysis. Post-bisulfite adaptor tagging (PBAT) is an increasingly popular WGBS protocol because of its high sensitivity and low bias. PBAT originally relied on two rounds of random priming for adaptor tagging of single-stranded DNA (ssDNA), which attains high efficiency but at the cost of shorter library inserts. To overcome this limitation, we developed terminal deoxyribonucleotidyl transferase (TdT)-assisted adenylate connector-mediated ssDNA (TACS) ligation as an alternative to random priming. In this method, TdT attaches adenylates to the 3′-end of input ssDNA, which are then utilized by RNA ligase as an efficient connector to the ssDNA adaptor. A protocol that uses TACS ligation instead of the second random priming step substantially increased the lengths of PBAT library fragments. Moreover, we devised a dual-library strategy that splits the input DNA to prepare two libraries with reciprocal adaptor polarity and combines them prior to sequencing. This strategy ensured an ideal base–color balance, eliminating the need for a DNA spike-in for color compensation and further improving the throughput and quality of WGBS. By adapting the above strategies to the HiSeq X Ten and NovaSeq 6000 platforms, we established a cost-effective, high-quality WGBS, which should accelerate various methylome analyses.
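The base–color argument behind the dual-library strategy can be illustrated with a toy calculation. The sketch below is not the authors' analysis. It assumes a fully unmethylated template, so that every C reads as T after bisulfite conversion, and it assumes a four-channel detection scheme in which A and C share one channel and G and T share the other. The sequence and the names `lib_a`, `lib_b`, and `channel_fractions` are hypothetical, and the sketch does not model which strand read 1 actually reports in either library.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bisulfite_convert(seq):
    """Toy conversion of a fully unmethylated strand: every C reads as T."""
    return seq.replace("C", "T")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def base_fractions(reads):
    """Fraction of each base across all reads."""
    counts = Counter("".join(reads))
    total = sum(counts.values())
    return {base: counts.get(base, 0) / total for base in "ACGT"}


def channel_fractions(fractions):
    """Assumed channel grouping (hypothetical): A+C versus G+T."""
    return {
        "A+C": fractions["A"] + fractions["C"],
        "G+T": fractions["G"] + fractions["T"],
    }


# Made-up template with equal base composition.
template = "ACGT" * 25
converted = bisulfite_convert(template)

lib_a = [converted]                      # reads resembling the converted strand
lib_b = [reverse_complement(converted)]  # reads of the opposite polarity
pooled = lib_a + lib_b                   # equal mix of the two libraries

for name, reads in (("lib_a", lib_a), ("lib_b", lib_b), ("pooled", pooled)):
    print(name, channel_fractions(base_fractions(reads)))
```

Under these assumptions each library alone is skewed toward one channel, whereas the equal mixture splits evenly between the two, which is the kind of balance that would otherwise require a spike-in.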


SUPPLEMENTARY FIGURES
Supplementary Figure S1. PBAT enables highly efficient library preparation for whole-genome bisulfite sequencing. (Left) Conventional methods that preceded PBAT first attach adaptors to both ends of DNA fragments and then perform bisulfite treatment. Because bisulfite treatment frequently cleaves DNA at random positions, however, the library molecule structure (i.e., DNA with adaptor sequences at both ends) is lost. (Right) In contrast, because adaptor tagging is performed after bisulfite treatment in the PBAT scheme, the structure of library molecules is preserved. Because bisulfite-treated DNA is single-stranded, the PBAT scheme requires an efficient method for adaptor tagging of ssDNA.
Supplementary Table S1), and 200 U of TdT (Takara Bio Inc.), by incubating at 37 °C for 2 h and then at 70 °C for 10 min. The acceptor ODN without ribotailing was prepared in the same manner, except that the reaction mixture did not contain TdT. The adenylated and ribotailed ODNs were used without further purification. The ligation reaction was performed in a 50 µL mixture that contained 1× TACS basal buffer, 100 µM ATP, 10% PEG400, 400 nM pre-adenylated donor ODN, and 400 nM acceptor ODN with or without ribotailing. For reactions with T4 RNA ligase and T4 RNA ligase 2, 40 U and 10 U of enzyme, respectively, were added; the reaction mixtures were incubated at 25 °C for 1 h, and the enzymes were heat-inactivated at 70 °C for 10 min. For reactions with Mth RNA ligase (Mth-W) and 5′AppDNA/RNA ligase (Mth-M, New England Biolabs), 50 pmol and 20 pmol of enzyme, respectively, were added; the reaction mixtures were incubated at 65 °C for 1 h, and the enzymes were heat-inactivated at 95 °C for 10 min. After the reactions, samples were analyzed by denaturing polyacrylamide gel electrophoresis on a 10% Novex TBE-Urea Gel (Invitrogen). After electrophoresis, the gel was stained with SYBR Gold gel stain (Invitrogen), and an image was obtained using a ChemiDoc system (Bio-Rad Laboratories, Hercules, CA).
Conversely, because the thermostable 5′AppDNA/RNA ligase lacks adenylation activity, only pre-adenylated donor ODN was used. Adenylation and ribotailing of donor and acceptor ODNs were achieved as described in Supplementary Figure S2, and the modified ODNs were used without further purification. The reaction was performed in 20 µL of a solution containing 1× TACS basal buffer, 250 µM ATP, 500 nM donor, 500 nM acceptor, and 40 U of T4 RNA ligase or 20 pmol of 5′AppDNA/RNA ligase. The reaction mixture was incubated at 37 °C for 2 h for T4 RNA ligase and at 65 °C for 2 h for 5′AppDNA/RNA ligase. After the reaction was terminated by incubation at 95 °C for 5 min, samples were analyzed by denaturing polyacrylamide gel electrophoresis as described in Supplementary Figure S2.
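The reaction compositions above give concentrations and volumes, whereas the thermostable ligases are specified in pmol. The following minimal sketch converts the stated concentrations into absolute amounts so that enzyme and substrate can be compared directly. The helper name `pmol_in_reaction` and the variable names are hypothetical. The input numbers are those stated in the captions, and the enzyme-to-donor ratios are derived here, not reported by the authors.

```python
def pmol_in_reaction(conc_nM, volume_uL):
    """Amount of solute in pmol.

    1 nM x 1 uL = 1 fmol, so the product is divided by 1000 to obtain pmol.
    """
    return conc_nM * volume_uL / 1000.0


# 50 uL ligation reactions: 400 nM donor ODN, 100 uM ATP.
donor_50uL = pmol_in_reaction(400, 50)
atp_50uL = pmol_in_reaction(100_000, 50)  # 100 uM expressed in nM

# 20 uL reactions: 500 nM donor ODN, 250 uM ATP.
donor_20uL = pmol_in_reaction(500, 20)
atp_20uL = pmol_in_reaction(250_000, 20)  # 250 uM expressed in nM

# Enzyme amounts stated in pmol for the thermostable ligases.
ratio_mth_w = 50 / donor_50uL        # Mth RNA ligase, 50 uL reaction
ratio_mth_m_50uL = 20 / donor_50uL   # 5'AppDNA/RNA ligase, 50 uL reaction
ratio_mth_m_20uL = 20 / donor_20uL   # 5'AppDNA/RNA ligase, 20 uL reaction

print(f"donor: {donor_50uL} pmol (50 uL), {donor_20uL} pmol (20 uL)")
print(f"ATP: {atp_50uL} pmol (50 uL), {atp_20uL} pmol (20 uL)")
print(f"enzyme:donor = {ratio_mth_w}, {ratio_mth_m_50uL}, {ratio_mth_m_20uL}")
```

By this arithmetic the thermostable ligases are present at amounts equal to or above the donor ODN, and both reaction formats contain the same absolute amount of ATP.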
Supplementary Figure S4. Relationship between the efficiency of TACS ligation and the molecular weight of PEG. The ability to enhance TACS ligation was compared among six PEG compounds of different molecular weights. At 10%, no PEG compound enhanced TACS ligation efficiency; at 20%, the enhancement was more pronounced with PEG of molecular weight higher than 1450. Ribotailing of acceptor ODNs was achieved as described in Supplementary Figure S1, and the modified ODNs were used without any purification. The reaction was performed by sequentially incubating at 65 °C for 2 h and 95 °C for 5 min. After the reactions, samples were analyzed by denaturing polyacrylamide gel electrophoresis.
Supplementary Figure S14. Comparison of data produced by HiSeq X Ten and NovaSeq 6000. Comparisons of methylation level (left panels) and read depth (right panels) at single-nucleotide resolution (top panels) and in 1,000-bp bins (bottom panels) are shown. Cytosines covered by at least 10 reads were used to calculate methylation levels.
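The depth filter and the two resolutions described for the platform comparison can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The input layout (position, methylated reads, total reads) and the function names are hypothetical, the counts are made up, and pooling read counts within each bin is an assumed aggregation rule because the caption does not state how bin-level values were computed.

```python
from collections import defaultdict

MIN_DEPTH = 10   # minimum reads per cytosine, as stated in the caption
BIN_SIZE = 1000  # bin width in bp, as stated in the caption


def site_levels(sites, min_depth=MIN_DEPTH):
    """Per-cytosine methylation level for sufficiently covered sites.

    sites: iterable of (position, methylated_reads, total_reads).
    """
    return {
        pos: meth / total
        for pos, meth, total in sites
        if total >= min_depth
    }


def bin_levels(sites, bin_size=BIN_SIZE, min_depth=MIN_DEPTH):
    """Bin-level methylation, pooling counts of qualifying cytosines.

    Pooling is an assumption; averaging per-site levels is another option.
    """
    meth_sum = defaultdict(int)
    total_sum = defaultdict(int)
    for pos, meth, total in sites:
        if total >= min_depth:
            index = pos // bin_size
            meth_sum[index] += meth
            total_sum[index] += total
    return {index: meth_sum[index] / total_sum[index] for index in total_sum}


# Made-up toy counts: (position, methylated reads, total reads).
toy_sites = [(120, 9, 12), (480, 3, 9), (1500, 5, 20), (1900, 15, 20)]

print(site_levels(toy_sites))
print(bin_levels(toy_sites))
```

In the toy input the cytosine at position 480 has only 9 reads and is therefore excluded at both resolutions.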

SUPPLEMENTARY TABLES
Supplementary Table S1. Oligonucleotides used in the current study
*1 N denotes an equimolar mixture of A, C, G, and T. "Number" refers to the nucleotide length. In the current study, stretches of 40, 60, 80, 100, 120,

Name    Nucleotide sequence and chemical modifications

Median read depth
All bases                  6   10   12   18
All C                      6   11   13   21
All C of CpG contexts      5   11   14   23
All C of CHG contexts      6   12   14   21
All C of CHH contexts      5   11   13   21

Supplementary Table S7. Library yields and read mapping rates were compared between tPBAT and rPBAT. The data presented in Figure 3 are shown together with the mapping rates. For library yield, three independent preparations are summarized (mean and standard deviation are shown). For some representative conditions, sequencing was performed on an Illumina MiSeq sequencer, and the rates of uniquely mapped reads are shown.

Supplementary Table S6    HiSeq X Ten    IMR90    SRP158894    GSE119068    Figure 5

SUPPLEMENTARY INFORMATION
Supplementary Information S1. Nucleotide sequence of a gene encoding codon-optimized TS2126 RNA ligase.
Recognition sequences for BamHI and EcoRI are underlined.
This is because the bisulfite-treated DNA is now double-stranded and excess primers in the solution serve as carrier DNA to prevent adsorption of the template DNA to the tube wall.