High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients

Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites because of technical limitations. With massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population because of the global amplification step during template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process involves the identification of individual molecules that have been sequenced and for which the number of mutations has been absolutely quantitated. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near-complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA.

The linear amplification products were purified once with a 1.2x volume of AMPureXP, and the beads carrying the purified products were resuspended in 20 µL of PCR amplification solution: 1x High Fidelity PCR Buffer (Life Technologies), 0.2 mM dNTPs, 2 mM MgSO4, 0.5 µM each of the PGM/Proton primers (Supplementary Table S2), and 0.4 units of Platinum Taq High Fidelity (Life Technologies).
Thermal cycling after the removal of the AMPureXP beads was performed as follows: 2 min at 95 °C for denaturation, followed by 30 cycles of 15 sec at 95 °C and 1 min at 60 °C (for TP53) or 63 °C (for KRAS/CTNNB1). To compare error rates between the DNA polymerases used for the final amplification step, purified linear amplification products were also amplified in 20 µL of solution: 1x Q5 Reaction Buffer, 0.2 mM dNTPs, 0.5 µM PGM/Proton primers, and 0.4 units of Q5 Hot Start High-Fidelity DNA Polymerase. Thermal cycling was performed for 30 sec at 98 °C for denaturation, followed by 30 cycles of 10 sec at 98 °C, 10 sec at 65 °C, and 15 sec at 72 °C. The final amplification products obtained using Platinum Taq High Fidelity exhibited clearer bands after agarose gel electrophoresis than those obtained using Q5 DNA polymerase; thus, we primarily used Platinum Taq High Fidelity for the final amplification. For the preparation of libraries for analysis by the Illumina system, the final PCR amplification step was performed using indexed oligonucleotides for the discrimination of individual samples (Supplementary Table S3) and Platinum Taq High Fidelity. The amplification products were purified twice with a 1.0x volume of AMPureXP and eluted in 20 µL of nuclease-free water (Ambion, TX, USA). When the final amplification products were prepared using Q5 DNA polymerase, the suggested method for the Illumina system, or the KRAS/CTNNB1 assay, the products were purified using agarose gel electrophoresis with a MinElute Gel Extraction Kit (Qiagen).
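
For bookkeeping, the two final-amplification programs described above can be written down as data and their programmed hold times summed. The following is only a minimal sketch: the dictionary layout and the function name `programmed_seconds` are illustrative choices, not part of the original protocol, and ramp times between temperatures are ignored, so actual run times on an instrument will be longer.

```python
# Each step is (hold time in seconds, temperature in °C); values are from the text.
# The data layout and names below are hypothetical, chosen for illustration only.
PROGRAMS = {
    "Platinum Taq High Fidelity (TP53)": {
        "initial": (120, 95),
        "cycles": 30,
        "steps": [(15, 95), (60, 60)],
    },
    "Platinum Taq High Fidelity (KRAS/CTNNB1)": {
        "initial": (120, 95),
        "cycles": 30,
        "steps": [(15, 95), (60, 63)],
    },
    "Q5 Hot Start High-Fidelity": {
        "initial": (30, 98),
        "cycles": 30,
        "steps": [(10, 98), (10, 65), (15, 72)],
    },
}


def programmed_seconds(program):
    """Sum the programmed hold times, ignoring ramping between temperatures."""
    per_cycle = sum(seconds for seconds, _ in program["steps"])
    return program["initial"][0] + program["cycles"] * per_cycle


for name, program in PROGRAMS.items():
    print(f"{name}: {programmed_seconds(program) / 60:.1f} min of programmed holds")
```

Under these assumptions, the Platinum Taq programs amount to 39.5 min of programmed holds and the Q5 program to 18.0 min.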

Library construction for experiments with double strand labeling
The digestion of genomic DNA by restriction enzymes and adaptor ligation were performed as described above. The purified ligation products were mixed in 20 µL of Platinum Taq High Fidelity PCR solution containing 0.5 µM T_PCR_A and a 0.5 µM region-specific primer mixture (Supplementary Table S2). The PCR mixture was incubated for 20 or 30 min at 72 °C for replacement synthesis by Pyrococcus GB-D polymerase using the Platinum Taq DNA Polymerase High Fidelity kit and then amplified as follows: 30 cycles of 15 sec at 95 °C and 1 min at 60 °C. The validity of the double strand labeling was confirmed using a model experiment with a mixture of heteroduplex DNA fragments. PCR fragments from normal individuals (Megapool) and from a cell line with a mutation in TP53 (MIA PaCa-2) were mixed at a ratio of 100 to 1, denatured, and renatured. Library construction was performed as described above, with or without replacement synthesis. The use of replacement synthesis resulted in an approximately 10-fold reduction in the rate of mutation detection (Supplementary Figure S1 and Table S4).
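
To illustrate why such a mixture serves as a model for single-strand lesions, the expected duplex composition after renaturation can be estimated with simple arithmetic. The sketch below is not the authors' analysis; it assumes complete denaturation, equal amounts of both strands from each source, and random re-pairing of strands, and the function name `duplex_composition` is hypothetical.

```python
from fractions import Fraction


def duplex_composition(wild_type_parts, mutant_parts):
    """Expected duplex fractions after random re-annealing (idealized assumption)."""
    f = Fraction(mutant_parts, wild_type_parts + mutant_parts)  # mutant strand fraction
    return {
        "wt/wt": (1 - f) ** 2,       # both strands wild type
        "wt/mut": 2 * f * (1 - f),   # heteroduplex: one mutant strand
        "mut/mut": f ** 2,           # both strands mutant
    }


comp = duplex_composition(100, 1)

# A heteroduplex carries one mutant strand; a mutant homoduplex carries two.
mutant_strands_in_heteroduplex = comp["wt/mut"] / (comp["wt/mut"] + 2 * comp["mut/mut"])

for kind, fraction in comp.items():
    print(f"{kind}: {float(fraction):.5f}")
print(f"mutant strands paired with wild type: {float(mutant_strands_in_heteroduplex):.3f}")
```

Under these idealized assumptions, about 99% of the mutant strands in a 100:1 mixture end up paired with a wild-type strand, so nearly every mutation in the model sits on only one strand of its duplex.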

Supplementary Figure S1. Construction of heteroduplex fragments with base substitutions.
As models for DNA lesions, we prepared heteroduplex DNA fragments by mixing wild-type (from Megapool) and R248W mutant (from MIA PaCa-2) PCR fragments at a ratio of 100 to 1. These artificial heteroduplex fragments were digested by restriction enzymes and attached to adaptors.
We then generated fully double-stranded fragments using the strand displacement capability of Pyrococcus GB-D polymerase using a Platinum Taq DNA Polymerase High Fidelity kit (Life Technologies) and amplified the sequences by PCR without linear amplification (Figure 1A).
DNA with the homozygous R248W mutation and DNA from Megapool (wild type) were mixed at ratios of 1:0, 3:1, 1:1, and 1:3 and used to prepare sequencing libraries as described in the Methods section.
For "double strands" sequencing, both strands were labeled with the same barcode by replacement synthesis of the complementary strand and then sequenced (Figure 1A). For "one strand" sequencing, replacement synthesis of the complementary strand was not performed.

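The mixing ratios listed in this legend (1:0, 3:1, 1:1, 1:3) translate directly into expected mutant allele fractions. The short sketch below assumes that the ratios are given as mutant to wild type in molecule numbers and that the mutant source carries only the R248W allele; both are assumptions, and the function name `expected_mutant_fraction` is hypothetical.

```python
from fractions import Fraction

# Ratios as (mutant parts, wild-type parts); the order is an assumption.
RATIOS = [(1, 0), (3, 1), (1, 1), (1, 3)]


def expected_mutant_fraction(mutant_parts, wild_type_parts):
    """Expected fraction of mutant molecules in an ideal mixture."""
    return Fraction(mutant_parts, mutant_parts + wild_type_parts)


for mutant, wild_type in RATIOS:
    fraction = expected_mutant_fraction(mutant, wild_type)
    print(f"{mutant}:{wild_type} -> {float(fraction):.0%} mutant")
```

Under these assumptions, the four mixtures correspond to 100%, 75%, 50%, and 25% mutant molecules.
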
Supplementary Note: We selected the FastDigest series from Thermo Scientific because these enzymes work in one universal buffer, which allows any combination of restriction enzymes to be used in one reaction tube. All listed enzymes produce three-, four-, or five-base protruding ends and are inactivated above 65 °C.