Single molecule quantitation and sequencing of rare translocations using microfluidic nested digital PCR

Cancers are heterogeneous and genetically unstable. New methods are needed that provide the sensitivity and specificity to query single cells at the genetic loci that drive cancer progression, thereby enabling researchers to study the progression of individual tumors. Here, we report the development and application of a bead-based hemi-nested microfluidic droplet digital PCR (dPCR) technology to achieve ‘quantitative’ measurement and single-molecule sequencing of somatically acquired carcinogenic translocations at extremely low levels (<10⁻⁶) in healthy subjects. We use this technique in our healthy study population to determine the overall concentration of the t(14;18) translocation, which is strongly associated with follicular lymphoma. The nested dPCR approach improves the detection limit to 1 × 10⁻⁷ or lower while maintaining the analysis efficiency and specificity. Further, the bead-based dPCR enabled us to isolate and quantify the relative amounts of the various clonal forms of t(14;18) translocation in these subjects, and the single-molecule sensitivity and resolution of dPCR led to the discovery of new clonal forms of t(14;18) that were otherwise masked by the conventional quantitative PCR measurements. In this manner, we created a quantitative map for this carcinogenic mutation in this healthy population and identified the positions on chromosomes 14 and 18 where the vast majority of these t(14;18) events occur.


INTRODUCTION
Tumor-specific somatic mutations can provide highly useful molecular biomarkers and therapeutic targets for cancer diagnosis, prognosis and treatment. Central to the use of these genetic biomarkers in clinical oncology is sensitive and quantitative measurement of rare mutations in a vast excess of wild-type alleles. For instance, discovering driver mutations that lead to carcinogenesis in a rare subset of cells is one key approach to the risk assessment, early detection and treatment of cancer (1,2). Investigation of genetic variants in rare circulating tumor cells in metastatic cancer patients would help understand the biology of metastasis and development of drug resistance in chemotherapy (3). Moreover, quantification of low-level mutated sequences in cancer patients during and after treatments can provide informative data for evaluating therapy efficacy, monitoring minimal residual diseases and detecting disease relapse (4).
In recent years, technical advances have enormously improved the capacity to analyze genetic variants, yielding novel methods for the detection of rare mutations (5). For instance, quantitative PCR (qPCR), a widely used approach in genetic analysis, measures the analog fluorescence signal of targets and thus is limited in detection sensitivity and/or quantification accuracy owing to instrumental and experimental variation. An attractive alternative to this analog technique is digital PCR (dPCR), which provides sensitivity superior to that of conventional qPCR by allowing absolute quantification of target molecules (6-9). Here, we report the development and application of a bead-based hemi-nested microfluidic digital droplet PCR (simplified as nested dPCR hereafter) approach to achieve 'quantitative' measurement of somatically acquired carcinogenic translocations at extremely low levels (<10⁻⁶) in healthy subjects. This sensitive nested dPCR approach has an overall clinical sensitivity that is mainly limited by the amount of DNA that is available for screening (10). In contrast to other dPCR methods using emulsion droplets (8,9), our bead-based dPCR approach provides not only superior quantification performance at extremely low levels but also the capacity to sequence and quantify each mutated clone in a subject after millions of discrete single-molecule reactions are conducted in parallel. Therefore, this novel dPCR method can be used to measure the amounts of various clones within a subject or population over time and thus monitor for clonal expansion before clinical disease progression.
The model translocation that we chose for technology validation, the BCL-2/immunoglobulin heavy chain (IgH) translocation t(14;18), is highly prevalent in many blood cancers, including ~80% of follicular lymphoma (FL) cases and ~25% of large-cell B-cell lymphoma cases (11,12). The translocation brings the B-cell lymphoma-2 (BCL2) gene from 18q21 under the control of the strong enhancers of the IgH locus, ultimately disrupting BCL2's normal pattern of expression in B cells (13,14). BCL2 is an anti-apoptotic protein, and its overexpression can be intimately involved in the pathogenesis of B-cell neoplasms (15). t(14;18) is found in a relatively small fraction of the peripheral blood mononuclear cells (PBMCs) of healthy individuals and may be a biomarker of early lymphoma (16-18). The mutation concentration in healthy individuals is ~1000-fold lower than for individuals with stage III/IV FL (10), and it is believed that clonal expansion of atypical B cells is required for lymphoma progression (16,18-20). t(14;18) prevalence at any level in healthy populations has been reported in the range of 8-88%, which reflects the differences both in the populations studied and in the techniques used to assay t(14;18) (17,21,22). Thus, 'highly sensitive and quantitative detection' of t(14;18) is essential for fully investigating the clinical value of t(14;18) for risk assessment and early diagnosis of lymphoma. Furthermore, clinical studies have observed clonal evolution of t(14;18) associated with disease progression in individual patients (23). A high-throughput technique that can sequence and quantify multiple t(14;18)+ clones could provide insight into the molecular pathology and clinical importance of t(14;18) (24,25).
Using the nested microfluidic dPCR method, we were able to quantitatively detect and sequence a single t(14;18) copy in 9 μg (~3 × 10⁶ copies) of clot genomic DNA (gDNA) from individuals in a healthy study population. We also applied nested dPCR to develop a quantitative genomic map of t(14;18) by sequencing and quantifying the unique t(14;18) clones found in individual subjects within this study population. The genomic map that we produced represents a baseline for this healthy population, and further sampling of this population can be used to monitor for expansion of particular clonal forms as part of disease progression.

Study subjects
The formaldehyde-exposed worker population, exposure assessment and biological sampling from the study subjects were described in detail in the Supplementary Methods.

Cell and gDNA purification
We used Clotspin® baskets and the Gentra® Puregene® blood kit (Qiagen, CA) to purify clot gDNA. Buffy coat was prepared by spinning whole blood at 200g for 10 min and then by removing the concentrated leukocyte band. We then isolated DNA using the FlexiGene DNA kit (Qiagen). PBMCs were purified from whole blood using density gradient centrifugation through Ficoll-Paque™ PLUS following the manufacturer's recommendations (GE Healthcare, NJ). To purify gDNA from cell lines and PBMCs, we used a standard cell lysis with RNA and protein digests followed by a phenol-chloroform DNA extraction. The quality of gDNA was assessed, and copy number was normalized using qPCR for β-actin. For more detail regarding gDNA extraction and quality assessment, please see the Supplementary Methods.

Microfluidic droplet-based dPCR
The four-channel microfabricated emulsion generator array (MEGA) devices were constructed and operated as detailed previously (26). For droplet generation, freshly prepared carrier oil was injected into the oil channels by a syringe pump, and droplet dPCR mix containing t(14;18) amplicon and primer-conjugated beads was driven by the on-chip diaphragm pump, which was pneumatically actuated by a solenoid valve controller system built in house. The pumping was conducted in a four-step fashion under the control of a LABVIEW program to produce uniform ~2.5 nl PCR droplets, which were collected in 0.5 ml PCR tubes filled with microfine emulsion. Thermal cycling was carried out in a PTC200 thermocycler (MJ Research) and involved a 10 min hot start at 95 °C, and 33 cycles of 95 °C for 30 s, 60 °C for 60 s, 72 °C for 90 s and a final 72 °C extension for 5 min. The beads were then recovered by using a 15 μm mesh filter, rinsed with isopropanol, ethanol and 1× Dulbecco's PBS (DPBS, GIBCO) and analyzed by a multicolor flow cytometer (FC-500, Beckman-Coulter). More details regarding chip fabrication, droplet generation, PCR conditions, bead handling and flow cytometry are provided in the Supplementary Methods.

The oligonucleotides (Supplementary Table S3) used in this study were obtained from IDT (Coralville, IA), and, unless otherwise noted, PCR reagents were from Life Technologies (Carlsbad, CA). The pre-amplification (preamp) reaction mix contained 1× AmpliTaq Gold® buffer with 5 mM MgCl2, 0.2 mM deoxyribonucleotide triphosphates (dNTPs) (deoxyuridine triphosphate (dUTP) was used at 0.4 mM instead of deoxythymidine triphosphate (dTTP)), 0.01 U μl⁻¹ of uracil-DNA glycosylase (UDG) (Roche), 2.5% dimethyl sulfoxide (DMSO), 0.3 μM each of the oligonucleotides (JH Exo and RT0001), 0.035 U μl⁻¹ of AmpliTaq Gold® Polymerase and 3 μg of gDNA per 50 μl reaction.
Thermal cycling in an ABI GeneAmp® 9700 cycler consisted of a UDG reaction (50 °C for 2 min), followed by a 10 min hot start at 95 °C and 20 cycles of 95 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s. The hemi-nested reaction mix contained the same components as the preamp mix except that 1 μM ROX reference dye, 0.3 μM each of the primers (JH Exo, Nv3 and BCL2MBRTM2) and 1 μl of the preamp reaction product were used instead. Thermal cycling in an ABI 7300 cycler consisted of a 10 min hot start at 95 °C, and 33 cycles of 95 °C for 15 s, 60 °C for 30 s and 72 °C for 30 s. Standards made by diluting RL gDNA in TK6 gDNA were used to establish a calibration curve for t(14;18) quantification (Figure 1 and Supplementary Figure S2). Each assay plate included three standards (10², 10¹ and 3 copies in 3 μg) and a negative control (3 μg of TK6 DNA) to ensure robust and specific detection at the level of 1 copy/μg. Positive reactions were run on a 1.5% agarose gel to separate amplicons that were then excised, purified using a QIAquick Gel Extraction Kit (QIAGEN) and sequenced at the UC Berkeley Core Sequencing Facility.

Single-molecule sequencing
Amplicon-bound beads from dPCR were counted using a hemocytometer, plated in 96-well PCR plates at ~1 bead per well, and re-amplified using the hemi-nested PCR conditions described in the last section. DNA amplicons yielded from single beads were sequenced following the method described earlier in the text. Sequencing reads were aligned to the reference assembly of the human genome using NCBI's nucleotide basic local alignment search tool (BLASTn). The 'N sequence' insert was identified as the de novo sequence found between the two breakpoints in a particular translocation clone. To map the locations of the V(D)J recombination signal sequences (RSSs), we mapped all allowable RSS nonamers and heptamers to the chromosome 14 contig (NT_026437.11) and then found nonamer-heptamer pairs that were separated by appropriately sized spacer sequences (Figure 5 and Supplementary Figure S6). More details about single-molecule sequencing and sequence analysis are provided in the Supplementary Methods.
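The RSS mapping step can be illustrated with a minimal sketch. It assumes the textbook consensus heptamer (CACAGTG) and nonamer (ACAAAAACC) and the canonical 12 bp and 23 bp spacers, and it scans only one strand with exact matching; the set of "allowable" motif variants actually used in this study is given in the Supplementary Methods and is not reproduced here. The function name `find_rss` is hypothetical.

```python
import re

# Consensus motifs are an assumption for illustration; the study mapped
# all allowable heptamer/nonamer variants, which this text does not list.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"
SPACERS = (12, 23)  # canonical RSS spacer lengths in bp


def find_rss(seq, heptamer=HEPTAMER, nonamer=NONAMER, spacers=SPACERS):
    """Return (heptamer_start, spacer_length) for every
    heptamer-spacer-nonamer arrangement found on the given strand."""
    seq = seq.upper()
    hits = []
    # A zero-width lookahead reports overlapping heptamer matches as well.
    for match in re.finditer(f"(?={heptamer})", seq):
        start = match.start()
        for spacer in spacers:
            n_start = start + len(heptamer) + spacer
            if seq[n_start:n_start + len(nonamer)] == nonamer:
                hits.append((start, spacer))
    return hits


# Toy sequence with one 12 bp-spacer RSS and one 23 bp-spacer RSS.
toy = "GG" + HEPTAMER + "A" * 12 + NONAMER + "TT" + HEPTAMER + "C" * 23 + NONAMER
hits = find_rss(toy)
```

A real analysis would also scan the reverse complement and tolerate mismatches relative to the consensus, since functional RSSs frequently deviate from it.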

Digital quantitation and single-molecule sequencing of t(14;18): method design and performance
We found that our standard qPCR method was not sensitive enough to quantify and sequence t(14;18) from the clot gDNA of healthy subjects (Supplementary Figure S1); therefore, we developed a nested PCR approach (Figure 1a and c) for digital analysis of t(14;18). This approach starts with a preamp reaction (Figure 1b), and the resultant target copies are then quantified and sequenced using both a conventional nested qPCR method and the microfluidic nested dPCR for direct comparison of their performance. The nested qPCR detection was conducted in 50 μl reaction volumes with a BCL2-specific cleavable probe sequence (Figure 1c and d) to determine the threshold cycle (Ct) values (Figure 1c and Supplementary Figure S2). The dPCR methodology uses our custom-built MEGA devices and a bead-based emulsion PCR assay (26) to achieve high-throughput digital quantitation and single-molecule sequencing of t(14;18) (Figure 1d). In this methodology, 2.5 nl droplets serve as digital reaction volumes, and droplets containing both a single copy of t(14;18) and an IgH primer-functionalized bead yield clonal DNA beads labeled by fluorescein amidite (FAM)-labeled BCL2 primer after thermal cycling. A portion of the post-PCR beads are then analyzed by flow cytometry to quantify the target copies. Remaining beads are used as templates for further PCR amplification for single-molecule counting and sequencing of the genetic variants of the mutation (Figure 1d). Figure 2 directly compares the detection performance of both nested assays using the standards of t(14;18)+ gDNA spiked into wild-type human gDNA. The qPCR methodology had an efficiency of 93.8% and a linear dynamic range that spanned five log10 t(14;18) concentrations (Figure 2a). Each 50 μl reaction had a quantitative limit of ~3.3 × 10⁻⁶ copies of t(14;18) per genome (1 copy/μg gDNA) with an ultimate limit of detection of ~10⁻⁶ copies of t(14;18) per genome (1 copy in 3 μg of gDNA) (Figure 2a).
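For readers less familiar with qPCR calibration, the reported efficiency can be related to the slope of the standard curve. The sketch below uses the standard relation E = 10^(-1/slope) - 1 for a curve of Ct versus log10(copies); the function names are hypothetical, and the fitted slope and intercept of the curve in Figure 2a are not given in this text, so none are assumed.

```python
import math


def efficiency_from_slope(slope):
    """Amplification efficiency from the slope of Ct versus log10(copies).
    A slope of about -3.32 corresponds to perfect doubling (E = 1.0)."""
    return 10 ** (-1.0 / slope) - 1.0


def slope_from_efficiency(efficiency):
    """Inverse relation: the standard-curve slope implied by an efficiency."""
    return -1.0 / math.log10(1.0 + efficiency)


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


# An efficiency of 93.8% implies a slope of roughly -3.48 cycles per decade.
implied_slope = slope_from_efficiency(0.938)
```

Under this relation, a tenfold dilution of the target shifts Ct by about 3.5 cycles at the reported efficiency, which is why a single copy sits near the upper end of the usable cycle range.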
The microfluidic nested dPCR technique offered quantitative detection down to a concentration of ~10⁻⁶ copies of t(14;18) per genome (1 copy in 3 μg of gDNA), and the theoretical detection limit was determined to be 10⁻⁷ copies per genome, as the experimental signal is well above the background noise (Figure 2b). Further dilutions of the lowest concentration preamp standard were used to demonstrate quantitative detection down to equivalent concentrations near 2 × 10⁻⁸ (Figure 2b, inset), indicating that the detection and quantitation limits of the microfluidic nested dPCR detection are constrained by the amount of DNA input (maximum of 3 μg) that can be used in the 50 μl preamp reactions.
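The conversions between copies per microgram and copies per genome used throughout can be reproduced with a short sketch. It assumes about 3.3 pg of DNA per genome copy, a common approximation that is consistent with the figures quoted here (1 copy/μg ≈ 3.3 × 10⁻⁶; 9 μg ≈ 3 × 10⁶ copies) but is not stated explicitly in the text. The constant and function names are hypothetical.

```python
# Assumed mass of one genome copy in picograms (not stated in the text,
# but consistent with its copies/ug to copies/genome conversions).
PG_PER_GENOME_COPY = 3.3


def genome_copies(gdna_ug):
    """Approximate number of genome copies in a given mass of gDNA (ug)."""
    return gdna_ug * 1e6 / PG_PER_GENOME_COPY


def genomic_fraction(target_copies, gdna_ug):
    """Target copies per genome copy for a given gDNA input."""
    return target_copies / genome_copies(gdna_ug)


one_per_ug = genomic_fraction(1, 1)      # about 3.3e-6
one_per_3ug = genomic_fraction(1, 3)     # about 1e-6
one_per_9ug = genomic_fraction(1, 9)     # about 4e-7
```

Because a single copy is the smallest detectable unit, the lowest measurable genomic fraction scales inversely with the mass of gDNA screened, which is the sense in which sensitivity is limited by DNA input.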

Digital detection and quantification of t(14;18) in occupationally exposed subjects
To validate the microfluidic dPCR method for digital analysis of rare t(14;18) mutations, we examined the clot gDNA samples of 93 healthy Chinese subjects, 42 of whom had been occupationally exposed to formaldehyde. All the samples were first characterized using the bulk nested qPCR method (Figure 1b and c). Three aliquots of 3 μg of gDNA from each subject were pre-amplified and assayed (Figure 3a, circles under each subject indicate number of assays). Two samples from Subjects Q and X are included for internal quality control, giving six reactions for each of these subjects. We found that there were detectable levels of t(14;18) in 41 of 93 (~44%) study subjects, as summarized in Figure 3a. Thirty-two of these t(14;18)+ subjects show one or more negative reactions (indicated by the open circles in Figure 3a), presumably owing to the stochastic distribution of rare targets in the 3 μg gDNA aliquots. The overall concentration in these t(14;18)+ subjects ranged from ~7.7 × 10⁻⁵ (~23 copies/μg) in Subject A down to ~4 × 10⁻⁷ (a single copy in a total of 9 μg of gDNA assayed) in Subject OO. The median level of t(14;18) in the 41 positive study subjects was 2.23 copies/μg, and the mean level of t(14;18) among the positive study subjects was 3.83 copies/μg (genomic concentrations of 7.4 × 10⁻⁶ and 1.3 × 10⁻⁵, respectively). Furthermore, we used sequencing and sequence analysis to definitively confirm and fully define at least one clonal form in all 41 t(14;18)+ study subjects and to confirm that the forms of t(14;18) found in these subjects were unique and different from that found in the positive control CRL 2261 cell line. In several subjects, more than one unique clonal form was identified, and for these subjects, the total number of t(14;18) clones identified is displayed above the bars for each subject (Figure 3a).

Figure 1 (caption fragment). ... Figure S2); and (D) microfluidic emulsion single-molecule PCR using JH Exo-functionalized beads and 5′-FAM-labeled Nv3 for t(14;18) detection. A microfluidic emulsion generator array was used for high throughput (>10⁶/h) production of monodisperse reaction volumes. Following emulsion generation, the nanoliter-scale reaction droplets were cycled to achieve single copy genetic analysis of the template. If a copy of t(14;18) and a bead were both present in a reaction droplet, then the bead was labeled with fluorescent amplicon during PCR, otherwise beads remained unlabeled. Beads were then recovered and analyzed by flow cytometry to determine t(14;18) concentration within a sample. In addition, some of the beads were distributed at ~1 bead/well in 96-well plates to confirm that FAM+ concentration among beads corresponds with t(14;18)+ concentration as determined using a BCL2-specific probe. This tertiary round of amplification also produced sufficient template to conduct standard sequencing reactions that were derived from single molecule reactions.

Figure 3 (caption fragment). ... Filled circles indicate that a preamp reaction was positive by standard qPCR, whereas open circles indicate that the preamp reaction was found negative by standard qPCR. If an assay circle is marked with the letter 'e', then that preamp reaction was also tested using microfluidic PCR, and in these cases, the quantitative result from dPCR is also given. A box grouping preamp reactions indicates that these reactions were pooled before dPCR analysis. The data from these parallel measurements (when positive) contributed data to part c of this figure. For all positive subjects, at least one clonal form of t(14;18) was confirmed by sequence analysis; in some subjects, multiple clonal forms of t(14;18) were defined. If multiple clonal forms were found in a subject, the total number of defined clones is indicated above the error bars. An asterisk indicates that one clonal form for that subject went undetected in qPCR and was then discovered and/or defined in dPCR. (B) The dot scattered plot of the nested dPCR measurements of the preamp reactions showing the distinct populations for the t(14;18) positive and negative subjects. The dashed line indicates the detection limit of 10⁻⁷ determined in Figure 2. (C) The two methods represented in Figure 1c and d correlate well for quantitating t(14;18) both in cell line-derived DNA standards and in clot DNA samples from chemically exposed workers. The horizontal error bars are standard error and the vertical error bars are standard deviation (n = 2).
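The stochastic sampling argument for negative aliquots from positive subjects can be made concrete with a Poisson sketch. It assumes that target copies are distributed independently among aliquots and that the assay detects any aliquot holding at least one copy; both are idealizations introduced here for illustration, and the function names are hypothetical.

```python
import math


def p_no_target(copies_per_ug, aliquot_ug=3.0):
    """Poisson probability that one aliquot contains zero target copies."""
    return math.exp(-copies_per_ug * aliquot_ug)


def p_detect(copies_per_ug, aliquot_ug=3.0, n_aliquots=3):
    """Probability that at least one of n aliquots contains a target copy."""
    return 1.0 - p_no_target(copies_per_ug, aliquot_ug) ** n_aliquots


# A subject carrying on average one copy per 9 ug (the lowest level observed).
p_empty_aliquot = p_no_target(1.0 / 9.0)
p_any_positive = p_detect(1.0 / 9.0)
```

At one copy per 9 μg, roughly 72% of individual 3 μg aliquots would be expected to contain no target, so a pattern of one positive and two negative reactions is unsurprising at this concentration.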
To assess the microfluidic nested dPCR technology for rare translocation detection and molecular profiling, we focused our studies on the subjects with extremely low concentration of t(14;18) and/or multiple clonal forms. We analyzed a total of 69 preamp reactions, which consisted of 50 positive reactions and 10 negative reactions from 28 t(14;18)+ subjects (circles marked with 'e' in Figure 3a) and 9 reactions randomly chosen from t(14;18)− subjects. The t(14;18) concentration results obtained by the nested microfluidic dPCR analysis agreed with those obtained by the standard qPCR. This parallel comparison also demonstrated the robust nature of both ultrasensitive assay variants. There was not a single instance of disagreement between the two methods for determination of positive/negative preamp reactions. We assayed 10 negative preamp reactions from seven positive subjects at the low end of t(14;18) concentration as determined by qPCR. All the nested dPCR assays yielded consistently negative results, a finding that strongly suggests true negatives, given the sensitivity of the methods used. Only one of the three preamp reactions from many of the subjects contained t(14;18), and the single positive qPCR trial for some subjects (e.g. subject OO with a Ct value of 28.8) corresponded to ~1 copy in 3 μg of gDNA, which suggests digital t(14;18) detection in the single positive assay reaction for these subjects. These observations demonstrate the ability of these nested methodologies to detect a single copy of t(14;18) in 9 μg of gDNA (relative genomic concentration of ~4 × 10⁻⁷). Thus, the overall clinical sensitivity of this method, like other ultrasensitive techniques that achieve mutation detection at the single copy level, is limited mainly by the amount of gDNA available for screening (10,27).
The scatter plot of the nested dPCR measurements of the preamp reactions (Figure 3b) shows that the positive population is distinctly separated from the negatives, which represent an extremely low background (below genomic concentration of 10⁻⁷). Such detection performance and background level are consistent with those obtained using the gDNA standards. The measured concentration of some positive reactions was lower than the lowest concentration possible in our assays using 3 μg of gDNA (1 × 10⁻⁶), which we found is largely attributable to the degradation of DNA caused by the freeze-thaw of the preamp samples. Regression of the parallel measurements from the t(14;18)+ preamp reactions showed good correlation between the microfluidic nested dPCR and the standard nested qPCR (R² ~0.75, Figure 3c). Overall, these observations confirm that our nested dPCR methodology is ultrasensitive and capable of quantifying rare mutations at concentrations of 10⁻⁶ and lower.
In addition, the same clonal form(s) of t(14;18) were identified using both nested techniques for all positive preamp reactions, except that an additional form of t(14;18) was discovered by microfluidic nested dPCR for two subjects (J and Q, note asterisks in Figure 3a). This discovery demonstrates a key advantage of dPCR: single-molecule analysis allows each clonal form to be amplified, detected and quantified without competition from other clonal forms, whereas a high concentration clonal form can mask the presence of a less concentrated clone in bulk analysis. Beyond the discovery of novel low-frequency clones, the dPCR technique can also be used to discretely amplify similarly sized clones that are contained within a single sample, and thus allow these distinct clonal forms to be resolved (see discussion of subjects H and HH in the next 'Results' section). The results from single-molecule quantification and sequencing are detailed later in the text.

Single-molecule sequencing to define and quantify t(14;18) clones
To fully define the various t(14;18) clones and to definitively confirm positive assay reactions, we purified various clonal forms of t(14;18) by size using agarose gel electrophoresis and then extracted the amplicons for sequencing reactions. When multiple clones are present in the same preamp reaction, the standard 'bulk' nested qPCR technique yields multiple bands in the same lane, and it is impossible to estimate the relative ratio of clonal forms. However, when the same preamp reaction is analyzed using the microfluidic nested dPCR technique, each positive reaction droplet amplifies a single molecule of t(14;18) amplicon (Figure 4 and Supplementary Figure S3). The gel and sequencing data from single molecule-derived sequencing reactions can then be used to estimate the relative concentration of each clonal form in a preamp reaction. Following the nested dPCR from one particular preamp reaction conducted on subject D, flow cytometry revealed that the resulting primer beads were 27.8% positive for t(14;18) (Supplementary Figure S3a). A representative section (19/96 wells) from a plate of single-molecule sequencing reactions (~1 bead/reaction on average) is displayed in Figure 4 (see also Supplementary Figure S3b). The overall Ct from qPCR for this preamp reaction was 22.15, corresponding to ~18 copies per μg or a genomic concentration of ~6 × 10⁻⁵ copies per genome. When the data from both methods are considered, we can estimate that clone 1 is present at a concentration of ~2 × 10⁻⁵ copies per genome, and that clone 2 is present at a concentration of ~4 × 10⁻⁵ copies per genome. This analysis demonstrates the key strength of single-molecule analysis via dPCR: each clonal form can be uniquely quantified and tracked. We used this approach to develop a quantitative map of the t(14;18) landscape in our healthy Chinese study population (see next 'Results' section).
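The clone-specific estimates for subject D follow from scaling the total qPCR concentration by the clone ratio observed among single-bead reactions. The sketch below uses the ~18:37 ratio of clone 1 to clone 2 reported for this subject in the Figure 4 caption and the total of ~6 × 10⁻⁵ copies per genome; the function name is hypothetical.

```python
def clone_concentrations(total_per_genome, bead_counts):
    """Split a total concentration among clones in proportion to the
    number of single-bead reactions assigned to each clone."""
    total_beads = sum(bead_counts.values())
    if total_beads == 0:
        raise ValueError("no beads counted")
    return {
        clone: total_per_genome * count / total_beads
        for clone, count in bead_counts.items()
    }


# Subject D: total from qPCR, clone ratio from single-bead gel analysis.
subject_d = clone_concentrations(6e-5, {"clone 1": 18, "clone 2": 37})
```

This gives roughly 2 × 10⁻⁵ for clone 1 and 4 × 10⁻⁵ for clone 2, matching the estimates in the text. The precision of such an estimate is bounded by the number of beads counted, so small clone fractions need larger plates.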
Another advantage of dPCR is that the technique can be used to resolve similarly sized clones carried by a subject. Subjects H and HH carried similarly sized clones, and the conventional bulk qPCR approach was unable to resolve the clonal forms present in preamp reactions. In these cases, the various clones of t(14;18) that were concurrent in preamp reactions were similar in size and were not adequately separated on gel before purification for sequencing reactions. Subsequent sequencing reactions yielded reads with consensus near the Nv3 sequencing primer but with mixed traces as the clonal forms diverged (Supplementary Figures S4a and S5a for Subjects H and HH, respectively). Therefore, we used the microfluidic nested dPCR method to discretely package the various clonal forms of t(14;18) before nested amplification, and we then used a tertiary round of PCR to produce sufficient amplicon for sequencing. In both cases, dPCR and single-molecule-derived sequencing allowed us to purify and define the similarly sized clones of t(14;18) that were present in a particular preamp reaction (Supplementary Figures S4b, c and S5b and c).
To assess the effect of sequencing errors, we used a tracking sequence on the BCL2 side of the translocation, which is less variable than the IgHJ side of the translocation. The tracking sequence (AGAGCCCTCCTGCCCT) extends from position 3498 to position 3513 on NM_000633.2, and it was chosen because it appears in the vast majority of sequencing reads owing to both its proximity to the Nv3 sequencing primer and its distance from most of the BCL2 breakpoints. The single exception is that of clone 1 from Subject S; this clonal form [3507-N-87331509 (BCL2-N-IgHJ)] had a BCL2 breakpoint at 3507 and therefore did not contain the complete tracking sequence. Overall, we observed that 67% of all sequencing reads from bead-based amplifications contained no errors in this tracking sequence, whereas 33% of reads contained at least one error. The most common type of sequencing error was insertion (50% of all observed errors), whereas substitutions accounted for 30% of all errors, and multiple errors accounted for the remaining 20% of misread sequences. However, these types of sequencing errors did not impact the results presented here, as each clonal form of t(14;18) was sequenced multiple times, allowing a definitive consensus sequence to be assembled despite the presence of sequencing errors.

Figure 4. Definition and quantitation of clonal forms within subject samples. Representative results for the gel and sequence analysis of a single subject ('D') who was found to be positive for two unique clonal forms of t(14;18). The gel bands were excised for sequencing reactions, and sequence analysis defined the clonal forms of t(14;18) present in the sample. In bulk assays, both clonal forms amplify together, and it is not possible to estimate their relative concentration. However, in digital microfluidic dPCR, the different clonal forms are discretely encapsulated in nanoliter-scale reaction droplets along with primer-functionalized beads. The resulting digital reactions load individual beads with amplicon that represents only a single form of the translocation. By simply counting the number of 'large' and 'small' bands or sequence reads, the relative ratio of clonal forms within a sample can be estimated. In this gel section, the ratio of clone 1:clone 2 is 1:5, but gel analysis of more reactions from this subject estimates the ratio as ~18:37 (see Supplementary Figure S3b).
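A simple way to screen reads against the tracking sequence is sketched below. It only tests whether each read contains an exact copy of the 16-base tracking sequence quoted above; classifying the error type (insertion, substitution or multiple) would need an alignment step that is not shown. The function name and the toy reads are hypothetical.

```python
TRACKING = "AGAGCCCTCCTGCCCT"  # tracking sequence quoted in the text


def error_free_fraction(reads, tracking=TRACKING):
    """Fraction of reads that contain an exact copy of the tracking sequence."""
    if not reads:
        raise ValueError("no reads supplied")
    hits = sum(tracking in read.upper() for read in reads)
    return hits / len(reads)


# Toy reads: one clean, one with a single-base insertion, one clean lowercase.
toy_reads = [
    "TT" + TRACKING + "GG",
    "TT" + "AGAGCCCTCCTGACCCT" + "GG",
    TRACKING.lower(),
]
fraction_clean = error_free_fraction(toy_reads)
```

Reads from clones whose BCL2 breakpoint falls inside the tracking window, such as clone 1 from Subject S, would be flagged by this exact-match test even when error-free, so they need to be handled separately.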

Quantitative genomic mapping of t(14;18) breakpoints
We defined all of the clonal forms present in positive assay reactions and found that most positive subjects carried a single unique t(14;18) clone, although nine subjects carried two unique clonal forms and five subjects carried three unique clonal forms. We used the dPCR method to resolve each t(14;18) clone in subjects carrying multiple clonal forms, and thus we were able to quantify the relative amount of each clonal form. These relative ratios were then scaled by the results from conventional qPCR, which can only measure the total amounts of t(14;18) within a subject, to quantify all 60 clonal forms that were defined in this study. [...] Table S1). This clustering on chromosome 14 is consistent with the theory that errors in V(D)J recombination are responsible for t(14;18) formation (23,28). We found that t(14;18) breakpoints in the BCL-2 major breakpoint region (MBR) on chromosome 18 cluster around positions 3520, 3571 and 3629 on NM_000633.2, consistent with prior reports on FL cases and healthy individuals (17,29). This quantitative genomic map of t(14;18) breakpoints represents a baseline mutational landscape in these study subjects, and a time-course of such measurements could reveal clonal expansion on the path to lymphoma. For complete details for all t(14;18) clones, including the de novo 'N sequence' inserts, please see Supplementary Table S1.

DISCUSSION
Droplet-based dPCR provides a powerful tool for sensitive genetic analysis and has been reported for quantitative detection of point mutations with detection limits of 1 mutant allele in 10⁴-10⁵ wild-type copies (8,9,30). Here, we demonstrated a nested microfluidic dPCR that enables highly sensitive and quantitative detection of rare somatic translocation targets in a vast wild-type DNA background. The hemi-nested primer design reduces non-specific PCR amplification; however, we found that significant improvement in sensitivity is mainly conferred by using preamp reactions to increase the amount of target sequence relative to the overall concentration of gDNA used in the dPCR assay and thus decrease the interference from excessive background. We tested gDNA NTCs [preamp of 3 μg of purely t(14;18)− gDNA] in dPCR and found that the percentage of false FAM+ beads was strongly dependent on the average concentration of gDNA in droplets. Excessive background (>0.1% false FAM+ beads) was observed when averaged concentrations were >0.1 genomic copies per droplet (~100 pg/μl), owing to non-specific amplification of concentrated background gDNA in 2.5 nl droplets. Therefore, it is necessary to operate at 0.01-0.1 copies per droplet to achieve highly sensitive detection. This operational limit results in a large dead volume during droplet generation and lowers the effective throughput of the dPCR assay when used to assay target mutations directly. Preamp of the low concentration standards allows us to use diluted gDNA concentration in dPCR, which enables quantitative detection down to concentrations of 1 × 10⁻⁷ or even lower (Figure 2b and inset) while maintaining the analysis efficiency.
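The droplet loading limit and its throughput cost can be checked with a short calculation. The sketch assumes 2.5 nl droplets (from the text), about 3.3 pg per genome copy (an assumption consistent with the conversions used in this paper) and Poisson loading of droplets; the function names are hypothetical.

```python
import math

DROPLET_NL = 2.5            # droplet volume from the text
PG_PER_GENOME_COPY = 3.3    # assumed mass of one genome copy, in pg


def gdna_pg_per_ul(copies_per_droplet):
    """gDNA mass concentration implied by a mean genome loading per droplet."""
    droplet_ul = DROPLET_NL * 1e-3
    return copies_per_droplet * PG_PER_GENOME_COPY / droplet_ul


def droplets_to_screen(gdna_ug, copies_per_droplet):
    """Droplets needed to screen a gDNA mass directly at a given loading."""
    genome_copies = gdna_ug * 1e6 / PG_PER_GENOME_COPY
    return genome_copies / copies_per_droplet


def fraction_occupied(mean_copies):
    """Poisson fraction of droplets holding at least one genome copy."""
    return 1.0 - math.exp(-mean_copies)


conc_at_limit = gdna_pg_per_ul(0.1)          # about 130 pg/ul
droplets_for_3ug = droplets_to_screen(3, 0.1)  # about 9e6 droplets
```

At 0.1 copies per droplet, screening 3 μg of gDNA directly would take on the order of 9 × 10⁶ droplets, most of them empty, which illustrates why pre-amplifying the target before droplet generation preserves analysis efficiency.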
The microfluidic approach described here has digital detection capability for highly quantitative measurement of low-level t(14;18) mutations with single-molecule sensitivity and resolution. Conventional qPCR assays developed here and by other researchers also conferred single-molecule sensitivity to detect t(14;18) at 10⁻⁵ to 10⁻⁷ levels, depending on the amount of gDNA screened (10,16,19,20,31-35). However, these analog measurements remained semi-quantitative, especially at the low concentration range (e.g. <10⁻⁵ in Figure 2a). Another distinct advantage that the dPCR method offers over analog qPCR assays is that it enables high-throughput targeted single-molecule sequencing and allows various t(14;18) clones to be resolved and quantified individually (Figures 4 and 5). We demonstrated that the microfluidic nested dPCR method provides single-molecule resolution and enables identification, quantitation and genomic mapping of unique t(14;18) clonal forms that are unresolvable using the conventional nested qPCR approach (see data for subjects G, Q, H and HH in Figure 3a, Supplementary Table S1 and Supplementary Figures S4 and S5). This technology thus could provide a powerful tool for investigating the clonal evolution of cancer at the single copy level. Clinical studies have identified clonal evolution of t(14;18) in individual patients in response to disease progression (23). A high-throughput technique with the capability to sequence and quantify multiple t(14;18)+ clones could provide insights into the molecular pathology and clinical implication of t(14;18) (24,25). It could also be used to detect rare cancer stem cells in a large background of normal tissue. We may have identified t(14;18)+ lymphoma stem cells in this study population, but only continued monitoring in a large prospective study can reveal whether particular t(14;18)+ clones expand and give rise to lymphoma.
In addition, we previously demonstrated that this bead-based dPCR technology can be adapted for multiplexed detection of multiple mutations in single cells, which can provide even deeper insight into disease progression (36).
Furthermore, we demonstrated that our nested PCR assays for t(14;18) can be used directly on gDNA from whole blood or clot, a biological specimen that is easily collected in population-based studies. With few exceptions, researchers assay for t(14;18) in hematopoietic cell subpopulations that are enriched for B cells. We found significant differences in the t(14;18) qPCR signal provided by donor-matched PBMCs, buffy coat and clot, with clot gDNA containing the lowest t(14;18) concentrations (Supplementary Figure S1). However, collection strategies in field-based studies of healthy populations often cannot accommodate immediate blood fractionation, and a high-sensitivity, high-throughput t(14;18) assay using clot gDNA of healthy individuals, such as the method described here, would be useful in large prospective cohort studies.
To our knowledge, this is the first time that a highly sensitive t(14;18) assay has been applied to a healthy Chinese population, although our group previously applied a less sensitive assay to PBMC DNA in a smaller Chinese population (37). This is also the first study to apply microfluidic dPCR to monitor somatic cancer mutations in occupationally exposed human subjects. Through the use of the new microfluidic nested dPCR technology, we detected, sequenced and quantified 60 t(14;18) clones found in 41 of 93 (~44%) healthy Chinese subjects. Our quantitative genomic mapping of these t(14;18) clones revealed clustering within the MBR region of BCL2 and within the IgHJ locus of chromosome 14, and the clustering we observed is consistent with previous reports. Specifically, we observed clustering within the MBR centered around positions 3520, 3571 and 3629 and mainly involving the J4, J5 and J6 RSSs on the IgHJ locus; these positions are essentially identical to those identified in prior reports in Western and North American populations (17,29,38).
The t(14;18) translocation is thought to be an initiating event in FL, and additional mutations after t(14;18) can be associated with various outcome measures (17,24). A study that used comparative genomic hybridization in biopsies from t(14;18)+ FL cases showed that gain of chromosome X in males and gains involving chromosomes 2, 3q and 5 were among copy number alterations associated with poor outcome (39). Gene disruptions that are frequently associated with adverse outcome in FL cases include TNFRSF14 on 1p36 along with FAS and TP53 on 10q and 17p, respectively (40). A cytogenetic study of t(14;18)+ FL biopsies found that del(6q), +5, +19 and +20 were associated with poorer overall survival, and that del(17p) was associated with poorer event-free survival (25). Although cytogenetics and FISH offer information about mutation concurrence in single cells, these methods are laborious and low-throughput, and they do not provide sequence information. Most other modern methods use homogenized samples, which leaves no opportunity for large-scale studies of mutation concurrence and synergy during disease progression at the single-cell level, where carcinogenesis ultimately occurs.
The dPCR and single-molecule sequencing technology established here provides a promising platform for developing new approaches for high-throughput single-cell analysis of the concurrence of multiple mutations. Based on this digital microfluidic platform, we recently developed a methodology for high-throughput purification of single-cell genomes and multiple-allele sequencing of single cells (36). Currently, the throughput of our single-molecule/cell sequencing procedure is limited by the use of second-round PCR to amplify the bead-bound DNA for standard Sanger sequencing. However, it is feasible to adapt our bead-based dPCR method to next-generation sequencing technologies for direct massively parallel sequencing off the post-PCR beads, thus providing unprecedented throughput in single-cell genetic analysis. We hope to soon extend this microfluidic single-cell analysis technology to an expanded set of genetic markers of lymphoma and conduct multiple-allele sequencing of these regions at the single-cell level in lymphoma biopsies. With these further developments, we will add new dimensions to the mutational landscape developed here so that we can begin to study FL progression at the level of individual cancer stem cells.