An Automated Correction Algorithm (ALPACA) for ddPCR Data Using Adaptive Limit of Blank and Correction of False Positive Events Improves Specificity of Mutation Detection

BACKGROUND: Bio-Rad droplet-digital PCR (ddPCR) is a highly sensitive method that can be used to detect tumor mutations in circulating cell-free DNA (cfDNA) of patients with cancer. Correct interpretation of ddPCR results is important for optimal sensitivity and specificity. Despite its widespread use, no standardized method to interpret ddPCR data is available, nor have technical artifacts affecting ddPCR results been widely studied. METHODS: False positive rates were determined for 6 ddPCR assays at variable amounts of input DNA, revealing polymerase induced false positive events (PIFs) and other false positives. An in silico correction algorithm, known as the adaptive LoB and PIFs: an automated correction algorithm (ALPACA), was developed to remove PIFs and apply an adaptive limit of blank (LoB) to individual samples. Performance of ALPACA was compared to a standard strategy (no PIF correction and static LoB = 3) using data from commercial reference DNA, healthy volunteer cfDNA, and cfDNA from a real-life cohort of 209 patients with stage IV nonsmall cell lung cancer (NSCLC) whose tumor and cfDNA had been molecularly profiled. RESULTS: Applying ALPACA reduced false positive results in healthy cfDNA compared to the standard strategy (specificity 98 vs 88%, P = 10^-5) and stage IV NSCLC patient cfDNA (99 vs 93%, P = 10^-11), while not affecting sensitivity in commercial reference DNA (70 vs 68%, P = 0.77) or patient cfDNA (82 vs 88%, P = 0.13). Overall accuracy in patient samples was improved (98 vs 92%, P = 10^-7). CONCLUSIONS: Correction of PIFs and application of an adaptive LoB increases specificity without a loss of sensitivity in ddPCR, leading to a higher accuracy in a real-life cohort of patients with stage IV NSCLC.

Introduction
Bio-Rad droplet-digital PCR (ddPCR) is a sensitive and quantitative method for the detection of variants with low variant allele frequencies (VAF) (1,2). For this reason, it is often used for the detection of mutations in circulating cell-free DNA (cfDNA).
The concept of ddPCR is that DNA molecules are randomly distributed over 10 000-20 000 droplets (2,3). A PCR reaction driven by Taq polymerase amplifies the target molecules, and the absence or presence of the target mutation is signaled by wildtype- and mutation-specific TaqMan hydrolysis probes labeled with hexachloro-fluorescein (HEX) and fluorescein amidite (FAM), respectively. Subsequently, each droplet's fluorescence is measured individually, allowing for the detection of minute quantities of target molecules. In this way, ddPCR can detect rare (low VAF) mutations, making it especially useful for the detection of low-abundance tumor-derived cfDNA in blood samples (1,4-6).
One of the challenges of this approach is to distinguish a true positive from a false positive signal. It has been noted that mutation-positive droplets (events) can occur in wildtype-only experiments (2,3,7), underlining the need for an optimal cutoff that prevents false positive results yet preserves a high analytical sensitivity. Different strategies have been described in the literature and recommendations are given by the manufacturer (2,8-13), but none of these strategies takes into account technical artifacts and concentration-dependent effects.
In this study, we report the occurrence of polymerase induced false positive events (PIFs) and an input dependent increase in PIFs in ddPCR experiments. Based on this observation, we developed a novel ddPCR data interpretation algorithm (adaptive LoB and PIFs: an automated correction algorithm, ALPACA) that combines corrections for assay specific error rates and technical artifacts. Its performance was compared to the standard ddPCR data analysis strategy suggested by the manufacturer.

HEALTHY DONORS AND PATIENT ENROLLMENT
Plasma and serum from healthy donors were obtained under informed consent, in accordance with the Declaration of Helsinki and as approved by the medical ethics committee (METC) of the Netherlands Cancer Institute (NKI; Amsterdam, the Netherlands). All patients were enrolled with written informed consent as part of the Lung Cancer Early Molecular Assessment Trial (LEMA; ClinicalTrials.gov NCT02894853), which was reviewed and approved by the METC of the NKI. A subgroup of the LEMA cohort, consisting of the first 209 consecutive patients with confirmed stage IV nonsmall cell lung cancer (NSCLC) for whom pretreatment plasma was available, was used in this study.
Blood collection, cfDNA isolation, and ddPCR procedure are described in the online Supplemental Methods, and dMIQE2020 adherence in Table 4 in the online Data Supplement (14). Briefly, cfDNA was isolated using the QIAsymphony (Qiagen) and ddPCR was performed using the QX100 instruments (Bio-Rad Laboratories Inc.).

PHASE 1: DESIGNING THE ALPACA ALGORITHM
Experiments. The ALPACA algorithm was designed using EP17 experiments (15), as follows: commercial wildtype DNA (Horizon Discovery Ltd) or mutant DNA (gBlock Gene Fragments, Integrated DNA Technologies Inc.) was measured in 60 replicate wells at 2-4 concentrations per assay, ranging from 5 to 472 copies/µL (0.4-24.5 ng/well). Droplet counts from the 60 wells were combined and analyzed as a single result.
PIF correction. The EP17 results were corrected for PIFs as follows: assuming that mutant molecules are randomly distributed over the droplets, the ratio of FAM-only vs FAM&HEX positive droplets was expected to be equal to the ratio of empty vs HEX-only positive droplets (2,3). The cumulative binomial probability (P) of observing each ratio of FAM vs FAM&HEX, given the ratio of empty vs HEX droplets, was calculated. For any experiment where P was smaller than 0.1, the maximum number of FAM&HEX droplets was calculated for which we would have P ≥ 0.1. All excess FAM&HEX droplets (the difference between the observed number and the calculated maximum expected number) were defined as PIFs and were treated as HEX-only droplets in all further analyses. The limit of P = 0.1 was determined based on performance on the EP17 training data.
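The PIF correction described above can be sketched in a few lines of Python. This is an illustration of the stated rule only, not the code in the ALPACA repository (which is written in R). The function names, the dictionary layout, and the droplet counts in the usage line are hypothetical. Two reading choices are assumptions: the FAM-only count is held fixed while the FAM&HEX count is lowered, and the HEX-positive fraction is taken from the original FAM-negative droplets.

```python
from math import comb


def upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p). Intended for small n only."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def correct_pifs(empty, hex_only, fam_only, fam_hex, p_limit=0.1):
    """Reclassify excess FAM&HEX droplets as HEX-only (hypothetical helper)."""
    if empty + hex_only == 0:
        raise ValueError("need at least one FAM-negative droplet")
    # Under random partitioning, a mutant droplet also contains wildtype DNA
    # with the same probability as a FAM-negative droplet is HEX-positive.
    p_hex = hex_only / (empty + hex_only)
    kept = fam_hex
    # Assumption: FAM-only stays fixed while the FAM&HEX count is lowered
    # until the observed split is no longer improbable (P >= p_limit).
    while kept > 0 and upper_tail(kept, fam_only + kept, p_hex) < p_limit:
        kept -= 1
    pifs = fam_hex - kept
    return {
        "fam_only": fam_only,
        "fam_hex": kept,
        "hex_only": hex_only + pifs,  # PIFs are treated as HEX-only droplets
        "pifs": pifs,
    }


# Hypothetical merged droplet counts, for illustration only.
print(correct_pifs(empty=10000, hex_only=5000, fam_only=5, fam_hex=188))
```

Because the loop walks downward from the observed count and stops at the first value that passes, it returns the maximum number of FAM&HEX droplets compatible with P ≥ 0.1, which is the quantity the text defines.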
Adaptive LoB. Any FAM-positive droplets remaining after PIF correction were considered false positive droplets in the EP17 experiments. The false positive rate (FPR) was calculated as the number of mutant molecules per wildtype molecule for each assay at 2-4 concentrations per assay.
For all subsequent experiments, the droplet counts from each replicate were added up and the PIFs were removed as described. The adaptive LoB was calculated as the 99.9% binomial upper confidence limit of the expected number of false positive droplets, given the FPR of the EP17 experiment with the concentration closest to the concentration of the sample. This limit was determined based on performance on the EP17 training data.
A mutation was called positive if the number of remaining FAM-positive droplets after PIF removal was equal to or greater than the adaptive LoB. PIF removal and the adaptive LoB were applied sequentially in an R algorithm called ALPACA, made available on GitHub: https://github.com/DCLVessies/ALPACA.git (Fig. 1).
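As a rough illustration of the adaptive LoB and the calling rule, the sketch below picks the FPR measured at the nearest EP17 concentration and derives a droplet threshold from a binomial model. It is not taken from the ALPACA repository. The function names, the FPR table, and the wildtype count are hypothetical, and one convention is assumed: the LoB is the smallest count k for which a wildtype-only sample would reach k or more false positives with probability of at most 0.1%.

```python
def nearest_fpr(fpr_by_conc, sample_conc):
    """Return the FPR measured at the EP17 concentration closest to the sample."""
    closest = min(fpr_by_conc, key=lambda conc: abs(conc - sample_conc))
    return fpr_by_conc[closest]


def adaptive_lob(n_wildtype, fpr, confidence=0.999):
    """Smallest k with P(X >= k) <= 1 - confidence, X ~ Binomial(n_wildtype, fpr).

    Assumed convention; suited to small FPR values, where (1 - fpr) ** n
    does not underflow.
    """
    if not 0 <= fpr < 1:
        raise ValueError("fpr must be in [0, 1)")
    pmf = (1 - fpr) ** n_wildtype  # P(X = 0)
    cdf = 0.0                      # P(X < k), starting at k = 0
    k = 0
    while 1 - cdf > 1 - confidence and k <= n_wildtype:
        cdf += pmf
        # Recurrence from P(X = k) to P(X = k + 1) avoids large binomial coefficients.
        pmf *= (n_wildtype - k) / (k + 1) * fpr / (1 - fpr)
        k += 1
    return k


def is_positive(fam_after_pif, lob):
    """Calling rule from the text: positive if the count is >= the LoB."""
    return fam_after_pif >= lob


# Hypothetical FPR table (concentration in copies/µL -> FPR) and sample values.
fpr_table = {11.0: 1e-4, 300.0: 3e-4}
fpr = nearest_fpr(fpr_table, sample_conc=71.0)
lob = adaptive_lob(n_wildtype=2840, fpr=fpr)
print(lob, is_positive(3, lob))
```

Because the threshold depends on the number of wildtype molecules analyzed, merging more wells raises the LoB automatically, which is the scaling behaviour the text attributes to the adaptive LoB.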

PHASE 2: VALIDATION OF ALPACA
The analytical performance of ALPACA was compared to the manufacturer's standard strategy (2), which used no PIF correction and called a mutation when 3 or more FAM-positive droplets were present in a duplicate experiment and the corresponding blank experiment had no positive droplets. Three data sets were used for the evaluation: (a) commercial reference DNA to assess sensitivity of the 2 strategies; (b) cfDNA from healthy donors to assess specificity; and (c) stage IV NSCLC patient cfDNA to assess diagnostic accuracy in a real-world setting.
Commercial reference DNA. Wildtype (catalog no. HD124, HD249) and mutant DNA for EGFR p. E746_A750del (HD251), EGFR p. T790M (HD258), EGFR p. L858R (HD254), and BRAF p. V600E (HD238) were acquired commercially (Horizon Discovery Ltd). Wildtype and mutant DNA were mixed to obtain different levels of mutant DNA: 4.0, 0.3, 0.1, and 0.02 mutant copies/µL in a final concentration of about 11 wildtype copies/µL (0.7 ng/well), and 0.3, 0.03, and 0 mutant copies/µL in a final concentration of 300 wildtype copies/µL (20 ng/well). Five duplicates (10 wells) of each sample were measured with the relevant ddPCR assay (Bio-Rad, Supplemental Methods). Each duplicate was considered a single result, for a total of n = 30 results per assay. Sensitivity of ALPACA and the standard strategy was defined as the fraction of results in which the mutation was detected.
Healthy cfDNA. Healthy donor plasma and serum cfDNA was acquired as described in the Supplemental Methods. The cfDNA yield was not determined, and 9 µL of sample was used as input for ddPCR, leading to a median plasma/serum equivalent input of 0.5 mL (IQR 0.5-0.5 mL). Specificity was defined as the fraction of results in which no mutation was detected.
Decentralized tissue molecular profiling (TMP) was performed according to the standard of care in the hospital of enrolment. According to national and international guidelines TMP should include known NSCLC oncodriver genes, including among others KRAS and EGFR (16)(17)(18). The methods of TMP were dependent on the quantity and quality of available tumor tissue and local laboratory preferences. TMP results were obtained from the clinical pathology reports.
Mutations detected with either plasma molecular profiling (PMP) or TMP were considered true mutations. Mutations not detected with either method were considered true negatives. An independent sample of up to 4 mL of plasma (from the same blood draw as PMP) was used for ddPCR as described in the Supplemental Methods. This cfDNA was measured with 4 ddPCR assays [EGFR p. L858R, EGFR p. T790M, KRAS p. G12/G13 screening (Bio-Rad), and EGFR exon19 del drop-off (IDT) (19)], such that the equivalent of a median of 1 mL of plasma was used per assay (IQR 0.88-1 mL). The median concentration in ddPCR was 71 copies/µL (IQR 49-146 copies/µL), corresponding to a median of 4.8 ng/well (IQR 3.3-9.7 ng/well). Forty-nine patients (24%) had a cfDNA concentration greater than 10 ng/well. Accuracy of ALPACA and the standard strategy was calculated as agreement with the molecular profiling (PMP+TMP) results.

STATISTICAL METHODS
ddPCR results were analyzed in QuantaSoft (Bio-Rad) as described in the Supplemental Methods. Mutation calls were made by the ALPACA algorithm or by the standard strategy, in which 3 or more FAM-positive droplets constitute a positive mutation call.
Statistical significance of the differences in sensitivity, specificity, and accuracy was assessed using the McNemar test with continuity correction. Statistical significance of differences in negative and positive predictive values was assessed using a weighted generalized score statistic using compbdt (20). ALPACA and all statistical tests were run in R v.3.6.0 (21).
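For readers unfamiliar with the McNemar test, the following Python sketch shows the continuity-corrected statistic for paired calls from two strategies on the same samples. It is only an illustration: the analyses in this study were run in R, the function name is hypothetical, and the discordant counts in the usage line are made up, because the paired discordant counts are not reported in the main text.

```python
from math import erfc, sqrt


def mcnemar_cc(b, c):
    """McNemar test with continuity correction (hypothetical helper).

    b: samples classified correctly by strategy A only
    c: samples classified correctly by strategy B only
    Returns (chi-square statistic, two-sided P value with 1 degree of freedom).
    """
    if b == c:
        # No asymmetry between the discordant cells: no evidence of a difference.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Made-up discordant counts, for illustration only.
print(mcnemar_cc(b=12, c=3))
```

Only the discordant pairs enter the statistic, which is why two strategies with similar marginal sensitivity can still differ significantly if their errors fall on different samples.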

POLYMERASE INDUCED FAM&HEX POSITIVE DROPLETS (PIFS)
The FPRs of the EGFR p. T790M, EGFR p. L858R, EGFR p. E746_A750del, BRAF p. V600E, KRAS G12/G13 screening, and EGFR exon 19 deletion drop-off assays were determined according to the CLSI EP17 protocol using different amounts of wildtype reference DNA (range 0.4-24.5 ng/well). The number of mutant positive droplets differed per assay and increased with the amount of wildtype reference DNA present in the PCR reaction.
Mutant positive droplets showed a skew toward FAM&HEX vs FAM-only droplets (examples in Fig. 2, A, B). This skew was not due to stochastic distribution of mutant molecules over the droplets. For example (Fig. 2, A), the probability of observing 188 or more FAM&HEX droplets out of 193 total mutant droplets, given 67% HEX-negative droplets, is P = 10^-82. P values for all experiments ranged from P = 1 to P = 10^-82 (Table 1).
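The order of magnitude of this probability can be checked with a direct binomial tail sum. The sketch below assumes that each mutant droplet is HEX-positive with probability 1 - 0.67 = 0.33; that figure is derived from the rounded 67% quoted above, so the result is approximate and the variable names are illustrative.

```python
from math import comb

n_mutant = 193     # total mutant-positive droplets (FAM-only + FAM&HEX)
n_fam_hex = 188    # observed FAM&HEX droplets
p_hex = 1 - 0.67   # assumed chance that a droplet also holds wildtype DNA

# Upper tail P(X >= 188) for X ~ Binomial(193, 0.33).
p_value = sum(
    comb(n_mutant, k) * p_hex**k * (1 - p_hex) ** (n_mutant - k)
    for k in range(n_fam_hex, n_mutant + 1)
)
print(f"{p_value:.2e}")
```

Under these assumptions the tail probability comes out on the order of 10^-82, in line with the value reported for this experiment.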
We hypothesized that the skew could be caused by Taq polymerase PCR errors introduced in originally wildtype DNA present in a droplet (Fig. 2, C). This hypothesis was based on the following observations. First, the FAM&HEX droplets appear in a "fan-like pattern" (Fig. 2, A, C, Supplemental Figs. 1-3) reminiscent of discrete processes such as consecutive PCR cycles. Second, the skew was absent in the EGFR p. E746_A750del (c.2235_2249del15) assay (Fig. 2, B, Table 1), a mutation unlikely to be caused by PCR error. Third, the skew correlated with the DNA concentration in the analysis (Table 1). Fourth, the skew and fan pattern were more prominent in the EGFR p. T790M (c.2369C>T) assay than in the EGFR p. L858R (c.2573T>G) and BRAF p. V600E (c.1799T>A) assays (Table 1), which is in line with the base-specific error rates of Taq polymerase (22,23). The fan pattern was subsequently confirmed in KRAS p. G12D (c.35G>A) and KRAS p. G13D (c.38G>A) assays, which showed a pattern similar to EGFR p. T790M (Supplemental Figs. 2 and 3). Fifth, FAM&HEX positive droplets were never observed in No Template Control wells, indicating they require wildtype target molecules to arise. Last, the skew could not be caused by nonspecific probe-binding of the mutant probe to   (14)] rather than maximum FAM fluorescence in a minority of wildtype droplets.
To verify this hypothesis, 100% EGFR p. T790M mutant DNA (gBlock Gene Fragment, IDT) was used to measure the skew for the T > C transition, for which Taq polymerase has a higher error rate than for the original C > T transition (22,23). This resulted in a 30-fold higher number of FAM&HEX droplets compared to the EGFR p. T790M wildtype experiments (Supplemental Fig. 1), supporting the hypothesis that the excess FAM&HEX droplets do not represent original mutant DNA fragments and should be considered polymerase induced false positive events (PIFs) instead.
To address the occurrence of PIFs, an in silico correction algorithm was developed. Briefly, this algorithm removes excess PIFs based on the likelihood of observing the measured distribution of FAM&HEX vs FAM-only droplets. The effect of removing PIFs is shown in Table 1.

ADAPTIVE LIMIT OF BLANK
PIFs are a source of false positive droplets that will result in FAM&HEX droplets. In the EP17 experiments we also observed FAM-only false positive droplets, resulting in an FPR after PIF correction (Table 1). Since the FPR increased with the amount of input DNA, we developed an adaptive LoB that scales with the concentration and the number of wells analyzed. Briefly, the adaptive LoB is calculated for each experiment (all replicates of a single sample) individually. It uses the assay- and concentration-dependent FPR to determine the maximum expected number of FAM-positive droplets in a wildtype-only experiment after PIF removal. Only experiments that have a number of FAM-only + FAM&HEX positive droplets equal to or greater than the LoB are considered mutation positive. PIF correction and the adaptive LoB were applied sequentially in the ALPACA algorithm (Fig. 1).

VALIDATION OF ALPACA
To evaluate the performance of ALPACA we compared the results obtained by ALPACA to Bio-Rad's standard strategy (no PIF correction and static LoB = 3) (2). The validation was performed using 3 data sets: commercial reference DNA to evaluate sensitivity, healthy donor cfDNA to evaluate specificity, and NSCLC patient cfDNA as evaluation in a real-life cohort.
Commercial reference DNA. Commercial reference DNA harboring the mutation was spiked into wildtype reference DNA at various VAFs and measured in 5 duplicates. The sensitivity of ALPACA vs the standard strategy was 70 vs 68%, n = 120 (P = 0.77, Supplemental Table 1). Subgroup analysis of low-input samples (0.7 ng/well) containing 4.0, 0.3, 0.1, and 0.02 variant copies/µL likewise showed no difference in sensitivity: 75 vs 70%, n = 80, P = 0.22. Similarly, at high input and low VAF (0.3 and 0.03 variant copies/µL in 20 ng/well) no differences were found: 60 vs 65%, n = 40, P = 0.68. Meanwhile, the standard strategy had 2 false positive results in the high-input wildtype-only samples versus 0 for ALPACA.
In addition to analyzing the results as duplicates, ALPACA allows for merging a variable number of replicates and analyzing them as a single result, using the same rules for setting the adaptive LoB as it does for single wells, duplicates, or more replicates. In contrast to strategies with a static LoB (such as the standard strategy), it therefore requires no separate validation for each different number of replicates analyzed. This feature can be used to increase the sensitivity of the approach. When data from 5 duplicates (10 wells) were merged into 1 result, ALPACA's sensitivity improved from 70% (n = 120) to 92% (n = 24), without calling a variant in the wildtype-only samples (Supplemental Table 1).
Healthy donor cfDNA. Healthy donor cfDNA was isolated from plasma and serum, and measured in duplicates using the EGFR T790M (n = 79), EGFR L858R (n = 28), KRAS G12/G13 screening (n = 40), and EGFR exon19 deletion (n = 40) assays. Combined specificity (n = 187) was significantly greater for ALPACA vs the standard strategy (98 vs 88%, P = 10^-4, Supplemental Table 2). Most differences were seen in serum samples with concentration greater than 10 ng/well (94 vs 74%, n = 72, P = 10^-4), confirming our observation that samples with high concentration have more false positive droplets. No false positive results were seen for either strategy when using EGFR L858R or EGFR exon19 deletion drop-off, while EGFR T790M (95 vs 85%, P = 0.01) and KRAS G12/G13 screening (100 vs 75%, P = 0.004) showed large differences in specificity, again fitting with earlier observations.
NSCLC patient cfDNA. Last, both strategies were evaluated using a real-life cohort of stage IV NSCLC patient samples (n = 209). For all patients, molecular profiling results for plasma and tissue were available, which served as the gold standard. Comparing the ddPCR results from 4 assays (EGFR exon 19 deletion drop-off assay, EGFR L858R, EGFR T790M, and KRAS G12/G13 screening) to the gold standard, we observed overall accuracy of ALPACA vs standard strategy of 98 vs 92% (P = 10^-8, Supplemental Table 3). The accuracy for each individual assay was improved when using ALPACA (Fig. 3, A). Combining results of the 4 assays, ALPACA and the standard strategy correctly identified 61 and 65 out of 74 positive mutations (sensitivity 82 vs 88%, P = 0.13), and 730 and 683 out of 735 negative results (specificity 99 vs 93%, P = 10^-11), respectively. The negative predictive value (NPV) was 98 vs 99% (P = 0.10), and positive predictive value (PPV) was 92 vs 56% (P < 0.001).
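The percentages for the patient cohort follow directly from the reported counts (61 and 65 of 74 true mutations detected; 730 and 683 of 735 true negatives correctly called). The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative, and the false negative and false positive counts are derived by subtraction from the totals given in the text.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic accuracy measures."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from the text; FN and FP are derived from the stated totals (74 and 735).
alpaca = diagnostic_metrics(tp=61, fn=74 - 61, tn=730, fp=735 - 730)
standard = diagnostic_metrics(tp=65, fn=74 - 65, tn=683, fp=735 - 683)

for name in alpaca:
    print(f"{name}: ALPACA {alpaca[name]:.1%} vs standard {standard[name]:.1%}")
```

The calculation makes the source of the PPV gap explicit: the standard strategy produces 52 false positive calls against 5 for ALPACA, while the number of true positives differs by only 4.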

Discussion and Conclusion
Bio-Rad ddPCR is a highly sensitive method that is used in clinical and research settings to detect mutations in cfDNA (1,2). Highly sensitive methods can be hindered by the occurrence of false positive results. Yet, the manufacturer's recommendations for data processing are rather straightforward. As a consequence, they cope poorly with high-concentration samples and technical artifacts, which ultimately may cause false positive mutation calls. This can be especially relevant when therapeutic choices are based on these results and can impact patient treatment and outcome.
We observed that FAM&HEX false positive droplets (PIFs) were introduced during ddPCR experiments (Fig. 2, Table 1). Our results show that the Taq polymerase, which is prone to base substitution errors (22,23), is the most likely source of PIFs. Since Taq polymerase is an irreplaceable component of Bio-Rad ddPCR Supermix, PIFs are likely to occur more generally when using this platform, and the effects will be most significant for assays designed to detect C > T/G > A or T > C/A > G base substitutions. This may explain why others have found reduced specificity for the EGFR T790M (C > T) ddPCR assay compared to other assays (24).
PIFs can potentially lead to false positive ddPCR results and affect the quantification of ctDNA. We designed a statistical model to estimate the number of PIFs and exclude them from downstream analyses. In addition to PIFs, all ddPCR assays studied showed an FPR that increased with DNA concentration (Table 1). The adaptive LoB sets a threshold for individual ddPCR results, considering assay characteristics, the concentration-dependent FPR, and the number of wells analyzed. PIF correction and adaptive LoB were combined into the ALPACA algorithm.
Together, these results show that ddPCR produces false positive results, particularly at high DNA input concentration. These are prevented by ALPACA, leading to improved confidence in positive results. This can be especially relevant in settings seeking low VAF variants, such as detection of minimal residual disease or samples with high wildtype DNA contamination such as serum cfDNA.
Confirming our findings in cfDNA from 209 patients with stage IV NSCLC, ALPACA improved diagnostic accuracy (98 vs 92%, P = 10^-8), specificity (99 vs 93%, P = 10^-11), and PPV (92 vs 56%, P < 0.001), while not affecting sensitivity (82 vs 88%, P = 0.13) or NPV (98 vs 99%, P = 0.10) (Fig. 3, B). Importantly, the
For applications where optimal sensitivity is required, it is desirable to analyze more DNA by merging data from multiple replicates. In contrast to approaches with a static LoB, ALPACA's adaptive LoB scales with the number of replicates analyzed. In this manuscript, by merging data from 10 replicates of the commercial reference samples, sensitivity for ALPACA was improved from 70 to 92%. This shows that merging replicates can be a powerful tool to improve sensitivity for ddPCR, provided the specificity is maintained by application of an adaptive LoB.
Apart from the effect on sensitivity and specificity, ALPACA will also affect the quantification of results. Differences in quantification are especially relevant when mutations are monitored longitudinally, and additional research will be required to determine the effects of ALPACA on the precision of quantification.
Another approach for preventing false positive results can be to standardize the total DNA input per reaction. Decreased specificity due to high input is circumvented that way, and no adaptive LoB is required. However, samples that do not meet the input requirement cannot be analyzed and high-concentration samples need to be diluted, decreasing the theoretical sensitivity of the method (25).
In conclusion, ddPCR causes assay-specific and input-dependent false positive events (coined PIFs in this study). Even after PIF correction, samples with higher DNA concentrations can still have more false positive events, requiring an adaptive LoB to distinguish positive from negative results. ALPACA is a novel algorithm that applies PIF correction and an adaptive LoB to individual and merged ddPCR results. The algorithm prevents false positive results, especially in samples with a high concentration. Application of ALPACA to a real-life cohort of stage IV NSCLC plasma samples significantly improved the overall accuracy, specificity, and PPV while not significantly affecting the sensitivity and NPV.

Supplemental Material
Supplemental material is available at Clinical Chemistry online.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 4 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved. D.C.L. Vessies, statistical analysis; T.C. Linders, administrative support, statistical analysis; M. Lanfermeijer, administrative support, statistical analysis; K.L. Ramkisoensing, administrative support, statistical analysis; V. van der Noort, statistical analysis; R.D. Schouten, provision of study material or patients; M.M. van den Heuvel, provision of study material or patients; K. Monkhorst, provision of study material or patients; D. van den Broek, financial support, provision of study material or patients.