Abstract

Digital polymerase chain reaction (dPCR) is a state-of-the-art targeted quantification method of nucleic acids. The technology is based on massive partitioning of a reaction mixture into individual PCR reactions. The resulting partition-level end-point fluorescence intensities are used to classify partitions as positive or negative, i.e. containing or not containing the target nucleic acid(s). Many automatic dPCR partition classification methods have been proposed, but they are limited to the analysis of single- or dual-color dPCR data. While general-purpose or flow cytometry clustering methods can be directly applied to multicolor dPCR data, these methods do not exploit the approximate prior knowledge on cluster center locations available in dPCR data. We present Polytect, a method that relies on crude cluster results from flowPeaks, previously shown to offer good partition classification performance, and subsequently refines flowPeaks’ results by automatic cluster merging and cluster labeling, exploiting the prior knowledge on cluster center locations. Comparative analyses with established methods such as flowPeaks, dpcp, and ddPCRclust reveal that Polytect often surpasses established methods, both on empirical and simulated data. Polytect manages to merge excess clusters, while also successfully identifying empty clusters when fewer than the maximally observable number of clusters are present. On par with recent developments in instruments, Polytect extends beyond two-color data. The method is available as an R package and R Shiny app (https://digpcr.shinyapps.io/Polytect/).

Introduction

Digital polymerase chain reaction (dPCR) is an increasingly popular molecular method that allows highly accurate, calibration-free quantification of nucleic acids [1]. Its advantages render it widely used across life science domains [2–5]. Unlike quantitative PCR, which requires real-time monitoring for quantification and a standard curve, dPCR typically relies on end-point fluorescence detection, i.e. fluorescence intensity at the end of the PCR amplification process, simplifying the reaction readout. Partition classification (thresholding; clustering) is based on these end-point fluorescence intensities. After thresholding, the binary or multinomial outcomes of the partitions allow for the quantification of the targeted nucleic acid(s) [5]. A dPCR glossary is available as Supplementary Table (Supplementary Table S1).

Recent advances in dPCR instrumentation give the user access to up to seven colors, enabling the simultaneous quantification of multiple target nucleic acids. However, multicolor dPCR data clustering poses challenges. Manual clustering is often performed, particularly for small series of single or duplex experiments. This may, however, introduce bias and low precision [6], and become increasingly difficult as the number of colors increases. We previously benchmarked automatic partition classification methods, including general-purpose, dPCR, and flow cytometry methods [7], and concluded that all methods face limitations, especially for the identification of small clusters or clusters with poor separation.

Some dPCR analysis methods make use of the expected cluster center locations [89], whereas flow cytometry methods do not, as cluster positions are unpredictable in flow cytometry. Current dPCR partition classification methods primarily operate on two-color data, with some 1-color methods having the potential to be extended to the multicolor setting [10]. This contrasts with flow cytometry methods that could—in principle—be directly applied to multicolor, (higher order) multiplexed dPCR experiments.

To address these limitations, we developed ‘‘Polytect’’. The software is based on hierarchical mixture modeling and makes use of the expected dPCR cluster center locations. Unlike current dPCR methods, Polytect is applicable to multicolor, (higher order) multiplexed experiments. Polytect builds upon ‘‘flowPeaks’’ [11], recognized as the top-performing clustering algorithm in our previous benchmarking study [7]. Notably, flowPeaks demonstrated robust performance, even with very low target concentrations. However, a drawback of flowPeaks is that it occasionally yields more clusters than expected, posing challenges for automatic labeling and nucleic acid concentration estimation. In contrast to flow cytometry, the maximal number of clusters in a dPCR experiment is known, and their position is estimable. Polytect addresses this limitation by allowing users to specify the expected (maximum) number of clusters and by enabling automatic cluster labeling and nucleic acid concentration estimation. Polytect remains accurate even when some or all targets are absent.

We evaluated Polytect’s performance using empirical and simulated data. The testing scenarios range from (higher order) two- to six-color data, detecting two to six targets in a reaction. Polytect’s performance was compared with manual expert clustering and automatic methods flowPeaks, ‘‘dpcp’’, and ‘‘ddPCRclust’’.

Materials and methods

Methodology overview

The different stages of Polytect involve (i) preliminary flowPeaks clustering, (ii) merging of flowPeaks clusters via hierarchical modeling, (iii) integration of cluster position information through penalty terms, and (iv) automatic labeling and target concentration estimation. Fig. 1 and Algorithm 1 give an overview of the Polytect algorithm, and details are provided in the following sections.

graphic

Schema of Polytect tool: (i) perform flowPeaks; (ii) initialize the cluster centers; and (iii) merge the cluster centers.
Figure 1.

Schema of Polytect tool: (i) perform flowPeaks; (ii) initialize the cluster centers; and (iii) merge the cluster centers.

Initialization

In a preliminary step, flowPeaks is performed. The number of estimated clusters by flowPeaks can be, and often is, higher than expected. In the subsequent hierarchical mixture modeling step [12], excess clusters are merged. However, when fewer than the expected number of clusters are identified by flowPeaks, Polytect will fail to find the correct clusters as Polytect relies on cluster merging and is unable to split clusters. To address this issue, flowPeaks’ parameters are tuned to obtain an excess number of clusters. This is achieved using Bayesian optimization with the ‘‘mlrMBO’’ R package (version 1.1.5.1, [13]). We specify the loss function as the discrepancy between the expected and observed number of clusters resulting from adjusting the input parameters for flowPeaks. As our method performs merging, we impose adjustments only when the actual number of clusters falls below the expected value. Of note, this adjustment is only performed during initialization but does not affect the subsequent cluster merging.

When there is only one cluster, only flowPeaks is performed: the subsequent steps are not executed, as Polytect performs merging only, and a single cluster cannot be merged further.

Hierarchical Gaussian mixture model

Suppose that the levels represent the steps required for clustering. Level l + 1 precedes level l. At level l + 1, more clusters are identified than expected, while at level l, these clusters have been further merged.

Assume there are k true clusters. At level l + 1 where only flowPeaks is performed, there are k1 clusters. At the level l where the merging is performed, there remain k clusters. k1 is often larger than k.

The likelihood of the observed data belonging to k1 clusters at level l + 1 can be written as [12]

(1)

where xi represents the intensities of the ith partition (of a total of n partitions), |$\pi ^{l+1}_g$| is the mixture weight (the fraction of the total partitions that is estimated to belong to the gth cluster) of component (cluster) g at level l + 1, and f(xil + 1) is the probability density function of the ith data point xi given the parameter set θl + 1.

Data points assigned to a given cluster at level l + 1 are assumed to be within the same cluster at level l as the method performs merging. The likelihood of the observed data at level l can be expressed as

(2)

where Xg represents all the data points in the gth cluster.

Let Z denote the membership of clusters at level l + 1 to clusters at level l. Z maps clusters across levels and is used in an Expectation–Maximization (EM) algorithm to estimate parameters. Specifically, zgh is 1 or 0, indicating whether the gth component at level l + 1 belongs to hth component at level l. The likelihood of the complete data at level l (given Z) can be formulated as

(3)
(4)

and the log-likelihood becomes

(5)

where Xg is the data belonging to component g at level l + 1, |$\pi ^{l}_h$| is the mixture weight of component h at level l, and f(Xg|zgh, θl) is the probability density function of Xg given that it belongs to h at level l, parameterized by θl.

This log -likelihood function is used in an EM algorithm for estimating the parameters θl of the hierarchical mixture model at level l.

Parameter estimation via an EM algorithm

Here we assume a Gaussian mixture model at both level l and l + 1. An EM algorithm is used to estimate the parameter set θl.

In the E-step, qgh, the expected value of zgh|Xg, θl can be derived as

(6)
(7)

where |$G(x,\boldsymbol{\mu },\boldsymbol{\Sigma })$| is the Gaussian density function with mean |$\boldsymbol{\mu }$| and covariance |$\boldsymbol{\Sigma }$|⁠. |$\boldsymbol{\mu ^{l+1}_{g}}$| is the mean of the gth component at level l + 1, |$\boldsymbol{\mu ^{l}_{h}}$| is the mean of the hth component at level l, |$\boldsymbol{\Sigma ^{l+1}_{g}}$| is the covariance of the gth component at level l + 1, and |$\boldsymbol{\Sigma ^{l}_{h}}$| is the covariance of the hth component at level l.

The M-step consists of maximizing the complete-data likelihood with regard to θl, resulting in

(8)

More details can be found in the ‘‘EM algorithm’’ section of the Supplementary material.

Penalization

To enforce constraints on cluster centers, penalty terms are introduced that penalize deviations from the expected cluster positions.

In the case of a common, noncompeting two-target, two-color assay design, we expect to observe (i) a negative cluster with low fluorescence intensities in both colors, (ii) a single positive cluster representing partitions positive for target 1 only, aligning horizontally with the negative cluster, (iii) a single positive cluster representing partitions positive for target 2, aligning vertically with the negative cluster, and (iv) a double positive cluster representing partitions positive for both targets, with its center approximately the sum of the centers of the two single positives that is positive for a single target (Fig. 2). The double positive cluster has the same endpoint fluorescence as the two single positive clusters. In the vector space, it is the sum of coordinates of the centers of the single positive clusters. However, deviations from these expected positions may occur, such as the double positive cluster (e.g. top right cluster, Fig. 2A) deviating from its expected intensities for color 1 (bottom right cluster, Fig. 2A). To accommodate such deviations, penalty terms are incorporated into Equation 8, resulting in

(9)

where |$\boldsymbol{\mu _{1}, \mu _{2}, \mu _{3}, \mu _{4}}$| are the centers of the negative population, single positive (target 1), single positive (target 2), and double positive population, respectively. |$\boldsymbol{\hat{\mu }_{1}}$| is the initial center estimate of the negative population. The user-defined parameter r1 controls the deviation between the estimated initial cluster center of the negative population and the actual cluster center, and r2 controls the deviation between the actual cluster center of the double positive population and the expected center. That is, |$\boldsymbol{\mu _{4}-\mu _{1}}$| should align as closely as possible with |$\boldsymbol{\mu _{2}-\mu _{1}+\mu _{3}-\mu _{1}}$|⁠. Details of the parameter estimation can be found in the ‘‘EM algorithm’’ section of the Supplementary material.

(A) HR (high-resolution) dataset. The solid arrow represents the actual cluster center, while the dashed arrow represents the expected cluster center. (B) MM (multi-mode) dataset. (C) LR (low-resolution) dataset.
Figure 2.

(A) HR (high-resolution) dataset. The solid arrow represents the actual cluster center, while the dashed arrow represents the expected cluster center. (B) MM (multi-mode) dataset. (C) LR (low-resolution) dataset.

With these constraints, the cluster labels (i.e. the identification of what are the single and double positive clusters) are automatically determined. This method can be easily extended to (higher order) multiplexing settings [14] by imposing additional constraints. We provide mathematical formula derivations for higher order two- and four-color data in ‘‘EM algorithm’’ section of the Supplementary material.

Polytect can be extended to analyze other types of assays, such as competitive assays, by adding a constraint coefficient to the cluster centers. Equation 9 becomes

(10)

where α1 and α2 are the coefficients used to construct a linear combination of the vectors formed by the cluster centers. They are chosen so that |$\boldsymbol{\mu _{4}-\mu _{1}}$| aligns as closely as possible with |$\alpha _{1}(\boldsymbol{\mu _{2}-\mu _{1}})+\alpha _{2}(\boldsymbol{\mu _{3}-\mu _{1}})$|⁠. α1 and α2 should be known and specified beforehand. In a noncompetitive assay, α1 = α2 = 1. For competitive assays, the fluorescence intensities of the double positives do not align with those of the single positives (see Supplementary Fig. S24 for an example). The typical fluorescence intensity of double positives in color 1 is ∼0.5 times the fluorescence intensity of single positive 1 in color 1. For color 2, it is ∼0.8 times the fluorescence intensity of single positive 2. In this case, we have α1 = 0.5, α2 = 0.8. For more details, please refer to ‘‘EM algorithm’’ section of the Supplementary material.

Performance evaluation

For the evaluation of the clustering methods, we selected four empirical two-color datasets (Fig. 2), three higher order two-color datasets, one three-color dataset, one four-color dataset, one five-color dataset, and two six-color datasets (Supplementary Tables S3S8). These 12 datasets were obtained across 3 dPCR instruments.

For the two-color datasets, including the competitive assay and higher-order multiplexing datasets, we benchmarked Polytect against flowPeaks, dpcp, and ddPCRclust. For more than two-color datasets, the comparison was limited to flowPeaks, as dpcp and ddPCRclust are tailored to the two-color setting.

To ensure fair comparisons and mitigate biases stemming from differences in intensity scales across colors [710], we conducted color-level min–max rescaling [15]. This procedure scales the data to a range [0, 1], preventing flowPeaks and dpcp from favoring colors with larger scales. Because ddPCRclust fails to cluster effectively when applied to such rescaled data, consistently yielding two clusters, we resorted to applying the method to the original, nonrescaled data.

When Polytect made significant errors on empirical data, such as failing to identify a cluster, we visually inspected the single-channel plots and increased the resolution, in order to identify the cause of the failure (Supplementary Tables S10 and S11).

Additionally, we tested the proposed method on simulated two-color data [7]. The simulation, based on empirical datasets, encompasses various scenarios on different factors (150 factor combinations in total), including low to high target concentrations, low to high percentages of rain, good to poor resolution, unimodal to bimodal distributions, orthogonal to non-orthogonal relationships, and equal to unequal target concentrations [7].

We used the adjusted rand index (ARI) [16], the relative bias of the estimated average number of target DNA molecules per partition (denoted as λ), and the quantities of interest (QOIs) to evaluate the clustering performance on both 12 empirical datasets and simulated data. The ARI quantifies the similarity between two cluster results of the same dataset. In particular, we compared the results of the proposed method with the expected labels of a reference clustering: true labels for simulated data and expert-determined clusters for empirical data. ARI scores range from −1 to 1, with 1 indicating perfect agreement, 0 indicating agreement not better than that obtained by random assignment of partitions to clusters, and −1 indicating complete disagreement. The relative bias of a λ estimate measures the deviation between the estimates and reference, relative to that reference, expressed as |$(\hat{\lambda }-\lambda _\text{ref})/\lambda _\text{ref}$|⁠. Concerning the quantities of interest, most datasets were used for absolute quantification, except for the MM, CNV 5-plex, and CNV 6-plex datasets. In the case of absolute quantification, the impact on the quantity of interest is directly related to the relative bias of λs. The MM dataset was analyzed to measure DNA integrity, following the method described in [17]. The CNV 5-plex and CNV 6-plex datasets were analyzed to measure copy number variation. We reported the median of relative bias of λs and QOIs of the clustering methods across 100 simulations.

To match the clusters identified by flowPeaks to the reference populations, we employ the Hungarian assignment algorithm [18], which efficiently solves the linear assignment problem by determining a one-to-one mapping that minimizes the total distance between the cluster centers provided by flowPeaks and those of the reference populations (the known true cluster centers). Polytect, dpcp, and ddPCRclust include automatic labeling.

We assessed the performance of the methods using optimal tuning parameter values for the empirical datasets [7]. Additionally, we recorded the run times for the empirical data analyses. To assess the stability of clustering results, we implemented the methods on 100 bootstrap samples drawn from the original datasets, each sample containing fluorescence intensities of 10 000 partitions. For the simulated data, we utilized the default parameter values, as conducting a thorough manual search for tuning parameters across all simulation scenarios was considered impractical. In the simulation setting, an automatic search may also be problematic [7]. However, we further examined scenarios where a method performed poorly, characterized by an ARI <0.8 or a relative bias >20% on both empirical and simulated data. For these cases, we optimized the methods’ parameters (details are provided in ‘‘Parameter optimization’’ section of the Supplementary material, see Supplementary Figs S1S5 and Supplementary Table S9).

Implementation, data, and code availability

We conducted all analyses using R (version 4.2.2) [19]. For the R package version, please refer to Supplementary Table S2 in ‘‘Package version’’ section of the Supplementary Data. In addition, we have developed an R package and a Shiny app that enable users to interactively explore different parameters and apply Polytect to a dataset of choice (see https://digpcr.shinyapps.io/Polytect/). R code and data are available from https://zenodo.org/records/14592424.

Results

Polytect achieved high ARI and low relative bias on empirical data

Across the 12 empirical datasets, Polytect consistently demonstrated strong performance, compared with other methods, emerging as the top performer in terms of ARI and relative bias (Table 1 and Supplementary Figs S11S22). Polytect (median absolute relative biases: [0 to 6.70], sd: [0 to 2.28]) outperformed flowPeaks (median absolute relative biases: [0 to 46.44], sd: [0.052 to 19.31]), dpcp (median absolute relative biases: [0 to 13.63], sd: [0.064 to 11.86]) and ddPCRclust (median absolute relative bias: [0 to 8406.43], sd: [0.045 to 184]). This high relative bias of flowPeaks likely stems from cluster mislabeling, which is supported by the disconnection between the observed ARI (high) and the relative bias (high). Visual review indeed suggests appropriate clustering by flowPeaks (Supplementary Figs S3 and S12). However, the high dimensional setting renders automatic labeling difficult. For the five-color and six-color data, Polytect (median absolute relative biases: [0 to 0.46], sd: [0 to 1.18]) also did better than flowPeaks (median absolute relative biases: [0.09 to 1.08], sd: [0.35 to 2.76]).

Table 1.

Performance metrics from the resampling study of the empirical data

DatasetMethod|$\frac{\hat{\lambda }_{1} - \lambda _{1}}{\lambda _{1}}$| (%)|$\frac{\hat{\lambda }_{2} - \lambda _{2}}{\lambda _{2}}$| (%)|$\frac{\hat{\lambda }_{3} - \lambda _{3}}{\lambda _{3}}$| (%)|$\frac{\hat{\lambda }_{4} - \lambda _{4}}{\lambda _{4}}$| (%)|$\frac{\hat{\lambda }_{5} - \lambda _{5}}{\lambda _{5}}$| (%)|$\frac{\hat{\lambda }_{6} - \lambda _{6}}{\lambda _{6}}$| (%)QOI (%)ARI
HRPolytect−0.89−0.29/////0.999
 flowPeaks−1.87−0.60/////1
 dpcp−1.87−0.62/////0.999
 ddPCRclust0.33−0.30/////0.999
 Biorad quantaSoft−0.240.42/////0.999
MMPolytect0.19−0.39////0.150.997
 flowPeaks2.07−0.23////2.410.996
 dpcp−0.01−0.30////1.180.960
 ddPCRclust17.040.24////14.640.973
 Stilla crystal Miner2.60.43////−2.50.988
LRPolytect−1.04−0.13/////0.996
 flowPeaks−1.18−0.13/////0.996
 dpcp−3.350/////0.985
 ddPCRclust−0.060/////0.998
 Biorad quantasoft00/////1
CAPolytect−3.77−0.26/////1
 flowPeaks−20.33−1.08/////0.999
 dpcp−10.55−0.32/////0.996
 ddPCRclust8406.43−2.47/////0.865
 Roche digital00/////1
 LightCycler development        
HO-HIGHPolytect0−0.32−0.23////0.999
 flowPeaks0−0.39−0.23////0.998
 dpcp−1.76−0.7213.63////0.994
 ddPCRclust0.290.230.46////0.999
 Biorad quantasoft///////0.999
HO-MEDPolytect000////1
flowPeaks0−0.320////1
dpcp1.370.331.84////0.999
ddPCRclust0.332.123.09////0.996
Biorad quantasoft////////
HO-LOWPolytect−2.163.053.69////1
flowPeaks−2.170.910.28////0.997
dpcp0.294.644.64////1
ddPCRclust4.073.7515.24////0.983
Biorad quantasoft////////
BPVPolytect000////1
flowPeaks000////1
Stilla crystal1.940.971.3////1
 Miner        
HIV 4-plexPolytect−6.70−0.40−3.40−0.81///0.985
flowPeaks−4.948.17−0.38−46.44///0.985
Biorad quantasoft////////
CNV 5-plexPolytect0−0.060.11−0.460/00.998
flowPeaks3.90−0.09−0.22−0.471.06/−2.960.998
Roche digital0−0.10−0.10.02/−0.131
 LightCycler development        
HIV 6-plexPolytect−0.110.290.010.06−0.260.14/0.991
flowPeaks−0.78−0.49−0.68−0.73−1.08−0.57/0.993
Biorad quantasoft000000/1
CNV 6-plexPolytect−0.09−0.26−0.09−0.09−0.29−0.250.240.992
flowPeaks−0.09−0.26−0.09−0.09−0.29−0.250.240.991
Roche digitalFAIL−0.02−0.010.07−0.14−0.14FAILFAIL
 LightCycler development        
DatasetMethod|$\frac{\hat{\lambda }_{1} - \lambda _{1}}{\lambda _{1}}$| (%)|$\frac{\hat{\lambda }_{2} - \lambda _{2}}{\lambda _{2}}$| (%)|$\frac{\hat{\lambda }_{3} - \lambda _{3}}{\lambda _{3}}$| (%)|$\frac{\hat{\lambda }_{4} - \lambda _{4}}{\lambda _{4}}$| (%)|$\frac{\hat{\lambda }_{5} - \lambda _{5}}{\lambda _{5}}$| (%)|$\frac{\hat{\lambda }_{6} - \lambda _{6}}{\lambda _{6}}$| (%)QOI (%)ARI
HRPolytect−0.89−0.29/////0.999
 flowPeaks−1.87−0.60/////1
 dpcp−1.87−0.62/////0.999
 ddPCRclust0.33−0.30/////0.999
 Biorad quantaSoft−0.240.42/////0.999
MMPolytect0.19−0.39////0.150.997
 flowPeaks2.07−0.23////2.410.996
 dpcp−0.01−0.30////1.180.960
 ddPCRclust17.040.24////14.640.973
 Stilla crystal Miner2.60.43////−2.50.988
LRPolytect−1.04−0.13/////0.996
 flowPeaks−1.18−0.13/////0.996
 dpcp−3.350/////0.985
 ddPCRclust−0.060/////0.998
 Biorad quantasoft00/////1
CAPolytect−3.77−0.26/////1
 flowPeaks−20.33−1.08/////0.999
 dpcp−10.55−0.32/////0.996
 ddPCRclust8406.43−2.47/////0.865
 Roche digital00/////1
 LightCycler development        
HO-HIGHPolytect0−0.32−0.23////0.999
 flowPeaks0−0.39−0.23////0.998
 dpcp−1.76−0.7213.63////0.994
 ddPCRclust0.290.230.46////0.999
 Biorad quantasoft///////0.999
HO-MEDPolytect000////1
flowPeaks0−0.320////1
dpcp1.370.331.84////0.999
ddPCRclust0.332.123.09////0.996
Biorad quantasoft////////
HO-LOWPolytect−2.163.053.69////1
flowPeaks−2.170.910.28////0.997
dpcp0.294.644.64////1
ddPCRclust4.073.7515.24////0.983
Biorad quantasoft////////
BPVPolytect000////1
flowPeaks000////1
Stilla crystal1.940.971.3////1
 Miner        
HIV 4-plexPolytect−6.70−0.40−3.40−0.81///0.985
flowPeaks−4.948.17−0.38−46.44///0.985
Biorad quantasoft////////
CNV 5-plexPolytect0−0.060.11−0.460/00.998
flowPeaks3.90−0.09−0.22−0.471.06/−2.960.998
Roche digital0−0.10−0.10.02/−0.131
 LightCycler development        
HIV 6-plexPolytect−0.110.290.010.06−0.260.14/0.991
flowPeaks−0.78−0.49−0.68−0.73−1.08−0.57/0.993
Biorad quantasoft000000/1
CNV 6-plexPolytect−0.09−0.26−0.09−0.09−0.29−0.250.240.992
flowPeaks−0.09−0.26−0.09−0.09−0.29−0.250.240.991
Roche digitalFAIL−0.02−0.010.07−0.14−0.14FAILFAIL
 LightCycler development        

Median relative bias of λ1, λ2, λ3, λ4, λ5, and λ6, QOI, and the ARI calculated for all resampled 10 000 data points. The result with the optimal tuning parameter value is shown. HR, MM, and LR are noncompeting two-color assay data, CA is competing mutation data, HO-HIGH, HO-MED, and HO-LOW are higher order two-color three-target data, BPV is three-color data, HIV 4-plex and 6-plex are four-color and six-color data, respectively, CNV 5-plex and CNV 6-plex are five-color and six-color data, respectively. ‘‘/’’ means ‘‘not applicable’’. ‘‘FAIL’’ means the software was not able to distinguish between positive and negative partitions. All partitions were marked as positive with this automatic thresholding method.

Table 1.

Performance metrics from the resampling study of the empirical data

DatasetMethod|$\frac{\hat{\lambda }_{1} - \lambda _{1}}{\lambda _{1}}$| (%)|$\frac{\hat{\lambda }_{2} - \lambda _{2}}{\lambda _{2}}$| (%)|$\frac{\hat{\lambda }_{3} - \lambda _{3}}{\lambda _{3}}$| (%)|$\frac{\hat{\lambda }_{4} - \lambda _{4}}{\lambda _{4}}$| (%)|$\frac{\hat{\lambda }_{5} - \lambda _{5}}{\lambda _{5}}$| (%)|$\frac{\hat{\lambda }_{6} - \lambda _{6}}{\lambda _{6}}$| (%)QOI (%)ARI
HRPolytect−0.89−0.29/////0.999
 flowPeaks−1.87−0.60/////1
 dpcp−1.87−0.62/////0.999
 ddPCRclust0.33−0.30/////0.999
 Biorad quantaSoft−0.240.42/////0.999
MMPolytect0.19−0.39////0.150.997
 flowPeaks2.07−0.23////2.410.996
 dpcp−0.01−0.30////1.180.960
 ddPCRclust17.040.24////14.640.973
 Stilla crystal Miner2.60.43////−2.50.988
LRPolytect−1.04−0.13/////0.996
 flowPeaks−1.18−0.13/////0.996
 dpcp−3.350/////0.985
 ddPCRclust−0.060/////0.998
 Biorad quantasoft00/////1
CAPolytect−3.77−0.26/////1
 flowPeaks−20.33−1.08/////0.999
 dpcp−10.55−0.32/////0.996
 ddPCRclust8406.43−2.47/////0.865
 Roche digital00/////1
 LightCycler development        
HO-HIGHPolytect0−0.32−0.23////0.999
 flowPeaks0−0.39−0.23////0.998
 dpcp−1.76−0.7213.63////0.994
 ddPCRclust0.290.230.46////0.999
 Biorad quantasoft///////0.999
HO-MEDPolytect000////1
flowPeaks0−0.320////1
dpcp1.370.331.84////0.999
ddPCRclust0.332.123.09////0.996
Biorad quantasoft////////
HO-LOWPolytect−2.163.053.69////1
flowPeaks−2.170.910.28////0.997
dpcp0.294.644.64////1
ddPCRclust4.073.7515.24////0.983
Biorad quantasoft////////
BPVPolytect000////1
flowPeaks000////1
Stilla crystal1.940.971.3////1
 Miner        
HIV 4-plexPolytect−6.70−0.40−3.40−0.81///0.985
flowPeaks−4.948.17−0.38−46.44///0.985
Biorad quantasoft////////
CNV 5-plexPolytect0−0.060.11−0.460/00.998
flowPeaks3.90−0.09−0.22−0.471.06/−2.960.998
Roche digital0−0.10−0.10.02/−0.131
 LightCycler development        
HIV 6-plexPolytect−0.110.290.010.06−0.260.14/0.991
flowPeaks−0.78−0.49−0.68−0.73−1.08−0.57/0.993
Biorad quantasoft000000/1
CNV 6-plexPolytect−0.09−0.26−0.09−0.09−0.29−0.250.240.992
flowPeaks−0.09−0.26−0.09−0.09−0.29−0.250.240.991
Roche digitalFAIL−0.02−0.010.07−0.14−0.14FAILFAIL
 LightCycler development        
DatasetMethod|$\frac{\hat{\lambda }_{1} - \lambda _{1}}{\lambda _{1}}$| (%)|$\frac{\hat{\lambda }_{2} - \lambda _{2}}{\lambda _{2}}$| (%)|$\frac{\hat{\lambda }_{3} - \lambda _{3}}{\lambda _{3}}$| (%)|$\frac{\hat{\lambda }_{4} - \lambda _{4}}{\lambda _{4}}$| (%)|$\frac{\hat{\lambda }_{5} - \lambda _{5}}{\lambda _{5}}$| (%)|$\frac{\hat{\lambda }_{6} - \lambda _{6}}{\lambda _{6}}$| (%)QOI (%)ARI
HRPolytect−0.89−0.29/////0.999
 flowPeaks−1.87−0.60/////1
 dpcp−1.87−0.62/////0.999
 ddPCRclust0.33−0.30/////0.999
 Biorad quantaSoft−0.240.42/////0.999
MMPolytect0.19−0.39////0.150.997
 flowPeaks2.07−0.23////2.410.996
 dpcp−0.01−0.30////1.180.960
 ddPCRclust17.040.24////14.640.973
 Stilla crystal Miner2.60.43////−2.50.988
LRPolytect−1.04−0.13/////0.996
 flowPeaks−1.18−0.13/////0.996
 dpcp−3.350/////0.985
 ddPCRclust−0.060/////0.998
 Biorad quantasoft00/////1
CAPolytect−3.77−0.26/////1
 flowPeaks−20.33−1.08/////0.999
 dpcp−10.55−0.32/////0.996
 ddPCRclust8406.43−2.47/////0.865
 Roche digital00/////1
 LightCycler development        
HO-HIGHPolytect0−0.32−0.23////0.999
 flowPeaks0−0.39−0.23////0.998
 dpcp−1.76−0.7213.63////0.994
 ddPCRclust0.290.230.46////0.999
 Biorad quantasoft///////0.999
HO-MEDPolytect000////1
flowPeaks0−0.320////1
dpcp1.370.331.84////0.999
ddPCRclust0.332.123.09////0.996
Biorad quantasoft////////
HO-LOWPolytect−2.163.053.69////1
flowPeaks−2.170.910.28////0.997
dpcp0.294.644.64////1
ddPCRclust4.073.7515.24////0.983
Biorad quantasoft////////
BPVPolytect000////1
flowPeaks000////1
Stilla crystal1.940.971.3////1
 Miner        
HIV 4-plexPolytect−6.70−0.40−3.40−0.81///0.985
flowPeaks−4.948.17−0.38−46.44///0.985
Biorad quantasoft////////
CNV 5-plexPolytect0−0.060.11−0.460/00.998
flowPeaks3.90−0.09−0.22−0.471.06/−2.960.998
Roche digital0−0.10−0.10.02/−0.131
 LightCycler development        
HIV 6-plexPolytect−0.110.290.010.06−0.260.14/0.991
flowPeaks−0.78−0.49−0.68−0.73−1.08−0.57/0.993
Biorad quantasoft000000/1
CNV 6-plexPolytect−0.09−0.26−0.09−0.09−0.29−0.250.240.992
flowPeaks−0.09−0.26−0.09−0.09−0.29−0.250.240.991
Roche digitalFAIL−0.02−0.010.07−0.14−0.14FAILFAIL
 LightCycler development        

Median relative bias of λ1, λ2, λ3, λ4, λ5, and λ6, QOI, and the ARI calculated for all resampled 10 000 data points. The result with the optimal tuning parameter value is shown. HR, MM, and LR are noncompeting two-color assay data, CA is competing mutation data, HO-HIGH, HO-MED, and HO-LOW are higher order two-color three-target data, BPV is three-color data, HIV 4-plex and 6-plex are four-color and six-color data, respectively, CNV 5-plex and CNV 6-plex are five-color and six-color data, respectively. ‘‘/’’ means ‘‘not applicable’’. ‘‘FAIL’’ means the software was not able to distinguish between positive and negative partitions. All partitions were marked as positive with this automatic thresholding method.

Polytect often outperformed other methods in terms of accurately quantifying the quantity of interest (Table 1). In terms of relative bias for DNA integrity, Polytect achieved a significantly lower bias (0.15) compared with flowPeaks (2.41), dpcp (1.18), and ddPCRclust (14.64). Similarly, for the CNV 5-plex and CNV 6-plex datasets, Polytect demonstrated excellent performance with low relative biases for CNV (0 and 0.24, respectively) compared with flowPeaks (−2.96 and 0.24, respectively).

Performance of Polytect when compared with the automatic thresholding provided by dPCR instrument-specific proprietary software was mixed (Table 1 and Supplementary Figs S6S10). For the HR, LR, CA, BPV, and CNV 5-plex datasets, the proprietary software produced results similar to those obtained by manual thresholding, with ARI values of 1 and relative biases close to 0. For these datasets, Polytect performed nearly as good (median ARI: [0.996 to 1]; median absolute relative biases: [0 to 1.04]). For the MM dataset, the proprietary software misclassified many data points at the edge of clusters, and rain (median ARI 0.988, Supplementary Fig. S7) while Polytect performed better, achieving a higher median ARI (0.997). For the higher order multiplexing datasets (HO-HIGH, HO-MED, and HO-LOW), the proprietary software could not be used for quantification because the number of targets exceeded the number of channels and the software only provided one threshold for each channel, making some clusters indistinguishable. Polytect provided excellent clustering performance (median ARI [0.999 to 1]; median absolute relative biases [0 to 3.69]). For the CNV 6-plex dataset, the proprietary software failed to distinguish between positive and negative partitions in channel 1, mislabeling all partitions as positive. Contrarily, Polytect demonstrated robustness (median ARI 0.992; median absolute relative biases [0 to 0.29]).

Per sample run times were low overall ([0.61 to 16] s, see Supplementary Table S12), with Polytect requiring 1.14 s per sample, flowPeaks being fastest (0.61 s) and ddPCRclust the slowest (16.41 s). dPCP and ddPCRclust offer options to reduce runtime. After optimization, ddPCRclust was the fastest (0.21 s), while dPCP also showed improvement (from 3.84 to 3.06 s). For the details, please refer to section‘‘Runtime’’ of the Supplementary Data.

Polytect outperformed other methods on simulated data

Across the simulation scenarios, all methods exhibited strong performance, with average median ARI values ranging from 0.954 to 0.996 and relative bias of λ1 (absolute values) ranging from 0.13% to 10.6% (Fig. 3). Notably, Polytect demonstrated superior performance compared with both flowPeaks and dpcp in terms of ARI and relative bias. Moreover, Polytect exhibited smaller variation compared with flowPeaks and dpcp.

(A) Median ARI across the 150 factor combinations. (B) Median relative bias of λ1 across the 150 factor combinations. (C) Median relative bias of λ2 across the 150 factor combinations. The methods are ranked from best left panel to worst right panel. All methods perform well with median ARI close to 1 and bias of λs close to 0. Polytect outperforms flowPeaks and dpcp.
Figure 3.

(A) Median ARI across the 150 factor combinations. (B) Median relative bias of λ1 across the 150 factor combinations. (C) Median relative bias of λ2 across the 150 factor combinations. The methods are ranked from best left panel to worst right panel. All methods perform well with median ARI close to 1 and bias of λs close to 0. Polytect outperforms flowPeaks and dpcp.

Polytect performed well in various scenarios

The effectiveness of Polytect was evaluated across various scenarios: (i) When more clusters than expected were identified by flowPeaks (Fig. 4A). (ii) When the expected number of clusters were identified by flowPeaks (Fig. 4B). (iii) When the actual number of clusters was 3, but the expected number was specified as 4 (Fig. 4C).

Clustering results obtained from flowPeaks and Polytect. The first row represents the clusters identified by flowPeaks, while the second row depicts those by Polytect. Each column corresponds to a different dataset: (A) and (D) HR dataset; (B) and (E) MM dataset; and (C) and (F) simulated data at very low concentration. The cluster centers are highlighted with a dot and a number. In first case, flowPeaks produced more clusters than expected (panel A), but Polytect successfully merged the surplus clusters (panel D). In second case, flowPeaks generated the expected number of clusters (panel B), and Polytect only relabeled the clusters in (panel E). In third case, flowPeaks produced fewer clusters than expected (panel C), and Polytect did not assign a partition to the double-positive cluster in (panel F).
Figure 4.

Clustering results obtained from flowPeaks and Polytect. The first row represents the clusters identified by flowPeaks, while the second row depicts those by Polytect. Each column corresponds to a different dataset: (A) and (D) HR dataset; (B) and (E) MM dataset; and (C) and (F) simulated data at very low concentration. The cluster centers are highlighted with a dot and a number. In first case, flowPeaks produced more clusters than expected (panel A), but Polytect successfully merged the surplus clusters (panel D). In second case, flowPeaks generated the expected number of clusters (panel B), and Polytect only relabeled the clusters in (panel E). In third case, flowPeaks produced fewer clusters than expected (panel C), and Polytect did not assign a partition to the double-positive cluster in (panel F).

The results demonstrate that additional clusters were successfully merged by Polytect (Fig 4D). When the number of clusters identified by flowPeaks aligned with the expected number, Polytect refrained from merging, thereby retaining the cluster centers (Fig. 4E). In such cases, only automatic labeling was performed. However, when the expected number of clusters exceeded the actual count (e.g. four expected clusters for third actual ones), a cluster center was assigned without any data points belonging to this additional cluster.

In the case of the three-color BPV data, Polytect demonstrated effective performance, producing estimated cluster centers that closely matched those obtained by manual thresholding. Despite an expected presence of eight clusters for three targets, the triple positives are absent due to low target concentrations. Consequently, while Polytect provided a cluster center for the hypothetical triple positives, no partitions were assigned to this cluster (Table 2). When analyzing the four-color HIV data, Polytect effectively aligned cluster centers and sizes with the reference; however, it failed to identify a single positive cluster (+ − − −, see Supplementary Table S10 and Supplementary Figs S19 and S20. Supplementary Fig. S20 provides better visualization with clusters that are positive in other colors removed). This discrepancy may stem from a low resolution in color 1 (Supplementary Fig. S23). After we artificially increased the fluorescence intensity of the partitions above the threshold in color 1 by 0.2, the method successfully identified the missing cluster of 17 data points, aligning with the manual thresholding, and supporting that a low resolution caused Polytect to fail (Supplementary Table S11). In the case of the five-color and the six-color data, Polytect matches well with the manual thresholding results (Supplementary Fig. S25).

Table 2.

Cluster centers and sizes given by manual thresholding and Polytect on three-color BPV data

Cluster labelMethodsCluster centerCluster size
− − −manual(0.057, 0.086, 0.105)24 401
 Polytect(0.057, 0.086, 0.105)24 401
+ − −manual(0.876, 0.074, 0.094)481
 Polytect(0.875, 0.074, 0.094)482
− + −manual(0.056, 0.814, 0.087)245
 Polytect(0.056, 0.814, 0.087)245
− − +manual(0.056, 0.079, 0.87)323
 Polytect(0.056, 0.079, 0.87)323
+ + −manual(0.746, 0.558, 0.141)4
 Polytect(0.744, 0.618, 0.161)3
+ − +manual(0.748, 0.084, 0.787)6
 Polytect(0.748, 0.084, 0.787)6
− + +manual(0.057, 0.530, 0.733)1
 Polytect(0.057, 0.530, 0.733)1
+ + +manual/0
 Polytect(0.874, 0.795, 0.845)0
Cluster labelMethodsCluster centerCluster size
− − −manual(0.057, 0.086, 0.105)24 401
 Polytect(0.057, 0.086, 0.105)24 401
+ − −manual(0.876, 0.074, 0.094)481
 Polytect(0.875, 0.074, 0.094)482
− + −manual(0.056, 0.814, 0.087)245
 Polytect(0.056, 0.814, 0.087)245
− − +manual(0.056, 0.079, 0.87)323
 Polytect(0.056, 0.079, 0.87)323
+ + −manual(0.746, 0.558, 0.141)4
 Polytect(0.744, 0.618, 0.161)3
+ − +manual(0.748, 0.084, 0.787)6
 Polytect(0.748, 0.084, 0.787)6
− + +manual(0.057, 0.530, 0.733)1
 Polytect(0.057, 0.530, 0.733)1
+ + +manual/0
 Polytect(0.874, 0.795, 0.845)0

‘‘−’’ indicates the absence of the targets and ‘‘+’’ indicates the presence.

Table 2.

Cluster centers and sizes given by manual thresholding and Polytect on three-color BPV data

Cluster labelMethodsCluster centerCluster size
− − −manual(0.057, 0.086, 0.105)24 401
 Polytect(0.057, 0.086, 0.105)24 401
+ − −manual(0.876, 0.074, 0.094)481
 Polytect(0.875, 0.074, 0.094)482
− + −manual(0.056, 0.814, 0.087)245
 Polytect(0.056, 0.814, 0.087)245
− − +manual(0.056, 0.079, 0.87)323
 Polytect(0.056, 0.079, 0.87)323
+ + −manual(0.746, 0.558, 0.141)4
 Polytect(0.744, 0.618, 0.161)3
+ − +manual(0.748, 0.084, 0.787)6
 Polytect(0.748, 0.084, 0.787)6
− + +manual(0.057, 0.530, 0.733)1
 Polytect(0.057, 0.530, 0.733)1
+ + +manual/0
 Polytect(0.874, 0.795, 0.845)0
Cluster labelMethodsCluster centerCluster size
− − −manual(0.057, 0.086, 0.105)24 401
 Polytect(0.057, 0.086, 0.105)24 401
+ − −manual(0.876, 0.074, 0.094)481
 Polytect(0.875, 0.074, 0.094)482
− + −manual(0.056, 0.814, 0.087)245
 Polytect(0.056, 0.814, 0.087)245
− − +manual(0.056, 0.079, 0.87)323
 Polytect(0.056, 0.079, 0.87)323
+ + −manual(0.746, 0.558, 0.141)4
 Polytect(0.744, 0.618, 0.161)3
+ − +manual(0.748, 0.084, 0.787)6
 Polytect(0.748, 0.084, 0.787)6
− + +manual(0.057, 0.530, 0.733)1
 Polytect(0.057, 0.530, 0.733)1
+ + +manual/0
 Polytect(0.874, 0.795, 0.845)0

‘‘−’’ indicates the absence of the targets and ‘‘+’’ indicates the presence.

Methods failed on empirical and simulated data

When methods performed poorly, further manual parameter optimization often improved the results, but sometimes failed. For the CA dataset, both flowPeaks and ddPCRclust failed to provide accurate estimates of λ1. While flowPeaks achieved high ARI by correctly classifying most partitions (Fig. 5A), it generated more clusters than expected (six rather than the expected four clusters), resulting in incorrect partition labeling. After parameter optimization, flowPeaks produced four clusters, reducing the absolute relative bias of λ1 from 20.33% to 4.17% (Fig. 5D). ddPCRclust initially produced only two clusters and showed no improvement after parameter adjustments (Fig. 5E). For the HIV 4-plex dataset, flowPeaks struggled to identify many positive partitions in channel 4 (Fig. 5C). Initially, it generated only nine clusters (16 are expected). After optimization to match the expected number of clusters, flowPeaks produced 17 clusters; however, these were small, often comprising just one or two partitions. Due to limitations in the labeling method, the results did not improve significantly even after parameter optimization (Fig. 5F). In the simulated dataset, Polytect generally achieved high ARI and low bias. flowPeaks failed in five cases (3.3%), while dpcp failed in 25 cases (16.7%). To match the number of flowPeaks failure cases, for dpcp, we visually inspected the five worst cases (with the lowest ARI). Parameter tuning improved the results for both methods in these failure scenarios (Supplementary Figs S3 and S5).

Scenarios where flowPeaks and ddPCRclust fail. The first row illustrates clusters identified before parameter optimization, while the second row shows clusters after optimization. Panels (A)–(D) correspond to the CA dataset, and panels (C) and (F) represent channel 4 of the HIV integrity-4 dataset. (A, D): Results from flowPeaks on the CA dataset. In panel (A), flowPeaks produced more clusters than expected. After parameter optimization, the surplus clusters were merged, as shown in panel (D). (B, E): Results from ddPCRclust on the CA dataset. In panel (B), ddPCRclust identified only two clusters, and this result did not improve even after optimization in panel (E). (C, F): Results from flowPeaks on channel 4 of the HIV integrity-4 dataset. In panel (C), flowPeaks failed to identify many positive partitions, misclassifying them as negative partitions. This issue persisted even after parameter tuning panel (F).
Figure 5.

Scenarios where flowPeaks and ddPCRclust fail. The first row illustrates clusters identified before parameter optimization, while the second row shows clusters after optimization. Panels (A)–(D) correspond to the CA dataset, and panels (C) and (F) represent channel 4 of the HIV integrity-4 dataset. (AD): Results from flowPeaks on the CA dataset. In panel (A), flowPeaks produced more clusters than expected. After parameter optimization, the surplus clusters were merged, as shown in panel (D). (BE): Results from ddPCRclust on the CA dataset. In panel (B), ddPCRclust identified only two clusters, and this result did not improve even after optimization in panel (E). (CF): Results from flowPeaks on channel 4 of the HIV integrity-4 dataset. In panel (C), flowPeaks failed to identify many positive partitions, misclassifying them as negative partitions. This issue persisted even after parameter tuning panel (F).

Discussion

We have developed and validated Polytect, an automatic multicolor dPCR data classification and labeling method. Unlike previous partition classification methods that are limited to one- or two-color data, Polytect can be applied to experiments using any number of colors. Further strengths of Polytect are its automatic cluster labeling component, a step necessary for subsequent target nucleic acid concentration estimation, through exploitation of expected cluster positions, and exploitation of prior knowledge on the (maximum) number of distinct clusters. The latter is used for cluster merging of preliminary flowPeaks clusters. Importantly, Polytect accommodates scenarios where certain clusters may be absent due to low concentrations or absence of target(s). When there are only negative samples and no target is present, the method still functions effectively. In such a scenario, only flowPeaks is performed. One advantage of flowPeaks is that it does not require prespecification of the number of clusters.

An increasing need for methods like Polytect stems from the latest developments in dPCR instrument hardware. Instruments now allow analysis of up to seven colors, corresponding to 27 (128) observable clusters for simple one-color, one-target seven-plex assays. Data complexity increases further when using, e.g. multiple probes for (a subset of) targets (‘‘higher order multiplexing’’). Indeed, for such assays, the number of clusters is typically (much) higher. Unfortunately, current state-of-the-art dPCR clustering methods are restricted to at most two colors. While some of these methods could be extended to accommodate more colors, a curse of dimensionality issues arises [10]. Therefore, the development of improved methods capable of accurately clustering partitions and accommodating more than two colors is imperative. Polytect, being inherently applicable to any number of colors, addresses this gap, and increases partition classification robustness through incorporation of prior knowledge on cluster location and count.

One example of Polytect’s strengths is in rare mutation detection. In such experiments, one target (wild type) is often abundant while the other is rare or absent. Due to this low abundance, some single, double (triple, etc.) positive partition clusters will contain few or no partitions. Polytect exhibits high sensitivity in detecting such small clusters. Importantly, even in the absence of expected clusters, Polytect, will not split the largest partition cluster, unlike some generic clustering methods such as kmeans. Users benefit from the convenience of specifying only the expected (maximum) number of clusters, alleviating concerns regarding cluster absence. Additionally, Polytect’s flexibility allows analysis of different applications, such as noncompetitive and competitive assays, the latter through fine-tuning the constraint coefficients of the cluster centers (Methods section).

We have conducted a comparative analysis of Polytect against dpcp, ddPCRclust, flowPeaks, and software provided by dPCR instruments. Across both empirical and simulated datasets, Polytect consistently outperformed or showed comparable performance to these methods. While flowPeaks, the base clustering algorithm for Polytect, demonstrates competence in handling multicolor data and detecting small clusters, its tendency to generate more clusters than expected poses challenges for automatic labeling. Because cluster locations in flow cytometry experiments are hard to predict, the cluster labeling issue arising in dPCR experiments is not addressed by flowPeaks. Polytect leverages the strengths of flowPeaks while addressing its limitation by merging excess clusters and automatically labeling clusters based on the prior knowledge of cluster center locations. Polytect’s robustness on this front is supported by the excellent concordance of estimated cluster center locations with expertly assigned ones.

dpcp and ddPCRclust are constrained to (higher-order multiplexing) two-color datasets. Performance assessment shows that dpcp may misclassify (Fig. 3, outliers), potentially stemming from misidentification of negative or primary clusters. ddPCRclust yields inaccurate results when fluorescence intensity scales vary substantially across colors (Supplementary Fig. S14).

Polytect faces a few limitations. First, as it merges excess clusters using hierarchical Gaussian mixture modeling and relies on an EM algorithm for parameter estimation, the initialization of cluster centers is important. When initial cluster centers are too close together or too extreme, the algorithm may not converge. Second, the method labels automatically by imposing constraints on cluster centers, implying a requirement for position rules between the cluster centers of single positive partition clusters, double positive partition clusters, and so forth. That is, we assume no interference between assays so that they are positioned in a near orthogonal way. However, if this assumption is violated, then an adaptation of the design is needed. As an example, we introduced a specific design for competing assays. For other assays, such as drop-off assays, further design changes would be needed. Moreover, since the method merges clusters rather than splitting them, it cannot generate more clusters than those produced by flowPeaks. In scenarios where data resolution is low (Supplementary Fig. S23), flowPeaks may fail to identify all clusters that actually exist, leading Polytect to overlook these clusters as well (Supplementary Table S10). While we try to alleviate this drawback by implementing the parameter optimization procedure outlined in [7], leading flowPeaks to identify more clusters, still fewer than the actual number is retrieved (9 versus 13 versus 14, before optimization, after optimization, and actual number, respectively). Utilizing different base clustering methods proficient in detecting small clusters is then a viable alternative. Indeed, Polytect is compatible with any initial clustering result.

In conclusion, our proposed automatic multicolor dPCR data clustering method, Polytect, can be applied beyond two-color data across different dPCR instruments and for various assay design types. It has demonstrated strong performance on both empirical and simulated data, achieving high ARIs and low relative biases. The method performs automatic labeling, making it convenient for analyzing data with three or more dimensions.

Acknowledgements

Authors contributions: Conceptualization: Y.C., M.V., and O.T.; Funding acquisition: O.T. and W.D.S.; Formal analysis: Y.C.; Supervision: M.V. and O.T.; Writing - original draft: Y.C.; Writing - review and editing: Y.C., W.D.S., W.T., J.V., D.G., A.L., O.T. and M.V.; Data curation: G.W. and W.T; Visualisation: Y.C., and Software: Y.C.

Supplementary data

Supplementary data is available at NAR Genomics & Bioinformatics online.

Conflict of interest

J.V. is co-founder of pxlence bv, providing universal Rainbow probes for digital PCR.

Funding

This work is funded in part by Bijzonder Onderzoeksfonds UGent (BOF, grant 01IO0420), and Agentschap voor Innoveren en Onderneme (VLAIO, grant HBC_2022.0673).

Data availability

R code and data are available from https://doi.org/10.5281/zenodo.14592424.

References

1.

Huggett
 
JF
,
Foy
 
CA
,
Benes
 
V
 et al. .  
The digital MIQE guidelines: minimum information for publication of quantitative digital PCR experiments
.
Clin Chem
.
2013
;
59
:
892
902
.https://doi.org/10.1373/clinchem.2013.206375.

2.

Querci
 
M
,
Van
 
den Bulcke M
,
Žel
 
J
 et al. .  
New approaches in GMO detection
.
Anal Bioanal Chem
.
2010
;
396
:
1991
2002
..

3.

Coccaro
 
N
,
Tota
 
G
,
Anelli
 
L
 et al. .  
Digital PCR: a reliable tool for analyzing and monitoring hematologic malignancies
.
Int J Mol Sci
.
2020
;
21
:
3141
.

4.

Tiwari
 
A
,
Ahmed
 
W
,
Oikarinen
 
S
 et al. .  
Application of digital PCR for public health-related water quality monitoring
.
Sci Total Environ
.
2022
;
837
:
155663
https://doi.org/10.1016/j.scitotenv.2022.155663.

5.

Hindson
 
BJ
,
Ness
 
KD
,
Masquelier
 
DA
 et al. .  
High-throughput droplet digital PCR system for absolute quantitation of DNA copy number
.
Anal Chem
.
2011
;
83
:
8604
10
..

6.

Trypsteen
 
W
,
Vynck
 
M
,
De
 
Neve J
 et al. .  
ddpcRquant: threshold determination for single channel droplet digital PCR experiments
.
Anal Bioanal Chem
.
2015
;
407
:
5827
34
..

7.

Chen
 
Y
,
De
 
Spiegelaere W
,
Trypsteen
 
W
 et al. .  
Benchmarking digital PCR partition classification methods with empirical and simulated duplex data
.
Brief Bioinform
.
2024
;
25
:
bbae120
.

8.

De
 
Falco A
,
Olinger
 
CM
,
Klink
 
B
 et al. .  
Digital PCR cluster predictor: a universal R-package and shiny app for the automated analysis of multiplex digital PCR data
.
Bioinformatics
.
2023
;
39
:
btad282
https://doi.org/10.1093/bioinformatics/btad282.

9.

Brink
 
BG
,
Meskas
 
J
,
Brinkman
 
RR
 
ddPCRclust: an R package and Shiny app for automated analysis of multiplexed ddPCR data
.
Bioinformatics
.
2018
;
34
:
2687
9
..

10.

Vynck
 
M
,
Chen
 
Y
,
Gleerup
 
D
 et al. .  
Digital PCR partition classification
.
Clin Chem
.
2023
;
69
:
hvad063
https://doi.org/10.1093/clinchem/hvad063.

11.

Ge
 
Y
,
Sealfon
 
SC
 
flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding
.
Bioinformatics
.
2012
;
28
:
2052
8
..

12.

Vasconcelos
 
N
,
Lippman
 
A
 
Learning mixture hierarchies
.
Adv Neur Inf Proc Syst
.
1998
;
11
:
606
12
.

13.

Bischl
 
B
,
Richter
 
J
,
Bossek
 
J
 et al. .  
mlrMBO: a modular framework for model-based optimization of expensive black-box functions
.
arXiv3 December 2018, preprint: not peer reviewed
https://doi.org/10.48550/arXiv.1703.03373.

14.

Whale
 
AS
,
Huggett
 
JF
,
Tzonev
 
S
 
Fundamentals of multiplexing with digital PCR
.
Biomol Detect Quantif
.
2016
;
10
:
15
23
..

15.

Patro
 
S
,
Sahu
 
KK
 
Normalization: a preprocessing stage
.
arXiv19 March 2015, preprint: not peer reviewed
https://arxiv.org/abs/1503.06462.

16.

Hubert
 
L
,
Arabie
 
P
 
Comparing partitions
.
J Classif
.
1985
;
2
:
193
218
..

17.

Gleerup
 
D
,
Chen
 
Y
,
Van Snippenberg
 
W
 et al. .  
Measuring DNA quality by digital PCR using probability calculations
.
Anal Chim Acta
.
2023
;
1279
:
341822
https://doi.org/10.1016/j.aca.2023.341822.

18.

Papadimitriou
 
CH
,
Steiglitz
 
K
 
Combinatorial Optimization: Algorithms and Complexity
.
1988
;
North Chelmsford, MA
Courier Corporation
.

19.

R Core Team
 
R: a language and environment for statistical computing
.
2022
;
Vienna, Austria
R Foundation for Statistical Computing
.

Author notes

The last two authors should be regarded as Joint Last Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].

Supplementary data

Comments

0 Comments
Submit a comment
You have entered an invalid code
Thank you for submitting a comment on this article. Your comment will be reviewed and published at the journal's discretion. Please check for further notifications by email.