-
PDF
- Split View
-
Views
-
Cite
Cite
Francesco Denti, Michele Guindani, Fabrizio Leisen, Antonio Lijoi, William Duncan Wadsworth, Marina Vannucci, Two-Group Poisson-Dirichlet Mixtures for Multiple Testing, Biometrics, Volume 77, Issue 2, June 2021, Pages 622–633, https://doi.org/10.1111/biom.13314
- Share Icon Share
Abstract
The simultaneous testing of multiple hypotheses is common to the analysis of high-dimensional data sets. The two-group model, first proposed by Efron, identifies significant comparisons by allocating observations to a mixture of an empirical null and an alternative distribution. In the Bayesian nonparametrics literature, many approaches have suggested using mixtures of Dirichlet Processes in the two-group model framework. Here, we investigate employing mixtures of two-parameter Poisson-Dirichlet Processes instead, and show how they provide a more flexible and effective tool for large-scale hypothesis testing. Our model further employs nonlocal prior densities to allow separation between the two mixture components. We obtain a closed-form expression for the exchangeable partition probability function of the two-group model, which leads to a straightforward Markov Chain Monte Carlo implementation. We compare the performance of our method for large-scale inference in a simulation study and illustrate its use on both a prostate cancer data set and a case-control microbiome study of the gastrointestinal tracts in children from underdeveloped countries who have been recently diagnosed with moderate-to-severe diarrhea.
1 Introduction
The availability of high-dimensional data in domains as diverse as genomics, imaging, and astronomy, has brought the necessity to screen a large number of hypotheses simultaneously. Here, we focus on the two-group modeling framework (Efron, 2004). To illustrate, we assume that the observations are suitably defined difference scores over a large number of distinct hypotheses. The two-group model assumes that the
's are drawn either from a null (f0) or a nonnull (f1) distribution, ie, each score is drawn from a mixture,

for some weight , and some probability (density) functions f0 and f1. The null component is typically assumed standard normal; however, the true null distribution may differ from the theoretical null, eg, due to limited sample size or unaccounted correlation. Thus, Efron proposes the estimation of an “empirical null” distribution to adequately capture the range of parameter values coherent with the null hypothesis and accordingly evaluate each testing decision.
In Bayesian nonparametrics, the Dirichlet process (DP) has been extensively used to provide flexible estimates of f0, or f1, or both, as well as for clustering the 's into common “expression” levels (Do et al., 2005; Dahl and Newton, 2007; Kim et al., 2009; Kottas and Fellingham, 2012). Martin and Tokdar (2012) develop a flexible hierarchical nonparametric approach where f0 is assigned a Normal distribution with unknown mean and variance, whereas f1 is a location mixture of normals. One appealing feature of the two-group model is that the resulting inference is immediately amenable to interpretation in a decision-theoretic framework. For example, Efron (2004) describes a local version of the false discovery rate (local fdr), which represents the posterior probability that a difference score
is generated according to the null hypothesis,
. The selection of interesting scores is conducted by flagging all
's such that
,
, allowing control of the Benjamini-Hochberg FDR (Benjamini and Hochberg, 1995) at level α. More generally, the decision problem could minimize loss functions that compound expected false-positive and false-negative decisions. The optimal decision would then lead to thresholding the posterior probability of the alternative (eg, see Muller et al., 2006).
In this paper, we investigate the use of a mixture prior of two-parameter Poisson-Dirichlet (2PPD) processes in lieu of the commonly used DPs. The 2PPD process, also known as the Pitman-Yor process, is a generalization of the DP and is characterized by two parameters: a “concentration” parameter θ (analogous to the single parameter of the DP), and a “discount” parameter σ. The additional parameter allows for more flexible clustering behavior than the DP and can be used to tune the reinforcement mechanism of large clusters (Lijoi et al., 2007). We show how the proper choice of σ can be used to model the empirical null distribution f0 and the uncertainty related to the nonnull distribution in the two-group model, leading to improved testing procedures. Our modeling framework further employs nonlocal prior (NLP) densities for the base measure of the random probability measures under the alternative hypothesis to allow better separation between the two mixture components. We derive the expression of the exchangeable partition probability function (EPPF), induced by the proposed two-group 2PPD mixture process and observe that, conditional on the assignment of the observations to the null or the alternative hypothesis, the respective random partitions are independent. This property conveniently facilitates posterior inference obtained via Markov Chain Monte Carlo (MCMC) algorithms, which take into account the conditional independence of the partitions. By means of a simulation study, we discuss the performance of our method with respect to the commonly used mixture of DPs and existing state-of-the-art approaches for large-scale multiple comparison problems. We also illustrate the use of the proposed 2PPD processes mixture model on two publicly available data sets: a well-known Prostate cancer data set (Singh et al., 2002) and one collected from a recent microbiome study (Pop et al., 2014). In the latter case, the aim was to characterize the microbial composition of the gastrointestinal tracts of children from underdeveloped countries who have been diagnosed with moderate-to-severe diarrhea (MSD). Our study suggests that mixture of DPs should be used with some caution in large-scale multiple testing, and that the use of 2PPD processes could lead to improved operating characteristics.
2 A Review of the 2PPD Process
In this section, we provide an overview of the 2PPD process with particular regard to its use for density estimation and its clustering properties. Let be a sample of n data measurements (eg, raw observations or summary statistics), drawn from a sequence of exchangeable random elements
, taking value in a complete and separable metric space
endowed with its Borel σ-algebra
. By virtue of the de Finetti representation theorem,

for any , and for
, a random probability measure, with distribution Q defined on the space
of probability measures on
. In a Bayesian framework, Q represents the prior distribution and the model is said to be parametric whenever Q degenerates on a finite-dimensional subspace of
; otherwise, the model is denoted as nonparametric.
Here, we consider the 2PPD process for the random probability measure , which can be represented almost surely as an infinite mixture, ie,
, where
denotes the point mass at c, the
's are random weights obtained as
and
,
with
,
(stick-breaking construction; Pitman, 1995), for some
and
. The
's are random locations in
, independent of the weights
's, and assumed as random draws from a nonatomic base measure
, ie,
,
, which represents the prior expected value of the random distribution
, ie,
for any
. We should note that the 2PPD process is also well defined for
and
, with r being an integer; however, in such case the process reduces to the parametric Fisher model (Ghosal and van der Vaart, 2017). Hereafter, we will use
, with
,
to indicate a sample from a 2PPD with parameters σ and θ, and base measure
. If
is a realization from an exchangeable sequence driven by a 2PPD process, there is a positive probability of ties, ie,
for any
. This clustering property often motivates the use of the 2PPD process in statistical applications, eg, to model data from heterogeneous populations.
The clustering behavior of the 2PPD process can also be investigated by considering the EPPF, which characterizes the probability that are partitioned into K distinct clusters with respective sizes
. For the 2PPD process, such probability is
for any choice of positive integers
such that
, with
and
, for any nonnegative integer q. The expression highlights how the values of the parameters σ and θ affect the clustering structure induced by the 2PPD process. It is well-known that if
denotes the number of distinct values recorded in a sample
of an exchangeable sequence drawn according to a 2PPD
process, then
as
(almost surely) for some positive random variable
when
(see theorem 3.8 in Pitman, 2002). When
, we recover the clustering behavior of the DP, obtaining
as
(almost surely). Hence, the larger σ is, the larger the number of clusters. Moreover, σ controls the reinforcement of the partition, ie, the ability of big clusters to attract even more observations, as highlighted by the predictive distribution of the 2PPD process,

where the probability that a new observation is assigned to an existing cluster, and assumes value ,
, is proportional to
. Therefore, values of σ close to 1 favor the formation of a large number of clusters, most of which are singletons (Lijoi et al., 2007).
Finally, we consider the variability of realizations from a 2PPD process around the base measure . The variance of the process is
, for any
and
. Large values of σ correspond to random probability measures which are more concentrated around the base measure
. Therefore, one should expect that the empirical distribution function of any sample
drawn from a 2PPD process with high values of σ,
, would be characterized by a large number of weights
of similar size. In the next sections, we will exploit these properties to guide the use of the 2PPD process in the two-group model for multiple testing.
3 Methods
3.1 A Two-Group 2PPD Model
The different clustering behavior that the 2PPD process exhibits as a function of σ can be exploited for distinguishing between the null and alternative distributions in the two-group model. More precisely, we first rewrite model (2) as the two-component mixture,

where represents the unknown distribution under the null and the alternative hypotheses, for
and
, respectively. Similarly as in (1), the mixture weight ρ is a random variable independent of the
's and takes values in [0, 1]. We further introduce an auxiliary binary random variable
,
, such that
if
and
if
. Thus, conditionally on the
, we can rewrite (2)-(3) as

with and
independent, and assuming a
distribution on ρ. The hyperparameters a and b influence the proportion of discoveries and can be tuned according to the problem at hand. In genomic studies, one may want to enforce sparsity of discoveries, with prior expected proportions
between 1% and 10% of the total number of hypotheses. A lower value of
typically results in lower posterior probabilities of the alternative, although the relative ranking of the posterior probabilities is overall preserved.
We exploit the properties of the 2PPD process discussed in Section 2 and propose to specify the hyperparameters of the null and nonnull random probability measures in (4) as follows. In accordance with Efron's idea that the empirical null distribution should capture only small departures from the theoretical null, we let concentrate around the theoretical null. Furthermore, we assume that there is no good model a priori for the nonnull distribution. Therefore,
is allowed to vary more freely on the space of the alternative distributions. Under the null distribution, the process should encourage the creation of a large number of clusters each composed by few observations, so that the empirical distribution well approximates the theoretical null. For the nonnull distribution, we should expect a more uneven distribution of the realizations. Based on those considerations, we propose to set
. We will discuss how such a choice might help discriminating between the null and the alternative distribution in the multicomparison problem.
We conclude this section by considering the joint partition structure induced by model (4) for a sample . Let
denote the EPPF of process
,
, that is the probability that n observations are assigned to K different clusters of sizes
. For notational simplicity, we assume that
, for any
and
such that
. Then the following result provides the EPPF of the mixture of 2PPD processes as below:
Proposition 1. The EPPF associated to the mixture of 2PPD processes in (4) is given by

where ,
,
, and
. If
or
, we assume
.
See Web Appendix B for a proof. Direct use of (5) is far from trivial. Nonetheless, the expression lends itself to an interesting interpretation: conditional on the assignment of the clusters to either or
, the respective random partitions are still independent. This remark is useful for devising a suitable computational algorithm for posterior inference.
3.2 Bayesian Hierarchical Two-Group Mixture Model
In many applications, the discreteness of the realizations of the 2PPD process may be considered inadequate. Thus, in lieu of (4), it is often common to assume for a sample a hierarchical mixture model with continuous components, ie,

that is the two-group model is characterized by a null and nonnull distributions which are each defined as a 2PPD process mixture. Here, is the random density induced by the random probability measure
, while
,
are general kernels such that for
and some σ-finite measure λ on
one has
,
. For our purposes, it is convenient to set
and let λ coincide with the Lebesgue measure on R so that the previous model defines a prior on the space of density functions on R. By conditioning on the auxiliary group indicator variables
,
, we can rewrite model (6) as a hierarchical Bayes two-group 2PPD process mixture,

where may indicate either a scalar or a vector parameter. In general,
and
could be different. Here, we assume
to be a Normal kernel and set
. For notational simplicity, in (7) we have omitted additional hyperparameters which may feature in the kernel function
but are not relevant for the decision problem and thus are assigned separate priors.
We conclude the specification of the two-group model (7) by discussing the choice of the base measures and
. On the one hand, we achieve flexible estimation of the so-called “empirical null” distribution by setting
, where the parameters of the
on τ2 are chosen so to allow relatively small deviations from the theoretical null distribution. For example, by assuming
,
, the induced marginal distribution on
has only slightly fatter tails than the standard normal.
Moreover, and
should not have significantly overlapping supports, ie, they should assign high probability to regions of the parameter space that are consistent with the null and the alternative hypotheses, respectively. In the Bayesian multiple hypotheses testing framework, this requirement has sometimes been advocated to ensure enough separation between the null and the alternative models. Thus, we first model
as a symmetric bimodal mixture of Normal-Inverse Gamma (NIG) distributions, as
, with
, and
. Marginally,

We further achieve separation in the multiple hypotheses testing problem by modeling the location parameter m1 with an NLP, ie, a prior that assigns vanishing density to small neighborhoods of the null hypothesis (Johnson and Rossell, 2010). Several types of NLP have been proposed in the literature. See, for instance, Rossell and Telesca (2017). Here, we adopt an rth moment (MOM) prior for m1, with

where ξ is the normalizing constant, and we write . Specific hyperparameter specifications will be detailed in Section 4. Here, we only note that the NLP specification in
should provide enough separation from the origin to ensure good estimation of the posterior probability of the alternative. Finally, the other parameters of the 2PPD processes are set such that
and
. In general, θ0 and θ1 are chosen relatively small, in order to enforce coarser clustering structures, especially under the alternative hypothesis. Typically, in DP two-group models,
(see, eg, Do et al., 2005). From the discussion at the end of Section 2, it follows that realizations of the 2PPD null process are expected to be more concentrated around the base measure. In the next sections, we will investigate the effect of different choices for the parameter values of the 2PPD processes for the multiple comparison problem.
3.3 Posterior Inference
Posterior inference for model (4) or (7) relies on MCMC techniques since the posterior distributions are not available in closed form. Our primary interest is in the group indicators 's, which uniquely identify the random probability measure from which the data
's were generated, and, correspondingly, the probability of group membership, ρ. For the sampling of the
's, we exploit the independence of the random partitions implied by the EPPF (5) of the proposed mixture of 2PPD processes. More specifically, if
are a random sample from (4) and
,
are nonatomic base measures with common support, then
for
. Thus, all the
's in a cluster are generated by the same 2PPD process. The details of the MCMC algorithms are provided in the Web Appendix A. In particular, we employ a split-merge move to speed up computations for large sample sizes (Dahl, 2005). The computational burden of the MCMC algorithm increases for higher values of either θ0, θ1, σ0, or σ1 due to the increased number of latent clusters generated by the 2PPD process. A discussion of the computational efficiency of a plain Pólya-Urn sampler versus the split-merge implementation is also provided in the Web Appendix D.
Posterior inference on the weight ρ in (4) is conducted by means of post-MCMC analysis, by approximating the posterior expected value using auxiliary indicators, say
, which denote if cluster
at iteration
is a realization from
or
. More precisely, if we denote by
the burn-in period of the chain, we can compute the following Monte Carlo approximation of the posterior expected value
.
Similarly, the posterior probability that an observation belongs to the nonnull group can be obtained from the MCMC output as , where the
's indicate the MCMC draws of the component indicators
's. Then, a score
is considered significant if the corresponding
is larger than a threshold, say κ, chosen to control the Bayesian FDR at a preassigned
level,
(Newton et al., 2004; Muller et al., 2006).
4 Applications
4.1 Simulation Study
We investigate the performance of the Bayesian hierarchical 2PPD mixture modeling framework described in (6)-(7) for large-scale multiple hypothesis testing by means of a simulation study under scenarios. More specifically, we simulate z-scores from mixture (1), where
. We set
for
. For the fifth scenario, we set
to model the effect of hidden correlation among observations and of the association with unobserved covariates, that may lead to departures from standard Gaussianity. For f1, we choose:
- •
Scenario 1:
,
- •
Scenario 2:
with
,
- •
Scenario 3:
with
,
- •
Scenario 4:
with
,
, and
,
- •
Scenario 5:
,
ie, f1 is assumed asymmetric unimodal (Scenario 1), symmetric bimodal (Scenarios 2), asymmetric bimodal (Scenario 3), and symmetric bimodal with fat tails (Scenarios 4 and 5), thus mimicking typical high-dimensional testing situations. An illustrative plot of data generated under the five scenarios is provided in the Web Appendix C. In all scenarios, we set , since typically only a small proportion of the comparisons is expected to be significant in large-scale inference hypothesis testing. Each simulation includes
simulated scores and is replicated 30 times to allow quantification of posterior uncertainty and of the frequentist operating characteristics of the testing procedures.
For model fitting, we employ the mixture model (6)-(7), where we assume , with
. The base measure of the 2PPD process
is chosen as described in Section 3.2, with
,
. For
, we set
,
, and
. An
prior is assumed for m1, with
and
. For the parameters characterizing the clustering behavior of the 2PPD process priors, we investigate the effect of different choices of
on the inference, with
. More specifically, here we report the inference for the following values for the pair
: (0.75, 0), which corresponds to assuming a DP on the nonnull component; in addition to (0.75, 0.1), (0.75, 0.25), (0.9, 0.25) to investigate the effect of decreased prior uncertainty,
, on the components of the two-group 2PPD mixture. We further set the concentration parameters
(Do et al., 2005). For the Beta prior on ρ, we set
and
. For each data set, the MCMC algorithm was run for 2500 iterations after a 2500 iterations burn-in period. The evaluation of posterior convergence was conducted using standard Bayesian convergence diagnostics on the chains of the traceable parameters, m1 and ρ, by monitoring the number of group components and by inspecting the estimated densities of the null and nonnull processes.
We compare the performance of our modeling approach with five alternative methods for large-scale hypothesis testing: (a) a two-group DP mixture model, which can be seen as a special case of the modeling framework proposed here, obtained by setting , with an NLP on the base measure for the alternative distribution; (b) the local false discovery rate of Efron (2004); (c) the Benjamini and Hochberg procedure (BH, Benjamini and Hochberg, 1995); (d) the empirical Bayes mixture model of Muralidharan (2012), which allows simultaneous estimation of the effect size and of the local false discovery rate, and (e) the empirical Bayes semiparametric approach of Martin and Tokdar (2012).
For each simulation replicate, results were compared using several performance measures: the Matthews Correlation Coefficient (MCC), which can be computed from a confusion matrix as , where
,
,
, and
are the number of true positive, true negative, false positive, and false negative results, respectively; the F1 score,
; as well as precision, specificity, accuracy, and the area under the curve (AUC) of the corresponding receiver operating characteristic curve. For each simulation, we identify significant scores by controlling the Bayesian false discovery rate (Newton et al., 2004), the local false discovery rate (Efron, 2004), and the frequentist false discovery rate (Benjamini and Hochberg, 1995) at the 10% level.
In Table 1, we report the performance metrics achieved in the different simulation scenarios as a function of the combinations of hyperparameters of the 2PPD process. Overall, the performance of the proposed 2PPD process is similar, as long as . Higher values of σ0 lead to draw samples from f0 which are closer to the theoretical null, but the implied tighter control of the variance of the null process may lead to a slightly decreased performance in some scenarios. If
, the performance can deteriorate considerably.
Simulation study: sensitivity results across different settings for σ0 and σ1 for the five simulation scenarios considered in Section 4.1 ()
. | σ0 = 0.75 . | σ0 = 0.9 . | ||
---|---|---|---|---|
. | σ1 = 0 . | σ1 = 0.1 . | σ1 = 0.25 . | σ1 = 0.25 . |
Scenario 1 | ||||
MCC | 0.5777 (0.0903) | 0.5833 (0.0940) | 0.6020 (0.0835) | 0.5893 (0.0876) |
F1 | 0.5197 (0.1125) | 0.5269 (0.1169) | 0.5520 (0.1051) | 0.5342 (0.1095) |
AUC | 0.9095 (0.0245) | 0.9143 (0.0224) | 0.9201 (0.0253) | 0.9200 (0.0207) |
PRE | 0.9775 (0.0357) | 0.9773 (0.0333) | 0.9713 (0.0346) | 0.9776 (0.0328) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0008) | 0.9995 (0.0007) |
ACC | 0.9676 (0.0049) | 0.9680 (0.0052) | 0.9690 (0.0048) | 0.9683 (0.0049) |
Scenario 2 | ||||
MCC | 0.6249 (0.0673) | 0.6242 (0.0695) | 0.6212 (0.0681) | 0.6135 (0.0644) |
F1 | 0.5809 (0.0831) | 0.5796 (0.0855) | 0.5771 (0.0835) | 0.5665 (0.0808) |
AUC | 0.9526 (0.0218) | 0.9563 (0.0185) | 0.9581 (0.0170) | 0.9523 (0.0183) |
PRE | 0.9710 (0.0338) | 0.9725 (0.0373) | 0.9680 (0.0396) | 0.9712 (0.0384) |
SPEC | 0.9993 (0.0008) | 0.9994 (0.0009) | 0.9993 (0.0009) | 0.9993 (0.0009) |
ACC | 0.9703 (0.0043) | 0.9703 (0.0044) | 0.9701 (0.0043) | 0.9696 (0.0040) |
Scenario 3 | ||||
MCC | 0.5081 (0.0842) | 0.5080 (0.0847) | 0.5340 (0.0797) | 0.5224 (0.0808) |
F1 | 0.4320 (0.1053) | 0.4320 (0.1056) | 0.4659 (0.1001) | 0.4489 (0.1018) |
AUC | 0.9335 (0.0235) | 0.9401 (0.0238) | 0.9477 (0.0180) | 0.9452 (0.0209) |
PRE | 0.9721 (0.0438) | 0.9714 (0.0425) | 0.9682 (0.0402) | 0.9772 (0.0360) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0007) | 0.9996 (0.0006) |
ACC | 0.9624 (0.0044) | 0.9624 (0.0045) | 0.9638 (0.0045) | 0.9631 (0.0043) |
Scenario 4 | ||||
MCC | 0.7513 (0.0462) | 0.7554 (0.0462) | 0.7625 (0.0461) | 0.7572 (0.0478) |
F1 | 0.7354 (0.0538) | 0.7413 (0.0528) | 0.7535 (0.0518) | 0.7449 (0.0542) |
AUC | 0.9552 (0.0162) | 0.9627 (0.0119) | 0.9685 (0.0107) | 0.9661 (0.0087) |
PRE | 0.9787 (0.0264) | 0.9736 (0.0284) | 0.9532 (0.0312) | 0.9657 (0.0289) |
SPEC | 0.9993 (0.0009) | 0.9991 (0.0010) | 0.9984 (0.0013) | 0.9988 (0.0010) |
ACC | 0.9789 (0.0034) | 0.9792 (0.0035) | 0.9797 (0.0035) | 0.9794 (0.0036) |
Scenario 5 | ||||
MCC | 0.8920 (0.0229) | 0.8832 (0.0249) | 0.8529 (0.0232) | 0.8694 (0.0241) |
F1 | 0.8951 (0.0219) | 0.8860 (0.0241) | 0.8534 (0.0235) | 0.8710 (0.0238) |
AUC | 0.9985 (0.0010) | 0.9985 (0.0010) | 0.9985 (0.0011) | 0.9985 (0.0011) |
PRE | 0.8346 (0.0300) | 0.8170 (0.0334) | 0.7560 (0.0330) | 0.7856 (0.0326) |
SPEC | 0.9898 (0.0021) | 0.9885 (0.0025) | 0.9832 (0.0029) | 0.9859 (0.0026) |
ACC | 0.9886 (0.0020) | 0.9875 (0.0028) | 0.9831 (0.0030) | 0.9855 (0.0029) |
. | σ0 = 0.75 . | σ0 = 0.9 . | ||
---|---|---|---|---|
. | σ1 = 0 . | σ1 = 0.1 . | σ1 = 0.25 . | σ1 = 0.25 . |
Scenario 1 | ||||
MCC | 0.5777 (0.0903) | 0.5833 (0.0940) | 0.6020 (0.0835) | 0.5893 (0.0876) |
F1 | 0.5197 (0.1125) | 0.5269 (0.1169) | 0.5520 (0.1051) | 0.5342 (0.1095) |
AUC | 0.9095 (0.0245) | 0.9143 (0.0224) | 0.9201 (0.0253) | 0.9200 (0.0207) |
PRE | 0.9775 (0.0357) | 0.9773 (0.0333) | 0.9713 (0.0346) | 0.9776 (0.0328) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0008) | 0.9995 (0.0007) |
ACC | 0.9676 (0.0049) | 0.9680 (0.0052) | 0.9690 (0.0048) | 0.9683 (0.0049) |
Scenario 2 | ||||
MCC | 0.6249 (0.0673) | 0.6242 (0.0695) | 0.6212 (0.0681) | 0.6135 (0.0644) |
F1 | 0.5809 (0.0831) | 0.5796 (0.0855) | 0.5771 (0.0835) | 0.5665 (0.0808) |
AUC | 0.9526 (0.0218) | 0.9563 (0.0185) | 0.9581 (0.0170) | 0.9523 (0.0183) |
PRE | 0.9710 (0.0338) | 0.9725 (0.0373) | 0.9680 (0.0396) | 0.9712 (0.0384) |
SPEC | 0.9993 (0.0008) | 0.9994 (0.0009) | 0.9993 (0.0009) | 0.9993 (0.0009) |
ACC | 0.9703 (0.0043) | 0.9703 (0.0044) | 0.9701 (0.0043) | 0.9696 (0.0040) |
Scenario 3 | ||||
MCC | 0.5081 (0.0842) | 0.5080 (0.0847) | 0.5340 (0.0797) | 0.5224 (0.0808) |
F1 | 0.4320 (0.1053) | 0.4320 (0.1056) | 0.4659 (0.1001) | 0.4489 (0.1018) |
AUC | 0.9335 (0.0235) | 0.9401 (0.0238) | 0.9477 (0.0180) | 0.9452 (0.0209) |
PRE | 0.9721 (0.0438) | 0.9714 (0.0425) | 0.9682 (0.0402) | 0.9772 (0.0360) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0007) | 0.9996 (0.0006) |
ACC | 0.9624 (0.0044) | 0.9624 (0.0045) | 0.9638 (0.0045) | 0.9631 (0.0043) |
Scenario 4 | ||||
MCC | 0.7513 (0.0462) | 0.7554 (0.0462) | 0.7625 (0.0461) | 0.7572 (0.0478) |
F1 | 0.7354 (0.0538) | 0.7413 (0.0528) | 0.7535 (0.0518) | 0.7449 (0.0542) |
AUC | 0.9552 (0.0162) | 0.9627 (0.0119) | 0.9685 (0.0107) | 0.9661 (0.0087) |
PRE | 0.9787 (0.0264) | 0.9736 (0.0284) | 0.9532 (0.0312) | 0.9657 (0.0289) |
SPEC | 0.9993 (0.0009) | 0.9991 (0.0010) | 0.9984 (0.0013) | 0.9988 (0.0010) |
ACC | 0.9789 (0.0034) | 0.9792 (0.0035) | 0.9797 (0.0035) | 0.9794 (0.0036) |
Scenario 5 | ||||
MCC | 0.8920 (0.0229) | 0.8832 (0.0249) | 0.8529 (0.0232) | 0.8694 (0.0241) |
F1 | 0.8951 (0.0219) | 0.8860 (0.0241) | 0.8534 (0.0235) | 0.8710 (0.0238) |
AUC | 0.9985 (0.0010) | 0.9985 (0.0010) | 0.9985 (0.0011) | 0.9985 (0.0011) |
PRE | 0.8346 (0.0300) | 0.8170 (0.0334) | 0.7560 (0.0330) | 0.7856 (0.0326) |
SPEC | 0.9898 (0.0021) | 0.9885 (0.0025) | 0.9832 (0.0029) | 0.9859 (0.0026) |
ACC | 0.9886 (0.0020) | 0.9875 (0.0028) | 0.9831 (0.0030) | 0.9855 (0.0029) |
aNote. The values in the table represent the average and F1 scores, the average precision (PRE), specificity (SPEC), accuracy (ACC), and the area under the curve (AUC) of the corresponding receiver operating characteristic curve, over 30 replicates with corresponding standard deviations between brackets.
Simulation study: sensitivity results across different settings for σ0 and σ1 for the five simulation scenarios considered in Section 4.1 ()
. | σ0 = 0.75 . | σ0 = 0.9 . | ||
---|---|---|---|---|
. | σ1 = 0 . | σ1 = 0.1 . | σ1 = 0.25 . | σ1 = 0.25 . |
Scenario 1 | ||||
MCC | 0.5777 (0.0903) | 0.5833 (0.0940) | 0.6020 (0.0835) | 0.5893 (0.0876) |
F1 | 0.5197 (0.1125) | 0.5269 (0.1169) | 0.5520 (0.1051) | 0.5342 (0.1095) |
AUC | 0.9095 (0.0245) | 0.9143 (0.0224) | 0.9201 (0.0253) | 0.9200 (0.0207) |
PRE | 0.9775 (0.0357) | 0.9773 (0.0333) | 0.9713 (0.0346) | 0.9776 (0.0328) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0008) | 0.9995 (0.0007) |
ACC | 0.9676 (0.0049) | 0.9680 (0.0052) | 0.9690 (0.0048) | 0.9683 (0.0049) |
Scenario 2 | ||||
MCC | 0.6249 (0.0673) | 0.6242 (0.0695) | 0.6212 (0.0681) | 0.6135 (0.0644) |
F1 | 0.5809 (0.0831) | 0.5796 (0.0855) | 0.5771 (0.0835) | 0.5665 (0.0808) |
AUC | 0.9526 (0.0218) | 0.9563 (0.0185) | 0.9581 (0.0170) | 0.9523 (0.0183) |
PRE | 0.9710 (0.0338) | 0.9725 (0.0373) | 0.9680 (0.0396) | 0.9712 (0.0384) |
SPEC | 0.9993 (0.0008) | 0.9994 (0.0009) | 0.9993 (0.0009) | 0.9993 (0.0009) |
ACC | 0.9703 (0.0043) | 0.9703 (0.0044) | 0.9701 (0.0043) | 0.9696 (0.0040) |
Scenario 3 | ||||
MCC | 0.5081 (0.0842) | 0.5080 (0.0847) | 0.5340 (0.0797) | 0.5224 (0.0808) |
F1 | 0.4320 (0.1053) | 0.4320 (0.1056) | 0.4659 (0.1001) | 0.4489 (0.1018) |
AUC | 0.9335 (0.0235) | 0.9401 (0.0238) | 0.9477 (0.0180) | 0.9452 (0.0209) |
PRE | 0.9721 (0.0438) | 0.9714 (0.0425) | 0.9682 (0.0402) | 0.9772 (0.0360) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0007) | 0.9996 (0.0006) |
ACC | 0.9624 (0.0044) | 0.9624 (0.0045) | 0.9638 (0.0045) | 0.9631 (0.0043) |
Scenario 4 | ||||
MCC | 0.7513 (0.0462) | 0.7554 (0.0462) | 0.7625 (0.0461) | 0.7572 (0.0478) |
F1 | 0.7354 (0.0538) | 0.7413 (0.0528) | 0.7535 (0.0518) | 0.7449 (0.0542) |
AUC | 0.9552 (0.0162) | 0.9627 (0.0119) | 0.9685 (0.0107) | 0.9661 (0.0087) |
PRE | 0.9787 (0.0264) | 0.9736 (0.0284) | 0.9532 (0.0312) | 0.9657 (0.0289) |
SPEC | 0.9993 (0.0009) | 0.9991 (0.0010) | 0.9984 (0.0013) | 0.9988 (0.0010) |
ACC | 0.9789 (0.0034) | 0.9792 (0.0035) | 0.9797 (0.0035) | 0.9794 (0.0036) |
Scenario 5 | ||||
MCC | 0.8920 (0.0229) | 0.8832 (0.0249) | 0.8529 (0.0232) | 0.8694 (0.0241) |
F1 | 0.8951 (0.0219) | 0.8860 (0.0241) | 0.8534 (0.0235) | 0.8710 (0.0238) |
AUC | 0.9985 (0.0010) | 0.9985 (0.0010) | 0.9985 (0.0011) | 0.9985 (0.0011) |
PRE | 0.8346 (0.0300) | 0.8170 (0.0334) | 0.7560 (0.0330) | 0.7856 (0.0326) |
SPEC | 0.9898 (0.0021) | 0.9885 (0.0025) | 0.9832 (0.0029) | 0.9859 (0.0026) |
ACC | 0.9886 (0.0020) | 0.9875 (0.0028) | 0.9831 (0.0030) | 0.9855 (0.0029) |
. | σ0 = 0.75 . | σ0 = 0.9 . | ||
---|---|---|---|---|
. | σ1 = 0 . | σ1 = 0.1 . | σ1 = 0.25 . | σ1 = 0.25 . |
Scenario 1 | ||||
MCC | 0.5777 (0.0903) | 0.5833 (0.0940) | 0.6020 (0.0835) | 0.5893 (0.0876) |
F1 | 0.5197 (0.1125) | 0.5269 (0.1169) | 0.5520 (0.1051) | 0.5342 (0.1095) |
AUC | 0.9095 (0.0245) | 0.9143 (0.0224) | 0.9201 (0.0253) | 0.9200 (0.0207) |
PRE | 0.9775 (0.0357) | 0.9773 (0.0333) | 0.9713 (0.0346) | 0.9776 (0.0328) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0008) | 0.9995 (0.0007) |
ACC | 0.9676 (0.0049) | 0.9680 (0.0052) | 0.9690 (0.0048) | 0.9683 (0.0049) |
Scenario 2 | ||||
MCC | 0.6249 (0.0673) | 0.6242 (0.0695) | 0.6212 (0.0681) | 0.6135 (0.0644) |
F1 | 0.5809 (0.0831) | 0.5796 (0.0855) | 0.5771 (0.0835) | 0.5665 (0.0808) |
AUC | 0.9526 (0.0218) | 0.9563 (0.0185) | 0.9581 (0.0170) | 0.9523 (0.0183) |
PRE | 0.9710 (0.0338) | 0.9725 (0.0373) | 0.9680 (0.0396) | 0.9712 (0.0384) |
SPEC | 0.9993 (0.0008) | 0.9994 (0.0009) | 0.9993 (0.0009) | 0.9993 (0.0009) |
ACC | 0.9703 (0.0043) | 0.9703 (0.0044) | 0.9701 (0.0043) | 0.9696 (0.0040) |
Scenario 3 | ||||
MCC | 0.5081 (0.0842) | 0.5080 (0.0847) | 0.5340 (0.0797) | 0.5224 (0.0808) |
F1 | 0.4320 (0.1053) | 0.4320 (0.1056) | 0.4659 (0.1001) | 0.4489 (0.1018) |
AUC | 0.9335 (0.0235) | 0.9401 (0.0238) | 0.9477 (0.0180) | 0.9452 (0.0209) |
PRE | 0.9721 (0.0438) | 0.9714 (0.0425) | 0.9682 (0.0402) | 0.9772 (0.0360) |
SPEC | 0.9995 (0.0007) | 0.9995 (0.0007) | 0.9994 (0.0007) | 0.9996 (0.0006) |
ACC | 0.9624 (0.0044) | 0.9624 (0.0045) | 0.9638 (0.0045) | 0.9631 (0.0043) |
Scenario 4 | ||||
MCC | 0.7513 (0.0462) | 0.7554 (0.0462) | 0.7625 (0.0461) | 0.7572 (0.0478) |
F1 | 0.7354 (0.0538) | 0.7413 (0.0528) | 0.7535 (0.0518) | 0.7449 (0.0542) |
AUC | 0.9552 (0.0162) | 0.9627 (0.0119) | 0.9685 (0.0107) | 0.9661 (0.0087) |
PRE | 0.9787 (0.0264) | 0.9736 (0.0284) | 0.9532 (0.0312) | 0.9657 (0.0289) |
SPEC | 0.9993 (0.0009) | 0.9991 (0.0010) | 0.9984 (0.0013) | 0.9988 (0.0010) |
ACC | 0.9789 (0.0034) | 0.9792 (0.0035) | 0.9797 (0.0035) | 0.9794 (0.0036) |
Scenario 5 | ||||
MCC | 0.8920 (0.0229) | 0.8832 (0.0249) | 0.8529 (0.0232) | 0.8694 (0.0241) |
F1 | 0.8951 (0.0219) | 0.8860 (0.0241) | 0.8534 (0.0235) | 0.8710 (0.0238) |
AUC | 0.9985 (0.0010) | 0.9985 (0.0010) | 0.9985 (0.0011) | 0.9985 (0.0011) |
PRE | 0.8346 (0.0300) | 0.8170 (0.0334) | 0.7560 (0.0330) | 0.7856 (0.0326) |
SPEC | 0.9898 (0.0021) | 0.9885 (0.0025) | 0.9832 (0.0029) | 0.9859 (0.0026) |
ACC | 0.9886 (0.0020) | 0.9875 (0.0028) | 0.9831 (0.0030) | 0.9855 (0.0029) |
aNote. The values in the table represent the average and F1 scores, the average precision (PRE), specificity (SPEC), accuracy (ACC), and the area under the curve (AUC) of the corresponding receiver operating characteristic curve, over 30 replicates with corresponding standard deviations between brackets.
Table 2 reports the results from the comparison with alternative multiple-testing methods. Compared to our method, the method of Martin and Tokdar (2012) performs quite well in all scenarios except the fat-tailed one, Scenario 4, where our 2PPD model outperforms four of five competitors. The BH procedure also performs quite well, although with slightly lower precision, in the first four scenarios. However, small departures from the standard Gaussian null assumption (Scenario 5) considerably affect the performance of the BH procedure. The performance of two-group DP mixtures is impacted by the flexible modeling of both the null and alternative distribution, which leads to a relatively high number of false assignments. This result is remarkable as various types of mixture of DP processes have been often proposed for hypothesis testing in the two-group modeling framework. The results also appear fairly robust to different sample sizes (see Web Appendix E).
Simulation study: performance metrics for five other multiple comparison methods in the five simulation scenarios considered in Section 4.1 ()
. | DPmix . | local fdr . | Benjamini and Hochberg (1995) . | Muralidharan (2012) . | Martin and Tokdar (2012) . |
---|---|---|---|---|---|
Scenario 1 | |||||
MCC | 0.2329 (0.0209) | 0.5708 (0.0721) | 0.6629 (0.0648) | 0.5379 (0.0804) | 0.5835 (0.0757) |
F1 | 0.1980 (0.0145) | 0.5067 (0.0932) | 0.6427 (0.0736) | 0.4643 (0.1023) | 0.5251 (0.0962) |
AUC | 0.9053 (0.0301) | 0.8869 (0.0365) | 0.9237 (0.0205) | 0.9242 (0.0230) | 0.9230 (0.0216) |
PRE | 0.1113 (0.0092) | 0.9897 (0.0216) | 0.9141 (0.0603) | 0.9915 (0.0227) | 0.9825 (0.0284) |
SPEC | 0.6150 (0.0432) | 0.9998 (0.0004) | 0.9974 (0.0019) | 0.9999 (0.0004) | 0.9996 (0.0006) |
ACC | 0.6297 (0.0396) | 0.9671 (0.0042) | 0.9726 (0.0044) | 0.9653 (0.0043) | 0.9679 (0.0044) |
Scenario 2 | |||||
MCC | 0.2506 (0.0180) | 0.6088 (0.0773) | 0.6674 (0.0638) | 0.5805 (0.0667) | 0.6435 (0.0686) |
F1 | 0.2037 (0.0138) | 0.5578 (0.0991) | 0.6486 (0.0721) | 0.5194 (0.0857) | 0.6048 (0.0846) |
AUC | 0.9524 (0.0218) | 0.9231 (0.0428) | 0.9544 (0.0169) | 0.9668 (0.0174) | 0.9762 (0.0087) |
PRE | 0.1140 (0.0087) | 0.9796 (0.0304) | 0.9129 (0.0556) | 0.9895 (0.0216) | 0.9698 (0.0332) |
SPEC | 0.6033 (0.0364) | 0.9995 (0.0008) | 0.9974 (0.0018) | 0.9998 (0.0004) | 0.9993 (0.0008) |
ACC | 0.6213 (0.0339) | 0.9694 (0.0048) | 0.9729 (0.0042) | 0.9676 (0.0004) | 0.9715 (0.0044) |
Scenario 3 | |||||
MCC | 0.2337 (0.0195) | 0.5397 (0.0854) | 0.6544 (0.0578) | 0.4840 (0.0883) | 0.5591 (0.0740) |
F1 | 0.1948 (0.0129) | 0.4708 (0.1087) | 0.6342 (0.0670) | 0.3973 (0.1080) | 0.4970 (0.0948) |
AUC | 0.9400 (0.0206) | 0.9069 (0.0359) | 0.9500 (0.0191) | 0.9481 (0.0197) | 0.9481 (0.0182) |
PRE | 0.1085 (0.0079) | 0.9759 (0.0424) | 0.9089 (0.0561) | 0.9901 (0.0328) | 0.9707 (0.0437) |
SPEC | 0.5679 (0.0349) | 0.9995 (0.0008) | 0.9972 (0.0020) | 0.9999 (0.0005) | 0.9994 (0.0010) |
ACC | 0.5880 (0.033) | 0.9641 (0.0050) | 0.9710 (0.0040) | 0.9611 (0.0043) | 0.9652 (0.0044) |
Scenario 4 | |||||
MCC | 0.2671 (0.0194) | 0.7080 (0.0474) | 0.7849 (0.0443) | 0.6840 (0.0486) | 0.6831 (0.0470) |
F1 | 0.2161 (0.0156) | 0.6801 (0.0586) | 0.7853 (0.0448) | 0.6492 (0.0602) | 0.6485 (0.0595) |
AUC | 0.9612 (0.0156) | 0.9406 (0.0246) | 0.9709 (0.0085) | 0.9627 (0.0159) | 0.9658 (0.0139) |
PRE | 0.1217 (0.0099) | 0.9919 (0.0172) | 0.9136 (0.0502) | 0.9972 (0.0106) | 0.9953 (0.0123) |
SPEC | 0.6288 (0.0345) | 0.9998 (0.0005) | 0.9965 (0.0023) | 0.9999 (0.0003) | 0.9999 (0.0004) |
ACC | 0.6459 (0.0325) | 0.9758 (0.0033) | 0.9812 (0.0036) | 0.9741 (0.0034) | 0.9741 (0.0032) |
Scenario 5 | |||||
MCC | 0.2671 (0.0194) | 0.8632 (0.0492) | 0.7861 (0.0360) | 0.8506 (0.0486) | 0.8879 (0.0332) |
F1 | 0.5303 (0.0420) | 0.8611 (0.0529) | 0.7792 (0.0395) | 0.8475 (0.0414) | 0.8888 (0.0338) |
AUC | 0.9980 (0.0012) | 0.9971 (0.0041) | 0.9986 (0.0010) | 0.9985 (0.0012) | 0.9985 (0.0011) |
PRE | 0.3622 (0.0163) | 0.9811 (0.0286) | 0.6433 (0.0531) | 0.9866 (0.0229) | 0.9745 (0.0323) |
SPEC | 0.9058 (0.0155) | 0.9992 (0.0013) | 0.9705 (0.0067) | 0.9994 (0.0009) | 0.9988 (0.0015) |
ACC | 0.9104 (0.0385) | 0.9878 (0.0041) | 0.9716 (0.0064) | 0.9867 (0.0032) | 0.9899 (0.0028) |
. | DPmix . | local fdr . | Benjamini and Hochberg (1995) . | Muralidharan (2012) . | Martin and Tokdar (2012) . |
---|---|---|---|---|---|
Scenario 1 | |||||
MCC | 0.2329 (0.0209) | 0.5708 (0.0721) | 0.6629 (0.0648) | 0.5379 (0.0804) | 0.5835 (0.0757) |
F1 | 0.1980 (0.0145) | 0.5067 (0.0932) | 0.6427 (0.0736) | 0.4643 (0.1023) | 0.5251 (0.0962) |
AUC | 0.9053 (0.0301) | 0.8869 (0.0365) | 0.9237 (0.0205) | 0.9242 (0.0230) | 0.9230 (0.0216) |
PRE | 0.1113 (0.0092) | 0.9897 (0.0216) | 0.9141 (0.0603) | 0.9915 (0.0227) | 0.9825 (0.0284) |
SPEC | 0.6150 (0.0432) | 0.9998 (0.0004) | 0.9974 (0.0019) | 0.9999 (0.0004) | 0.9996 (0.0006) |
ACC | 0.6297 (0.0396) | 0.9671 (0.0042) | 0.9726 (0.0044) | 0.9653 (0.0043) | 0.9679 (0.0044) |
Scenario 2 | |||||
MCC | 0.2506 (0.0180) | 0.6088 (0.0773) | 0.6674 (0.0638) | 0.5805 (0.0667) | 0.6435 (0.0686) |
F1 | 0.2037 (0.0138) | 0.5578 (0.0991) | 0.6486 (0.0721) | 0.5194 (0.0857) | 0.6048 (0.0846) |
AUC | 0.9524 (0.0218) | 0.9231 (0.0428) | 0.9544 (0.0169) | 0.9668 (0.0174) | 0.9762 (0.0087) |
PRE | 0.1140 (0.0087) | 0.9796 (0.0304) | 0.9129 (0.0556) | 0.9895 (0.0216) | 0.9698 (0.0332) |
SPEC | 0.6033 (0.0364) | 0.9995 (0.0008) | 0.9974 (0.0018) | 0.9998 (0.0004) | 0.9993 (0.0008) |
ACC | 0.6213 (0.0339) | 0.9694 (0.0048) | 0.9729 (0.0042) | 0.9676 (0.0004) | 0.9715 (0.0044) |
Scenario 3 | |||||
MCC | 0.2337 (0.0195) | 0.5397 (0.0854) | 0.6544 (0.0578) | 0.4840 (0.0883) | 0.5591 (0.0740) |
F1 | 0.1948 (0.0129) | 0.4708 (0.1087) | 0.6342 (0.0670) | 0.3973 (0.1080) | 0.4970 (0.0948) |
AUC | 0.9400 (0.0206) | 0.9069 (0.0359) | 0.9500 (0.0191) | 0.9481 (0.0197) | 0.9481 (0.0182) |
PRE | 0.1085 (0.0079) | 0.9759 (0.0424) | 0.9089 (0.0561) | 0.9901 (0.0328) | 0.9707 (0.0437) |
SPEC | 0.5679 (0.0349) | 0.9995 (0.0008) | 0.9972 (0.0020) | 0.9999 (0.0005) | 0.9994 (0.0010) |
ACC | 0.5880 (0.033) | 0.9641 (0.0050) | 0.9710 (0.0040) | 0.9611 (0.0043) | 0.9652 (0.0044) |
Scenario 4 | |||||
MCC | 0.2671 (0.0194) | 0.7080 (0.0474) | 0.7849 (0.0443) | 0.6840 (0.0486) | 0.6831 (0.0470) |
F1 | 0.2161 (0.0156) | 0.6801 (0.0586) | 0.7853 (0.0448) | 0.6492 (0.0602) | 0.6485 (0.0595) |
AUC | 0.9612 (0.0156) | 0.9406 (0.0246) | 0.9709 (0.0085) | 0.9627 (0.0159) | 0.9658 (0.0139) |
PRE | 0.1217 (0.0099) | 0.9919 (0.0172) | 0.9136 (0.0502) | 0.9972 (0.0106) | 0.9953 (0.0123) |
SPEC | 0.6288 (0.0345) | 0.9998 (0.0005) | 0.9965 (0.0023) | 0.9999 (0.0003) | 0.9999 (0.0004) |
ACC | 0.6459 (0.0325) | 0.9758 (0.0033) | 0.9812 (0.0036) | 0.9741 (0.0034) | 0.9741 (0.0032) |
Scenario 5 | |||||
MCC | 0.2671 (0.0194) | 0.8632 (0.0492) | 0.7861 (0.0360) | 0.8506 (0.0486) | 0.8879 (0.0332) |
F1 | 0.5303 (0.0420) | 0.8611 (0.0529) | 0.7792 (0.0395) | 0.8475 (0.0414) | 0.8888 (0.0338) |
AUC | 0.9980 (0.0012) | 0.9971 (0.0041) | 0.9986 (0.0010) | 0.9985 (0.0012) | 0.9985 (0.0011) |
PRE | 0.3622 (0.0163) | 0.9811 (0.0286) | 0.6433 (0.0531) | 0.9866 (0.0229) | 0.9745 (0.0323) |
SPEC | 0.9058 (0.0155) | 0.9992 (0.0013) | 0.9705 (0.0067) | 0.9994 (0.0009) | 0.9988 (0.0015) |
ACC | 0.9104 (0.0385) | 0.9878 (0.0041) | 0.9716 (0.0064) | 0.9867 (0.0032) | 0.9899 (0.0028) |
aNote. The values in the table represent the average and F1 scores, the average precision (PRE), specificity (SPEC), accuracy (ACC), and the area under the curve (AUC) of the corresponding receiver operating characteristic curve, over 30 replicates with corresponding standard deviations between brackets.
Simulation study: performance metrics for five other multiple comparison methods in the five simulation scenarios considered in Section 4.1 ()
. | DPmix . | local fdr . | Benjamini and Hochberg (1995) . | Muralidharan (2012) . | Martin and Tokdar (2012) . |
---|---|---|---|---|---|
Scenario 1 | |||||
MCC | 0.2329 (0.0209) | 0.5708 (0.0721) | 0.6629 (0.0648) | 0.5379 (0.0804) | 0.5835 (0.0757) |
F1 | 0.1980 (0.0145) | 0.5067 (0.0932) | 0.6427 (0.0736) | 0.4643 (0.1023) | 0.5251 (0.0962) |
AUC | 0.9053 (0.0301) | 0.8869 (0.0365) | 0.9237 (0.0205) | 0.9242 (0.0230) | 0.9230 (0.0216) |
PRE | 0.1113 (0.0092) | 0.9897 (0.0216) | 0.9141 (0.0603) | 0.9915 (0.0227) | 0.9825 (0.0284) |
SPEC | 0.6150 (0.0432) | 0.9998 (0.0004) | 0.9974 (0.0019) | 0.9999 (0.0004) | 0.9996 (0.0006) |
ACC | 0.6297 (0.0396) | 0.9671 (0.0042) | 0.9726 (0.0044) | 0.9653 (0.0043) | 0.9679 (0.0044) |
Scenario 2 | |||||
MCC | 0.2506 (0.0180) | 0.6088 (0.0773) | 0.6674 (0.0638) | 0.5805 (0.0667) | 0.6435 (0.0686) |
F1 | 0.2037 (0.0138) | 0.5578 (0.0991) | 0.6486 (0.0721) | 0.5194 (0.0857) | 0.6048 (0.0846) |
AUC | 0.9524 (0.0218) | 0.9231 (0.0428) | 0.9544 (0.0169) | 0.9668 (0.0174) | 0.9762 (0.0087) |
PRE | 0.1140 (0.0087) | 0.9796 (0.0304) | 0.9129 (0.0556) | 0.9895 (0.0216) | 0.9698 (0.0332) |
SPEC | 0.6033 (0.0364) | 0.9995 (0.0008) | 0.9974 (0.0018) | 0.9998 (0.0004) | 0.9993 (0.0008) |
ACC | 0.6213 (0.0339) | 0.9694 (0.0048) | 0.9729 (0.0042) | 0.9676 (0.0004) | 0.9715 (0.0044) |
Scenario 3 | |||||
MCC | 0.2337 (0.0195) | 0.5397 (0.0854) | 0.6544 (0.0578) | 0.4840 (0.0883) | 0.5591 (0.0740) |
F1 | 0.1948 (0.0129) | 0.4708 (0.1087) | 0.6342 (0.0670) | 0.3973 (0.1080) | 0.4970 (0.0948) |
AUC | 0.9400 (0.0206) | 0.9069 (0.0359) | 0.9500 (0.0191) | 0.9481 (0.0197) | 0.9481 (0.0182) |
PRE | 0.1085 (0.0079) | 0.9759 (0.0424) | 0.9089 (0.0561) | 0.9901 (0.0328) | 0.9707 (0.0437) |
SPEC | 0.5679 (0.0349) | 0.9995 (0.0008) | 0.9972 (0.0020) | 0.9999 (0.0005) | 0.9994 (0.0010) |
ACC | 0.5880 (0.033) | 0.9641 (0.0050) | 0.9710 (0.0040) | 0.9611 (0.0043) | 0.9652 (0.0044) |
Scenario 4 | |||||
MCC | 0.2671 (0.0194) | 0.7080 (0.0474) | 0.7849 (0.0443) | 0.6840 (0.0486) | 0.6831 (0.0470) |
F1 | 0.2161 (0.0156) | 0.6801 (0.0586) | 0.7853 (0.0448) | 0.6492 (0.0602) | 0.6485 (0.0595) |
AUC | 0.9612 (0.0156) | 0.9406 (0.0246) | 0.9709 (0.0085) | 0.9627 (0.0159) | 0.9658 (0.0139) |
PRE | 0.1217 (0.0099) | 0.9919 (0.0172) | 0.9136 (0.0502) | 0.9972 (0.0106) | 0.9953 (0.0123) |
SPEC | 0.6288 (0.0345) | 0.9998 (0.0005) | 0.9965 (0.0023) | 0.9999 (0.0003) | 0.9999 (0.0004) |
ACC | 0.6459 (0.0325) | 0.9758 (0.0033) | 0.9812 (0.0036) | 0.9741 (0.0034) | 0.9741 (0.0032) |
Scenario 5 | |||||
MCC | 0.2671 (0.0194) | 0.8632 (0.0492) | 0.7861 (0.0360) | 0.8506 (0.0486) | 0.8879 (0.0332) |
F1 | 0.5303 (0.0420) | 0.8611 (0.0529) | 0.7792 (0.0395) | 0.8475 (0.0414) | 0.8888 (0.0338) |
AUC | 0.9980 (0.0012) | 0.9971 (0.0041) | 0.9986 (0.0010) | 0.9985 (0.0012) | 0.9985 (0.0011) |
PRE | 0.3622 (0.0163) | 0.9811 (0.0286) | 0.6433 (0.0531) | 0.9866 (0.0229) | 0.9745 (0.0323) |
SPEC | 0.9058 (0.0155) | 0.9992 (0.0013) | 0.9705 (0.0067) | 0.9994 (0.0009) | 0.9988 (0.0015) |
ACC | 0.9104 (0.0385) | 0.9878 (0.0041) | 0.9716 (0.0064) | 0.9867 (0.0032) | 0.9899 (0.0028) |
. | DPmix . | local fdr . | Benjamini and Hochberg (1995) . | Muralidharan (2012) . | Martin and Tokdar (2012) . |
---|---|---|---|---|---|
Scenario 1 | |||||
MCC | 0.2329 (0.0209) | 0.5708 (0.0721) | 0.6629 (0.0648) | 0.5379 (0.0804) | 0.5835 (0.0757) |
F1 | 0.1980 (0.0145) | 0.5067 (0.0932) | 0.6427 (0.0736) | 0.4643 (0.1023) | 0.5251 (0.0962) |
AUC | 0.9053 (0.0301) | 0.8869 (0.0365) | 0.9237 (0.0205) | 0.9242 (0.0230) | 0.9230 (0.0216) |
PRE | 0.1113 (0.0092) | 0.9897 (0.0216) | 0.9141 (0.0603) | 0.9915 (0.0227) | 0.9825 (0.0284) |
SPEC | 0.6150 (0.0432) | 0.9998 (0.0004) | 0.9974 (0.0019) | 0.9999 (0.0004) | 0.9996 (0.0006) |
ACC | 0.6297 (0.0396) | 0.9671 (0.0042) | 0.9726 (0.0044) | 0.9653 (0.0043) | 0.9679 (0.0044) |
Scenario 2 | |||||
MCC | 0.2506 (0.0180) | 0.6088 (0.0773) | 0.6674 (0.0638) | 0.5805 (0.0667) | 0.6435 (0.0686) |
F1 | 0.2037 (0.0138) | 0.5578 (0.0991) | 0.6486 (0.0721) | 0.5194 (0.0857) | 0.6048 (0.0846) |
AUC | 0.9524 (0.0218) | 0.9231 (0.0428) | 0.9544 (0.0169) | 0.9668 (0.0174) | 0.9762 (0.0087) |
PRE | 0.1140 (0.0087) | 0.9796 (0.0304) | 0.9129 (0.0556) | 0.9895 (0.0216) | 0.9698 (0.0332) |
SPEC | 0.6033 (0.0364) | 0.9995 (0.0008) | 0.9974 (0.0018) | 0.9998 (0.0004) | 0.9993 (0.0008) |
ACC | 0.6213 (0.0339) | 0.9694 (0.0048) | 0.9729 (0.0042) | 0.9676 (0.0004) | 0.9715 (0.0044) |
Scenario 3 | |||||
MCC | 0.2337 (0.0195) | 0.5397 (0.0854) | 0.6544 (0.0578) | 0.4840 (0.0883) | 0.5591 (0.0740) |
F1 | 0.1948 (0.0129) | 0.4708 (0.1087) | 0.6342 (0.0670) | 0.3973 (0.1080) | 0.4970 (0.0948) |
AUC | 0.9400 (0.0206) | 0.9069 (0.0359) | 0.9500 (0.0191) | 0.9481 (0.0197) | 0.9481 (0.0182) |
PRE | 0.1085 (0.0079) | 0.9759 (0.0424) | 0.9089 (0.0561) | 0.9901 (0.0328) | 0.9707 (0.0437) |
SPEC | 0.5679 (0.0349) | 0.9995 (0.0008) | 0.9972 (0.0020) | 0.9999 (0.0005) | 0.9994 (0.0010) |
ACC | 0.5880 (0.033) | 0.9641 (0.0050) | 0.9710 (0.0040) | 0.9611 (0.0043) | 0.9652 (0.0044) |
Scenario 4 | |||||
MCC | 0.2671 (0.0194) | 0.7080 (0.0474) | 0.7849 (0.0443) | 0.6840 (0.0486) | 0.6831 (0.0470) |
F1 | 0.2161 (0.0156) | 0.6801 (0.0586) | 0.7853 (0.0448) | 0.6492 (0.0602) | 0.6485 (0.0595) |
AUC | 0.9612 (0.0156) | 0.9406 (0.0246) | 0.9709 (0.0085) | 0.9627 (0.0159) | 0.9658 (0.0139) |
PRE | 0.1217 (0.0099) | 0.9919 (0.0172) | 0.9136 (0.0502) | 0.9972 (0.0106) | 0.9953 (0.0123) |
SPEC | 0.6288 (0.0345) | 0.9998 (0.0005) | 0.9965 (0.0023) | 0.9999 (0.0003) | 0.9999 (0.0004) |
ACC | 0.6459 (0.0325) | 0.9758 (0.0033) | 0.9812 (0.0036) | 0.9741 (0.0034) | 0.9741 (0.0032) |
Scenario 5 | |||||
MCC | 0.2671 (0.0194) | 0.8632 (0.0492) | 0.7861 (0.0360) | 0.8506 (0.0486) | 0.8879 (0.0332) |
F1 | 0.5303 (0.0420) | 0.8611 (0.0529) | 0.7792 (0.0395) | 0.8475 (0.0414) | 0.8888 (0.0338) |
AUC | 0.9980 (0.0012) | 0.9971 (0.0041) | 0.9986 (0.0010) | 0.9985 (0.0012) | 0.9985 (0.0011) |
PRE | 0.3622 (0.0163) | 0.9811 (0.0286) | 0.6433 (0.0531) | 0.9866 (0.0229) | 0.9745 (0.0323) |
SPEC | 0.9058 (0.0155) | 0.9992 (0.0013) | 0.9705 (0.0067) | 0.9994 (0.0009) | 0.9988 (0.0015) |
ACC | 0.9104 (0.0385) | 0.9878 (0.0041) | 0.9716 (0.0064) | 0.9867 (0.0032) | 0.9899 (0.0028) |
aNote. The values in the table represent the average and F1 scores, the average precision (PRE), specificity (SPEC), accuracy (ACC), and the area under the curve (AUC) of the corresponding receiver operating characteristic curve, over 30 replicates with corresponding standard deviations between brackets.
4.2 Case Study: Microbiome Data
We illustrate the applicability of the proposed two-group 2PPD process model on a publicly available data set of microbial abundances from a case-controlled study on postdiarrheal disruption in children from low-income countries. The purpose of the study was to identify potential microbiota which may show positive associations with MSD in the case group. Negative associations are also of interest since they may suggest potential target treatments for recovery from dysbiosis.
Stool samples were obtained from 992 children between the ages of 0 and 59 months, 508 of whom had recently suffered from the MSD, with the remaining 484 children acting as age-matched controls. The samples were obtained in Mali (M), the Gambia (G), Kenya (K), and Bangladesh (B) and case/control proportions were approximately equal for each country.
Due to the nature of the sampling mechanism, the distribution of the microbiome counts is highly skewed, ie, a few are highly abundant, whereas most microbes have low frequencies (Chen and Li, 2016). Here, we are interested in evaluating the ability of our model to identify microbiota which may be differently abundant in healthy and MSD subjects. Therefore, we employ a Negative-Binomial regression model on the taxonomic abundances , where
indexes the microbiotic taxa, and
indexes the samples. As it is typical when dealing with sequencing data (see, eg, Witten, 2011), we let
denote an estimate of a sample-specific size factor, to take into account the different sequencing depths of the samples. Also, we let
,
, and
denote the three available covariates for the MSD status, age, and country. More specifically,
for cases and
for the matched controls. We adopt Gambia as the reference value for the other countries, and let
, and
be dummy variables for the other countries. Then, we assume:

where represents a taxon-specific dispersion parameter, and
represents a taxon-specific effect, which captures the abundance of taxon j in the control group, and the
's represent the effects of each covariate on the taxon abundance. The Negative-Binomial distribution was chosen due to its flexibility over the Poisson alternative. The model was fitted using the glmmTMB package. To illustrate our multiple-testing procedure, we consider the fixed case-control effect captured by the estimates of the coefficients
's, which provide the z-scores for testing the differences in abundance between healthy and MSD subjects. A histogram of the 535 z-scores from the data is given in Figure 1. Since the estimated coefficients are a function of the original data, the independence assumption may not be satisfied if the original taxonomic abundances are correlated. Indeed, the presence of hidden correlation among the observables and unknown associations with unobserved covariates are major motivations for the two-group model formulation in Efron (2004).

Microbiome data case study: Histogram of 535 z-scores obtained from the case term (β1) in the Negative-Binomial generalized linear mixed effects model. We superimpose the posterior probabilities of the events and the threshold corresponding to a Bayesian FDR of 1%
In the two-group model (6)-(7), we fix the hyperparameters for the prior processes as ,
, and
. The specific choice for σ0 allows small departures of the empirical null from the theoretical
distribution, while maintaining computational feasibility in the generation of the latent clusters from the null. A Beta(1,99) is chosen for ρ to further encourage sparsity of discoveries. The hyperparameters of the base measures were set as in Section 4.1. For the results provided here, we run 20 000 iterations after 20 000 iterations as burn-in. Figure 1 overlays the Monte Carlo estimates of the posterior probability of each taxon belonging to the nonnull distribution to the histogram of the z-scores. By thresholding the Monte Carlo estimate of posterior probability of the nonnull process at a value corresponding to a Bayesian false discovery rate (Newton et al., 2004) of 1%, we identify a total of 74 nonnull taxa. On the contrary, the BH procedure leads to 143 significant microbes, when controlling the FDR at the 1% level. The locfdr model detects as relevant only 6 taxa. Tables 1 and 2 in the Web Appendix F report the taxa with the highest discovery probabilities, separately for positive and negative z-scores. A close inspection of our results reveals some interesting biological findings (see Web Appendix G).
4.3 Case Study: Prostate Cancer Data Set
To assess how our model performs in large-sample cases, we apply our methodology to the widely known Prostate data set of Singh et al. (2002). See also Efron (2009). We exploit the split-merge move in the MCMC to improve computational efficiency (see Web Appendix D). The data set is composed of 6033 genes for 102 observations from 52 prostate cancer patients and 50 healthy men. We adopt the same prior specification as in the microbiome case study, with the exception that here we set , as in the simulation studies. This choice is in accordance with the discussion in Efron (2008), who suggests a proportion a priori of no more than 10% nonnull genes for these data. Figure 2 reports the posterior probabilities of discovery for this data set. When thresholding the BFDR at the 20% level, our method flags only 18 genes as relevant. Similarly, the locfdr procedure flags 19 genes. On the contrary, the BH procedure identifies 60 genes as significant, even when thresholding the FDR at the 10% level.

Prostate data set: Histogram of 6033 z-scores obtained from a two-group comparison. We superimpose the posterior probabilities of the events and the threshold corresponding to a Bayesian FDR of 20%
5 Discussion and Conclusion
We have considered the two-group model by Efron (2004) for multiple hypotheses testing and we have proposed the use of a mixture prior of 2PPD processes as a flexible class of prior processes in that framework. In particular, an appropriate choice of the hyperparameters of the 2PPD processes allows the characterization of small departures from the theoretical null in the estimation of the empirical null distribution, while leaving flexibility in the modeling of the nonnull distribution. We have also employed a mixture of NLP densities as base measure for the alternative distribution, to improve separation and facilitate the estimation and identifiability of the mixture components. The proposed approach has been shown to provide a robust testing procedure, which compares favorably with recently proposed methods for estimating the components of the two-group model, including the widely used DP mixture models. A limitation of the procedure is related to the computing effort, since MCMC algorithms for Bayesian nonparametric models typically require considerable computational time for posterior inference. To provide an illustration, in the analysis of the Prostate cancer data set of Section 4.3, it took approximately 56 hours to run 20 000 MCMC iterations on a Xeon(R) E5-2640 v4, 2.40 GHz Linux sever, with the computational bottleneck being represented by the iterations requiring a full Pólya-Urn sampling. Variational Bayes techniques have been developed for many Bayesian nonparametric models, including the 2PPD process (see, eg, Jordan and Blei, 2006). However, the speed up of MCMC algorithms for Bayesian nonparametric models in high-dimensional settings is still a topic of ongoing research (see, eg, Canale et al., 2019).
A careful choice of the hyperparameters of the two-group 2PPD model is essential to ensure good operating characteristics of the testing procedures. We have followed prevailing practices and set in both the simulations and the data analyses. Priors on θ0 and θ1 would need to incorporate constraints to facilitate the identification of the two-group components.
Finally, in our data analyses, we have proposed a two-group model for the analysis of data observed under two conditions. However, often the interest is in studying longitudinal changes of repeated measurements within a subject. Therefore, models that take into account the temporal dependence of the hypotheses are required.
Data Availability Statement
The data that support the findings in this paper are openly available in the R packages msd16s at http://doi.org/10.1186/gb-2014-15-6-r76 (Pop et al., 2014) and sda, at https://CRAN.R-project.org/package=sda (Ahdesmaki et al., 2015).
Acknowledgments
The authors thank the reviewers and the Editor for their careful and inspiring comments, that helped improving the content and the structure of this paper. A. Lijoi has been partially supported by MIUR, grant 2015SNS29B.
References