Flexible control of the median of the false discovery proportion

We introduce a multiple testing procedure that controls the median of the proportion of false discoveries (FDP) in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini-Hochberg method, which controls the mean of the FDP. Our method allows freely choosing one or several values of alpha after seeing the data -- unlike Benjamini-Hochberg, which can be very liberal when alpha is chosen post hoc. We prove these claims and illustrate them with simulations. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously valid, median unbiased estimators of the FDP for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.


Introduction
Multiple hypothesis testing procedures have the common aim of ensuring that the number of incorrect rejections, i.e. false positives, is likely small. The most commonly used multiple testing procedures control either the family-wise error rate or the false discovery rate (FDR) (Dickhaus, 2014; Harvey et al., 2020). The false discovery rate is the expected value of the false discovery proportion (FDP), which is the proportion of false positives among all rejections of null hypotheses. Controlling the FDR means ensuring that the expected FDP is kept below some pre-specified value α (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001; Goeman and Solari, 2014).
The FDP, which is an unknown quantity, can vary widely about its mean when the tested variables are strongly correlated (Efron, 2007; Schwartzman and Lin, 2011; Delattre and Roquain, 2015). For this reason, methods have been developed that do not control the FDR or estimate the FDP, but rather provide a confidence interval for the FDP (Hemerik and Goeman, 2018). Some methods provide confidence intervals for several choices of the rejection threshold simultaneously. Our starting point is the π₀ estimator proposed in Schweder and Spjøtvoll (1982) and advocated in Storey (2002). We will refer to it as the Schweder-Spjøtvoll-Storey estimator; some publications refer to it as Storey's estimator or the Schweder-Spjøtvoll estimator (Hoang and Dickhaus, 2022). The literature proposes multiple π₀ estimators based on p-values (Rogan and Gladen, 1978; Hochberg and Benjamini, 1990; Langaas et al., 2005; Meinshausen et al., 2006; Rosenblatt, 2021). As a side result of our investigation of π₀ and FDP estimation, we add to this literature a novel π₀ estimator that differs slightly from the Schweder-Spjøtvoll-Storey estimator, unless its tuning parameter is 0.5.
The proposed methodology also draws from an idea in Hemerik et al. (2019), which is to construct simultaneous FDP bounds, called confidence envelopes, in a manner that is partly data-based and partly reliant on a pre-specified family of candidate envelopes. The simultaneity of the constructed bounds allows for post hoc selection of rejection thresholds and hence post hoc specification of γ. The methodology proposed here is applicable in many situations where the method in Hemerik et al. (2019) is not. The reason is that one cannot generally use permutations if one only has p-values, which is the setting we assume.
Our mFDP controlling approach conceptually relates to recent methods that bound the FDR by α by finding the largest p-value threshold for which some conservative estimate of the FDP is below α (Barber and Candès, 2015; Li and Barber, 2017; Lei and Fithian, 2018; Luo et al., 2020; Lei et al., 2021; Rajchert and Keich, 2022). Those methods do not offer the simultaneity provided in the present paper.
In Section 2, we start with non-simultaneous estimation of the number of false positives. Section 3 contains the main theoretical results, which build on Section 2. In Section 4 we use simulations to investigate properties of our method. We find that the method was valid in all considered simulation settings and had good power compared to competitors, especially in high-dimensional settings with many false hypotheses. The Supplementary Material contains proofs, additional simulations, an analysis of real RNA-Seq data and theoretical extensions of our results.
2 Median unbiased estimation of the FDP

Notation
Throughout this paper we consider hypotheses H₁, ..., H_m and corresponding p-values p₁, ..., p_m, which take values in (0, 1]. Write p = (p₁, ..., p_m). Let 𝒩 = {1 ≤ i ≤ m : H_i is true} be the set of indices of the true hypotheses and let N = |𝒩| be the number of true hypotheses, which we assume to be strictly positive for convenience. The fraction of true hypotheses is π₀ = N/m. Let q₁, ..., q_N denote the p-values corresponding to the true hypotheses, in any order. Write q = (q₁, ..., q_N).
If t ∈ (0, 1), we write R(t) = {1 ≤ i ≤ m : p_i ≤ t}. We will call R = R(t) the set of rejected hypotheses, since t will usually denote the p-value threshold. Write R = |R|. Let V = |𝒩 ∩ R| be the number of true hypotheses in R, i.e., the number of false positive findings. We write a ∧ b for the minimum of the numbers a and b.

The Schweder-Spjøtvoll-Storey estimate
The paper's first results, which inspired Section 3, follow from a reinvestigation of the Schweder-Spjøtvoll-Storey estimator of π₀ (Schweder and Spjøtvoll, 1982; Storey, 2002). The estimator depends on a tuning parameter in (0, 1) that is usually denoted by λ. For practical reasons we will write the estimator in terms of t := 1 − λ. The estimator is

π̂′₀ = |{1 ≤ i ≤ m : p_i > 1 − t}| / (tm).

The heuristics behind this estimate are as follows. The non-null p-values, i.e., the p-values corresponding to false hypotheses, tend to be smaller than 1 − t, so that most of the p-values larger than 1 − t are null p-values. Since for point null hypotheses the null p-values are standard uniform, one expects about t · 100% of the null p-values to be larger than 1 − t. Hence, a (conservative) estimate of the number of null p-values is t⁻¹|{i : p_i > 1 − t}|. Thus, π̂′₀ is an estimate of π₀. Storey's estimator is related to the concept of accumulation functions, used to estimate false discovery proportions (Li and Barber, 2017; Lei et al., 2021).
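As a concrete sketch, the estimator can be computed in a few lines; the function name and interface below are our own, not from any package.

```python
# Sketch of the Schweder-Spjotvoll-Storey estimator, written in terms
# of t = 1 - lambda, as in the text. Illustrative only.

def storey_pi0(pvals, t=0.5):
    """Estimate the fraction pi_0 of true (null) hypotheses.

    Roughly t*100% of the null p-values exceed 1 - t, so dividing the
    count of p-values above 1 - t by t estimates the number of nulls;
    dividing by m turns that into a fraction.
    """
    m = len(pvals)
    n_large = sum(1 for p in pvals if p > 1 - t)
    return n_large / (t * m)
```

Note that the estimate may exceed 1; in practice it is often truncated at 1.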

Median unbiased estimation of V and π 0
Here we derive estimators of V and π₀ that are inspired by the Schweder-Spjøtvoll-Storey estimator. We make the following assumption.
Assumption 1. The following holds:

P( |{1 ≤ i ≤ N : q_i ≤ t}| ≤ |{1 ≤ i ≤ m : p_i ≥ 1 − t}| ) ≥ 0.5.

Assumption 1 says that the number of small (≤ t) null p-values tends to be smaller than the number of large (≥ 1 − t) p-values (null and non-null). Note that this assumption is satisfied in particular if

P( |{1 ≤ i ≤ N : q_i ≤ t}| ≤ |{1 ≤ i ≤ N : q_i ≥ 1 − t}| ) ≥ 0.5. (3)

Further, note that the probability in (3) is equal to

P( |{1 ≤ i ≤ N : q_i ≤ t}| ≤ |{1 ≤ i ≤ N : 1 − q_i ≤ t}| ). (4)

If the null p-values q₁, ..., q_N are independent and standard uniform, then Assumption 1 is clearly satisfied. As another example, suppose q = (q₁, ..., q_N) is symmetric about 1/2, i.e.,

(q₁, ..., q_N) =_d (1 − q₁, ..., 1 − q_N). (5)

Then property (4) and hence Assumption 1 also hold. The symmetry property (5) holds for instance if q₁, ..., q_N are left- or right-sided p-values from Z-tests based on test statistics Z₁, ..., Z_m with joint N(0, Σ) distribution. Further, note that null p-values that are stochastically larger than uniform, or the presence of many non-nulls, make it easier for Assumption 1 to be satisfied. Note that if t is used as a rejection threshold, the number of false positive findings is

V(t) = |{1 ≤ i ≤ N : q_i ≤ t}|.

Define V̄(t) = |{1 ≤ i ≤ m : p_i ≥ 1 − t}|. Under Assumption 1, with probability at least 0.5, we have

V(t) ≤ V̄(t).

In other words, V̄(t) is a 50%-confidence upper bound for V(t). We will refer to such bounds as median unbiased estimators for brevity, although writing 'not-downward biased' instead of 'unbiased' would be more precise. This result also leads to a median unbiased estimator of π₀. Indeed, if V ≤ V̄, then R contains at least R − V̄ false hypotheses, so that π₀ is at most

π̄′₀ = (m − R(t) + V̄(t))/m.

A rewrite gives the following result.
Thus, if the p-values are continuous and t = 0.5, then π̄′₀ = π̂′₀ with probability 1. For other values of λ, we obtain a median unbiased estimate π̄′₀ that is slightly different from π̂′₀. In the Supplementary Material, we provide a theoretical comparison of E(π̄′₀) versus E(π̂′₀). In the Supplementary Material we also obtain the estimate π̄′₀ in an alternative way and, doing so, discover a broader class of π₀ estimators.
We write π̄₀ = min{π̄′₀, 1}. In Example 1 and the corresponding Figure 1, the Schweder-Spjøtvoll-Storey method is applied to 500 simulated p-values.

Example 1 (Running example, part 1: estimating π₀ and V). As a toy example we generated 500 independent p-values, 400 of which were uniformly distributed on [0, 1] and 100 of which were stochastically smaller than uniform on [0, 1]. Thus, we can say that N = 400. A scatterplot of the sorted p-values is shown in Figure 1, as well as a visual illustration of how Storey's estimate π̂′₀ · m of the number of true hypotheses is computed, in case λ = 1 − t = 0.8. Often λ is taken smaller, but considering small t instead will turn out to be useful. In this example, Storey's estimate π̂′₀ · m was 410 and our estimate, which is less easy to visualize, was π̄₀ · m = 402. Thus, the estimates were close, as is often the case. Since property (5) and hence Assumption 1 is satisfied, we know that π̄₀ is a median unbiased estimator of π₀. In particular, we know with 50% confidence that there are at least 500 − 402 = 98 false hypotheses in total.
As explained in this section, we can make this statement stronger by noting that R(t) = 180 and V̄(t) = 82. The latter means that we know with 50% confidence that there are at least 180 − 82 = 98 false hypotheses among the hypotheses with p-values below t = 0.2.
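The quantities of this subsection -- the bound V̄(t) = |{i : p_i ≥ 1 − t}|, the rejection count R(t) and the resulting π₀ bound -- can be sketched as follows (function names are ours, purely illustrative):

```python
def v_bar(pvals, t):
    """50%-confidence upper bound for V(t): count of p-values >= 1 - t."""
    return sum(1 for p in pvals if p >= 1 - t)

def r_count(pvals, t):
    """Number of rejections R(t) at threshold t."""
    return sum(1 for p in pvals if p <= t)

def pi0_bar(pvals, t):
    """Median unbiased upper bound for pi_0 = N/m, truncated at 1.

    With 50% confidence, R(t) - v_bar(t) of the rejections correspond to
    false hypotheses, so N <= m - (R(t) - v_bar(t)).
    """
    m = len(pvals)
    return min((m - r_count(pvals, t) + v_bar(pvals, t)) / m, 1.0)
```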

Median unbiased estimation of the FDP
Define the FDP to be the proportion of false positives among the rejections, FDP(t) = V(t)/R(t), which is understood to be 0 when R(t) = 0. The median unbiased estimate V̄ immediately implies a median unbiased estimate of the FDP.
Theorem 2. Suppose Assumption 1 is satisfied. The variable F̄DP(t) = V̄(t)/R(t) is a median unbiased estimator for the FDP, i.e.,

P( FDP(t) ≤ F̄DP(t) ) ≥ 0.5. (7)

To prove this, we only need to remark that if V ≤ V̄, then FDP ≤ F̄DP.
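A small Monte Carlo sketch can illustrate the median unbiasedness claim under independence. The setup below (uniform nulls, non-null p-values drawn near 0) is an arbitrary choice of ours, purely for illustration.

```python
import random

def mc_coverage(m_true=80, m_false=20, t=0.2, reps=4000, seed=1):
    """Estimate P(V(t) <= v_bar(t)) by simulation.

    Null p-values are independent uniforms; non-null p-values are drawn
    on [0, 0.01] as a crude stand-in for signal. Under Assumption 1 the
    returned coverage should be at least about 0.5.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        nulls = [rng.random() for _ in range(m_true)]
        alts = [rng.random() * 0.01 for _ in range(m_false)]
        pvals = nulls + alts
        v = sum(1 for p in nulls if p <= t)           # true false positives
        vbar = sum(1 for p in pvals if p >= 1 - t)    # 50%-confidence bound
        hits += v <= vbar
    return hits / reps
```

Here the two counts compared are exchangeable, so the coverage is in fact slightly above one half.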
3 Controlling the mFDP

3.1 Overview of our method and comparison with FDR control

In Section 2.4 we considered a fixed rejection threshold t and provided a median unbiased estimate for FDP(t). In many situations, one would like to adapt the threshold t based on the data, in such a way that one still obtains a valid median unbiased estimate. Note that naively choosing t in such a way that an attractive (low) estimate of the FDP is obtained can invalidate the procedure, in the sense that inequality (7) no longer holds. In Section 3.3, however, we derive a method that provides median unbiased bounds for a large range of t, in such a way that with probability at least 0.5, the bounds are simultaneously valid for all t. Specifically, we let the user choose some range T ⊆ [0, 1] of rejection thresholds t of interest, before looking at the data. Usually a good choice for T will be [0, 1/2] or another interval starting at 0. Then we provide 50%-confidence upper bounds B̄(t) for V(t) that are simultaneously valid over all t ∈ T:

P( for all t ∈ T: V(t) ≤ B̄(t) ) ≥ 0.5. (8)

It then immediately follows that B̄(t)/R(t), t ∈ T, are simultaneously valid 50%-confidence bounds for FDP(t):

P( for all t ∈ T: FDP(t) ≤ B̄(t)/R(t) ) ≥ 0.5.

Since the threshold t can be chosen based on the data, it can be picked such that B̄(t)/R(t) is low. In particular, one can prespecify a value γ ∈ [0, 1], for example γ = 0.05, and take the threshold t ∈ T to be the largest value for which B̄(t)/R(t) ≤ γ, if such a t exists. This means that our method can be used to reject a set of hypotheses in such a way that the median of the FDP is bounded by γ:

P( FDP ≤ γ ) ≥ 0.5.

In other words, we can control the median of the FDP, which we will call the mFDP. Our method is an example of false discovery exceedance control, but with the added property that γ can be chosen post hoc, as we discuss below. Our notation 'γ' is in line with e.g. Romano and Wolf (2007) and Basu et al. (2021).
Our method is related to the popular BH procedure, which ensures that E(FDP) ≤ γ (Benjamini and Hochberg, 1995). BH ensures that the mean of the FDP is controlled, while we ensure that the median of the FDP is controlled. The mean and the median of the FDP can be asymptotically equal in some settings where the dependencies among the p-values are not too strong (Neuvial, 2008; Ditzhaus and Janssen, 2019), but there is no general guarantee that they are similar (Romano and Shaikh, 2006; Schwartzman and Lin, 2011). Especially under strong dependence, mFDP ≤ γ does not need to imply E(FDP) ≤ γ, while the converse does hold in many practical situations. Moreover, unlike mFDP control, FDR control always implies weak control of the family-wise error rate (Romano et al., 2008, Section 6.4). Note, however, that before applying any multiple testing method, we could first perform a global test, to enforce weak family-wise error rate control (Bernhard et al., 2004).
The most important advantage of our method over BH is that it provides simultaneous 50% confidence bounds for the FDP. This allows simultaneous as well as post hoc inference, in the sense that t ∈ T can be chosen after seeing the data. Further, we can choose multiple values of t and obtain simultaneously valid statements on the FDP. Moreover, we can choose the target FDP γ post hoc. With BH, such inference is not possible: if we choose γ after seeing the data, then BH can become very anti-conservative. This is discussed in Subsection 3.2.

Benjamini-Hochberg is not flexible
The main advantage of the method that we will propose is that it allows choosing one or several rejection thresholds or target FDPs after seeing the data. This contrasts our method with BH. Indeed, when the target FDR α (or γ in our notation) is chosen based on the data, BH no longer guarantees that E(FDP) ≤ α, conditional on the post hoc chosen α. Note that when testing a single hypothesis, choosing α post hoc is not generally valid either (Hubbard, 2004; Grünwald, 2022). For simulations illustrating that BH is not valid post hoc, see the Supplementary Material. Another related result is Fig. 5 in Katsevich and Ramdas (2020), which illustrates based on simulations that BH does not have a simultaneous interpretation. We now provide some mathematical examples that prove that BH is not valid post hoc.
Suppose all m hypotheses are true and the p-values are mutually independent and uniformly distributed on (0, 1]. BH provides m adjusted p-values and rejects all hypotheses with adjusted p-values that are at most α. Let p^bh_(1) denote the smallest adjusted p-value. It is well known that if α ∈ [0, 1] is prespecified and all p-values are independent and uniform on (0, 1], then the probability that BH rejects any hypotheses is exactly α (Goeman and Solari, 2011). BH rejects any hypotheses if and only if p^bh_(1) ≤ α. Thus, p^bh_(1) is uniform on (0, 1]. As a simple example of a post hoc chosen α, take α := p^bh_(1). We now show that we then no longer have E(FDP/α) ≤ 1. Since α = p^bh_(1), we know that α is uniform on (0, 1]. By definition of α, there is always at least one rejected hypothesis. Since all hypotheses are true, this means that we always have FDP = 1. Consequently,

E(FDP/α) = E(1/α) = ∞.

Of course, we have considered an extreme situation, where α can take any value. We now consider a less extreme situation, where we only allow α to take two values, say, a₁ and a₂, with 0 < a₁ ≤ a₂ < 1. Specifically, we define α to be a₁ if p^bh_(1) ≤ a₁ and otherwise we take α = a₂. This mimics the psychology of a researcher who uses a₂ as a default value for α but takes α to be a₁ if this still leads to at least one rejection. Note that if p^bh_(1) > a₂, then we reject nothing, so that FDP = 0. Thus, with this definition of α, we have

E(FDP/α) = a₁/a₁ + (a₂ − a₁)/a₂ = 2 − a₁/a₂,

which always exceeds 1, except if a₁ = a₂. As an example, take a₁ = 0.05 and a₂ = 0.1, which are values often used in practice. This defines a rather limited set of allowed values for α.
Nevertheless, we find that E(FDP/α) = 1.5, which is already much larger than 1. The reader can check that if we allow α to take more than two values, then E(FDP/α) can become huge. Indeed, if we allow α to take any value in (0, 1], then E(FDP/α) can become infinite, as we saw in the previous example where α = p^bh_(1).
These examples show that if α depends on the data, then marginally we often have E(FDP/α) > 1. This means in particular that conditional on α taking a certain value, we do not generally have E(FDP) ≤ α. A simulation study that illustrates this point in various other settings is in the Supplementary Material.
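The two-level example above (α = a₁ if that still yields a rejection, else α = a₂) can be checked numerically. The helper names are ours, and the BH step uses only the standard adjusted p-value formula min over i of m·p_(i)/i.

```python
import random

def bh_min_adjusted(pvals):
    """Smallest Benjamini-Hochberg adjusted p-value: min_i m * p_(i) / i."""
    m = len(pvals)
    sp = sorted(pvals)
    return min(m * p / (i + 1) for i, p in enumerate(sp))

def posthoc_alpha_inflation(m=50, a1=0.05, a2=0.1, reps=20000, seed=7):
    """Monte Carlo estimate of E(FDP / alpha) when all hypotheses are true
    and alpha is chosen post hoc as in the text: a1 if that still gives a
    rejection, else a2. The text derives the exact value 2 - a1/a2 = 1.5."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        p1 = bh_min_adjusted([rng.random() for _ in range(m)])
        alpha = a1 if p1 <= a1 else a2
        fdp = 1.0 if p1 <= alpha else 0.0  # any rejection is a false positive
        total += fdp / alpha
    return total / reps
```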

Simultaneous bounds for the FDP
Let ℕ be the set of natural numbers. We call a function B : T → ℕ a confidence envelope if it satisfies inequality (8) (cf. Hemerik et al., 2019). We restrict ourselves to such 50% confidence envelopes and do not consider e.g. 95% confidence envelopes. Let 𝓑 be a set of maps B : T → ℕ that is totally ordered, in the sense that for all B₁, B₂ ∈ 𝓑, either B₁(t) ≤ B₂(t) for all t ∈ T, or B₁(t) ≥ B₂(t) for all t ∈ T. We call 𝓑 the family of candidate envelopes (cf. Hemerik et al., 2019).
We will obtain a confidence envelope by picking the smallest B ∈ 𝓑 for which B(t) ≥ V̄(t) for all t ∈ T. We call this envelope B̄:

B̄ = min{ B ∈ 𝓑 : B(t) ≥ V̄(t) for all t ∈ T }.

If r is a vector containing, say, l_r p-values, then we write R(r, t) = {1 ≤ i ≤ l_r : r_i < t}, to make the dependence on the p-values explicit. Analogously we define V(r, t), V̄(r, t) and B̄(r). We use the convention that, if no B ∈ 𝓑 dominates V̄, then B̄ is the pointwise largest element of 𝓑. We will only require the following assumption.
Assumption 2. The following holds:

P( B̄(p) ≥ B̄(1 − q) ) ≥ 0.5,

where 1 − q = (1 − q₁, ..., 1 − q_N). Due to the monotonicity of the set 𝓑, we always have either B̄(q) < B̄(1 − q) or B̄(q) ≥ B̄(1 − q). If the latter inequality has the largest probability, then Assumption 2 is always satisfied, since B̄(p) ≥ B̄(q). Assumption 2 is a generalization of Assumption 1, in the sense that if T is equal to the singleton {t}, then Assumptions 1 and 2 will coincide, for most reasonable choices of 𝓑 (e.g. for 𝓑 as in Section 3.4).
Assumption 2 always holds if property (5) is satisfied, regardless of our choice of 𝓑. Indeed, if (5) holds, we have

P( B̄(q) ≥ B̄(1 − q) ) + P( B̄(q) ≤ B̄(1 − q) ) ≥ 1 and P( B̄(q) ≥ B̄(1 − q) ) = P( B̄(q) ≤ B̄(1 − q) ).

Since the latter two probabilities are equal, they are both at least 0.5, so that Assumption 2 is satisfied. Moreover, property (5) is not necessary for Assumption 2 to hold, as confirmed by our simulations.
Let [·]₊ be the positive part function. The following theorem states that B̄ provides simultaneously valid 50%-confidence bounds.
Theorem 3. Suppose Assumption 2 holds. Then the function B̄ is a confidence envelope, i.e.,

P( for all t ∈ T: V(t) ≤ B̄(t) ) ≥ 0.5.

In addition, B̄′ : T → ℕ defined by

B̄′(t) = min_{s ∈ T} { B̄(s) + [R(t) − R(s)]₊ },

which satisfies B̄′ ≤ B̄, is also a confidence envelope and potentially improves B̄.
Discussion of the proof. The proof is in the Supplementary Material, but here we will give the intuition. First of all, note that V̄(t) is a 50% confidence bound for V(t), but not simultaneously over all t. The reason is that if multiple events have probability 0.5, then the probability that all events happen is usually smaller than 0.5. For example, if for t₁, t₂ ∈ (0, 1) we have P( V(t_j) ≤ V̄(t_j) ) ≥ 0.5 for j = 1 and j = 2, then we do not generally have P( V(t₁) ≤ V̄(t₁) and V(t₂) ≤ V̄(t₂) ) ≥ 0.5. To get a simultaneous bound for V(t), we usually need a stricter requirement, i.e., it is not sufficient to simply define B̄(t) = V̄(t). In the proof, we note that if B̄(p) ≥ B̄(1 − q), then B̄(t) ≥ V(t) for all t. It thus follows from Assumption 2 that our B̄ is a confidence envelope. That B̄ is chosen from a fixed, monotone family is not directly used in the proof. However, if B̄ is chosen from such a family, then B̄(p) ≥ B̄(q) and it follows that if (5) holds, then Assumption 2 is satisfied. Thus, that B̄ is chosen from a fixed, monotone family makes Assumption 2 reasonable. It also allows defining B̄ as a simple minimum.
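The construction of B̄ as the smallest member of a totally ordered candidate family that dominates V̄ can be sketched generically. The linear candidate family below is an illustrative stand-in of ours, not the paper's default family.

```python
import math

def v_bar(pvals, t):
    """Pointwise 50%-confidence bound: count of p-values >= 1 - t."""
    return sum(1 for p in pvals if p >= 1 - t)

def fit_envelope(pvals, T_grid, slopes):
    """Pick the smallest candidate envelope B_s(t) = ceil(s * t), from a
    totally ordered family indexed by slope s, that dominates v_bar(t) on
    a grid of thresholds. Returns the chosen slope, or None if no
    candidate dominates."""
    for s in sorted(slopes):  # smallest (tightest) candidate first
        if all(math.ceil(s * t) >= v_bar(pvals, t) for t in T_grid):
            return s
    return None
```

Because the family is totally ordered, the first dominating candidate found is the pointwise smallest one.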
In the rest of this subsection, we provide an extension of the bounds B̄′(t) and a result on admissibility. It turns out that B̄′ coincides with an envelope obtained through a novel closed testing-based procedure, in the sense of Goeman and Solari (2011) and Goeman et al. (2021). This novel procedure provides a 50% confidence bound for the number of true hypotheses in I, for every subset I ⊆ {1, ..., m}. These bounds are all simultaneously valid with probability at least 50%. We denote these bounds by B̄(I), where the maximum in their definition over an empty set is interpreted as 0.

Theorem 4. Assume T ⊆ [0, 1/2) and P( B̄(q) ≥ B̄(1 − q) ) ≥ 0.5. Then

P( for all I ⊆ {1, ..., m}: |𝒩 ∩ I| ≤ B̄(I) ) ≥ 0.5,

i.e., the B̄(I) are simultaneous 50% confidence bounds for the number of true hypotheses in I. In particular, the function T → ℕ defined by t → B̄(R(t)) is a confidence envelope.
If the local tests discussed in the proof of Theorem 4 are admissible, then the method of Theorem 4 is admissible, in the sense of Theorem 3 in Goeman et al. (2021). The local tests will usually be admissible when 𝓑 is any reasonable family, for example the family considered in Section 3.4. By Theorem 5 below, if the procedure of Theorem 4 is admissible, then the envelope B̄′(t) from Theorem 3 is also admissible.
Theorem 5. For every t ∈ T, the bound B̄′(t) from Theorem 3 is equal to the bound B̄(R(t)) from Theorem 4. Moreover, if the procedure from Theorem 4 that provides bounds for all I ∈ M is admissible, then the envelope B̄′ is also admissible. Here admissibility of B̄′ means that there exists no envelope B : T → ℕ such that B(t) ≤ B̄′(t) for all t ∈ T and such that P{∃t ∈ T : B(t) < B̄′(t)} > 0.
The admissibility property of our method contrasts it with BH. The latter method is not admissible, since it is uniformly improved by the method in Solari and Goeman (2017), of which it is not known whether it is admissible. In the rest of this paper we will only focus on bounds for rejected sets of the form R(t) = {1 ≤ i ≤ m : p_i ≤ t}, as constructed in Theorem 3.

A default mFDP envelope
The envelope B̄ depends on a general family 𝓑 of candidate confidence bounds. The choice of this family can have a large influence on the bounds obtained (cf. Hemerik et al., 2019). An important question is thus how to choose the set 𝓑 in a suitable way. Typically we want 𝓑 to contain at least one function B that is a tight upper envelope of the function t → V̄(t). Note that between t = 0 and, say, t = 0.5, the function V̄(t) tends to be roughly linear in t, at least under independence. Thus, it can make sense to also take the candidate envelopes B ∈ 𝓑 to be roughly linear. Also, giving them a small positive intercept will often be useful to avoid that B̄ is too sensitive to p-values near 1.
Further, it is usually suitable to take T = [s₁, s₂], where s₁ ≥ 0 is the smallest threshold of interest and s₂ < 1 is the largest threshold of interest. Based on these considerations, we propose to use the following default family 𝓑 of candidate functions:

𝓑 = { B_κ : κ > 0 }, (11)

with

B_κ(t) = ⌊(t + c)/κ⌋.

Here, c ≥ 0 is a pre-specified small constant. The discrete function B_κ is roughly linear in t and has slope 1/κ. The choice of c influences the slope and intercept of B_κ and hence of the resulting envelope B̄. Taking c to be 0 or very small tends to lead to tighter bounds B̄(t) for very small t, while taking c a bit larger tends to lead to tighter bounds for larger t. We found in simulations that taking c = 1/(2m) usually gave good overall power.
If we take 𝓑 as in expression (11), then the confidence envelope becomes

B̄ = B_{κ_max}, where κ_max = max{ κ > 0 : B_κ(t) ≥ V̄(t) for all t ∈ T }. (12)

For implementing this method, a useful equivalent formulation is the following, if T is an interval.
We then have

κ_max = inf_{t ∈ T} (t + c)/V̄(t),

where the expression is interpreted as ∞ if the denominator is 0. Note that we can sometimes straightforwardly improve the envelope B_{κ_max} by using the second part of Theorem 3. In Example 2, we continue the running example and compute simultaneous mFDP bounds. Figure 2 shows the confidence envelope and Figure 3 illustrates how the envelope was determined.

Example 2 (Running example, part 2: confidence envelopes). We continue on Example 1 by computing confidence envelopes, i.e., simultaneous 50%-confidence upper bounds for V(t), the number of false positives, which depends on the threshold t. We took T = [0, 0.2] and defined B̄ as in (12). We computed B̄ for both c = 0 and c = 2/m = 0.004. These choices for c were somewhat arbitrary. The number of rejections R(t), as well as the bounds B̄(t) for both values of c, are plotted in Figure 2. The construction of the confidence envelopes B̄ is illustrated in Figure 3.
The figure shows that, as expected, near t = 0 the number of rejections increases quickly with t. The reason is that there were many p-values near 0, as seen in Figure 1. By definition (12), the bounds B̄(t) are roughly linear in t and we see this in the figures. We also see that for this specific dataset, the bound B̄ depends strongly on c: for c = 0.004, the bound B̄(t) is higher than for c = 0 if t is close to 0, but much lower otherwise. For most values of t ∈ [0, 1] the envelope for c = 0.004 is better, i.e. lower, than the envelope for c = 0. On the other hand, the smallest cutoffs are often most relevant. Finally, we remark that the bounds in the figures can be somewhat improved using the last part of Theorem 3. This improvement was used to obtain Figure 4, where simultaneous 50% confidence bounds for FDP(t) are shown.
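Assuming candidate envelopes of the floor form B_κ(t) = ⌊(t + c)/κ⌋ (our reading of the default family above, so treat this as a sketch), the envelope is determined by κ_max = inf over t of (t + c)/V̄(t), which a grid version can compute as:

```python
def kappa_max(pvals, c, T_grid):
    """Largest kappa such that B_kappa(t) = floor((t + c)/kappa) dominates
    V_bar(t) on the grid: equivalently the minimum of (t + c)/V_bar(t),
    interpreted as infinity when V_bar(t) = 0."""
    best = float('inf')
    for t in T_grid:
        vbar = sum(1 for p in pvals if p >= 1 - t)
        if vbar > 0:
            best = min(best, (t + c) / vbar)
    return best
```

Since V̄ only changes at values 1 − p_i, a full implementation needs to scan only the sorted p-values rather than a dense grid, which is where the linear-after-sorting complexity comes from.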

Controlling the median of the FDP
Consider γ ∈ [0, 1]. As discussed in Section 3.1, we can use any confidence envelope B to guarantee that P(FDP ≤ γ) ≥ 0.5. In other words, we can control the mFDP. Note that by mFDP we mean the median of the distribution that the FDP has, conditional on the data and conditional on γ, which can be chosen after seeing the data. This is stated in the following theorem. (The maximum of an empty set is taken to be 0.)

Theorem 6. Let B : T → ℕ be a confidence envelope, for example B̄. Let the target FDP γ ∈ [0, 1] be freely chosen based on the data. Define

t_max = max{ t ∈ T : B(t)/R(t) ≤ γ }.

Reject all hypotheses with p-values at most t_max and denote the FDP by FDP_γ. Then with probability 0.5 the FDP is at most γ, i.e.,

P( FDP_γ ≤ γ ) ≥ 0.5. (14)

In fact we have

P( for all γ ∈ [0, 1]: FDP_γ ≤ γ ) ≥ 0.5, (15)

i.e., the procedure offers mFDP control simultaneously over all γ ∈ [0, 1].
In other words, if we reject all hypotheses with p-values that are at most t_max, then a median unbiased estimate of the FDP is γ. This follows directly from the fact that the estimates F̄DP(t), t ∈ T, are simultaneously valid 50%-confidence upper bounds, by inequality (8). Inequality (14) holds despite the fact that γ can depend on the data. In fact, with probability at least 50%, FDP_γ ≤ γ simultaneously over all γ ∈ [0, 1]. This contrasts our method with many other procedures, which require considering only one rejection criterion, which moreover needs to be chosen in advance (Benjamini and Hochberg, 1995; van der Laan et al., 2004; Lehmann and Romano, 2005; Romano and Wolf, 2007; Guo and Romano, 2007; Roquain, 2011; Neuvial, 2008; Guo et al., 2014; Delattre and Roquain, 2015; Ditzhaus and Janssen, 2019; Döhler and Roquain, 2020; Basu et al., 2021; Miecznikowski and Wang, 2022). In Example 3 we continue the running example and apply our mFDP control method.

Example 3 (Running example, part 3: controlling the mFDP). We continue on Example 2. Take γ = 0.05 and consider the confidence envelope B̄ discussed in Example 2. To find a rejection threshold t_max for which we can ensure mFDP ≤ γ, we use Theorem 6. It computes t_max as the largest t for which the estimate in Figure 4 is at most γ.
Recall that in Example 2, we computed bounds B̄(t) for both c = 0 and c = 0.004. For c = 0, we now find t_max = 0.002709, which is the 54-th smallest p-value. Thus, we can reject 54 hypotheses. More precisely, if we reject the hypotheses with the 54 smallest p-values, we know that the mFDP is below γ = 0.05. Note that t_max is about 27 times higher than the Bonferroni threshold 0.05/500 = 0.0001.
If c = 0.004 then t_max = 0.001660, so that we can only reject 53 hypotheses. The reason why t_max is lower if c = 0.004 is that for small values of t, the bound B̄(t) is higher for c = 0.004 than for c = 0. We saw this in Figure 2. Note that it is allowed to change γ after looking at the data. For instance, if we decrease γ to 0.01, we reject 44 hypotheses if c = 0 and we reject no hypotheses for c = 0.004.

Figure 3: For every rejection threshold t, V̄(t) is a 50% confidence upper bound for the number of false positives, V(t). The confidence envelope B̄(t) is constructed in such a way that it lies above the pointwise bound V̄(t) for all t ∈ T. Due to this construction, the bounds B̄(t) are simultaneous 50%-confidence bounds for V(t). The intercept and slope of B̄ are influenced by the choice of c.
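The threshold selection of Theorem 6 -- the largest t ∈ T with B(t)/R(t) ≤ γ -- can be sketched generically, with the envelope passed as a callable (names ours):

```python
def t_max_threshold(pvals, B, gamma, T_grid):
    """Largest candidate threshold t in T_grid with B(t)/R(t) <= gamma,
    the maximum over an empty set being taken as 0. B is any confidence
    envelope, supplied as a callable t -> bound on V(t)."""
    best = 0.0
    for t in sorted(T_grid):
        r = sum(1 for p in pvals if p <= t)
        if r > 0 and B(t) / r <= gamma:
            best = t
    return best
```

Because the underlying bounds are simultaneous over T, γ may be changed after seeing the data and the threshold simply recomputed.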

Adjusted p-values for mFDP control
Adjusted p-values can be a useful tool in multiple testing. They are defined as the smallest level, e.g. the smallest γ, at which the multiple testing procedure would reject the hypothesis. Adjusted p-values can be problematic in the context of e.g. FDR control and ours. The reason is that the adjusted p-value does not have an independent meaning and can easily be misinterpreted when taken out of context (Goeman and Solari, 2014, §5.4). Moreover, an mFDP-adjusted p-value could be 0, which also shows that the interpretation is very different from that of real p-values, which cannot be 0. Nevertheless, in our context, adjusted p-values are quite useful, because, once computed, they allow checking quickly which hypotheses are rejected for various γ.
Let B be a confidence envelope and 1 ≤ i ≤ m. As discussed in Section 3.5, B defines an mFDP controlling procedure. The mFDP adjusted p-value for H_i is the smallest γ ∈ [0, 1] for which H_i is rejected by the mFDP controlling procedure. Consequently, if we reject all hypotheses H_i with p_i^ad ≤ γ, then mFDP ≤ γ.

Figure 4: For two values of c, simultaneous 50% confidence upper bounds for FDP(t) are shown. Here F̄DP(t) := B̄(t)/R(t). Note that if c = 0.004, the bound is larger than zero at t = 0. The reason is that B̄(0) > 0 for this value of c. Roughly speaking, the bound F̄DP(t) then decreases for a while, before it starts to increase. Note that if c = 0, then the bound starts at zero and increases from there.

Proposition 2. Let 1 ≤ i ≤ m. Then the value

p_i^ad = min{ B(t)/R(t) : t ∈ T, t ≥ p_i }

is an mFDP-adjusted p-value for H_i, i.e., if we reject all hypotheses H_i with p_i^ad ≤ γ, then P(FDP_γ ≤ γ) ≥ 0.5. Here γ may be chosen based on the data. In fact, inequality (15) holds. We take the minimum of an empty set to be ∞.

Suppose T, the set of rejection thresholds of interest, is of the form [s₁, s₂]. Then we have the following useful reformulation of Proposition 2.

Proposition 3. Suppose T is of the form [s₁, s₂], with 0 ≤ s₁ < s₂ ≤ 1. For each 1 ≤ i ≤ m with p_i ≤ s₂, the adjusted p-value defined above is then the minimum of B(t)/R(t) over the finitely many thresholds t ∈ {p_i ∨ s₁} ∪ {p_j : p_i ∨ s₁ < p_j ≤ s₂}.

Note that given the data, the adjusted p-value is a non-decreasing function of the unadjusted p-value. As a consequence of this and Proposition 3, if T is of the form [s₁, s₂], we can use Algorithm 1 to efficiently compute the mFDP adjusted p-values. The algorithm takes the m sorted p-values, p_(1), ..., p_(m), as input and returns the corresponding sorted adjusted p-values.
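A direct, quadratic-time sketch of the adjusted p-values (not the paper's Algorithm 1), under the reading that H_i is rejected at level γ if and only if some t ∈ T with t ≥ p_i satisfies B(t)/R(t) ≤ γ:

```python
def mfdp_adjusted_pvalues(pvals, B, T_grid):
    """For each hypothesis, the smallest gamma at which it is rejected:
    the minimum of B(t)/R(t) over thresholds t >= p_i in T_grid, with the
    minimum of an empty set taken to be infinity. B is a callable envelope."""
    adj = []
    for p in pvals:
        cands = []
        for t in T_grid:
            if t >= p:
                r = sum(1 for q in pvals if q <= t)
                if r > 0:
                    cands.append(B(t) / r)
        adj.append(min(cands) if cands else float('inf'))
    return adj
```

As in the text, the adjusted p-values are non-decreasing in the raw p-values, which is what makes a sorted, linear-time implementation possible.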
The p-values were right-sided, so that they were negatively correlated between blocks (NE).
We computed B̄ as in Section 3.4. We took T = [0, 0.1], i.e. our bounds and mFDP-adjusted p-values were simultaneously valid with respect to all thresholds t in this interval. We took c = 1/(2m), as recommended in Section 3.4.
We first assessed whether our method provided appropriate simultaneous mFDP control. We show simulation results in Table 1. For each setting, the table shows the estimate of the probability P{for some t ∈ T, V(t) > B̄(t)}, which is identical to the probability that there is a 0 < γ < 1 for which FDP_γ exceeds γ. Each estimate was based on 10^4 repeated simulations.
The table confirms the simultaneous control of our method. We see that the estimated error rate is about 0.5 under independence if π₀ = 1. Indeed, the true error rate is then exactly 0.5. The reason is that then p = q and equality (5) then holds, so that the probability in Assumption 2 is exactly 0.5. We see that for π₀ = 0.95, the error rate is also about 0.5, rather than less. This is because our method is rather adaptive, as mentioned in the Introduction.
In the setting with negative dependence, π₀ = 1 and one-sided p-values, the error rate is also exactly 0.5, again because (5) then holds. Note that in the other cases, the method was also valid.
Next, we assessed the power of our method by comparing it to that of the mentioned methods from Goeman et al. (2019) ("CT+Simes") and Katsevich and Ramdas (2020) ("K&R"). The power was defined as the average fraction of the false hypotheses that was rejected. For three values of the target FDP γ we estimated the power of the three methods. The results are shown in Figure 5, where m = 10^3 and π₀ = 0.9. Note that overall K&R performed least well among the three methods, especially for γ = 0.01. This may partly be due to the +1 in their formula for the bound on the number of false positives. Further, for γ = 0.01, CT+Simes had better power than our novel method. However, as shown in the Supplementary Material, for m = 10^4 and π₀ = 0.9, our method was better than CT+Simes overall. Further, for m = 10^4 and π₀ = 0.5 our method was clearly better than both competitors, as shown in Figure 6. This can be understood by noting that K&R is not adaptive, i.e., it is conservative when π₀ is far from 1.
Further, our method was orders of magnitude faster than CT+Simes, especially for large m. For example, in the setting of the first panel of Figure 6, our method took 1.7·10^−2 seconds on average, while CT+Simes took 4.8 seconds on average. K&R was the fastest, with 8.6·10^−4 seconds on average. The reason is that the bounds for V(t) that that method provides depend only on m and t, not on the data.
Finally, for the same simulation settings, we computed the power of BH and two adaptive versions of BH. The results are in Table 2. The first column shows the power of standard BH. The other columns show the power of two versions of the right-boundary procedure of Liang and Nettleton (2012), which makes BH more powerful by using an estimate of π0. The first one (BH*) is their original procedure, based on Storey's estimator π̂0. The second one (BH**) is the same, but based on our novel estimator π′0. Since BH and adaptive BH require choosing α beforehand, we only show the power for α = 0.05, i.e., γ = 0.05.

Table 1: The error rate of our procedure, in various settings with m = 10^3. The last column indicates the simulation-based estimate of the probability that there is a 0 < γ < 1 for which FDP_γ exceeds γ. This probability should not be larger than 0.5. For the settings with π0 < 1, the signal for the false hypotheses was ∆ = 3.

Comparing Figure 5 and Table 2 shows that for γ = 0.05, the power of our method was roughly equal to that of BH, yet often slightly lower. However, our method provides simultaneous bounds, and γ can be chosen after seeing the results. As expected, the adaptive BH methods had a bit more power than BH. The adaptive methods performed similarly to each other. We found that they provided valid FDR control in all the settings, except in the settings "HO", where the FDR of BH* varied around 0.08 and the FDR of BH** varied around 0.07.
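As a sketch of what an adaptive BH step looks like, the snippet below plugs a Schweder-Spjøtvoll-Storey-type estimate of π0 into BH by running BH at level α/π̂0. This is a simplified stand-in: the right-boundary procedure of Liang and Nettleton (2012) chooses its tuning parameter differently, and the λ = 1/2 and "+1" correction below are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def bh(p, alpha):
    """Benjamini-Hochberg step-up: number of rejections at level alpha."""
    m = len(p)
    ps = np.sort(p)
    ok = np.nonzero(ps <= alpha * np.arange(1, m + 1) / m)[0]
    return 0 if ok.size == 0 else ok[-1] + 1

def adaptive_bh(p, alpha, lam=0.5):
    """Adaptive BH: plug a Storey-type pi0 estimate into the level."""
    m = len(p)
    pi0_hat = min(1.0, (np.sum(p > lam) + 1) / (m * (1 - lam)))
    return bh(p, alpha / pi0_hat)

rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=900),          # true nulls
                    rng.uniform(size=100) * 1e-4])  # strong signals
print(bh(p, 0.05), adaptive_bh(p, 0.05))
```

Since π̂0 ≤ 1 and the step-up count is monotone in its level, the adaptive version never rejects fewer hypotheses than plain BH, which is why the adaptive columns of Table 2 dominate the first column.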

Discussion
This paper provides an exploratory multiple testing approach, which is useful in particular because the user is allowed to freely use the data to choose rejection thresholds. This is what many researchers would like to do, but is not allowed by most popular methods. We have provided a result on the admissibility of our approach, and simulations show good power, especially in settings with many false hypotheses. Moreover, the power properties can be influenced by the user, who may select an appropriate family of candidate envelopes B. The choice of the range T of rejection thresholds also affects power, since the method focuses the power on the thresholds within this range.

Figure 5: The power of our novel method (solid lines), CT+Simes (dashed lines) and K&R (dotted lines) as depending on γ, for various settings with m = 10^3 and π0 = 0.9. Each estimate is based on 10^4 simulations.
Since our method essentially provides estimates of the FDP without confidence intervals, we encourage users to also compute a confidence interval, using e.g. the methods listed in the Introduction. However, as discussed, the methods among those that are valid under dependence have limited power. This means that the confidence interval for the FDP may contain 1, even when there are several strong signals. If permuting the data is valid, this can often be used to construct tighter confidence intervals (Hemerik et al., 2019; Blain et al., 2022; Andreella et al., 2023).
Our simulations illustrate that for a given γ, BH tends to have slightly more power than our method, but our method has the advantage that it provides post hoc inference. Indeed, we have shown that BH often becomes too liberal when α is chosen post hoc. On the other hand, we control the median of the FDP, which may not always be as appealing as control of the mean. To further illustrate the utility of our method, we provide an analysis of real RNA-Seq data in the Supplementary Material. There we further explain that the flexibility leads to added insights into the data. Both our method and BH have certain proven, finite-sample, theoretical guarantees, in particular under independence. Neither method is guaranteed to be valid under an unknown dependence structure. However, there is much evidence that BH is valid for many dependence structures. Likewise, we did not find a simulation setting where our method was invalid.
Besides FDP estimators, we have provided a novel π0 estimator. We have discussed simulations where this estimator was used within an adaptive BH approach. Future work may assess our estimator in such settings more extensively. Further avenues for potential future research become apparent in the Supplementary Material. There we discuss more general estimates of π0 and V(t), which can be combined with the approach of Section 3.3 for constructing simultaneous mFDP bounds.
Note that "uniform" or "simultaneous" control usually means that the probability of a union of events is kept below some value (Genovese and Wasserman, 2004; Meinshausen, 2006; Blanchard et al., 2020; Goeman et al., 2021). Since "FDR control" is not defined as controlling a probability, "simultaneous FDR control" is in that sense undefined. However, interestingly, Corollary 1 in Katsevich and Ramdas (2018) provides what might be called "simultaneous FDR control", assuming the p-values are independent. In particular, there α can be chosen post hoc, while still guaranteeing that the FDR is at most α.

A Overview of the supplementary material
Section B contains proofs of results in the main paper. Section C provides simulation results showing that Benjamini-Hochberg does not have a flexible, post hoc interpretation. Section D provides simulation results as in Section 4 of the main article, but with m = 10^4 and π0 = 0.9. An analysis of real RNA-Seq data is in Section E. Section F contains a theoretical comparison of the means of the estimators π′0 and π̂0 from the main paper. Section G contains theory on additional flexible, median unbiased estimates of π0 and the FDP based on closed testing.

B.1 Proof of Theorem 3
Proof. If r is a vector containing, say, l_r p-values, then we write R(r, t) = {1 ≤ i ≤ l_r : r_i < t}, to make the dependence on both the p-values and the threshold explicit. Analogously we define V(r, t), V̄(r, t) and B(r). Let E be the event {B(p) ≥ B(1 − q)} and suppose that E holds. Note that V(q, t) = |N ∩ R(t)| = R(q, t). Hence V(t) ≤ B(t) for every t ∈ T. Since P(E) ≥ 0.5, the first claim follows. Now we show that B′ is also a confidence envelope. Assume E holds. Then [R(l) − B(l)]_+ ≤ S(l) for every l ∈ T. Consequently V(t) ≤ B′(t) for every t ∈ T. This is true whenever E holds. Since P(E) ≥ 0.5, it follows that B′ is a confidence envelope. It improves B when [R(·) − B(·)]_+ is strictly decreasing somewhere on T.

B.2 Proof of Theorem 4
Proof. We first define a closed testing procedure (Goeman and Solari, 2011; Goeman et al., 2021). Define B′_I analogously to B′ from Theorem 3. For each I ∈ M, consider the following local test (Goeman and Solari, 2011) for the intersection hypothesis H_I = ∩_{i∈I} H_i, where 1(·) denotes the indicator function. The closed testing procedure (Goeman and Solari, 2011) corresponding to these local tests is defined by the tests φ_I. Note that φ_I equals φ^loc_{I ∪ {1 ≤ i ≤ m : p_i > 1/2}}, where we used that T ⊆ [0, 1/2). Since T ⊆ [0, 1/2), we have R_{I ∪ {1 ≤ i ≤ m : p_i > 1/2}}(t) = R_I(t) and B′_{I ∪ {1 ≤ i ≤ m : p_i > 1/2}}(t) = B′_I(t), so that (18) follows. By assumption, B(q) = B_N(p) is an envelope. Consequently, B′_N is an envelope, which follows from an argument analogous to the second part of the proof of Theorem 3. Hence, the local test φ^loc_N rejects with probability at most 0.5, and the result follows by Goeman and Solari (2011, p. 588).

C.2 Simulation results
The simulation results are shown in Table 3. Here, for different settings, the estimate of E(FDP/α) is shown. Each estimate was based on 5·10^4 repeated simulations. The results shown are based on simulations with R_des = 20. For other values of R_des, we obtained comparable results. Note that in most of the simulation settings, E(FDP/α) was estimated to be larger than 1. There is a clear pattern in the results: if there were few false hypotheses and small effect sizes, then E(FDP/α) was the largest. In particular, for the setting where all hypotheses were true, the estimate of E(FDP/α) was 3.23. This means that in this setting, conditional on the post hoc chosen α, E(FDP) often greatly exceeds α.
We see that when there were many strong effects, E(FDP/α) was smaller. This is as expected, since in those settings, most of the smallest FDR-adjusted p-values tend to correspond to false hypotheses. Likewise, note that if all 1000 hypotheses were false, then E(FDP/α) = 0, of course.
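To see mechanically why a post hoc α inflates FDP/α, consider the fully null case. The sketch below is a schematic reading of the design (we interpret R_des as a desired number of rejections; the paper's exact setup may differ): α is chosen after seeing the data as the smallest level at which BH rejects at least R_des = 20 hypotheses. Every rejection is then a false positive, so FDP = 1 and FDP/α = 1/α > 1.

```python
import numpy as np

def bh_adjusted(p):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(p)
    order = np.argsort(p)
    stepup = m * p[order] / np.arange(1, m + 1)
    adj = np.minimum(np.minimum.accumulate(stepup[::-1])[::-1], 1.0)
    out = np.empty(m)
    out[order] = adj
    return out

rng = np.random.default_rng(2)
m, r_des, reps = 1000, 20, 500
ratios = []
for _ in range(reps):
    p = rng.uniform(size=m)             # all hypotheses true
    adj = bh_adjusted(p)
    alpha = np.sort(adj)[r_des - 1]     # post hoc: smallest alpha giving >= r_des rejections
    # every rejection is a false positive here, so FDP = 1 whenever we reject
    ratios.append(1.0 / alpha)
print(np.mean(ratios))
```

Each ratio here equals 1/α > 1 by construction, which mirrors the qualitative finding above: conditioning on a data-chosen α makes E(FDP) exceed α.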

D Additional simulations
The present section provides simulation results as in Section 4 of the paper, but with m = 10^4, i.e., with 10 times more hypotheses. Other than that, the settings were the same and π0 was still 0.9, so that there were now 10^3 false hypotheses. The results are in Figure 7. Each estimate in the figure is based on 10^3 repeated simulations. Note that in this setting, our novel method performed somewhat better than CT+Simes overall.
The performance of K&R was comparable to that of the novel method. As illustrated in Figure 6 of the main article, the novel method usually beats K&R if π0 < 0.9. The reason is that K&R is not adaptive, i.e., it is quite conservative when π0 is far from 1.

E Data analysis
We analyzed part of the RNA-Seq count data discussed in Best et al. (2015). The data are from 283 blood platelet samples. We downloaded the data from the Gene Expression Omnibus, accession GSE68086. The samples are from patients with one of six types of cancer, as well as controls. We used the data from the 35 patients with pancreatic cancer and the 42 controls. The improved envelope B′(t) from the second part of Theorem 3 was almost identical to B(t). Figure 9 shows the corresponding simultaneous 50%-confidence upper bounds for the FDP.
Figure 8: The number of rejections and the simultaneous bound for the number of false positives, as functions of the rejection threshold t.
We used Algorithm 3.6 to compute mFDP-adjusted p-values. These are useful because the number of hypotheses that the method rejects can be computed as the number of adjusted p-values that are at most γ. We used the adjusted p-values to generate Table 4, where the number of rejections is shown for various values of the mFDP threshold γ. For comparison, we also show the number of rejections with BH for α = γ. Note, however, that BH only allows using one value of α, which moreover needs to be chosen before seeing the data.
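As a toy illustration of this use of adjusted p-values (the numbers below are hypothetical, not from the data analysis), the rejection count at any level γ is simply a count of adjusted p-values not exceeding γ:

```python
# hypothetical mFDP-adjusted p-values, e.g. as produced by Algorithm 3.6
adjusted = [0.004, 0.008, 0.03, 0.04, 0.06, 0.2, 0.5]

def n_rejections(adjusted, gamma):
    """Number of rejections at mFDP level gamma: adjusted p-values <= gamma."""
    return sum(a <= gamma for a in adjusted)

for g in (0.01, 0.05, 0.1):
    print(g, n_rejections(adjusted, g))   # 2, 4 and 5 rejections respectively
```

Because the underlying bounds are simultaneous, scanning several values of γ like this does not invalidate the inference.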
Table 4: The number of rejected hypotheses for different values of γ, for two methods. The first method is our procedure for simultaneous mFDP control. The inferences with this method are simultaneous, which means that with 50% confidence, FDP_γ ≤ γ for all γ ∈ T simultaneously. The second method is BH, which ensures that if α is chosen prior to the data analysis, then FDR = FDR(α) ≤ α, but not if α is chosen after inspecting the data. The interpretation of the table is as follows. If the user first chooses e.g. γ = 0.05, then she can reject 125 hypotheses. This means that with probability at least 50%, the true FDP is below 0.05 if we reject the 125 hypotheses with the smallest p-values. Based on this promising result, the user may wonder how many hypotheses are rejected when γ is decreased to 0.01. She finds that 24 hypotheses are then rejected. Since γ = 0.01, the FDP is below 0.01 with probability at least 50%. The FDP must be a multiple of 1/24, so it follows that with probability 50%, there are no false positives when these hypotheses are rejected. Thus, if, hypothetically, we repeat the experiment many times, then in at least 50% of the cases, FDP = FDP_γ will be below γ for all values of γ that the user considers.

F Theoretical comparison of π′0 and π̂0

The following result compares the expected values of our estimator π′0 and the Schweder-Spjøtvoll-Storey estimator π̂0. The difference between the expected values is often small, but usually strictly positive. Note that t is often taken larger than 0.5 in practice, and then our estimator is less conservative than Schweder-Spjøtvoll-Storey.
Proposition 4. Assume that all p-values have non-increasing densities. For t = 1/2, the difference (22) is 0. Now suppose t ∈ (0, 1/2). If the densities f_i of the p-values p_i are non-increasing, then the average of f_i(x) on [1 − t, 1] is smaller than or equal to the average of f_i(x) on [t, 1], since t < 1 − t and the f_i are non-increasing. Multiplying both sides by t(1 − t) gives E[(1 − t)|{i : p_i > 1 − t}| − t|{i : p_i > t}|] ≤ 0, so that the expected value of (22) is at most 0. In case t ∈ (1/2, 1), an analogous proof shows that the expected value of (22) is at least 0.
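To make the comparison concrete, here is a numerical sketch. Both formulas below are our reading of the two estimators (π̂0(t) counts p-values above t, π′0(t) counts p-values above the mirrored cut-off 1 − t; the defining displays did not survive extraction, so treat the exact forms as assumptions). As the abstract notes, the two estimators coincide when the tuning parameter is 0.5:

```python
import numpy as np

def pi0_sss(p, t):
    """Schweder-Spjotvoll-Storey estimate at tuning parameter t (assumed form)."""
    return np.mean(p > t) / (1 - t)

def pi0_mirror(p, t):
    """Mirrored estimate counting p-values >= 1 - t (assumed form of pi0')."""
    return np.mean(p >= 1 - t) / t

rng = np.random.default_rng(3)
p = rng.uniform(size=10_000)                 # all nulls, so both should be near 1
print(pi0_sss(p, 0.5), pi0_mirror(p, 0.5))   # identical for continuous p-values
```

With continuous p-values no observation falls exactly on 0.5, so at t = 0.5 the strict and non-strict counts agree and the two estimates coincide, as claimed.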
G Other methods for estimation of π0 and FDP

The main paper contains a useful framework for estimation of π0 and the FDP, inspired by the Schweder-Spjøtvoll-Storey estimator of π0. For every cut-off t considered, the paper derives a median unbiased estimate V̄(t) of the number of false positives V(t). In Theorem 3, these bounds were used to derive a confidence envelope, which provides simultaneous 50%-confidence bounds for FDP(t). This confidence envelope was in turn used to provide flexible mFDP control.
In this appendix, we employ existing results related to closed testing to generalize the definition of V̄(t). We will obtain a wide range of novel median unbiased estimates of π0 and V(t). Using the approach developed in the main paper, the novel estimates V̄(t) could also be used to provide novel confidence envelopes and FDP controlling procedures. These would be a generalization of the methods developed in the main paper.
Note that Theorem 4 in the main paper also relies on closed testing theory, but there we directly constructed simultaneous bounds. Below, however, we construct bounds for fixed t, which could subsequently be used to construct envelopes.
The procedures that we will derive vary in terms of properties such as accuracy and bias. These properties always depend on the distribution of the data, and there is no method that is uniformly best. We consider the method developed in the main paper particularly valuable, because it is sensible and relatively simple. For this reason, we focus on that method in the main paper.

G.1 The Schweder-Spjøtvoll-Storey method and closed testing
In Section 2, we derived median unbiased estimators of π0 and the FDP. Here we first derive the same result, but from the perspective of closed testing (Marcus et al., 1976; Goeman and Solari, 2011; Goeman et al., 2021). This perspective will reveal a broad class of novel estimators.
We start by explaining what closed testing is and how it can be used to obtain median unbiased estimators. The closed testing principle goes back to Marcus et al. (1976) and can be used to construct multiple testing procedures that control the family-wise error rate. Goeman and Solari (2011) show that such procedures can be extended to provide confidence bounds for the number of true hypotheses in all sets of hypotheses simultaneously. They construct (1 − α)100%-confidence upper bounds, (1 − α)-bounds for short, for the FDP, where α ∈ (0, 1). In this paper, we always consider γ = α = 0.5.
Let C be the collection of all nonempty subsets of {1, ..., m}. For every I ∈ C, consider the intersection hypothesis H_I = ∩_{i∈I} H_i. This is the hypothesis that all H_i with i ∈ I are true. For every I ∈ C, consider some local test δ(I), which is 1 if H_I is rejected and 0 otherwise. Assume the test δ(N) has level at most α, so that P(δ(N) = 1) is bounded by α. Define X = {I ∈ C : δ(J) = 1 for all J ∈ C with I ⊆ J}.
The general closed testing procedure rejects all intersection hypotheses H_I with I ∈ X. It is well known that this procedure controls the family-wise error rate (Marcus et al., 1976). In Goeman and Solari (2011) it is shown that we can also use the set X to provide a (1 − α)-confidence upper bound for the number of true hypotheses in any I ∈ C. They show that t_α(I) := max{|J| : J ⊆ I, J ∉ X} is a (1 − α)-confidence upper bound for |N ∩ I|. In fact, they show that the bounds t_α(I) are valid simultaneously over all I ∈ C; this is inequality (23). The proof is short: H_N is rejected with probability at most α, and if it is not rejected, then X contains no (sets of indices of) true hypotheses, which implies that |N ∩ I| ≤ t_α(I) for all I ∈ C. A different method, formulated in Genovese and Wasserman (2006), turns out to lead to the same bounds. This was first noted in the supplementary material of Hemerik et al. (2019) and in Goeman et al. (2021).
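The definition t_α(I) = max{|J| : J ⊆ I, J ∉ X} can be made concrete by brute force. The sketch below enumerates all subsets, so it is only feasible for toy m, and it uses a Bonferroni local test as an illustrative stand-in for the mirror-count test of this paper:

```python
from itertools import combinations

def closed_testing_bound(m, local_test, I):
    """t_alpha(I) = max{|J| : J subset of I, H_J not rejected by closed testing}.
    Brute force over all subsets; only feasible for small m."""
    universe = list(range(m))

    def closed_rejects(J):
        # closed testing rejects H_J iff the local test rejects every superset of J
        rest = [i for i in universe if i not in J]
        for k in range(len(rest) + 1):
            for extra in combinations(rest, k):
                if not local_test(set(J) | set(extra)):
                    return False
        return True

    for k in range(len(I), -1, -1):
        for J in combinations(I, k):
            if not closed_rejects(set(J)):
                return k          # size of the largest non-rejected subset of I
    return 0

# toy example: two strong signals, two nulls; Bonferroni local test (illustrative)
p = [0.001, 0.002, 0.8, 0.9]
alpha = 0.05
def bonf(J):
    return len(J) > 0 and min(p[i] for i in J) <= alpha / len(J)

print(closed_testing_bound(4, bonf, [0, 1, 2, 3]))   # prints 2
```

With two very small and two large p-values, the bound says that at most 2 of the 4 hypotheses are true, i.e., at least 2 discoveries are genuine.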
We now turn to a closed testing procedure inspired by the Schweder-Spjøtvoll-Storey estimator, which will lead to the same estimates as obtained in the previous sections. We only assume that Assumption 1 is satisfied and, for convenience, that N > 0. Let 1(·) be the indicator function and, for every I ∈ C, consider the local test δ(I) = 1(W−_I > W+_I), where W−_I = |{i ∈ I : p_i ≤ t}| and W+_I = |{i ∈ I : p_i ≥ 1 − t}|. Take α = 0.5. It follows from Assumption 1 that P(δ(N) = 1) ≤ α. Consequently, the bounds t_α(I), I ∈ C, are simultaneous 50%-confidence upper bounds, i.e., the inequality (23) is satisfied for α = 0.5. In particular, t_α({1, ..., m}) is a bound for the total number of true hypotheses, N. For every 1 ≤ a ≤ m, let Q_a be the set of indices of the a largest p-values, with ties broken arbitrarily. For t ∈ (0, 0.5], we have t_α({1, ..., m}) = max{|J| : J ⊆ {1, ..., m}, J ∉ X}. By a similar argument, we get the same result when t ∈ (0.5, 1). Dividing this estimate by m gives precisely our estimate π′0. Thus, based on the closed testing principle, we obtain the same bound as using the argument in Section 2.1. Now let t ∈ (0, 1) be a threshold and consider the rejected set R(t) = {1 ≤ i ≤ m : p_i ≤ t}. Then one can check that t_α(R(t)) is precisely the bound V̄(t) from Section 2.3. Thus, the closed testing principle gives the same estimate as obtained before. Below, we will consider alternative local tests δ, to obtain different methods.
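Concretely, for the count-based local test and t ∈ (0, 1/2], the resulting bound reduces (under our reading of the argument above; this closed form is reconstructed rather than quoted) to capping the number of rejections by the mirror count, which is what makes the method linear-time after sorting:

```python
import numpy as np

def mirror_bound(p, t):
    """Pointwise 50%-confidence upper bound for the number of false positives
    among the rejections {i : p_i <= t}: cap the rejection count by the mirror
    count #{i : p_i >= 1 - t}. Assumes t in (0, 1/2]; reconstructed form."""
    p = np.asarray(p, dtype=float)
    n_rej = int(np.sum(p <= t))
    n_mirror = int(np.sum(p >= 1.0 - t))
    return min(n_rej, n_mirror)

p = [0.001, 0.003, 0.02, 0.2, 0.6, 0.97, 0.99]
print(mirror_bound(p, 0.05))   # 3 rejections, mirror count 2 -> bound 2
```

No subset enumeration is needed here: for this particular local test, the closed testing shortcut collapses to two counts over the sorted p-values.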

G.2 Different median unbiased estimates
We will now consider a more general class of local tests δ, which lead to estimates different from the ones considered until now. Consider any non-decreasing, data-independent function ψ : [0, 1/2] → R. For every I ∈ C, define W−_I and W+_I as in (25), where I− = {i ∈ I : p_i ≤ t} and I+ = {i ∈ I : p_i ≥ 1 − t}. This is a generalization of the definition of W−_I and W+_I from Section G.1. Indeed, if we take ψ ≡ 1, then the definitions (24) and (25) coincide.
We make the following assumption, which is a generalization of Assumption 1 in the main paper.
Assumption 3. The following holds (if N = 0, assume nothing). In case ψ ≡ 1, the above assumption is the same as Assumption 1 in the main paper. We noted in Section 2.2 that that assumption is satisfied in particular if (q_1, ..., q_N) and (1 − q_1, ..., 1 − q_N) have the same distribution. Note that Assumption 3 is then satisfied as well, for general ψ.
For every I ∈ C, we now consider the general local test δ(I) = 1(W−_I > W+_I), where W−_I and W+_I depend on ψ as in the definition (25). This general local test defines a general closed testing method that depends on ψ. We again denote the collection of sets rejected by the closed procedure by X. Based on this general closed testing procedure, we obtain ψ-dependent bounds t_α(I). Like before, this yields a general, ψ-dependent, median unbiased estimator of N. Now suppose we use a rejection threshold t ∈ (0, 1/2], i.e., we reject all hypotheses with indices in R(t). For every 1 ≤ a ≤ R(t), define Q^t_a to be the set containing the indices of the a largest p-values that are strictly smaller than t (with ties broken arbitrarily).
Note that Q^t_a ∉ X if and only if its superset J := Q^t_a ∪ {i : p_i ≥ 1 − t} is not rejected by its local test δ(J), i.e., when W−_J ≤ W+_J, i.e., when W−_{Q^t_a} ≤ W+_{{i : p_i ≥ 1−t}}. Hence the quantity (28) equals the bound V̄_ψ(t). The bounds V̄_ψ(t) can be immediately used within the theorems in the main paper to obtain confidence envelopes and FDP controlling procedures. For good performance, it can be necessary to adapt the set B of candidate envelopes in an appropriate way, depending on the choice of ψ.
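The ψ-generalized bound admits the same kind of shortcut. The display defining W−_I and W+_I was not recoverable here, so the sketch below assumes the natural form W−_I = Σ_{i∈I−} ψ(p_i) and W+_I = Σ_{i∈I+} ψ(1 − p_i), which reduces to the counts when ψ ≡ 1; treat it as a sketch of the mechanics, not the paper's exact definition.

```python
import numpy as np

def psi_bound(p, t, psi):
    """max{a : W-_{Q_a^t} <= W+}, with Q_a^t the a largest rejected p-values.
    Assumed form: W- sums psi(p_i) over rejected p-values, W+ sums psi(1 - p_i)
    over the mirror region {p_i >= 1 - t}; psi must be non-negative here."""
    p = np.asarray(p, dtype=float)
    rejected = np.sort(p[p <= t])[::-1]       # largest rejected p-values first
    w_plus = float(np.sum(psi(1.0 - p[p >= 1.0 - t])))
    a, w_minus = 0, 0.0
    for q in rejected:                        # grow Q_a^t one p-value at a time
        if w_minus + float(psi(q)) > w_plus:
            break
        w_minus += float(psi(q))
        a += 1
    return a

p = [0.001, 0.003, 0.02, 0.2, 0.6, 0.97, 0.99]
count = lambda x: np.ones_like(x, dtype=float)   # psi == 1 recovers the count bound
print(psi_bound(p, 0.05, count), psi_bound(p, 0.05, lambda x: np.asarray(x)))
```

With ψ ≡ 1 this reproduces the plain mirror-count bound; with ψ(x) = x, small rejected p-values contribute little weight, so the bound can differ.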
We will now discuss two new examples of functions ψ, namely ψ(x) = x and ψ(x) = x^2. If ψ(x) = x, then for I ∈ C we have the following.

Figure 1 :
Figure 1: Illustration of the computation of the Schweder-Spjøtvoll-Storey estimate π̂0, based on 500 sorted simulated p-values. The dashed, straight line is constructed in such a way that it goes through both (500, 1) and the point where the dotted line intersects the curve of p-values, roughly speaking.

Figure 2 :
Figure 2: Graph showing the number of rejections and two confidence envelopes for the running example. The solid line shows the number of rejections, which depends on the rejection threshold t. The other lines are two confidence envelopes B. These are simultaneous 50%-confidence upper bounds for the number of false positives V(t). The intercept and slope of B depend on the user-specified constant c. Note that for c = 0, the intercept is slightly smaller than for c = 0.004. Indeed, the intercepts are 0 and 2, respectively.

Figure 3 :
Figure 3: Illustration of the construction of the confidence envelope for the running example. For every rejection threshold t, V̄(t) is a 50%-confidence upper bound for the number of false positives V(t). The confidence envelope B(t) is constructed in such a way that it lies above the pointwise bound V̄(t) for all t ∈ T. Due to this construction, the bounds B(t) are simultaneous 50%-confidence bounds for V(t). The intercept and slope of B are influenced by the choice of c.

Figure 9 :
Figure 9: The simultaneous 50%-confidence upper bound FDP(t) := B′(t)/R(t) as a function of the rejection threshold t. For several values of t, R(t) is shown at the top of the graph.
For every I ∈ C, the local test is δ(I) = 1(|{i ∈ I : p_i ≤ t}| > |{i ∈ I : p_i ≥ 1 − t}|) = 1(W−_I > W+_I), where W−_I = |{i ∈ I : p_i ≤ t}|, W+_I = |{i ∈ I : p_i ≥ 1 − t}|, and 1(·) is the indicator function.

Table 2 :
The power of BH and two adaptive versions of BH. The target FDR α, i.e., γ, was 0.05. Each estimate in the table is based on 10^4 simulations.

Table 3 :
The expected value of FDP/α, depending on the total number of false hypotheses and the effect size ∆.