On Selecting and Conditioning in Multiple Testing and Selective Inference

We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven (sub)collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We provide a comprehensive theoretical framework, along with insights, and delve into several case studies to illustrate instances where a shift to a non-selective or unconditional perspective can yield a power gain.


Introduction
When many potential research questions are considered simultaneously, researchers often only report a subset of the findings, typically the most striking, interesting, or surprising ones. When interpreting results selected in this way, it is crucial to recognize that the evidence for the findings may be exaggerated due to the selection process. The field of selective inference, also known as multiple testing, strives to adjust inference for this data-driven selection of research questions. Selective inference methods ensure that the number or proportion of incorrect findings among the final reported findings remains small. The selective inference literature is large and well-established (Benjamini, 2010; Dickhaus, 2014; Taylor and Tibshirani, 2015; Taylor, 2018; Benjamini et al., 2019; Cui et al., 2021; Kuchibhotla et al., 2021; Zhang et al., 2022). Classic approaches in the field control either the familywise error rate or the false discovery rate.
Recently, a two-step approach to selective inference has gained popularity (Fithian et al., 2014; Lee et al., 2016; Tibshirani et al., 2016; Charkhi and Claeskens, 2018; Bi et al., 2020). In this conditional approach, the data are first used to select a small set of hypotheses of interest from a large universe of hypotheses. Next, inference is conducted on the selected hypotheses using the same data, but conditional on the information used for the selection. The conditional approach can be seen as a sophisticated generalization of data splitting. In data splitting, part of the subjects are used to select hypotheses, and the rest for inference on them. Conditional

Multiple testing adjustment of selection-adjusted p-values
Having calculated selection-adjusted p-values, the usual next step is to decide which of the hypotheses in S can be rejected. A method must be chosen for this, be it simply to reject all hypotheses with P_{H|S} ≤ α for some α, or some more sophisticated multiple testing procedure. Whatever method is chosen, the end result is a random set R ⊆ S of rejected hypotheses.
There are different views on the properties the set R should have, but generally the focus is on avoiding false discoveries. Let T_P = {H ∈ U : P ∈ H} be the collection of all true hypotheses in U. Rejection of R induces |R ∩ T_P| false discoveries, giving a false discovery proportion of f_P(R) = |R ∩ T_P| / (|R| ∨ 1). To keep false discoveries in check we can control the expectation of some error rate e_P(R), for which there are many choices (Benjamini, 2010; Benjamini et al., 2019), e.g., e_P(R) = f_P(R) to control FDR; e_P(R) = 1{f_P(R) > 0} to control FWER; or e_P(R) = 1{f_P(R) > γ} to control FDX-γ. We assume that 0 ≤ e_P(R) ≤ 1, and that e_P(R) = 0 whenever R ∩ T_P = ∅.
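In code, these error rates can be sketched as follows. This is a minimal illustration of the definitions above; representing R and T_P as Python sets is our choice, not the paper's:

```python
def fdp(R, T):
    """False discovery proportion f_P(R): fraction of true nulls among rejections."""
    R, T = set(R), set(T)
    return len(R & T) / max(len(R), 1)

def e_fdr(R, T):
    """Error 'rate' whose expectation is the FDR."""
    return fdp(R, T)

def e_fwer(R, T):
    """Indicator of at least one false discovery; its expectation is the FWER."""
    return float(fdp(R, T) > 0)

def e_fdx(R, T, gamma):
    """Indicator that the FDP exceeds gamma; its expectation is FDX-gamma."""
    return float(fdp(R, T) > gamma)
```

Note that all three choices satisfy 0 ≤ e_P(R) ≤ 1 and vanish whenever R ∩ T_P = ∅, as assumed above.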
To control a chosen error rate, we bound its expectation by α. There are two flavors here. We can control the error rate conditional on S, requiring that, for every P ∈ M and every S ∈ S, E_P[e_P(R) | S = S] ≤ α, where E_P(·) = ∫_Ω · dP is the expectation corresponding to P. Alternatively, we can aim for unconditional control, requiring that, for every P ∈ M, E_P[e_P(R)] ≤ α.
Most authors in conditional selective inference advocate control of the conditional error rate (Fithian et al., 2014; Lee et al., 2016; Kuffner and Young, 2018), though it has been shown that conditioning can sometimes be problematic (Kivaranovic and Leeb, 2020, 2021). Other authors, however, have argued for the unconditional error rate, sometimes finding that it leads to more power (Wu et al., 2010; Andrews et al., 2019, 2022). Indeed, the conditional error rate is the more stringent one, since conditional control implies unconditional control.
In the toy example, multiple testing is an issue only if S = {1, 2}. If we choose to control FWER at level α, we may use the methods of Hochberg (1988) or Hommel (1988), which are equivalent in the case of two hypotheses. This method rejects each H_i if P_{i|S} ≤ α/2, and rejects both hypotheses if P_{1|S} and P_{2|S} are both at most α. The resulting procedure is displayed graphically on the left-hand side of Figure 1. Alternatively, we may choose to control FDR. With two hypotheses, the procedure of Benjamini and Hochberg (1995) is equivalent to the Hommel/Hochberg procedure just described, and controls FWER as well as FDR. For controlling FDR we can do uniformly better with the minimally adaptive Benjamini-Hochberg procedure (MABH, Solari and Goeman, 2017). In the case of two hypotheses, this procedure also uniformly improves the adaptive procedure of Benjamini et al. (2006). MABH rejects each H_i if P_{i|S} ≤ α/2; it rejects both hypotheses if either P_{1|S} and P_{2|S} are both at most α, or if the smallest is at most α/2 and the largest at most 2α. It is displayed graphically in the middle part of Figure 1.
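The two-hypothesis rejection rules just described can be written out directly; a minimal sketch, taking the (selection-adjusted) p-values as inputs:

```python
def hochberg_two(p1, p2, alpha):
    """Hochberg/Hommel procedure for two hypotheses: reject H_i if
    p_i <= alpha/2; reject both if both p-values are at most alpha."""
    if max(p1, p2) <= alpha:
        return {1, 2}
    return {i for i, p in ((1, p1), (2, p2)) if p <= alpha / 2}

def mabh_two(p1, p2, alpha):
    """Minimally adaptive Benjamini-Hochberg (MABH) for two hypotheses:
    as above, but both hypotheses are also rejected if the smaller p-value
    is at most alpha/2 and the larger at most 2*alpha."""
    lo, hi = min(p1, p2), max(p1, p2)
    if hi <= alpha or (lo <= alpha / 2 and hi <= 2 * alpha):
        return {1, 2}
    return {i for i, p in ((1, p1), (2, p2)) if p <= alpha / 2}
```

For α = 0.3 and p-values (0.1, 0.5), for example, the Hochberg/Hommel rule rejects only H_1 while MABH rejects both, illustrating the uniform improvement for FDR control.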
So far we have assumed that the error rate only depends on R, but not on S. This assumption excludes the rate that is implied by inference based on confidence intervals controlling the False Coverage Rate (FCR, Benjamini and Yekutieli, 2005). This is also the rate that is controlled if we do no further multiple testing adjustment on the selection-adjusted p-values, but simply reject R = {i : P_{i|S} ≤ α}. This procedure is given on the right-hand side of Figure 1. In the next few sections we will assume that the error rate is a function of R only, but we return to S-dependent error rates in Section 12.

Figure 1: A simple conditional selective inference procedure for two hypotheses inspired by Zhao et al. (2019) and Ellis et al. (2020). The left-hand procedure controls FWER, the middle one FDR, and the right-hand one the FCR-inspired error rate (2). The sets displayed in the upper right corner of each quadrant are the realisation of S in that quadrant. Grey indicates areas in which one hypothesis is rejected; black indicates areas in which both are rejected. The plot uses λ = 0.7 and α = 0.3.

A holistic perspective and main observation
The approaches described in the previous two sections can be seen as two-stage methods. First, from a universe U of hypotheses a selection S ⊆ U is made. Next, within that selection some hypotheses are rejected, while others are not, and we return R ⊆ S. The set R is the final result of any method; it is the set we make inferential claims about. Rather than analyzing the two steps U → S and S → R separately, in this paper we will take a holistic perspective, viewing the two steps together as a single method U → S → R, or briefly U → R. By viewing the two steps together we stress that the selection step U → S and the rejection step S → R are in the hands of the same analyst. The analyst chooses a method for the selection step U → S and a method for the inference step S → R. The analyst also chooses what part of the information in the data to spend on the selection step, and what part of the data to reserve for the inference step.
In the holistic perspective, the choice of S, in a procedure U → S → R, is therefore part of the method, and this part may be optimized. The holistic perspective implies that such optimization should be focused on obtaining a larger or more useful set R, since R, not S, represents the final inference of the method. In general, we would like to have as many rejections as possible, while keeping the chosen error rate under control. Moreover, from the holistic perspective all rejections of hypotheses in U are welcome, since every hypothesis in U could have been in S.
In the toy example, we can visualize the holistic view of the three procedures simply by removing all reference to S in Figure 1, as shown in Figure 2. This now displays three single-step procedures, defined directly on the universe U = {1, 2}, and based on the unadjusted p-values P_1 and P_2. The rejected sets R for the procedures in Figure 2 are trivially identical to those of their counterparts in Figure 1. However, in the holistic perspective of Figure 2, the λ that previously determined S now becomes a tuning parameter, free to be chosen by the analyst before seeing the data. The holistic perspective de-emphasizes the importance of S.

Viewed from the holistic perspective, we see that S plays two distinct roles in conditional selective inference. In the first place, S focuses the attention of the multiple testing procedure on hypotheses in S, restricting R to be a subset of S. This is the selective property of the procedure. Secondly, by conditioning on S, the procedure ignores the information used to find S for the final inference. This is the conditional property of the procedure. We see both roles of S in the procedures of the toy example in Figure 1. The procedure never rejects hypotheses outside S, so it is selective. We can see that the procedure is conditional, because the procedure in each S-defined quadrant is a valid multiple testing procedure by itself: if we would stretch any quadrant to cover the entire unit square, we would obtain a method with valid FWER, FDR or FCR control, respectively.
The holistic perspective allows us to decouple the selective and conditional properties of conditional selective inference. We call a procedure U → R selective on S′ if, surely for all P ∈ M, R ⊆ S′. We call U → R conditional on S′′ if it controls its error rate conditionally on S′′, i.e., if, surely, E_P[e_P(R, S′′) | S′′] ≤ α. By design, a conditional selective procedure U → S → R is selective on S and conditional on S. However, the same procedure may be selective or conditional on sets it was not constructed around. Procedures are always selective on sets that are surely larger than S, and every procedure is, trivially, selective on R. Every procedure that is conditional on S is also conditional on U \ S, since S and U \ S carry the same information. In Figure 2 we may verify that all three procedures are conditional and selective on, for example, S′ = {i : P_i ≤ (1 + λ)/2}.
An important special case is S′ = U. Every procedure is selective on U, since R ⊆ U by definition. Moreover, every procedure is conditional on U, since the conditional error rate given U is the unconditional error rate, and control of any conditional error rate implies control of the unconditional error rate. This brings us to our first main observation: for every conditional selective multiple testing procedure on S there exists a conditional selective procedure on U, i.e. an unconditional, non-selective procedure, that always rejects at least as many hypotheses.
Observation 1. Let U → S → R be a conditional selective inference procedure with the property that R ⊆ S surely, and that E_P[e_P(R) | S] ≤ α, surely, for all P ∈ M. Then there exists a procedure U → R′ such that R′ ⊇ R surely, and E_P[e_P(R′)] = E_P[e_P(R′) | U] ≤ α for all P ∈ M.
To prove Observation 1, simply take R′ = R and observe that E_P[e_P(R)] = E_P[E_P{e_P(R) | S}]. We call Observation 1 an observation rather than a theorem or proposition, because as a mathematical result it is completely trivial: if we do not restrict to R ⊆ S but allow the method also to reject hypotheses in U \ S, it may achieve more rejections that way; if we do not condition on S, we retain more information for finding a possibly larger R. Observation 1 is merely an immediate consequence of the holistic perspective we have adopted.
However, Observation 1 answers the important question of how much of the information in the data to allocate to the selection step U → S and how much to the rejection step S → R. According to Observation 1, the optimal choice is always simply to take S = U. Without losing power, we can allocate zero information to the selection step, and retain all of our information for the rejection step. This is an important insight.
First example: the toy example

Observation 1 says that a holistic method U → R′ always exists that is at least as powerful, in the sense that R′ ⊇ R, as a conditional selective procedure U → S → R. However, it does not show that it is always possible to achieve a true improvement, nor does it show how to find such an improvement if it exists. Still, there are many cases in which substantial improvement over a conditional selective procedure is possible.
In this section we will illustrate this with the toy example of Figure 1, focusing on its FDR-controlling variant. The toy example will help to build an intuition for the general case. As a preview, Figure 3 displays the FDR-controlling conditional selective procedure (top-left), with two uniform improvements (top-right and bottom-left). The bottom-left procedure is not selective on S, sometimes rejecting hypotheses outside S, but still controls FDR conditional on S. The top-right procedure is still selective on S, guaranteeing R ⊆ S, but only has unconditional FDR control. The standard MABH procedure is given at bottom right for comparison.
Figure 3: The conditional selective procedure of the toy example, controlling FDR, with its conditional and selective improvements. The MABH procedure is given as a reference. Grey indicates areas in which one hypothesis is rejected; black indicates areas in which both are rejected. Here, λ′ = 1 − λ, and α′ = α/(2λ − λ²).
How did we arrive at these improvements? For the conditional improvement (bottom left), we keep aiming for control of FDR conditional on S, but we allow the procedure to reject hypotheses in U \ S. To do this, we also calculate selection-adjusted p-values P_{i|S} for i ∉ S. We obtain P_{i|S} = (P_i − λ)/(1 − λ) for i ∉ S. While the selection-adjusted p-values are larger than the original ones for i ∈ S, the reverse is true when i ∉ S. Next, we extend the procedure by continuing to test hypotheses in U \ S after all hypotheses in S are rejected. If S = {1, 2} the procedure is not changed. If S = {1} and H_1 was rejected, we may continue to test H_2, rejecting when P_{2|S={1}} ≤ 2α, and analogously for S = {2}. This fixed-sequence procedure (conditional on S = {1}) is easily seen to be valid for FDR control, and is related to fixed-sequence FDR-controlling procedures by Farcomeni and Finos (2013) and Lynch et al. (2017). If S = ∅, rather than rejecting nothing, we may use a MABH procedure on P_{1|S=∅} and P_{2|S=∅}.
The resulting procedure, quite a strange one, is given at bottom-left in Figure 3. It consists of four miniature multiple testing procedures, applied to conditional p-values, and valid conditional on S for the four realizations of S. For S = {1, 2} and S = ∅ we have a conditional MABH; for S = {1} and S = {2} we have a fixed-sequence FDR-controlling procedure, prioritizing the hypothesis in S. The resulting procedure clearly uniformly improves the procedure of Figure 1. It does this by also considering hypotheses outside S for rejection. However, the improved procedure retains the property that it controls FDR conditional on S, since each of the miniature procedures is valid for FDR control.
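The composite bottom-left procedure can be sketched as follows. The adjusted p-values P_i/λ inside S and (P_i − λ)/(1 − λ) outside S, and the level-α test of a single selected hypothesis, are our reading of the toy example, so treat this as an illustration rather than a definitive implementation:

```python
def conditional_improvement(p1, p2, lam, alpha):
    """Bottom-left procedure of Figure 3 (sketch): FDR control conditional
    on S = {i : P_i <= lam}, but hypotheses outside S may also be rejected."""
    p = {1: p1, 2: p2}
    S = {i for i in (1, 2) if p[i] <= lam}
    # selection-adjusted p-values: P_i/lam inside S, (P_i - lam)/(1 - lam) outside
    q = {i: p[i] / lam if i in S else (p[i] - lam) / (1 - lam) for i in (1, 2)}
    if len(S) != 1:
        # S = {1, 2} or S = empty: MABH on the two adjusted p-values
        lo, hi = min(q.values()), max(q.values())
        if hi <= alpha or (lo <= alpha / 2 and hi <= 2 * alpha):
            return {1, 2}
        return {i for i in (1, 2) if q[i] <= alpha / 2}
    # |S| = 1: fixed-sequence testing, prioritizing the selected hypothesis
    (i,) = S
    j = 3 - i
    if q[i] > alpha:
        return set()
    return {i, j} if q[j] <= 2 * alpha else {i}
```

For λ = 0.7 and α = 0.3, for example, the p-values (0.1, 0.8) give S = {1}; H_1 is rejected, and the fixed-sequence step then also rejects H_2 outside S.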
A different type of improvement may be achieved if we are willing to give up on conditional FDR control. This is shown in the top-right of Figure 3. The improvement comes in two parts. First, we remark that the original procedure does not exhaust the α-level under the global null hypothesis: if H_1 ∩ H_2 is true, FDR is controlled at level (2λ − λ²)α. We can therefore gain power by starting the procedure at level α′ = α/(2λ − λ²) instead of at α. Secondly, after the original procedure has rejected H_1, it rejects H_2 if P_{2|S={1,2}} ≤ 2α, i.e., when P_2 ≤ 2λα. If we are not doing conditional control, however, there is no need to use the conditional p-value, and we may alternatively reject H_2, after we have rejected H_1, simply if P_2 ≤ 2α. The procedure resulting from these two improvements is given at the top-right of Figure 3. The procedure's FDR control is not conditional on S anymore, but it remains selective on S, assuming λ ≥ 2α. The validity of this new procedure may not be immediately obvious; we prove it in the following lemma.
Lemma 1. Provided λ ≥ 2α, the procedure at the top-right of Figure 3 controls FDR at level α.

Proof. We prove FDR control separately for the cases that 2, 1, or 0 null hypotheses are true. We remark that the procedure is visualized at the top-right of Figure 3.

Suppose H_1 and H_2 are true. Then the false discovery proportion is 1 whenever at least one rejection occurs. Since P_1 and P_2 are independent and standard uniform, this probability can be checked to be at most α. Suppose H_1 is true, but H_2 is not. Then the false discovery proportion is 1/2 in the black area of Figure 3, 0 in the upper left grey area, and 1 in the lower right grey area. Integrating the false discovery proportion over these areas, the FDR can again be checked to be at most α.

If H_1 and H_2 are both false, then the FDP is always 0 and there is nothing to prove.
We have constructed two improvements of the conditional selective procedure we started with. One of the procedures retains the property of the original procedure that it controls FDR conditional on S; the second retains the property that it only rejects hypotheses in S. The holistic perspective, however, does not care about S or about properties relating to S. It sees these two new methods simply as uniform improvements of the original that never reject fewer hypotheses and sometimes reject more. One of these, the bottom-left one, is arguably somewhat weird and difficult to motivate from a holistic perspective (compare the tests of Berger (1989) improving the likelihood ratio test, and the discussion by Perlman and Wu (1999)); the top-right one seems more reasonable.
As a fourth procedure, bottom-right in Figure 3, we have given the regular MABH procedure, which does not attempt to be conditional or selective on S. This might be the procedure we would have chosen if we had adopted a holistic perspective from the beginning. In this particular case, MABH actually happens to be selective on S (as long as λ ≥ 2α). Comparing the conditional procedure (bottom-left) to MABH, we see a massive shift of power away from S = {1, 2} towards S = {1}, S = {2}, and S = ∅. Comparing the selective procedure (top-right) to MABH, we see that, while both procedures are selective, the original MABH still focuses relatively more power on S = {1, 2}; the top-right procedure still has a relatively large focus on small sets S. This focus actually chimes with the motivation of the procedure we started from: Zhao et al. (2019) and Ellis et al. (2020) advocated their method for an application context in which null p-values tend to be near 1, so that S = {1} and S = {2} are relatively likely.
The comparison with MABH also serves to illustrate that uniformly improving a method U → S → R by U → U → R′, with the requirement that R′ ⊇ R, is not usually a question of simply adjusting the tuning parameter λ in such a way that S becomes U. The MABH procedure (bottom-right), resulting from the choice λ = 1 in the conditional selective method (top-left), will be a more powerful method in many situations, but is not a uniform improvement of the original method unless λ ≤ 1/2. Generally, finding a true uniform improvement, in the sense that R′ ⊇ R surely for all P ∈ M, involves much more work than merely adjusting a tuning parameter.
Comparing the conditional selective procedure and its two improvements, we see that the conditional selective procedure is exactly the intersection of its conditional and its selective improvement: it rejects either of H_1, H_2 if and only if both the selective and the conditional improvements do. Compared to the conditional selective procedure, the selective improvement may have additional rejections if S = {1, 2}, while the conditional improvement cannot. On the other hand, the conditional improvement may have more rejections if S = ∅, while the selective procedure remains powerless there. If S = {1} or S = {2}, both procedures may have additional rejections compared to the conditional selective procedure. However, the selective procedure has more chance of rejecting the hypothesis in S, while the conditional procedure may additionally reject a hypothesis outside S. The two improvements are, in this sense, disjoint.
The two improvements in Figure 3 are easy to generalize to the case of more than two null hypotheses. They illustrate an important general principle about selection and conditioning in multiple testing. This principle says that selection and conditioning each pull a procedure in opposite directions. Conditioning forces a procedure to distribute its power evenly over the outcome space, since the procedure must have proper error control on all realizations of S, conditional on S. Selection, on the other hand, focuses the power of procedures away from hypotheses in U \ S, since it restricts rejections to S. A procedure that is both selective and conditional must therefore necessarily focus power both away from S and away from U \ S. Since there is nowhere for the power to go, it vanishes. The conditional selective procedure at top left, being the intersection of a conditional and a selective procedure, is therefore sub-optimal as either. It is definitely sub-optimal from the holistic perspective.
We call a conditional selective inference procedure U → S → R inadmissible if a procedure U → R′ exists that uniformly improves upon U → S → R in the sense that R ⊆ R′, surely for all P ∈ M, and P(R ⊂ R′) > 0 for at least one P ∈ M, while still controlling the error rate, i.e., E_P[e_P(R′)] ≤ α. We will be a bit more precise and call U → S → R inadmissible as a selective method on S if the uniform improvement still satisfies R′ ⊆ S, surely. Similarly, we call U → S → R inadmissible as a conditional method on S if the uniform improvement still controls its error rate conditional on S. Remember, however, that from the holistic perspective we do not care too much about S or about these sub-classes of inadmissibility.
Our definition of a uniform improvement is very strict (as in Goeman et al., 2021), requiring that R ⊆ R′ for every outcome ω ∈ Ω. A uniform improvement, therefore, can never fail to reject a hypothesis that the method it improves does reject. This requirement makes admissibility a very low bar to achieve. For example, a FWER-controlling method that rejects all hypotheses in U with probability α, independently of the data, and rejects nothing with probability 1 − α, is admissible according to our definition. Since admissibility is so easy to achieve, inadmissibility is particularly bad news.
We will give several sufficient conditions for inadmissibility of conditional selective methods. Propositions 1, 2 and 3 apply to any error rate. Proposition 4 is only for FWER control.
It follows that R′ controls the unconditional error rate.
Noting that q < 1, we have P(R ⊂ R′) = (1 − q)P(R ⊂ S) > 0 for at least one P ∈ M unless R = S surely for all P ∈ M, so R′ uniformly improves over R.
Finally, we have trivially that R ⊆ S surely for all P ∈ M. It follows that U → S → R is inadmissible as a selective procedure on S.
In words, Proposition 1 says that any conditional selective procedure is inadmissible if, with positive probability, the selection step results in a set S without true hypotheses (for examples, see Ellis et al., 2020; Al Mohamad et al., 2020; Heller and Solari, 2023). In this case, it is impossible to make false discoveries, and the α for such S can be better spent elsewhere. The condition of the proposition implies that S has FWER control at level δ, but allows δ > α. The proposition does not apply when R = S surely, but we come back to that case in Observation 4 in Section 12.
Proposition 2. If P(S = ∅) > 0 for some P ∈ M, then U → S → R is inadmissible as a conditional procedure on S. It is inadmissible as a selective procedure on any S′ for which S′ ⊇ S surely for all P ∈ M, and S′ ≠ ∅ surely for all P ∈ M.
Proof. Choose any S′ that fulfils the assumptions, noting that S′ = U always fits. Let R′ = R if S ≠ ∅, and if S = ∅, let R′ = S′ with ancillary probability α and R′ = ∅ otherwise. Then E_P[e_P(R′) | S] ≤ α for S ≠ ∅ by assumption, and E_P[e_P(R′) | S = ∅] ≤ α by construction, since e_P(∅) = 0 and 0 ≤ e_P(R′) ≤ 1. By assumption there exists P ∈ M with P(S = ∅) > 0, so that P(R ⊂ R′) > 0. This proves inadmissibility as a conditional method on S. Noting that conditional control on S implies unconditional control, and that R′ ⊆ S′ surely, we have inadmissibility as a selective method on S′.

Proposition 2 says that a conditional selective procedure may be improved if it sometimes selects S = ∅. There is a subtle but important difference with Proposition 1: if P(S = ∅) > 0 for all P ∈ M, then we would fulfil the conditions of Proposition 1, but Proposition 2 only requires that this happens for at least one P ∈ M. Intuitively, if S = ∅ sometimes, we can make no errors in that case, and we can spend the α allocated to that case elsewhere.
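The construction in the proof is mechanical enough to sketch in code. In this sketch the base rejections R, the selection S, and the target set S′ are placeholders supplied by the caller:

```python
import random

def improve_on_empty_selection(R, S, S_prime, alpha, rng):
    """Proposition 2's improvement (sketch): if S is empty, no false
    discovery is possible for the original procedure, so the otherwise
    wasted level can be spent by rejecting S_prime with ancillary
    probability alpha. Otherwise the original rejections are kept."""
    if S:
        return set(R)
    return set(S_prime) if rng.random() < alpha else set()
```

Over repeated draws with S = ∅, the improvement rejects S′ about a fraction α of the time, and it never removes a rejection that the original procedure made.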
Proposition 3. If U → S → R controls its error rate conditional on S, if sup_{P∈M} E_P[e_P(R)] < sup_{P∈M} sup_S E_P[e_P(R) | S], (4) and P(R = S) < 1 for at least one P ∈ M, then U → S → R is inadmissible as a selective method.
Proof. Note that sup_{P∈M} E_P[e_P(R) | S] ≤ α for all S, since U → S → R controls its error rate conditionally. Therefore, the right-hand side of (4) is at most α. Let α′ = sup_{P∈M} E_P[e_P(R)], so α′ < α. Let R′ = R with probability 1 − q, and R′ = S with probability q, where q = (α − α′)/(1 − α′). Then we have E_P[e_P(R′)] ≤ (1 − q)E_P[e_P(R)] + q ≤ (1 − q)α′ + q = α. By the condition of the proposition, there is a P ∈ M for which P(R ⊂ S) > 0, so that P(R ⊂ R′) > 0.

To understand Proposition 3, note that the left-hand side of (4) is equal to sup_{P∈M} E_P[E_P{e_P(R) | S}], so that (4) holds with ≤ by definition. Unconditional control bounds the left-hand side of (4) by α, while conditional control implies that the right-hand side of (4) is bounded by α. Any gap between the two can be exploited by an unconditional test to gain power. Such a gap may arise if the 'worst case' P, for which the conditional α-level is exhausted, depends on S. We give an example in Appendix A.
Proposition 4. If U → S → R controls FWER conditional on S, and there exists P ∈ M such that P(R = S | S) > 0 for some S ⊂ U , then U → S → R is inadmissible as a conditional procedure on S.
Proof. With probability 1 − α, let R′ = R; with probability α, let R′ = U if R = S, and R′ = R otherwise.
We will prove that R′ controls FWER conditional on S for every P ∈ M. We have either T_P ∩ S = ∅ or T_P ∩ S ≠ ∅. In the former case, P(R′ ∩ T_P ≠ ∅ | S) ≤ P(R′ = U | S) ≤ α, since it is not possible to make a Type I error with R ⊆ S. In the latter case, R = S implies R ∩ T_P ≠ ∅, so that P(R′ ∩ T_P ≠ ∅ | S) = P(R ∩ T_P ≠ ∅ | S) ≤ α. It follows that R′ controls FWER conditional on S for every P ∈ M. According to the assumption, there exists P ∈ M such that P(R = S | S) > 0 for some S ⊂ U, so that P(R ⊂ R′) > 0. It follows that R is inadmissible as a conditional procedure on S.
Proposition 4 exploits the Sequential Rejection Principle (Goeman and Solari, 2010), which says that if we reject all hypotheses under consideration, we may recycle the α and continue testing with a new batch. For a conditional selective procedure, this means that if we have exhausted all hypotheses in S, we may continue testing hypotheses in U \ S.
In the toy example, we see that the conditions of Propositions 1, 2 and 4 are all fulfilled, provided that λ < 1. The probability that we select only false null hypotheses is (1 − λ)², 1 − λ, or 1, respectively, in the situation that 2, 1 or 0 hypotheses are true, so the condition of Proposition 1 is fulfilled with δ = (1 − λ)². Under P ∈ H_1 ∩ H_2 we have P(S = ∅) = (1 − λ)² > 0, so the condition of Proposition 2 is also fulfilled. Finally, if FWER was controlled, take S = ∅; then all hypotheses in S are rejected with positive probability for every P ∈ M, conditional on S = ∅. It may seem from this checking of the conditions that the crucial characteristic that makes the procedure in the toy example inadmissible is the fact that it selects S = ∅ with positive probability. However, this is not the only driving factor. For example, perhaps the most important improvement of the top-right over the top-left procedure in Figure 3 is the increase of the critical value from 2λα to 2α for rejecting the second hypothesis after rejecting the first. This change is not tied to the selection of S = ∅ in any way. The propositions of this section give sufficient conditions for inadmissibility, but they are by no means necessary. We will see examples of improvements of procedures that never select S = ∅ in Sections 7 and 8.
The propositions in this section should be seen as examples of classes of procedures that might be improved by letting go of selection and conditioning. The emphasis was on uniform improvements. Often, procedures may be constructed that do not necessarily uniformly improve upon the original, but are substantially more powerful for relevant alternatives. An example is the standard MABH in the toy example, which, although not a uniform improvement over the original, has much larger rejection regions for both H_1 and H_2.
Second example: conditioning on the winner

The toy example that we considered thus far may have seemed to hinge much on the property that it selected S = ∅ with positive probability. Here, we look at a situation in which P(S = ∅) = 0 for all P ∈ M.
The hypotheses that attract most attention in publications are generally those with the smallest p-values. It is of interest, therefore, to consider selection rules based on ranks. Selective inference for such selections, "inference on winners", has been considered by Zhong and Prentice (2008) and Zrnic and Fithian (2022), among others. We consider the simplest set-up here, where we select only a single "winner". In this set-up, we consider the question whether the winner is truly non-null.
Let P_1, …, P_n be independent p-values, standard uniform under their respective null hypotheses H_1, …, H_n, so that U = {1, …, n}. We consider the selection rule that selects the single hypothesis for which the p-value is smallest, with ties broken arbitrarily, so that |S| = 1 always.
If we want to condition on the selection event S = {i}, we cannot simply reject for small values of P_i, adjusting the critical value for the selection event as we did in the toy example of Figure 1. To see why this would be problematic, consider a set-up with n = 2 in which H_1 is null, but H_2 is not. Then
P(P_1 ≤ t | S = {1}) = P(P_1 ≤ t, P_1 < P_2) / P(P_1 < P_2). (5)
Since P_2 is under the alternative, its distribution is arbitrary, so it could be uniform on [0, t]. In that case, (5) evaluates to 1. Therefore, for every t > 0, there exists a P ∈ M such that P(P_1 ≤ t | S = {1}) = 1. Therefore, it is impossible to bound (5), in supremum over P ∈ M, by α. Consequently, it is impossible to construct a conditional selective procedure that rejects for small values of P_i.
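A small simulation illustrates the problem: if P_2 is uniform on [0, t], then conditional on the selection event S = {1} we always have P_1 ≤ t. The values of t, the seed, and the sample size below are arbitrary choices:

```python
import random

rng = random.Random(1)
t = 0.1
winners = []                      # values of P_1 on the event S = {1}
for _ in range(100000):
    p1 = rng.random()             # H_1 true: P_1 standard uniform
    p2 = t * rng.random()         # H_2 false: P_2 uniform on [0, t]
    if p1 < p2:                   # selection event S = {1}
        winners.append(p1)

# conditional on S = {1} we have P_1 < P_2 <= t, so this fraction is exactly 1
frac_below_t = sum(p <= t for p in winners) / len(winners)
```

This is the degenerate conditional distribution that makes (5) impossible to bound below 1 uniformly over P ∈ M.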
A way out of this conundrum was offered by Reid et al. (2017), who proposed to use the alternative test statistic P_{i|S={i}} = P_i / min_{j≠i} P_j. Conditional on S = {i}, we have that P_i / min_{j≠i} P_j is standard uniform for all P ∈ H_i, as Lemma 2 states. Based on this lemma we can construct a conditional selective inference procedure. It rejects H_i, i ∈ S, when P_i / min_{j≠i} P_j ≤ α. We call this Procedure A.
Lemma 2. For every P ∈ H_i, conditional on S = {i}, P_i / min_{j≠i} P_j is standard uniform.

Proof. Choose any P ∈ H_i. For every 0 ≤ t ≤ 1 we have P(P_i / min_{j≠i} P_j ≤ t | min_{j≠i} P_j, S = {i}) = P(P_i ≤ t min_{j≠i} P_j | P_i < min_{j≠i} P_j, min_{j≠i} P_j) = t, where we use that min_{j≠i} P_j and P_i are independent. Taking expectations conditional on S = {i} on both sides, the result follows.
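Lemma 2 is easy to check by simulation under the global null; a sketch with n = 3 (seed and sample size arbitrary):

```python
import random

rng = random.Random(2)
ratios = []
for _ in range(200000):
    p = [rng.random() for _ in range(3)]    # all hypotheses null: standard uniform
    if p[0] == min(p):                      # condition on S = {1} (0-based: index 0)
        ratios.append(p[0] / min(p[1:]))    # the Reid et al.-style statistic

# conditional on the selection, the ratio should be standard uniform
mean_ratio = sum(ratios) / len(ratios)
frac_below_quarter = sum(r <= 0.25 for r in ratios) / len(ratios)
```

Both summaries should be close to their standard-uniform values of 0.5 and 0.25, respectively.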
What error rate does this conditional procedure on S control? On a family S of only one hypothesis, unadjusted testing, FCR, FWER and FDR control are all identical; Procedure A, therefore, controls all these error rates simultaneously. To construct potential improvements of the method, we must, therefore, decide which error rate to retain control of. We choose FDR for this example.
As in Section 5, we will construct three alternative procedures. The first, Procedure B, retains validity conditional on S, but possibly rejects hypotheses outside S. The second, Procedure C, will have unconditional FWER control, but still only rejects hypotheses within S. The third procedure, Procedure D, will be fully unconditional and defined on U.
To construct Procedure B, we must extend the notion of conditional p-values to H_j, j ∉ S. We need the following lemma.
Proof. Choose any P ∈ H_j. For 0 ≤ t ≤ 1 we have

P((P_j − P_i)/(1 − P_i) ≤ t | (P_k)_{k≠j}, S = {i}) = P(P_j ≤ P_i + t(1 − P_i) | P_j > P_i, (P_k)_{k≠j}) = t,

where we use that (P_k)_{k≠j} and P_j are independent and that P_j is standard uniform. Taking expectations on both sides, we have the required unconditional uniformity. Since the conditional probability does not depend on (P_k)_{k≠j}, it follows that (P_j − P_i)/(1 − P_i) is independent of these p-values.
We will use P_{j|S} = (P_j − P_i)/(1 − P_i) for j ≠ i. As in Section 5, we see that adjustment for non-selection results in p-values that are smaller than their unadjusted counterparts, rather than larger. Procedure B will be a two-step method based on these selection-adjusted p-values. Let i be such that S = {i}. First, the procedure tests H_i, rejecting if P_i / min_{j≠i} P_j ≤ α. If it fails to reject H_i, the procedure stops. Otherwise it continues with a BH procedure at level α′ = nα/(n − 1) on the n − 1 hypotheses H_j, j ≠ i, using (P_j − P_i)/(1 − P_i) as p-values. This procedure clearly uniformly improves upon Procedure A if n > 1. The validity of this procedure is proved by Lemma 4 below.
Proof. Let R denote the rejected set of Procedure B. We condition on S = {i}. Choose any P ∈ M. We either have P ∈ H_i, or P ∉ H_i. If P ∈ H_i, then by Lemma 2 we have that P(P_i / min_{k≠i} P_k ≤ α) ≤ α, so R = ∅ with probability at least 1 − α, so FWER is controlled given S = {i}, and therefore so is FDR. If P ∉ H_i, then let R′ be the rejected set of the second step of the procedure. By Lemma 3, this step is applied on independent and uniform p-values, given S = {i}. By Benjamini and Hochberg (1995), therefore,

E( |R′ ∩ T_P| / (|R′| ∨ 1) | S = {i} ) ≤ α′ = nα/(n − 1).

Since R is either the empty set or R′ ∪ {i}, we have, using that i ∉ T_P and |R′| ≤ n − 1,

|R ∩ T_P| / (|R| ∨ 1) ≤ |R′ ∩ T_P| / (|R′| + 1) ≤ ((n − 1)/n) |R′ ∩ T_P| / (|R′| ∨ 1).

It follows that

E( |R ∩ T_P| / (|R| ∨ 1) | S = {i} ) ≤ ((n − 1)/n) α′ = α,

so Procedure B also controls FDR given S = {i} when P ∉ H_i.
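As an illustrative sketch (my own implementation with hypothetical inputs, not code from the paper), Procedure B can be written as:

```python
import numpy as np

def procedure_b(p, alpha=0.05):
    """Two-step Procedure B: test the winner with the conditional ratio statistic,
    then run BH at level n*alpha/(n-1) on the adjusted p-values of the losers."""
    n = len(p)
    i = int(np.argmin(p))               # the selected winner, S = {i}
    others = np.delete(p, i)
    if p[i] / others.min() > alpha:     # step 1: Procedure A on the winner
        return np.array([], dtype=int)
    rejected = [i]
    q = (others - p[i]) / (1 - p[i])    # selection-adjusted p-values for j != i
    alpha_prime = n * alpha / (n - 1)
    m = n - 1
    order = np.argsort(q)
    passed = np.nonzero(q[order] <= alpha_prime * np.arange(1, m + 1) / m)[0]
    if passed.size:                     # step 2: BH rejections among the losers
        idx_others = np.delete(np.arange(n), i)
        rejected += list(idx_others[order[: passed.max() + 1]])
    return np.array(sorted(rejected))
```

For example, with p-values (1e-6, 0.001, 0.002, 0.9), step 1 rejects the winner and step 2 then rejects the next two hypotheses as well; with (0.3, 0.4, 0.5), nothing is rejected.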
For Procedure C, we ignore the conditioning on S = {i}, but still restrict rejection to S only. This means that we can simply reject H_i for small P_i. By independence of the p-values, we may reject H_i when P_i ≤ 1 − (1 − α)^{1/n}. This is Procedure C. For Procedure D, the fully unconditional procedure, we simply choose the familiar BH procedure.
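For completeness, a minimal sketch of the two unconditional procedures (again my own illustrative code, not the paper's):

```python
import numpy as np

def procedure_c(p, alpha=0.05):
    """Unconditionally reject the winner iff P_i <= 1 - (1 - alpha)^(1/n)."""
    i = int(np.argmin(p))
    thr = 1 - (1 - alpha) ** (1 / len(p))
    return np.array([i]) if p[i] <= thr else np.array([], dtype=int)

def procedure_d(p, alpha=0.05):
    """Fully unconditional Benjamini-Hochberg procedure on all of U."""
    m = len(p)
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= alpha * np.arange(1, m + 1) / m)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    return np.sort(order[: passed.max() + 1])
```

With p-values (0.001, 0.5, 0.9), both procedures reject only the winner; with no small p-values, both reject nothing.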
While Procedure B uniformly improves upon Procedure A, the unconditional Procedures C and D do not. To see this, consider the situation in which P_2, ..., P_n are always equal to 1 (which they could be under the alternative, or if null p-values are allowed to be stochastically larger than uniform). In that case, Procedures A and B reject H_1 if P_1 ≤ α, while Procedures C and D need P_1 ≤ 1 − (1 − α)^{1/n} and P_1 ≤ α/n, respectively. We compared the four procedures in a simple simulation. Out of 100 hypotheses, from 0 to 10 were considered to be under the alternative, receiving a p-value based on a one-sided normal test with a mean shift of 3; the remaining p-values were standard uniform. Figure 4 reports the expected number of rejected hypotheses for each of the methods A, B, C and D. We see that the original conditional Procedure A is very much directed toward sparse alternatives, even losing power as the density of the signal increases. In contrast, all other methods gain power with increasing signal. The unconditional Procedure C, which like Procedure A only ever rejects the winner, rejects it with larger probability than Procedure A in all scenarios. The fully unconditional BH method, although not a uniform improvement, is the clear overall winner, rejecting the most hypotheses on average even in the sparse scenarios.

8 Data splitting and carving
Data splitting is perhaps the archetypal conditional selective inference method. It splits the data into two parts, using the first part for selecting S and the second part for inference. Standard data splitting splits the data by subjects. Data carving is a more advanced version of data splitting (Fithian et al., 2014; Panigrahi, 2018; Schultheiss et al., 2021) that uses alternative ways of splitting the information in the data into independent parts, and thereby uses the data more efficiently. We show that data splitting and carving are inadmissible in general, at least for FWER control.
A special feature of data splitting is that the selection step that results in S is completely unconstrained, as long as the selection remains independent of the second part of the data. This implies that the universe U from which S was chosen is in principle infinite. The inadmissibility conditions of Section 4 still apply, however. We have a simple corollary to Proposition 4, due to the infinite nature of U. The inefficiency of data splitting has been noted by other authors: Jacobovic (2022) established inadmissibility of Moran's (1973) data-split test, and Fithian et al. (2014) have shown that data splitting yields inadmissible selective tests in exponential family models.
Proposition 5. Data splitting is inadmissible as a selective method for FWER control if U is infinite and S is almost surely finite.
Proof. By Proposition 4, since S ≠ U almost surely, it is sufficient to show that P(R = S) > 0 for some P ∈ M. Choose any V ⊆ U. We will show that P(R = S | S = V) > 0 for some P ∈ M. Conditional on S = V, the set R is the result of a procedure with conditional FWER control on V. We write the procedure as a sequential rejection procedure (Goeman and Solari, 2010); let N be the next function in that formulation. Suppose that the procedure is admissible, and that P(R = V | S = V) = 0 for all P ∈ M. We will derive a contradiction. Let W ⊂ V be the largest set such that P(R = W | S = V) > 0 for at least one P ∈ M. Then the procedure is equivalent to a procedure that has N(W) = ∅ almost surely. A uniform improvement is, therefore, a procedure that has N′(W) = {i} with probability α, where i is the smallest element of S \ W. This is a uniform improvement, since the probability that the new procedure rejects more is αP(R = W | S = V) > 0.
To check that the new procedure retains FWER control given S = V, we need to check the monotonicity and single-step conditions of Goeman and Solari (2010, Theorem 1), both of which are trivial. It follows that the procedure we started with is inadmissible, and we have the contradiction we need.
Proposition 5 says that a data splitting procedure is inadmissible because the analyst always runs the risk of selecting too few hypotheses for S. If all hypotheses in S are rejected, the classic data splitting procedure must stop, and loses out on some rejections it could have made. A uniform improvement would be a procedure that selects not just S, but an infinite sequence of pairwise disjoint continuations S_1, S_2, .... This procedure would always continue testing the next selected set after the previous one has been completely rejected. All of S_1, S_2, ... must still be chosen using the first part of the data only. Control is, therefore, still conditional on the first part of the data.
Proposition 5 speaks about FWER control only. We conjecture that the same result holds for FDR, since FDR by its nature is more lenient than FWER in allowing further rejections (in S_2, S_3, ...) if the procedure has already made many rejections (all of S_1). We do not have a general proof for this, but, as an example, consider FDR-controlling methods of the type discussed by Li and Barber (2017). These estimate FDR along an incremental sequence of potential rejection sets, rejecting the largest set for which the FDR estimate is less than α. Such procedures would gain power if the sequence is continued beyond S into S_1, S_2, ....
With data splitting, the split of the data into two parts is arbitrary by nature, and the question of how much of the data to use for the selection and inference steps arises naturally. Some authors have proposed repeated splitting (Meinshausen et al., 2009; DiCiccio et al., 2020). Such methods are unconditional: while inference in each random split is conditional on the S from that split, control in the final analysis is unconditional. Multiple data splitting can, therefore, also be seen as an unconditional improvement of a conditional method.
9 Third example: data splitting

In Section 8, we showed that data splitting is inadmissible as a conditional method for FWER control. If we are prepared to move away from conditional control, we can often improve methods further, although not always uniformly. We investigate a specific simple case in more detail.
Let U = {1, ..., n} be finite, and suppose the analysis on the two parts of the data results in pairs of independent p-values P_{1,i}, P_{2,i} for H_i, for i = 1, ..., n. A natural choice for S is S = {i : P_{1,i} ≤ λ} for some fixed 0 ≤ λ ≤ 1. With this choice, a conditional Bonferroni procedure would reject

R = {i ∈ S : P_{2,i} ≤ α/|S|}.    (6)

We can rewrite this as R = {i ∈ U : Q_i ≤ λα/|S|}, where Q_i = λP_{2,i} if i ∈ S and Q_i = 1 otherwise. Here, Q_i is a valid unconditional p-value, since P(Q_i ≤ t) = P(P_{1,i} ≤ λ)P(λP_{2,i} ≤ t) = λ min(t/λ, 1) ≤ t.
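The validity of Q_i is easy to check by simulation under the null (an illustrative sketch with an arbitrarily chosen λ):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, N = 0.25, 400_000
p1 = rng.uniform(size=N)                # first part: used for selection
p2 = rng.uniform(size=N)                # second part: used for inference
q = np.where(p1 <= lam, lam * p2, 1.0)  # Q_i = lam * P_2i if selected, else 1
for t in (0.01, 0.05, 0.2):
    print(t, (q <= t).mean())           # close to lam * min(t/lam, 1) = t here
```

For t ≤ λ the bound λ min(t/λ, 1) = t is attained exactly, so Q_i is not conservative in the rejection region that matters.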
We could also have constructed an unconditional procedure on U based on the same Q_i. This would reject

R′ = {i ∈ U : Q_i ≤ α/n}.    (7)

Comparing the conditional and unconditional procedures (6) and (7), we see that R′ ⊆ R whenever |S| ≤ λn, and R′ ⊇ R otherwise. The conditional procedure, seemingly, only has a chance to reject more than the unconditional one if |S| is smaller than its expectation under the complete null hypothesis with uniform p-values. The more signal in the data, the larger we would expect S to be, and the smaller the conditional R becomes relative to the unconditional R′. The conditional procedure only has a chance to be better if null p-values are stochastically larger than uniform. This argument generalizes immediately beyond Bonferroni to other symmetric monotone procedures, e.g., the unconditional procedure of Benjamini and Hochberg (1995).

In the example just discussed, with S = {i : P_{1,i} ≤ λ}, if λ was fixed a priori and P_{1,1}, ..., P_{1,n} are independent, then we are not using all the information remaining after selecting S. Rather than splitting the data into P_{1,1}, ..., P_{1,n}, used for finding S, and P_{2,1}, ..., P_{2,n}, used for testing, the data can be split into 1{P_{1,1} ≤ λ}, ..., 1{P_{1,n} ≤ λ}, used for finding S, and P_{1,1|S}, ..., P_{1,n|S} together with P_{2,1}, ..., P_{2,n}, used for testing. Such alternative splits are known as data carving. They tune the amount of information that is allocated to the selection and testing steps more efficiently. However, from the perspective of unconditional procedures, this is still a rather convoluted way of combining the information from P_{1,i} and P_{2,i}. A natural and more powerful choice would be, e.g., a Fisher combination, equivalent to rejecting for low values of P_{1,i} × P_{2,i}, or, even more naturally, a single p-value calculated from a direct analysis of the combined data. Such analyses also obviate the need for choosing λ.
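For two independent uniform p-values, the Fisher combination has a closed form: P(P_1 P_2 ≤ c) = c(1 − log c). A small sketch (illustrative, not from the paper) verifies that the resulting combined p-value is uniform under the null:

```python
import numpy as np

def fisher_p(p1, p2):
    """Combined p-value for the product of two independent uniform p-values,
    using the exact CDF P(P1 * P2 <= c) = c * (1 - log(c))."""
    c = p1 * p2
    return c * (1 - np.log(c))

rng = np.random.default_rng(3)
N = 200_000
pc = fisher_p(rng.uniform(size=N), rng.uniform(size=N))
print((pc <= 0.05).mean())  # close to 0.05: uniform under the null
```

Unlike the data-splitting recipe, this combination uses both halves of the information symmetrically and requires no tuning parameter λ.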
10 Selective confidence intervals and the False Coverage Rate

So far we have focused mostly on rejection of hypotheses based on p-values. However, a large part of the selective inference literature focuses on selection-adjusted confidence intervals, controlling the (conditional) FCR. In this section we apply the holistic perspective to selective inference based on confidence intervals.
A confidence interval is a random subset C ⊆ M of the model space M. A confidence interval is said to have (1 − α)-coverage if, for all P ∈ M,

P(P ∈ C) ≥ 1 − α.

We always define confidence intervals as a subset of the full parameter space; we can do this without loss of generality. For example, if our parameter space for θ = (θ_1, θ_2) is R², we can write the confidence interval [a, b] for θ_1 as the "interval" C = [a, b] × R for θ. This greatly simplifies notation. We keep using the word interval, though C can be any region.
In the selective inference context, we let S ⊆ U be a random set of confidence intervals of interest, where U, as before, is the universe from which we are selecting. The collection of confidence intervals depends on S, and we write C_{i|S}, i ∈ S. The confidence intervals have conditional (1 − α)-coverage if, for all P ∈ M and for i ∈ S,

P(P ∈ C_{i|S} | S) ≥ 1 − α.    (8)

If we report more than one confidence interval we must account for multiplicity. We can demand that the confidence intervals are (conditionally) simultaneous over the selected, i.e., surely for all P ∈ M,

P(P ∈ C_{i|S} for all i ∈ S | S) ≥ 1 − α,    (9)

where the unconditional variant drops the conditioning on S. Similarly, we can control FCR. The unconditional variant demands that, for all P ∈ M,

E( |{i ∈ S : P ∉ C_{i|S}}| / (|S| ∨ 1) ) ≤ α.    (10)

Conditional on S, this simplifies to the demand that, surely for all P ∈ M,

E( |{i ∈ S : P ∉ C_{i|S}}| / (|S| ∨ 1) | S ) ≤ α.    (11)

It is one of the attractive properties of selection-adjusted confidence intervals that they control FCR without further adjustment, since (8) implies (11); see also Weinstein et al. (2013), Theorem 2; Lee et al. (2016), Lemma 2.1; Fithian et al. (2014), Proposition 11.
For confidence intervals we have the following analogue of Observation 1.
This observation is, again, trivial: we simply take C_{i|U} = C_{i|S} for i ∈ S and C_{i|U} = M for i ∈ U \ S. Like Observation 1, Observation 2 answers the question what the optimal choice of S is, if we are interested in confidence intervals that are as narrow as possible. The answer is that S = U is the optimal choice.
Like Observation 1, Observation 2 does not say whether taking S = U can actually help to shorten the confidence intervals. However, it is easy to find examples in which this is possible, certainly for FCR control. Take, for example, the original FCR-controlling method of Benjamini and Yekutieli (2005), which constructs marginal confidence intervals of level 1 − |S|α/|U|. For this method, taking S = U clearly results in the narrowest confidence intervals. This observation holds generally for FCR control: as confidence intervals tend to become narrower as S becomes larger, there is every incentive for the analyst to choose S as large as possible, since they will obtain both more and narrower confidence intervals. In the extreme case that S = U, FCR control reduces to average marginal coverage, an even weaker criterion than marginal coverage, which is achieved by uncorrected confidence intervals.
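A tiny numerical illustration of the Benjamini and Yekutieli (2005) marginal level 1 − |S|α/|U| (my own sketch): selecting fewer intervals forces a higher marginal level, and hence wider intervals, so S = U is the analyst's best choice.

```python
def by_fcr_level(alpha, s_size, u_size):
    """Marginal confidence level of the Benjamini-Yekutieli (2005) FCR intervals."""
    return 1 - s_size * alpha / u_size

print(by_fcr_level(0.05, 10, 100))   # 0.995: 10 of 100 selected -> wide intervals
print(by_fcr_level(0.05, 100, 100))  # 0.95: S = U -> ordinary, narrowest intervals
```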
Specifically for the property of being simultaneous over the selected, we have the following additional observation.
Observation 3. If C_i, i ∈ S, are unconditionally simultaneous over the selected S, then for every S′ ⊆ U, there exist C′_i, i ∈ S′, which are unconditionally simultaneous over the selected S′, such that C′_i ⊆ C_i surely for all i ∈ S ∩ S′.
To see that this observation is true, simply take C′_i = C_i for i ∈ S ∩ S′, and C′_i = M for i ∈ S′ \ S. The observation says that any unconditional method that is simultaneous over the selected for some S ⊆ U is also simultaneous over the selected for any other S′ ⊆ U. This suggests, at least for unconditional methods, that simultaneous over the selected is not a different concept from just simultaneous over U, i.e., simultaneous.
11 Fourth example: post-selection inference for the lasso

One of the major application areas of conditional selective inference is post-selection inference on the parameters of a lasso model. A major breakthrough here has been the polyhedral lemma (Lee et al., 2016), which allows calculation of p-values and confidence intervals for regression coefficients, conditional on their selection by a lasso algorithm. The toy example of Section 5 is in fact a special case of the approach of Lee et al. (2016), and we will not discuss that again. In this section we consider a variant of lasso-based selective inference due to Liu et al. (2018), in which additional interesting issues arise.
The set-up is as follows. We assume the usual linear model setting, in which we have a fixed n × m design matrix X, and assume that Y = Xβ + ϵ, where β (an m-vector) is unknown and ϵ ∼ N(0, σ²I_n), with σ² assumed known. In this model we fit a lasso regression with a fixed penalty parameter λ. Let β̂_i, i = 1, ..., m, be the resulting coefficient estimates. We define the selected set as S = {i : β̂_i ≠ 0}. Liu et al. (2018) define selection-adjusted confidence intervals by not conditioning on the full selected set S, but only on the selection of the confidence interval of interest. They require that, for all P ∈ M and for i ∈ S,

P(P ∈ C_{i|i∈S} | i ∈ S) ≥ 1 − α.    (12)

Condition (12), while implied by (8), is substantially weaker, because it conditions on less information. In a part of their paper, Fithian et al. (2014) considered conditioning on i ∈ S, rather than on the full S, for testing, recognizing that less conditioning leaves more information for inference. Liu et al. (2018) adopted this viewpoint for confidence intervals, arguing that by conditioning on this minimal event, more variation remains in the data for determining the precise value of β_i. The methodology of Jewell et al. (2022) and Neufeld et al. (2022) shares the 'general recipe' of Liu et al. (2018), stating that the ultimate goal in selective inference is to fulfill (12) rather than (8). Indeed, the conceptual difference between the two properties (12) and (8) is huge, but there is a steep price to pay for conditioning only on i ∈ S: complications arise in subsequent error rate control because the coverage of each C_{i|i∈S} is conditional on a different event for every i ∈ S.
Because of this, the property mentioned in Section 10, that selection-adjusted coverage (8) implies FCR control (11), is lost: (12) does not imply (11) or even (10). Without a common conditioning event, there is no hope of combining the confidence intervals into any combined conditional error rate. For example, making |S| confidence intervals, each conditional on its own event j ∈ S, at level 1 − α/|S| does not guarantee simultaneous coverage, even unconditionally; we need confidence intervals at level 1 − α/m for that. In Appendix B we give a numerical example showing lack of conditional and unconditional FCR control of the confidence intervals of Liu et al. (2018) at confidence level 1 − α, and lack of conditional and unconditional simultaneous control at confidence level 1 − α/|S|. Lack of FCR control of the method of Liu et al. (2018) was also observed by Panigrahi and Taylor (2022, Table 1), but without explanation.
By Observation 2, there is no reason to be selective and report confidence intervals for i ∈ S only. Indeed, the premise of restricting attention to the selection of S is often that variables not in S are not important for the outcome. Confidence intervals or p-values for non-selected variables are an important instrument to check this. It is straightforward to extend the theory of Liu et al. (2018) to calculate C_{i|i∉S}, i ∉ S, for the non-selected regression coefficients, and we give the mathematical details in Appendix B. Figure 5 displays 90%-confidence intervals for all eight variables of the famous Prostate data set (Stamey et al., 1989) as a function of λ, with intervals for selected coefficients in black and for non-selected ones in grey. We see a similar paradoxical effect as in the toy example: conditional intervals of selected variables tend to move towards 0, while confidence intervals for non-selected variables tend to move away from 0 (see also Figure 7 in Appendix B). Both are equal to the unconditional intervals for very large or small λ, when the probability of selection is close to 0 or 1, but tend to become longer close to the critical threshold for selection. Kivaranovic and Leeb (2020, 2021) provide conditions under which intervals obtained from the polyhedral lemma are either bounded or unbounded. The intervals constructed through the method of Liu et al. (2018) have bounded lengths when they are conditional on selection, whereas the intervals are potentially unbounded when they are conditional on non-selection. The intervals C_i, defined as C_{i|i∈S} if i ∈ S and C_{i|i∉S} if i ∉ S, are unconditional intervals and, due to the absence of a common conditioning event, have no conditional interpretation as a collection. We may present them all as uncorrected intervals, but if we aim to present only a selection V ⊆ {1, ..., m} from these intervals, we must correct for this using methods for unconditional intervals. We may use level 1 − α/m to obtain simultaneous coverage over the selected intervals, or we may use the method of Benjamini and Yekutieli (2005) and use level 1 − |V|α/m to control FCR. This applies if V = S or for any other V. There is no way in which the conditioning of the intervals on i ∈ S helped for this correction step; in fact, it merely discarded valuable information, lengthening the intervals and moving them towards zero. Arguably, the superior method is simply the unconditional one.

We select S = {1} if X_1 < 0 and S = {2} if X_1 ≥ 0. This seems sensible, since we expect X_2 to have the same sign as X_1 with high probability if at least one of the null hypotheses is false. Therefore, S pre-selects the null hypothesis we are most likely to reject. Next we choose how to test the hypothesis in S. If S = {1}, we reject H_1 if X_2 < −δ − z_α, where z_α is chosen such that Φ(−z_α) = α, and Φ is the standard normal distribution function; if S = {2}, we reject H_2 when X_2 > δ + z_α. It is easy to check that, conditional on S, the probability of falsely rejecting the hypothesis in S is bounded by α, and that this probability is exactly α in the situation that µ = δ if S = {2} and µ = −δ if S = {1}. As a conditional selective procedure, this procedure cannot be uniformly improved. Since we have |S| = 1, as in Section 7, the procedure controls FWER as well as all less stringent error rates.
We can, however, improve the procedure uniformly as an unconditional procedure. The condition of Proposition 3 is fulfilled, since for this procedure the 'worst case' P, i.e. the distribution for which the α-level of the test is exhausted, depends on S. Let us aim to retain FWER control, and write down the closed testing procedure that is implied by the procedure we have just constructed. Writing out the corresponding local tests, we can check that this procedure rejects H_1 or H_2 exactly when the conditional procedure does.
Next, we check whether this procedure exhausts its α-level. Within −δ ≤ µ ≤ δ, the probability of rejecting H_12 is maximized when µ = δ or µ = −δ. The resulting rejection probability is equal to α only if δ = 0, and strictly smaller than α otherwise. Reasoning similarly, we can bound the probability of rejecting H_1 or H_2 by a smaller factor αΦ(δ). Since none of the local tests exhaust the α-level, we can uniformly improve the closed testing procedure by performing all tests at an increased nominal level α′, chosen such that the maximal rejection probability of the local test of H_12 is exactly α. Although the difference between α′ and α vanishes as δ → 0 or δ → ∞, it can be substantial in between. For example, with α = 0.05 and δ = 0.5, we find α′ = 0.0654. We could increase the α-level of the local tests of H_1 and H_2 even further, to α/Φ(δ), but doing so would not improve the closed testing procedure as a whole.
B Supplementary material to the Fourth Example

B.1 Mathematical details of Liu et al. (2018)

Let η_i = X(X^T X)^{−1} e_i, where e_i is a vector with all components equal to 0, except the ith, which is 1, and write y as the sum of two orthogonal vectors u_i + v_i, with u_i = P_{η_i} y, v_i = (I_n − P_{η_i}) y and P_{η_i} = η_i η_i^T / ∥η_i∥². Proposition 1 of Liu et al. (2018) shows that the distribution of β̂_i = η_i^T y, conditional on i ∈ S and v_i, is the N(β_i, σ²∥η_i∥²) distribution truncated to the complement of an interval (a_i, b_i), whose endpoints depend on X_{−i}, the submatrix of X after removing the ith column, and on β̂_{−i}, the lasso fit of v_i to X_{−i} with penalty λ. Likewise, the distribution of β̂_i = η_i^T y, conditional on i ∉ S and v_i, is the N(β_i, σ²∥η_i∥²) distribution truncated to (a_i, b_i).

B.2 Numerical example
In the following simulation, we demonstrate numerically the lack of FCR control at confidence level 1 − α and the lack of simultaneous control at confidence level 1 − α/|S|. We consider a setting similar to the one used in Appendix B of Liu et al. (2018), with n = m = 2, X_1 = (√((1 + ρ)/2), −√((1 − ρ)/2))^T and X_2 = (√((1 + ρ)/2), √((1 − ρ)/2))^T. Each column of X has unit norm and X_1^T X_2 = ρ. We set ρ = 0.95 and simulated 10^5 realizations of Y ∼ N(µ, σ²I), choosing σ² = 1 and β = (5, 5)^T, which gives µ = Xβ ≈ (10, 0)^T. The penalty parameter for the lasso was set to λ = 0.2 and the confidence level 1 − α of Liu et al.'s confidence intervals was set to 90%. Figure 6 shows the lasso selection regions in Y space; e.g., the central parallelogram, bounded by X_i^T Y = ±λ for i = 1, 2, corresponds to the selection of no feature. The simulation setting was chosen to ensure that the probability of selecting no feature is almost zero, i.e., P(S = ∅) ≈ 0, and that the probability of selecting the first feature equals that of selecting the second, i.e., P(S = {1}) = P(S = {2}). Additionally, the conditional coverage is the same for both confidence intervals. According to the table, when one feature is selected, the confidence intervals have substantially less coverage than the desired level of 90%. However, when both features are selected, the coverage is much closer to the desired level.

We also applied the method of Liu et al. (2018) to the Prostate data. The estimate from the regression model with all variables was used as the true value of σ². For this dataset, Liu et al. (2018) used λ = 0.0324 (chosen by 10-fold cross-validation), which resulted in the selection of 7 variables. The following table compares the selective conditional p-values for the hypotheses β_i = 0 (row λ = 0.0324), for the selected features (in black) and the non-selected features (in gray), with the unconditional p-values (row λ = 0).

λ        lcavol   lweight  age      lbph     svi      lcp      gleason  pgg45
0.0324   0.0000   0.0036   0.0915   0.1779   0.0031   0.4666   0.0073   0.7419
0        0.0000   0.0020   0.0552   0.0949   0.0016   0.2380   0.7513   0.3072

In Figure 7, the selective conditional p-values are shown as a function of λ. It is worth noting that the p-values for the selected features begin at their unconditional values and tend to either increase, or increase and then decrease, as λ approaches the critical threshold for selection. In contrast, the p-values for non-selected features start at 0 and eventually converge to the unconditional p-values.

References

Benjamini, Y., Y. Hechtlinger, and P. B. Stark (2019). Confidence intervals for selected parameters. arXiv preprint arXiv:1906.00505.

Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57(1), 289-300.

Figure 2: Holistic perspective on the procedures of Figure 1. Grey indicates areas in which one hypothesis is rejected; black indicates areas in which both are rejected.

Figure 5: Selective conditional confidence intervals using the method of Liu et al. (2018) applied to the variables of the Prostate data, as a function of λ. Black intervals are conditional on selection by the lasso; grey ones are conditional on non-selection.

Figure 7: Selective conditional p-values as a function of λ. Black p-value curves are conditional on selection by the lasso; grey ones are conditional on non-selection. Vertical dashed lines indicate the values of λ at which the active set changes.