Adjusting for gene-specific covariates to improve RNA-seq analysis

Abstract

Summary: This article proposes a novel positive false discovery rate (pFDR) controlling method for testing gene-specific hypotheses using a gene-specific covariate variable, such as gene length. We suppose that the null probability depends on the covariate variable. In this context, we propose a rejection rule that accounts for heterogeneity among tests by using two distinct types of null probabilities. We establish a pFDR estimator for a given rejection rule by following Storey's q-value framework. A condition on a type 1 error posterior probability is provided that equivalently characterizes our rejection rule. We also present a procedure for selecting a tuning parameter through cross-validation that maximizes the expected number of hypotheses declared significant. A simulation study demonstrates that our method is comparable to or better than existing methods across realistic scenarios. In data analysis, we find support for our method's premise that the null probability varies with a gene-specific covariate variable.

Availability and implementation: The source code repository is publicly available at https://github.com/hsjeon1217/conditional_method.


Introduction
Gene expression refers to messenger RNA transcript abundance quantified by RNA profiling techniques. The invention of RNA-seq enables researchers to profile nearly all genes in an organism simultaneously. Research questions involving RNA-seq data often focus on identifying genes differentially expressed (DE) across different experimental conditions. Genes not DE are called equally or equivalently expressed (EE) genes. DE genes are typically identified through hypothesis testing on each gene in a statistical framework, viewed as a multiple testing problem. When dealing with gene expression data under the multiple testing framework, the most useful error quantity is typically the false discovery rate (FDR), introduced by Benjamini and Hochberg (1995). FDR refers to the expected proportion of false positives among all tests whose null hypotheses have been rejected. The most widely used procedure is Storey's (2002) q-value method.
Contemporary methods for FDR control are based on gene-specific covariate variables such as mean nonzero expression and the proportion of samples with nonzero expression (Korthauer et al. 2019). As circumstances vary across hypothesis tests, it is vital to consider each test separately. Cai and Sun (2009) developed an FDR-controlling method using external grouping information. An FDR regression method proposed by Scott et al. (2015) regulates FDR by utilizing the local FDR and treating the null probability as a function of covariate variables. Lei and Fithian (2016) and Li and Barber (2019) used prior information regarding a specific predetermined structure in the pattern of locations of the signals and nulls within the list of hypotheses, such as an ordered structure, to adjust the P-values adaptively. Boca and Leek (2018) proposed a method (BL) that treats the FDR and null probability as functions of a covariate variable. Ignatiadis et al. (2016) and Ignatiadis and Huber (2021) proposed an independent hypothesis weighting method (IHW), which maximizes the number of rejected null hypotheses based on covariate-defined groups. Recently, Lei and Fithian (2018) developed a covariate-specific P-value thresholding method (AdaPT), based on adaptively determined significance thresholds.
The AdaPT method has developed into a powerful approach that is expected to yield more discoveries by focusing on promising hypotheses and utilizing adaptively defined P-value rejection thresholds. Initially, the method establishes a constant threshold across all covariate values. The initial threshold is then updated continually to gradually increase rejection power. Because multiple thresholds are considered, we predict that the method's average ability to identify true positives across all nominal FDR levels may deteriorate. At the same time, adaptively determined thresholds complicate FDR estimation. This article presents a novel and more straightforward rejection rule that accounts for the heterogeneity between hypotheses. Specifically, our rejection rule is based on the product of the P-value and a covariate-specific conditional null probability, given that the P-value is no larger than $\alpha$. Owing to this simplicity, the approach easily admits positive FDR (pFDR) control of the type suggested by Storey (2002). Because pFDR provides an upper bound on FDR, our approach also provides FDR control. We demonstrate that the rejection rule is uniquely determined by the property of equalizing a particular conditional type 1 error posterior probability across tests.
Recently, it was discovered that there exist relationships between biological timing and gene length: shorter genes tend to regulate immediate physical processes such as skin recovery, whereas longer genes tend to regulate long-term physical processes such as muscle development (Lopes et al. 2021). Thus, the fraction of DE or EE genes may vary by gene length depending on the experimental conditions studied. From a Bayesian perspective, the null probability may vary by gene length. Because of this heterogeneity, we consider gene length as a covariate variable potentially important to consider when identifying DE genes. Though we focus exclusively on gene length in this article, our approach is applicable for any gene-specific covariate.
The remainder of this article is organized as follows. In Section 2, we define our method in detail and argue its mathematical implications in terms of posterior probability. In Section 3, we demonstrate the effectiveness of the method through simulation studies. In Section 4, we illustrate our method's efficacy through data analysis. Lastly, Section 5 evaluates the proposed method's potential for further development.

Materials and methods
Our research objective is to declare genes to be DE while controlling pFDR in the multiple testing framework. Our method is inspired by Storey's (2002) q-value method based on the Bayesian perspective. Following the Bayesian perspective, we consider two types of conditional prior probabilities of being an EE gene, also referred to as conditional null probabilities. Both conditional null probabilities are considered as functions of a covariate variable. Section 2.1 presents a rejection rule based on a conditional null probability. By inverting the rejection rule, its rejection region is naturally determined in Section 2.2. In Section 2.3, we establish the pFDR estimator and q-value estimator based on another conditional null probability through mathematical reasoning. Section 2.4 describes a procedure for estimating the conditional null probabilities, which serves as the foundation for our method. Section 2.5 delves into the rejection rule's intrinsic meaning regarding posterior probability.

Rejection rule
Our rejection rule is based on the premise that a P-value rejection threshold should be negatively associated with null probability. Furthermore, we assume that null probability is associated with a gene-specific covariate. This assumption is reasonable given the change in the fraction of DE genes with gene length discussed in the previous section. Therefore, we present a rejection rule based on the conditional null probability, given the covariate and an event involving the P-value.
Consider hypothesis testing for each of $m$ genes. For gene $j \in \{1, \ldots, m\}$, let $X_j$ and $P_j$ denote the value of a covariate and the P-value, respectively. Let $H_{0j}$ denote the event that gene $j$ is an EE gene. Define

$$\pi_0(X_j) = P(H_{0j} \mid X_j), \quad (1)$$

$$\pi_{0|\alpha}(X_j) = P(H_{0j} \mid P_j \le \alpha, X_j). \quad (2)$$

Expressions (1) and (2) are conditional probabilities of gene $j$ being an EE gene. These conditional null probabilities are functions of the covariate value $X_j$. Furthermore, (2) conditions additionally on the $j$th P-value being no larger than $\alpha$. It is worth noting that $\alpha$ can either be specified as a fixed value or selected via a procedure, as described in Section 3.2. Define the $j$th $\tilde{P}$-value as $\tilde{P}_j = P_j \cdot \pi_{0|\alpha}(X_j)$. The following is the rejection rule we propose:

Rejection Rule 2.1. Reject all null hypotheses whose $\tilde{P}$-value satisfies $\tilde{P}_j \le t$, for some $t > 0$.
The genes declared to be DE (DDE) following Rejection Rule 2.1 are naturally determined by $\{j : \tilde{P}_j \le t\}$. Under the rejection rule, both the P-value and the conditional null probability in (2) affect the rejection decision for each hypothesis test. Note that we initially assume $\pi_0(\cdot)$ and $\pi_{0|\alpha}(\cdot)$ are known and then replace these functions with the estimates discussed in Section 2.4. Section 2.5 discusses the rejection rule's intrinsic meaning.
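For concreteness, the rule above can be sketched in a few lines of code. This is an illustrative sketch, not the authors' implementation; it assumes the conditional null probabilities $\pi_{0|\alpha}(X_j)$ have already been evaluated at each gene's covariate value.

```python
import numpy as np

def declared_de(p_values, pi0_given_alpha, t):
    """Rejection Rule 2.1: compute tilde-P_j = P_j * pi_{0|alpha}(X_j)
    and reject every null hypothesis with tilde-P_j <= t."""
    tilde_p = np.asarray(p_values, float) * np.asarray(pi0_given_alpha, float)
    return np.flatnonzero(tilde_p <= t)

# Two genes share P = 0.04, but only the gene with the smaller
# conditional null probability falls below the threshold t = 0.02.
rejected = declared_de([0.04, 0.04], [0.9, 0.2], t=0.02)  # -> array([1])
```

The toy call illustrates the heterogeneity the rule is designed to exploit: identical P-values can lead to different decisions once the covariate-specific null probability is factored in.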

Rejection region
By inverting the rejection rule, the rejection region for the P-value of the $j$th gene can be obtained as

$$C_{X_j}(t) = [0, u_t(X_j)], \quad (3)$$

where $u_t(X_j) = 1$ if $\pi_{0|\alpha}(X_j) \le t$ and $u_t(X_j) = t / \pi_{0|\alpha}(X_j)$ otherwise. Note that

$$\tilde{P}_j \le t \iff P_j \in C_{X_j}(t) \iff P_j \le u_t(X_j). \quad (4)$$

Considering the rejection region associated with a rejection rule is useful for estimating the pFDR and for gaining a better understanding of the rule. Figure 1B illustrates how the rejection region's upper bound varies with $x$ for various $t$ values, for the arbitrarily chosen $\pi_{0|\alpha}(x)$ in Fig. 1A. In addition, Fig. 1B demonstrates that genes with relatively high P-values may nonetheless be declared DE when their conditional null probabilities are low. This phenomenon is noticeable when $x$ is between 2 and 3.
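The upper bound $u_t(\cdot)$ of the rejection region can be computed directly from the conditional null probabilities; a minimal sketch, assuming those probabilities are given:

```python
import numpy as np

def u_t(pi0_given_alpha, t):
    """Upper bound u_t(x) of the P-value rejection region C_x(t):
    1 when pi_{0|alpha}(x) <= t, and t / pi_{0|alpha}(x) otherwise."""
    pi0a = np.asarray(pi0_given_alpha, float)
    return np.where(pi0a <= t, 1.0, t / pi0a)

u_t([0.01, 0.5], t=0.05)  # -> array([1. , 0.1])
```

A gene whose conditional null probability is already below $t$ is rejected for any P-value, which is why the bound saturates at 1.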

False discovery rate estimator
For a given $\tilde{P}$-value significance threshold $t$, the number of genes declared to be DE is

$$R(t) = \sum_{j=1}^m 1(\tilde{P}_j \le t). \quad (5)$$

The number of false positives among the $R(t)$ genes can be expressed by

$$V(t) = \sum_{j=1}^m V_j(t), \quad V_j(t) = 1(\tilde{P}_j \le t)\, 1(H_{0j}). \quad (6)$$

From (4), $V_j(t)$ has another expression: $V_j(t) = 1\{P_j \le u_t(X_j)\}\, 1(H_{0j})$.
pFDR can be defined as $\mathrm{pFDR}(t) = E\{V(t)/R(t) \mid R(t) > 0\}$. For a generalized significance region $\tilde{C}$ of $\tilde{P}_j$, $V(\tilde{C})$ and $R(\tilde{C})$ can be naturally defined by replacing $\tilde{P}_j \le t$ with $\tilde{P}_j \in \tilde{C}$ in definitions (5) and (6). The positive FDR is then defined by $\mathrm{pFDR}(\tilde{C}) = E\{V(\tilde{C})/R(\tilde{C}) \mid R(\tilde{C}) > 0\}$. The following theorem is based on the generalized significance region $\tilde{C}$.
Remark 2.1. The standard q-value method is established on this modeling setup. Therefore, we can still apply the standard q-value method, while controlling pFDR, to the P-values generated from the model in Theorem 2.1.
Theorem 2.1 establishes that $\mathrm{pFDR}(t) = EV(t)/ER(t)$. Our estimator is obtained by estimating $EV(t)$ and $ER(t)$. The denominator $ER(t)$ can be estimated simply by $R(t)$. However, the number of false positives $V(t)$ is unknown. To estimate the numerator $EV(t)$, we propose using $E\{V(t) \mid \vec{X} = (X_1, \ldots, X_m)\}$, which is both the best predictor of $V(t)$ under squared error loss and an unbiased estimator of $EV(t)$.
When a simple null hypothesis is true and the test statistic is continuous, the P-value follows a uniform distribution on (0, 1). From this fact, we make the following assumption:

Assumption 2.1. $P(P_j \le p \mid H_{0j}, X_j) = p$ for all $p \in [0, 1]$ and $j = 1, \ldots, m$.

Under the model assumption described in Theorem 2.1, properties (10)–(12) are obtained. Under these properties and Assumption 2.1, $E\{V(t) \mid \vec{X}\}$ has the expression

$$E\{V(t) \mid \vec{X}\} = \sum_{j=1}^m P\{P_j \le u_t(X_j) \mid H_{0j}, X_j\} \cdot P(H_{0j} \mid X_j) = \sum_{j=1}^m u_t(X_j) \cdot \pi_0(X_j). \quad (13)$$

By combining the form of $\mathrm{pFDR}(t)$ from Theorem 2.1 with (13), the pFDR estimator is established:

$$\widehat{\mathrm{pFDR}}(t) = \frac{\sum_{j=1}^m u_t(X_j)\, \pi_0(X_j)}{R(t)}, \quad (14)$$

$$\widehat{\mathrm{pFDR}}(t) = \frac{t \sum_{j=1}^m \pi_0(X_j)/\pi_{0|\alpha}(X_j)}{R(t)}, \quad (15)$$

where $\pi_0(\cdot)$ and $\pi_{0|\alpha}(\cdot)$ are considered known. The pFDR estimator (15) serves as an upper bound for (14), with equality when $\pi_{0|\alpha}(X_j) \ge t$ for all $j$. We adopt the simpler version (15) as our pFDR estimator, used in the simulation study and data analysis. Then we can define the q-value, $q(\tilde{P}_j) = \inf_{t \ge \tilde{P}_j} \mathrm{pFDR}(t)$, and its estimator, which can be used to declare genes DE. Up to this point, $\pi_0(\cdot)$ and $\pi_{0|\alpha}(\cdot)$ have been treated as given. In practice, we must estimate both conditional null probabilities to apply our method. The following section discusses the estimation procedure.
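The simpler plug-in estimator (15) is a one-line computation once the null probabilities are available. The sketch below is an illustration under the stated assumptions (known $\pi_0(\cdot)$ and $\pi_{0|\alpha}(\cdot)$ evaluated per gene), not the repository implementation:

```python
import numpy as np

def pfdr_hat(t, p_values, pi0, pi0_given_alpha):
    """Plug-in pFDR estimator (15): t * sum_j pi0(X_j) / pi_{0|alpha}(X_j),
    divided by the rejection count R(t) = #{j : P_j * pi_{0|alpha}(X_j) <= t}.
    Returns NaN when R(t) = 0, since pFDR conditions on R(t) > 0."""
    p = np.asarray(p_values, float)
    pi0 = np.asarray(pi0, float)
    pi0a = np.asarray(pi0_given_alpha, float)
    r = int(np.sum(p * pi0a <= t))
    if r == 0:
        return float("nan")
    ev_hat = float(np.sum(t * pi0 / pi0a))  # estimate of E V(t)
    return min(ev_hat / r, 1.0)
```

Scanning this function over a grid of thresholds $t$ and taking running minima yields the corresponding q-value estimates, in the spirit of Storey (2002).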

Estimation of $\pi_0(\cdot)$ and $\pi_{0|\alpha}(\cdot)$
To simplify the problem of estimating $\pi_0(\cdot)$ and $\pi_{0|\alpha}(\cdot)$, we first derive a useful property. Under the model described in Theorem 2.1 and Assumption 2.1, $\pi_{0|\alpha}(\cdot)$ satisfies

$$\pi_{0|\alpha}(X_j) = \frac{\alpha\, \pi_0(X_j)}{P(P_j \le \alpha \mid X_j)}. \quad (17)$$

According to equality (17), when both $\pi_0(X_j)$ and $P(P_j \le \alpha \mid X_j)$ are known, $\pi_{0|\alpha}(X_j)$ can be obtained. Thus, we now discuss how to estimate $\pi_0(X_j)$ and $P(P_j \le \alpha \mid X_j)$. Let $N_{nh}$ be a user-selected neighborhood size. Let $N_j \subset \{1, \ldots, m\}$ contain the $N_{nh}$ indices corresponding to the $N_{nh}$ genes whose covariate values are closest to $X_j$ in Euclidean distance. Both probabilities $\pi_0(X_j)$ and $P(P_j \le \alpha \mid X_j)$ are estimated using only the neighborhood P-values $\{P_i : i \in N_j\}$. First, $\pi_0(X_j)$ is estimated using the method of Nettleton et al. (2006), where a threshold $P_{cut,j}$ is determined such that the empirical distribution of $\{P_i : i \in N_j, P_i \ge P_{cut,j}\}$ is approximately uniform; see Nettleton et al. (2006) for details. Next, $P(P_j \le \alpha \mid X_j)$ can be estimated as the proportion of the P-values in $\{P_i : i \in N_j\}$ that are no larger than $\alpha$. By (17), a natural estimator of $\pi_{0|\alpha}(X_j)$ follows by plugging in these two estimates. As a result, all necessary components for our method are obtained. Section 2.5 provides an in-depth discussion of the rejection rule.
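The neighborhood-based estimation can be sketched as follows. For brevity this sketch substitutes a fixed-threshold Storey-type estimator of $\pi_0(X_j)$ in place of the Nettleton et al. (2006) threshold search; it is an illustration of the plug-in use of equality (17), not the authors' implementation.

```python
import numpy as np

def estimate_pi0_given_alpha(x, p, alpha, n_nh=2000, lam=0.5):
    """For each gene j, use the n_nh genes with covariate values closest to
    x[j] to estimate pi0(X_j) and P(P_j <= alpha | X_j), then combine them
    via (17): pi_{0|alpha}(X_j) = alpha * pi0(X_j) / P(P_j <= alpha | X_j).
    A fixed-lambda Storey-type estimator stands in for the Nettleton et al.
    (2006) threshold search."""
    x = np.asarray(x, float)
    p = np.asarray(p, float)
    m = len(p)
    out = np.empty(m)
    for j in range(m):
        nbr = np.argsort(np.abs(x - x[j]))[:n_nh]            # neighborhood N_j
        pn = p[nbr]
        pi0_j = np.mean(pn >= lam) / (1.0 - lam)             # ~ pi0(X_j)
        prob_le = max(np.mean(pn <= alpha), 1.0 / len(pn))   # ~ P(P_j <= alpha | X_j)
        out[j] = np.clip(alpha * pi0_j / prob_le, 0.0, 1.0)
    return out
```

When the P-values near $X_j$ are roughly uniform (mostly EE genes), the estimate approaches one; an excess of small neighborhood P-values pulls it down.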

Implications of the rejection rule
To better understand our rejection rule, we derive an equivalent condition characterizing the rejection rule in terms of a conditional type 1 error posterior probability, as specified in the following theorem.
Theorem 2.2. Consider the same inference setup described in Theorem 2.1 with a rejection rule $P_j \le u(X_j)$, for a given nonnegative function $u(\cdot)$. Assume that Assumption 2.1 holds. Let $T_{1j}$ be the event that a type 1 error occurs for test $j$. If the rejection rule is more conservative than the classic rejection rule $P_j \le \alpha$, i.e. $\max_j u(X_j) \le \alpha$, then, for all $j = 1, \ldots, m$ and some $t > 0$,

$$P(T_{1j} \mid P_j \le \alpha, \vec{X}) = \frac{t}{\alpha} \iff u(X_j) = \frac{t}{\pi_{0|\alpha}(X_j)}.$$

The proof is included in the supplementary material. According to Theorem 2.2, among rejection rules more conservative than the classic rule, the rule that preserves a constant type 1 error posterior probability, given the low P-value condition and the covariate variables $\vec{X}$, is uniquely determined by $u(X_j) = t/\pi_{0|\alpha}(X_j)$ for some $t$. In other words, under the conservativeness condition, our proposed rejection rule is the only one that equalizes the conditional type 1 error posterior probability across all tests. Under the model assumed in Theorem 2.2, rejection circumstances vary with the covariate, and a rejection rule that ignores these distinct circumstances cannot equalize error control in the sense of Theorem 2.2. Our rejection rule, by contrast, ensures constant conditional type 1 error posterior probabilities across all tests, unlike traditional rejection rules.

Model description
We conduct a simulation study to assess our method's performance, inspired by the model in Theorem 2.1. We consider gene expression datasets with $m = 10\,000$ genes generated independently from normal distributions, with gene-specific variances drawn from an inverse chi-square distribution. The covariate variable, which affects the probability of being an EE gene, is denoted by $X$ and assumed to be normally distributed. A DE gene's treatment effect is randomly generated from a normal distribution. Let $j$ and $k$ be the gene and treatment group indices, respectively, and let $s$ denote a sample index within a treatment group. The sample size within a treatment group, $n$, is set to 10. Then, the data model with $Y_{jsk}$ as the response variable is

$$Y_{jsk} \mid \delta_{jk}, \sigma_j^2 \sim N(\delta_{jk}, \sigma_j^2), \quad (20)$$

where $j \in \{1, \ldots, m\}$, $s \in \{1, \ldots, n\}$, and independence among all random variables holds except where indicated otherwise by conditioning. After generating a dataset from (20), a two-sample t-test is used to obtain a P-value for each gene's treatment effect. The simulation is conducted with different combinations of $\mu_\delta$ and $\pi_0(\cdot)$. $\mu_\delta$ is chosen from a set of four equally spaced values from 0.15 to 0.24. Three $\pi_0(\cdot)$ functions are considered, illustrated in Fig. 2. The function $\pi_0^A(\cdot)$ is constant, whereas $\pi_0^B(\cdot)$ and $\pi_0^C(\cdot)$ are increasing sigmoid functions. Using $\pi_0^A(\cdot)$, we determine whether the proposed method works well when the probability of being an EE gene does not vary with the gene-specific covariate. Using $\pi_0^B(\cdot)$ and $\pi_0^C(\cdot)$, we determine whether the proposed method performs better than other methods when the true model follows the working model. $\pi_0^C(\cdot)$ is more extreme than $\pi_0^B(\cdot)$ because it includes a covariate region where the null probability equals one.
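A dataset of this kind can be generated along the following lines. This is a sketch under stated assumptions: $m$ is reduced for speed, the sigmoid form of $\pi_0(\cdot)$ and the standard deviation of the treatment effects are our illustrative choices, and the inverse chi-square degrees of freedom are not taken from the article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n = 2000, 10                       # genes and per-group sample size (paper: m = 10 000)
x = rng.normal(size=m)                # gene-specific covariate
pi0 = 1.0 / (1.0 + np.exp(-2.0 * x))  # an increasing sigmoid pi0(x), in the spirit of pi0^B
is_ee = rng.random(m) < pi0           # EE indicator drawn with probability pi0(X_j)
delta = np.where(is_ee, 0.0, rng.normal(0.2, 0.05, size=m))  # DE effects; sd is our choice
sigma = np.sqrt(1.0 / rng.chisquare(df=4, size=m))           # inverse chi-square variances
y_ctrl = rng.normal(0.0, sigma[:, None], size=(m, n))
y_trt = rng.normal(delta[:, None], sigma[:, None], size=(m, n))
pvals = stats.ttest_ind(y_trt, y_ctrl, axis=1).pvalue        # two-sample t-test per gene
```

The resulting `pvals` and covariate `x` are exactly the inputs the proposed procedure consumes, with `is_ee` available as ground truth for computing false discovery proportions.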

Methods description
Under a target FDR level of 0.05, the proposed method is compared to the standard q-value, IHW, BL, and AdaPT methods. These methods are chosen because they enable precise control of the FDR in the simulation study of Korthauer et al. (2019).
Let us begin by discussing the tuning parameters of the proposed method: $N_{nh}$ and $\alpha$. $N_{nh}$ is set to 2000. The value of $\alpha$ is either fixed or chosen through cross-validation (cv). First, we use $\alpha$ values of 0.05 and 1 to better understand the proposed method's properties. In addition, when $\alpha$ equals 1, we include the proposed method with the true null probability $\pi_0(\cdot)$ as a reference. Depending on whether the true $\pi_0(\cdot)$ is used (true) or estimated (est), and on the value of $\alpha$, the proposed method's procedures are referred to as prop.q(true, α = 1), prop.q(est, α = 1), prop.q(est, α = 0.05), and prop.q(est, α = cv).
The latter is our suggested α-selection procedure, based on repeated 10-fold cross-validation that maximizes the expected number of DDE genes, described as follows. We partition the observations $\{(X_j, P_j) : j = 1, \ldots, m\}$ completely at random into 10 parts. Holding each part out as a test set in turn, the other nine parts are used as a training set. For each of 100 equally spaced α values between 0.001 and 0.2, the training data are used to estimate $\pi_{0|\alpha}(\cdot)$ and our rejection rule for controlling pFDR at the target level 0.05. The number of DDE genes is determined by applying the estimated rejection rule to the test data. This entire 10-fold cross-validation process is repeated M times, and the average number of DDE genes across the 10 × M test sets is computed for each value of α. The value of α with the highest average number of DDE genes is selected and used with our proposed procedure on the entire dataset to identify DE genes. We use M = 1 in the simulation study and M = 100 in the data analysis.
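The selection loop above can be sketched as follows. Here `estimate_rule` is a hypothetical stand-in for the full fitting procedure (estimating $\pi_{0|\alpha}(\cdot)$ on the training folds and thresholding at the target pFDR level); it is assumed to return a function that maps held-out data to a count of DDE genes.

```python
import numpy as np

def select_alpha_cv(x, p, estimate_rule, alphas=np.linspace(0.001, 0.2, 100),
                    n_folds=10, n_repeats=1, target_fdr=0.05, seed=0):
    """Repeated k-fold selection of alpha: for each candidate alpha, fit the
    rejection rule on the training folds, count DDE genes on the held-out
    fold, and return the alpha maximizing the average held-out count."""
    rng = np.random.default_rng(seed)
    m = len(p)
    avg = np.zeros(len(alphas))
    for _ in range(n_repeats):
        folds = rng.permutation(m) % n_folds        # random partition into folds
        for k in range(n_folds):
            test = folds == k
            for i, a in enumerate(alphas):
                rule = estimate_rule(x[~test], p[~test], a, target_fdr)
                avg[i] += rule(x[test], p[test])    # DDE count on held-out fold
    avg /= n_folds * n_repeats
    return alphas[int(np.argmax(avg))]
```

Because the criterion is a discovery count rather than a loss, ties and flat regions are possible; `argmax` simply returns the first maximizer.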
As discussed in Remark 2.1, the standard q-value method is still applicable in our simulation setup and is guaranteed to control pFDR. To estimate $\pi_0 = P(H = 0)$, the histogram-based method (Nettleton et al. 2006) is used. Moreover, the true $\pi_0$ can be obtained by averaging the true null probabilities $\pi_0(X_j)$ over all genes. Depending on whether the true parameter is used or not, the standard q-value method's procedures are referred to as std.q(true) and std.q(est). For simplicity, omitting the (true)/(est) label indicates the estimator version of the procedure, with parameters estimated from data; for example, std.q = std.q(est).
Lastly, we turn to the IHW, BL, and AdaPT methods implemented in the R packages IHW, swfdr, and adaptMT. IHW and swfdr are Bioconductor R packages, and adaptMT is a CRAN R package. We essentially follow the default configuration of the packages. For the AdaPT method, inspired by the simulation results in Korthauer et al. (2019), we use the adapt_glm function with the settings specified in that article. The procedures associated with these three methods are denoted by their respective names. In total, nine procedures are compared. The simulation results are analyzed without the procedures that use true parameter values, because those procedures cannot be used in practice.

Simulation results
The nine procedures are compared in terms of mean false discovery proportion, mean true positive number, mean area under the receiver-operating characteristic (ROC) curve (AUC), and mean partial area under the ROC curve (pAUC). The ROC curve displays the trade-off between the true-positive rate and the false-positive rate. AUC and pAUC are summary statistics of the ROC curve, calculated from each procedure's adjusted P-values or q-values. High AUC and pAUC values indicate that the procedure generally prioritizes true positives over false positives. The pAUC value is the standardized area under the ROC curve restricted to false-positive rates no larger than 0.1, regarded as the relevant region in our inference situation. For each scenario, composed of a $\mu_\delta$ value and a $\pi_0(\cdot)$ function, we generated 5000 datasets, which were used to approximate the four mean values: mean false discovery proportion, mean true positive number, mean AUC, and mean pAUC, denoted by $V/R$, $S$, AUC, and pAUC. When a procedure declares no hypotheses significant, the false discovery proportion is set to zero, which means $V/R$ is an empirical estimate of FDR rather than pFDR. However, in all our simulation scenarios, the probabilities of discovery for our proposed procedures are approximately 1; therefore, for our proposed procedures, FDR ≈ pFDR in our simulation. Supplementary Table S1A-C summarizes all the simulation results. Figure 3 illustrates the results associated with $\pi_0^A(\cdot)$, $\pi_0^B(\cdot)$, and $\pi_0^C(\cdot)$, respectively. In the figures, except for $V/R$, the ratio to std.q is plotted to illustrate relative performance. Above all, all procedures under consideration control FDR in all scenarios.
Let us discuss the $\pi_0^A(\cdot)$ results. As illustrated in Fig. 3, all procedures have nearly identical AUC and pAUC across all scenarios, showing that they perform similarly in prioritizing true discoveries. In terms of the true positive number $S$, when $\mu_\delta$ is small, std.q outperforms IHW and AdaPT. On the other hand, all procedures associated with the proposed method perform nearly identically to std.q, which is understandable because the proposed method generalizes the standard q-value method. The $\pi_0^A(\cdot)$ results suggest that the proposed method performs as well as std.q even when the covariate is irrelevant.
We now turn to the $\pi_0^B(\cdot)$ and $\pi_0^C(\cdot)$ results. First, we explore the proposed method's properties by comparing the related procedures to std.q. The summary statistics $S$, AUC, and pAUC all indicate the same conclusion: performance is best in the order prop.q(α = cv), prop.q(α = 0.05), prop.q(α = 1), then std.q, as illustrated in Fig. 3. Since prop.q(α = 1) is better than std.q, we conclude that considering a covariate-specific null probability yields an improvement. It is noteworthy that the BL method consistently outperforms prop.q(α = 1), even though both methods use the covariate-specific null probability. Comparing prop.q(α = 0.05) and prop.q(α = 1) shows that incorporating the classic rejection rule improves the proposed method. Comparing prop.q(α = cv) and prop.q(α = 0.05) shows that cross-validated selection of α improves all evaluation criteria. As a result, we recommend using cross-validation to determine the value of α, with a default value of 0.05.
The prop.q(α = cv) procedure is now compared to IHW, BL, and AdaPT. In terms of $S$ and AUC, prop.q(α = cv) surpasses the other procedures in all scenarios. For the AdaPT method, performance is weakened in terms of AUC, likely due to the vulnerability described in Section 1. As seen in Fig. 3, there are scenarios where AdaPT outperforms prop.q(α = cv) in pAUC. In those scenarios, however, prop.q(α = cv) consistently generates more true positives $S$ than AdaPT, which may be attributed to the differing FDR estimators. Based on our simulation setup, we conclude that the proposed method with cross-validated α outperforms the competing FDR-controlling methods in most of the scenarios and evaluation criteria we considered.
The supplementary material presents additional simulations under various conditions, including scenarios with a multimodal null probability function, covariates generated from a mixture of normal distributions, and correlated P-values. Supplementary Figure S4 demonstrates that our method maintains FDR control and power under a distinct shape of the null probability function and covariate distribution. In addition, we investigated a simulation in which gene expression values are correlated. As shown in Supplementary Fig. S5, when the correlation is relatively small, there are no problems with FDR control or true discovery capability. As with other methods, FDR levels exceed the nominal rate when the correlation is relatively high.

Data analysis
We tested our proposed method using RNA-seq data on disease resilience in young, healthy pigs (Lim et al. 2021), together with additional data on gene lengths. A comprehensive description of the study's design and hypothesis testing is given in Lim et al. (2021) and summarized as follows. The study enrolled 912 F1 barrows at approximately 27 days of age in 15 batches. After three weeks in a healthy quarantine nursery, the piglets were exposed to natural polymicrobial diseases found on commercial farms. Gene expression levels in the piglets' blood samples were quantified, along with disease resilience phenotypes such as subjective health score, treatment rate, mortality, and growth rate. Although Lim et al. (2021) tested numerous hypotheses, our current article focuses on the association between gene expression and concurrent growth rate, using blood samples taken during the quarantine nursery period before disease exposure. We anticipated that the disease-independent growth rate would be a long-term physical process, expected to be associated with the expression of longer genes. This expectation motivated us to concentrate on the association involving growth rate before disease exposure.
The following is the analysis we conducted. Gene expression in blood samples acquired during the quarantine nursery was quantified using 3′ mRNA sequencing with a globin block. Using the data in Lim et al. (2021) and genes in the Ensembl database, we analyzed 10 858 genes with a nonzero read count in at least 80% of the samples. The growth rate of a pig was used as the common dependent variable. We used log-scale read counts normalized and adjusted for nuisance factors as described by Lim et al. (2021). A P-value was calculated for each gene, testing whether the adjusted log2-transformed read count has a zero slope coefficient. In total, we generated 10 858 P-values. Figure 4 illustrates the histogram of log10-transformed gene lengths, which was used to determine the covariate distribution in the simulation discussed in Section 3.
We applied the seven procedures described in Section 3 to the P-values and their associated gene lengths. The number of tests declared significant at each nominal FDR level is presented in Table 1. Regarding the proposed method, decreasing α from 1 to 0.05 or selecting α by cross-validation tended to increase the number of significant tests, consistent with the simulation results. Furthermore, prop.q(α = cv) consistently declared a greater or similar number of tests significant compared with the std.q, IHW, and BL methods. Except at the nominal level of 0.1, prop.q(α = cv) also produced more significant results than AdaPT. At the nominal level of 0.01, the AdaPT method declared no tests significant.
To observe additional patterns, genes are classified into four groups according to their lengths. As illustrated in Fig. 5, regardless of the procedure, significantly declared tests are most abundant in the fourth quartile group, and their number increases gradually from the second quartile group onward. This finding supports our intuition that growth rate is a long-term physical process that tends to involve longer genes, and it supports our method's assumption that the null probability varies with gene length. The null probability estimates shown in Fig. 6 exhibit the same tendency. We conducted gene set enrichment analyses (GSEA) with pre-ranked adjusted P-values to assess the biological significance of the data analysis. Conducting GSEA with adjusted rather than raw P-values makes sense, as gene length provides additional information regarding the genes of interest. Using the same FDR threshold of 0.05 for the enriched terms, our method declares a comparable or greater number of biological processes significant compared with std.q, BL, and IHW (Supplementary Fig. S1A). In addition, Supplementary Fig. S1B demonstrates that the P-values derived from GSEA with prop.q(α = cv) are generally lower than those derived from GSEA with the other methods, indicating that our method may have greater power to discover meaningful biological processes. The simulation result that our method consistently outperforms other approaches in AUC, together with the advantage of pre-ranked GSEA, supports the pattern illustrated in Supplementary Fig. S1B. Applying AdaPT to the data reveals that it generates a large number of duplicate adjusted P-values, thereby limiting the enrichment test; therefore, we excluded the GSEA results for the AdaPT method.
Additional data analysis is performed using four gene expression datasets described in Lei and Fithian (2018) and available at https://github.com/lihualei71/adaptPaper/tree/master/data. Each dataset is named after the corresponding file name. Supplementary Table S2 presents the results, showing that, except for AdaPT, our approach consistently declares more tests significant than the other methods. Moreover, there is substantial overlap between our approach and the AdaPT method in the tests declared significant. The null probability functions estimated from these data analyses (depicted in Supplementary Fig. S3) suggest that a sigmoidal shape may be a common form for the null probability function.

Discussion
While the proposed method demonstrates substantial gains over existing methods, there are still areas for improvement. First, the modeling framework upon which our method is developed is generalizable. One may consider a method in which the alternative distribution $F_1$ varies with the covariate variable. Second, the procedure for estimating the null probabilities can be improved. The simulation results indicate that the BL method consistently beats our method with α = 1, indicating a promising direction for further development of the estimation procedure. Finally, different rejection rules can be defined using different types of posterior probability, and performance is expected to vary with the target posterior probability. We anticipate that subsequent studies will examine our method from these perspectives.
The data analysis demonstrates that the estimation of null probabilities provides valuable insights into the relationship between predictors and the features of interest. These estimates are meaningful on their own and can be utilized effectively. For instance, clustering features based on estimates can reveal additional research areas of interest. Moreover, conditional null probability estimates can specify genes relevant to specific biological processes, essential for gene-set enrichment analysis.

Supplementary data
Supplementary data are available at Bioinformatics online.

Conflict of interest
None declared.

Funding
This article is a product of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa. Project No. IOW05658 is supported by USDA/NIFA and State of Iowa funds. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author and do not necessarily reflect the views of the U.S. Department of Agriculture. The data used in this study were generated with funding from USDA National Institute of Food and Agriculture [2017-67007-26144], Genome Canada, Genome Alberta, and PigGen Canada.

Data availability
The data were generated on commercially owned animals and, therefore, contain proprietary information. They can be made available by the corresponding author upon reasonable request. The data analyzed in the supplemental materials can be found at https://github.com/hsjeon1217/conditional_method.