Yeil Kwon, Zhigen Zhao, On F-modelling-based empirical Bayes estimation of variances, Biometrika, Volume 110, Issue 1, March 2023, Pages 69–81, https://doi.org/10.1093/biomet/asac019
Summary
We consider the problem of empirical Bayes estimation of multiple variances when provided with sample variances. Assuming an arbitrary prior on the variances, we derive different versions of the Bayes estimators using different loss functions. For one particular loss function, the resulting Bayes estimator relies on the marginal cumulative distribution function of the sample variances only. When replacing it with the empirical distribution function, we obtain an empirical Bayes version called the |$F$|-modelling-based empirical Bayes estimator of variances. We provide theoretical properties of this estimator, and further demonstrate its advantages through extensive simulations and real data analysis.
1. Introduction
The empirical Bayes approach was introduced as a compound decision procedure in Robbins (1951) and has been widely studied thereafter (Dvoretzky et al., 1956; Robbins, 1956; Efron & Morris, 1972, 1973, 1975; Laird & Louis, 1987; Jiang & Zhang, 2009; Koenker & Gu, 2017). This approach plays an important role in the analysis of gene expression experiments, which often involve a large number of parallel inference problems.
The core idea of the empirical Bayes approach is to estimate the prior distribution either directly or indirectly using the available data, wherein the final inference is based on the posterior distribution when using this estimated prior. Efron (2014) classified empirical Bayes approaches as pursuing one of two strategies: (i) |$f$|-modelling, which is modelling on the data scale; and (ii) |$g$|-modelling, which is modelling on the parameter scale. Under |$f$|-modelling, the resulting empirical Bayes rule usually depends on the prior indirectly via the marginal probability density function; under |$g$|-modelling, the prior distribution is estimated and then plugged into the posterior calculation. It is further commented in that paper that the |$g$|-modelling approach has been widely used in theoretical investigations (Morris, 1983; Laird & Louis, 1987; Jiang & Zhang, 2009), whereas the |$f$|-modelling approaches are more prevalent in applications (Robbins, 1956; Brown & Greenshtein, 2009; Efron, 2011).
The simultaneous estimation of variances and of the covariance matrix has a long history, dating back to James & Stein (1961). Haff (1980) provided a parametric empirical Bayes estimator of the covariance matrix by assuming an inverse Wishart prior on the covariance matrix. Efron & Morris (1976) proposed an estimator that dominates the sample covariance. Wild (1980) considered simultaneous estimation of the variances under different loss functions. Robbins (1982) discussed a parametric empirical Bayes method for scale mixtures of Gaussians. Champion (2003) considered a shrinkage estimator of variances based on the Kullback–Leibler distance.
Heteroskedasticity is prevalent in many applications, such as microarray experiments, rendering the simultaneous estimation of variances even more important. There have been many attempts to estimate these parameters with different approaches (Tusher et al., 2001; Lönnstedt & Speed, 2002; Lin et al., 2003; Storey & Tibshirani, 2003; Tong & Wang, 2007; Koenker & Gu, 2017). Among these, there are a few parametric empirical Bayes estimators that are widely used. When assuming an inverse gamma prior, Smyth (2004) developed a parametric empirical Bayes estimator of the variances. Cui et al. (2005) approximated both the chi-square distribution and the inverse gamma prior by log-normal random variables and derived the exponential Lindley–James–Stein estimator. Lu & Stephens (2016) assumed that the prior of the variances follows a mixture of inverse gamma distributions to derive a flexible empirical Bayes estimator. These parametric empirical Bayes methods have the advantage of providing the full posterior distribution of the variances for further inference such as constructing credible intervals and performing hypothesis testing. Koenker & Gu (2017) took the |$g$|-modelling approach by estimating the probability density function of the prior distribution using a nonparametric maximum likelihood estimator (Kiefer & Wolfowitz, 1956; Koenker & Mizera, 2014).
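These parametric estimators exploit a conjugacy worth making explicit: under the usual scaled chi-square model, if |$s^2\mid\sigma^2 \sim \sigma^2\chi^2_k/k$| and |$\sigma^2\sim{\small{\text{IG}}}(a,b)$|, the posterior of |$\sigma^2$| is again inverse gamma, |${\small{\text{IG}}}(a+k/2,\,b+ks^2/2)$|. A minimal Python sketch; the degrees of freedom and hyperparameters below are illustrative choices, not values from the paper.

```python
def ig_posterior_mean(s2, k, a, b):
    """Posterior mean of sigma^2 under the conjugate model
    s^2 | sigma^2 ~ sigma^2 * chi^2_k / k,  sigma^2 ~ IG(a, b).
    The posterior is IG(a + k/2, b + k * s2 / 2)."""
    return (b + k * s2 / 2.0) / (a + k / 2.0 - 1.0)

# Shrinkage towards the prior mean b / (a - 1): small sample variances are
# pulled up, large ones pulled down.  Illustrative hyperparameters:
k, a, b = 5, 10.0, 9.0                    # prior mean = 9 / 9 = 1.0
print(ig_posterior_mean(0.2, k, a, b))    # above 0.2, below 1.0
print(ig_posterior_mean(3.0, k, a, b))    # below 3.0, above 1.0
```

The closed form makes such parametric estimators essentially free to compute, but they are only as good as the inverse gamma assumption on the prior.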
In this work, we assume an arbitrary prior distribution |$g(\sigma^2)$| for the variances to produce a nonparametric empirical Bayes estimator. When assuming some commonly used loss functions, we derive empirical Bayes estimators for the variances by modelling on the data scale. For a particular loss function, the resulting Bayes estimator depends only on the marginal cumulative distribution function of the sample variances, |$F(s^2)$|. To the best of our knowledge, this is the first estimator for the variances that relies on the marginal cumulative distribution function rather than the marginal probability density function. To differentiate our method from the terminology used in Efron (2014), we call this estimator an |$F$|-modelling-based estimator. The advantage of the |$F$|-modelling-based estimator is that one can simply replace the marginal cumulative distribution function with the empirical distribution function to obtain the proposed empirical Bayes version, which we call the |$F$|-modelling-based empirical Bayes estimator for the variances. The proposed estimator can be computed instantaneously and involves no tuning parameters.
It is known that the empirical distribution function converges to the true distribution function uniformly (Dvoretzky et al., 1956). As shown in § 3, the proposed empirical Bayes estimator converges to the Bayes version uniformly over a set |$\mathcal{D}^\delta=(0, D_\delta)$|, where |$D_\delta$| is a large value that tends to infinity as |$\delta$| goes to zero. We impose this condition for technical reasons, to prevent the denominator of the Bayes estimator from being arbitrarily small. It causes little practical concern because most often one is interested in parameters corresponding to the small and moderate sample variances that fall in |$\mathcal{D}^\delta$|. We have also derived the estimator of the variances for postselection inference and finite Bayes inference (Efron, 2019).
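This Dvoretzky–Kiefer–Wolfowitz-type uniform convergence is easy to check numerically. The following sketch (with an arbitrary chi-square distribution standing in for the marginal |$F$|) computes the Kolmogorov distance |$\sup_t|F_n(t)-F(t)|$| for increasing sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def kolmogorov_distance(sample, cdf):
    """sup_t |F_n(t) - F(t)|, evaluated at the jump points of the ECDF."""
    x = np.sort(sample)
    n = len(x)
    F = cdf(x)
    # The ECDF jumps from (i-1)/n to i/n at x_(i); check both sides of each jump.
    return max(np.max(np.arange(1, n + 1) / n - F),
               np.max(F - np.arange(0, n) / n))

marginal = stats.chi2(df=5)   # an illustrative stand-in for the true marginal
for n in (100, 1000, 10000):
    sample = marginal.rvs(size=n, random_state=rng)
    print(n, kolmogorov_distance(sample, marginal.cdf))
```

The distance decays at the |$n^{-1/2}$| rate, which is what makes the plug-in of the empirical distribution function so effective.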
2. Empirical Bayes estimator for variances
The squared error loss function, |$L_0(\cdot)$|, is not scale invariant; the other three loss functions are. The loss function |$L_1'(\cdot)$| was used in the 1964 Stanford University PhD thesis of J. B. Selliah and in Ghosh & Sinha (1987), and is equivalent to using |$L_1(\cdot)$| for estimating the precision parameters (Ghosh & Sinha, 1987). The loss function |$L_1'(\cdot)$| by nature favours underestimation because ‘underestimation has only a finite penalty, while overestimation has an infinite penalty’ (Casella & Berger, 2002). This could lead to an estimator that performs extremely poorly when focusing on the parameter with the smallest sample variance. By contrast, both the loss function |$L_1(\cdot)$| and Stein’s loss function |$L_2(\cdot)$| impose an infinite penalty on underestimation. In addition, the loss function |$L_2(\cdot)$| is tied to the Kullback–Leibler divergence and the entropy loss (Haff, 1977, 1980; Wild, 1980; Ghosh & Sinha, 1987). A potential drawback of the loss function |$L_1(\cdot)$| is that it imposes only a finite penalty on overestimation.
In this article, we derive empirical Bayes estimators with respect to the scale-invariant loss functions |$L_1'(\cdot)$|, |$L_1(\cdot)$| and |$L_2(\cdot)$| by modelling on the data scale. We start with the loss function |$L_1'(\cdot)$|, where |$\hat{\sigma}_{B,[1:N]}^{'2}=(\hat{\sigma}_{1,B}^{'2}, \hat{\sigma}_{2,B}^{'2},\ldots, \hat{\sigma}_{N,B}^{'2})$| is the corresponding Bayes rule.
Next, consider Stein’s loss |$L_2(\cdot)$| and let |$\hat{\sigma}^2_{{\rm Stein},[1:N]}=(\hat{\sigma}^2_{1,{\rm Stein}}, \hat{\sigma}^2_{2,{\rm Stein}},\ldots,\hat{\sigma}^2_{N,{\rm Stein}})$| be the corresponding Bayes rule. Then we have the following theorem.
When assuming Stein’s loss, the empirical Bayes estimator does not require the estimation of the second derivative of the marginal probability density function. However, it still relies on the marginal density function and its first-order derivative. Nonparametric estimation of a density function and its derivatives is a challenging problem, and the estimation accuracy in the tails is even worse. Moreover, commonly used approaches such as kernel density estimation rely on tuning parameters that are difficult to choose in practice.
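To illustrate this tuning-parameter sensitivity, the following sketch (with hypothetical data and bandwidths of our choosing) evaluates a Gaussian kernel density estimate and its first derivative at a few points; different bandwidths can give quite different answers, particularly in the tails.

```python
import numpy as np

rng = np.random.default_rng(0)
s2 = rng.chisquare(df=5, size=1000) / 5   # simulated sample variances, sigma^2 = 1

def kde(x, data, h):
    """Gaussian kernel density estimate at the points x, bandwidth h."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def kde_deriv(x, data, h):
    """First derivative of the Gaussian kernel density estimate."""
    u = (x[:, None] - data[None, :]) / h
    return (-u * np.exp(-0.5 * u**2)).sum(axis=1) / (len(data) * h**2 * np.sqrt(2 * np.pi))

x = np.array([0.5, 1.0, 2.5])             # the last point sits in the tail
for h in (0.05, 0.2, 0.5):                # three bandwidths, three different answers
    print(h, kde(x, s2, h), kde_deriv(x, s2, h))
```

An |$F$|-modelling estimator sidesteps this entirely: the empirical distribution function needs no bandwidth.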
Next, we consider the loss function |$L_1(\cdot)$| and the corresponding Bayes decision rule |$\hat{\sigma}_{B, [1:N]}^2=(\hat{\sigma}_{1,B}^2, \hat{\sigma}_{2,B}^2,\ldots, \hat{\sigma}_{N,B}^2)$|. We have the following theorem.
The proposed estimator is calculated instantaneously and does not involve any tuning parameters.
As an example, we might be interested in the variances corresponding to the |$K$| smallest sample variances. In other words, order the sample variances |$s_i^2$| increasingly as |$s_{(1)}^2 \leqslant s_{(2)}^2\leqslant \cdots \leqslant s_{(N)}^2$|. Let |$\sigma^2_{(i)}$| be the parameter corresponding to |$s_{(i)}^2$|. Set |$\mathcal{C}=\{i\colon s_i^2 \leqslant s_{(K)}^2\}$|.
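This selection set can be sketched in a few lines (Python, with simulated sample variances; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
s2 = rng.chisquare(df=5, size=1000) / 5   # simulated sample variances

K = 10
# C = { i : s_i^2 <= s_(K)^2 }, the indices of the K smallest sample variances
threshold = np.sort(s2)[K - 1]            # the order statistic s_(K)^2
C = np.flatnonzero(s2 <= threshold)
print(len(C), threshold)
```

With continuous data there are no ties, so |$\mathcal{C}$| contains exactly |$K$| indices.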
This implies that the posterior distribution of |$\sigma_i^2$| when conditioning on both the data and the selection set is the same as the posterior distribution of |$\sigma^2$| conditioning on the data. Consequently, the Bayes rule based on the selection remains the same, and it is immune to the selection (Dawid, 1994). We therefore propose to estimate |$\sigma_i^2$|, |$i\in\mathcal{C}$|, according to (5) without adjustment. This argument holds because the full dataset is available for the postselection inference; otherwise, the Bayes rule might be affected by the selection. For instance, if only the post-selection data are available for further inference, then the Bayes rule needs to be corrected for the selection rule. See Yekutieli (2012) for a full discussion of this issue.
3. Theoretical properties
First, we study the numerator and denominator separately.
Since |$\int_0^\infty (s^2)^{-({k}/{2}-1)}\,{\rm d}F(s^2) <\infty$|, we can choose |$\mathcal{D}^\delta= (0, D_\delta)$| for some positive number |$D_\delta$|. We then have the following theorem.
In practice, especially when focusing on parameters with small sample variances, this modification does not make much difference.
We can extend the result to the postselection inference and finite Bayes inference.
As commented in § 2, the Bayes estimator is immune to the selection rule |$\mathcal{C}$|, and the empirical Bayes estimator could be a good approximation of the Bayes estimator. However, the discrepancy between these two widens when focusing on the selected case (Pan et al., 2017), and some correction is needed (Hwang & Zhao, 2013). On the other hand, Corollary 1 indicates that the proposed |$F$|-modelling-based empirical Bayes estimator converges to the corresponding Bayes version if |$s_{i}^2\in \mathcal{D}^\delta, i\in\mathcal{C}$|. In other words, we do not need to make further corrections for the selection.
Similarly, when considering the finite Bayes inference, the uniform convergence of the proposed estimator guarantees a good estimation as long as |$s_0^2\in \mathcal{D}^\delta$|.
4. Numerical studies
Similarly, our simulation studies include two versions of the variance adaptive shrinkage estimator: the original version, vash, and the modified version, mvash.
Let |$(\sigma_i^2, s_i^2)$|, |$i=1,2,\ldots, N$|, be the parameter and sample variance pairs, with the sample variances generated according to (1), where the degrees of freedom |$k$| are set to 5 and the prior |$g(\sigma^2)$| is chosen from the following settings.
Setting I. Let |$\sigma_i^2\sim$| inverse gamma distribution: |${\small{\text{IG}}}(a,1)$| with |$a=10$| and |$6$|.
Setting II. Let |$\sigma_i^2\sim$| mixture of inverse gamma distributions: |$0.2{\small{\text{IG}}}(a,1)+0.4{\small{\text{IG}}}(8,6) + 0.4{\small{\text{IG}}}(9, 19)$|, where |$a=10$| and |$6$|.
Setting III. Let |$\sigma_i^2=a$| with |$0.4$| probability and |$1/a$| with |$0.6$| probability, where |$a=3$| and |$4$|.
Setting IV. Let |$\sigma_i^2\sim$| mixture of inverse Gaussian distributions: |$0.4\text{InvGauss}(1/a, 1) + 0.6 \text{InvGauss}(a, a^4)$|, where |$a=3$| and |$4$|.
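As an illustration, sample variances under two of these settings can be generated as follows (a Python sketch under the scaled chi-square model (1); the seed and helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_setting_I(n, a=10.0, k=5):
    """Setting I: sigma_i^2 ~ IG(a, 1); s_i^2 | sigma_i^2 ~ sigma_i^2 chi^2_k / k."""
    sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0, size=n)  # IG(a,1) via reciprocal gamma
    s2 = sigma2 * rng.chisquare(df=k, size=n) / k
    return sigma2, s2

def draw_setting_III(n, a=3.0, k=5):
    """Setting III: sigma_i^2 = a w.p. 0.4 and 1/a w.p. 0.6; same sampling model."""
    sigma2 = np.where(rng.random(n) < 0.4, a, 1.0 / a)
    s2 = sigma2 * rng.chisquare(df=k, size=n) / k
    return sigma2, s2

sigma2, s2 = draw_setting_I(1000)
print(sigma2.mean(), s2.mean())   # both near the prior mean 1/(a-1) = 1/9
```

The same template yields Settings II and IV by replacing the draw of |$\sigma_i^2$| with the corresponding mixture.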
For all simulations, we set |$N=1000$| and use 500 replications. For each replication, we generate the data |$(\sigma_i^2,s_i^2)$| and order them in increasing order of the sample variances. We consider three different selection rules: (i) the parameters corresponding to the 1|$\%$| smallest sample variances; (ii) the parameters corresponding to the 5|$\%$| smallest sample variances; and (iii) all the parameters. We calculate the estimated values based on the aforementioned methods. The risks associated with the loss function (7) are calculated and reported in Table 1 and the Supplementary Material. In our numerical studies, the two |$f$|-modelling estimators defined in (3) and (4) perform poorly, and their results are not reported in the tables. The proposed |$F$|-modelling-based empirical Bayes estimator performs the best among all the estimators considered. The modified Smyth method and the modified variance adaptive shrinkage method perform similarly under these settings. Under Setting I, where the prior of the variance is an inverse gamma distribution, the proposed method, the modified Smyth method and the modified variance adaptive shrinkage method are essentially the same. However, for Settings II to IV, where the prior distribution is not an inverse gamma distribution, the proposed method outperforms all competing methods, including the modified Smyth method and the modified variance adaptive shrinkage method.
Table 1. Risks associated with the loss function (7) under the three selection rules

| Setting | $a$ | Selection rule | $s^2$ | ELJS | TW | Smyth | mSmyth | vash | mvash | REBayes | Proposed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 10 | 1% | 2.60 | -0.48 | -0.72 | -0.90 | -1.06 | -0.87 | -1.06 | -0.65 | -1.06 |
| I | 10 | 5% | 2.00 | -0.70 | -0.87 | -0.89 | -1.05 | -0.88 | -1.05 | -0.92 | -1.05 |
| I | 10 | All | 0.77 | -0.94 | -0.98 | -0.91 | -1.05 | -0.92 | -1.05 | -0.97 | -1.03 |
| II | 10 | 1% | 2.34 | 1.05 | 0.45 | -0.14 | -0.21 | 0.87 | -0.10 | -0.05 | -0.22 |
| II | 10 | 5% | 1.79 | 0.62 | 0.17 | -0.10 | -0.20 | 0.74 | -0.11 | -0.06 | -0.22 |
| II | 10 | All | 0.75 | 0.01 | 0.00 | 0.14 | -0.43 | 0.26 | -0.48 | -0.38 | -0.52 |
| III | 4 | 1% | 2.22 | 1.15 | 0.88 | -0.28 | -0.48 | -0.26 | -0.49 | -0.50 | -0.60 |
| III | 4 | 5% | 1.72 | 0.74 | 0.53 | -0.06 | -0.36 | -0.05 | -0.37 | -0.22 | -0.39 |
| III | 4 | All | 0.69 | 0.10 | 0.16 | 0.26 | -0.35 | 0.27 | -0.35 | -0.32 | -0.58 |
| IV | 4 | 1% | 2.28 | 1.26 | 0.97 | -0.08 | -0.28 | -0.06 | -0.28 | -0.13 | -0.28 |
| IV | 4 | 5% | 1.73 | 0.77 | 0.53 | -0.13 | -0.28 | -0.11 | -0.29 | -0.22 | -0.32 |
| IV | 4 | All | 0.72 | 0.14 | 0.20 | 0.29 | -0.34 | 0.30 | -0.34 | -0.30 | -0.56 |
Next, we consider the finite Bayes inference problem. Namely, for each generated dataset |$s^2_{[1:N]}$| and a new observation |$s_0^2$|, we calculate the estimated values based on the different approaches and compute the risk according to the loss function (6). The risks are reported in Table 2 and the Supplementary Material. Overall, the proposed |$F$|-modelling-based empirical Bayes estimator performs the best among all the estimators considered. The modified Smyth method and the modified variance adaptive shrinkage method are essentially the same. Under Setting I, where the prior of the variance is an inverse gamma distribution, the proposed method, the modified Smyth method and the modified variance adaptive shrinkage method perform similarly, with negligible differences. However, for Settings II to IV, where the prior distribution is not an inverse gamma distribution, the proposed method outperforms all other competing methods.
Table 2. Risks associated with the loss function (6) for the finite Bayes inference problem

| Setting | $a$ | $s^2$ | ELJS | TW | Smyth | mSmyth | vash | mvash | REBayes | Proposed |
|---|---|---|---|---|---|---|---|---|---|---|
| I | 10 | 0.38 | 0.16 | -1.05 | -0.96 | -1.06 | -0.96 | -1.07 | -1.02 | -1.03 |
| II | 10 | 0.36 | 0.14 | -0.11 | 0.01 | -0.48 | -0.02 | -0.50 | -0.51 | -0.55 |
| III | 4 | 0.92 | 0.72 | 0.23 | 0.23 | -0.36 | 0.25 | -0.36 | -0.31 | -0.47 |
| IV | 4 | 0.70 | 0.49 | 0.25 | 0.37 | -0.30 | 0.38 | -0.29 | -0.10 | -0.51 |
5. Real data analysis
We declare the |$i$|th gene, where |$i=1,2,\ldots,N$|, to be significant if the corresponding interval does not enclose zero. We do the same for the other subgroup. We call the decision of the |$i$|th gene discordant if the interval based on the first subgroup does (does not) enclose zero while the interval based on the second subgroup does not (does) enclose zero. If a decision is discordant, this implies that a significant conclusion based on one subgroup cannot be replicated by the other. We repeat these steps 500 times to calculate the average proportions of discordant decisions. We perform the same calculation for the colon cancer data by splitting the data into the patient group and healthy group.
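The discordance count can be sketched as follows (Python, with hypothetical interval endpoints; the real analysis uses intervals produced by each estimator):

```python
import numpy as np

def discordance_rate(lo1, hi1, lo2, hi2):
    """Proportion of genes whose two subgroup intervals disagree on
    whether zero is enclosed (one encloses zero, the other does not)."""
    sig1 = (lo1 > 0) | (hi1 < 0)   # subgroup 1: interval excludes zero
    sig2 = (lo2 > 0) | (hi2 < 0)   # subgroup 2: interval excludes zero
    return np.mean(sig1 != sig2)

# Hypothetical intervals for 4 genes in each of the two subgroups
lo1 = np.array([0.2, -1.0, -0.5,  0.1])
hi1 = np.array([0.8, -0.2,  0.5,  0.9])
lo2 = np.array([0.1, -0.4, -0.3, -0.2])
hi2 = np.array([0.7,  0.4,  0.6,  0.6])
print(discordance_rate(lo1, hi1, lo2, hi2))   # 0.5: genes 2 and 4 are discordant
```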
Figure 1 displays box plots of the rate of discordant decisions, and the average percentage of discordant decisions is reported in Table 3. The proposed method, the modified Smyth method and the modified variance adaptive shrinkage estimator produce similar proportions of discordant decisions, substantially smaller than those of all the other competing methods.
Table 3. Average proportion of discordant decisions for the colon and leukemia data

| Data | $s^2$ | ELJS | TW | Smyth | mSmyth | vash | mvash | REBayes | Proposed |
|---|---|---|---|---|---|---|---|---|---|
| Colon | 0.27 | 0.20 | 0.20 | 0.20 | 0.17 | 0.20 | 0.17 | 0.19 | 0.17 |
| Leukemia | 0.23 | 0.15 | 0.15 | 0.15 | 0.13 | 0.16 | 0.13 | 0.14 | 0.13 |
To further investigate why these three methods perform similarly, we test the hypothesis that the distribution of the sample variances is the convolution of a scaled chi-square distribution and an inverse gamma distribution. The Kolmogorov–Smirnov test statistics for the colon dataset and the leukemia dataset are 0.014 and 0.017, and the resulting |$p$|-values are 0.80 and 0.031, respectively. In other words, there is no evidence against the null hypothesis that the prior is an inverse gamma distribution for the colon data, and only moderate evidence against it for the leukemia data. We therefore expect similar performance for these three methods.
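The following sketch illustrates a Kolmogorov–Smirnov check of this kind (not the authors' exact procedure): it compares simulated "observed" sample variances with a large Monte Carlo sample from the fitted compound model, using a two-sample test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

k, a = 5, 10.0   # illustrative degrees of freedom and prior shape

def draw_s2(n):
    """Draw sample variances from the compound model:
    sigma^2 ~ IG(a, 1), s^2 | sigma^2 ~ sigma^2 * chi^2_k / k."""
    sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0, size=n)
    return sigma2 * rng.chisquare(df=k, size=n) / k

observed = draw_s2(2000)
reference = draw_s2(100000)        # Monte Carlo stand-in for the fitted null marginal
stat, pvalue = stats.ks_2samp(observed, reference)
print(round(stat, 3), round(pvalue, 2))
```

A small statistic and a large |$p$|-value, as here, indicate no evidence against the compound null, mirroring the conclusion for the colon data.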
The code for the simulations and real data analysis is available on GitHub at https://github.com/zhaozhg81/FEBV.
6. Conclusion
The proposed method is developed under (1), assuming a scaled chi-square distribution with equal degrees of freedom. The Bayes estimator in Theorem 4 still applies when the degrees of freedom are different. However, the estimation of the cumulative distribution function requires that the sample variances be identically distributed. Therefore, the proposed method cannot be applied directly to cases with unequal degrees of freedom. In practice, we take a slightly conservative approach by using the smallest degrees of freedom as the common value. Many parametric empirical Bayesian approaches based on |$g$|-modelling estimate the prior distribution explicitly and can handle unequal degrees of freedom.
In the real data analysis, we use the estimator of the variances as a plug-in estimator for inferring the mean parameters. One natural follow-up challenge is to obtain a nonparametric empirical Bayes estimator of the means, assuming arbitrary priors for both the means and the variances. Given the observed advantages of the |$F$|-modelling-based approach, we would like to extend this framework to broader settings in future research. We will further study the properties of the |$F$|-modelling-based approach under the decision-theoretic framework.
Acknowledgement
The authors were supported by the National Science Foundation. The authors thank the associate editor and the reviewers for comments that substantially helped improve the quality of the paper, and Mr. Matthew P. MacNaughton for editing the manuscript.
Supplementary material
The Supplementary Material includes technical details and additional simulations.