Yuanpei Cao, Wei Lin, Hongzhe Li, Two-sample tests of high-dimensional means for compositional data, Biometrika, Volume 105, Issue 1, March 2018, Pages 115–132, https://doi.org/10.1093/biomet/asx060
Summary
Compositional data are ubiquitous in many scientific endeavours. Motivated by microbiome and metagenomic research, we consider a two-sample testing problem for high-dimensional compositional data and formulate a testable hypothesis of compositional equivalence for the means of two latent log basis vectors. We propose a test through the centred log-ratio transformation of the compositions. The asymptotic null distribution of the test statistic is derived and its power against sparse alternatives is investigated. A modified test for paired samples is also considered. Simulations show that the proposed tests can be significantly more powerful than tests that are applied to the raw and log-transformed compositions. The usefulness of our tests is illustrated by applications to gut microbiome composition in obesity and Crohn’s disease.
1. Introduction
Compositional data, which belong to a simplex, are ubiquitous in scientific disciplines such as geology, economics and genomics. This paper is motivated by microbiome and metagenomic research, where relative abundances of hundreds to thousands of bacterial taxa on a few tens or hundreds of individuals are available for analysis (The Human Microbiome Project Consortium, 2012). Due to varying amounts of DNA-generating material across different samples, sequencing read counts are often normalized to relative abundances; the resulting data are therefore compositional (Li, 2015). One fundamental problem in microbiome data analysis is to test whether two populations have the same microbiome composition, which can be viewed as a two-sample testing problem for high-dimensional compositional data. Since the components of a composition must sum to unity, directly applying standard multivariate statistical methods intended for unconstrained data to compositional data may result in inappropriate or misleading inferences (Aitchison, 2003).
Various methods for compositional data analysis have been developed since the seminal work of Aitchison (1982). Most existing methods for two-sample testing, however, deal only with the low-dimensional setting where the dimensionality is smaller than the sample size; see, for example, the generalized likelihood ratio tests discussed in Aitchison (2003, § 7·5). In this paper, we consider the two-sample testing problem for high-dimensional compositional data, where compositions in the |$(p-1)$|-dimensional simplex |$\mathcal{S}^{p-1}$| are thought of as arising from latent basis vectors in the |$p$|-dimensional positive orthant |$\mathbb{R}_+^p$|. In microbiome studies, the basis components may represent the true abundances of bacterial taxa in a microbial community such as the gut of a healthy individual (Li, 2015). To circumvent the nonidentifiability issue associated with the basis vectors, we formulate a testable hypothesis of compositional equivalence for the means of two log basis vectors. We then propose a test through the centred log-ratio transformation of the compositions. The proposed test is therefore scale-invariant, which is crucial for compositional data analysis.
We emphasize here the extrinsic analysis point of view in compositional data analysis (Aitchison, 1982), which leads to biologically meaningful interpretations, in contrast to intrinsic analysis, where interest lies solely in the composition. Classical extrinsic analysis, however, primarily concerns problems where the bases are observed, and thus differs radically from the focus of this paper.
The development of tests for the equality of two high-dimensional means has received much attention; see, for instance, Bai & Saranadasa (1996), Srivastava & Du (2008), Srivastava (2009), Chen & Qin (2010) and Cai et al. (2014). Such tests, however, are not directly applicable to high-dimensional compositional data because the required regularity conditions are generally not met. For example, the covariance matrix of compositional data is singular, thereby violating the usual assumptions on the eigenvalues of the covariance matrix, such as those in Cai et al. (2014). Our assumptions are imposed on the latent log basis vectors, which are free of the simplex constraint. We show that, under mild conditions, the centred log-ratio variables satisfy certain desired properties, which guarantee the validity of the proposed test. Then the asymptotic null distribution of the test statistic is derived and the power of the test against sparse alternatives is investigated. The proposed two-sample test is further modified to accommodate paired samples. All proofs are deferred to the Appendix.
2. A testable hypothesis of compositional equivalence
These hypotheses, however, are not testable through the observed compositional data |$X^{(k)}$| (|$k=1,2$|). Clearly, a basis is determined by its composition only up to a multiplicative factor, and the set of bases giving rise to a composition |$x\in\mathcal{S}^{p-1}$| forms the equivalence class |$\mathcal{W}(x)=\{(tx_1,\dots,tx_p):t>0\}$| (Aitchison, 2003, p. 32). As an immediate consequence, a log basis vector is determined by the resulting composition only up to an additive constant, and the set of log basis vectors corresponding to |$x$| constitutes the equivalence class |$\mathcal{Z}(x)=\{(\log x_1+c,\dots,\log x_p+c):c\in\mathbb{R}\}$|. We therefore introduce the following definition.
Definition 1. Two log basis vectors |$z_1$| and |$z_2$| are said to be compositionally equivalent if their components differ by a constant |$c\in\mathbb{R}$|, i.e., |$z_1=z_2+c1_p$|, where |$1_p$| is the |$p$|-vector of ones.
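The definition can be made concrete with a small numerical check. The sketch below is illustrative only: the function names and the toy abundance vectors are assumptions introduced here, not part of the paper. It tests whether two log basis vectors differ by a common additive constant and confirms that such vectors yield the same composition.

```python
import math

def log_basis(w):
    """Log-transform a positive basis vector."""
    return [math.log(v) for v in w]

def compositionally_equivalent(z1, z2, tol=1e-9):
    """True if z1 - z2 equals the same constant in every component."""
    diffs = [a - b for a, b in zip(z1, z2)]
    return max(diffs) - min(diffs) <= tol

def closure(w):
    """Normalize a positive basis so that its components sum to one."""
    total = sum(w)
    return [v / total for v in w]

# Hypothetical abundances, chosen only for illustration.
w1 = [2.0, 6.0, 12.0]
w2 = [0.5 * v for v in w1]   # same community at half the total abundance
w3 = [2.0, 6.0, 10.0]        # one component changed

print(compositionally_equivalent(log_basis(w1), log_basis(w2)))  # True
print(compositionally_equivalent(log_basis(w1), log_basis(w3)))  # False
print(closure(w1), closure(w2))  # identical compositions up to rounding
```

Rescaling a basis changes its log basis vector only by the constant |$c=\log t$|, which is exactly the indeterminacy that the definition factors out.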
3. Tests for compositional equivalence
3.1. The centred log-ratio transformation and an equivalent hypothesis
Despite this equivalence, the hypotheses in (3) are meaningful only when the bases exist, which is the case in microbiome studies. On the other hand, the hypotheses in (8) concern only the compositions through the centred log ratios, from which their scale-invariance and testability using the observed compositional data are evident.
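The scale-invariance noted above can be checked numerically. The sketch below assumes the standard centred log-ratio definition, in which the |$j$|th component is |$\log x_j$| minus the average of |$\log x_1,\dots,\log x_p$|; the display defining it is not reproduced in this section, and the numerical values are hypothetical.

```python
import math

def clr(x):
    """Centred log-ratio transform of a positive vector."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# A hypothetical latent basis and the composition it induces.
basis = [3.0, 1.5, 0.5, 5.0]
total = sum(basis)
composition = [v / total for v in basis]

y_from_composition = clr(composition)
y_from_basis = clr(basis)

# The two agree: the unknown total abundance cancels in the centring step.
print(max(abs(a - b) for a, b in zip(y_from_composition, y_from_basis)))
# The components sum to zero, which is why the clr covariance is singular.
print(sum(y_from_composition))
```

Because the centred log ratios of the observed composition coincide with the centred log basis, a hypothesis stated on the former is testable even though the basis itself is never observed.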
3.2. A two-sample test for compositional equivalence
Although |$M_n$| is similar to the test statistic |$M_I$| defined in Cai et al. (2014), the theoretical analysis is radically different, in that our assumptions are not imposed on the observed variables. Besides, the test statistic based on a linear transformation by the precision matrix that Cai et al. (2014) proposed is not considered here, because the covariance matrix of |$Y_1^{(k)}$| is singular and so its precision matrix is not well defined.
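For orientation, a max-type statistic of this kind can be sketched as follows. Definitions (9) and (10) are not reproduced in this section, so the code rests on stated assumptions: that |$M_n$| is the largest squared difference of centred log-ratio sample means, standardized by pooled-sample variances with divisor |$n_1+n_2$| and scaled by |$n_1n_2/(n_1+n_2)$|, and that the critical value takes the extreme-value form used for |$M_I$| in Cai et al. (2014). All function names and the toy data are hypothetical.

```python
import math

def clr(x):
    """Centred log-ratio transform of a positive vector."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def column_means(rows):
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]

def max_type_statistic(sample1, sample2):
    """Assumed form: n1*n2/(n1+n2) * max_j (mean diff_j)^2 / pooled var_j."""
    y1 = [clr(x) for x in sample1]
    y2 = [clr(x) for x in sample2]
    n1, n2 = len(y1), len(y2)
    m1, m2 = column_means(y1), column_means(y2)
    largest = 0.0
    for j in range(len(m1)):
        ss = sum((r[j] - m1[j]) ** 2 for r in y1)
        ss += sum((r[j] - m2[j]) ** 2 for r in y2)
        pooled_var = ss / (n1 + n2)  # assumed divisor
        largest = max(largest, (m1[j] - m2[j]) ** 2 / pooled_var)
    return n1 * n2 / (n1 + n2) * largest

def critical_value(p, alpha=0.05):
    """Assumed threshold: q_alpha + 2 log p - log log p (extreme-value limit)."""
    q_alpha = -math.log(math.pi) - 2.0 * math.log(math.log(1.0 / (1.0 - alpha)))
    return q_alpha + 2.0 * math.log(p) - math.log(math.log(p))

# Toy positive data: rows are individuals, columns are components.
group1 = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [3.0, 4.0, 1.0, 2.0]]
group2 = [[4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0], [2.0, 4.0, 3.0, 1.0]]

stat = max_type_statistic(group1, group2)
print(stat, critical_value(p=4))
```

Only marginal variances enter the statistic, so no inverse of the singular centred log-ratio covariance matrix is required. The toy sample sizes are far too small for the asymptotic threshold to be meaningful and serve only to show the mechanics.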
3.3. A paired test for compositional equivalence
4. Theoretical results
4.1. Assumptions and implications
Denote the correlation matrices of |$Z_1^{(k)}$| and |$Y_1^{(k)}$| by |$R=(\rho_{ij})$| and |$T=(\tau_{ij})$|, respectively.
We first impose the following conditions on the covariance structures of the log basis variables:
Condition 1. |$1/\kappa_1{\leqslant}\omega_{jj}{\leqslant}\kappa_1$| for |$j=1,\dots,p$| and some constant |$\kappa_1>0$|;
Condition 2. |$\max_{1{\leqslant} i<j{\leqslant} p}|\rho_{ij}|{\leqslant} r_1$| for some constant |$0<r_1<1$|;
Condition 3. |$\max_{1{\leqslant} j{\leqslant} p}\sum_{i=1}^p\rho_{ij}^2{\leqslant} r_2$| for some constant |$r_2>0$|.
Conditions 1–3 are mild and standard in the high-dimensional testing literature. Condition 1 requires that the variances be bounded away from zero and infinity. Condition 2 is mild since |$\max_{1{\leqslant} i<j{\leqslant} p}|\rho_{ij}|=1$| would imply that |$\Omega$| is singular. Condition 3 is weaker than the usual assumption that the maximum eigenvalue of |$R$| is bounded.
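To see what the three conditions measure, the following sketch computes, for a given covariance matrix of the log basis variables, the smallest constants for which they hold. The function name and the example matrix are assumptions made here for illustration.

```python
import math

def condition_constants(omega):
    """Smallest (kappa_1, r_1, r_2) satisfying the three conditions for omega."""
    p = len(omega)
    variances = [omega[j][j] for j in range(p)]
    rho = [[omega[i][j] / math.sqrt(variances[i] * variances[j])
            for j in range(p)] for i in range(p)]
    # Variances must lie in [1/kappa_1, kappa_1].
    kappa1 = max(max(variances), 1.0 / min(variances))
    # Largest absolute off-diagonal correlation.
    r1 = max(abs(rho[i][j]) for i in range(p) for j in range(i + 1, p))
    # Largest column sum of squared correlations (diagonal included).
    r2 = max(sum(rho[i][j] ** 2 for i in range(p)) for j in range(p))
    return kappa1, r1, r2

# Hypothetical tridiagonal example: adjacent correlations equal to -0.5.
variances = [1.0, 2.0, 4.0, 2.0, 1.0]
p = len(variances)
omega = [[0.0] * p for _ in range(p)]
for i in range(p):
    omega[i][i] = variances[i]
    if i + 1 < p:
        cov = -0.5 * math.sqrt(variances[i] * variances[i + 1])
        omega[i][i + 1] = omega[i + 1][i] = cov

print(condition_constants(omega))  # approximately (4.0, 0.5, 1.5)
```

In this example each interior column contributes |$1+0{\cdot}25+0{\cdot}25$| to the sum in the third condition, so the bound stays fixed however large |$p$| becomes.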
Under Conditions 1–3, the following proposition shows that similar properties are satisfied by the centred log-ratio covariance matrix |$\Gamma$| and correlation matrix |$T$|.
Proposition 1. Assume that Conditions 1–3 hold. Then, for sufficiently large |$p$|, the centred log-ratio covariance matrix |$\Gamma$| and correlation matrix |$T$| satisfy the following properties:
(i) |$1/\kappa_2{\leqslant}\gamma_{jj}{\leqslant}\kappa_2$| for |$j=1,\dots,p$| and some constant |$\kappa_2>0$|;
(ii) |$\max_{1{\leqslant} i<j{\leqslant} p}|\tau_{ij}|{\leqslant} r_3$| for some constant |$0<r_3<1$|; and
(iii) |$\max_{1{\leqslant} j{\leqslant} p}\sum_{i=1}^p\tau_{ij}^2{\leqslant} r_4$| for some constant |$r_4>0$|.
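A simple special case illustrates how properties of |$\Omega$| carry over to |$\Gamma$|. The sketch relies on the identity that the centred log-ratio vector equals |$G$| times the log basis vector, with |$G=I_p-p^{-1}1_p1_p^{{\mathrm{\scriptscriptstyle T}}}$|, so that |$\Gamma=G\Omega G$|; this follows from the definition of the transformation but is not displayed in this section. The choice |$\Omega=I_p$| and the helper names are assumptions for illustration.

```python
import math

def centering_matrix(p):
    """G = I - (1/p) * ones * ones^T."""
    return [[(1.0 if i == j else 0.0) - 1.0 / p for j in range(p)]
            for i in range(p)]

def matmul(a, b):
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def clr_covariance(omega):
    """Gamma = G * Omega * G (G is symmetric)."""
    g = centering_matrix(len(omega))
    return matmul(matmul(g, omega), g)

def correlation(cov):
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]

p = 50
omega = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
gamma = clr_covariance(omega)
tau = correlation(gamma)

print(gamma[0][0])    # 1 - 1/p = 0.98, still bounded away from 0 and infinity
print(tau[0][1])      # -1/(p - 1), vanishing as p grows
print(sum(gamma[0]))  # rows sum to zero, so Gamma is singular
```

Here the variances and correlations of the centred log ratios differ from those of the log basis variables by terms of order |$1/p$|, which is consistent with the proposition holding for sufficiently large |$p$|.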
We also need a moment condition on the log basis variables and a restriction on the dimensionality.
Condition 5. We have |$n_1\asymp n_2\asymp n$| and |$\log p=o(n^{1/3})$|, where |$n=n_1n_2/(n_1+n_2)$|.
Condition 4 is a popular sub-Gaussian tail assumption that can easily be relaxed to the case of polynomial tails. It allows us to establish the following concentration properties for the centred log-ratio variables and the pooled-sample variances.
4.2. Asymptotic properties of the two-sample test
We are now in a position to state our main results concerning the asymptotic properties of the proposed two-sample test. The validity of the test relies on the fact that certain desired properties of the centred log-ratio variables can be related to those of the log basis variables, which have been established in Propositions 1 and 2. The following theorem derives the asymptotic null distribution of |$M_n$| defined in (9).
Compared with the usual sparse alternatives in the literature, such as in Cai et al. (2014), all components in the alternative (15) are shifted by a term of order |$O\{\|\delta\|_1p^{-1}(\log p/n)^{1/2}\}$|. To prevent this term from interfering with signals of order at least |$O\{(\log p/n)^{1/2}\}$|, it suffices to assume that |$\|\delta\|_1=o(p)$|. This key observation leads to the following theorem concerning the asymptotic power of |$\Phi_\alpha$| defined in (10).
Theorem 2. Assume that Conditions 1 and 3–5 hold. Under |$H_1$| in (13), if |$\|\delta\|_1=o(p)$| and |$\max_{j\in S}|\delta_j|{\geqslant}\surd{2}+{\varepsilon}$| for some constant |${\varepsilon}>0$|, then |${\rm pr}(\Phi_\alpha=1)\to1$| as |$n,p\to\infty$|.
Two remarks on Theorem 2 are in order. First, if the signals |$\delta_i$| are bounded, then the condition |$\|\delta\|_1=o(p)$| holds provided the alternative (13) is sparse in the sense that |$s=o(p)$|. Second, by Theorem 3 of Cai et al. (2014), the condition |$\max_{j\in S}|\delta_j|{\geqslant}\surd{2}+{\varepsilon}$| is minimax rate-optimal for testing sparse alternatives in the classical two-sample testing problem. Thus, the proposed test achieves the best possible rate even though the bases are not observed.
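The role of the condition |$\|\delta\|_1=o(p)$| can be illustrated numerically. The sketch assumes that centring maps a signal vector |$\delta$| for the log basis means to |$\delta-\bar\delta1_p$| for the centred log-ratio means, where |$\bar\delta$| is the average of the components of |$\delta$|, and it works in units of |$(\log p/n)^{1/2}$|. The displays (13) and (15) are not reproduced in this section, and the signal magnitude 2 is a hypothetical choice.

```python
def centred_signal(delta):
    """Return (delta minus its average, the common shift)."""
    shift = sum(delta) / len(delta)
    return [d - shift for d in delta], shift

def sparse_delta(p, s, magnitude):
    """A signal vector with s nonzero entries of equal magnitude."""
    return [magnitude] * s + [0.0] * (p - s)

p = 200
for s in (10, 20, 100):
    delta = sparse_delta(p, s, magnitude=2.0)
    centred, shift = centred_signal(delta)
    l1_over_p = sum(abs(d) for d in delta) / p
    # The shift never exceeds ||delta||_1 / p in absolute value.
    print(s, shift, l1_over_p, max(abs(c) for c in centred))
```

With |$s=10$| the common shift is |$0{\cdot}1$| and the largest centred signal is |$1{\cdot}9$|, so the signal is nearly intact. With |$s=100$| the shift is 1 and every centred component has magnitude 1, showing how a dense alternative can be masked after centring.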
5. Simulation studies
We conducted simulation studies to evaluate the numerical performance of the proposed two-sample and paired tests. For comparison, we consider counterparts applied to the raw and log-transformed compositions, which are obtained by replacing |$Y^{(k)}$| in the proposed tests with |$X^{(k)}$| and |$\log X^{(k)}$|, respectively. The oracle tests based on the unobserved |$W^{(k)}$|, though impracticable, serve as the benchmarks for comparison.
We first examine the case of two independent samples. The simulated data were generated as follows. We first generated |$Z^{(k)}$| from the following distributions:
(i) multivariate normal distribution, |$Z_i^{(k)}\sim N_p(\tilde\mu_k,\Omega)$|;
(ii) multivariate gamma distribution, |$Z_i^{(k)}=\tilde\mu_k+FU_i^{(k)}/\surd10$|, where |$F$| is a |$p\times p$| matrix |$F=QS^{1/2}$| with |$Q$| and |$S$| obtained from the singular value decomposition |$\Omega=QSQ^{{\mathrm{\scriptscriptstyle T}}}$|, and the components of |$U_i^{(k)}$| were generated independently from the standard gamma distribution with shape parameter 10.
(i) banded covariance, |$\Omega=D^{1/2}AD^{1/2}$|, where |$A$| has nonzero entries |$a_{jj}=1$| and |$a_{j-1,j}=a_{j,j+1}=-0{\cdot}5$|, and |$D$| is a diagonal matrix with entries drawn from |${\rm Un}(1,3)$|;
(ii) sparse covariance, |$\Omega={\rm diag}(A_1,A_2)$| with |$A_1=B+{\varepsilon} I_q$| and |$A_2=I_{p-q}$|, where |$q=\lfloor 3p^{1/2}\rfloor$|, |$B$| is a symmetric matrix with lower-triangular entries drawn from the uniform distribution on |$[-1,-0{\cdot}5]\cup[0{\cdot}5,1]$| with probability |$0{\cdot}5$| and set to 0 with probability |$0{\cdot}5$|, and |${\varepsilon}=\max\{-\lambda_{\min}(B),0\}+0{\cdot}05$| with |$\lambda_{\min}(\cdot)$| denoting the smallest eigenvalue of a matrix.
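The normal case with banded covariance can be sketched in a few lines. This is not the authors' code. It substitutes a moving-average construction, |$Z_j=\mu_j+d_j^{1/2}(e_{j+1}-e_j)/\surd2$| with independent standard normal |$e_0,\dots,e_p$|, which reproduces |$\Omega=D^{1/2}AD^{1/2}$| exactly without a matrix factorization. The mean vector is not specified in this section and is set to zero here as an assumption; the basis is taken as the exponential of the log basis and the composition as the basis normalized to sum to one.

```python
import math
import random

def banded_normal_log_basis(mu, d, rng):
    """One draw of Z with variances d[j] and adjacent correlations -0.5."""
    p = len(mu)
    e = [rng.gauss(0.0, 1.0) for _ in range(p + 1)]
    return [mu[j] + math.sqrt(d[j]) * (e[j + 1] - e[j]) / math.sqrt(2.0)
            for j in range(p)]

def to_composition(z):
    """Exponentiate the log basis and normalize to the simplex."""
    largest = max(z)
    # Subtracting the maximum avoids overflow and leaves the composition unchanged.
    w = [math.exp(v - largest) for v in z]
    total = sum(w)
    return [v / total for v in w]

def simulate_sample(n, mu, d, rng):
    return [to_composition(banded_normal_log_basis(mu, d, rng)) for _ in range(n)]

rng = random.Random(0)                          # fixed seed for reproducibility
p, n = 50, 100
d = [rng.uniform(1.0, 3.0) for _ in range(p)]   # diagonal of D, from Un(1, 3)
mu = [0.0] * p                                  # assumed mean vector
sample = simulate_sample(n, mu, d, rng)

print(len(sample), len(sample[0]), sum(sample[0]))
```

A shifted mean vector for the second group, with |$s$| nonzero entries, would be supplied through `mu` to generate data under the alternative.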
We took the sample sizes to be |$n_1=n_2=100$| for two independent samples and |$n=100$| for paired samples, with varying dimensions |$p=50,100$| and |$200$|. We repeated the simulation 1000 times for each setting and calculated the empirical sizes and powers of four tests with significance level |$\alpha=0{\cdot}05$|. The results for two independent samples and paired samples are summarized in Tables 1 and 2. The proposed test has higher power than the tests applied to the raw and log-transformed compositions, and it controls the size reasonably well around the nominal level |$0{\cdot}05$| and closely mimics the performance of the oracle test. Its power gains over the tests based on log-transformed and raw compositions tend to be more pronounced in the more challenging scenarios with moderate dimensions and sparse signals. Its superiority does not seem to depend on the distributions or covariance structures.
| | Method | Banded, $p=50$ | Banded, $p=100$ | Banded, $p=200$ | Sparse, $p=50$ | Sparse, $p=100$ | Sparse, $p=200$ |
|---|---|---|---|---|---|---|---|
| Normal, $H_0$ | Oracle | 4·7 | 5·3 | 4·8 | 4·0 | 4·8 | 5·2 |
| | Proposed | 4·6 | 4·9 | 5·1 | 3·7 | 4·5 | 5·3 |
| | Log | 3·9 | 5·1 | 5·3 | 3·5 | 3·3 | 5·2 |
| | Raw | 0·9 | 1·0 | 0·3 | 1·5 | 1·0 | 1·3 |
| Normal, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Oracle | 38·2 | 70·7 | 92·5 | 40·1 | 70·7 | 91·8 |
| | Proposed | 36·5 | 70·5 | 92·2 | 38·0 | 70·2 | 91·0 |
| | Log | 26·1 | 60·8 | 84·7 | 25·5 | 51·4 | 70·7 |
| | Raw | 4·0 | 5·5 | 8·2 | 7·0 | 16·8 | 23·7 |
| Normal, $H_1$, $s=\lfloor0{\cdot}1p\rfloor$ | Oracle | 68·7 | 90·6 | 99·1 | 69·4 | 91·0 | 99·5 |
| | Proposed | 66·9 | 89·9 | 98·9 | 67·6 | 91·0 | 99·5 |
| | Log | 53·7 | 79·7 | 97·3 | 50·0 | 77·1 | 91·7 |
| | Raw | 9·1 | 10·1 | 14·1 | 16·6 | 31·7 | 49·2 |
| Normal, $H_1$, $s=\lfloor0{\cdot}5p\rfloor$ | Oracle | 99·3 | 100·0 | 100·0 | 99·4 | 100·0 | 100·0 |
| | Proposed | 99·2 | 100·0 | 100·0 | 99·7 | 100·0 | 100·0 |
| | Log | 96·5 | 99·9 | 100·0 | 96·6 | 99·9 | 100·0 |
| | Raw | 39·2 | 37·1 | 61·5 | 55·0 | 84·0 | 96·0 |
| Gamma, $H_0$ | Oracle | 5·6 | 4·4 | 4·7 | 5·9 | 4·8 | 4·8 |
| | Proposed | 5·3 | 4·9 | 4·8 | 5·0 | 4·9 | 5·1 |
| | Log | 4·7 | 3·6 | 3·7 | 5·0 | 4·7 | 4·5 |
| | Raw | 1·6 | 0·8 | 0·2 | 1·6 | 0·6 | 1·1 |
| Gamma, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Oracle | 35·7 | 70·0 | 91·3 | 36·7 | 70·5 | 92·3 |
| | Proposed | 36·7 | 71·5 | 91·9 | 36·3 | 68·0 | 92·0 |
| | Log | 27·0 | 52·6 | 82·8 | 23·6 | 49·9 | 66·0 |
| | Raw | 5·1 | 4·4 | 10·2 | 4·2 | 6·0 | 9·4 |
| Gamma, $H_1$, $s=\lfloor0{\cdot}1p\rfloor$ | Oracle | 68·5 | 91·8 | 99·6 | 69·0 | 91·7 | 99·5 |
| | Proposed | 66·8 | 91·5 | 99·5 | 66·9 | 90·8 | 99·7 |
| | Log | 52·4 | 78·4 | 96·2 | 50·9 | 75·6 | 90·7 |
| | Raw | 11·6 | 9·8 | 17·3 | 10·3 | 13·2 | 17·4 |
| Gamma, $H_1$, $s=\lfloor0{\cdot}5p\rfloor$ | Oracle | 99·9 | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 |
| | Proposed | 99·9 | 100·0 | 100·0 | 99·5 | 100·0 | 100·0 |
| | Log | 96·5 | 99·7 | 100·0 | 96·9 | 99·9 | 100·0 |
| | Raw | 42·7 | 53·1 | 61·9 | 40·4 | 50·7 | 62·9 |
| | Method | Banded, $p=50$ | Banded, $p=100$ | Banded, $p=200$ | Sparse, $p=50$ | Sparse, $p=100$ | Sparse, $p=200$ |
|---|---|---|---|---|---|---|---|
| Normal, $H_0$ | Oracle | 4·8 | 5·6 | 6·3 | 5·9 | 6·6 | 6·0 |
| | Proposed | 4·9 | 5·5 | 6·8 | 5·9 | 6·1 | 6·5 |
| | Log | 5·4 | 4·4 | 7·1 | 5·2 | 3·7 | 4·1 |
| | Raw | 1·1 | 0·4 | 0·2 | 1·5 | 1·1 | 1·2 |
| Normal, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Oracle | 55·3 | 86·8 | 98·3 | 54·4 | 85·8 | 99·0 |
| | Proposed | 52·9 | 86·5 | 98·4 | 54·0 | 84·8 | 98·9 |
| | Log | 39·6 | 75·0 | 95·6 | 35·7 | 68·7 | 87·3 |
| | Raw | 5·2 | 7·3 | 13·7 | 9·2 | 21·6 | 34·3 |
| Normal, $H_1$, $s=\lfloor0{\cdot}1p\rfloor$ | Oracle | 85·1 | 98·6 | 99·9 | 83·9 | 98·6 | 99·9 |
| | Proposed | 82·8 | 98·4 | 99·9 | 82·8 | 98·3 | 99·9 |
| | Log | 71·2 | 94·5 | 99·6 | 65·4 | 90·0 | 98·3 |
| | Raw | 13·8 | 14·9 | 22·1 | 22·7 | 43·7 | 64·5 |
| Normal, $H_1$, $s=\lfloor0{\cdot}5p\rfloor$ | Oracle | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 |
| | Proposed | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 |
| | Log | 99·5 | 100·0 | 100·0 | 99·7 | 100·0 | 100·0 |
| | Raw | 50·0 | 50·0 | 74·0 | 77·9 | 94·4 | 99·3 |
| Gamma, $H_0$ | Oracle | 4·2 | 6·1 | 7·2 | 5·1 | 6·3 | 7·3 |
| | Proposed | 4·3 | 6·2 | 7·2 | 5·8 | 6·1 | 7·1 |
| | Log | 4·7 | 4·7 | 5·6 | 5·2 | 4·5 | 5·5 |
| | Raw | 1·2 | 0·5 | 0·4 | 1·4 | 1·7 | 2·0 |
| Gamma, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Oracle | 55·5 | 86·1 | 98·4 | 51·3 | 84·2 | 98·3 |
| | Proposed | 53·9 | 86·2 | 98·4 | 50·1 | 84·0 | 98·0 |
| | Log | 42·4 | 77·3 | 94·8 | 35·9 | 67·2 | 88·1 |
| | Raw | 6·1 | 8·4 | 11·1 | 9·6 | 22·6 | 39·4 |
| Gamma, $H_1$, $s=\lfloor0{\cdot}1p\rfloor$ | Oracle | 83·8 | 97·7 | 100·0 | 87·9 | 98·2 | 100·0 |
| | Proposed | 82·6 | 97·1 | 100·0 | 86·5 | 98·3 | 99·9 |
| | Log | 70·8 | 93·0 | 99·8 | 67·8 | 90·8 | 98·4 |
| | Raw | 12·0 | 13·2 | 20·7 | 24·0 | 44·0 | 61·5 |
| Gamma, $H_1$, $s=\lfloor0{\cdot}5p\rfloor$ | Oracle | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 |
| | Proposed | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 | 100·0 |
| | Log | 99·4 | 100·0 | 100·0 | 99·8 | 100·0 | 100·0 |
| | Raw | 51·5 | 52·7 | 75·0 | 80·3 | 94·5 | 99·0 |
To further examine the performance of the proposed test in very high-dimensional settings, we carried out simulations for two independent samples with dimension |$p=2000$| and sample sizes |$n_1=n_2=100,200$|. The results are summarized in Table 3 and indicate that the proposed test still has approximately correct size and improved power over the two competing tests.
Setting | Method | Banded covariance, $n_1=n_2=100$ | Banded covariance, $n_1=n_2=200$ | Sparse covariance, $n_1=n_2=100$ | Sparse covariance, $n_1=n_2=200$
---|---|---|---|---|---
Normal, $H_0$ | Oracle | 6·5 | 3·7 | 7·6 | 4·1
Normal, $H_0$ | Proposed | 6·4 | 3·7 | 7·5 | 4·1
Normal, $H_0$ | Log | 5·0 | 4·7 | 2·4 | 1·8
Normal, $H_0$ | Raw | 0·2 | 0·1 | 0·5 | 0·9
Normal, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Oracle | 100·0 | 100·0 | 100·0 | 100·0
Normal, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Proposed | 100·0 | 100·0 | 100·0 | 100·0
Normal, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Log | 100·0 | 100·0 | 98·7 | 98·0
Normal, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Raw | 48·4 | 55·0 | 60·7 | 68·6
Gamma, $H_0$ | Oracle | 7·0 | 6·1 | 6·4 | 6·7
Gamma, $H_0$ | Proposed | 6·9 | 6·1 | 6·6 | 7·1
Gamma, $H_0$ | Log | 4·9 | 4·5 | 2·7 | 2·2
Gamma, $H_0$ | Raw | 0·2 | 0·2 | 0·4 | 0·1
Gamma, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Oracle | 100·0 | 100·0 | 100·0 | 100·0
Gamma, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Proposed | 100·0 | 100·0 | 100·0 | 100·0
Gamma, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Log | 100·0 | 100·0 | 98·5 | 98·9
Gamma, $H_1$, $s=\lfloor0{\cdot}05p\rfloor$ | Raw | 36·1 | 45·3 | 22·1 | 22·0
6. Applications to microbiome data
6.1. Analysis of obesity microbiome data
We illustrate the application of the proposed tests by analysing two microbiome datasets. The first is a dataset from Wu et al. (2011) collected in a cross-sectional study of 98 subjects to investigate the effects of habitual diet on the human gut microbiome. The dataset was analysed by regression in Lin et al. (2014), where it was found to suggest an association between obesity and changes in gut microbiome composition. For each subject, DNA collected from stool samples was analysed by 454/Roche pyrosequencing of 16S rRNA gene segments from the V1–V2 region. An average of 9265 reads per sample were obtained, with a standard deviation of 3864, by denoising the pyrosequences prior to taxonomic assignment. The resulting 3068 operational taxonomic units were further merged into 87 genera that were observed in at least one sample. As suggested by Aitchison (2003) and Lin et al. (2014), zero counts were replaced by 0|$\cdot$|5 before the count data were converted into compositional data by normalization. Demographic information, including body mass index, BMI, was recorded on the subjects.
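The preprocessing described above can be sketched as follows for a single sample. This is a minimal illustration, not the authors' code: the function name `counts_to_clr`, the argument `pseudo` and the toy counts are hypothetical, and the final centred log-ratio step is included only because the proposed test operates on that transformation.

```python
import math

def counts_to_clr(counts, pseudo=0.5):
    """Return (composition, clr vector) for one sample of taxon counts.

    Assumption for illustration: zeros are replaced by `pseudo` before
    normalization, as in the text (0.5, or 0.1 in the sensitivity check).
    """
    filled = [pseudo if c == 0 else float(c) for c in counts]  # zero replacement
    total = sum(filled)
    comp = [c / total for c in filled]        # composition, sums to one
    logs = [math.log(x) for x in comp]
    centre = sum(logs) / len(logs)            # mean of the log proportions
    clr = [v - centre for v in logs]          # centred log-ratio, sums to zero
    return comp, clr

# Toy counts for four genera (made-up numbers).
comp, clr = counts_to_clr([10, 0, 5, 0])
print(comp)
print(clr)
```

Because the centred log-ratio vector is unchanged when all counts in a sample are multiplied by a common factor, the normalization step does not affect it; only the zero-replacement value does, which is why the sensitivity analysis varies that value.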
We are interested in testing whether lean and obese individuals have the same gut microbiome composition. To this end, we divided the subjects into a lean group, BMI |$<25$|, |$n_1=63$|, and an obese group, BMI |${\geqslant}25$|, |$n_2=35$|, on which we performed various two-sample tests. The proposed test yielded a |$p$|-value of 0|$\cdot$|001, indicating a marked difference between the two groups. In contrast, the tests based on the log-transformed and raw compositions gave |$p$|-values of 0|$\cdot$|129 and 0|$\cdot$|261, and hence failed to detect the difference at the 0|$\cdot$|05 level. To assess the stability of our proposed test and to perform power comparisons, we generated 5000 bootstrap subsamples within each group, with the subsampling proportion varying from 0|$\cdot$|2 to 1. For each subsampling proportion, we obtained the empirical power as the proportion of subsamples for which the null hypothesis was rejected at the 0|$\cdot$|05 level. The empirical power curves based on the bootstrap subsamples, presented in Fig. 1(a), show that the proposed test greatly outperforms the competitors. We further conducted back-testing to check whether the signal disappears upon breaking the association. We generated 1000 bootstrap samples from the pooled data and then randomly divided each sample into two groups with the same sizes as before. The histogram of |$p$|-values from our test based on the bootstrap samples is displayed in Fig. 1(b). The |$p$|-values are distributed quite evenly, indicating good accuracy of the asymptotics. Overall, our results confirm previous findings that the microbiomes of lean and obese individuals differ at the taxonomic and functional levels (Turnbaugh et al., 2009).
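The subsampling power comparison described above can be sketched as follows. Several ingredients are assumptions made for illustration and are not restated in this section: the test is taken to be a max-type statistic on centred log-ratio data with a type I extreme value calibration, subsamples are drawn with replacement, and the names `max_type_pvalue` and `subsample_power` as well as the Gaussian toy data are hypothetical.

```python
import math
import random

def column_mean_var(rows):
    """Column means and variances (divisor n) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(p)]
    return means, variances

def max_type_pvalue(y1, y2):
    """P-value of an assumed max-type two-sample statistic (needs p >= 2)."""
    n1, n2, p = len(y1), len(y2), len(y1[0])
    m1, v1 = column_mean_var(y1)
    m2, v2 = column_mean_var(y2)
    stat = 0.0
    for j in range(p):
        pooled = (n1 * v1[j] + n2 * v2[j]) / (n1 + n2)  # pooled variance
        stat = max(stat, (m1[j] - m2[j]) ** 2 / pooled)
    stat *= n1 * n2 / (n1 + n2)
    x = stat - 2 * math.log(p) + math.log(math.log(p))
    # Assumed limit law: P(M <= x) -> exp(-pi^{-1/2} exp(-x/2)).
    return -math.expm1(-math.exp(-x / 2) / math.sqrt(math.pi))

def subsample_power(y1, y2, prop, n_rep=200, alpha=0.05, seed=0):
    """Share of subsamples, drawn with replacement, that reject at level alpha."""
    rng = random.Random(seed)
    k1 = max(2, round(prop * len(y1)))
    k2 = max(2, round(prop * len(y2)))
    hits = 0
    for _ in range(n_rep):
        s1 = rng.choices(y1, k=k1)
        s2 = rng.choices(y2, k=k2)
        hits += max_type_pvalue(s1, s2) < alpha
    return hits / n_rep

# Toy stand-in for two groups of transformed compositions (not real data):
# the second group is shifted in its first component only.
rng = random.Random(1)
p = 20
group1 = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(60)]
group2 = [[rng.gauss(0, 1) + (2.0 if j == 0 else 0.0) for j in range(p)]
          for _ in range(40)]
power_curve = {prop: subsample_power(group1, group2, prop)
               for prop in (0.2, 0.6, 1.0)}
print(power_curve)
```

Plotting `power_curve` against the subsampling proportion for each competing test would give curves of the kind shown in Fig. 1(a); the back-testing step differs only in that group labels are reassigned at random within each resample of the pooled data.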
To further assess the sensitivity of the results to zero replacements, we repeated the analysis with the zero counts replaced by 0|$\cdot$|1 before normalization. The proposed test resulted in a |$p$|-value of 0|$\cdot$|0001, while the tests based on the log-transformed and raw compositions gave |$p$|-values of 0|$\cdot$|015 and 0|$\cdot$|080, respectively. In this case only the proposed test rejects the null hypothesis at the 0|$\cdot$|01 level, and the inference does not seem sensitive to the zero-replacement values.
6.2. Analysis of Crohn’s disease microbiome data
Crohn’s disease is a type of inflammatory bowel disease characterized by altered gut bacterial composition, whose etiology appears multifactorial and remains poorly understood. We analyse a dataset from a longitudinal study of 90 pediatric Crohn’s disease patients reported by Lewis et al. (2015). Among these patients, 26 were classified as responders to anti-tumour necrosis factor therapy, where response to therapy was defined as a reduction in faecal calprotectin, FCP, concentration to |$250\,\mu$|g/g or below among those with baseline FCP greater than |$250\,\mu$|g/g. Twenty-four of the responders had stool samples collected at four time-points: baseline, and then 1 week, 4 weeks and 8 weeks into therapy. The bacterial composition was quantified using shotgun metagenomic sequencing and the MetaPhlAn package (Segata et al., 2012), yielding 43 genera that appeared in at least three samples across all time-points. As the read counts were not available, zero proportions were replaced by half or 10% of the minimum nonzero proportions in the dataset.
To determine the effect of the therapy among responders, we applied various paired tests to test for changes in gut microbiome composition between baseline and the three later time-points. As shown in Table 4, the |$p$|-values for the comparison between baseline and week 8 from all tests were significant or close to significant, with the strongest evidence provided by the proposed test. The comparisons at the two earlier time-points did not yield decisive conclusions. These inferences do not seem sensitive to the zero-replacement strategies. The empirical power curves based on bootstrap subsamples in Fig. 1(c) exhibit more substantial power gains of the proposed test over the competitors with smaller sample sizes. Moreover, the histogram of |$p$|-values in Fig. 1(d) indicates that the proposed test survives the back-testing, where the observations at two time-points were randomly interchanged for each subject in the bootstrap samples. Our results provide further support for the effect of the therapy on gut microbiome composition through reduced inflammation and suggest that it may take longer for the intestinal dysbiosis to be resolved.
Comparison | Zero replacement by half: Proposed | Zero replacement by half: Log | Zero replacement by half: Raw | Zero replacement by 10%: Proposed | Zero replacement by 10%: Log | Zero replacement by 10%: Raw
---|---|---|---|---|---|---
Baseline versus week 1 | 0·119 | 0·605 | 0·757 | 0·141 | 0·611 | 0·757
Baseline versus week 4 | 0·460 | 0·553 | 0·468 | 0·373 | 0·684 | 0·468
Baseline versus week 8 | 0·014 | 0·033 | 0·082 | 0·018 | 0·058 | 0·082
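The back-testing for paired samples, in which the two time-points are randomly interchanged for each subject, can be sketched as follows. The paired statistic used here, a max-type statistic on within-subject differences with a type I extreme value calibration, is an assumption made for illustration; the names `paired_max_pvalue` and `backtest_pvalues` and the Gaussian toy data are hypothetical.

```python
import math
import random

def paired_max_pvalue(diffs):
    """P-value of an assumed max-type statistic on per-subject differences."""
    n, p = len(diffs), len(diffs[0])  # requires p >= 2
    stat = 0.0
    for j in range(p):
        col = [d[j] for d in diffs]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # divisor n
        stat = max(stat, n * mean ** 2 / var)
    x = stat - 2 * math.log(p) + math.log(math.log(p))
    # Assumed limit law: P(M <= x) -> exp(-pi^{-1/2} exp(-x/2)).
    return -math.expm1(-math.exp(-x / 2) / math.sqrt(math.pi))

def backtest_pvalues(before, after, n_boot=200, seed=0):
    """Bootstrap subjects and swap their two time-points at random."""
    rng = random.Random(seed)
    n = len(before)
    pvals = []
    for _ in range(n_boot):
        diffs = []
        for _ in range(n):
            i = rng.randrange(n)                        # resample a subject
            sign = 1.0 if rng.random() < 0.5 else -1.0  # swap flips the sign
            diffs.append([sign * (a - b)
                          for a, b in zip(after[i], before[i])])
        pvals.append(paired_max_pvalue(diffs))
    return pvals

# Toy paired data with a genuine shift between time-points (not real data).
rng = random.Random(2)
p, n = 10, 24
before = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
after = [[b + rng.gauss(0.5, 1) for b in row] for row in before]
pvals = backtest_pvalues(before, after)
print(sum(pvals) / len(pvals))
```

Interchanging the two time-points only changes the sign of a subject's difference vector, so the procedure removes any systematic change while keeping the within-subject variability; a roughly flat histogram of `pvals` is what surviving the back-testing looks like.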
7. Discussion
We have shown that it is possible to develop tests for high-dimensional parameters of the log basis variables from which compositional data are derived, even though the bases are not observed. In this regard, our method extends the scope of the log-ratio transformation methodology due to Aitchison (1982). The mild assumption that |$\|\delta\|_1=o(p)$| for the proposed test to achieve the minimax optimal rate is due to the use of centred log-ratio variables as a proxy for the latent log basis variables, and bears a resemblance to an approximate identifiability condition for large covariance estimation from compositional data considered in Cao et al. (2016).
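The role of the condition |$\|\delta\|_1=o(p)$| can be illustrated by a short calculation. It assumes, as a reading of the notation rather than something restated in this section, that |$\delta$| is the difference between the two latent log basis mean vectors; the symbol |$\nu^{(k)}$| for the centred log-ratio mean vector of group |$k$| is introduced here only for the illustration.

```latex
\begin{align*}
\nu_j^{(1)} - \nu_j^{(2)}
  &= \delta_j - \bar\delta,
  \qquad \bar\delta = p^{-1}\sum_{k=1}^p \delta_k, \\
\max_{1\leqslant j\leqslant p}
  \bigl|\bigl(\nu_j^{(1)} - \nu_j^{(2)}\bigr) - \delta_j\bigr|
  &= |\bar\delta| \leqslant p^{-1}\|\delta\|_1 .
\end{align*}
```

Under these assumptions, each component of the centred log-ratio mean difference deviates from the corresponding component of |$\delta$| by at most |$p^{-1}\|\delta\|_1$|, which vanishes when |$\|\delta\|_1=o(p)$|.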
Our testing framework may be extended in at least two directions. First, it would be worthwhile to exploit the covariance structure of compositional data for power enhancement, by borrowing ideas of Cai et al. (2014). Such an extension, however, seems nontrivial owing to the singularity of the centred log-ratio covariance matrix. Second, in addition to the global test developed in this paper, a multiple testing procedure with accurate error control would be helpful for identifying specific taxa that differ significantly between groups and contribute to the outcome of interest.
Acknowledgement
This research was supported in part by the U.S. National Institutes of Health and the National Natural Science Foundation of China. We thank the associate editor and three reviewers for their very helpful comments.
Appendix
Proofs of the theoretical results
We first introduce some notation. For a matrix |$A=(a_{ij})_{p\times p}$|, denote by |$\|A\|_1$| and |$\|A\|_{\max}$| the matrix 1-norm and entrywise |$\ell_\infty$|-norm, respectively, i.e., |$\|A\|_1=\max_{1{\leqslant} j{\leqslant} p}\sum_{i=1}^p|a_{ij}|$| and |$\|A\|_{\max}=\max_{1{\leqslant} i,j{\leqslant} p}|a_{ij}|$|. Write |$a_{i\cdot}=p^{-1}\sum_{j=1}^p a_{ij}$| and |$a_{\cdot\cdot}=p^{-2}\sum_{i=1}^p\sum_{j=1}^p a_{ij}$|. We will use |$C_1,C_2,\ldots>0$| to denote generic constants, whose values may vary from line to line.
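As a small numerical illustration of this notation, the following sketch computes the four quantities for a |$2\times 2$| matrix. The function names are hypothetical and the matrix is a made-up example.

```python
def matrix_one_norm(a):
    """Matrix 1-norm: maximum absolute column sum."""
    p = len(a)
    return max(sum(abs(a[i][j]) for i in range(p)) for j in range(p))

def max_norm(a):
    """Entrywise l-infinity norm: largest absolute entry."""
    return max(abs(x) for row in a for x in row)

def row_means(a):
    """a_{i.}: average of row i over its p entries."""
    return [sum(row) / len(row) for row in a]

def grand_mean(a):
    """a_{..}: average of all p^2 entries."""
    p = len(a)
    return sum(sum(row) for row in a) / p ** 2

A = [[1.0, -2.0],
     [3.0, 4.0]]
print(matrix_one_norm(A))  # 6.0, from the second column: |-2| + |4|
print(max_norm(A))         # 4.0
print(row_means(A))        # [-0.5, 3.5]
print(grand_mean(A))       # 1.5
```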
Proof of Proposition 1
This and Condition 1 imply (i).
This and Condition 3 imply (iii) and thus complete the proof.
Proof of Proposition 2
Combining (A6) with (A7), we arrive at (11).
The proof is completed by invoking the following lemma, which recaps a concentration result in Bickel & Levina (2008), and using the fact that |$\hat\gamma_{jj}=n_1\hat\gamma_{jj}^{(1)}/(n_1+n_2)+n_2\hat\gamma_{jj}^{(2)}/(n_1+n_2)$|.
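The decomposition of |$\hat\gamma_{jj}$| quoted in this step can be checked numerically for a single component |$j$|. The sketch assumes that each group variance uses divisor |$n_k$| and that the pooled variance divides the within-group sums of squares by |$n_1+n_2$|; these conventions are inferred from the weights in the identity and are not restated in this section. The function names and toy values are hypothetical.

```python
def var_n(values):
    """Variance with divisor n (not n - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def pooled_var(y1, y2):
    """Within-group sums of squares divided by n1 + n2."""
    m1 = sum(y1) / len(y1)
    m2 = sum(y2) / len(y2)
    ss = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    return ss / (len(y1) + len(y2))

# One component observed in two groups (made-up values).
y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [2.0, 4.0]
n1, n2 = len(y1), len(y2)

weighted = n1 * var_n(y1) / (n1 + n2) + n2 * var_n(y2) / (n1 + n2)
print(weighted, pooled_var(y1, y2))  # both equal 7/6 up to rounding
```

The identity lets a concentration bound for each group variance carry over directly to the pooled estimate, since the weights are nonnegative and sum to one.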
Proof of Theorem 1
Then (A9) is proved by applying the following lemma, which follows from the same arguments as those for Lemma 6 of Cai et al. (2014), and letting |$m\to\infty$|.
Proof of Theorem 2
This implies (A11) and completes the proof.
References