I. M. Johnstone, B. Nadler, Roy’s largest root test under rank-one alternatives, Biometrika, Volume 104, Issue 1, March 2017, Pages 181–193, https://doi.org/10.1093/biomet/asw060
SUMMARY
Roy’s largest root is a common test statistic in multivariate analysis, statistical signal processing and allied fields. Despite its ubiquity, provision of accurate and tractable approximations to its distribution under the alternative has been a longstanding open problem. Assuming Gaussian observations and a rank-one alternative, or concentrated noncentrality, we derive simple yet accurate approximations for the most common low-dimensional settings. These include signal detection in noise, multiple response regression, multivariate analysis of variance and canonical correlation analysis. A small-noise perturbation approach, perhaps underused in statistics, leads to simple combinations of standard univariate distributions, such as central and noncentral |$\chi^2$| and |$F$|. Our results allow approximate power and sample size calculations for Roy’s test for rank-one effects, which is precisely where it is most powerful.
1. INTRODUCTION
Hypothesis testing plays an important role in the analysis of multivariate data. Classical examples include the multiple response linear model, principal component and canonical correlation analysis, and other methods which together form the main focus of standard multivariate texts, e.g. Anderson (2003), Mardia et al. (1979) and Muller & Stewart (2006). They find widespread use in signal processing, social sciences and many other domains.
Under multivariate Gaussian assumptions, in all these cases the associated hypothesis tests can be formulated in terms of either one or two independent Wishart matrices. These are conventionally denoted by |$H$|, for hypothesis, and |$E$|, for error, depending on whether the covariance matrix |$\Sigma$| is known or unknown; in the latter case |$E$| serves to estimate |$\Sigma$|.
James (1964) provided a remarkable five-way classification of the distribution theory associated with these problems. Elements of the classification are indicated in Table 1, along with some representative applications. Departure from the null hypothesis is captured by a matrix |$\Omega$|, so that the testing problem might be |$ \mathcal H_0:\,\Omega=0$| versus |$\mathcal H_1:\, \Omega \neq 0$|. Depending on the particular application, the matrix |$\Omega$| captures the difference in group means, or the number of signals or canonical correlations and their strengths. In the absence of detailed knowledge about the structure of |$\Omega$| under |$\mathcal H_1$|, group invariance arguments show that generic tests depend on the eigenvalues of either |$\Sigma^{-1} H$| or |$E^{-1} H$|, e.g., Muirhead (1982, Ch. 6).
| Case | Multivariate distribution | Distribution for dimension |$m=1$| | Dimension |$m>1$| | Application |
|---|---|---|---|---|
| 1 | |$\,_0F_0$| | |$\chi^2$| | |$H\sim W_m(n_H,\Sigma+\Omega)$|, |$\Sigma$| known | Signal detection in noise, known covariance matrix |
| 2 | |$\,_0F_1$| | noncentral |$\chi^2$| | |$H\sim W_m(n_H,\Sigma,\Omega)$|, |$\Sigma$| known | Equality of group means, known covariance matrix |
| 3 | |$\,_1F_0$| | |$F$| | |$H\sim W_m(n_H,\Sigma+\Omega)$|, |$E\sim W_m(n_E,\Sigma)$| | Signal detection in noise, estimated covariance matrix |
| 4 | |$\,_1F_1$| | noncentral |$F$| | |$H\sim W_m(n_H,\Sigma,\Omega)$|, |$E\sim W_m(n_E,\Sigma)$| | Equality of group means, estimated covariance matrix |
| 5 | |$\,_2F_1$| | Correlation coeff. |$r^2/(1-r^2)$|; |$t$|-distribution | |$H\sim W_p(q,\Sigma,\Omega)$|, |$E\sim W_p(n-q,\Sigma)$|, |$\Omega$| itself random | Canonical correlation analysis between two groups of sizes |$p\leq q$| |
James’s classification of eigenvalue distributions was based on hypergeometric functions |$_aF_b$| of matrix argument; their univariate analogues are shown in column 3. Column 4 details the corresponding Wishart assumptions for the sum of squares and cross products matrices; the rightmost column gives a nonexhaustive list of sample applications.
The most common tests fall into two categories. The first consists of linear statistics which depend on all the eigenvalues and are expressible in the form |$\sum_i f(\ell_i)$| for some univariate function |$f$|. This class includes the three ubiquitous test statistics: Wilks’s |$U$|, the Bartlett–Nanda–Pillai |$V$| and the Lawley–Hotelling |$W$|.
The second category involves functions of the extreme eigenvalues, the first few largest and smallest. Here we focus on the largest root statistic, based on |$\ell_1$|, which arises systematically in multivariate analysis as the union-intersection test (Roy, 1957). To summarize extensive simulations by Schatzoff (1966) and Olson (1974), Roy’s test is the most powerful of the common tests when the alternative is of rank one, that is, has concentrated noncentrality. For fixed dimension, Kritchman & Nadler (2009) showed asymptotic optimality, in sample size, of Roy’s test against rank-one alternatives.
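All four common statistics are simple functions of the eigenvalues of |$E^{-1}H$|. A minimal numerical sketch (function and key names are ours, not from the paper):

```python
import numpy as np

def mv_test_statistics(H, E):
    """Classical multivariate test statistics as functions of the
    eigenvalues l_1 >= l_2 >= ... of E^{-1} H."""
    ell = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]
    return {
        "wilks_U": float(np.prod(1.0 / (1.0 + ell))),    # Wilks's U
        "pillai_V": float(np.sum(ell / (1.0 + ell))),    # Bartlett-Nanda-Pillai V
        "lawley_hotelling_W": float(np.sum(ell)),        # Lawley-Hotelling W
        "roy_l1": float(ell[0]),                         # Roy's largest root
    }
```

The first three are the linear statistics |$\sum_i f(\ell_i)$| (after a logarithm, in the case of |$U$|); Roy's test retains only |$\ell_1$|.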
We briefly contrast the state of knowledge regarding approximate distributions, both null and alternative, for the two categories of test statistics. Tests based on linear statistics have long had adequate approximations. In particular, for the two-matrix cases, approximations using an |$F$| distribution are traditional and widely available in software: central |$F$| under the null (SAS, 1999) and noncentral |$F$| under the alternative; see Muller & Peterson (1984), Muller et al. (1992) and a 1999 unpublished manuscript by R. G. O’Brien and G. Shieh, discussed in the Supplementary Material. Saddlepoint approximations (Butler & Wood, 2005; Butler & Paige, 2010) are also available.
For Roy’s largest root test, the situation is less complete. Under the null hypothesis, evaluation or approximation of the distribution is important, for example, to determine the critical values of Roy’s test. Some options are listed in the Supplementary Material.
In contrast, under the alternative, derivation of a simple approximation to the distribution of |$\ell_1$| has remained a longstanding problem. In principle, the distribution of the largest eigenvalue has an exact representation in terms of a hypergeometric function of matrix argument. Despite recent advances in the numerical evaluation of these special functions (Koev & Edelman, 2006), unless the dimension and sample size are less than |$15$|, say, these formulae are challenging to evaluate numerically. In addition, to date, for dimension |$m>2$|, no acceptable method has been developed for transforming Roy’s largest root statistic to an |$F$| or a |$\chi^2$| statistic, and no straightforward method exists for computing the power of Roy’s test itself, as noted by Anderson (2003, p. 332), Muller et al. (1992) and O’Brien & Shieh.
We develop simple and quite accurate approximations to the distribution of |$\ell_1$| for the classic problems of multivariate analysis, under a rank-one alternative. Under this concentrated noncentrality alternative, the noncentrality matrix has the form |$\Omega = \omega ww^{{\mathrm{\scriptscriptstyle T}}}, \omega > 0$|, where |$w \in\mathbb{R}^m$| is an arbitrary and unknown unit-norm vector. This setting, in which |$\ell_1$| is approximately the most powerful test, may be viewed as a specific form of sparsity, indicating that the effect under study can be described by relatively few parameters.
Our approach keeps |$(m, n_H, n_E)$| fixed. We study the limit of a large noncentrality parameter or, equivalently, small noise. While small-noise perturbation is a classical method in applied mathematics and mathematical physics, it has apparently seen less use in statistics. Some exceptions, mostly focused on other multivariate problems, include Kadane (1970, 1971), Anderson (1977), Schott (1986), Nadler & Coifman (2005) and Nadler (2008).
Our small-noise analysis uses tools from matrix perturbation theory and yields an approximate stochastic representation for |$\ell_1$|. In concert with standard Wishart results, we deduce its approximate distribution for the five cases of Table 1 in Propositions 1 through 5. The expressions obtained can be readily evaluated numerically, typically via a single univariate integration. Code for the resulting distributions and their power is provided in the Supplementary Material.
The results of this paper can aid power analyses and sample size design in exactly those settings in which Roy’s test may be most appropriate, namely when the relevant alternatives are thought to be predominantly of rank-one. Table 2 gives a small illustrative example for the classical problem of comparison of group means, setting 4 in Table 1. Here, we observe 20 multivariate samples from each of |$p=6$| different groups, and test a null hypothesis that their group means are all equal. Under the alternative, the |$k= 1, \ldots, p$| group means are assumed to vary as multiples |$\mu_k = k \tau \mu_0$| of a fixed vector |$ \mu_0=(1,1,1,-1,-1,-1)\in\mathbb{R}^6$|, with scale factor |$\tau$|. The noise covariance matrix is |$\Sigma=(1-\rho)I+\rho 1 1^{{\mathrm{\scriptscriptstyle T}}}$|. This setting leads to a rank-one noncentrality matrix for which, assuming Gaussian observations, Proposition 4 of § 3 applies.
| Power | |$\tau =$| 0|$\cdot$|09 | |$\tau =$| 0|$\cdot$|11 | |$\tau =$| 0|$\cdot$|13 |
|---|---|---|---|
| Pillai trace, |$\rho = 0$| | 25|$\cdot$|1 | 46|$\cdot$|7 | 70|$\cdot$|3 |
| Largest root, |$\rho = 0$| | 34|$\cdot$|9 | 63|$\cdot$|8 | 86|$\cdot$|9 |
| Approximation from Proposition 4 | 30|$\cdot$|0 | 60|$\cdot$|1 | 85|$\cdot$|4 |
| Pillai trace, |$\rho =$| 0|$\cdot$|3 | 43|$\cdot$|8 | 71|$\cdot$|9 | 91|$\cdot$|2 |
| Largest root, |$\rho =$| 0|$\cdot$|3 | 60|$\cdot$|4 | 88|$\cdot$|1 | 98|$\cdot$|3 |
| Approximation from Proposition 4 | 56|$\cdot$|5 | 86|$\cdot$|8 | 98|$\cdot$|1 |
Table 2 compares the power of multivariate tests at three signal strengths and two correlation models, at level |$1\%$|. The Pillai trace |$V$| is chosen as representative of the three tests that use all the roots. Especially in the two stronger signal settings, Roy’s largest root test makes the difference between a plausible experiment and an underpowered study. For a real multivariate example in which Roy’s test is argued to be most appropriate, see Hand & Taylor (1987, Study C). In other cases, past experience may suggest that an approximately rank-one alternative is plausible.
Simulations like this can suggest the magnitude of improvement possible in selected cases, but the essence of power and sample size analysis is the comparison of a range of scenarios thought to encompass the likely experimental setting. For this, relatively simple approximate formulas such as those derived in this paper are invaluable.
It is not the purpose of this paper to argue for a general and unexamined use of Roy’s test. It is well-established that there is no uniformly best test, and in particular settings issues of robustness with respect to nonnormality already studied by, for example, Olson (1974) may be important. Instead, when there is interest in the performance of the largest root in the rank-one Gaussian cases where it should shine, we provide approximations that have long been lacking.
2. DEFINITIONS AND TWO APPLICATIONS
We present two applications, one from multivariate statistics and the other from signal processing, that illustrate settings 1–4 of Table 1. Following Muirhead (1982, p. 441), we recall that if |$z_i {\sim} N_m(\mu_i, \Sigma)$| for |$i=1, \ldots, n$| are independent, |$Z^{{\mathrm{\scriptscriptstyle T}}} = (z_1, \ldots, z_n)$| and |$M^{{\mathrm{\scriptscriptstyle T}}} = ( \mu_1, \ldots, \mu_n),$| then the |$m \times m$| matrix |$A = Z^{{\mathrm{\scriptscriptstyle T}}} Z$| is said to have the noncentral Wishart distribution |$W_m(n, \Sigma, \Omega)$| with |$n$| degrees of freedom, covariance matrix |$\Sigma$| and noncentrality matrix |$\Omega = \Sigma^{-1} M^{{\mathrm{\scriptscriptstyle T}}} M$|. When |$\Omega = 0$|, the distribution is a central Wishart, |$W_m(n, \Sigma)$|. We denote by |$\chi^2_n$| a random variate following the chi-squared distribution with |$n$| degrees of freedom, and by |$\chi^2_n(\delta)$| a variate following the noncentral chi-squared distribution with noncentrality parameter |$\delta$|.
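The constructive definition above translates directly into a sampler; the function name is ours, and the check against the mean |$E(A) = n\Sigma + M^{{\mathrm{\scriptscriptstyle T}}} M$| is a standard Wishart identity:

```python
import numpy as np

def rnoncentral_wishart(n, Sigma, M, rng):
    """One draw of A = Z^T Z, where the rows z_i ~ N_m(mu_i, Sigma) are
    independent and M^T = (mu_1, ..., mu_n); then A ~ W_m(n, Sigma, Omega)
    with noncentrality Omega = Sigma^{-1} M^T M (Muirhead, 1982, p. 441)."""
    m = Sigma.shape[0]
    L = np.linalg.cholesky(Sigma)          # Sigma = L L^T
    Z = M + rng.standard_normal((n, m)) @ L.T
    return Z.T @ Z
```

Averaging many draws recovers |$n\Sigma + M^{{\mathrm{\scriptscriptstyle T}}} M$|, a quick consistency check on the parametrization.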
Consider first the one-way multivariate analysis of variance model, in which the observations in group |$k$| are |$x_i = \mu_k + \xi_i$| for |$i \in I_k$|, |$k = 1, \ldots, p$|. Here |$\xi_i {\sim}\mathcal{N}_m (0,\Sigma)$| are independent with the error covariance |$\Sigma$| assumed to be the same for all groups, and the indices |$\{1, \ldots, n \} = I_1 \cup \cdots \cup I_p$|, with |$n_k = |I_k|$| and |$n = n_1 + \cdots + n_p$|. We test the equality of group means, |$\mathcal{H}_0: \mu_1 = \cdots = \mu_p$|, versus the alternative |$\mathcal{H}_1$| that the |$\mu_k$| are not all equal. Known |$\Sigma$| leads to setting 2. When |$\Sigma$| is unknown we obtain setting 4.
A rank-one noncentrality matrix is obtained if we assume that under the alternative, the means of the different groups are all proportional to the same unknown vector |${\mu}_0$|, with each multiplied by a group-dependent strength parameter, that is, |${\mu}_k = s_k {\mu}_0$|. This yields a rank-one noncentrality matrix |$\Omega = \omega {\Sigma}^{-1}{\mu}_0 {\mu}_0^{{\mathrm{\scriptscriptstyle T}}}$|, where |$\bar{s} = n^{-1} \sum_k n_k s_k$| and |$\omega = \sum_{k=1}^p n_k (s_k - \bar{s})^2$|. As discussed in the Supplementary Material, comparison of group means is but a particular example of the more general and ubiquitous multiple linear regression setting, to which, under rank-one alternatives and Gaussian observations, our results apply.
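The noncentrality scale |$\omega$| above is elementary to compute; a sketch (the function name is ours):

```python
import numpy as np

def rank_one_noncentrality(n_k, s_k):
    """omega = sum_k n_k (s_k - s_bar)^2, with s_bar = n^{-1} sum_k n_k s_k
    the size-weighted mean of the group strengths."""
    n_k, s_k = np.asarray(n_k, float), np.asarray(s_k, float)
    s_bar = (n_k * s_k).sum() / n_k.sum()
    return float((n_k * (s_k - s_bar) ** 2).sum())
```

Note that |$\omega$| is invariant to a common shift of the |$s_k$| and scales quadratically in them, so only relative group strengths matter.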
3. MAIN RESULTS FOR RANK-ONE ALTERNATIVES
3.1. Setting
Under the null hypothesis in all four settings, |$\Omega = 0$| and thus |$H$| has a central |$W_m(n_H, \Sigma)$| distribution. Settings (1, 2) and (3, 4) differ by the presence or absence of |$E \sim W_m(n_E, \Sigma)$|. An exact or approximate threshold |$t_\alpha$| may be found by the methods referenced in the Supplementary Material.
As previously discussed, the focus of this paper is on the power |$P_D$| under rank-one alternatives. To this end, we present, in five propositions, simple approximate expressions for the distribution of Roy’s largest root statistic |$\ell_1$| for all five settings described in Table 1, under a rank-one alternative. Detailed proofs appear in the Supplementary Material.
3.2. Single Wishart matrix
We begin with settings 1 and 2, where the matrix |$\Sigma$| is assumed to be known. Then, without loss of generality we study the largest eigenvalue of |$\Sigma^{-1/2}H\Sigma^{-1/2}$|. In setting 1, this matrix is distributed as |$W_{m}(n_H,\sigma^2 I+\lambda_H ww^{{\mathrm{\scriptscriptstyle T}}})$|, for some suitable vector |$w\in\mathbb{R}^m$|.
If in the above propositions |$\sigma^2$| is held fixed along with |$m$| and |$n_H$|, and instead we suppose that |$\lambda_H\to\infty$| or |$\omega\to\infty$|, then the same expansions hold, but now with error terms |$o_{\rm p}(1/\lambda_H)$| or |$o_{\rm p}(1/\omega)$|. Lest it be thought unrealistic to base approximations on large |$\lambda_H$|, small |$\sigma$| or, in setting 5, |$\rho$| near 1, we note that in standard simulation settings these regimes correspond to levels of power conventionally regarded as desirable; see § 4. Indeed, in these cases weaker signals would not be acceptably detectable.
In setting 2, we simply replace |$n_H \lambda_H$| by |$\omega$| in the first term, and the denominator of the third term by |$\omega + \sigma^2(n_H-2)$|.
Thus, for |$\lambda_H \gg 1$|, the fluctuations of |$\ell_1$| in setting 2 are significantly smaller. While beyond the scope of this paper, this result has implications for the detection power of Gaussian signals versus those of constant modulus.
3.3. Two Wishart matrices
Next, we consider the two-matrix case, where |$\Sigma$| is estimated from data.
In the |$n_E\to\infty$| limit, the |$F$|-variates in (8) and (11) converge to |$\chi^2$| variates and we recover the first two terms in the approximations of Propositions 1 and 2, with |$\sigma^2 = 1$| held fixed.
In (8) and (11), |$\approx$| denotes a stochastic approximation whose nature we discuss briefly here. When |$m=1$|, we have |$c_2 = c_3 = 0$| and the first term gives the exact distribution of |$H/E$| for both Propositions 3 and 4. For |$m > 1$|, in setting 4, we note that to leading order |$\ell_1 = O_{\rm p} \{ (\omega + n_H)/n_E \}$|, whereas the errors arise from ignoring terms |$O_{\rm p}(\omega^{-1/2})$| and higher in an eigenvalue expansion and replacing stochastic terms of order |$O_{\rm p} \{ (\omega + n_H)^{1/2} m^{1/2} n_E^{-3/2} \}$| by their expectations. The corresponding statements apply to setting 3 if we replace |$\omega + n_H$| by |$\lambda_H n_H$| and |$\omega^{-1/2}$| by |$\lambda_H^{-1/2}$|. Detailed discussion appears in the Supplementary Material.
We now turn to expressions for |$E(\ell_1)$| and |$\text{var}(\ell_1)$| in settings 3 and 4, analogous to (6) and (7).
In setting 3, |$\omega$| is replaced by |$\lambda_H n_H$| and in (13) the term |$n_{H} + 2 \omega$| is increased to |$n_H(\lambda_H +1)^2$|.
Let |$\hat \Sigma = n_E^{-1} E$| be an unbiased estimator of |$\Sigma$|. Comparison with Propositions 1 and 2 shows that in expectation |$\ell_1( \hat \Sigma^{-1} H)$| exceeds |$\ell_1( \Sigma^{-1} H)$| by a multiplicative factor close to |$n_E/(n_E - m - 1)$|. Hence, the largest eigenvalue of |$n_E E^{-1}H$| is typically larger than that of |${ \Sigma}^{-1} H$|. Again, the fluctuations of |$\ell_1$| in setting 4 are smaller than for signal detection, setting 3.
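The multiplicative factor |$n_E/(n_E - m - 1)$| stems from the mean of an inverse Wishart matrix, |$E\{(E/n_E)^{-1}\} = n_E (n_E - m - 1)^{-1} \Sigma^{-1}$|. A quick Monte Carlo illustration of that identity (our own sketch, with |$\Sigma = I$|):

```python
import numpy as np

def mean_inv_scaled_wishart(m, n_E, reps=4000, seed=1):
    """Monte Carlo mean of (E/n_E)^{-1} for E ~ W_m(n_E, I);
    theory gives n_E / (n_E - m - 1) times the identity."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((m, m))
    for _ in range(reps):
        Z = rng.standard_normal((n_E, m))
        acc += np.linalg.inv(Z.T @ Z / n_E)   # E = Z^T Z ~ W_m(n_E, I)
    return acc / reps
```

For |$m=3$| and |$n_E=20$| the factor is |$20/16 = 1{\cdot}25$|, already a noticeable upward bias.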
Nadakuditi & Silverstein (2010) studied the large-parameter limiting value, but not the distribution, of |$\ell_1( E^{-1} H)$| as |$m/n_E \rightarrow c_E $| and |$ m/n_H \to c_H$|, also in non-Gaussian cases. In this limit, our formula (12) agrees, to leading order, with the large-|$\lambda_H$| limit of their expression (equation (23)). Hence, our analysis shows that the limits for the mean of |$\ell_1(E^{-1}H)$| are quite accurate even at moderate values of |$m$|, |$n_E$| and |$n_H$|. This is also reflected in our simulations in the Supplementary Material.
3.4. Canonical correlation analysis
The population and sample canonical correlation coefficients, denoted by |$\rho_1,\ldots,\rho_p$| and |$r_1,\ldots,r_p$|, are the positive square roots of the eigenvalues of |$\Sigma_{11}^{-1} \Sigma_{12}\Sigma_{22}^{-1} \Sigma_{21}$| and |$S_{11}^{-1} S_{12}S_{22}^{-1} S_{21}$|. We study the distribution of the largest sample canonical correlation, in the presence of a single large population correlation coefficient, |$\rho_1 > 0$| and |$ \rho_2=\cdots=\rho_p=0$|.
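Numerically, the sample canonical correlations are most stably obtained not from the eigenvalue formulation above but from an equivalent singular value decomposition; a sketch (function name ours):

```python
import numpy as np

def sample_canonical_correlations(X, Y):
    """Sample canonical correlations r_1 >= r_2 >= ...: singular values of
    Qx^T Qy, where Qx, Qy are orthonormal bases for the centred columns of
    X and Y.  These equal the square roots of the eigenvalues of
    S11^{-1} S12 S22^{-1} S21."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)
```

Since the columns of an invertible linear transform of |$X$| span the same space as |$X$| itself, such a transform yields |$r_i = 1$| for all |$i$|, a convenient sanity check.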
To state our final proposition, we need a modification of the noncentral |$F$| distribution that is related to the squared multiple correlation coefficient.
In the first, |$F_{a,b;\omega}$| is the noncentral |$F$| distribution with noncentrality parameter |$\omega$| and |$p_n$| is the density of |$\chi_n^2$|: this is just the definition. In the second, |$p_K$| is the discrete probability density function of a negative binomial variate with parameters |$(n/2,c)$|: this is an analogue of the more familiar representation of noncentral |$F_{a,b;\omega}$| as a mixture of |$F_{a+ 2k,b}$| with Poisson weights with parameter |$\omega/2$|. The equality above may be verified directly or from Muirhead (1982, p. 175ff), which also gives an expression for the |$F^\chi$| distribution in terms of the Gauss hypergeometric function |${}_2 F_1$|.
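The more familiar Poisson-mixture representation invoked above is easy to check numerically, and the negative binomial mixture for |$F^\chi$| follows the same pattern with different weights. A sketch of the Poisson case (function name ours):

```python
import numpy as np
from scipy import stats

def ncf_cdf_mixture(x, a, b, omega, kmax=200):
    """P(F_{a,b;omega} <= x) via the mixture representation: since
    chi2_a(omega) =d chi2_{a+2K} with K ~ Poisson(omega/2), we have
    F_{a,b;omega} =d ((a + 2K)/a) * F_{a+2K, b}."""
    k = np.arange(kmax)
    w = stats.poisson.pmf(k, omega / 2.0)
    return float(np.sum(w * stats.f.cdf(a * x / (a + 2 * k), a + 2 * k, b)))
```

The mixture sum matches the noncentral |$F$| distribution function directly, and at |$\omega = 0$| collapses to the central |$F$|.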
When |$p=1$|, the quantity |$r_1^2$| reduces to the squared multiple correlation coefficient, or coefficient of determination, between a single response variable and |$q$| predictor variables. Equation (14) then reduces to a single term |$\{ q/(n-q) \} F_{q,n-q}^\chi(c,n)$|, which is in fact the exact distribution of |$r_1^2$| in this setting (Muirhead, 1982, p. 173).
4. SIMULATIONS
4.1. Empirical densities
We present a series of simulations that support our theoretical analysis and illustrate the accuracy of our approximations. For different signal strengths we generate 2 500 000 independent realizations of the two matrices |$E$| and |$H$| and record the largest eigenvalue |$\ell_1$|. We also compute its approximate density and the power of Roy’s test via Propositions 1–5.
The top row of Fig. 1 compares the empirical density of |$\{\ell_1(H) - E(\ell_1) \}/\sigma(\ell_1)$| in the signal detection and multivariate analysis of variance cases to the theoretical formulas, (3) and (4), respectively. In this simulation, where all parameter values are small, the theoretical approximation is remarkably accurate, far more so than the classical asymptotic Gaussian approximation. The latter would be valid in the large-parameter limit with |$m, n_H \to \infty$| and |$m/n_H \to c > 0$|, so long as |$\lambda_H > c^{1/2}$| (Baik et al., 2005; Paul, 2007).
Further figures showing the relative accuracy of these approximations appear in the Supplementary Material. The relative error in the top panels of Fig. 1 is less than 0|$\cdot$|02 except in the left tail, which is excluded in approximations to power, |$G_1(x) = \text{pr} (\ell_1 > x)$|. This is consistent with the theoretical analysis in the proof of the power approximation (5) and with plots of the relative error in the approximation to |$G_1(x)$|, both given in the Supplementary Material.
The bottom row of Fig. 1 corresponds to the two-matrix case and the approximate density of |$\ell_1(E^{-1}H)$|, after normalization, in settings 3 and 4. As expected from the analysis, the density is skewed. The approximated density matches quite closely the empirical one, and both are far from the nominal limiting Gaussian density. The relative error in the bottom panels is less than 0|$\cdot$|05 except in the left tail and, in setting 4, also in the right tail; see the Supplementary Material.
4.2. Power calculations
We conclude this section with a comparison of the empirical detection probability (2) of Roy’s test to the theoretical formulas. We first consider the multivariate analysis of variance setting. Table 3 compares the theoretical power, which follows from our Proposition 4, to the results of simulations. Each entry in the table is the result of 2 000 000 independent random realizations of matrices |$H$| and |$E$|, with |$\Sigma=I$| and different noncentrality values |$\omega$|. The parameters in the table are a subset of those studied by Olson (1974). Two features are apparent from the table: first, our approximations are quite accurate for small sample size and dimension, and become less accurate as the dimension increases. This is to be expected given that the leading error terms in our expansion are of the form |$O({m}^{1/2})$|. Second, the approximation is relatively more accurate at high powers, say larger than |$80%$|, which fortunately are those most relevant to the design of studies in practice. This too is expected, as our approximation is based on a high signal-to-noise ratio, and is valid when no eigenvalue crossover has occurred, meaning that the largest eigenvalue is not due to large fluctuations in the noise. At the other extreme, when the signal strength is weak, our approximation of power is usually conservative since we do not model the case where the largest eigenvalue may arise due to large deviations of the noise.
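A Monte Carlo power estimate of this kind is straightforward to sketch, with the critical value of Roy's test itself simulated under the null. The code below is our own illustration under the Table 3 conventions (|$\Sigma = I$|, |$n_H = p-1$|, |$n_E = n-p$|, rank-one |$\Omega = \omega\, e_1 e_1^{{\mathrm{\scriptscriptstyle T}}}$|); names are ours:

```python
import numpy as np

def largest_root(rng, m, n_H, n_E, omega):
    """l_1(E^{-1}H), H ~ W_m(n_H, I, Omega) with Omega = omega * e1 e1^T,
    E ~ W_m(n_E, I)."""
    M = np.zeros((n_H, m))
    M[0, 0] = np.sqrt(omega)               # any rank-one M with tr(M^T M) = omega
    Z = M + rng.standard_normal((n_H, m))
    Ze = rng.standard_normal((n_E, m))
    H, E = Z.T @ Z, Ze.T @ Ze
    return np.linalg.eigvals(np.linalg.solve(E, H)).real.max()

def roy_power_mc(m, n_H, n_E, omega, alpha=0.05, reps=4000, seed=0):
    rng = np.random.default_rng(seed)
    null = [largest_root(rng, m, n_H, n_E, 0.0) for _ in range(reps)]
    t_alpha = np.quantile(null, 1 - alpha)  # simulated critical value
    alt = [largest_root(rng, m, n_H, n_E, omega) for _ in range(reps)]
    return float(np.mean(np.array(alt) > t_alpha))
```

For |$m=3$|, |$p=3$| groups of |$n_k=10$| (so |$n_H=2$|, |$n_E=27$|) and |$\omega=20$| at |$\alpha=5\%$|, this should land near the corresponding |$88\%$| entry of Table 3, up to Monte Carlo error.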
Finally, we consider setting 5 of canonical correlation analysis. The corresponding comparison of simulations to theory is reported in Table 4, with similar behaviour to Table 3. For simulation results for the case of detection of signals in noise, we refer to Nadler & Johnstone (2011).
| Dim. |$m$| | Groups |$p$| | Samples per group, |$n_k$| | Noncentrality |$\omega$| | |$P_D$| sim., |$\alpha=1\%$| | |$P_D$| theory | |$P_D$| sim., |$\alpha=5\%$| | |$P_D$| theory |
|---|---|---|---|---|---|---|---|
| 3 | 3 | 10 | 10 | 28|$\cdot$|3 | 27|$\cdot$|1 | 54|$\cdot$|4 | 53|$\cdot$|3 |
| 3 | 3 | 10 | 20 | 67|$\cdot$|8 | 67|$\cdot$|9 | 88|$\cdot$|2 | 88|$\cdot$|4 |
| 3 | 3 | 10 | 40 | 97|$\cdot$|5 | 97|$\cdot$|7 | 99|$\cdot$|7 | 99|$\cdot$|7 |
| 6 | 3 | 10 | 10 | 15|$\cdot$|0 | 13|$\cdot$|8 | 36|$\cdot$|9 | 33|$\cdot$|9 |
| 6 | 3 | 10 | 20 | 44|$\cdot$|1 | 42|$\cdot$|8 | 71|$\cdot$|8 | 70|$\cdot$|4 |
| 6 | 3 | 10 | 40 | 87|$\cdot$|5 | 87|$\cdot$|9 | 97|$\cdot$|5 | 97|$\cdot$|5 |
| 6 | 6 | 10 | 10 | 10|$\cdot$|4 | 6|$\cdot$|4 | 27|$\cdot$|4 | 18|$\cdot$|6 |
| 6 | 6 | 10 | 20 | 35|$\cdot$|7 | 30|$\cdot$|8 | 61|$\cdot$|3 | 55|$\cdot$|4 |
| 6 | 6 | 10 | 40 | 85|$\cdot$|0 | 83|$\cdot$|9 | 95|$\cdot$|6 | 95|$\cdot$|1 |
| 10 | 6 | 20 | 10 | 8|$\cdot$|3 | 5|$\cdot$|4 | 22|$\cdot$|9 | 14|$\cdot$|3 |
| 10 | 6 | 20 | 20 | 31|$\cdot$|2 | 25|$\cdot$|4 | 55|$\cdot$|1 | 45|$\cdot$|6 |
| 10 | 6 | 20 | 40 | 82|$\cdot$|8 | 79|$\cdot$|5 | 94|$\cdot$|0 | 91|$\cdot$|7 |
Dim. . | Groups . | Samples per . | Noncentrality . | |$P_D$| sim. . | |$P_D$| . | |$P_D$| sim. . | |$P_D$| . |
---|---|---|---|---|---|---|---|
|$m$| . | |$p$| . | group, |$n_k$| . | |$\omega$| . | |$\alpha=1%$| . | theory . | |$\alpha=5%$| . | theory . |
3 | 3 | 10 | 10 | 28|$\cdot$|3 | 27|$\cdot$|1 | 54|$\cdot$|4 | 53|$\cdot$|3 |
3 | 3 | 10 | 20 | 67|$\cdot$|8 | 67|$\cdot$|9 | 88|$\cdot$|2 | 88|$\cdot$|4 |
3 | 3 | 10 | 40 | 97|$\cdot$|5 | 97|$\cdot$|7 | 99|$\cdot$|7 | 99|$\cdot$|7 |
6 | 3 | 10 | 10 | 15|$\cdot$|0 | 13|$\cdot$|8 | 36|$\cdot$|9 | 33|$\cdot$|9 |
6 | 3 | 10 | 20 | 44|$\cdot$|1 | 42|$\cdot$|8 | 71|$\cdot$|8 | 70|$\cdot$|4 |
6 | 3 | 10 | 40 | 87|$\cdot$|5 | 87|$\cdot$|9 | 97|$\cdot$|5 | 97|$\cdot$|5 |
6 | 6 | 10 | 10 | 10|$\cdot$|4 | 6|$\cdot$|4 | 27|$\cdot$|4 | 18|$\cdot$|6 |
6 | 6 | 10 | 20 | 35|$\cdot$|7 | 30|$\cdot$|8 | 61|$\cdot$|3 | 55|$\cdot$|4 |
6 | 6 | 10 | 40 | 85|$\cdot$|0 | 83|$\cdot$|9 | 95|$\cdot$|6 | 95|$\cdot$|1 |
10 | 6 | 20 | 10 | 8|$\cdot$|3 | 5|$\cdot$|4 | 22|$\cdot$|9 | 14|$\cdot$|3 |
10 | 6 | 20 | 20 | 31|$\cdot$|2 | 25|$\cdot$|4 | 55|$\cdot$|1 | 45|$\cdot$|6 |
10 | 6 | 20 | 40 | 82|$\cdot$|8 | 79|$\cdot$|5 | 94|$\cdot$|0 | 91|$\cdot$|7 |
Dim. . | Groups . | Samples per . | Noncentrality . | |$P_D$| sim. . | |$P_D$| . | |$P_D$| sim. . | |$P_D$| . |
---|---|---|---|---|---|---|---|
|$m$| . | |$p$| . | group, |$n_k$| . | |$\omega$| . | |$\alpha=1%$| . | theory . | |$\alpha=5%$| . | theory . |
3 | 3 | 10 | 10 | 28|$\cdot$|3 | 27|$\cdot$|1 | 54|$\cdot$|4 | 53|$\cdot$|3 |
3 | 3 | 10 | 20 | 67|$\cdot$|8 | 67|$\cdot$|9 | 88|$\cdot$|2 | 88|$\cdot$|4 |
3 | 3 | 10 | 40 | 97|$\cdot$|5 | 97|$\cdot$|7 | 99|$\cdot$|7 | 99|$\cdot$|7 |
6 | 3 | 10 | 10 | 15|$\cdot$|0 | 13|$\cdot$|8 | 36|$\cdot$|9 | 33|$\cdot$|9 |
6 | 3 | 10 | 20 | 44|$\cdot$|1 | 42|$\cdot$|8 | 71|$\cdot$|8 | 70|$\cdot$|4 |
6 | 3 | 10 | 40 | 87|$\cdot$|5 | 87|$\cdot$|9 | 97|$\cdot$|5 | 97|$\cdot$|5 |
6 | 6 | 10 | 10 | 10|$\cdot$|4 | 6|$\cdot$|4 | 27|$\cdot$|4 | 18|$\cdot$|6 |
6 | 6 | 10 | 20 | 35|$\cdot$|7 | 30|$\cdot$|8 | 61|$\cdot$|3 | 55|$\cdot$|4 |
6 | 6 | 10 | 40 | 85|$\cdot$|0 | 83|$\cdot$|9 | 95|$\cdot$|6 | 95|$\cdot$|1 |
10 | 6 | 20 | 10 | 8|$\cdot$|3 | 5|$\cdot$|4 | 22|$\cdot$|9 | 14|$\cdot$|3 |
10 | 6 | 20 | 20 | 31|$\cdot$|2 | 25|$\cdot$|4 | 55|$\cdot$|1 | 45|$\cdot$|6 |
10 | 6 | 20 | 40 | 82|$\cdot$|8 | 79|$\cdot$|5 | 94|$\cdot$|0 | 91|$\cdot$|7 |
Dim. . | Groups . | Samples per . | Noncentrality . | |$P_D$| sim. . | |$P_D$| . | |$P_D$| sim. . | |$P_D$| . |
---|---|---|---|---|---|---|---|
|$m$| . | |$p$| . | group, |$n_k$| . | |$\omega$| . | |$\alpha=1%$| . | theory . | |$\alpha=5%$| . | theory . |
3 | 3 | 10 | 10 | 28|$\cdot$|3 | 27|$\cdot$|1 | 54|$\cdot$|4 | 53|$\cdot$|3 |
3 | 3 | 10 | 20 | 67|$\cdot$|8 | 67|$\cdot$|9 | 88|$\cdot$|2 | 88|$\cdot$|4 |
3 | 3 | 10 | 40 | 97|$\cdot$|5 | 97|$\cdot$|7 | 99|$\cdot$|7 | 99|$\cdot$|7 |
6 | 3 | 10 | 10 | 15|$\cdot$|0 | 13|$\cdot$|8 | 36|$\cdot$|9 | 33|$\cdot$|9 |
6 | 3 | 10 | 20 | 44|$\cdot$|1 | 42|$\cdot$|8 | 71|$\cdot$|8 | 70|$\cdot$|4 |
6 | 3 | 10 | 40 | 87|$\cdot$|5 | 87|$\cdot$|9 | 97|$\cdot$|5 | 97|$\cdot$|5 |
6 | 6 | 10 | 10 | 10|$\cdot$|4 | 6|$\cdot$|4 | 27|$\cdot$|4 | 18|$\cdot$|6 |
6 | 6 | 10 | 20 | 35|$\cdot$|7 | 30|$\cdot$|8 | 61|$\cdot$|3 | 55|$\cdot$|4 |
6 | 6 | 10 | 40 | 85|$\cdot$|0 | 83|$\cdot$|9 | 95|$\cdot$|6 | 95|$\cdot$|1 |
10 | 6 | 20 | 10 | 8|$\cdot$|3 | 5|$\cdot$|4 | 22|$\cdot$|9 | 14|$\cdot$|3 |
10 | 6 | 20 | 20 | 31|$\cdot$|2 | 25|$\cdot$|4 | 55|$\cdot$|1 | 45|$\cdot$|6 |
10 | 6 | 20 | 40 | 82|$\cdot$|8 | 79|$\cdot$|5 | 94|$\cdot$|0 | 91|$\cdot$|7 |
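The simulated detection probabilities in this table can be reproduced with a short Monte Carlo sketch. The function names, the placement of the rank-one mean shift along the first coordinate, and the simulated null critical value are our own illustrative choices, not the paper's Matlab code:

```python
import numpy as np

def roy_largest_root(X):
    """Roy's statistic: largest eigenvalue of E^{-1}H for a one-way
    layout X of shape (p, n_k, m): groups, samples per group, dimension."""
    p, n_k, m = X.shape
    gm = X.mean(axis=1)                        # group means, p x m
    grand = X.reshape(-1, m).mean(axis=0)      # grand mean
    H = n_k * (gm - grand).T @ (gm - grand)    # between-group SSP matrix
    E = sum((X[k] - gm[k]).T @ (X[k] - gm[k]) for k in range(p))
    return np.linalg.eigvals(np.linalg.solve(E, H)).real.max()

def manova_power(m, p, n_k, omega, alpha=0.05, reps=1000, seed=0):
    """Monte Carlo power of Roy's test under a rank-one alternative
    with noncentrality omega; critical value is also simulated."""
    rng = np.random.default_rng(seed)
    # Shift the first group's mean along e_1, scaled so that the
    # noncentrality  sum_k n_k (delta_k - delta_bar)^2  equals omega.
    delta = np.zeros(p)
    delta[0] = np.sqrt(omega * p / (n_k * (p - 1)))
    null_stats = np.empty(reps)
    alt_stats = np.empty(reps)
    for i in range(reps):
        null_stats[i] = roy_largest_root(rng.standard_normal((p, n_k, m)))
        Z = rng.standard_normal((p, n_k, m))
        Z[:, :, 0] += delta[:, None]           # rank-one mean shift
        alt_stats[i] = roy_largest_root(Z)
    crit = np.quantile(null_stats, 1 - alpha)  # simulated null quantile
    return (alt_stats > crit).mean()
```

For example, `manova_power(3, 3, 10, 20)` should fall near the 88·2% entry of the table, up to Monte Carlo error in both the critical value and the power estimate.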
Detection probabilities |$P_D$| (%) of Roy's test in canonical correlation analysis, with a single population canonical correlation |$\rho$| and sample size |$n$|:

| |$p$| | |$q$| | |$n$| | |$\rho$| | sim., |$\alpha=1\%$| | theory, |$\alpha=1\%$| | sim., |$\alpha=5\%$| | theory, |$\alpha=5\%$| |
|---|---|---|---|---|---|---|---|
| 2 | 5 | 40 | 0·50 | 34·4 | 33·6 | 59·6 | 57·8 |
| 2 | 5 | 40 | 0·60 | 65·3 | 64·9 | 84·9 | 84·2 |
| 2 | 5 | 40 | 0·70 | 91·8 | 91·7 | 97·8 | 97·7 |
| 3 | 7 | 50 | 0·50 | 31·3 | 27·8 | 56·5 | 51·4 |
| 3 | 7 | 50 | 0·60 | 64·3 | 61·8 | 84·2 | 82·2 |
| 3 | 7 | 50 | 0·70 | 92·5 | 92·1 | 98·0 | 97·9 |
| 5 | 10 | 50 | 0·50 | 13·5 | 8·5 | 32·7 | 22·2 |
| 5 | 10 | 50 | 0·60 | 35·1 | 28·9 | 60·3 | 52·3 |
| 5 | 10 | 50 | 0·70 | 72·3 | 68·9 | 88·9 | 86·6 |
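The simulated columns of this table can likewise be approximated by a Monte Carlo sketch. Here Roy's statistic is the largest sample canonical correlation; the construction of the joint distribution (correlation |$\rho$| between the first coordinates only) and the simulated critical value are illustrative assumptions, not the paper's own code:

```python
import numpy as np

def largest_canonical_corr(X, Y):
    """Largest sample canonical correlation, computed by orthonormalizing
    the centred data blocks (QR) and taking the top singular value."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]

def cca_power(p, q, n, rho, alpha=0.05, reps=1000, seed=0):
    """Monte Carlo power of Roy's test when the population has a single
    nonzero canonical correlation rho (rank-one alternative)."""
    rng = np.random.default_rng(seed)

    def draw(r):
        X = rng.standard_normal((n, p))
        Y = rng.standard_normal((n, q))
        # give (x_1, y_1) correlation r; all other coordinates independent
        Y[:, 0] = r * X[:, 0] + np.sqrt(1 - r**2) * Y[:, 0]
        return largest_canonical_corr(X, Y)

    null_stats = np.array([draw(0.0) for _ in range(reps)])
    alt_stats = np.array([draw(rho) for _ in range(reps)])
    crit = np.quantile(null_stats, 1 - alpha)  # simulated null quantile
    return (alt_stats > crit).mean()
```

For instance, `cca_power(2, 5, 40, 0.6)` should land near the 84·9% table entry, again up to Monte Carlo error.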
5. DISCUSSION
Classical statistical theory typically studies the asymptotics of the statistic of interest as the sample size |$n_H\to\infty$|. Propositions 1–5, in contrast, keep |$n_H$|, |$n_E$| and |$m$| fixed and let |$\lambda_H\to\infty$|, or equivalently |$\sigma\to 0$|. By their construction, and as verified in the simulations, Propositions 1–5 are quite accurate for small dimensions and sample sizes provided the signal strength is sufficiently large. However, the error in these approximations increases with the dimension |$m$|, so they may not be suitable in high-dimensional, small-sample settings.
Next, we mention some directions for future research. A natural extension is the distribution of Roy's largest root test under higher-rank alternatives, although, depending on the particular alternative, Roy's test may then be less powerful than other common tests. It should be possible to derive the resulting distribution under, say, two strong signals, or one strong signal and several weak ones; we elaborate briefly on this in the Supplementary Material. The sensitivity of these distributions to departures from normality is also important. Finally, our approach can be applied to other test statistics, such as the Hotelling–Lawley trace, and can provide information about eigenvector fluctuations, which are important in a variety of applications.
ACKNOWLEDGEMENT
We thank Donald Richards and David Banks for many useful discussions and suggestions and Ted Anderson for references. Part of this work was performed while the second author was on sabbatical at the U.C. Berkeley and Stanford Departments of Statistics, and also during a visit by both authors to the Institute of Mathematical Sciences, National University of Singapore. We thank the reviewers for detailed comments, which have greatly improved the manuscript. The research was supported in part by the National Science Foundation, the National Institutes of Health and the U.S.–Israel Binational Science Foundation.
SUPPLEMENTARY MATERIAL
Supplementary material available at Biometrika online includes further discussion and examples, further simulation results, proofs of Propositions 1 to 5, supporting lemmas, discussion of error terms, and the rank-|$r$| case. Matlab code for power calculations and scripts that reproduce the figures and tables are also provided.
References