Lu Mao, On the relative efficiency of the intent-to-treat Wilcoxon–Mann–Whitney test in the presence of noncompliance, Biometrika, Volume 109, Issue 3, September 2022, Pages 873–880, https://doi.org/10.1093/biomet/asab053
Summary
A general framework is set up to study the asymptotic properties of the intent-to-treat Wilcoxon–Mann–Whitney test in randomized experiments with nonignorable noncompliance. Under location-shift alternatives, the Pitman efficiencies of the intent-to-treat Wilcoxon–Mann–Whitney and |$t$| tests are derived. It is shown that the former is superior if the compliers are more likely to be found in high-density regions of the outcome distribution or, equivalently, if the noncompliers tend to reside in the tails. By logical extension, the relative efficiency of the two tests is sharply bounded by least and most favourable scenarios in which the compliers are segregated into regions of lowest and highest density, respectively. Such bounds can be derived analytically as a function of the compliance rate for common location families such as Gaussian, Laplace, logistic and |$t$| distributions. These results can help empirical researchers choose the more efficient test for existing data, and calculate sample size for future trials in anticipation of noncompliance. Results for nonadditive alternatives and other tests follow along similar lines.
1. Motivation
The asymptotic properties of the two-sample Wilcoxon–Mann–Whitney test (Wilcoxon, 1945; Mann & Whitney, 1947) have been thoroughly investigated in the literature, especially in comparison with the |$t$| test (Sidak et al., 1999). It is well known that the relative performance of the Wilcoxon–Mann–Whitney and |$t$| tests depends on the shape of the underlying outcome distribution, with the former being superior under heavy-tailed distributions. In fact, the asymptotic relative efficiency of the two tests has been explicitly derived for various outcome distributions, such as the Gaussian, logistic and Laplace families, under location-shift alternatives (see, e.g., van der Vaart, 1998, Ch. 14).
The situation may be different in the presence of nonignorable, i.e., informative, noncompliance with group assignment (Angrist et al., 1996). In such settings, an intent-to-treat test that compares the subjects according to their randomization status is commonly employed to circumvent the selection bias in the treatment received (Gupta, 2011). Because of the unknown relationship between the compliance status and potential outcome, however, the power structures of such tests remain opaque. In this paper, we use the instrumental-variable framework (Angrist et al., 1996; Imbens & Rubin, 1997) to study the asymptotic properties of the intent-to-treat Wilcoxon–Mann–Whitney test in the presence of noncompliance, with particular emphasis on comparison with its |$t$|-test counterpart in detecting additive treatment effects.
2. Asymptotic theory
2.1. General set-up for intent-to-treat tests
Let |$Y(z, a)$| denote the continuous potential outcome under randomization status |$z$| and received treatment |$a$|, where |$z$| and |$a$| are binary variables, each taking value 1 for the active treatment and value 0 for the control (Rubin, 1978). Similarly, let |$A(z)$| denote the potential treatment under randomization status |$z$|. This notation defines four compliance classes: always-taker if |$A(z)=1$|; complier if |$A(z)=z$|; never-taker if |$A(z)=0$|; and defier if |$A(z)=1-z$| |$(z=1, 0)$|. Under the stable unit treatment value assumption (Imbens & Rubin, 2015), the observed treatment and outcome are |$A=A(Z)$| and |$Y=Y(Z, A)$|, respectively. In addition, we assume that randomization has a nontrivial effect on the treatment received, i.e., |${\rm pr}\{A(1)=1\}\neq {\rm pr}\{A(0)=1\}$|, and that there is no defier in the population, i.e., |$A(1)\geq A(0)$| almost surely. In the interest of generality, we relax the usual exclusion restriction assumption that |$Y(1, a)=Y(0, a)$| |$(a=1, 0)$| with probability |$1$| (Angrist et al., 1996) to allow for possible randomization direct effects.
With a random sample of size |$n$|, suppose that a level-|$\alpha$| test |$(0<\alpha<1)$| rejects |$H_0$| and accepts |$H_{\rm A}: \theta> 0$| when |$n^{1/2}|T_n-\mu(0)|/\hat\sigma_n>z_{1-\alpha/2}$|, where |$T_n$| is a regular estimator for some function |$\mu(\theta)$| of |$\theta$|, |$\hat\sigma_n^2$| is a consistent estimator for its asymptotic variance, and |$z_{1-\alpha/2}=\Phi^{-1}(1-\alpha/2)$| with |$\Phi(\cdot)$| denoting the standard normal cumulative distribution function. The asymptotic power of the test can be evaluated analytically under a sequence of contiguous alternatives approaching |$H_0$| at rate |$O(n^{-1/2})$|, say |$\theta=n^{-1/2}h$| for some |$h>0$|. Under this |$\theta$|, the power of the test tends to |$\Phi(\zeta h-z_{1-\alpha/2})$| for some |$\zeta>0$| as |$n\to\infty$| (see van der Vaart, 1998, Ch. 14). The quantity |$\zeta^2$|, commonly called the Pitman efficiency, is given by |$\zeta^2=\dot \mu(0)^2/\sigma_0^2$|, where |$\sigma_0^2$| is the asymptotic limit of |$\hat\sigma_n^2$| under |$H_0$| and |$\dot f(x)={\rm d} f(x)/{\rm d} x$| for any function |$f$|. The Pitman efficiency is inversely proportional to the sample size needed to achieve a given power and is used aptly to measure the asymptotic quality of the test.
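To make the link between Pitman efficiency and sample size concrete, the following minimal sketch evaluates the limiting power |$\Phi(\zeta h-z_{1-\alpha/2})$| with |$h=n^{1/2}\theta$| and inverts it for |$n$|. The function names and the default level and power are illustrative assumptions, not part of the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: Phi and its inverse


def asymptotic_power(zeta, theta, n, alpha=0.05):
    """Approximate power Phi(zeta * sqrt(n) * theta - z_{1-alpha/2}).

    zeta is the square root of the Pitman efficiency and theta > 0 the
    effect size; this uses the contiguous-alternative limit with h = sqrt(n) * theta.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(zeta * sqrt(n) * theta - z_crit)


def required_sample_size(zeta, theta, power=0.8, alpha=0.05):
    """Smallest n whose approximate power reaches the target.

    Solving Phi(zeta * sqrt(n) * theta - z_crit) = power for n gives
    n = ((z_crit + z_power) / (zeta * theta))**2, i.e. n is proportional to 1 / zeta**2.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / (zeta * theta)) ** 2)
```

Because |$n$| enters only through |$\zeta^2 n$|, doubling the Pitman efficiency roughly halves the required sample size, which is the sense in which the ratio of two efficiencies measures relative sample-size requirements.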
Nonparametric intent-to-treat tests can often be formulated as contrasts of empirical distributions. Let |$\hat\omega_z(\cdot)$| denote the empirical analogue of |$\omega_z(\cdot\mid\theta)$| |$(z=1, 0)$|. Then the Wilcoxon–Mann–Whitney and |$t$| test statistics can be written as |$T_{{\rm WMW},\,n}=\int \hat\omega_0(y)\hat\omega_1({\rm d} y)-2^{-1}$| (Mann & Whitney, 1947) and |$T_{t,\,n}=\int y\{\hat\omega_1({\rm d} y)-\hat\omega_0({\rm d} y)\}$|, respectively. The estimands of |$T_{{\rm WMW},\,n}$| and |$T_{t,\,n}$| are thus |$\mu_{\rm WMW}(\theta)=\int\omega_0(y\mid\theta)\omega_1({\rm d} y\mid\theta)-2^{-1}$| and |$\mu_t(\theta)=\int y\{\omega_1({\rm d} y\mid\theta)-\omega_0({\rm d} y\mid\theta)\} =-\int \{\omega_1(y\mid\theta)-\omega_0(y\mid\theta)\} \,{\rm d} y$|, respectively. Write |$\nabla\eta(\cdot)=\partial\eta(\cdot\mid\theta)/\partial\theta\big|_{\theta=0}$| for any distribution function |$\eta$| indexed by |$\theta$|. The following lemma, proved in the Appendix, provides general expressions for the Pitman efficiencies of the two tests.
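As an illustration of the two contrasts |$T_{{\rm WMW},\,n}$| and |$T_{t,\,n}$| defined above, the sketch below computes them from the outcomes in the two randomized groups. The helper names are hypothetical. The statistics returned are the raw contrasts, not their studentized versions, and ties are ignored because the outcome is assumed continuous.

```python
from bisect import bisect_right
from statistics import mean


def ecdf(sample):
    """Return the empirical distribution function y -> #{x <= y} / n."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(y):
        return bisect_right(xs, y) / n

    return cdf


def wmw_contrast(y1, y0):
    """T_WMW = integral of F0_hat(y) dF1_hat(y) - 1/2.

    Integrating against the empirical measure of group 1 amounts to
    averaging F0_hat over the group-1 outcomes.
    """
    f0_hat = ecdf(y0)
    return mean(f0_hat(y) for y in y1) - 0.5


def mean_contrast(y1, y0):
    """T_t = integral of y d(F1_hat - F0_hat), the difference in sample means."""
    return mean(y1) - mean(y0)
```

Here `y1` and `y0` hold the outcomes of subjects randomized to |$z=1$| and |$z=0$|, regardless of the treatment actually received, as required for an intent-to-treat analysis.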
2.2. The Wilcoxon–Mann–Whitney test versus the |$t$| test under additive effects
Simulations described in the Supplementary Material show that asymptotic power functions based on (4) approximate the actual power fairly accurately under finite samples.
Either under perfect compliance so that |$p_{\rm c}=1$|, or under uninformative noncompliance so that |$\nu_{\rm c}=\nu$|, the far right-hand side of (5) reduces to |$12V(\nu)\langle\dot\nu,\dot\nu\rangle^2$|, recovering the classical results for the two tests (see, e.g., van der Vaart, 1998, Example 14.13). In that sense, the impact of informative noncompliance shows in the difference between |${\mathcal R}(p_{\rm c},\nu)$| and |${\mathcal R}(1,\nu)=12V(\nu)\langle\dot\nu,\dot\nu\rangle^2$|, which, in turn because of (5), is determined by the difference between |$\langle\dot\nu,\dot\nu_{\rm c}\rangle$| and |$\langle\dot\nu,\dot\nu\rangle$|. The following proposition says that this difference depends crucially on how strongly being a complier is correlated with the height of the outcome density.
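The classical values of |$12V(\nu)\langle\dot\nu,\dot\nu\rangle^2$| can be checked numerically. The sketch below reads |$V(\nu)$| as the variance of |$\nu$| and |$\langle f,g\rangle$| as |$\int f(y)g(y)\,{\rm d} y$|, which is our interpretation of the notation. The integration routine, grid and dictionary of densities are illustrative choices.

```python
from math import cosh, exp, pi, sqrt


def midpoint_integral(f, lo=-40.0, hi=40.0, steps=80_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Standard members of three location families, all centred at zero.
DENSITIES = {
    "gaussian": lambda y: exp(-0.5 * y * y) / sqrt(2 * pi),
    "logistic": lambda y: 0.25 / cosh(0.5 * y) ** 2,
    "laplace": lambda y: 0.5 * exp(-abs(y)),
}


def classical_are(density):
    """12 * Var * (integral of density squared)**2 for a zero-mean density."""
    variance = midpoint_integral(lambda y: y * y * density(y))
    inner = midpoint_integral(lambda y: density(y) ** 2)
    return 12 * variance * inner ** 2


if __name__ == "__main__":
    for name, density in DENSITIES.items():
        print(name, round(classical_are(density), 4))
```

The results should agree with the well-known closed forms |$3/\pi$|, |$\pi^2/9$| and |$3/2$| for the Gaussian, logistic and Laplace families, respectively.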
This means that the sign of |${\mathcal R}(p_{\rm c},\nu)-{\mathcal R}(1,\nu)$| coincides with that of |${\rm cov}\{\delta,\dot\nu(Y)\}$|. Hence, roughly speaking, noncompliance favours the Wilcoxon–Mann–Whitney test over the |$t$| test if the compliers are more likely to appear in high-density regions or, conversely, if noncompliers are more likely to appear in the tails. Consider a simple situation where |$\dot\nu$| is unimodal and symmetric about zero and |${\rm pr}(\delta=1\mid Y=y)$| is nonincreasing in |$|y|$|; then one can easily show that |${\rm cov}\{\delta,\dot\nu(Y)\}\geq 0$| so that |${\mathcal R}(p_{\rm c},\nu)\geq{\mathcal R}(1,\nu)$|.
The correspondence between the relative location of compliers/noncompliers and the relative efficiency of the two tests has an intuitive explanation rooted in classical theory, which tells us that the rank-based Wilcoxon–Mann–Whitney test is more robust against noise in the tails. In an intent-to-treat analysis, the compliers are the sole carrier of the treatment signal between the randomized groups, with the noncompliers contributing nothing but random noise. Hence, if the noise contributors tend to reside in the tails, their influence is better contained by the Wilcoxon–Mann–Whitney test than by the |$t$| test, allowing the former to pick out the signal more easily.
We can obtain bounds for |$\{\langle\dot\nu,\dot\nu_{\rm c}\rangle:\nu_{\rm c}\in{\mathcal P}(p_{\rm c}\mid\nu)\}$| using the following lemma, whose proof can be found in the Supplementary Material.
Replacing |$g$| in Lemma 2 with |$\dot\nu$|, we obtain the range of |$\langle\dot\nu,\dot\nu_{\rm c}\rangle$| and, by (5), that of |${\mathcal R}(p_{\rm c},\nu)$|.
Using (9) in conjunction with (8), we derive the range of |${\mathcal R}(p_{\rm c},\nu)$| as a function of the noncompliance rate |$1-p_{\rm c}$| for Gaussian, Laplace, logistic, uniform and |$t$| distributions under the exclusion restriction, i.e., |$r=0$|. The results are plotted in Fig. 1, with the underlying analytical formulae given in the Supplementary Material. The dashed lines represent the asymptotic relative efficiency values under perfect compliance, which agree with the numbers tabulated in Table 14.12 of van der Vaart (1998). Not surprisingly, |${\mathcal R}(p_{\rm c},\nu)\equiv 1$| for the uniform distribution, whose constant density makes the upper and lower bounds the same. Otherwise, we find by numerical calculation that the lower bounds of |${\mathcal R}(p_{\rm c},\nu)$| under the Laplace, logistic, |$t_3$| and |$t_5$| distributions intersect with the unit horizontal line at |$1-p_{\rm c}=18.4\%, 8.3\%, 34.4\%$| and |$16.7\%$|, respectively. This means that the Wilcoxon–Mann–Whitney test is always more efficient than the |$t$| test for these distributions, as long as the noncompliance rate is below the corresponding threshold.
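The Laplace and logistic thresholds quoted above can be reproduced with a short calculation. The sketch below rests on two assumptions of ours: that with |$r=0$| the relative efficiency is |$12V(\nu)\langle\dot\nu,\dot\nu_{\rm c}\rangle^2$| with |$\langle f,g\rangle=\int f(y)g(y)\,{\rm d} y$|, and that the least favourable scenario places the compliers in the two tails of total mass |$p_{\rm c}$|, so that |$\dot\nu_{\rm c}=\dot\nu\,1(|y|>c)/p_{\rm c}$|. The closed forms in the comments are our own derivation and are not taken from the Supplementary Material.

```python
from math import pi


def lower_bound_laplace(p_c):
    """Least favourable relative efficiency for the standard Laplace density.

    Tails |y| > c with exp(-c) = p_c give <nu', nu_c'> = p_c / 4,
    and Var = 2, so the bound is 12 * 2 * (p_c / 4)**2 = 1.5 * p_c**2.
    """
    return 1.5 * p_c ** 2


def lower_bound_logistic(p_c):
    """Least favourable relative efficiency for the standard logistic density.

    Substituting u = F(y) with density u * (1 - u) gives
    <nu', nu_c'> = p_c / 4 - p_c**2 / 12, and Var = pi**2 / 3.
    """
    inner = p_c / 4 - p_c ** 2 / 12
    return 4 * pi ** 2 * inner ** 2


def crossing_noncompliance_rate(lower_bound, tol=1e-10):
    """Bisect for the rate 1 - p_c at which an increasing lower bound equals 1."""
    lo, hi = 0.0, 1.0  # lower_bound(lo) < 1 <= lower_bound(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lower_bound(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 1.0 - hi


if __name__ == "__main__":
    print(crossing_noncompliance_rate(lower_bound_laplace))
    print(crossing_noncompliance_rate(lower_bound_logistic))
```

Under these assumptions the crossing points are about 0.184 and 0.083, matching the Laplace and logistic thresholds reported in the text. At |$p_{\rm c}=1$| the two functions return the classical values |$3/2$| and |$\pi^2/9$|.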
2.3. Practical considerations
The theoretical results in § 2.2 are practically instructive in two ways. First, we can use Theorem 1 to pick the more efficient test for analysing a current trial in the presence of noncompliance. In (5) with |$r=0$|, for example, |$p_{\rm c}$| can be estimated by the sample analogue of |${\rm pr}(A=1\mid Z=1)-{\rm pr}(A=1\mid Z=0)$| (Angrist et al., 1996). Moreover, |$\dot\nu(\cdot)$| and |$\dot\nu_{\rm c}(\cdot)$| can be estimated by a kernel density estimator based on the randomized control group and one using the methods of Imbens & Rubin (1997), respectively. Plugging in these estimates, we obtain an estimate for |${\mathcal R}(p_{\rm c},\nu)$|, telling us which of the Wilcoxon–Mann–Whitney and |$t$| tests is likely to be more powerful. Intuitively, this data-dependent selection of test should not affect the Type I error rate as it does not use the part of the data that is informative about the treatment effect.
As a demonstration, the above procedure was run on data collected from 11 204 participants in the well-known U.S. National Job Training Partnership Act Study, a randomized trial conducted between 1987 and 1989 to assess the effect of a job-training programme on the trainee’s subsequent earnings (Abadie et al., 2002). The estimated compliance rate is |$p_{\rm c}=62.7\%$|. As shown in the Supplementary Material, both |$\dot\nu$| and |$\dot\nu_{\rm c}$| appear heavily bimodal. Using these density estimates, we find that |${\mathcal R}(p_{\rm c},\nu)\approx 4.9$|, suggesting that the intent-to-treat Wilcoxon–Mann–Whitney test is likely much more powerful than the |$t$| test. This is substantiated by the actual test results, with the former producing a |$p$|-value of |$0.001$| and the latter |$0.023$|.
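The first step of the procedure in § 2.3, estimating the compliance rate, amounts to a difference in treatment uptake between the randomized groups. A minimal sketch follows. The function name and the 0/1 coding of the inputs are assumptions, and the density-estimation steps are not covered.

```python
def estimate_compliance_rate(z, a):
    """Sample analogue of pr(A=1 | Z=1) - pr(A=1 | Z=0).

    z and a are equal-length sequences of 0/1 indicators for the
    randomized assignment and the treatment actually received.
    """
    if len(z) != len(a):
        raise ValueError("z and a must have the same length")
    uptake_assigned = [ai for zi, ai in zip(z, a) if zi == 1]
    uptake_control = [ai for zi, ai in zip(z, a) if zi == 0]
    if not uptake_assigned or not uptake_control:
        raise ValueError("both randomized groups must be non-empty")
    # Under no defiers, this difference estimates the proportion of compliers.
    return (sum(uptake_assigned) / len(uptake_assigned)
            - sum(uptake_control) / len(uptake_control))
```

For example, if three of four subjects assigned to treatment take it and one of four controls does, the estimate is 0.5.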
3. Extensions and discussions
3.1. Inhomogeneous effects
Taking |$\kappa(y)\equiv 1$| reduces (10) to (4) of Theorem 1. Although the Pitman efficiencies under nonconstant |$\kappa(\cdot)$| appear a bit more complex, one can nonetheless use Lemma 2 to derive bounds for them. The case with multiplicative effects is considered in the Supplementary Material.
3.2. General intent-to-treat tests
The intent-to-treat Wilcoxon–Mann–Whitney and |$t$| tests can both be formulated as contrasts between the empirical distributions in the randomized groups. As seen from the main steps in § 2, what determines their respective asymptotic properties is the functional derivative, or more precisely the Hadamard derivative (van der Vaart, 1998, Ch. 20), of the contrast in question. This helps us generalize the previous results to a broad class of nonparametric intent-to-treat tests. The following proposition is proved in the Supplementary Material.
In (11), the expressions for |$\nabla\omega_1-\nabla\omega_0$| and |$\dot{\mathcal H}_\nu$| are determined by the contiguous alternatives and the specific test, respectively. To recover the results of Lemma 1 for the Wilcoxon–Mann–Whitney and |$t$| tests under location-shift alternatives, simply plug in |$\dot{\mathcal H}_\nu\{\psi(\cdot)\}=-\int\psi(y)\dot\nu(y)\,{\rm d} y$| and |$-\int\psi(y) \,{\rm d} y$|, respectively, along with |$\nabla\omega_1-\nabla\omega_0=-\{p_{\rm c}\dot\nu_{\rm c}(y)+r\dot\nu(y)\}$|. For a new example, consider the |$\tau$|-quantile test based on |$\mathcal H(\hat\omega_1, \hat\omega_0)=\hat\omega_1^{-1}(\tau)-\hat\omega_0^{-1}(\tau)$| |$(0<\tau<1)$|. The Hadamard derivative of this |$\mathcal H$| is |$\dot{\mathcal H}_\nu\{\psi(\cdot)\}=-\dot\nu(z_{\nu,\,\tau})^{-1}\psi(z_{\nu,\,\tau})$|, where |$z_{\nu,\,\tau}=\nu^{-1}(\tau)$| (e.g., van der Vaart, 1998, Lemma 21.3). A straightforward derivation in the Supplementary Material shows that the Pitman efficiency under the location-shift alternatives is |$\zeta_{\tau}^2=q(1-q)\tau^{-1}(1-\tau)^{-1}\left\{p_{\rm c}\dot\nu_{\rm c}(z_{\nu,\,\tau})+r\dot\nu(z_{\nu,\,\tau})\right\}^2$|. As one might have guessed, the |$\tau$|-quantile test is powerful when the neighbourhood of the corresponding quantile is densely populated by compliers, driving up the value of |$\dot\nu_{\rm c}(z_{\nu,\,\tau})$|. Formal analysis and comparison with the Wilcoxon–Mann–Whitney and |$t$| tests are left to the interested reader.
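The expression for |$\zeta_{\tau}^2$| is easy to evaluate once the two densities at the quantile are available. The sketch below is a direct transcription, with |$q$| read as the randomization probability |${\rm pr}(Z=1)$|, which is an assumption on our part. The function and argument names are illustrative.

```python
from math import pi, sqrt


def quantile_test_pitman_efficiency(tau, q, p_c, r,
                                    complier_density, overall_density):
    """Evaluate q(1-q) / {tau(1-tau)} * {p_c * nu_c'(z) + r * nu'(z)}**2.

    complier_density and overall_density are the values of the complier
    and overall outcome densities at the tau-quantile z of nu.
    """
    if not 0 < tau < 1 or not 0 < q < 1:
        raise ValueError("tau and q must lie strictly between 0 and 1")
    signal = p_c * complier_density + r * overall_density
    return q * (1 - q) / (tau * (1 - tau)) * signal ** 2


# Median test, standard Gaussian outcome, balanced randomization,
# perfect compliance (p_c = 1) and no direct effect (r = 0).
phi_at_zero = 1 / sqrt(2 * pi)
median_efficiency = quantile_test_pitman_efficiency(
    tau=0.5, q=0.5, p_c=1.0, r=0.0,
    complier_density=phi_at_zero, overall_density=phi_at_zero,
)
```

In this special case the formula gives |$1/(2\pi)$|. Raising `complier_density` while holding everything else fixed increases the efficiency quadratically, which is the sense in which the test gains power when compliers cluster around the chosen quantile.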
Acknowledgement
This research was supported by the U.S. National Science Foundation and National Institutes of Health. The author very much appreciates the helpful comments from the editor, associate editor and two referees.
Supplementary material
Supplementary Material includes technical and numerical results.