Testing for independence in arbitrary distributions

C. Genest, J. G. Nešlehová, B. Rémillard and O. A. Murphy

SUMMARY Statistics are proposed for testing the hypothesis that arbitrary random variables are mutually independent. The tests are consistent and well behaved for any marginal distributions; they can be used, for example, for contingency tables which are sparse or whose dimension depends on the sample size, as well as for mixed data. No regularity conditions, data jittering, or binning mechanisms are required. The statistics are rank-based functionals of Cramér–von Mises type whose asymptotic behaviour derives from the empirical multilinear copula process. Approximate $p$-values are computed using a wild bootstrap. The procedures are simple to implement and computationally efficient, and maintain their level well in moderate to large samples. Simulations suggest that the tests are robust with respect to the number of ties in the data, can easily detect a broad range of alternatives, and outperform existing procedures in many settings. Additional insight into their performance is provided through asymptotic local power calculations under contiguous alternatives. The procedures are illustrated on traumatic brain injury data.


Introduction
The problem of testing the null hypothesis of mutual independence between the components of a random vector is old but important. Investigating relationships between variables is central to many scientific studies, and tests of independence typically precede any attempt at modelling association. The relevant literature is too vast to review here, but we observe that consistent tests are usually designed for cases where the variables are all of the same type. Independence between categorical outcomes is often tested by classical tests such as Pearson's χ² or the likelihood ratio statistic (Agresti, 2013). When all variables are continuous, consistent tests can be based on empirical distribution functions (Blum et al., 1961; Deheuvels, 1980; Genest & Rémillard, 2004) or characteristic functions (Feuerverger, 1993; Meintanis & Iliopoulos, 2008; Fan et al., 2017); see, e.g., Rémillard (2011) for an overview and additional references. In practice, however, multivariate data often consist of a mix of discrete and continuous variables, and ties can occur even among the latter, for example when the variables are measured with limited precision. Often the data are divided into response variables and covariates, but before a model is developed, the question of which variables are dependent should be investigated.
As a case in point, Gauthier et al. (2018) describe a study of cognitive communication impairment after traumatic brain injury in which various language tests were administered, before hospital discharge, to adults with this condition. One of their objectives was to explore connections between these tests in relation to the patient's age, years of education, and severity of injury on the Glasgow Coma Scale. Another goal was to assess the tests' ability to predict the patient's degree of disability upon discharge. The data, described in greater detail in § 3.3, are a mix of ordinal, discrete and continuous variables. Before attempting to model relations between the variables, it is of interest to explore potential associations using an omnibus procedure that can handle various types of data simultaneously and can be adjusted for multiple testing.
In this paper, we propose model-free tests of mutual independence that are applicable to any collection of variables measured at least on an ordinal scale. Some of the variables could be discrete while others are continuous, and still others might have distributions that are mixtures of the two types. The tests, which are consistent and powerful, do not involve any data binning or jittering. They remain valid even when the data are categorical and lead to sparse contingency tables or to tables whose dimension grows with the sample size.

Motivation and description of the basic test statistic
2.1. Motivational background and notation

Let (X_11, ..., X_1d), ..., (X_n1, ..., X_nd) be a random sample from a vector (X_1, ..., X_d) with distribution function H, and for each k ∈ {1, ..., d} let F_k denote the distribution function of X_k. By Sklar's theorem (Nelsen, 2006), there exists at least one distribution function C with standard uniform margins, i.e., a copula, such that

H(x_1, ..., x_d) = C{F_1(x_1), ..., F_d(x_d)},  x_1, ..., x_d ∈ R.  (1)

When F_1, ..., F_d are continuous, the joint distribution function of F_1(X_1), ..., F_d(X_d) is the only choice for C. Thus the null hypothesis H_0 : X_1 ⊥⊥ ··· ⊥⊥ X_d holds if and only if C is the independence copula Π given by Π(u_1, ..., u_d) = u_1 × ··· × u_d for u_1, ..., u_d ∈ [0, 1]. A simple yet powerful rank-based test of H_0 whose null asymptotic distribution is margin-free can then be based on the statistic

‖n^{1/2}(C_n − Π)‖_2^2,  (2)

where ‖·‖_2 denotes the L_2-norm and C_n is the empirical copula given by

C_n(u_1, ..., u_d) = (1/n) Σ_{i=1}^n 1{F_n1(X_i1) ≤ u_1, ..., F_nd(X_id) ≤ u_d}

[Figure 1. Samples of size 1000 from (X_1, X_2) with X_1 ∼ Po(2) and X_2 ∼ NeBi(5, 0.5) (left column) when X_1 and X_2 are independent (top) or dependent (bottom); the pairs (U_1, U_2) from the checkerboard copula C are plotted in the right column.]
in terms of the componentwise empirical distribution functions F_n1, ..., F_nd. Here and throughout this paper, 1(B) stands for the indicator function of the set B. The statistic (2) was originally proposed by Deheuvels (1980); for details on the power, the asymptotic efficiency, and refinements of this test, see Genest & Rémillard (2004) and Genest et al. (2006), respectively. When at least one of F_1, ..., F_d is discontinuous, the copula C in Sklar's decomposition is unique only on ran(F_1) × ··· × ran(F_d), but there exists an extension C on [0, 1]^d that coincides with Π if and only if H_0 holds (Genest & Nešlehová, 2007; Nešlehová, 2007). This checkerboard or multilinear extension copula, which can be traced back to Schweizer & Sklar (1974) and Moore & Spruill (1975), may be defined as follows based on Proposition 1 in Genest et al. (2017). In what follows, the notation F(x−) refers to the left limit of a univariate distribution function F at x ∈ R.

Definition 1. Let (X_1, ..., X_d) be a random vector with joint distribution H and univariate margins F_1, ..., F_d. Also, let (U_1, ..., U_d) be a random vector with joint distribution Π, independent of (X_1, ..., X_d). The checkerboard copula C of H is the distribution function of the vector

( F_1(X_1−) + U_1{F_1(X_1) − F_1(X_1−)}, ..., F_d(X_d−) + U_d{F_d(X_d) − F_d(X_d−)} ).

Figure 1 illustrates why C is useful as a tool for detecting dependence. The left column shows samples of size 1000 from (X_1, X_2) when X_1 is Poisson with mean 2 and X_2 is negative binomial with size 5 and p = 0.5; in the top panel the variables are independent, while in the bottom panel they are linked via a Gaussian copula with correlation 0.5. The relationship between X_1 and X_2 is hard to assess. The right column of Fig.
1 displays samples from C obtained by transforming the original pairs (X_1, X_2) as described in Definition 1, using the known forms of F_1 and F_2 and additional independent samples of the same size from the randomizers U_1 and U_2. The dependence, if any, is then manifest. Of course, data jittering cannot be recommended for testing independence per se, as the added noise could decrease efficiency and will not lead to reproducible results in any case. Fortunately, an estimator C_n of C can be derived from the data without involving randomization, and a global test of independence can be based on an analogue of (2) defined by

S_n = ‖G_n‖_2^2,  (4)

where G_n = n^{1/2}(C_n − Π). We explain below how the empirical checkerboard copula C_n is constructed and can be used to carry out this test or to design even better tests of independence.
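Definition 1 amounts to a margin-wise randomization x ↦ F(x−) + U{F(x) − F(x−)}. The following minimal numpy sketch (illustrative only, not the authors' code; pois_cdf is a helper defined here) applies it to a Poisson(2) margin and checks that the transformed values are standard uniform:

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(0)

def pois_cdf(k, lam=2.0):
    # P(X <= k) for a Poisson(lam) variable; k = -1 returns 0
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(int(k) + 1))

n = 5000
x = rng.poisson(2.0, size=n)
u = rng.uniform(size=n)
# the transform of Definition 1: F(x-) + U * {F(x) - F(x-)}
v = np.array([pois_cdf(k - 1) + w * (pois_cdf(k) - pois_cdf(k - 1))
              for k, w in zip(x, u)])
print(round(v.mean(), 3), round(v.var(), 3))
```

With the true margin F, the result is exactly Uniform(0, 1), so the sample mean and variance should be close to 1/2 and 1/12.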

Empirical checkerboard copula
To construct an estimator C_n of C that relies neither on any knowledge of the margins nor on any form of randomization or data jittering, Genest et al. (2014, 2017) used the fact that the empirical analogue H_n of H based on a random sample from (X_1, ..., X_d) is a bona fide distribution function with discrete margins. The estimator C_n is then simply the checkerboard copula of H_n introduced in Definition 1. From Proposition 2 of Genest et al. (2017),

C_n(u_1, ..., u_d) = (1/n) Σ_{i=1}^n Π_{k=1}^d V_{F_nk}(X_ik, u_k),

where, for any x ∈ R, u ∈ [0, 1] and univariate distribution function F,

V_F(x, u) = 0 if u < F(x−);  V_F(x, u) = {u − F(x−)}/{F(x) − F(x−)} if F(x−) ≤ u < F(x);  V_F(x, u) = 1 if u ≥ F(x).

As shown in Proposition 3 of Genest et al. (2017), C_n admits a density on (0, 1)^d which is given below in terms of the probability mass function h_n of H_n. For any fixed j ∈ {1, ..., d}, let y_1j < ··· < y_{n_j j} be the ordered distinct values of {X_1j, ..., X_nj}, with y_0j = y_1j − 1, and for each i ∈ {1, ..., n_j} set f_ij = F_nj(y_ij). The Lebesgue density of C_n is then

c_n(u_1, ..., u_d) = h_n(y_{k_1 1}, ..., y_{k_d d}) / {(f_{k_1 1} − f_{k_1−1, 1}) × ··· × (f_{k_d d} − f_{k_d−1, d})}

whenever, for j ∈ {1, ..., d}, F_nj(y_{k_j−1, j}) < u_j ≤ F_nj(y_{k_j j}) for some k_j ∈ {1, ..., n_j}. Moreover, note that when H has continuous margins, C_n is asymptotically equivalent to the classical empirical copula, as explained in Remark 2 of Genest et al. (2017).
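The function V_F(x, ·) is piecewise linear, and its sample average over the data reproduces the identity n^{−1}{V_{F_nk}(X_1k, u) + ··· + V_{F_nk}(X_nk, u)} = u used in § 2.3. A numpy sketch, assuming the piecewise-linear form of V_F written here, verifies this numerically on a heavily tied margin:

```python
import numpy as np

def V(F_vals, Fm_vals, u):
    # V_F(x_i, u) for all observations at once: 0 below F(x-), 1 at or
    # above F(x), and linear interpolation in between
    return np.clip((u - Fm_vals) / (F_vals - Fm_vals), 0.0, 1.0)

def ecdf_pair(x):
    # empirical cdf F_n(x_i) and its left limit F_n(x_i-) at each point
    F = np.array([(x <= xi).mean() for xi in x])
    Fm = np.array([(x < xi).mean() for xi in x])
    return F, Fm

rng = np.random.default_rng(1)
x = rng.poisson(1.0, size=200)      # discrete margin with many ties
F, Fm = ecdf_pair(x)

for u in (0.05, 0.3, 0.8):
    print(round(V(F, Fm, u).mean(), 6))
```

Because the sample average of V is the piecewise-linear interpolant of the identity through the jump points of F_n, the printed values equal u up to floating-point error.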
2.3. Computation of S_n and link with classical statistics

An explicit expression for S_n can be derived from (4). By equation (A.1) in Genest et al. (2017) one has, for all k ∈ {1, ..., d} and u ∈ [0, 1], n^{−1}{V_{F_nk}(X_1k, u) + ··· + V_{F_nk}(X_nk, u)} = u. For k ∈ {1, ..., d}, also let I_k be an n × n symmetric matrix with (i, j) entry

I_ijk = ∫_0^1 V_{F_nk}(X_ik, u) V_{F_nk}(X_jk, u) du,

whose ith row average, and hence also ith column average, is given by

I_{i·k} = (1/n) Σ_{j=1}^n I_ijk = ∫_0^1 u V_{F_nk}(X_ik, u) du.

Then

S_n = (1/n) Σ_{i=1}^n Σ_{j=1}^n Π_{k=1}^d I_ijk − 2 Σ_{i=1}^n Π_{k=1}^d I_{i·k} + n 3^{−d}.

In the special case of d = 2, this formula reduces to

S_n = (1/n) Σ_{i=1}^n Σ_{j=1}^n (I_ij1 − I_{i·1} − I_{j·1} + 1/3)(I_ij2 − I_{i·2} − I_{j·2} + 1/3),

because (I_{1·k} + ··· + I_{n·k})/n = 1/3. This can be contrasted with the standard χ² and likelihood ratio statistics for an n_1 × n_2 contingency table with observed relative frequency f_{k_1 k_2} = h_n(y_{k_1 1}, y_{k_2 2}) in cell (k_1, k_2), which, as noted by Genest et al. (2014), are given by

χ²_n = n Σ_{k_1, k_2} (f_{k_1 k_2} − f_{k_1·} f_{·k_2})² / (f_{k_1·} f_{·k_2}),  L_n = 2n Σ_{k_1, k_2} f_{k_1 k_2} ln{f_{k_1 k_2} / (f_{k_1·} f_{·k_2})},

in terms of the row and column relative frequencies f_{k_1·} and f_{·k_2}. Thus, while χ²_n and L_n are based on c_n, which consists of cell-by-cell ratios of observed to expected frequencies, S_n and its variants described below are functionals of the cumulative distribution function C_n, which also exploits the natural ordering in the values of the variables. As will be seen in § 4, this approach leads to a more powerful test in cases that the χ²_n and L_n statistics were designed to handle. More importantly still, the asymptotic null distribution of C_n exists whatever the marginal distributions of X_1, ..., X_d may be. The new procedures can therefore be used to test for independence between any set of discrete, continuous or mixed variables, even in cases where the traditional statistics are known to perform poorly (Agresti, 2013).
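As a hedged numerical illustration (our own sketch, not the authors' implementation), S_n can be evaluated for d = 2 from the matrices I_k in two algebraically equivalent ways, the uncentred formula and the centred reduction, with the entries I_k(i, j) = ∫ V V du approximated by the trapezoid rule:

```python
import numpy as np

def ecdf_pair(x):
    # F_n(x_i) and the left limit F_n(x_i-) for each observation
    F = np.array([(x <= xi).mean() for xi in x])
    Fm = np.array([(x < xi).mean() for xi in x])
    return F, Fm

def I_matrix(x, grid, w):
    # I_k(i, j) = int_0^1 V(x_i, u) V(x_j, u) du via trapezoid weights w
    F, Fm = ecdf_pair(x)
    Vu = np.clip((grid[None, :] - Fm[:, None]) / (F - Fm)[:, None], 0.0, 1.0)
    return (Vu * w) @ Vu.T

rng = np.random.default_rng(2)
n = 60
x1, x2 = rng.poisson(2.0, n), rng.poisson(2.0, n)
grid = np.linspace(0.0, 1.0, 4001)
w = np.full(grid.size, grid[1] - grid[0]); w[0] *= 0.5; w[-1] *= 0.5

I1, I2 = I_matrix(x1, grid, w), I_matrix(x2, grid, w)
r1, r2 = I1.mean(axis=1), I2.mean(axis=1)      # row averages I_{i.k}

S_plain = (I1 * I2).sum() / n - 2 * (r1 * r2).sum() + n / 9
C1 = I1 - r1[:, None] - r1[None, :] + 1 / 3
C2 = I2 - r2[:, None] - r2[None, :] + 1 / 3
S_centred = (C1 * C2).sum() / n                # d = 2 reduction
print(round(S_plain, 4), round(S_centred, 4))
```

The two values agree up to discretization error, which is the algebraic content of the d = 2 reduction.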

Limiting null distribution of S n and computation of critical values
It follows from the work of Genest et al. (2017) that a test of H 0 based on S n is consistent and that the limiting null distribution of S n exists. This is stated below and proved in the Appendix.
Theorem 1. If C ≠ Π, then for any M > 0, pr(S_n > M) → 1 as n → ∞. If C = Π, then as n → ∞, S_n → S = ‖G‖_2^2 in law, where G is a centred Gaussian process given in (A2).
As the limiting process G is intricate, the distribution of S is unwieldy. More importantly, it depends on the unknown margins F 1 , . . . , F d unless they are continuous. To carry out the test based on S n , one must therefore resort to resampling. The wild bootstrap approximation of the limiting process G proposed by Genest et al. (2017) can be used to this end. It extends the work of Rémillard & Scaillet (2009) and Bücher & Dette (2010) beyond the case of continuous margins.
Algorithm 1. Fix a large integer B, and for each b ∈ {1, ..., B} proceed as follows:

Step 1. Generate a random sample ξ_1b, ..., ξ_nb from some distribution with mean 0, variance 1, and fourth moment at most 3, e.g., N(0, 1). Then set ξ̄_b = (ξ_1b + ··· + ξ_nb)/n.

Step 2. For any u_1, ..., u_d ∈ [0, 1] and j ∈ {1, ..., d}, construct the multiplier approximation G_nb of the process G.

Step 3. Set S_nb = ‖G_nb‖_2^2, which can be computed explicitly in terms of integrals J_ij that do not depend on b and thus need be evaluated only once.

Once the B bootstrap replicates S_n1, ..., S_nB have been computed,

p̂_n = (1/B) Σ_{b=1}^B 1(S_nb > S_n)

is an approximate p-value of the test of H_0 based on S_n.
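The process G_nb itself is intricate; as a hedged illustration of the resampling scheme only, the sketch below applies Step 1, Step 3 and the p-value formula to the much simpler statistic n x̄² for testing E(X) = 0, whose wild bootstrap replicate is {(ξ_b − ξ̄_b)ᵀX}²/n. It is emphatically not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

def replicate(xi, x):
    # wild bootstrap replicate of n * xbar^2 using recentred multipliers
    xi_c = xi - xi.mean()
    return (xi_c @ x) ** 2 / len(x)

n, B = 100, 1000
x = rng.standard_normal(n)                 # data satisfying the null
S_n = len(x) * x.mean() ** 2               # statistic being bootstrapped
S_nb = np.array([replicate(rng.standard_normal(n), x) for _ in range(B)])
p_value = (S_nb > S_n).mean()              # (1/B) #{b : S_nb > S_n}
print(p_value)
```

The same outer loop, with G_nb in place of the toy replicate, yields the p-value p̂_n above.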
The validity of the above wild bootstrap approximation relies on Proposition 8 of Genest et al. (2017), as detailed in the Appendix, where the following result is also proved.
3. Alternative procedures based on the Möbius decomposition

3.1. Möbius decomposition

Based on a decomposition of the empirical copula process derived from the Möbius combinatorial formula (Spitzer, 1974), Deheuvels (1981) and Genest & Rémillard (2004) proposed and studied alternative tests of H_0 which are, in the continuous case, more powerful than the test based on S_n. A similar approach is developed below. Let A_d denote the class of subsets A ⊆ {1, ..., d} of cardinality |A| ≥ 2. For each A ∈ A_d and u = (u_1, ..., u_d) ∈ [0, 1]^d, set

G_An(u) = M_A(G_n)(u) = Σ_{B⊆A} (−1)^{|A∖B|} G_n(u^B) Π_{k∈A∖B} u_k,

where u^B denotes the vector whose kth component equals u_k if k ∈ B and 1 otherwise; this transform depends only on the u_k with k ∈ A. When C = Π, one has

G_n(u) = Σ_{A∈A_d} G_An(u) Π_{k∉A} u_k,

which is called the Möbius decomposition of the process G_n. The usefulness of the latter for testing independence can be established as in Ghoudi et al. (2001). To see how, set μ_A = M_A(C) for each A ∈ A_d. When d = 2, A_d contains only the set A = {1, 2} and μ_A ≡ 0 is equivalent to independence between X_1 and X_2. When d > 2, mutual independence between the variables {X_k : k ∈ A} for a given set A continues to imply that μ_A ≡ 0, but not the other way around. However, Proposition 2.1 of Ghoudi et al. (2001) states that C = Π if and only if μ_A ≡ 0 for all A ∈ A_d.
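The Möbius transform M_A can be coded directly from its defining sum. The sketch below (0-based indices, illustrative only) checks that M_A vanishes identically for the independence copula Π and is nonzero for a dependent bivariate Farlie-Gumbel-Morgenstern copula:

```python
import itertools
import numpy as np

def mobius(C, u, A):
    # M_A(C)(u) = sum over B subset A of (-1)^{|A\B|} C(u^B) prod_{k in A\B} u_k,
    # where u^B agrees with u on B and equals 1 elsewhere
    total = 0.0
    for r in range(len(A) + 1):
        for B in itertools.combinations(A, r):
            uB = np.ones_like(u)
            uB[list(B)] = u[list(B)]
            rest = [k for k in A if k not in B]
            total += (-1) ** len(rest) * C(uB) * np.prod(u[rest])
    return total

Pi = lambda u: float(np.prod(u))
# Farlie-Gumbel-Morgenstern copula with theta = 0.5 (dependent)
fgm = lambda u: u[0] * u[1] * (1 + 0.5 * (1 - u[0]) * (1 - u[1]))

u = np.array([0.3, 0.7, 0.5])
print(mobius(Pi, u, [0, 1, 2]))                    # ~ 0 under independence
print(mobius(fgm, np.array([0.3, 0.7]), [0, 1]))   # nonzero under dependence
```

For d = 2, M_{1,2}(C)(u, v) = C(u, v) − uv, so the second call returns θ uv(1 − u)(1 − v).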

Computation of test statistics and an asymptotic approximation of their critical values
For any A ∈ A_d, a test of the null hypothesis H_A : μ_A ≡ 0 can be based on the statistic

S_An = ‖G_An‖_2^2 = (1/n) Σ_{i=1}^n Σ_{j=1}^n Π_{k∈A} (I_ijk − I_{i·k} − I_{j·k} + 1/3),

where the same notation as in (5) and (6) is used. The following result, whose proof is given in the Appendix, explores the asymptotic properties of the statistics S_An.
Theorem 2. If μ_A ≡ 0, then as n → ∞, S_An → S_A = ‖M_A(G)‖_2^2 in law, with G being the centred Gaussian process defined in (A2) and M_A the operator defined in (A6).
As was the case for S, the distribution of S_A is unwieldy and depends on the unknown margins F_1, ..., F_d unless all of them are continuous. To carry out a test based on the statistics S_An, therefore, one must again rely on resampling, as described below.

Algorithm 2. Fix a large integer B, and for each b ∈ {1, ..., B} proceed as follows:

Step 1. Generate a random sample ξ_1b, ..., ξ_nb from some distribution with mean 0, variance 1, and fourth moment at most 3, e.g., N(0, 1). Then set ξ̄_b = (ξ_1b + ··· + ξ_nb)/n.
Step 2. For every A ∈ A_d, construct the multiplier analogue G_Anb of the process G_An.

Step 3. For every A ∈ A_d, set S_Anb = ‖G_Anb‖_2^2, which can be computed explicitly.

Once the B bootstrap replicates have been computed, one has that, for any A ∈ A_d,

p_An = (1/B) Σ_{b=1}^B 1(S_Anb > S_An)

is an approximate p-value for the test of H_A based on S_An. The validity of this algorithm is proved in the Appendix, along with the following result. This result can be exploited to derive tests of H_0 that are both easier to compute than S_n and, as we shall see, more powerful. We propose the Fisher statistic

T_n = −2 Σ_{A∈A_d} ln(p_An).

The resulting test is then consistent by Proposition 2. From the work of Littell & Folks (1973), T_n is asymptotically Bahadur-optimal and one can reject H_0 at asymptotic level α if T_n exceeds the upper αth quantile of the χ² distribution with 2(2^d − d − 1) degrees of freedom.
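Because the degrees of freedom 2(2^d − d − 1) are always even, the χ² tail probability needed for the test based on T_n has a simple closed form, so no special library is required. In this sketch the p-values are illustrative numbers, not output of the procedure:

```python
import math

def chi2_sf_even(x, df):
    # survival function of the chi-square law for EVEN df = 2m:
    # P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    m = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(m))

def fisher_T(pvals):
    # T_n = -2 * sum over A of log p_An
    return -2.0 * sum(math.log(p) for p in pvals)

d = 3
m = 2 ** d - d - 1            # |A_d| = 4 subsets of size >= 2 when d = 3
pvals = [0.20, 0.04, 0.60, 0.11]   # illustrative p-values p_An
T = fisher_T(pvals)
# reject H_0 at level alpha if the tail probability falls below alpha
print(round(T, 3), round(chi2_sf_even(T, 2 * m), 4))
```

The same helper also gives the reference distribution for the truncated statistic discussed next, with 2 #{A : |A| ≤ K} degrees of freedom.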
As the computation of T_n may become prohibitive for large d, one could also compute the truncated version

T_nK = −2 Σ_{A∈A_d : |A| ≤ K} ln(p_An)

for some fixed integer K ∈ {2, ..., d}. When the hypothesis of mutual independence is rejected based on S_n, T_n or T_nK, it may be of interest to determine which subsets of the variables led to this conclusion. By analogy with the correlogram, one could construct and plot a dependogram, as proposed by Genest & Rémillard (2004). In it the p-values p_An are displayed together with a horizontal line at level α. This way, one can identify the subsets of variables that are significantly dependent at that level. However, as discussed in § 3.1, the fact that p_An > α for some A does not imply that all variables in that subset are independent unless |A| = 2; all subsets of A of size 2 and above must also be checked.

Data illustration

Gauthier et al. (2018) investigated communication impairment following a traumatic brain injury. Of interest is the impact of the severity of a patient's injury on performance in a number of language tests administered during the hospital stay, as well as the results of these tests as an indicator of the level of disability upon discharge, as measured on the Glasgow Outcome Scale-Extended. The severity of the injury was measured on the 13-point Glasgow Coma Scale. The language tests considered here are listed in the caption of Fig. 2 and detailed in the Supplementary Material. The clinicians suspect that not all tests are equally informative; also of interest to them is the effect of the patient's age and years of education. Complete data were available for 135 adult patients with traumatic brain injury.

Based on the statistics S_n and T_n3 with B = 1000 multiplier bootstrap replicates, the null hypothesis of mutual independence between these 11 variables was rejected at the 0.1% level. This result is neither surprising nor instructive. Because d = 11, A_d contains 2036 sets and the dependogram is unmanageable. However, the statistics S_An with |A| = 2 and their p-values can be used to explore pairwise dependence. The results can then be visualized with a graph, as in Fig. 2, where each node represents a variable. An edge joins two nodes if and only if the association between the two corresponding variables is significant at the 5% level. In panel (a), the p-values are raw, i.e., unadjusted for multiple testing, while in panel (b), the adjustment of Benjamini & Hochberg (1995) was used to control the false discovery rate.
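The Benjamini-Hochberg step-up rule used in panel (b) is simple to state: with m ordered p-values, reject the k smallest, where k is the largest index such that p_(k) ≤ kα/m. A short sketch with illustrative p-values (not those of the study):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    # BH step-up: return the indices of the k smallest p-values, where k
    # is the largest rank with p_(k) <= k * alpha / m
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return {order[i] for i in range(k)}

# illustrative p-values for six variable pairs (not the study's values)
p = [0.001, 0.012, 0.020, 0.040, 0.300, 0.700]
print(sorted(benjamini_hochberg(p)))   # -> [0, 1, 2]
```

Edges of the graph in Fig. 2(b) then join exactly the pairs whose index survives this selection.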
Focusing on Fig. 2(b), one can see that the seven language tests are mutually dependent. However, they are not equally indicative of disability status at discharge; nor are they influenced by age, education and severity of injury in the same way. Age affects the performance in confrontation naming as measured by the Boston Naming Test, literal and categorical fluency, and speed of reading. This is in accordance with the conclusion of Gauthier et al. (2018) and references therein that older patients perform these tasks with greater difficulty. Age is also related to injury severity, which may be explained by the cause of injury. The fact that education is related only to some of the tests is also documented but not yet elucidated from a clinical point of view; here, it is associated only with the Boston Naming Test after adjusting for the false discovery rate. More importantly, only the Protocole Montréal is associated with the severity of the injury, while the degree of disability at discharge is related to performance in the Protocole Montréal, the Detroit Test of Learning Aptitude, and the literal fluency test. These findings vindicate the use of the Protocole Montréal, which is easily administered at bedside, and may help clinicians to make a selection among the above tests, which was one of the study objectives.

Experimental design
The good performance of the tests based on S_n and T_n has already been documented for copula models with continuous margins (Genest & Rémillard, 2004; Genest et al., 2006). The case of two-way contingency tables was discussed in Genest et al. (2014), where the test based on S_n was often found to be significantly more powerful than the χ² test, even when the critical values of the latter are obtained by Monte Carlo to compensate for its well-known liberality. The Supplementary Material includes additional simulation results that show the good behaviour of S_n as a test of independence on contingency tables, when compared to the χ², G² and Zelterman statistics, as well as to the test of Bakirov et al. (2006).
In this section, we study the tests based on S n and T n when some margins are discrete and others are continuous or mixed. In all cases, N = 1000 simulation runs were conducted and the nominal level of the tests was α = 5%. The number of bootstrap replicates was B = 1000 throughout.
4.2. The d = 2 case

When d = 2, S_n = T_n. Although the statistic is rank-based, its limiting distribution in Theorem 1 depends on the marginal distributions if the latter are discontinuous (Genest et al., 2017). The behaviour of the test based on S_n is therefore expected to depend on the margins, the sample size n, and the nature of the dependence. Five margins were considered in the study: a Poisson with mean 1, which produces a large number of ties; a Poisson with mean 20, which is discrete but produces fewer ties than a Poisson with mean 1; a rounded Pareto with survival function k ↦ 1/(k + 1)^{1/3} for k ∈ {0, 1, ...}, which is discrete and heavy-tailed and has no expectation; a standard Cauchy, which is continuous and heavy-tailed and also has no expectation; and a Student t with three degrees of freedom, having an atom at 0 with mass 0.05, which is a mixture of a continuous and a discrete distribution; its mean is zero but its third moment does not exist.
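For reference, the five margins can be sampled as follows; the inverse-transform construction of the rounded Pareto, X = ⌊U^{−3}⌋, is our own reconstruction of the stated survival function:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000

po1    = rng.poisson(1.0, n)                 # many ties
po20   = rng.poisson(20.0, n)                # discrete, fewer ties
# rounded Pareto: P(X > k) = 1/(k+1)^{1/3}, obtained as X = floor(U^{-3})
pareto = np.floor(rng.uniform(size=n) ** (-3.0))
cauchy = rng.standard_cauchy(n)              # continuous, heavy-tailed
# Student t_3 with an atom of mass 0.05 at zero (mixed type)
t3mix  = np.where(rng.uniform(size=n) < 0.05, 0.0, rng.standard_t(3, n))

print(round(float((pareto > 1).mean()), 3), round(float((t3mix == 0.0).mean()), 3))
```

The first printed frequency should be close to 2^{−1/3} ≈ 0.794, the second close to the atom mass 0.05.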
We first investigate the level of the test based on S n when the margins are identical. Table 1 reports the estimated size of the test and the system time when run on a Thinkpad W530 using Matlab parallel processing with six processors. The nominal level is well maintained when n > 50. The test is slightly liberal in small samples, but this issue can be resolved by computing the p-values using the permutation approach. However, the solution becomes computationally prohibitive as n increases. This motivates the use of the wild bootstrap proposed here.
In the bivariate case, the performance of S_n could be compared with many alternative procedures. We limited ourselves to tests that can handle any type of marginal distribution and therefore considered: the test of Hoeffding (1948) as implemented in the hoeffd procedure of the R (R Development Core Team, 2019) package Hmisc; the test of Székely et al. (2007) as implemented in the indep.test procedure with method="dcov" in the R package energy; a test based on an extension of Spearman's ρ due to Genest et al. (2013); and the test of Genest & Rémillard (2004) applied after breaking the tied ranks at random. The test of Heller et al. (2016) was found to be too slow to run in a large-scale simulation study.
As reported in the Supplementary Material, all tests hold their nominal level reasonably well for all combinations of margins when n ∈ {100, 250}, except for the test of Hoeffding (1948), which is conservative in the presence of many ties. This issue with Hoeffding's test persists even when n = 250. As N = 1000, the margin of error is about 2%.
For power comparisons, we first considered all combinations of margins and various copulas in (1) with different degrees of dependence as measured by Kendall's τ of the underlying copula. This way, the effect of the marginal distributions and the overall association can be studied separately. As expected, the power of all tests increases with n and τ . In Table 2, we therefore only report the results for n = 100 and τ = 0.1 when the copula in (1) is Clayton, Gaussian, Cauchy or Student t 3 ; see, e.g., McNeil et al. (2015) for definitions of these families of copulas. Results for other choices of n, τ , and copula alternatives are in the Supplementary Material.
The power of the test based on S n depends on the copula, but only slightly. More interestingly, the power is fairly stable across all choices of margins, regardless of whether they are discrete, continuous, mixed, or heavy-tailed. It is mildly negatively affected by the large number of ties in at least one margin. Data jittering cannot be recommended because it systematically reduces power and is too costly if repeated over a large number of jitters to ensure reproducibility.
Compared to S n , Hoeffding's test is more dependent on the choice of margins and performs poorly when there are many ties in at least one margin. When there are few or no ties, it performs similarly to the test based on S n , and does better when the copula is Cauchy.  The test of Genest et al. (2013) based on Spearman's ρ has stable power across margins. It is often slightly more powerful than the test based on S n for the copulas in Table 2, but it is not consistent and will perform poorly if Spearman's ρ of the joint distribution is zero or close to it. An example is the tent map copula for which ρ = 0, as detailed in the Supplementary Material.
The test of Székely et al. (2007) is much more sensitive to the choice of copula and margins, regardless of the proportion of ties. When the copula is Clayton or Gaussian, the lack of existence of the mean for Cauchy or rounded Pareto margins leads to a considerable loss of power. When the copula is Cauchy or Student t_3, however, this test does much better than that based on S_n, and the nonexistence of the mean appears not to be much of an issue. This is possibly because the Cauchy and Student t_3 copulas do not reduce to Π when τ = 0. Instead, when τ = 0, samples from these copulas are often clustered along both diagonals of the unit square. This phenomenon, sometimes referred to as arachnitude, is illustrated in Fig. 3(a). Finally, following Bakirov et al. (2006), we considered regression alternatives of the form X_2 = X_1 Z with X_1 either being N(0, 1) or having one of the five previously mentioned margins. The error Z was taken to be either N(0, 1) or centred binomial, namely Z + 5 ∼ Bi(10, 0.5). The law of X_2 and its dependence on X_1 are induced by the distributions of X_1 and Z; samples from (X_1, X_2) are shown in the Supplementary Material. From Table 3 it can be seen that the test of Székely et al. (2007) does much better than the test based on S_n when n = 100 and X_1 is normal, Cauchy, or t_3 with an atom at 0. As illustrated in Fig. 3(b), the underlying dependence again exhibits arachnitude. These regression alternatives are also examples of situations where the test of Genest et al. (2013) performs very poorly because ρ(X_1, X_2) = 0.
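Samples from the regression alternatives are easy to generate; the sketch below also illustrates why they defeat linear measures of association: X_1 and X_2 = X_1 Z are uncorrelated, yet their squares are not (illustration only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100000

# regression alternative X2 = X1 * Z with centred binomial error
x1 = rng.standard_normal(n)
z = rng.binomial(10, 0.5, n) - 5            # Z + 5 ~ Bi(10, 0.5)
x2 = x1 * z

corr = np.corrcoef(x1, x2)[0, 1]            # ~ 0: linear methods are blind
corr_sq = np.corrcoef(x1 ** 2, x2 ** 2)[0, 1]  # > 0: dependence in the scale
print(round(corr, 3), round(corr_sq, 3))
```

Since E(X_1 X_2) = E(X_1²) E(Z) = 0, the first correlation vanishes in the population, while the squared variables are strongly positively correlated.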

The d > 2 case
When d > 2, the tests based on S_n and T_n differ. As it is computationally infeasible to consider all possible combinations of the margins described in § 4.2, all margins were taken to be identical. The first observation to be made is that the tests based on S_n and T_n are again liberal in small samples; this issue worsens as d increases. Table 4 reports the levels of both tests when d = 5. As in the bivariate case, one can use the permutation approach when n is small, at the cost of an increased computational burden; the system times reported in Table 4 were obtained using the same personal laptop computer as in § 4.2. The overall recommendation based on this table is that when n > 75, the multiplier method performs well and is preferred.
In the power study, we considered the tests based on S_n and T_n, as well as the test based on T_nK with the cardinality of the subsets bounded above by K = 2. Furthermore, we ran these tests on a jittered sample obtained by breaking the tied ranks at random. As competitors, we used the tests based on ρ_n1 and W_n from Genest et al. (2013), which are multivariate generalizations of the tests of bivariate independence based on Spearman's ρ. The statistic W_n is constructed in a similar manner to T_n by combining p-values of tie-corrected statistics stemming from a decomposition of Spearman's ρ using the Möbius combinatorial formula.
The observed power when n = 100 is summarized in Table 5 for d = 5 and four exchangeable copulas whose pairwise value of Kendall's τ is 0.1. These copulas are extensions of those considered in Table 2. Results for other values of n and τ , d = 3, and the Frank, Gumbel, Farlie-Gumbel-Morgenstern and tent map copulas are in the Supplementary Material.
As in the d = 2 case, the tests based on S_n and T_n perform similarly across margins; their power can be slightly affected when there are many ties. Jittering is not to be recommended when d > 2 either. For the alternatives in Table 5, the power of the tests increases with the dimension d because all subsets of variables of size at least 2 contribute evidence against H_0. In contrast, the power drops with d for Farlie-Gumbel-Morgenstern copulas, as only the full set exhibits dependence.
Which one of the tests based on S n and T n is superior depends on the copula. For example, the test based on S n is preferable when the copula is Clayton, Gaussian or Student t 3 , whereas with few exceptions the test based on T n has higher power for Cauchy, Frank or Gumbel alternatives.
Truncating the statistic T n by considering subsets of size K = 2 generally leads to a rise in power. However, this test performs poorly when the variables are pairwise but not mutually independent; the Farlie-Gumbel-Morgenstern alternatives are again a case in point.
The tests of Genest et al. (2013) based on Spearman's ρ are generally, but not always, more powerful than those based on S n and T n . However, this is not the case when ρ ≈ 0. An illustration is again provided by the tent map copula, for which both tests of Genest et al. (2013) perform very poorly, in stark contrast to those based on S n and T n .

Contiguous alternatives
As a complement to the numerical results presented in § 4, we now look at the limiting distribution of the stochastic processes G_n and G_An for all A ∈ A_d under a sequence of contiguous alternatives. To define the latter, we proceed as in the continuous case. Let Θ ⊂ R be an open interval, and let {C_θ : θ ∈ Θ} be a family of copulas such that for a unique θ_0 ∈ Θ, C_{θ_0} = Π. Next, assume that the following regularity conditions hold.
When Conditions 1 and 2 hold, Q_n is contiguous with respect to P_n. The regularity conditions are the same as in the continuous case and do not concern the marginal distributions. As detailed in the Appendix, the following result can then be derived from Le Cam's third lemma (van der Vaart & Wellner, 1996). In what follows, C([0, 1]^d) denotes the space of continuous real-valued functions defined on [0, 1]^d endowed with the supremum norm ‖·‖.
Theorem 3. Let C = {C_θ : θ ∈ Θ} be a family of copulas such that C_{θ_0} = Π for a unique θ_0 ∈ Θ, and suppose that Conditions 1 and 2 hold. Then under Q_n, as n → ∞, the empirical process G_n converges weakly in C([0, 1]^d) to G + δ M_F(Ċ_{θ_0}). Furthermore, under Q_n, the processes G_An with A ∈ A_d jointly converge weakly in C([0, 1]^d) to G_A + δ M_F(q_A).

Formulas for q_A are available for various families of multivariate copulas. In the exchangeable or equicorrelated Gaussian case, for instance, one has, for any A = {j, k} of size two,

q_A(u) = ϕ{Φ^{−1}(u_j)} ϕ{Φ^{−1}(u_k)},

while q_A ≡ 0 whenever |A| > 2, where Φ denotes the N(0, 1) cumulative distribution function and ϕ the corresponding density. Thus, within that class of copulas, only the pairwise interactions are significant in a neighbourhood of independence. The same phenomenon occurs in the Clayton family of copulas, which plays a key role in survival analysis (Clayton, 1978); for this model, q_A also vanishes whenever |A| > 2. This explains why, in Table 5, the tests based on the truncated statistics are significantly more powerful than the corresponding tests based on the nontruncated versions for these two classes of alternatives. In contrast, for the Frank and Gumbel families of Archimedean copulas, q_A is nonzero for every A ∈ A_d and all (u_1, ..., u_d) ∈ (0, 1)^d. In these models, therefore, it is essential to look at all possible subsets of variables. As |A| increases, the terms in the expression for q_A decrease in absolute value for the Frank family but not for the Gumbel family. This explains why, as reported in the Supplementary Material, the tests based on the truncated statistics have comparable, and sometimes even lower, power than the nontruncated versions for the Frank and Gumbel alternatives. Finally, for the Farlie-Gumbel-Morgenstern family of copulas, one finds that q_A ≡ 0 unless A = {1, ..., d}. So unless the full set is included in the statistic, one cannot expect any test to have significant power against these alternatives, as already noted in § 4.3.
Under the conditions of Theorem 3, it easily follows that under Q_n, as n → ∞, S_n and the collection (S_An : A ∈ A_d) converge jointly in law to ‖G + δ M_F(Ċ_{θ_0})‖_2^2 and (‖G_A + δ M_F(q_A)‖_2^2 : A ∈ A_d), respectively. In principle this result can be used to compute the asymptotic local power of the tests considered here. To do this in full generality is a much more complicated endeavour than in Genest et al. (2006), however, because the limiting distributions depend on the margins if the latter are discontinuous. Therefore, here we only consider the local power of the test based on S_n when d = 2 with binomial margins. The binomial distribution Bi(p, N) converges to a continuous distribution as N → ∞ while having a finite support, which makes the calculations easier.
Example 1. Suppose that d = 2 and that, for each j ∈ {1, 2}, F_j is binomial Bi(p_j, N_j). From (A4), note that under Q_n, G_n → M_F(B_δ) weakly, where B_δ = B + δĊ_θ0 with B given by (A3). Consequently, under Q_n, S_n → ‖M_F(B_δ)‖_2^2 in law. The limit can be evaluated explicitly in terms of the quantities B_δ{F_1(k_1), F_2(k_2)} for k_1, k_2 ∈ N. Furthermore, as is well known, the process B admits the Karhunen–Loève expansion

B(u_1, u_2) = Σ_{ℓ_1=1}^∞ Σ_{ℓ_2=1}^∞ {2/(ℓ_1 ℓ_2 π^2)} sin(ℓ_1 π u_1) sin(ℓ_2 π u_2) Z_{ℓ_1,ℓ_2},  (8)

where {Z_{ℓ_1,ℓ_2} : ℓ_1, ℓ_2 ∈ N} are mutually independent N(0, 1) random variables (Shorack & Wellner, 1984, p. 213). Using this expansion, the asymptotic critical values c_F(α) and the asymptotic local power π(δ) = pr{‖M_F(B_δ)‖_2^2 > c_F(α)} can be determined by simulation. Table 6 shows the results for 1000 Monte Carlo repetitions when ℓ_1 and ℓ_2 are truncated at 50 in (8) and the marginal parameters are p_1 = p_2 = 0.5 and N_1, N_2 ∈ {2, 5, 50}. As alternatives, we chose the Clayton, Gaussian and Farlie–Gumbel–Morgenstern copula families, parameterized by Kendall's τ to ensure comparability. The results largely confirm the findings from the simulation study in § 4.2 reported in Table 2. The power rises with δ, as expected; more importantly, it is fairly homogeneous across the three alternatives. Also visible is the phenomenon of the asymptotic power decreasing with the degree of discreteness; it is lowest when N_1 = N_2 = 2. The loss of power is, however, rather small. We can therefore conclude that the test based on S_n is very robust with respect to the choice of margins and successfully corrects for ties in the data.

Table 6. Asymptotic local power of the test based on S_n when d = 2 for three local alternatives; the results are based on 1000 Monte Carlo samples and ℓ_1, ℓ_2 ≤ 50 in (8)

Asymptotic behaviour of the empirical multilinear process

For convenience, we provide a combined restatement of Theorem 1 and Corollary 1 from Genest et al.
(2017) concerning the limiting behaviour of C_n = n^{1/2}(Ĉ_n − C), where Ĉ_n denotes the empirical multilinear copula. Let B* be a Brownian bridge, i.e., a continuous centred Gaussian process on [0, 1]^d such that, for all u_1, . . . , u_d, v_1, . . . , v_d ∈ [0, 1],

cov{B*(u_1, . . . , u_d), B*(v_1, . . . , v_d)} = Π_{j=1}^d min(u_j, v_j) − Π_{j=1}^d u_j v_j.

Further, set C* = M_F(B*), where M_F is as in (A1). Then C* is a continuous centred Gaussian process. For any v ∈ R, let ι_j(v) be the vector in R^d whose jth component is v and whose other components all equal 1. Also let C([0, 1]^d) be the space of continuous functions f : [0, 1]^d → R endowed with the supremum norm ‖ · ‖.
Theorem A1. The sequence C_n is tight. If, in addition, C = Π, then as n → ∞, G_n = C_n → G weakly in C([0, 1]^d), where, for any u_1, . . . , u_d ∈ [0, 1],

G(u_1, . . . , u_d) = C*(u_1, . . . , u_d) − Σ_{j=1}^d C*{ι_j(u_j)} Π_{k≠j} u_k.  (A2)

The process G in (A2) can also be conveniently expressed in terms of the process B given by

B(u_1, . . . , u_d) = B*(u_1, . . . , u_d) − Σ_{j=1}^d B*{ι_j(u_j)} Π_{k≠j} u_k,  (A3)

which is the limiting process G in Theorem A1 when the margins F_1, . . . , F_d are continuous. To this end, note that C* = M_F(B*), so that for each j ∈ {1, . . . , d} and u_j ∈ [0, 1], C*{ι_j(u_j)} = M_F(B*){ι_j(u_j)}. Substituting in (A2) and using the multinomial formula and the fact that M_F is a linear operator, one obtains G = M_F(B).

Proof of Theorem 1

If C_n = n^{1/2}(Ĉ_n − C), then G_n = C_n + n^{1/2}(C − Π), whence ‖G_n‖_2 ≥ n^{1/2} ‖C − Π‖_2 − ‖C_n‖_2. Given that C_n is tight by Theorem A1, it follows that for any M > 0, pr(S_n > M) → 1 as n → ∞ whenever C ≠ Π. Therefore, the test based on S_n is consistent. Suppose now that C = Π. From Theorem A1, as n → ∞, G_n → G weakly in C([0, 1]^d). The asymptotic distribution of S_n then follows because the map f ↦ ‖f‖_2^2 is continuous on C([0, 1]^d).
Proof of Theorem 2

Following Genest & Rémillard (2004), first define, for A ∈ A_d and f ∈ C([0, 1]^d),

M_A(f)(u_1, . . . , u_d) = Σ_{B⊆A} (−1)^{|A|−|B|} f(u^B) Π_{j∈A∖B} u_j,  (A6)

where u^B_j = u_j if j ∈ B and 1 otherwise. It follows from the multinomial formula that if f(u_1, . . . , u_d) = f_1(u_1) · · · f_d(u_d) for all u_1, . . . , u_d ∈ [0, 1], then

M_A(f)(u_1, . . . , u_d) = Π_{j∉A} f_j(1) × Π_{j∈A} {f_j(u_j) − u_j f_j(1)}.  (A7)

Consequently, for any A ∈ A_d, one has M_A(Π) ≡ 0 and μ_A = M_A(C) as defined in (7). Thus, G_n = C_n + n^{1/2}(C − Π) and G_An = M_A(G_n) = M_A(C_n) + n^{1/2} μ_A. For any A ∈ A_d, (A6) implies that M_A is a bounded linear map, so that M_A(C_n) is tight whenever C_n is. If μ_A ≢ 0, then for any M > 0, pr(S_An > M) → 1 as n → ∞, because C_n is tight by Theorem A1.

Now suppose that C = Π. Then G_An = M_A(C_n) and, since the map f ↦ M_A(f) is continuous on C([0, 1]^d), it follows from Theorem A1 that the processes (G_An : A ∈ A_d) jointly converge weakly to G_A = M_A(G) = M_A(C*), using (A7) and representation (A2). Next, since C* = M_F(B*), it follows easily that G_A = M_F(B*_A), where B*_A = M_A(B*). Now it is well known that (B*_A : A ∈ A_d) is a collection of mutually independent continuous centred Gaussian processes; see, e.g., Ghoudi et al. (2001). The processes (G_A : A ∈ A_d) are thus mutually independent; their covariance function is given in the Supplementary Material. The claim now follows from the continuity of the map f ↦ ‖f‖_2^2 on C([0, 1]^d).
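The Möbius transform in (A6) and the product formula (A7) are easy to check numerically. The sketch below implements (A6) directly for functions on [0, 1]^d and verifies, at one point, that M_A annihilates the independence copula and that (A7) holds for a product function; the helper names are ours, not the paper's.

```python
from itertools import chain, combinations
from math import prod

def subsets(A):
    """All subsets of the index set A."""
    A = tuple(A)
    return chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))

def M(f, A, u):
    """Mobius transform of (A6):
    M_A(f)(u) = sum over B subset of A of (-1)^{|A|-|B|} f(u^B) prod_{j in A\\B} u_j,
    where u^B keeps u_j for j in B and replaces every other coordinate by 1."""
    total = 0.0
    for B in subsets(A):
        uB = tuple(u[j] if j in B else 1.0 for j in range(len(u)))
        sign = (-1) ** (len(A) - len(B))
        total += sign * f(uB) * prod(u[j] for j in A if j not in B)
    return total

Pi = lambda u: prod(u)                     # independence copula
u, A = (0.3, 0.7, 0.5), (0, 1)

# M_A annihilates the independence copula: M_A(Pi) = 0
assert abs(M(Pi, A, u)) < 1e-12

# product formula (A7), checked for f = f_1 f_2 f_3 with f_j(x) = x^2
f = lambda u: prod(x ** 2 for x in u)
fj = lambda x: x ** 2
rhs = prod(fj(1.0) for j in range(3) if j not in A) * \
      prod(fj(u[j]) - u[j] * fj(1.0) for j in A)
assert abs(M(f, A, u) - rhs) < 1e-12
```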

Proof of Proposition 2
For any A ∈ A_d and b ∈ {1, . . . , B}, G_Anb = M_A(G_nb), with M_A as defined in (A6). First suppose that C = Π. Given that G_n, G_n1, . . . , G_nB jointly converge weakly to mutually independent copies of G, as shown in the proof of Proposition 1, the continuity of M_A implies that, as n → ∞, (G_An : A ∈ A_d), (G_An1 : A ∈ A_d), . . . , (G_AnB : A ∈ A_d) jointly converge weakly to mutually independent copies of (G_A : A ∈ A_d) in C([0, 1]^d). As discussed in the proof of Theorem 2, the processes G_A, A ∈ A_d, are mutually independent. Hence, as n → ∞, {(S_An, S_An1, . . . , S_AnB) : A ∈ A_d} jointly converge weakly to {(S_A, S_A1, . . . , S_AB) : A ∈ A_d}, where, for each A ∈ A_d, S_A, S_A1, . . . , S_AB are mutually independent copies of ‖G_A‖_2^2. Along the same lines as in the proof of Proposition 1, one finds that (p_AnB : A ∈ A_d) converge jointly in law to mutually independent copies of the uniform distribution on {0, 1/B, . . . , 1}.
If C ≠ Π, then μ_A ≢ 0 for some A ∈ A_d. From the proof of Proposition 1, the continuity of M_A and Theorem A1, C_n and S_An1, . . . , S_AnB are tight. Thus (A8) yields the conclusion, as in the proof of Proposition 1.
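The limiting uniformity of the bootstrap p-values can be illustrated with a toy Monte Carlo: when a statistic and its B bootstrap replicates are independent draws from a common continuous law, the proportion of replicates at least as large as the statistic is uniform on {0, 1/B, . . . , 1}, each of the B + 1 atoms carrying probability 1/(B + 1). The chi-squared law below is an arbitrary stand-in for the limit of S_An, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
B, n_rep = 20, 20000

# the "statistic" and its B bootstrap replicates, iid from a continuous law
S = rng.chisquare(3, size=(n_rep, 1))
Sb = rng.chisquare(3, size=(n_rep, B))

# p-value computed as the fraction of replicates >= the statistic
p = (Sb >= S).mean(axis=1)

# empirical frequency of each atom k/B, k = 0, ..., B
freq = np.array([(p == k / B).mean() for k in range(B + 1)])
```

Each entry of `freq` should be close to 1/(B + 1) ≈ 0.048, the discrete-uniform probability of each atom.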

Proof of Theorem 3
Let (U_11n, . . . , U_1dn), . . . , (U_n1n, . . . , U_ndn) be a random sample from C_θn and define, for all u_1, . . . , u_d ∈ [0, 1],

B_n(u_1, . . . , u_d) = n^{−1/2} Σ_{i=1}^n {1(U_i1n ≤ u_1, . . . , U_idn ≤ u_d) − u_1 · · · u_d}.

As shown in Proposition 2.1 of  using Theorem 3.10.12 of van der Vaart & Wellner (1996), under Q_n, as n → ∞, B_n converges weakly in the Skorohod space D([0, 1]^d) of càdlàg functions to B* + δĊ_θ0, where Ċ_θ0 is as in Condition 2. Now write G_n = C̃_n + D_n, where the summands are as in equation (5) of Genest et al. (2017). From the proof of Proposition 4, and in light of Proposition 5 in the latter paper together with contiguity, it continues to be the case that under Q_n, C̃_n − M_F(B_n) → 0 in probability. Because the operator M_F is continuous and linear, the continuous mapping theorem implies that under Q_n, as n → ∞, C̃_n → M_F(B* + δĊ_θ0) = C* + δM_F(Ċ_θ0) weakly in D([0, 1]^d).

Now observe that for each j ∈ {1, . . . , d} and u_j ∈ [0, 1], Condition 2 implies that Ċ_θ0{ι_j(u_j)} = 0 and hence M_F(Ċ_θ0){ι_j(u_j)} = 0; see also Genest et al. (2007, p. 170). From Proposition 6 in Genest et al. (2017), together with Proposition 4.5 in Genest et al. (2014) and contiguity, one then has that under Q_n, as n → ∞, D_n → D weakly, where D is the same limiting process as under the null hypothesis. From (A2), it follows that under Q_n, as n → ∞, G_n → G + δM_F(Ċ_θ0) weakly in D([0, 1]^d). Because G_n and the limit have continuous paths, the convergence also holds in C([0, 1]^d).

From the proof of Theorem 2, G_An = M_A(G_n), where M_A is a continuous map; the continuous mapping theorem thus implies that under Q_n, the processes G_An with A ∈ A_d converge jointly to M_A{G + δM_F(Ċ_θ0)}. For each A ∈ A_d, the map M_A is linear and, as shown in the proof of Theorem 2, M_A(G) = M_A(C*) = G_A. The limiting process can therefore be rewritten as M_A{G + δM_F(Ċ_θ0)} = G_A + δM_F(q_A), where q_A = M_A(Ċ_θ0), as claimed.