Abstract

We develop a permutation test for assessing a difference in the areas under the curve (AUCs) in a paired setting where both modalities are given to each diseased and nondiseased subject. We propose that permutations be made between subjects specifically by shuffling the diseased/nondiseased labels of the subjects within each modality. As these permutations are made within modality, the permutation test is valid even if both modalities are measured on different scales. We show that our permutation test is a sign test for the symmetry of an underlying discrete distribution whose size remains valid under the assumption of equal AUCs. We demonstrate the operating characteristics of our test via simulation and show that our test is equal in power to a permutation test recently proposed by Bandos and others (2005).

INTRODUCTION

Nonparametric inference for a difference in areas under the curve (AUCs) for paired studies was first proposed by DeLong and others (1988). Their approach is based upon asymptotic theory for U-statistics (Hoeffding, 1948) and estimates the covariance of the 2 U-statistics using the jackknife. Other nonparametric inference procedures include those based upon an analysis of variance of jackknife pseudovalues (Dorfman and others, 1992; Song, 1997) and bootstrap-based methods (Campbell, 1994; Moise and others, 1988). However, the validity of each of these methods is founded in large-sample theory, and none necessarily leads to a valid test of a difference in AUCs in small samples. A competing approach to the above methods is a permutation test, the size of which will remain nominal in small samples.

Two permutation tests for paired receiver operating characteristic (ROC) studies currently exist: one proposed by Venkatraman and Begg (1996) and the more recent test of Bandos and others (2005). The test of Bandos and others directly tests for an equality of AUCs, while the test of Venkatraman and Begg is more general and tests for equality of the underlying ROC curves. As a result, the test of Venkatraman and Begg is less powerful for testing equality of AUCs. Both permutation tests are executed by permuting the labels of the 2 modalities within each diseased and nondiseased subject. Such an approach implicitly assumes that both modalities are exchangeable within subject and requires an appropriate transformation, such as ranks, for modalities differing in scale. Bandos and others (2005) compared the performance of their test to that of DeLong and others (1988) via simulation and found that the permutation test had greater power than the nonparametric test developed by DeLong and others (1988) when there was moderate correlation between modalities, large AUCs, and small sample sizes.

We propose an alternative permutation test based on between-subject permutations of the diseased/nondiseased labels of the subjects. These permutations do not require the exchangeability of the 2 modalities and do not require transformations of the original data. In Section 2, we derive our permutation test and a corresponding asymptotic normal approximation. In Section 3, we discuss simulation results regarding the validity and power of our test in relation to that of Bandos and others (2005). In Section 4, we make concluding remarks.

PROPOSED PERMUTATION TEST

We have a study designed to compare AUC1 and AUC2, the respective AUCs of Modalities 1 and 2. We have a total of N subjects, and both modalities are applied to the same m nondiseased subjects and the same n = N − m diseased subjects. We let Xi1 and Xi2 denote the respective values reported from Modalities 1 and 2 for nondiseased subject i, i = 1, 2, …, m. Likewise, we let Yj1 and Yj2 denote the respective values reported from Modalities 1 and 2 for diseased subject j, j = 1, 2, …, n. We let X = {(X11, X12), (X21, X22), …, (Xm1, Xm2)} denote the vector of measurement pairs on nondiseased subjects and Y = {(Y11, Y12), (Y21, Y22), …, (Yn1, Yn2)} denote the vector of measurement pairs on diseased subjects. The nonparametric estimate of AUCΔ = AUC2 − AUC1 is $\widehat{\mathrm{AUC}}_\Delta = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} S_{ij}$, where Sij = Uij2 − Uij1 and Uijk = I(Xik < Yjk) + I(Xik = Yjk)/2, k = 1, 2.
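The estimate just defined can be computed directly from the paired measurements. The following is a minimal sketch, not the authors' code; the function names `u_kernel` and `auc_delta` and the toy data are illustrative assumptions.

```python
def u_kernel(x, y):
    """Return 1 if x < y, 1/2 if x == y, and 0 otherwise."""
    if x < y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def auc_delta(nondiseased, diseased):
    """Average of S_ij = U_ij2 - U_ij1 over all nondiseased/diseased pairs.

    Each argument is a list of (modality 1, modality 2) value pairs.
    """
    m, n = len(nondiseased), len(diseased)
    total = 0.0
    for x1, x2 in nondiseased:
        for y1, y2 in diseased:
            total += u_kernel(x2, y2) - u_kernel(x1, y1)
    return total / (m * n)


# Toy data (hypothetical): the two modalities are on different scales.
X = [(1.0, 10.0), (2.0, 15.0)]   # nondiseased subjects
Y = [(1.5, 20.0), (3.0, 40.0)]   # diseased subjects
print(auc_delta(X, Y))           # AUC2 - AUC1 = 1.00 - 0.75
```

Because each kernel compares values only within a modality, the two modalities never need to share a scale.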

To formally test AUCΔ = 0, we combine all the subjects into 1 group of N subjects. We let Z1 = {Z11, Z21, …, Zm1, Zm+1,1, …, ZN1} denote the N measurements reported from the first modality, in which the subscripts ℓ = 1, 2, …, m represent values corresponding to nondiseased subjects and ℓ = m + 1, m + 2, …, N represent values corresponding to diseased subjects. Within Modality 1, we compare every subject's value to every other subject's value, that is, Vℓℓ′1 = I(Zℓ1 < Zℓ′1) + I(Zℓ1 = Zℓ′1)/2, ℓ ≠ ℓ′. In other words, we are comparing every diseased subject to all nondiseased subjects and all (n − 1) other diseased subjects. Likewise, we are comparing every nondiseased subject to all diseased subjects and all (m − 1) other nondiseased subjects. For Modality 2, we have corresponding definitions of Z2 and Vℓℓ′2. By these definitions, we see that Vℓ′ℓk = 1 − Vℓℓ′k, k = 1, 2.

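We then define a generalized estimate of AUCΔ

\[
\widehat{\mathrm{AUC}}_\Delta = \frac{1}{mn} \sum_{\ell < \ell'} w_{\ell\ell'}\, T_{\ell\ell'} \tag{2.1}
\]
in which
\[
w_{\ell\ell'} =
\begin{cases}
1 & \text{if subject } \ell \text{ is nondiseased and subject } \ell' \text{ is diseased,} \\
-1 & \text{if subject } \ell \text{ is diseased and subject } \ell' \text{ is nondiseased,} \\
0 & \text{otherwise,}
\end{cases}
\]
and Tℓℓ′ = (Vℓℓ′2 − Vℓℓ′1). The sum in (2.1) is restricted to ℓ < ℓ′ because we know that Tℓ′ℓ = −Tℓℓ′, that is, both values correspond to a comparison of the same diseased and nondiseased subject, but in reversed orders.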
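To make the pooled-sample construction concrete, the sketch below evaluates a sum over ordered pairs of the signed differences T for an arbitrary assignment of disease labels. It assumes that the weight attached to a pair is +1 when the lower-indexed subject is nondiseased and the other is diseased, −1 in the reverse case, and 0 when both subjects share a status; this is one reading of the weights, and the function names and data are illustrative only.

```python
def v_kernel(a, b):
    """Return 1 if a < b, 1/2 if a == b, and 0 otherwise."""
    return 1.0 if a < b else (0.5 if a == b else 0.0)


def generalized_auc_delta(z1, z2, diseased):
    """Weighted sum of T over pairs (a, b) with a < b, divided by m * n.

    z1, z2   : modality 1 and modality 2 values for all N subjects
    diseased : list of booleans, one disease label per subject
    """
    N = len(z1)
    n = sum(diseased)
    m = N - n
    total = 0.0
    for a in range(N):
        for b in range(a + 1, N):
            t = v_kernel(z2[a], z2[b]) - v_kernel(z1[a], z1[b])
            w = int(diseased[b]) - int(diseased[a])   # assumed weight: +1, -1 or 0
            total += w * t
    return total / (m * n)


# Hypothetical data: nondiseased subjects listed first, then diseased.
ordered = generalized_auc_delta([1.0, 2.0, 1.5, 3.0],
                                [10.0, 15.0, 20.0, 40.0],
                                [False, False, True, True])
# The same four subjects listed in a different order.
shuffled = generalized_auc_delta([1.5, 1.0, 3.0, 2.0],
                                 [20.0, 10.0, 40.0, 15.0],
                                 [True, False, True, False])
print(ordered, shuffled)
```

With nondiseased subjects listed first, every cross pair receives weight +1 and the value reduces to the usual nonparametric estimate; relisting the same subjects in another order leaves the value unchanged because reversing a pair flips the sign of both the weight and T.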

We prove in the Appendix that when both modalities have continuous distributions, testing the hypothesis AUCΔ = 0 is equivalent to testing the hypothesis that Tℓℓ′ has a distribution symmetric around zero. A standard nonparametric test for symmetry around zero is the sign test, that is, the null distribution for the estimate of AUCΔ is found by computing the estimate for every permutation of wℓℓ′, the signs of the Tℓℓ′. Since we permute the wℓℓ′ by switching the labels of nondiseased subject ℓ and diseased subject ℓ′, our permutation test corresponds to permuting the vector of diseased/nondiseased labels among the subjects. Under such a permutation scheme, we are breaking the connection between a subject's disease status and their 2 modality values. Thus, at first glance, our permutation test may appear to be a valid test of AUCΔ = 0 only when both modalities are equally useless for detecting disease. However, as we have just discussed, our permutation scheme creates a valid test whenever AUC1 = AUC2 = c for any value 0.5 ≤ c ≤ 1.0.
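The label-shuffling scheme described above can be sketched as a Monte Carlo permutation test. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses the unmodified estimate of AUCΔ as the test statistic, a two-sided comparison, and the common (hits + 1)/(B + 1) p-value convention, none of which are specified in the text. The names `label_permutation_pvalue`, `n_perm`, and `seed` are hypothetical.

```python
import random


def kernel(a, b):
    """Return 1 if a < b, 1/2 if a == b, and 0 otherwise."""
    return 1.0 if a < b else (0.5 if a == b else 0.0)


def auc_delta(z1, z2, diseased):
    """Nonparametric estimate of AUC2 - AUC1 for the given disease labels."""
    dis = [i for i, d in enumerate(diseased) if d]
    non = [i for i, d in enumerate(diseased) if not d]
    total = sum(kernel(z2[i], z2[j]) - kernel(z1[i], z1[j])
                for i in non for j in dis)
    return total / (len(non) * len(dis))


def label_permutation_pvalue(z1, z2, diseased, n_perm=1000, seed=0):
    """Shuffle disease labels among subjects; modality values stay paired."""
    rng = random.Random(seed)
    observed = auc_delta(z1, z2, diseased)
    labels = list(diseased)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)   # keeps the numbers of diseased/nondiseased fixed
        if abs(auc_delta(z1, z2, labels)) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


z1 = [1.0, 2.0, 1.5, 3.0]
z2 = [10.0, 15.0, 20.0, 40.0]
labels = [False, False, True, True]
obs, p = label_permutation_pvalue(z1, z2, labels, n_perm=200, seed=1)
print(obs, p)
```

Only the labels move; each subject's pair of modality values is never split or exchanged, which is why no common scale is needed.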

Like all sign tests, our test is valid when Tℓℓ′ has a distribution symmetric around zero but has power that is impacted by p0 = Prob(Tℓℓ′ = 0), that is, how much mass the discrete distribution of Tℓℓ′ has at zero. In our setting, p0 increases as either AUC1 or AUC2 increases toward 1.0. Specifically, as the overlap in the distributions of diseased and nondiseased subjects for both modalities decreases, it becomes increasingly likely that Vℓℓ′k = 1 for both modalities (k = 1, 2), leading to the increased probability that Tℓℓ′ = 0 and decreased power of our permutation test. Therefore, we adopt the traditional approach of improving power in sign tests by proposing a modified statistic 

\[
D^* = S_+ + f\, S_0
\]
in which S0 = ∑i∑j I(Sij = 0), S+ = ∑i∑j I(Sij > 0), and f is a proportion describing the degree to which we use the number of zeros as evidence against AUCΔ = 0.
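A short sketch of the counts entering the modified statistic follows. It assumes the form D* = S+ + f·S0, which is the form implied by the null mean and variance given for D* later in this section; the function names and data are illustrative.

```python
from fractions import Fraction


def kernel(a, b):
    """Return 1 if a < b, 1/2 if a == b, and 0 otherwise."""
    return 1.0 if a < b else (0.5 if a == b else 0.0)


def sign_counts(nondiseased, diseased):
    """Count the S_ij that are positive, zero and negative."""
    s_plus = s_zero = s_minus = 0
    for x1, x2 in nondiseased:
        for y1, y2 in diseased:
            s = kernel(x2, y2) - kernel(x1, y1)
            if s > 0:
                s_plus += 1
            elif s == 0:
                s_zero += 1
            else:
                s_minus += 1
    return s_plus, s_zero, s_minus


def modified_statistic(s_plus, s_zero, f):
    """Assumed form of the modified sign statistic: S+ plus a share f of the zeros."""
    return s_plus + f * s_zero


X = [(1.0, 10.0), (2.0, 15.0)]   # hypothetical nondiseased pairs
Y = [(1.5, 20.0), (3.0, 40.0)]   # hypothetical diseased pairs
s_plus, s_zero, s_minus = sign_counts(X, Y)
d_star = modified_statistic(s_plus, s_zero, Fraction(1, 3))
print(s_plus, s_zero, s_minus, d_star)
```

Setting f = 0 ignores the zeros entirely, while larger f credits more of them toward the positive count.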

Unfortunately, the optimal value of f in our setting is difficult to derive due to the correlation of the wℓℓ′. If the wℓℓ′ were independent, the methods of Irle and Klosener (1980), based upon the Neyman–Pearson lemma, would show that the optimal value of f is a function of p− = Prob(Sij < 0) and p+ = Prob(Sij > 0), both of which in our setting will vary depending on the actual values of AUC1 and AUC2. Although Putter (1955) suggested f = 1/2 and Coakley and Heise (1996) proposed f = 2/3 for use with the standard sign test, the simulation results presented in the Appendix demonstrate that a fixed value of f will not guarantee good operating characteristics for all possible values of AUC1 and AUC2. Nonetheless, we give a suggested formula for f (denoted f̂) in the Appendix that appears to perform well in most settings.

Furthermore, because the exact correlation structure of the wℓℓ′ is quite complicated, permutation theory cannot be used to derive the asymptotic variance of D*. As an alternative, we collect the values of S+ and S0 from each permutation to give us a joint permutation distribution for S+ and S0. From this distribution, we compute $\bar{S}_+$ and $\bar{S}_0$, the respective sample means of S+ and S0, as well as $s_+^2$, $s_0^2$, and $s_{+0}$, the respective sample variances and covariance of S+ and S0. Assuming a value for f, we estimate the null mean and variance of D* to be $\hat{\mu}_{D^*} = f\bar{S}_0 + \bar{S}_+$ and $\hat{\sigma}^2_{D^*} = f^2 s_0^2 + s_+^2 + 2 f s_{+0}$, respectively. An asymptotic version of our permutation test would therefore compare the value of $(D^* - \hat{\mu}_{D^*})/\hat{\sigma}_{D^*}$ to the appropriate critical value in a standard normal distribution.
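The moment calculation described above can be sketched as follows, assuming the permutation draws of (S+, S0) have already been collected. The function name `asymptotic_z` and the small list of draws are hypothetical; a real application would use many more permutations.

```python
import statistics


def asymptotic_z(d_obs, perm_counts, f):
    """Standardize an observed D* using moments of permuted (S+, S0) pairs.

    perm_counts : list of (S_plus, S_zero) tuples, one per permutation
    f           : assumed weight given to the zeros in D* = S+ + f * S0
    """
    s_plus = [sp for sp, _ in perm_counts]
    s_zero = [s0 for _, s0 in perm_counts]
    mean_plus = statistics.mean(s_plus)
    mean_zero = statistics.mean(s_zero)
    var_plus = statistics.variance(s_plus)    # sample variance (divisor k - 1)
    var_zero = statistics.variance(s_zero)
    k = len(perm_counts)
    cov = sum((a - mean_plus) * (b - mean_zero) for a, b in perm_counts) / (k - 1)
    mu = f * mean_zero + mean_plus
    var = f ** 2 * var_zero + var_plus + 2 * f * cov
    return (d_obs - mu) / var ** 0.5


# Hypothetical permutation draws of (S+, S0).
draws = [(1, 3), (2, 2), (3, 1), (2, 2)]
z = asymptotic_z(3.5, draws, f=0.5)
print(round(z, 4))
```

Working from the joint moments of S+ and S0 means the same set of permutations can be reused for any candidate value of f without recomputing D* for each draw.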

NUMERICAL STUDIES OF OPERATING CHARACTERISTICS

Our simulation results are presented as a series of 4 tables in the Appendix, which also contains specific computational details for the simulations. Based upon 1000 simulations, an approximate 95% confidence interval around a nominal size of 0.05 is (0.036, 0.064). Thus, we see in Table A1 that our permutation test is valid with any value of f when both AUC1 and AUC2 are less than 0.7, regardless of sample size or within-subject correlation. However, for higher AUC values, we see that the appropriate value of f will vary and if f is set too low, our test can actually have size above the desired level of 0.05. When AUC1 = AUC2 = 0.80, the results suggest that f = 1/3 may be appropriate, while when AUC1 = AUC2 = 0.90, f should be set greater than 1/3, but less than 1/2. In contrast to using a fixed value of f, our proposed value f̂ appears to work well (whether exact or asymptotic) among all settings, although it may produce a slightly conservative test at extreme AUC values. These findings continue to hold for discrete modalities, as demonstrated in Table A2. Furthermore, although there are slight variations in size between our test using f̂ and that of Bandos and others, none of the differences are significant, except with discrete modalities with AUC of 0.80 or higher. Therefore, we conclude overall that both approaches have similar size in most reasonable settings.
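The interval quoted for the nominal size is consistent with the usual normal approximation to a binomial proportion based on 1000 replications; the text does not state which approximation was used, so the worked arithmetic below is an assumption.

```latex
\[
0.05 \pm 1.96\sqrt{\frac{0.05\,(1 - 0.05)}{1000}}
  \approx 0.05 \pm 1.96 \times 0.00689
  \approx 0.05 \pm 0.0135
\]
```

This gives approximately (0.0365, 0.0635), which agrees with the quoted (0.036, 0.064) up to rounding.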

In Tables A3 and A4, we see that the power of the proposed permutation test using an appropriate value of f is comparable with the power of the test of Bandos and others. In fact, the power of the proposed permutation test can be increased marginally above that of Bandos and others with some values of f. For example, when testing for a difference in AUC1 = 0.7 and AUC2 = 0.8 with continuous modalities using 40 diseased and 40 nondiseased subjects with intrasubject correlation ρ = 0.5, we see in Table A3 that the proposed test has power 0.471 when f = 1/4, as compared to power 0.420 for the test of Bandos and others. Nonetheless, choosing a fixed value of f to increase power will be difficult in practice as the true AUC values of both modalities will be unknown.

FUTURE RESEARCH AREAS

Due to the complicated correlation structure of the elements used in the statistic, we have not yet derived a theoretically optimal value of f. Although we have developed one possible value that appears in simulations to work well across many settings, we are continuing research into deriving a formula for f that will maximize power in all settings. We are also seeking to use our permutation test to generate a confidence interval for AUCΔ as a complement to the hypothesis test. Furthermore, unlike the test of Bandos and others (2005), our proposed test does not require modalities that are measured on identical scales and thus may prove to be more powerful in settings in which the modality values are skewed; we are pursuing this conjecture in current research.

Note that the semiparametric regression model of Dodd and Pepe (2003) uses the Uijk defined in Section 2 in a generalized estimating equation (GEE) and yields a standardized value of the nonparametric estimate of AUCΔ when using an independence working covariance structure for the Uijk. Although inference for AUCΔ could be based upon the sandwich (robust) variance estimator of GEE methods, Braun and Feng (2001) showed that score and Wald tests using this approach have liberal size in small samples and developed a permutation test as an alternative to large-sample theory. As a result, we are also pursuing an extension of our permutation approach to the methods of Dodd and Pepe (2003) that would lead to exact inference for semiparametric estimates of a difference in AUCs.

APPENDIX

Theoretic proof for permutation test validity

Suppose that, for a nondiseased subject ℓ and a diseased subject ℓ′, Vℓℓ′1 and Vℓℓ′2 have distributions

\[
V_{\ell\ell'1} =
\begin{cases}
1 & \text{with probability } p_{10} \\
1/2 & \text{with probability } 1 - p_{10} - p_{11} \\
0 & \text{with probability } p_{11}
\end{cases}
\]
and
\[
V_{\ell\ell'2} =
\begin{cases}
1 & \text{with probability } p_{20} \\
1/2 & \text{with probability } 1 - p_{20} - p_{21} \\
0 & \text{with probability } p_{21}
\end{cases}
\]
so that AUC1 = p10 + (1 − p10 − p11)/2 and AUC2 = p20 + (1 − p20 − p21)/2. We also know that pd = 2mn/[N(N − 1)] is the probability that only one of the subjects ℓ and ℓ′ is diseased, while (1 − pd) = 1 − 2mn/[N(N − 1)] is the probability that subjects ℓ and ℓ′ are both diseased or both nondiseased. As a result, Tℓℓ′ = Vℓℓ′2 − Vℓℓ′1 has distribution
[displayed distribution of Tℓℓ′ not recoverable from the source]
Under the null hypothesis AUC1 = AUC2 = AUC*, we find that wℓℓ′ = sign(Tℓℓ′) has mean
[displayed expression for E(wℓℓ′) not recoverable from the source]
If continuous distributions are assumed for both modalities, then AUC* = p10 = (1 − p11) = p20 = (1 − p21) and we find that E(wℓℓ′) = 0, thereby proving that Tℓℓ′ is equally likely to be positive or negative. Modalities producing discrete values will lead to a distribution for Tℓℓ′ that is skewed positively or negatively from zero. However, the effect of this skewness will vanish asymptotically as N → ∞, assuming that the ratio m/N remains constant (Romano, 1990). As a result, our permutation test will be (asymptotically) valid when AUCΔ = 0, regardless of the value of AUC1 and AUC2.

Simulation description and results

We have simulated measurements for both modalities as follows. We drew the 2 continuous measurements for each nondiseased subject from a bivariate normal distribution centered at μX = 0, with both measurements having a marginal variance of 1.0 and correlation ρ. We drew the 2 continuous measurements for each diseased subject from a bivariate normal distribution centered at μY, also with both measurements having a marginal variance of 1.0 and correlation ρ; the values in μY are directly determined from AUC1 and AUC2. To generate discrete outcomes, we first generated continuous outcomes as described above. Within each modality, we then assigned a value of 1 to outcomes in the lowest quintile of outcomes, a value of 2 to outcomes between the first and the second quintiles, etc., with a value of 5 to outcomes above the fourth quintile.
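The data-generating scheme above can be sketched as follows. Two details are assumptions not spelled out in the text: the diseased-group mean for each modality is obtained from the standard binormal relation AUC = Φ(μ/√2) for unit variances, and quintiles are taken over the pooled diseased and nondiseased outcomes within a modality. All function names are illustrative.

```python
import random
from statistics import NormalDist


def mean_shift(auc):
    """Diseased-group mean giving the target AUC when both groups have unit variance."""
    return 2 ** 0.5 * NormalDist().inv_cdf(auc)


def simulate_pairs(size, mean1, mean2, rho, rng):
    """Draw correlated (modality 1, modality 2) pairs with unit marginal variances."""
    pairs = []
    for _ in range(size):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # rho * e1 + sqrt(1 - rho^2) * e2 has variance 1 and correlation rho with e1
        pairs.append((mean1 + e1, mean2 + rho * e1 + (1 - rho ** 2) ** 0.5 * e2))
    return pairs


def to_quintile_scores(values):
    """Map each value to a score 1..5 according to its quintile in the sample."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + (5 * rank) // len(values)
    return scores


rng = random.Random(2024)
nondiseased = simulate_pairs(40, 0.0, 0.0, 0.5, rng)
diseased = simulate_pairs(40, mean_shift(0.7), mean_shift(0.8), 0.5, rng)
modality1 = [x1 for x1, _ in nondiseased] + [y1 for y1, _ in diseased]
discrete1 = to_quintile_scores(modality1)
print(round(mean_shift(0.7), 3), round(mean_shift(0.8), 3), discrete1[:5])
```

Drawing the second coordinate as a weighted sum of two independent standard normals avoids any matrix factorization while still giving the required correlation.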

For each simulation, we examined 3 fixed values f = {1/2, 1/3, 1/4} in our D* statistic, as well as the value 

(A.1)
f̂ = [formula not recoverable from the source]
which varied among simulations depending upon the values p̂+, p̂0, and p̂−, the proportions of the Sij that were greater than, equal to, and less than zero, respectively. This value f̂ is based upon the optimal value proposed by Irle and Klosener (1980) in the setting of independent Sij. However, we do not claim this estimate to be necessarily optimal in our setting, but rather one that appears to offer our test nominal size and excellent power across a variety of settings.

Tables A1 and A2 examine the size of the 2 competing permutation tests (both exact and asymptotic versions) for assessing a difference in AUC for 2 continuous or discrete modalities, while Tables A3 and A4 are the corresponding comparisons of the power for the 2 tests. Each setting in Tables A1–A4 is defined by m, the number of nondiseased and number of diseased subjects, ρ, the within-subject correlation of the 2 modalities, and the values of AUC1 and AUC2. The size and power of each test were computed as the percentage of 1000 simulations in which the null hypothesis AUCΔ = 0 was rejected at a level of α = 0.05. We generated the permutation distribution of D* in each simulation by generating 1000 random permutations of the diseased/nondiseased labels.

Table A1.

Comparison of size for proposed permutation test and that of Bandos and others for assessing a difference in AUC of 2 paired continuous modalities with within-subject correlation ρ in samples of m diseased and m nondiseased subjects. The proposed permutation test is applied with f = {1/4, 1/3, 1/2} as well as the value f̂ presented in (A.1). The asymptotic size of the proposed permutation test is based upon f = f̂

ρ    m   AUC1  AUC2  Proposed test                                   Bandos and others
                     Exact    Exact    Exact    Exact    Asymptotic  Exact    Asymptotic
                     f = 1/4  f = 1/3  f = 1/2  f = f̂    f = f̂
0.0  40  0.6   0.6   0.053    0.053    0.053    0.054    0.053       0.057    0.055
0.0  40  0.7   0.7   0.056    0.052    0.054    0.055    0.053       0.062    0.060
0.0  40  0.8   0.8   0.055    0.042    0.040    0.045    0.046       0.061    0.063
0.0  40  0.9   0.9   0.111*   0.044    0.013    0.039    0.039       0.050    0.051
0.0  80  0.6   0.6   0.051    0.050    0.048    0.040    0.040       0.041    0.041
0.0  80  0.7   0.7   0.047    0.042    0.030    0.036    0.036       0.038    0.039
0.0  80  0.8   0.8   0.094*   0.055    0.021    0.035    0.036       0.047    0.045
0.0  80  0.9   0.9   0.297*   0.098*   0.009    0.041    0.043       0.046    0.047
0.5  40  0.6   0.6   0.062    0.062    0.061    0.055    0.056       0.060    0.060
0.5  40  0.7   0.7   0.051    0.048    0.046    0.043    0.044       0.052    0.053
0.5  40  0.8   0.8   0.057    0.044    0.033    0.037    0.036       0.046    0.041
0.5  40  0.9   0.9   0.111*   0.047    0.013    0.040    0.040       0.042    0.041
0.5  80  0.6   0.6   0.043    0.043    0.044    0.051    0.049       0.051    0.052
0.5  80  0.7   0.7   0.047    0.047    0.049    0.043    0.039       0.058    0.058
0.5  80  0.8   0.8   0.077*   0.051    0.032    0.041    0.045       0.048    0.050
0.5  80  0.9   0.9   0.229*   0.089*   0.011    0.036    0.038       0.050    0.048
* Significantly above desired level of 0.05

Table A2.

Comparison of size for proposed permutation test and that of Bandos and others for assessing a difference in AUC of 2 paired discrete modalities with within-subject correlation ρ in samples of m diseased and m nondiseased subjects. The proposed permutation test is applied with f = {1/4, 1/3, 1/2} as well as the value f̂ presented in (A.1). The asymptotic size of the proposed permutation test is based upon f = f̂

ρ    m   AUC1  AUC2  Proposed test                                   Bandos and others
                     Exact    Exact    Exact    Exact    Asymptotic  Exact    Asymptotic
                     f = 1/4  f = 1/3  f = 1/2  f = f̂    f = f̂
0.0  40  0.6   0.6   0.053    0.054    0.056    0.056    0.057       0.057    0.057
0.0  40  0.7   0.7   0.056    0.052    0.054    0.056    0.057       0.064    0.060
0.0  40  0.8   0.8   0.055    0.042    0.040    0.048    0.045       0.059    0.061
0.0  40  0.9   0.9   0.111*   0.044    0.013    0.032    0.030       0.047    0.045
0.0  80  0.6   0.6   0.051    0.050    0.048    0.050    0.050       0.044    0.045
0.0  80  0.7   0.7   0.047    0.042    0.030    0.034    0.039       0.037    0.037
0.0  80  0.8   0.8   0.094*   0.055    0.021    0.038    0.038       0.042    0.044
0.0  80  0.9   0.9   0.297*   0.098*   0.009    0.028    0.030       0.036    0.034
0.5  40  0.6   0.6   0.062    0.062    0.061    0.063    0.065       0.061    0.061
0.5  40  0.7   0.7   0.051    0.048    0.046    0.051    0.050       0.056    0.056
0.5  40  0.8   0.8   0.057    0.044    0.033    0.045    0.046       0.048    0.048
0.5  40  0.9   0.9   0.111*   0.047    0.013    0.035    0.035       0.046    0.040
0.5  80  0.6   0.6   0.043    0.043    0.044    0.045    0.044       0.044    0.044
0.5  80  0.7   0.7   0.047    0.047    0.049    0.052    0.051       0.059    0.059
0.5  80  0.8   0.8   0.077*   0.051    0.032    0.041    0.043       0.056    0.056
0.5  80  0.9   0.9   0.229*   0.089*   0.011    0.028    0.030       0.037    0.036
* Significantly above desired level of 0.05

Table A3.

Comparison of power for proposed permutation test and that of Bandos and others for assessing a difference in AUC of 2 paired continuous modalities with within-subject correlation ρ in samples of m diseased and m nondiseased subjects. The proposed permutation test is applied with f = {1/4, 1/3, 1/2} as well as the value f̂ presented in (A.1). The asymptotic power of the proposed permutation test is based upon f = f̂

ρ    m   AUC1  AUC2  Proposed test                                   Bandos and others
                     Exact    Exact    Exact    Exact    Asymptotic  Exact    Asymptotic
                     f = 1/4  f = 1/3  f = 1/2  f = f̂    f = f̂
0.0  40  0.6   0.7   0.197    0.184    0.166    0.190    0.190       0.201    0.195
0.0  40  0.6   0.8   0.703    0.682    0.622    0.685    0.683       0.691    0.692
0.0  40  0.7   0.8   0.295    0.257    0.179    0.235    0.237       0.239    0.237
0.0  40  0.7   0.9   0.903*   0.861    0.731    0.842    0.837       0.834    0.834
0.0  40  0.8   0.9   0.571*   0.448    0.208    0.374    0.378       0.363    0.361
0.0  80  0.6   0.7   0.413    0.391    0.341    0.379    0.379       0.380    0.377
0.0  80  0.7   0.8   0.615*   0.556    0.407    0.491    0.489       0.487    0.483
0.0  80  0.8   0.9   0.913*   0.821*   0.506    0.706    0.705       0.696    0.690
0.5  40  0.6   0.7   0.349    0.338    0.320    0.347    0.350       0.363    0.349
0.5  40  0.6   0.8   0.911    0.900    0.875    0.919    0.917       0.921    0.923
0.5  40  0.7   0.8   0.471    0.422    0.337    0.409    0.409       0.420    0.413
0.5  40  0.7   0.9   0.987*   0.980    0.948    0.979    0.978       0.977    0.978
0.5  40  0.8   0.9   0.758*   0.636    0.392    0.626    0.622       0.610    0.598
0.5  80  0.6   0.7   0.611    0.593    0.555    0.607    0.609       0.619    0.612
0.5  80  0.7   0.8   0.796*   0.743    0.630    0.718    0.717       0.716    0.712
0.5  80  0.8   0.9   0.975*   0.935*   0.751    0.905    0.907       0.890    0.889
* Based upon test with supranominal size

Table A4.

Comparison of power for proposed permutation test and that of Bandos and others for assessing a difference in AUC of 2 paired discrete modalities with within-subject correlation ρ in samples of m diseased and m nondiseased subjects. The proposed permutation test is applied with f = {1/4, 1/3, 1/2} as well as the value f̂ presented in (A.1). The asymptotic power of the proposed permutation test is based upon f = f̂

ρ    m   AUC1  AUC2  Proposed test                                   Bandos and others
                     Exact    Exact    Exact    Exact    Asymptotic  Exact    Asymptotic
                     f = 1/4  f = 1/3  f = 1/2  f = f̂    f = f̂
0.0  40  0.6   0.7   0.197    0.184    0.166    0.176    0.180       0.177    0.178
0.0  40  0.6   0.8   0.703    0.682    0.622    0.659    0.660       0.659    0.665
0.0  40  0.7   0.8   0.295    0.257    0.179    0.217    0.213       0.225    0.218
0.0  40  0.7   0.9   0.903*   0.861    0.731    0.807    0.807       0.817    0.813
0.0  40  0.8   0.9   0.571*   0.448    0.208    0.358    0.356       0.365    0.363
0.0  80  0.6   0.7   0.413    0.391    0.341    0.358    0.363       0.370    0.368
0.0  80  0.7   0.8   0.615*   0.556    0.407    0.462    0.460       0.454    0.453
0.0  80  0.8   0.9   0.913*   0.821*   0.506    0.669    0.669       0.665    0.669
0.5  40  0.6   0.7   0.349    0.338    0.320    0.328    0.331       0.349    0.339
0.5  40  0.6   0.8   0.911    0.900    0.875    0.888    0.889       0.893    0.890
0.5  40  0.7   0.8   0.471    0.422    0.337    0.389    0.388       0.390    0.385
0.5  40  0.7   0.9   0.987*   0.980    0.948    0.970    0.971       0.967    0.968
0.5  40  0.8   0.9   0.758*   0.636    0.392    0.562    0.558       0.553    0.543
0.5  80  0.6   0.7   0.611    0.593    0.555    0.568    0.569       0.569    0.570
0.5  80  0.7   0.8   0.796*   0.743    0.630    0.664    0.668       0.671    0.665
0.5  80  0.8   0.9   0.975*   0.935*   0.751    0.861    0.859       0.854    0.853
* Based upon test with supranominal size

Conflict of Interest: None declared.

References

Bandos AI, Rockette HE, Gur D. A permutation test sensitive to differences in areas for comparing ROC curves from a paired design. Statistics in Medicine, 2005, vol. 24 (pg. 2873-2893)
Braun TM, Feng Z. Optimal permutation tests for the analysis of group randomized trials. Journal of the American Statistical Association, 2001, vol. 96 (pg. 1424-1432)
Campbell G. General methodology I: advances in statistical methodology for the evaluation of diagnostic and laboratory tests. Statistics in Medicine, 1994, vol. 13 (pg. 499-508)
Coakley CW, Heise MA. Versions of the sign test in the presence of ties. Biometrics, 1996, vol. 52 (pg. 1242-1251)
DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 1988, vol. 44 (pg. 837-845)
Dodd LE, Pepe MS. Semiparametric regression for the area under the receiver operating characteristic curve. Journal of the American Statistical Association, 2003, vol. 98 (pg. 409-417)
Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis: generalization to the population of readers and patients with the jackknife method. Investigative Radiology, 1992, vol. 27 (pg. 723-731)
Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 1948, vol. 19 (pg. 293-325)
Irle A, Klosener K-H. Note on the sign test in the presence of ties. Annals of Statistics, 1980, vol. 8 (pg. 1168-1170)
Moise A, Clement B, Raissis M. A test for crossing receiver operating characteristic (ROC) curves. Communications in Statistics - Theory and Methods, 1988, vol. 17 (pg. 1985-2003)
Putter J. The treatment of ties in some nonparametric tests. Annals of Mathematical Statistics, 1955, vol. 26 (pg. 368-386)
Romano JP. On the behavior of randomization tests without a group invariance assumption. Journal of the American Statistical Association, 1990, vol. 85 (pg. 686-692)
Song HH. Analysis of correlated ROC areas in diagnostic testing. Biometrics, 1997, vol. 53 (pg. 370-382)
Venkatraman E, Begg CB. A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika, 1996, vol. 83 (pg. 835-848)