-
PDF
- Split View
-
Views
-
Cite
Cite
Daniela Dunkler, Michael Schemper, Georg Heinze, Gene selection in microarray survival studies under possibly non-proportional hazards, Bioinformatics, Volume 26, Issue 6, 15 March 2010, Pages 784–790, https://doi.org/10.1093/bioinformatics/btq035
Close -
Share
Abstract
Motivation: Univariate Cox regression (COX) is often used to select genes possibly linked to survival. With non-proportional hazards (NPH), COX could lead to under- or over-estimation of effects.
The effect size measure c=P(T1<T0), i.e. the probability that a person randomly chosen from group G1 dies earlier than a person from G0, is independent of the proportional hazards (PH) assumption. Here we consider its generalization to continuous data c′ and investigate the suitability of c′ for gene selection.
Results: Under PH, c′ is most efficiently estimated by COX. Under NPH, c′ can be obtained by weighted Cox regression (WHE) or a novel method, concordance regression (CON). The least biased and most stable estimates were obtained by CON. We propose to use c′ as summary measure of effect size to rank genes irrespective of different types of NPH and censoring patterns.
Availability: WHE and CON are available as R packages.
Contact:georg.heinze@meduniwien.ac.at
Supplementary Information:Supplementary Data are available at Bioinformatics online.
1 INTRODUCTION
Figure 1 and Supplementary Figure 1 show examples of genes with PHs, converging hazards (CH) and diverging hazards (DH) from the study of Bhattacharjee et al. (2001). For the gene with PH (left column) the correlation of scaled Schoenfeld residuals (Grambsch and Therneau, 1994) with the rank of time is close to 0. For the other genes the correlation is considerably larger indicating violation of the PH assumption. The middle column of Figure 1 shows a gene with CH, where the effect fades away with time, whereas the gene depicted in the right column exhibits DH, i.e. an effect increasing with time. In the Bhattacharjee study, genes with DH are found more often than genes with CH. NPH may arise from time-dependent effects of genes on survival, but could also result from model misspecification, e.g. from omitting a strong clinical covariate or another gene. This may particularly happen in univariate analyses.
Exemplification of proportional, converging and diverging hazards with three genes from the Bhattacharjee study. First row: Kaplan Meier estimates with gene expression dichotomized at the median versus time; second row: smoothed scaled Schoenfeld residuals and 95% confidence intervals versus rank of time. R, correlation of these residuals with rank of time.
Exemplification of proportional, converging and diverging hazards with three genes from the Bhattacharjee study. First row: Kaplan Meier estimates with gene expression dichotomized at the median versus time; second row: smoothed scaled Schoenfeld residuals and 95% confidence intervals versus rank of time. R, correlation of these residuals with rank of time.
1.1 Alternatives to COX in case of NPH
To cope with an apparent time-dependent effect of gene expression on survival, one may model the functional form of the time-dependency by an interaction of gene expression with arbitrary functions of time. Suitable functions of time could be found, e.g. by fractional polynomials or penalized regression (cf. Lehr and Schemper, 2007) or restricted cubic splines (Hess, 1994), or by a simple interaction with a monotonic function, e.g. the logarithm, of survival time (Ng'andu, 1997). In this article, we do not follow any of these approaches, because finding the best functional form of time-dependency for a huge number of predictors may lead to numerous problems like multicollinearity or multiple testing issues, to name just two. Other options to cope with NPH, like stratification or separate modeling for different time periods (Moreau et al., 1985), are also not feasible with microarray data. Furthermore, all these options do not allow a comparison or ranking of genes with respect to their ability to predict short or long survival.
In this article, we propose a semi-parametric generalization of the well-known concordance probability as a summary measure of effect size suitable to rank genes when some of the genes may exhibit a time-dependent effect on survival. The concordance probability c will be reviewed and its generalization c′ presented in Methods section, where we also introduce two methods to estimate c′: concordance regression (CON) and weighted Cox regression (WHE). In Results section we present results from a simulation study comparing the performance of COX, WHE and CON in estimating c′ under various conditions, and in producing gene lists in microarray experiments. We will also present results of analyses of three data sets, and close with a discussion in Section 4.
2 METHODS
2.1 The effect size measure c′ for continuous data
An intuitive non-parametric measure of separation of the survival distributions of two groups is the concordance probability c=P(T1<T0), defined as the probability that a randomly chosen survival time T1 from group G1 is smaller than a randomly chosen survival time T0 from group G0. With uncensored data c is equivalent to the non-parametric two-sample test statistics of Wilcoxon (1945) and Mann and Whitney (1947), and to the area under an ROC curve (Hanley and McNeil, 1982). Assuming PH, one can directly use the Cox regression coefficient
associated with a single binary covariate as an estimate of the log odds of c, i.e.
has the interpretation of log(c/(1−c)). This interesting relationship is shown in detail in the Appendix. Under NPH this connection to
no longer applies, but c still is a measure with an intuitive interpretation. However, as recently demonstrated, c can be approximated also under NPH by using weighted estimation in Cox regression (Schemper et al., 2009).
This quantity can be transformed into the generalized concordance probability by c′=exp(γ)/[1+exp(γ)].
Our definition of c′ has some similarities with the concordance probability defined for time-to-event settings as CP(X, T) = P(Ti<Tj|xi>xj) (Gönen, 2007, p. 89). CP(X, T) is purely non-parametric and could also be used with gene expression data. Under PHs, Gönen and Heller (2005) have developed a modification of CP(X, T) which is not sensitive to censoring. However, we prefer c′ here, because unlike CP(X, T), c′ assumes a higher concordance probability with higher difference in gene expression.
2.2 Estimation of c′
2.2.1 Concordance regression
has been computed, ĉ′ can be derived by Despite the continuous nature of gene expression, in practical problems ties in gene expression may occur. In this case, we omit risk pairs with xi=xj from the likelihood, since they do not contribute information about γ or c′.
2.2.2 Weighted Cox regression
Recently, Schemper et al. (2009) have shown that by introducing weights into the score function of Cox's partial log likelihood, an approximative estimate
of the log odds of concordance γ is obtained that works well over a wide range of underlying values. The validity of the approximation is independent of the type of non-proportionality. The weights that are introduced are defined by w(ti)=S(ti)G(ti )−1, with S(ti) and G(ti) as defined above.
Another related approach was proposed by Xu and O'Quigley (2000). These authors also introduce weights into the score function, but their weights are defined by w(ti)=S(ti)/N(ti), which can be rewritten as w(ti)=G(ti)−1. Their aim was to provide an average regression effect independent of the pattern of censoring. One advantage of the approach of Schemper et al. (2009) is the intuitive interpretation of
as log odds of concordance γ.
2.2.3 Software
WHE has been implemented in an R package coxphw and a SAS macro WCM available at CRAN.r-project.org and http://www.muw.ac.at/msi/biometrie/programs, respectively. CON for estimation of c′ has also been implemented in an R package concreg and is available upon request from the authors.
2.3 Gene selection based on c′
Under PH, all three methods (COX, CON and WHE) will approximately supply similar estimates. Under NPH, however, we may expect differences between COX and CON or WHE, but similarity of the latter. We now assume that gene selection is done based on univariate regressions, and assume that from all candidate genes, the top-ranked are selected for further analysis. In our context, we rank genes by their absolute effect size c′+=0.5+|c′−0.5|, estimated via
,
or
supplied by COX, WHE or CON, respectively. A threshold on c′+ can be defined to produce a list of ‘significant’ genes, and the false discovery rate (FDR) associated with that list evaluated. This procedure is known as FDR thresholding as proposed by Tusher et al. (2001).
3 RESULTS
We evaluated COX, WHE and CON by simulating trials assessing the association of gene expression with survival. The first series of simulations aimed at comparing the methods in univariate models considering expression of only one gene the same time (‘univariate evaluation’). These simulations should reveal differences of the methods in estimating the generalized concordance probability c′ under PH and various types of NPH. A second series simulated typical gene expression studies, where we considered a large number of genes with partly correlated expressions competing for selection in the same study (‘multivariate evaluation’).
3.1 Simulation study: univariate evaluation
In this series of simulations, we investigated the effect of the following factors on the distribution of c′ estimates in a factorial design, generating 2000 samples of 200 observations for each cell: time-dependency (PH, CH or DH), strength of effects (‘small’, ‘medium’ or ‘large’) and presence and amount of censoring (0, 33, 67%). We generated log2 gene expression values from a standard normal distribution, and survival time y from a Weibull distribution with shape parameter a=2 and scale parameter b=0.5. Gene expression was linked to survival time by applying an algorithm of MacKenzie and Abrahamowicz (2002). For time-dependency we considered PH with β(t)=β0, CH with a time-dependent log HR of β(t)=β0[1+2.88/(1+5t)], and DH with β(t)=β0(1+1.86t). β0 was determined for pre-defined population values for c′ of 0.60 (‘small’ effect size), 0.66 (‘medium’ effect size) and 0.80 (‘large’ effect size). Under PH, these choices correspond to β0 values of log(1.5), log(2) and log(4). Further details of these computations are given in the Supplementary Data, Section 4.1 and Supplementary Figure 5. To simulate censoring we drew a uniformly distributed follow-up time z from U[0, τ] and defined the observed survival time as t=min(y, z) with status indicator I(z>y). We determined τ to obtain proportions of censored times of 33 and 67%.
Figure 2 shows boxplots of the estimates of c′ by COX, WHE and CON. The dashed reference lines indicate population values of c′. In case of PH all three methods provide approximately unbiased estimates of c′, irrespective of the effect size and amount of censoring, and COX shows slight efficiency advantages. In case of NPH (CH, DH) however, WHE and CON have clearly lower bias than COX, which over- or under-estimates depending on the combination of censoring and the type of NPH. With increasing censoring all three methods show variance inflation of similar magnitude. When censoring is combined with time-dependent effects a part of the bias can be attributed to the discrepancy of the population value of c′ given follow-up is restricted to a maximum time τ compared to the unrestricted c′. Across all scenarios, the largest observed discrepancy was 0.023.
Boxplots of ĉ′ estimates obtained by Cox (COX), weighted Cox (WHE) and concordance (CON) regression in 2000 simulated data sets with 200 observations each. Dashed lines refer to population values of c′. %c, percent censored.
Boxplots of ĉ′ estimates obtained by Cox (COX), weighted Cox (WHE) and concordance (CON) regression in 2000 simulated data sets with 200 observations each. Dashed lines refer to population values of c′. %c, percent censored.
3.2 Simulation study: multivariate evaluation
The aim of the second series of simulations was to see how the methods compare in selecting those genes which truly are related to survival, if a large number of genes are competing for selection. We simulated gene expressions of p=5000 features according to a scheme outlined by Binder and Schumacher (2008), and assumed that only the first 72 genes had an additive effect on the log hazard, with an equal number of 24 genes exhibiting PH, CH and DH. From each group, we chose eight genes to have a ‘large’ effect size and 16 genes to have a ‘small’ effect size. As in the univariate simulation, we simulated survival times from a Weibull (2, 0.5) distribution with the distribution function denoted by FW(t). We linked standard normally distributed gene expressions to survival times, assuming that the hazard of subject i at time t is λi(t)=λ0(t) exp[∑g=1pxigβg(t)]. The time-dependent log HR of gene g was defined as βg(t)=β0 in case of PH, βg(t)=β0[1+2.88/(1+5t)] for CH, and βg(t)=β0(1+1.86t) for DH. The constants β0 were set such that average regression effects
of 0.4 (‘large’ effect size) and 0.2 (‘small’ effect size) resulted. For each combination of censoring (0, 33, 67%) and sample size (200, 800) we generated 200 data sets and assessed the variability of results.
Each data set was analyzed using COX, WHE and CON and for each gene ĉ′ was estimated. Genes were ranked by ĉ′+ and the m top genes were considered ‘selected’. Figure 3 shows some results when m is set to 72, the number of genes truly associated with survival. Results for other choices of m are given in Supplementary Figures 9–14 and Supplementary Tables 3–8.
Average number of correctly selected genes by Cox regression (COX), weighted Cox regression (WHE) and concordance regression (CON) without and with truncation of weights from the multivariate evaluation. Lower and upper parts of each bar correspond to correctly selected genes with small and large effect sizes, respectively. %c, percent censored; N, sample size.
Average number of correctly selected genes by Cox regression (COX), weighted Cox regression (WHE) and concordance regression (CON) without and with truncation of weights from the multivariate evaluation. Lower and upper parts of each bar correspond to correctly selected genes with small and large effect sizes, respectively. %c, percent censored; N, sample size.
The average number of correctly selected genes which corresponds to the true positive rate, TPR, under various censoring proportions and sample sizes is graphically compared in Figure 3. The TPR is significantly highest for CON in scenarios with no or 33% censoring (all paired t-tests yielded P<10−6, cf. Supplementary Table 9). Although CON estimates are on average least biased, their variability is higher than that of WHE or COX, which leads to the impaired performance of CON with 67% censoring. This could be due the unequal weighting of the contributions to the likelihood, which can be severe when there are few ‘long survivors’ in the data set. To address this issue, we truncated all weights at their 95th percentile and found that the relative loss of efficiency of CON is compensated (CON compared to COX: 31.5 versus 32.0 genes, P=0.039 for N=200; 41.4 versus 41.7 genes, P=0.551 for N=800; Table 1). Increasing the sample size from 200 to 800, the number of correctly selected genes increases by ∼35% for CON and by ∼32% for COX and WHE.
Average number of true positive genes in 200 simulated data sets selected by Cox regression/weighted Cox regression/concordance regression with truncation of weights
| . | . | 0%c . | 33%c . | 67%c . | |||
|---|---|---|---|---|---|---|---|
| N . | Hazard . | Small ES . | Large ES . | Small ES . | Large ES . | Small ES . | Large ES . |
| 200 | PH | 6.8/6.7/7.0 | 5.2/5.1/5.6 | 6.5/6.5/6.6 | 4.8/4.8/5.3 | 6.2/5.9/6.1 | 4.5/4.4/4.4 |
| DH | 6.8/5.8/6.6 | 5.6/4.9/5.4 | 6.3/6.1/6.7 | 5.0/4.7/5.1 | 5.3/5.3/5.5 | 3.8/3.8/4.1 | |
| CH | 6.3/7.5/7.2 | 4.6/5.5/5.7 | 6.5/6.8/6.7 | 5.0/5.5/5.3 | 6.9/6.6/6.6 | 5.3/4.9/4.8 | |
| Subtotal | 19.9/20.0/20.8 | 15.4/15.6/16.8 | 19.3/19.4/20.1 | 14.8/14.9/15.8 | 18.4/17.8/18.2 | 13.6/13.1/13.3 | |
| Totala | 35.3/35.6/37.6 | 34.1/34.3/35.9 | 32.0/30.9/31.5 | ||||
| 800 | PH | 8.6/8.8/9.6 | 7.2/7.2/7.7 | 8.3/8.4/8.8 | 6.9/7.0/7.5 | 7.5/7.2/7.6 | 6.3/6.1/6.4 |
| DH | 9.5/8.0/9.4 | 7.4/6.8/7.4 | 8.1/7.2/8.5 | 7.0/6.7/7.3 | 6.5/6.8/6.7 | 5.3/5.4/5.9 | |
| CH | 7.5/9.2/9.5 | 6.6/7.6/7.8 | 7.8/8.8/9.1 | 6.8/7.3/7.5 | 8.9/7.9/7.9 | 7.2/6.7/6.9 | |
| Subtotal | 25.5/26.1/28.4 | 21.2/21.7/22.9 | 24.3/24.5/26.3 | 20.7/21.0/22.3 | 22.8/21.8/22.2 | 18.9/18.2/19.2 | |
| Totala | 46.7/47.8/51.3 | 45.0/45.5/48.6 | 41.7/40.0/41.4 | ||||
| . | . | 0%c . | 33%c . | 67%c . | |||
|---|---|---|---|---|---|---|---|
| N . | Hazard . | Small ES . | Large ES . | Small ES . | Large ES . | Small ES . | Large ES . |
| 200 | PH | 6.8/6.7/7.0 | 5.2/5.1/5.6 | 6.5/6.5/6.6 | 4.8/4.8/5.3 | 6.2/5.9/6.1 | 4.5/4.4/4.4 |
| DH | 6.8/5.8/6.6 | 5.6/4.9/5.4 | 6.3/6.1/6.7 | 5.0/4.7/5.1 | 5.3/5.3/5.5 | 3.8/3.8/4.1 | |
| CH | 6.3/7.5/7.2 | 4.6/5.5/5.7 | 6.5/6.8/6.7 | 5.0/5.5/5.3 | 6.9/6.6/6.6 | 5.3/4.9/4.8 | |
| Subtotal | 19.9/20.0/20.8 | 15.4/15.6/16.8 | 19.3/19.4/20.1 | 14.8/14.9/15.8 | 18.4/17.8/18.2 | 13.6/13.1/13.3 | |
| Totala | 35.3/35.6/37.6 | 34.1/34.3/35.9 | 32.0/30.9/31.5 | ||||
| 800 | PH | 8.6/8.8/9.6 | 7.2/7.2/7.7 | 8.3/8.4/8.8 | 6.9/7.0/7.5 | 7.5/7.2/7.6 | 6.3/6.1/6.4 |
| DH | 9.5/8.0/9.4 | 7.4/6.8/7.4 | 8.1/7.2/8.5 | 7.0/6.7/7.3 | 6.5/6.8/6.7 | 5.3/5.4/5.9 | |
| CH | 7.5/9.2/9.5 | 6.6/7.6/7.8 | 7.8/8.8/9.1 | 6.8/7.3/7.5 | 8.9/7.9/7.9 | 7.2/6.7/6.9 | |
| Subtotal | 25.5/26.1/28.4 | 21.2/21.7/22.9 | 24.3/24.5/26.3 | 20.7/21.0/22.3 | 22.8/21.8/22.2 | 18.9/18.2/19.2 | |
| Totala | 46.7/47.8/51.3 | 45.0/45.5/48.6 | 41.7/40.0/41.4 | ||||
The total number of selected genes was 72 for each method and each scenario. The effect sizes (ES) were set to ‘small’ for 48 and to ‘large’ for 24 out of 5000 candidate genes. Total numbers of correctly selected genes were statistically compared between the methods by paired t-tests. PH, proportional hazards; DH, diverging hazards; CH, converging hazards; %c, percent censored; ES, effect size; N, sample size.
aThe significantly (P<0.01) highest total number of true positive genes is set in boldface.
Average number of true positive genes in 200 simulated data sets selected by Cox regression/weighted Cox regression/concordance regression with truncation of weights
| . | . | 0%c . | 33%c . | 67%c . | |||
|---|---|---|---|---|---|---|---|
| N . | Hazard . | Small ES . | Large ES . | Small ES . | Large ES . | Small ES . | Large ES . |
| 200 | PH | 6.8/6.7/7.0 | 5.2/5.1/5.6 | 6.5/6.5/6.6 | 4.8/4.8/5.3 | 6.2/5.9/6.1 | 4.5/4.4/4.4 |
| DH | 6.8/5.8/6.6 | 5.6/4.9/5.4 | 6.3/6.1/6.7 | 5.0/4.7/5.1 | 5.3/5.3/5.5 | 3.8/3.8/4.1 | |
| CH | 6.3/7.5/7.2 | 4.6/5.5/5.7 | 6.5/6.8/6.7 | 5.0/5.5/5.3 | 6.9/6.6/6.6 | 5.3/4.9/4.8 | |
| Subtotal | 19.9/20.0/20.8 | 15.4/15.6/16.8 | 19.3/19.4/20.1 | 14.8/14.9/15.8 | 18.4/17.8/18.2 | 13.6/13.1/13.3 | |
| Totala | 35.3/35.6/37.6 | 34.1/34.3/35.9 | 32.0/30.9/31.5 | ||||
| 800 | PH | 8.6/8.8/9.6 | 7.2/7.2/7.7 | 8.3/8.4/8.8 | 6.9/7.0/7.5 | 7.5/7.2/7.6 | 6.3/6.1/6.4 |
| DH | 9.5/8.0/9.4 | 7.4/6.8/7.4 | 8.1/7.2/8.5 | 7.0/6.7/7.3 | 6.5/6.8/6.7 | 5.3/5.4/5.9 | |
| CH | 7.5/9.2/9.5 | 6.6/7.6/7.8 | 7.8/8.8/9.1 | 6.8/7.3/7.5 | 8.9/7.9/7.9 | 7.2/6.7/6.9 | |
| Subtotal | 25.5/26.1/28.4 | 21.2/21.7/22.9 | 24.3/24.5/26.3 | 20.7/21.0/22.3 | 22.8/21.8/22.2 | 18.9/18.2/19.2 | |
| Totala | 46.7/47.8/51.3 | 45.0/45.5/48.6 | 41.7/40.0/41.4 | ||||
| . | . | 0%c . | 33%c . | 67%c . | |||
|---|---|---|---|---|---|---|---|
| N . | Hazard . | Small ES . | Large ES . | Small ES . | Large ES . | Small ES . | Large ES . |
| 200 | PH | 6.8/6.7/7.0 | 5.2/5.1/5.6 | 6.5/6.5/6.6 | 4.8/4.8/5.3 | 6.2/5.9/6.1 | 4.5/4.4/4.4 |
| DH | 6.8/5.8/6.6 | 5.6/4.9/5.4 | 6.3/6.1/6.7 | 5.0/4.7/5.1 | 5.3/5.3/5.5 | 3.8/3.8/4.1 | |
| CH | 6.3/7.5/7.2 | 4.6/5.5/5.7 | 6.5/6.8/6.7 | 5.0/5.5/5.3 | 6.9/6.6/6.6 | 5.3/4.9/4.8 | |
| Subtotal | 19.9/20.0/20.8 | 15.4/15.6/16.8 | 19.3/19.4/20.1 | 14.8/14.9/15.8 | 18.4/17.8/18.2 | 13.6/13.1/13.3 | |
| Totala | 35.3/35.6/37.6 | 34.1/34.3/35.9 | 32.0/30.9/31.5 | ||||
| 800 | PH | 8.6/8.8/9.6 | 7.2/7.2/7.7 | 8.3/8.4/8.8 | 6.9/7.0/7.5 | 7.5/7.2/7.6 | 6.3/6.1/6.4 |
| DH | 9.5/8.0/9.4 | 7.4/6.8/7.4 | 8.1/7.2/8.5 | 7.0/6.7/7.3 | 6.5/6.8/6.7 | 5.3/5.4/5.9 | |
| CH | 7.5/9.2/9.5 | 6.6/7.6/7.8 | 7.8/8.8/9.1 | 6.8/7.3/7.5 | 8.9/7.9/7.9 | 7.2/6.7/6.9 | |
| Subtotal | 25.5/26.1/28.4 | 21.2/21.7/22.9 | 24.3/24.5/26.3 | 20.7/21.0/22.3 | 22.8/21.8/22.2 | 18.9/18.2/19.2 | |
| Totala | 46.7/47.8/51.3 | 45.0/45.5/48.6 | 41.7/40.0/41.4 | ||||
The total number of selected genes was 72 for each method and each scenario. The effect sizes (ES) were set to ‘small’ for 48 and to ‘large’ for 24 out of 5000 candidate genes. Total numbers of correctly selected genes were statistically compared between the methods by paired t-tests. PH, proportional hazards; DH, diverging hazards; CH, converging hazards; %c, percent censored; ES, effect size; N, sample size.
aThe significantly (P<0.01) highest total number of true positive genes is set in boldface.
If the number of selected genes m is varied, the TPR and similarly the false positive rates change. These changes concern all methods alike, such that we can conclude that the superior performance of CON is independent of a particular choice for m. Generally, TPRs are higher with a sample size of 800 compared to 200, but the general conclusions do not change (Supplementary Fig. 7).
In Table 1 and Supplementary Figure 8, we investigate which type of time-dependency is favored by the methods, and whether this depends on censoring and/or sample size. Ideally, genes with equal effect sizes should be selected with equal probability, irrespective of the type of time-dependency (PH, DH or CH), and the censoring pattern. We learn that this ideal situation is best accomplished by CON, which yields the best balance between genes from all types of time-dependency. By contrast, WHE selects CH genes more than others, and with COX the proportions of selected genes of different type change with increasing amount of censoring. These results are independent of the sample size, which affects the number of selected genes in all methods and with all types of genes in the same manner.
3.3 Application to real-life studies
We applied univariate COX, WHE and CON to all genes of three microarray data sets and evaluated differences in gene selection. Beer et al. (2002) studied the association of survival and gene expression profiles of microarray data of 86 patients with early-stage lung adenocarcinomas. Similarly, Bhattacharjee et al. (2001) investigated correlation of gene expression from lung adenocarcinomas with a survival endpoint in 125 patients. In a study by Rosenwald et al. (2002) the survival and gene expressions of 240 patients with diffuse large B-cell lymphoma were analyzed. For information regarding pre-processing we refer to our Supplementary Data, Section 3.
for the three data sets are given in Supplementary Table 1. The
was then calculated as
.Results are summarized in Table 2. In all three data sets approximately half of the genes have a negative effect on survival, i.e. ĉ′g<0.5. The range of ĉ′ varies considerably between the three data sets, with the largest range in the Beer data set (0.254–0.828 computed by CON) and the smallest range in the Bhattacharjee data set (0.412–0.591 computed by CON). The correlation of the absolute effect size estimates ĉ′+ from WHE and CON is close to 1 for all data sets, whereas it is considerably smaller when correlating WHE or CON with COX estimates.
Results of analysis with Cox (COX), weighted Cox (WHE) and concordance (CON) regression of three data sets
| . | No. of genes . | No. of obs./number of events . | Cor of ĉ′+ . | ![]() . | No. of genes selected by . | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| . | . | . | COX and WHE . | COX and CON . | WHE and CON . | COX . | WHE . | CON . | COX and WHE . | COX and CON . | WHE and CON . | COX, WHE and CON . |
| Beer | 4966 | 86/24 | 0.792 | 0.767 | 0.972 | 0.389 | 0.492 | 0.845 | 167 (67%) | 145 (58%) | 177 (71%) | 134 (54%) |
| Bhattacharjee | 12 600 | 125/71 | 0.829 | 0.829 | 0.994 | 1.053 | 0.841 | 0.923 | 167 (67%) | 158 (63%) | 197 (79%) | 144 (58%) |
| Rosenwald | 7053 | 240/138 | 0.464 | 0.613 | 0.902 | 0.383 | 0.721 | 0.369 | 119 (48%) | 143 (57%) | 172 (69%) | 109 (44%) |
| . | No. of genes . | No. of obs./number of events . | Cor of ĉ′+ . | ![]() . | No. of genes selected by . | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| . | . | . | COX and WHE . | COX and CON . | WHE and CON . | COX . | WHE . | CON . | COX and WHE . | COX and CON . | WHE and CON . | COX, WHE and CON . |
| Beer | 4966 | 86/24 | 0.792 | 0.767 | 0.972 | 0.389 | 0.492 | 0.845 | 167 (67%) | 145 (58%) | 177 (71%) | 134 (54%) |
| Bhattacharjee | 12 600 | 125/71 | 0.829 | 0.829 | 0.994 | 1.053 | 0.841 | 0.923 | 167 (67%) | 158 (63%) | 197 (79%) | 144 (58%) |
| Rosenwald | 7053 | 240/138 | 0.464 | 0.613 | 0.902 | 0.383 | 0.721 | 0.369 | 119 (48%) | 143 (57%) | 172 (69%) | 109 (44%) |
The table summarizes the number of observations and events (‘No. of obs/No. of events’), the correlation of the estimated absolute effect size (‘Cor of ĉ′+’), the estimated false discovery rate (‘
’) and the number of genes selected in common by COX, WHE and CON when 250 genes are selected based on the estimated absolute effect size. Further results can be found in the Supplementary Data, section 3.
Results of analysis with Cox (COX), weighted Cox (WHE) and concordance (CON) regression of three data sets
| . | No. of genes . | No. of obs./number of events . | Cor of ĉ′+ . | ![]() . | No. of genes selected by . | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| . | . | . | COX and WHE . | COX and CON . | WHE and CON . | COX . | WHE . | CON . | COX and WHE . | COX and CON . | WHE and CON . | COX, WHE and CON . |
| Beer | 4966 | 86/24 | 0.792 | 0.767 | 0.972 | 0.389 | 0.492 | 0.845 | 167 (67%) | 145 (58%) | 177 (71%) | 134 (54%) |
| Bhattacharjee | 12 600 | 125/71 | 0.829 | 0.829 | 0.994 | 1.053 | 0.841 | 0.923 | 167 (67%) | 158 (63%) | 197 (79%) | 144 (58%) |
| Rosenwald | 7053 | 240/138 | 0.464 | 0.613 | 0.902 | 0.383 | 0.721 | 0.369 | 119 (48%) | 143 (57%) | 172 (69%) | 109 (44%) |
| . | No. of genes . | No. of obs./number of events . | Cor of ĉ′+ . | ![]() . | No. of genes selected by . | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| . | . | . | COX and WHE . | COX and CON . | WHE and CON . | COX . | WHE . | CON . | COX and WHE . | COX and CON . | WHE and CON . | COX, WHE and CON . |
| Beer | 4966 | 86/24 | 0.792 | 0.767 | 0.972 | 0.389 | 0.492 | 0.845 | 167 (67%) | 145 (58%) | 177 (71%) | 134 (54%) |
| Bhattacharjee | 12 600 | 125/71 | 0.829 | 0.829 | 0.994 | 1.053 | 0.841 | 0.923 | 167 (67%) | 158 (63%) | 197 (79%) | 144 (58%) |
| Rosenwald | 7053 | 240/138 | 0.464 | 0.613 | 0.902 | 0.383 | 0.721 | 0.369 | 119 (48%) | 143 (57%) | 172 (69%) | 109 (44%) |
The table summarizes the number of observations and events (‘No. of obs/No. of events’), the correlation of the estimated absolute effect size (‘Cor of ĉ′+’), the estimated false discovery rate (‘
’) and the number of genes selected in common by COX, WHE and CON when 250 genes are selected based on the estimated absolute effect size. Further results can be found in the Supplementary Data, section 3.
If the 250 genes with the largest absolute effect size estimates ĉ′+ are selected the best agreement in gene selection is observed between WHE and CON: 71, 79 and 69% in the three data sets. The proportion of genes selected by all three methods is ∼50% in all three data sets. The smallest
values were obtained by COX in the Beer data (0.389), WHE in the Bhattacharjee data (0.841) and CON in the Rosenwald set (0.369). These data dependent advantages of the methods were also observed with higher or lower numbers of selected genes (see Supplementary Table 2). In the Bhattacharjee data the
estimate from COX selection was larger than 1, a situation which was already anticipated by Tusher et al. (2001).
4 DISCUSSION
We have introduced c′, a semi-parametric generalization of the well-known concordance probability for continuous predictors and have shown that it is a suitable measure to compare effect sizes between genes irrespective of PHs. In case of PHs, c′ can be directly computed from the Cox regression coefficient
. In case of NPH, this relationship no longer holds. We have shown by simulation that a novel method, CON, supplies an empirically unbiased estimate of c′. WHE as recently proposed approximates this value very well in most but not all cases, and has some efficiency advantages over CON. Cox regression in general fails to provide a consistent estimate of c′ in case of NPH, and this bias is further modified by censoring. With large proportions of censored survival times, CON estimates may become inefficient due to unequal weighting of the contributions to the likelihood. As we have demonstrated, truncating weights of CON at a high, e.g. the 95th percentile may then be applied to reduce the variability of the estimates, while the amount of bias introduced is still negligible.
If an association analysis of high-dimensional data with survival involves a gene selection step, then a gene ranking based on estimates of c′ may be preferable to a gene ranking based on Cox regression coefficients, because the former does not rely on the assumption of PHs. By contrast, the gene ranking by Cox regression does not only depend on the true effect size of the genes, but also on the realized follow-up distribution. Thus these rankings may not be reproducible under different follow-up schemes. We have shown by analysis of simulated and three real data sets that gene rankings by estimates of c′ provided by CON or WHE may yield different results than gene rankings by log HR estimates from Cox regression. The rankings from WHE and CON were more similar than compared to gene rankings from Cox regression, and the agreement of the former two with Cox regression in an absolute sense was low in all three data sets. We have also shown that FDR thresholding based on c′ is straightforward.
Our investigation focused on gene selection. This was motivated by our own experience, that most often the microarray platform is mainly used to screen the whole genome for suitable candidate genes. Gene expression data, reduced to a set of, say, 50–400 candidate genes, is then re-determined using a high-sensitivity platform such as real time polymerase chain reaction, and only from these validated expression values statistical models, often on an even further reduced set of genes, will be developed. For some of the methods for prediction of survival from high-dimensional data it was proposed to use gene selection as a first step, e.g. supervised principal components regression (Bair and Tibshirani, 2004; Bair et al., 2006). Application of the methods investigated in this contribution in combined selection and prediction models like the LASSO (Park and Hastie, 2007; Tibshirani, 1997), ridge regression (Verweij and van Houwelingen, 1994) or partial least squares regression (Park et al., 2002) is in principle straightforward. However, our investigation leaves the question open whether the modest improvements in gene selection observed with WHE or CON can contribute to improved estimation of prediction models. Nevertheless, we consider the robustness of these methods to NPH as being of particular advantage for real-life applications.
Thus, one may replace Cox regression by WHE or CON at any stage of model development, to obtain prediction models without having to assume PHs. As final result, such prediction models supply cross-validated risk scores to assess and compare different levels of risk between subjects. An evaluation of the association of the risk scores with survival using c′ may improve interpretation as it allows quantifying the impact of the genetic information on survival, provides a direct comparison of subjects and at the same time is robust to violations of the PHs assumption.
ACKNOWLEDGEMENTS
The authors are grateful to Samo Wakounig, Vienna and Ståle Nygård, Oslo, and three anonymous reviewers for comments that considerably improved the article.
Funding: FWF Austrian Science Fund (P18553-N13).
Conflict of interest: none declared.
APPENDIX A
A.1 EQUIVALENCE OF c/(1−c) AND THE HAZARD RATIO UNDER PROPORTIONAL HAZARDS
from a Cox regression analysis to estimate c.REFERENCES
Author notes
Associate Editor: Trey Ideker













