-
PDF
- Split View
-
Views
-
Cite
Cite
Verena Zuber, Korbinian Strimmer, Gene ranking and biomarker discovery under correlation, Bioinformatics, Volume 25, Issue 20, 15 October 2009, Pages 2700–2707, https://doi.org/10.1093/bioinformatics/btp460
Close -
Share
Abstract
Motivation: Biomarker discovery and gene ranking is a standard task in genomic high-throughput analysis. Typically, the ordering of markers is based on a stabilized variant of the t-score, such as the moderated t or the SAM statistic. However, these procedures ignore gene–gene correlations, which may have a profound impact on the gene orderings and on the power of the subsequent tests.
Results: We propose a simple procedure that adjusts gene-wise t-statistics to take account of correlations among genes. The resulting correlation-adjusted t-scores (‘cat’ scores) are derived from a predictive perspective, i.e. as a score for variable selection to discriminate group membership in two-class linear discriminant analysis. In the absence of correlation the cat score reduces to the standard t-score. Moreover, using the cat score it is straightforward to evaluate groups of features (i.e. gene sets). For computation of the cat score from small sample data, we propose a shrinkage procedure. In a comparative study comprising six different synthetic and empirical correlation structures, we show that the cat score improves estimation of gene orderings and leads to higher power for fixed true discovery rate, and vice versa. Finally, we also illustrate the cat score by analyzing metabolomic data.
Availability: The shrinkage cat score is implemented in the R package ‘st’, which is freely available under the terms of the GNU General Public License (version 3 or later) from CRAN (http://cran.r-project.org/web/packages/st/).
Contact:strimmer@uni-leipzig.de
1 INTRODUCTION
The discovery of genomic biomarkers is often based on case–control studies. For instance, consider a typical microarray experiment comparing healthy to cancer tissue. A shortlist of genes relevant for discriminating the phenotype of interest is compiled by ranking genes according to their respective t-scores. Because of the high dimensionality of the genomic data special stabilizing procedures such as ‘SAM’, ‘moderated t’ or ‘shrinkage t’ are warranted and most effective—see Opgen-Rhein and Strimmer (2007) for a recent comparative study.
However, microarrays are only one particularly prominent example of a series of modern technologies emerging for high-throughput biomarker discovery. In addition to gene expression, it is now a common practice in biomedical laboratories to measure metabolite concentrations and protein abundances. A distinguishing feature of proteomic and metabolic data is the presence of correlation among markers, due to chemical similarities (metabolites) and spatial dependencies (spectral data). These correlations may impact statistical conclusions.
There are three main strategies for dealing with the issue of correlation among biomarkers. One approach is to initially ignore the correlation structure and to compute conventional t-scores. Subsequently, the effects of correlation are accommodated in the last stage of the analysis when statistical significance is assigned (Efron, 2007; Shi et al., 2008). An alternative approach is to model the correlation structure explicitly in the data-generating process, and base all inferences on this more complex model. For example, in case of proteomics data, a spatial autoregressive model can account for dependencies between neighboring peaks (Hand, 2008). A third strategy, occupying middle ground between the two described approaches, is to combine t-scores and the estimated correlations to a new gene-wise test statistic. This approach is followed in Tibshirani and Wasserman (2006) and Lai (2008) and it is also the route we pursue here.
Specifically, we propose ‘correlation-adjusted t’-scores, or for short ‘cat’ scores. These scores are derived from a predictive perspective by exploiting a close link between gene ranking and two-class linear discriminant analysis (LDA). It is well known (Fan and Fan, 2008) that the t-score is the natural feature selection criterion in diagonal discriminant analysis (DDA), i.e. when there is no correlation. As we argue here in the general LDA case assuming arbitrary correlation structure, this role is taken over by the cat score.
For practical application of the cat score as a ranking criterion for biomarkers, we develop a corresponding shrinkage procedure, which can be employed in high-dimensional settings with a comparatively small number of samples. This statistic reduces to the shrinkage t approach (Opgen-Rhein and Strimmer, 2007), if there is no correlation. We also provide a recipe for constructing cat scores from other regularized t-statistics. Furthermore, we show that the cat score enables in a straightforward fashion the ranking of sets of features, and thus facilitates the analysis of gene set enrichment (Ackermann and Strimmer, 2009).
The rest of the article is organized as follows. Next, we present the methodological background, the definition of the cat score and a corresponding small sample shrinkage procedure. Subsequently, we report results from a comparative study where we investigate the performance of the shrinkage cat score relative to other gene ranking procedures including the approaches by Tibshirani and Wasserman (2006) and Lai (2008). In our study, we assume a variety of both synthetic as well as empirical correlation scenarios from gene expression data. Finally, we illustrate the cat score approach by analyzing a metabolomic dataset and conclude with recommendations.
2 METHODS
2.1 Linear discriminant analysis
2.2 Feature selection in two-class LDA
Gene ranking and feature selection for class prediction are closely connected. We exploit this here to define a score for ranking genes (features) in the presence of correlation. In what follows, we consider LDA for precisely two classes, i.e. the typical setup in case–control studies.
the log-ratio of the mixing proportions π1 and π2;
δ(x), the standardized and decorrelated distance of the test sample x to the average centroid; and
the variable-specific feature weights ω.
The interpretation of ω as general feature weights is supported by considering the special case of DDA. In the absence of correlation, the weights ω directly reduce to V−1/2 (μ1 − μ2), which is (apart from a constant) the usual vector of two-sample t-scores. It is well known that in the DDA setting the t-score is the natural and optimal ranking criterion for discovering genes that differentiate best between the two classes (Fan and Fan, 2008). Note that if we would (hypothetically) decompose the product ωTδ(x) from Equation (1) in a different fashion, e.g. such that the factor P−1/2V−1/2 was moved from Equation (3) to Equation (2), then in the limit of vanishing correlation the ranking criterion would not be a t-score but rather V−1 (μ1 − μ2). Similarly, if we would move only the factor P−1/2 from Equation (3) to Equation (2), then a number of other inconsistencies arise, in particular the connection of ω with Hotelling's T2 statistic (see further below) is lost. Therefore, the decomposition as given by Equations (2) and (3) is the most natural.
2.3 Definition of the cat score
ensures that the empirical version of the cat score matches the scale of the empirical t-score. The vector τ contains the gene-wise t-scores, and nk is the number of observations in group k.The cat score is a natural and intuitive extension of both the fold change and t-score, as illustrated in Figure 1. While the t-score is the standardized mean difference μ1 − μ2, the cat score is the standardized as well as decorrelated mean difference. The factor P−1/2 responsible for the decorrelation is well known from the Mahalanobis transform that is frequently applied to prewhiten multivariate data. Also note that the inverse correlation matrix is closely related to partial correlations.
Relationship between fold change, t-score and the cat score. The constant c equals
.
Relationship between fold change, t-score and the cat score. The constant c equals
.
2.4 Estimation of feature weights and computation of the cat score from data
Substituting empirical estimates for means, variances and correlations into Equations (2) and (4) provides a simple recipe for estimating the feature weights and computing the cat score from data. However, this is only a valid approach if sample size is large compared with the dimension.
A major obstacle in the application of Equation (5) is the problem of efficiently computing (Rshrink)−1/2. Direct calculation of the matrix square root, e.g. by eigenvalue decomposition, is extremely tedious for large dimensions p. Instead, we present here a simple time-saving identity for computing the α-th power of Rshrink (though here we only need the case α = −1/2).
, where M is a symmetric positive definite matrix of size m times m and U an orthonormal basis. Note that m is the rank of R. Subsequently, to calculate the α-th power of Z we use the identity1 that requires only the computation of the α-th power of the matrix Im + M. This trick enables substantial computational savings when the number of samples (and hence m) is much smaller than p.Finally, additional information about the structure of the correlation matrix P (or its inverse) may also be taken into account when estimating the cat score. This is done simply by replacing the unrestricted shrinkage estimator by a more structured estimator (e.g. Guillemot et al., 2008; Li and Li, 2008; Tai and Pan, 2007).
2.5 Selection of single genes
The cat score offers a simple approach to feature selection, both of individual genes and sets of genes (see below).
By construction, the cat score is a decorrelated t-score. As such, it measures the individual contribution of each single feature to separate the two groups, after removing the effect of all other genes. Therefore, to select individual genes according to their relative effect on group separation one simply ranks them according to the magnitude of the respective τadji.
For determining p- and FDR values, we fit a two-component mixture model to the observed cat scores (Efron, 2008). Asymptotically, for large dimension the null distribution is approximately normal as a consequence of central limit theorems for dependent random variables (e.g. Hoeffding and Robbins, 1948; Romano and Wolf, 2000)—recall that the cat score is a weighted sum of dependent t-statistics. This is validated empirically in Section 3.5 discussing the analysis of a metabolomic dataset.
For practical analysis, we suggest employing the ‘fdrtool’ algorithm (Strimmer, 2008a, b). A comparison of methods for assigning significance to cat scores is given in Ahdesmäki and Strimmer (2009).
2.6 Selection of gene sets
For evaluating the total effect of a set of features on group separation, we exploit the close connection of cat scores with the Hotelling's T2 statistic, a standard criterion in gene set analysis (Kong et al., 2006; Lu et al., 2005).
Specifically, T2 =(tadj)Ttadj = tTR−1t, where R is the empirical correlation matrix, tadj is the empirical cat score vector and t the vector containing the gene-wise Student's t-statistic. In other words, the T2 statistic is identical to the sum of the squared individual empirical cat scores for the genes in the set. Note that any normalization with regard to the size of the set is implicit in the factor R−1. For example, if there is strong correlation among the genes in a set, then T2 is approximately the average of the underlying squared t-scores.
There are two main cases when it is important to consider sets of genes rather than individual genes: We note that using the grouped cat score provides a simple procedure for high-dimensional feature selection where whole sets of variables are simultaneously included or excluded, in contrast with the classical view of feature selection where only one of those features is retained [see also Bondell and Reich (2008) for references to related approaches].
First, in a gene set enrichment analysis where prespecified pathways or functional units rather than individual genes are being investigated [cf. Ackermann and Strimmer (2009)].
Second, if genes are highly correlated and thus provide the same information on group separation. To accommodate for this collinearity, we suggest constructing a suitable correlation neighborhood around each gene, e.g. by the rule |r| ≥ 0.85. Typically, the resulting sets are rather small and for most genes are comprised only of the gene itself – see Tibshirani and Wasserman (2006) and also Läuter et al. (2009) for similar procedures. Note that the suggested threshold of 0.85—which we use throughout this article—is rather conservative. It defines a priori which pair of genes are assumed to be collinear.
3 RESULTS
In order to study the performance of the cat score for feature selection and gene ranking, we conducted an extensive study. Specifically, we investigated six different correlation scenarios, three synthetic models and three empirical correlation matrices estimated from three different gene expression datasets, and compared the results with a diverse number of regularized t-scores. Furthermore, we analyzed a metabolomic dataset investigating prostate cancer.
3.1 Correlation scenarios
For the correlation structure, we considered a variety of scenarios. Specifically, we employed six different correlation patterns (cf. Fig. 2):
First, as a negative control we assumed a diagonal correlation matrix P = I of size 1000 × 1000.
Next, we employed an autoregressive block-diagonal correlation matrix (Guo et al., 2007). We used 10 blocks of size 100 × 100 genes. Within each block, the correlation between two genes i,j, = 1,…, 100 equals ρ(i, j) = ρabs(i − j). We set ρ = 0.99 with alternating sign in each block. This correlation matrix is sparse with most entries being very small, nevertheless it also contains some highly correlated genes.
Third, we employed a correlation block structure where the first 100 genes have pairwise correlation of 0.7 and the remaining 900 genes have pairwise correlation of 0.3. Between the two groups there is no correlation. The block with the larger correlation corresponds to the differentially expressed genes.
In addition to the three artificial correlation structures, we also employed shrinkage estimators of correlations matrices from three expression datasets, using a sample of 1000 genes. Structure D is obtained from gene expression data of colon cancer (Alon et al., 1999).
As D, but for breast cancer (Hedenfalk et al., 2001).
As D, but from a spike-in experiment (Choe et al., 2005).
The six correlation scenarios investigated in our study. All correlation matrices have size 1000 × 1000 and thus contain 499 500 correlation values. Top panel: histograms of the three synthetic correlation patterns (A–C). Bottom panel: histograms of the three empirical correlation structures (D–F).
The six correlation scenarios investigated in our study. All correlation matrices have size 1000 × 1000 and thus contain 499 500 correlation values. Top panel: histograms of the three synthetic correlation patterns (A–C). Bottom panel: histograms of the three empirical correlation structures (D–F).
3.2 Test statistics
In our comparison, we included the following gene ranking statistics: fold change, empirical t-statistic, ‘SAM’ (Tusher et al., 2001), ‘moderated t’ (Smyth, 2004) and ‘shrinkage t’ (Opgen-Rhein and Strimmer, 2007). As in Opgen-Rhein and Strimmer (2007), the latter three regularized t-scores gave nearly identical estimates and always outperformed Student's t, so we report here only the results for ‘shrinkage t’. As baseline reference, we also included random ordering in the analysis.
For the cat score, we investigated two variants: the shrinkage cat score [Equation (5)] and an oracle version, which uses the true underlying correlation matrix rather than estimating the correlation structure. For the two structures with high correlations (B and C), we employed the grouped cat score using a correlation neighborhood threshold of 0.85.
In addition, we included in our study two recently proposed gene ranking procedures that, like the cat score, also aim at incorporating information about gene–gene correlations in gene ranking: the ‘correlation-shared t-score’ introduced by Tibshirani and Wasserman (2006) and the ‘correlation-predicted t-score’ suggested by Lai (2008). Correlation-shared t averages over gene-specific Student's t-scores in a data-dependent correlation neighborhood. The approach by Lai (2008) employs a local smoothing approach to ‘predict’ the t-score of a particular gene from t-scores of other genes highly correlated with it. Here, we use the Lai (2008) approach with the smoothing parameter set to its default value f = 0.2. Note that the cat score, the correlation-shared t-score and the correlation-predicted t-score all are based on linear combinations of t-scores, albeit with different weights.
3.3 Data generation
In our data generation procedure, we followed closely the setup in Smyth (2004) and Opgen-Rhein and Strimmer (2007), with the additional specification of a correlation structure among genes. In detail, the simulations were conducted as follows:
The number of genes was fixed at p = 1000. The first 100 genes were designated to be differentially expressed.
The variances across genes were drawn from a scale-inverse-chi-square distribution, scale-inv-χ2(d0, s02). We used s20 = 4 and d0 = 4, which corresponds to the ‘balanced’ variance case in Smyth (2004). Thus, the variances vary moderately from gene to gene.
The difference of means for the differentially expressed genes (1–100) were drawn from a normal distribution with mean zero and the gene-specific variance. For the non-differentially expressed genes (101–1000) the difference was set to zero.
The data were generated by drawing from group-specific multivariate normal distributions with the given variances and means. The correlation matrix assumed one of the above structures A–F.
We also varied the sample sizes n1 and n2 in each group, from very small n1 = n2 = 3 to fairly large n1 = n2 = 50. Here, we report results for n1 = n2 = 8.
3.4 Comparison of gene rankings
For each correlation scenario A–F, we generated 500 datasets and computed corresponding gene rankings using the various t- and cat scores discussed above. We then counted false positives (FP), true positives (TP), false negatives (FN) and true negatives (TN) for all possible cutoffs in the gene list (1–1000). From this data, we estimated the true discovery rates (= positive predictive value, ppv) E(TP/(TP + FP)) and the power (= sensitivity) E(TP/(TP + FN)).
A graphical summary of the results are presented in Figures 3 and 4. The first column shows the true discovery rates as a function of the number of included top-ranking genes, whereas the second column gives the plots of true discovery rate versus power. The latter graphs, known in the machine learning community as ‘precision–recall’ plots, highlight methods that simultaneously have large power and large true discovery rates.
True discovery rates (left panel) and precision–recall curves (right panel) for the three synthetic correlation structures A–C. Note that for B and C the grouped cat score was employed, using a correlation neighborhood |r| ≥ 0.85.
True discovery rates (left panel) and precision–recall curves (right panel) for the three synthetic correlation structures A–C. Note that for B and C the grouped cat score was employed, using a correlation neighborhood |r| ≥ 0.85.
True discovery rates (left panel) and precision–recall curves (right panel) for the three empirical correlation scenarios D–F.
True discovery rates (left panel) and precision–recall curves (right panel) for the three empirical correlation scenarios D–F.
The first row in Figure 3 shows the control case when there is no correlation. As expected, the cat score performs identical to the shrinkage t approach. A similar performance is given by the correlation-shared t and the fold change statistic, slightly worse than shrinkage t- and cat score. The ordering provided by the correlation-predicted t-score is random, which is not surprising as prediction fails when there is no correlation.
For the autoregressive and the block structure (scenarios B and C in Fig. 3), substantial gains are achieved over the shrinkage t-score, both by the cat score and the correlation-shared t-score by Tibshirani and Wasserman (2006). In particular, in case B these two methods show near-perfect recovery of the gene ranking. The shrinkage t approach and fold change remain the second and third best feature ranking approach, with the correlation-predicted t-score of Lai (2008) trailing the comparison.
For the empirically estimated correlation structures, the picture changes slightly (cf. Fig. 4). All these scenarios have in common that there is background correlation but no very strong individual pairwise correlation (cf. Fig. 2, bottom panel). In this setting, the shrinkage cat score also improves over the shrinkage t-score. The oracle cat score shows that further benefits are possible if the correlation structure was known, or if a better estimator was used. For the empirical scenarios the correlation-shared t-score performs similar as the fold change, and the correlation-predicted t-score again delivers random orderings.
In summary, in all the six quite different correlation scenarios the (grouped) cat score offers in part substantial performance improvements over standard regularized t-scores, which were represented here by shrinkage t-score. The correlation-shared t-score also performs exceptionally well if there are a few highly correlated genes, but otherwise falls back to the efficiency of using fold-change approach. In general, the correlation-predicted approach did not provide any reasonable orderings. It seems to us that this is due to the fact that it is the only test statistic that discards the actual value of the t-score of a gene, and instead relies exclusively on closely correlated genes—which may not exist.
3.5 Ranking of metabolomic markers of prostate cancer
To illustrate the effect of correlation on gene ranking, we analyzed a subset of data from a recent metabolomic study concerning prostate cancer (Sreekumar et al., 2009). The original study investigated three groups of tissues, benign, localized cancer and metastatic prostate cancer. Here, we focused on the two types of cancer tissue. Specifically, we compared 12 samples of clinically localized prostate cancers versus 14 samples of metastatic prostate cancers. For each sample, the concentrations of 518 metabolites were measured. We use here the preprocessed data as kindly provided by Dr. Sreekumar and Dr. Chinnaiyan.
For each of the 518 metabolites, we computed a shrinkage t-score and a shrinkage cat score. For the latter, we applied grouping of features with a correlation threshold of |r| ≥ 0.85. The Q–Q plot of the cat scores versus a normal distribution is shown in Figure 5. By inspection of this diagnostic plot, we see that the null model of the grouped cat scores, represented by the linear middle part, is approximately a normal distribution. The deviations from normality at the tails correspond to the alternative distribution containing the high-ranked metabolites of interest.
Plot of normal versus empirical quantiles for the grouped cat scores computed from the metabolomic prostate data. The linearity in the central part indicates a normal null model.
Plot of normal versus empirical quantiles for the grouped cat scores computed from the metabolomic prostate data. The linearity in the central part indicates a normal null model.
The 10 top ranking metabolic features that differentiate between localized and metastatic cancer according to t-scores and cat scores, respectively, are listed in Table 1. Overall, the two rankings differ quite notably, as expected in the presence of correlation. In particular, at the top of the list there are differences due to very strong correlation between the substrate X-5207 and nicotinamide (r = 0.9444) and likewise between guanosine and X-3390 (r = 0.9389). Unlike with t-scores, in a grouped cat score analysis the features in these two pairs are treated as a unit. Jointly, the correlated markers outperform other individual markers with respect to distinguishing between the two phenotypic groups.
The top 10 ranking metabolites according to the shrinkage t and the grouped cat scores, respectively
| Rank . | Shrinkage t . | Grouped cat score . |
|---|---|---|
| 1 | Ciliatine | Nicotinamide |
| 2 | Inosine | X-5207 |
| 3 | Putrescine | Guanosine |
| 4 | X-3390 | X-3390 |
| 5 | Palmitate | Ciliatine |
| 6 | Glycerol | Putrescine |
| 7 | Ribose | Inosine |
| 8 | X-3102 | Citrate |
| 9 | Myristate | Uridine |
| 10 | X-4620 | X-2867 |
| Rank . | Shrinkage t . | Grouped cat score . |
|---|---|---|
| 1 | Ciliatine | Nicotinamide |
| 2 | Inosine | X-5207 |
| 3 | Putrescine | Guanosine |
| 4 | X-3390 | X-3390 |
| 5 | Palmitate | Ciliatine |
| 6 | Glycerol | Putrescine |
| 7 | Ribose | Inosine |
| 8 | X-3102 | Citrate |
| 9 | Myristate | Uridine |
| 10 | X-4620 | X-2867 |
Note that nicotinamide and X-5207, as well as guanosine and X-3390, are strongly correlated.
The top 10 ranking metabolites according to the shrinkage t and the grouped cat scores, respectively
| Rank . | Shrinkage t . | Grouped cat score . |
|---|---|---|
| 1 | Ciliatine | Nicotinamide |
| 2 | Inosine | X-5207 |
| 3 | Putrescine | Guanosine |
| 4 | X-3390 | X-3390 |
| 5 | Palmitate | Ciliatine |
| 6 | Glycerol | Putrescine |
| 7 | Ribose | Inosine |
| 8 | X-3102 | Citrate |
| 9 | Myristate | Uridine |
| 10 | X-4620 | X-2867 |
| Rank . | Shrinkage t . | Grouped cat score . |
|---|---|---|
| 1 | Ciliatine | Nicotinamide |
| 2 | Inosine | X-5207 |
| 3 | Putrescine | Guanosine |
| 4 | X-3390 | X-3390 |
| 5 | Palmitate | Ciliatine |
| 6 | Glycerol | Putrescine |
| 7 | Ribose | Inosine |
| 8 | X-3102 | Citrate |
| 9 | Myristate | Uridine |
| 10 | X-4620 | X-2867 |
Note that nicotinamide and X-5207, as well as guanosine and X-3390, are strongly correlated.
Regarding the interpretation of observed enrichment of nicotinamide and guanosine, we caution that without further additional information it is not possible to decide whether this is due to intake of medication or rather due to the different progression of cancer.
For a series of other data examples further illustrating the analysis of cat scores and estimation of corresponding predictive errors, we refer to Ahdesmäki and Strimmer (2009).
4 DISCUSSION
4.1 Harmonizing gene ranking and feature selection
The correlation-adjusted t-score is the result of our attempt to harmonize gene ranking with LDA feature selection. While, it is well known that in the absence of correlation the t-score provides optimal rankings (Fan and Fan, 2008), the situation is less clear in the LDA case where genes are allowed to be correlated. Here, we show that the cat score provides a natural weight for feature selection in LDA analysis and that it can be successfully employed to rank genes and gene groups.
In order to apply cat scores in the analysis of high-dimensional data, in this article, we develop a corresponding shrinkage procedure. For moderately high dimensions and sufficient sample size, we demonstrate that incorporating correlation information into the gene ranking can lead to substantial improvement in power. However, this is only feasible if either the sample size is large or the signal is strong enough to estimate correlations (Hall et al., 2005). For microarray data with very small sample size (in the order of n1 = n2 = 3), it is impossible to estimate a large-scale correlation matrix, and unsurprisingly for that case we did not see any benefits. However, as our study shows (Figs 3 and 4) using the cat score can lead to substantial gains already for relatively moderate sample sizes (n1 = n2 = 8).
4.2 Recommendations
In high-dimensional genomic experiments with very small sample size, when nothing is known a priori about the correlation structure, we recommend employing the standard regularized t-scores.
However, for moderate ratios of p/n, say <100, it is often possible to obtain reliable estimates of the correlation among markers. Thus, in this setting, we propose ranking of biomarkers by the correlation-adjusted t-score, computed by the shrinkage procedure outlined above. In addition, if inspection of the correlation histogram shows existence of highly correlated features, then joint evaluation of those features by computing the grouped cat score is advised. Using more constrained correlation estimators may further improve the efficiency.
Finally, as pointed out by a referee, gene ranking by cat scores may be combined with fold-change-based thresholding, in order to filter out statistically significant yet biologically irrelevant features (e.g. McCarthy and Smyth, 2009).
In short, we propose to view gene ranking as a generically multivariate problem. In this perspective, it seems stringent not only to standardize the mean differences (i.e. using the corresponding t-scores), but also to additionally decorrelate them, which results in the cat score proposed here.
1The validity of the identity can be verified by noting that the eigenvalues of (Ip + UMUT)α and of the right-hand side of Equation (6) are identical (which implies similarity between the two matrices) and that no further rotation is needed for identity.
ACKNOWLEDGEMENTS
We are grateful to the anonymous referees for their very valuable comments. We thank our colleagues at IMISE for discussion and Anne-Laure Boulesteix, Florian Leitenstorfer and Abdul Nachtigaller for additional suggestions.
Conflict of Interest: none declared.
REFERENCES
Author notes
Associate Editor: Joaquin Dopazo



















