Abstract

Motivation: The microarray report measures the expressions of tens of thousands of genes, producing a feature vector that is high in dimensionality and that contains much irrelevant information. This dimensionality degrades classification performance. Moreover, datasets typically contain few samples for training, leading to the ‘curse of dimensionality’ problem. It is essential, therefore, to find good methods for reducing the size of the feature set.

Results: In this article, we propose a method for gene microarray classification that combines different feature reduction approaches to improve classification performance. Using a support vector machine (SVM) as our classifier, we examine an SVM trained using a set of selected genes; an SVM trained using the feature set obtained by the Neighborhood Preserving Embedding feature transform; a set of SVMs trained using a set of orthogonal wavelet coefficients of different wavelet mothers; a set of SVMs trained using texture descriptors extracted from the microarray, considering it as an image; and an ensemble that combines the best feature extraction methods listed above. The positive results reported offer confirmation that combining different feature extraction methods greatly enhances system performance. The experiments were performed using several different datasets, and our results [expressed as both accuracy and area under the receiver operating characteristic (ROC) curve] show that the proposed approach compares favorably with the state of the art.

Availability: The MATLAB code of the proposed approach is publicly available at bias.csr.unibo.it/nanni/micro.rar

Contact:  loris.nanni@unipd.it

Supplementary information:  Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

DNA microarray technology has proven to be an important breakthrough in molecular biology. This rapidly maturing technology is providing scientists with a means of monitoring the expression of genes on a genomic scale (Chee et al., 1996). One important application area is disease prognostication (Golub et al., 1999; Peng, 2006). Benefits include the potential for identifying individual genes responsible for disease (Der et al., 1998; Huang and Keoman, 2005; Maglietta et al., 2007; Turashvilli et al., 2007) and for providing scientists with a more accurate means of diagnosis and prognosis (Alon et al., 1999; Beer et al., 2002; Ben-Dor et al., 2003; Brown et al., 2000; Freije et al., 2004; Petricoin et al., 2002; Pomeroy et al., 2002; Singh et al., 2002; Tamayo et al., 1999). Large-scale profiling of gene expression can reveal, for example, normal versus malignant cells and the genetic and cellular changes in the progression of tumor metastasis (Golub et al., 1999).

The benefits offered by simultaneously monitoring tens of thousands of genes, however, depend on developing tools capable of handling not only the sheer size of this data but also the small number of samples usually available for analysis. Machine learning systems are well suited for this problem, but they must be designed to handle high levels of noise, as only a small minority of genes is typically relevant for any given problem. The small sample size compared to the large number of features means that these systems must also contend with the dreaded ‘curse of dimensionality’ (Lee et al., 2008). It would be very beneficial, therefore, if good methods for identifying these small sets of relevant genes could be developed.

In the literature, gene selection methods have been organized into three categories: filter, wrapper and embedded methods (Bontempi, 2007). Filter methods reveal dependencies without using classifiers and are based on statistical methods of ranking genes, e.g. t-statistics (Devore and Peck, 1997; Tibshirani et al., 2002), class separability (Dudoit et al., 2002) and Fisher's criterion (Broet et al., 2004; Lai et al., 2004). Wrapper and embedded methods consider the mutual information among genes as well as their relevance (Peng et al., 2005). Example classifiers used in wrapper methods include the Bayesian classifier (Figuiredo and Jain, 2001; Hastie et al., 2009), K-nearest neighbor (Hastie et al., 2009; Tibshirani et al., 2003) and support vector machines (SVMs) (Furey et al., 2000; Guyon et al., 2002). Wrapper methods are much slower than filter methods because they search for optimal combinations of features/genes, but filter methods may not select the optimal set of features.

Examples of embedded methods include one-norm SVM (Fung and Mangasarian, 2000), logistic regression (Shen and Tan, 2005), sparse logistic regression (Roth, 2004) and methods based on regularization (Ghosh and Chinnaiyan, 2005). An interesting embedded method is that developed by Huerta et al. (2010). They devised a Genetic Algorithm with Fisher's Linear Discriminant Analysis (LDA) as the fitness function that performed well across a number of databases using a small number of selected genes. Most of these filter, wrapper and embedded methods are comparable in accuracy (Ghorai et al., 2011).

Several recent advances include reducing the sample set (Chen and Lin, 2011), using classifier ensembles (Ghorai et al., 2011; Huang et al., 2010; Tan and Gilbert, 2003) rather than single classifiers, and using hybrid or multiple sets of different types of feature selection and transformation methods (Ghorai et al., 2011). Chen and Lin (2011) have improved classifier performance by extracting significant samples that are located only on support vectors.

Huang et al. (2010) improve performance using a decision forest for classification of gene expression data, and Stiglic et al. (2010) use rotation forests for robust and improved classification accuracy. Ghorai et al. (2011) have developed an ensemble that combines both filter and wrapper methods: a ranking method performs a fast reduction in dimensionality and a wrapper method refines the search. Their method has demonstrated performance comparable to that of wrapper methods while providing a significant reduction in the computational burden.

In this article, we propose to classify DNA microarray data using an ensemble of SVM classifiers, with each SVM trained on a different set of features. SVM is selected because it is considered to be one of the most powerful classifiers in microarray classification of cancers (Statnikov et al., 2008) and in several other bioinformatic problems (Hayat and Khan, 2011; Tahir et al., 2011). Even though SVM is a strong learner and thus not typically suitable for ensembles, it actually performs well in ensembles if coupled with the random subspace technique (Nanni and Lumini, 2011).

In our experiments, we specifically investigate approaches: (i) that compare standard feature selection methods, where only a subset of the whole gene set is retained and then used to train an SVM; (ii) that compare several feature transform methods, where the dimension of the feature vector is reduced and then used to train an SVM; (iii) that train a set of SVMs using a set of orthogonal wavelet coefficients of different wavelet mothers (these sets of coefficients are selected via Sequential Forward Floating Selection (SFFS) using the leave-one-dataset-out validation protocol, such that when a given dataset is classified, the sets of coefficients are selected by SFFS using the other datasets as the validation set); and (iv) that consider the microarray as an image, where the texture descriptors are extracted from the image and used to train an SVM. Experiments are carried out on several datasets, and experimental results show that the proposed method performs well when considering both accuracy and the area under the ROC curve (AUC) as the performance indicators.

2 METHODS

In this section, we briefly describe the feature selection, feature transform and classification and fusion methods, including the tree wavelet and texture descriptors used in our approach.

2.1 Feature selection

The feature selection methods we explore are the following:

  • Fisher score (Fi): a method utilizing discriminative methods and generative statistical models for determining the most relevant features for classification;

  • Gini index (Gi): a statistical measure of dispersion, most commonly used to quantify wealth distributions based on the Lorenz curve;

  • mRMR (Mr): a feature selection method that correlates the strongest features with a classification variable: features are selected that are mutually different from each other while still maintaining high correlation with the classification variable;

  • T-test (Tt): a statistical hypothesis test that uses the Student's t-distribution; and

  • Sb: a feature selection method based on sparse Bayesian multinomial logistic regression.

In most cases, the code for the above methods was taken from the MATLAB Feature Selection Package available at http://featureselection.asu.edu/
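As an illustration of filter-style ranking, the sketch below scores each gene with one common two-class form of the Fisher criterion, (mean0 - mean1)^2 / (var0 + var1), and keeps the g best genes. This form, the function names and the toy data are assumptions made for the example; the variant implemented in the package used in our experiments may differ.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher criterion for one gene (assumed form, see lead-in)."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    numerator = (mean(class0) - mean(class1)) ** 2
    denominator = pvariance(class0) + pvariance(class1)
    if denominator == 0:
        # Convention chosen for this sketch: constant within both classes.
        return 0.0 if numerator == 0 else float("inf")
    return numerator / denominator


def top_genes(samples, labels, g):
    """Return the indices of the g genes with the highest score."""
    n_genes = len(samples[0])
    scores = [
        fisher_score([sample[j] for sample in samples], labels)
        for j in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return ranked[:g]


# Toy data (invented): 4 samples x 3 genes; only gene 1 separates the classes.
samples = [
    [1.0, 0.1, 5.0],
    [2.0, 0.2, 4.0],
    [1.5, 3.1, 5.5],
    [1.8, 3.0, 4.2],
]
labels = [0, 0, 1, 1]
selected = top_genes(samples, labels, g=1)
```

In the experiments the same idea is applied with g set to 150, 300 or 450 genes.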

In addition to the above listed feature selection methods, we also examine:

  • FFacsa2 (Luo et al., 2011): a forward feature selection algorithm that is based on the aggregation of classifiers generated by a single attribute;

  • SVMrfe (Guyon et al., 2002): the well-known SVM-based recursive feature elimination method;

  • SFFS (Pudil et al., 1994): an exhaustive search procedure that has been studied extensively and shown to perform well compared to competing methods (Kudo and Sklansky, 2000). To reduce computation time, SFFS starts from the 500 genes selected by Fi; then the best set is extracted. The performance of an SVM is used as the objective function.

See the Supplementary Material for a fuller discussion of Fisher score and SFFS.

2.2 Feature transform

We explore the following feature transform techniques:

  • Locally Linear Embedding (LLE), as proposed in Roweis and Saul (2000);

  • Orthogonal LDA (OLDA), as proposed in Ye (2005);

  • Orthogonal Neighborhood Preserving Projections (ONPPs), as proposed in Kokiopoulou and Saad (2005); and

  • Neighborhood Preserving Embedding (NPE), as proposed in He et al. (2005). Unlike principal component analysis (PCA), which aims at preserving the global Euclidean structure of the data, NPE preserves the local neighborhood structure on the data manifold. As a result, NPE is less sensitive to outliers than is PCA. We used the MATLAB code freely available at http://www.zjucadcg.cn/dengcai/Data/data.html

See the Supplementary Material for a fuller discussion of NPE.

2.3 Tree wavelet

In the case of one dimensional wavelet decomposition, the first step produces two sets of coefficients from the signal: (i) approximation coefficients, or scaling coefficients; and (ii) detail coefficients, or wavelet coefficients. The approximation coefficients are split into two parts repeating the same algorithm, being thereby replaced by approximation coefficients and detail coefficients. This decomposition process is repeated until a required level is reached (Liu, 2009; Nanni and Lumini, 2011).
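The repeated splitting of the approximation coefficients can be sketched with the Haar wavelet, the simplest of the families considered. This is a minimal illustration that assumes an even-length signal at every level and orthonormal Haar filters; the function names and the toy signal are ours and are not taken from the released code.

```python
import math


def haar_step(signal):
    """One decomposition step: return (approximation, detail) coefficients."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approximation = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approximation, detail


def haar_decompose(signal, levels):
    """Split the approximation repeatedly; keep both subbands of every level."""
    subbands = {}
    approximation = list(signal)
    for level in range(1, levels + 1):
        if len(approximation) < 2:
            break  # nothing left to split
        approximation, detail = haar_step(approximation)
        subbands[f"A{level}"] = approximation
        subbands[f"D{level}"] = detail
    return subbands


# Toy expression profile (invented values) decomposed to three levels.
expression = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
bands = haar_decompose(expression, levels=3)
```

Each entry of `bands` (for example `A2` or `D3`) plays the role of one set of coefficients on which a separate classifier could be trained.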

In this article, we examine the following wavelets (up to the sixth decomposition level): Haar, Daubechies order 7, Symmlet order 2, Coiflets order 2, Biorthogonal order for reconstruction 2 and for decomposition 2, Reverse Biorthogonal order for reconstruction 2 and for decomposition 2. For each set of coefficients (both approximation coefficients and detail coefficients) of a given decomposition level, a different classifier is trained. The decomposition is applied both on the original data and on the set of genes selected by Fisher score. SFFS is used to select a set of subbands. The testing protocol is leave-one-dataset-out: when a given dataset is classified, the sets of coefficients are selected using the other datasets as the validation set. A fuller discussion of wavelet decomposition and the set of subbands selected by SFFS considering all the datasets are reported in the Supplementary Material.
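The leave-one-dataset-out protocol can be sketched as follows. To keep the example short, SFFS is replaced here by a much simpler rule (pick the single subband with the best mean score on the other datasets); the score table is invented purely for illustration and does not correspond to any reported result.

```python
def leave_one_dataset_out(names):
    """Yield (target, pool): the pool holds every dataset except the target."""
    for target in names:
        yield target, [name for name in names if name != target]


def best_subband(scores, pool):
    """Stand-in for SFFS: subband with the highest mean score over the pool."""
    def mean_score(band):
        return sum(scores[band][dataset] for dataset in pool) / len(pool)
    return max(sorted(scores), key=mean_score)


# Invented validation scores for two hypothetical subbands on three datasets.
toy_scores = {
    "A1": {"O": 0.90, "P": 0.80, "L": 0.95},
    "D1": {"O": 0.85, "P": 0.90, "L": 0.70},
}

choices = {
    target: best_subband(toy_scores, pool)
    for target, pool in leave_one_dataset_out(["O", "P", "L"])
}
```

The point of the protocol is visible in `choices`: the subband used for a dataset is never chosen by looking at that dataset's own scores.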

2.4 Texture descriptors

In this approach, we consider the microarray as an image from which a set of texture descriptors is extracted. First, we select a set of 900 genes using the Fisher criterion feature selection method. Then this 900-dimensional feature vector is reshaped as a matrix using random assignment. A total of 50 different random reshapings are performed. For each reshaping, a different SVM is trained, with results combined using a fusion rule.
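A minimal sketch of the random reshaping step is given below. It assumes that the 900 selected genes are arranged as a 30 x 30 matrix (the text above only states that the vector is reshaped as a matrix) and that each of the 50 random assignments is fixed once and reused for every sample, so that a given pixel always holds the same gene. The function names are hypothetical.

```python
import random


def make_permutations(n_features, n_reshapings, seed=0):
    """Draw one random gene-to-pixel assignment per reshaping."""
    rng = random.Random(seed)
    permutations = []
    for _ in range(n_reshapings):
        order = list(range(n_features))
        rng.shuffle(order)
        permutations.append(order)
    return permutations


def to_image(features, permutation, side):
    """Place the permuted feature vector row by row into a side x side matrix."""
    if len(features) != side * side:
        raise ValueError("feature vector does not fill the matrix")
    flat = [features[j] for j in permutation]
    return [flat[r * side:(r + 1) * side] for r in range(side)]


# 50 reshapings of one (synthetic) 900-dimensional sample.
permutations = make_permutations(n_features=900, n_reshapings=50)
vector = [float(i) for i in range(900)]
images = [to_image(vector, p, side=30) for p in permutations]
```

Texture descriptors would then be extracted from each of the 50 images, with one SVM trained per reshaping.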

In this article, we examine the following image texture feature transforms:

  • Lu is a concatenation of the uniform bins extracted using local binary patterns (LBPs) (Ojala et al., 2002) with P = 8 and P = 16. If P = 8 then R = 1; if P = 16 then R = 2. The length of the feature vector is 59 in the case P = 8 and 243 in the case P = 16;

  • Lr is the rotation invariant uniform bins extracted using LBP with P = 8 and P = 16. If P = 8 then R = 1; if P = 16 then R = 2. The length of the feature vector is 10 in the case P = 8 and 18 in the case P = 16;

  • LP(x) is local phase quantization (Ojansivu and Heikkila, 2008) with radius x = 3 or x = 5. The length of the feature vector is 256 in both cases;

  • LQPu is a set of different local quinary patterns (Nanni et al., 2010) with uniform bins and with τ1 = {1, 3, 5, 7, 9} and τ2 = {τ1+2, τ1+4,…, τ1+11}. These are combined by a fusion rule (see the ‘Results’ section for details).

See the Supplementary Material for a fuller discussion of local phase quantization.
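The feature vector lengths quoted for the LBP variants can be checked by enumeration. The sketch below counts the uniform patterns (at most two 0/1 transitions when the P bits are read circularly) and adds one shared bin for all non-uniform patterns; for the rotation invariant variant, uniform patterns that differ only by rotation share a bin, which amounts to grouping them by their number of set bits. The function names are ours.

```python
def transitions(pattern, p):
    """Number of 0/1 changes when the p-bit pattern is read circularly."""
    count = 0
    for i in range(p):
        bit = (pattern >> i) & 1
        next_bit = (pattern >> ((i + 1) % p)) & 1
        if bit != next_bit:
            count += 1
    return count


def is_uniform(pattern, p):
    return transitions(pattern, p) <= 2


def uniform_bins(p):
    """One bin per uniform pattern plus one bin for all other patterns."""
    return sum(1 for pattern in range(1 << p) if is_uniform(pattern, p)) + 1


def rotation_invariant_uniform_bins(p):
    """One bin per number of set bits among uniform patterns, plus one."""
    popcounts = {
        bin(pattern).count("1")
        for pattern in range(1 << p)
        if is_uniform(pattern, p)
    }
    return len(popcounts) + 1


lengths = {
    p: (uniform_bins(p), rotation_invariant_uniform_bins(p)) for p in (8, 16)
}
# lengths == {8: (59, 10), 16: (243, 18)}
```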

2.5 Classification and fusion

In this approach, we use SVM as the stand-alone classifier. SVM is a general purpose binary classifier based on statistical learning. It performs classification in two steps. In the first step, it maps the sample data vector into a higher dimensional data space by means of polynomial kernels or radial basis function kernels. In the second step, the algorithm finds a hyperplane in this space that has the largest margin separating the classes.

The fusion step is performed by means of the sum rule or the majority voting (vote) rule. The first consists in summing the scores of all the classifiers of the ensemble and selecting the class with the highest score; the second simply selects the class with the highest number of votes (see the Supplementary Material for a fuller discussion of SVM and the sum and vote rules).
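The two fusion rules can be sketched in a few lines. The example assumes that every classifier outputs one score per class and that a higher score means stronger support; the scores are invented and chosen so that the two rules disagree.

```python
from collections import Counter


def sum_rule(score_lists):
    """Sum the per-class scores over all classifiers; return the best class."""
    n_classes = len(score_lists[0])
    totals = [sum(scores[c] for scores in score_lists) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: totals[c])


def vote_rule(score_lists):
    """Each classifier votes for its top class; return the most voted class.

    Ties are resolved in favor of the class that received a vote first.
    """
    votes = Counter(
        max(range(len(scores)), key=lambda c: scores[c]) for scores in score_lists
    )
    return votes.most_common(1)[0][0]


# Three classifiers, two classes: one confident vote for class 0,
# two weak votes for class 1.
scores = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
by_sum = sum_rule(scores)    # class 0: total 1.75 against 1.25
by_vote = vote_rule(scores)  # class 1: two votes against one
```

The sum rule lets a confident classifier outweigh several hesitant ones, whereas the vote rule ignores confidence altogether.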

3 RESULTS

To assess the performance of our approach, we have conducted several experiments on a number of publicly available datasets. Below we provide a brief description of each dataset (the salient features of each dataset are summarized in Table 1):

  • Breast dataset (B) (van 't Veer et al., 2002): the goal of this experiment is to identify patients who might benefit from adjuvant chemotherapy. Two classes are considered: patients who continued to be disease free after 5 years (44 samples) and patients who developed metastases within 5 years (34 samples);

  • Ovarian dataset (O) (Petricoin et al., 2002): the goal of this experiment is to identify proteomic patterns in serum that distinguish between ovarian cancer and normal non-cancer groups. Two classes are considered: 91 controls (Normal) and 162 ovarian cancers;

  • Lung dataset (L) (Gordon et al., 2002): the goal of this experiment is to classify between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA) of the lung. Two classes are considered: 31 MPM tissue samples and 150 ADCA tissue samples;

  • Prostate tumors (P) (Singh et al., 2002): the goal of this experiment is to classify prostate tumor samples and normal non-tumor samples. Two classes are considered: 52 prostate tumor samples and 50 normal samples;

  • Medulloblastoma (M) (Pomeroy et al., 2002): the researchers analyze 60 similarly treated patients from whom biopsies were obtained before receiving treatment. Using this dataset, Pomeroy et al. show that the clinical outcome of children with medulloblastomas is predictable on the basis of the gene expression profiles of their tumors at diagnosis;

  • Colon (C) (Alon et al., 1999): the colon dataset contains 62 samples: 40 are tumor samples and 22 are normal controls. In this dataset, the 2000 genes with the highest intensity across the samples are considered;

  • Duke (D) (Luo et al., 2011): this is a dataset that contains 44 patterns described by 7129 genes;

  • ALML (A) (Golub et al., 1999): this leukemia dataset was derived from a study of gene expression in two types of acute leukemia: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The dataset includes 47 cases of ALL and 25 cases of AML, together with 7129 genes;

  • DLBCL (DL) (Shipp et al., 2002): the goal of this dataset is to distinguish diffuse large B-cell lymphoma (DLBCL) from follicular lymphoma (FL) morphology. This dataset contains 58 DLBCL samples and 19 FL samples.

In our first experiment, we compare several feature selection methods using a stand-alone SVM as the classifier, as a function of the number g of genes retained: 150, 300 and 450. In Table 2, we report the average performance of the different approaches across all datasets (the performance for each dataset is reported in Supplementary Table S1 in the Supplementary Material). It is interesting to note that the best performance is obtained by the old Fisher criterion, which slightly outperforms the more recent FFacsa2, SVMrfe and the computationally heavy SFFS method. This advantage in performance is obtained using 450 genes. SVMrfe and SFFS performed best when fewer genes/features are retained. In our experiments, we choose the best kernel and the best parameters for each dataset using 10-fold cross validation on the training data.

Table 1.

Characteristics of the datasets used in the experiments: the first column presents the number of attributes (#A), and the second column reports the number of examples (#E)

Dataset              #A      #E
Ovarian (O)          15154   253
Prostate (P)         12600   102
Lung (L)             12533   181
Breast (B)           24481   78
Medulloblastoma (M)  7129    60
Colon (C)            2000    62
Duke (D)             7129    44
ALML (A)             7129    72
DLBCL (DL)           7129    77

Table 2.

Average accuracy obtained using different feature selection methods as a function of the number g of genes retained

g    Fi     Gi     Mr     Sb     Tt     FFacsa2  SVMrfe  SFFS
150  87.50  78.47  84.44  80.61  84.39  87.26    88.77   88.85
300  87.97  83.24  87.14  80.61  85.45  86.76    88.95   88.10
450  89.59  85.53  87.25  80.61  86.24  87.67    89.48   88.48

The bold values are the highest performance, the italic values are the values of parameters.

In the second experiment, we compare several feature transform methods using the stand-alone SVM as the classifier. To reduce the computation time, 1000 genes are first selected by Fisher score and then PCA is used to decorrelate the data. In Table 3 the average accuracy on all the datasets obtained using different feature transform methods is reported as a function of the dimension k of the projection space (k∈{20,30,45}) (the accuracy obtained in each dataset is reported in Supplementary Table S2 in the Supplementary Material). The best performance is obtained by NPE, which only slightly improves on the performance obtained by Fi in the previous test reported in Table 2.

Table 3.

Average accuracy obtained using different feature transform methods in reduced spaces of different dimensionality k

k   LLE    OLDA   ONPP   NPE
20  86.25  89.06  86.60  89.93
30  86.42  89.06  87.40  89.93
45  86.25  89.06  87.69  89.93

The bold values are the highest performance, the italic values are the values of parameters.

In the third experiment, we evaluate the performance obtained by varying the image descriptors used to represent the microarray patterns (as described in Section 2.4). In Table 4 we report the accuracy obtained: (i) by methods based on different descriptors; (ii) by the tree wavelet (TW) approach (where the classifiers are combined by the vote rule); (iii) by the ensemble FUS (which is the fusion by vote rule of TW, NPE and Fi); and, as a reference, (iv) by the best approaches previously tested (Fi and NPE). It is interesting to note in Table 4 that not only does the fusion approach obtain the best average performance but also FUS closely matches the performance of the best approach for any given dataset:

  • In the prostate dataset (P), the best single approach is TW, which FUS matches;

  • In the breast dataset (B), NPE outperforms TW and Fi. FUS obtains a performance only slightly lower than NPE but higher than either Fi or TW;

  • In the ALML dataset (A), Fi outperforms TW and NPE. FUS, however, outperforms Fi.

We tried combining LQPr in FUS, but performance remained the same. The most advanced methods based on image descriptors (i.e. LQPr and LQPu) perform much better than do the simple Lu, Lr and LP (we believe, however, that combinations of different texture descriptors with the simple methods would probably obtain performances closer to those obtained by standard approaches).

Table 4.

Accuracy obtained by different texture descriptors, TW, Fi, NPE and the ensemble FUS

Dataset  Fi      NPE     TW      Lu     Lr     LP(3)  LP(5)  LQPu   FUS
O        100.00  100.00  100.00  96.40  87.20  91.20  94.00  94.80  100.00
P        93.85   95.38   96.15   80.00  70.77  65.38  66.15  84.62  96.15
L        100.00  100.00  100.00  95.56  93.89  92.22  82.22  98.33  100.00
B        82.86   90.00   87.14   71.43  74.29  61.43  54.29  84.29  88.57
M        70.00   70.00   70.00   68.33  68.33  68.33  68.33  68.33  66.67
C        75.00   68.33   75.00   65.00  65.00  65.00  65.00  65.00  73.33
D        87.50   90.00   85.00   72.50  65.00  45.00  45.00  80.00  90.00
A        98.57   97.14   95.71   88.57  72.86  65.71  65.71  82.86  100.00
DL       98.57   98.57   98.57   75.71  75.71  68.57  68.57  77.14  98.57
avg      89.59   89.93   89.73   79.27  74.78  69.20  67.69  81.70  90.37

The bold values are the highest performance, the italic values are the values of parameters.

In the fourth experiment, we compare the performance of FUS with several state-of-the-art approaches: LI (Liu et al., 2002), CN (Cheng, 2010), GH (Ghorai et al., 2011), LU (Luo et al., 2011), PA (Paliwal and Sharma, 2010), HU (Huerta et al., 2010), BO (Bolón-Canedo et al., 2012), CH (Chen and Lin, 2011), OR (Orsenigo and Vercellis, 2011) and PO (Porto-Díaz et al., 2011).

This comparison shows that the proposed approach compares favorably with the state of the art. The only dataset where our results are clearly lower is the Colon dataset (C). In several of the papers used in Table 5, the feature selection was performed using the training data, but system performance was measured with the testing set, where varying numbers of the features were retained (see Table 9 for the performance of PO using the original code tested in our datasets). In Table 5, we give the best results reported for each method using the testing set. Our method, in contrast, used the same number of features both in training and testing as well as across all datasets. Our method is thus very suitable for general practitioners.

Table 5.

Comparison among FUS and different state-of-the-art methods (a dash marks a dataset for which no result is available for that method)

Dataset  FUS    LI   CN     GH     LU     PA     HU     BO     CH     OR     PO
O        100    -    -      -      -      -      100    100    -      -      -
P        96.15  -    -      90.16  -      76.50  96.00  -      95.09  -      -
L        100    -    99.33  96.38  -      97.30  99.30  98.89  -      -      99.33
B        88.57  -    -      -      -      73.70  -      -      -      -      -
M        66.67  -    -      -      -      -      -      -      -      -      -
C        73.33  -    -      82.77  80.72  -      -      80.95  -      85.60  90.00
D        90.00  -    -      -      86.83  -      -      -      -      -      -
A        100    100  100    94.52  97.21  100    100    94.46  98.61  94.40  100
DL       98.57  -    -      -      95.56  -      -      -      -      98.70  -

The bold values are the highest performance, the italic values are the values of parameters.

In Tables 6–9, we report results obtained in the previous experiments using a more reliable performance indicator: the AUC. AUC can be interpreted as the probability that the classifier will assign a higher score to a randomly picked positive sample than to a randomly picked negative sample.
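This probabilistic reading of the AUC can be computed directly by comparing every positive sample with every negative sample, under the usual convention that a higher score indicates the positive class. The sketch below counts ties as half a win; the scores and labels are invented for illustration, and the quadratic loop is meant for clarity rather than efficiency.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Three positive and two negative samples: 5 of the 6 pairs are ordered correctly.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
value = auc(scores, labels)
```

Unlike accuracy, this quantity does not depend on the decision threshold applied to the classifier scores.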

Table 6.

Average AUC obtained using different feature selection methods as a function of the number g of genes retained

g    Fi     Gi     Mr     Sb     Tt     FFacsa2  SVMrfe  SFFS
150  89.70  79.17  86.62  85.22  84.01  89.51    89.67   89.70
300  89.62  86.56  89.21  85.22  85.58  90.63    90.06   89.62
450  90.30  87.20  88.63  85.22  86.34  90.40    90.17   90.30

The bold values are the highest performance, the italic values are the values of parameters.

Table 7.

Average AUC obtained using different feature transform methods in reduced spaces of different dimensionality k

k   LLE    OLDA   ONPP   NPE
20  89.49  89.53  90.06  91.79
30  89.80  89.53  91.01  91.83
45  90.33  89.53  91.69  91.85

The bold values are the highest performance, the italic values are the values of parameters.

Table 8.

AUC obtained by different texture descriptors, TW, Fi, NPE and the ensembles FUS and WF

Dataset  Fi     NPE    TW     Lu     Lr     LP(3)  LP(5)  LQPu   FUS    WF
O        99.97  99.97  99.97  99.72  97.89  99.89  99.89  99.55  99.97  99.97
P        95.44  96.50  98.24  87.28  85.31  89.47  90.45  86.52  97.50  97.71
L        99.97  99.97  99.97  99.46  99.46  99.63  99.97  98.55  99.97  99.97
B        94.53  97.99  91.08  89.93  80.22  88.45  90.58  91.74  97.11  97.33
M        61.17  69.13  66.62  49.04  43.00  46.73  49.55  45.44  66.55  66.82
C        68.19  65.51  72.10  57.69  54.52  59.89  66.00  45.42  69.02  69.17
D        93.61  97.95  98.21  81.33  64.71  79.54  89.77  82.86  98.66  98.72
A        99.95  99.95  99.95  99.95  93.98  99.23  99.77  99.05  99.95  99.95
DL       99.95  99.76  99.95  94.89  94.08  95.98  98.82  94.70  99.95  99.95
avg      90.30  91.85  91.78  84.36  79.24  84.31  87.20  82.64  92.07  92.18

The bold values are the highest performance, the italic values are the values of parameters.

Table 9.

Comparison of WF with different state of the art methods using AUC as the performance indicator

AUC   LIU2    OldTW   K=64    K=128   K=512   K=50%   OC      WF
O     99.97   99.97   99.97   99.97   99.97   99.97   99.30   99.97
P     97.00   96.70   94.61   95.82   95.14   96.42   93.47   97.71
L     99.90   99.97   99.97   99.97   99.97   99.97   99.97   99.97
B     86.50   91.10   92.60   94.70   95.52   96.83   93.38   97.33
M     –       –       58.60   54.81   61.94   52.95   61.75   66.82
C     –       –       69.41   68.44   68.44   69.17   63.37   69.17
D     –       –       97.70   98.21   89.77   89.77   95.91   98.72
A     –       –       99.95   99.95   99.95   99.95   99.95   99.95
DL    –       –       99.38   99.95   99.76   99.76   98.25   99.95
avg   –       –       90.24   90.20   90.05   89.42   89.48   92.18

The bold values are the highest performance, the italic values are the values of parameters.

In Table 6, we compare several feature selection methods using AUC (cf. Table 2 where we used accuracy as the performance indicator). FFacsa2 provides the best performance. It should be noted that this difference is mainly due to the lower performance obtained by the other methods in the M dataset (see Supplementary Table S3 in the Supplementary Material for results of each dataset).

In Table 7, we compare the different feature transform techniques using AUC. The best performance, as in Table 3 using accuracy, is obtained by NPE.

In Table 8, we report the performance obtained in the third experiment. This table also evaluates a new ensemble, WF, which fuses TW, NPE, Fi and LP(5) by the weighted sum rule. In the weighted sum rule, the score of each classifier is multiplied by a weight between 0 and 1, and the weighted scores are then summed. Optimal weights are obtained using the leave-one-dataset-out validation protocol: when a given dataset is classified, the set of weights is selected using the other datasets as the validation set. Our fusion approach, WF, obtains the best overall average performance using AUC. Moreover, the fusion results for each dataset closely approximate the performance of the best methods reported for the individual datasets.
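The weighted sum rule and the leave-one-dataset-out weight selection can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the coarse weight grid, the pairwise AUC helper and all function names are assumptions introduced here, and classifier scores are assumed to be already normalized to a comparable range.

```python
from itertools import product


def weighted_sum(scores, weights):
    """Fuse the per-classifier scores of one sample with the weighted sum rule."""
    return sum(w * s for w, s in zip(weights, scores))


def auc(pos_scores, neg_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def fused_auc(dataset, weights):
    """dataset: list of (label, [score of each classifier]) with label in {0, 1}."""
    pos = [weighted_sum(s, weights) for y, s in dataset if y == 1]
    neg = [weighted_sum(s, weights) for y, s in dataset if y == 0]
    return auc(pos, neg)


def select_weights(validation_sets, n_classifiers, grid=(0.0, 0.5, 1.0)):
    """Return the weight vector with the best mean AUC over the validation datasets.

    The grid search is an assumption; the text only states that weights lie in [0, 1].
    """
    candidates = [w for w in product(grid, repeat=n_classifiers) if any(w)]

    def mean_auc(w):
        return sum(fused_auc(d, w) for d in validation_sets) / len(validation_sets)

    return max(candidates, key=mean_auc)


def leave_one_dataset_out(datasets, n_classifiers):
    """For each dataset, tune the weights on all other datasets, then evaluate on it."""
    results = {}
    for name, data in datasets.items():
        others = [d for other, d in datasets.items() if other != name]
        weights = select_weights(others, n_classifiers)
        results[name] = (weights, fused_auc(data, weights))
    return results
```

Because the weights used on a dataset are chosen without looking at that dataset, the reported fusion performance is not tuned on the test data.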

In Table 9, we compare our best approach, WF, with the performance obtained by a random subspace of SVMs trained using the original genes, LIU2 (Liu, 2009) and OldTW (Nanni and Lumini, 2011). The random subspace of SVMs has been shown to be very effective (Bertoni et al., 2009). The random subspace method creates an ensemble in which each classifier is trained with a different subset of the original features. In our experiments, we combine the results of 50 classifiers, each trained with K features, using the sum rule. In Table 9, K = 50% means that each classifier is trained with a subset that contains 50% of the original features, whereas K = x means that each classifier is trained with x randomly selected genes. PO in Table 9 refers to the results obtained using the original code shared by Porto-Díaz et al. (2011) with the following setting: we ran their approach starting from the 500 genes selected by Fi (this provides a fairer comparison with our method). It is interesting to note that the performance on the Colon dataset (C) is now lower than that obtained by our ensemble. WF outperforms the other methods.
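A random subspace ensemble combined by the sum rule can be sketched as below. To keep the example self-contained, a nearest-centroid scorer stands in for the SVM used in the paper; that substitution, the function names and the fixed seed are assumptions made for illustration only.

```python
import random


def train_centroid_scorer(X, y, feature_idx):
    """Stand-in for an SVM trained on the selected features (an assumption).

    The returned score is the squared distance to the negative-class centroid
    minus the squared distance to the positive-class centroid, so larger
    values mean "more positive".
    """
    def centroid(label):
        rows = [[x[j] for j in feature_idx] for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)

    def score(x):
        v = [x[j] for j in feature_idx]
        d_pos = sum((a - b) ** 2 for a, b in zip(v, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(v, c_neg))
        return d_neg - d_pos

    return score


def random_subspace(X, y, k, n_classifiers=50, seed=0):
    """Train n_classifiers scorers, each on k randomly chosen features.

    k may be an integer (K = x genes) or a fraction in (0, 1), e.g. 0.5 for K = 50%.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    if isinstance(k, float):
        k = max(1, round(k * n_features))
    members = [
        train_centroid_scorer(X, y, rng.sample(range(n_features), k))
        for _ in range(n_classifiers)
    ]
    # Sum rule: the ensemble score is the sum of the member scores.
    return lambda x: sum(member(x) for member in members)
```

Calling `random_subspace(X, y, 0.5)` corresponds to the K = 50% setting, and `random_subspace(X, y, 64)` to K = 64.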

The advantage of using a combination of approaches is also demonstrated by the Wilcoxon signed-rank test (Demsar, 2006), which we use to compare the results of the stand-alone methods with those of the ensembles. The null hypothesis (that there is no difference between the accuracies of the stand-alone methods and the ensemble) is rejected at a significance level of 0.10.
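As an illustration of how such a comparison can be computed, the sketch below implements the Wilcoxon signed-rank statistic with an exact two-sided p-value obtained by enumerating sign assignments. The function names are hypothetical, and the choice of inputs (the per-dataset AUC of Fi and WF from Table 8) is an assumption made for the example; it is not necessarily the comparison the authors ran.

```python
from itertools import product


def signed_rank(a, b):
    """Return (W+, W-, ranks) for paired samples a and b.

    Zero differences are dropped and tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [y - x for x, y in zip(a, b) if y != x]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, ranks


def exact_two_sided_p(ranks, w_obs):
    """Share of the 2^n sign assignments whose min(W+, W-) is <= the observed one."""
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(w_plus, total - w_plus) <= w_obs:
            hits += 1
    return hits / 2 ** len(ranks)


# Per-dataset AUC of Fi and WF, copied from Table 8 (order: O, P, L, B, M, C, D, A, DL).
fi = [99.97, 95.44, 99.97, 94.53, 61.17, 68.19, 93.61, 99.95, 99.95]
wf = [99.97, 97.71, 99.97, 97.33, 66.82, 69.17, 98.72, 99.95, 99.95]
w_plus, w_minus, ranks = signed_rank(fi, wf)
p_value = exact_two_sided_p(ranks, min(w_plus, w_minus))
```

On these values, four datasets are tied and WF is higher on the remaining five, so W+ = 15, W- = 0 and the exact two-sided p-value is 2/32 = 0.0625, below the 0.10 level.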

As an additional experiment, we investigated the relationship among the different approaches by evaluating the error independence between the classifiers trained using those features. Table 10 reports the average Yule's Q-statistic (Kuncheva and Whitaker, 2003) in the tested datasets. For two classifiers $G_i$ and $G_j$, the Q-statistic, an a posteriori measure, is defined as:

$$Q_{i,j} = \frac{N^{11} N^{00} - N^{01} N^{10}}{N^{11} N^{00} + N^{01} N^{10}}$$

Table 10.

Yule's Q-statistic between the stand-alone approaches

Compared descriptors   O      P      L      B      M      C      D      A      DL
Fi versus NPE          1.00   0.96   0.93   0.98   0.99   0.60   1.00   0.99   1.00
Fi versus TW           1.00   0.99   0.93   0.94   0.93   0.63   0.98   1.00   1.00
NPE versus TW          1.00   0.96   1.00   0.93   0.95   0.99   0.99   0.99   1.00

where $N^{ab}$ is the number of instances in the test set classified correctly (a = 1) or incorrectly (a = 0) by the classifier $G_i$, and correctly (b = 1) or incorrectly (b = 0) by the classifier $G_j$. $Q \in [-1, 1]$, and $Q_{i,j} = 0$ for statistically independent classifiers. Classifiers that tend to recognize the same patterns correctly will have Q > 0, and those that commit errors on different patterns will have Q < 0. In this problem, the Q-statistic values are low enough to validate the idea of combining the different approaches.
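The Q-statistic can be computed directly from the per-instance correctness of two classifiers. The following is a minimal sketch; the function name and the boolean-list input format are assumptions made for illustration.

```python
def yule_q(correct_i, correct_j):
    """Yule's Q-statistic for two classifiers G_i and G_j.

    correct_i[k] (correct_j[k]) is True when G_i (G_j) classifies test
    instance k correctly.
    """
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1  # both correct
        elif not a and not b:
            n00 += 1  # both wrong
        elif a:
            n10 += 1  # only G_i correct
        else:
            n01 += 1  # only G_j correct
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        raise ValueError("Q is undefined when N11*N00 + N01*N10 = 0")
    return (n11 * n00 - n01 * n10) / denominator
```

For example, two classifiers that are right and wrong on exactly the same instances give Q = 1, whereas two classifiers that never err on the same instance, and are never both correct, give Q = -1.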

As a final experiment, in Table 11, we report the results of our ensemble on two other recent datasets from Shi et al. (2011). The first is a breast cancer dataset (WB) that contains a subset of ER-positive, lymph-node-negative patients who did not receive adjuvant treatment. The raw-intensity Affymetrix CEL files and the data normalized by RMA procedures using Bioconductor packages are used to obtain a final expression matrix comprising 22 283 features and 209 samples. The 71 patients who developed distant metastases or died within 5 years are classified as poor prognosis subjects, and the 139 patients who remained healthy for >5 years are classified as good prognosis subjects. The second dataset (LA) contains gene expressions of 86 patients with primary lung adenocarcinoma (ADCA); 62 patients were still alive, and 24 patients had died.

Table 11.

AUC obtained in the datasets used in (Shi et al., 2011)

Dataset   K=64    K=128   K=512   FFacsa2   FV      Fi      NPE     TW      WF
WB        76.41   76.41   75.72   64.58     69.56   73.50   67.93   72.97   74.19
LA        65.12   68.54   69.03   65.76     66.27   71.53   73.02   67.04   71.31
avg       70.76   72.47   72.37   65.17     67.91   72.51   70.48   70.00   72.75

The bold values are the highest performance, the italic values are the values of parameters.

Notice that all the parameters of WF are obtained using the nine datasets previously used throughout this article. In this test, we arrive at the same main conclusion as in the previous test: the fusion, WF, obtains the best average performance.

4 CONCLUSION

The goal of this study was to develop a robust ensemble of SVM classifiers based on feature perturbation for microarray classification. The reported results of our experiments, expressed as both accuracy and AUC, show that our approach performs very well across several datasets. Our study examined an SVM trained using a set of genes selected by the Fisher criterion, an SVM trained using the feature set obtained by NPE, a set of SVMs trained using a set of orthogonal wavelet coefficients of different wavelet mothers and a set of SVMs trained using texture descriptors extracted from the microarray, considering it as an image. The positive results we obtain compare well with those reported in the literature and provide further confirmation that ensembles of classifiers obtain more reliable results.

In future studies, we plan on testing our approach using more datasets. We will also study combining additional methods in ensemble construction (e.g. combining our feature perturbation approaches with a pattern perturbation approach).

Conflict of Interest: none declared.

1 The MATLAB code was shared by the original authors of FFacsa2, who also shared the code of SVMrfe.

2 Implemented as in PRTools (prtools.org/prtools.html).

3 Publicly available at http://www.chestsurg.org.

REFERENCES

Alon U., et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl Acad. Sci. USA, 1999, vol. 96 (pg. 6745-6750).
Beer D.G., et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med., 2002, vol. 8 (pg. 816-823).
Ben-Dor A., et al. Tissue classification with gene expression profiles. J. Comput. Biol., 2003, vol. 7 (pg. 559-583).
Bertoni A., et al. Classification of DNA microarray data with random projection ensembles of polynomial. 18th Italian Workshop on Neural Networks, 2009, Vietri sul Mare, Italy: IOS Press (pg. 60-66).
Bolón-Canedo V., et al. An ensemble of filters and classifiers for microarray data classification. Pattern Recognit., 2012, vol. 45 (pg. 531-539).
Bontempi G. A blocking strategy to improve gene selection for classification of gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform., 2007, vol. 4 (pg. 293-300).
Broet P., et al. A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics, 2004, vol. 20 (pg. 2562-2571).
Brown M.P., et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA, 2000, vol. 97 (pg. 262-267).
Chee M., et al. Accessing genetic information with high-density DNA arrays. Science, 1996, vol. 274 (pg. 610-614).
Chen A.H., Lin C.-H. A novel support vector sampling technique to improve classification accuracy and to identify key genes of leukaemia and prostate cancer. Expert Syst. Appl., 2011, vol. 38 (pg. 3209-3219).
Cheng Q. A sparse learning machine for high-dimensional data with application to microarray gene analysis. IEEE/ACM Trans. Comput. Biol. Bioinform., 2010, vol. 7 (pg. 636-646).
Demsar J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 2006, vol. 7 (pg. 1-30).
Der S.D., et al. Identification of genes differently regulated by interferon alpha, beta, or gamma using oligonucleotide arrays. Proc. Natl Acad. Sci. USA, 1998, vol. 95 (pg. 15623-15628).
Devore J., Peck R. Statistics: the Exploration and Analysis of Data. 1997, Florence, KY: Duxbury Press.
Dudoit S., et al. Comparison of discrimination methods for the classification of tumors using gene expression data. J. Am. Stat. Assoc., 2002, vol. 97 (pg. 77-87).
Figuiredo M.A.T., Jain A.K. Bayesian learning of sparse classifiers. Computer Vision and Pattern Recognition (CVPR '01), 2001, Miami, Florida: IEEE Computer Society (pg. I-35-I-45).
Freije W.A., et al. Gene expression profiling of gliomas strongly predicts survival. Cancer Res., 2004, vol. 64 (pg. 6503-6510).
Fung G., Mangasarian O.L. Data selection for support vector machine classifiers. Association for Computing Machinery Special Interest Group on Knowledge Discovery and Data Mining, 2000, New York, USA: Association for Computing Machinery (pg. 64-70).
Furey T.S., et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 2000, vol. 16 (pg. 906-914).
Ghorai S., et al. Cancer classification from gene expression data by NPPC ensemble. IEEE/ACM Trans. Comput. Biol. Bioinform., 2011, vol. 8 (pg. 659-671).
Ghosh D., Chinnaiyan A.M. Classification and selection of biomarkers in genomic data using LASSO. J. Biomed. Biotechnol., 2005, vol. 2 (pg. 147-154).
Golub T.R., et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 1999, vol. 286 (pg. 531-537).
Gordon G.J., et al. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res., 2002, vol. 62 (pg. 4963-4967).
Guyon I., et al. Gene selection for cancer classification using support vector machines. Mach. Learn., 2002, vol. 46 (pg. 389-422).
Hastie T., et al. The Elements of Statistical Learning. 2009, New York: Springer.
Hayat M., Khan A. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. J. Theor. Biol., 2011, vol. 271 (pg. 10-17).
He X., et al. Neighborhood preserving embedding. Tenth IEEE International Conference on Computer Vision (ICCV'2005), 2005, Beijing, China: IEEE Computer Society.
Huang J., et al. Decision forest for classification of gene expression data. Comput. Biol. Med., 2010, vol. 40 (pg. 698-704).
Huang T.M., Keoman V. Gene extraction for cancer diagnosis by support vector machines-an improvement. Artif. Intell. Med., 2005, vol. 40 (pg. 185-194).
Huerta E.B., et al. A hybrid LDA and genetic algorithm for gene selection and classification of microarray data. Neurocomputing, 2010, vol. 73 (pg. 2375-2383).
Kokiopoulou E., Saad Y. Orthogonal Neighborhood Preserving Projections. IEEE International Conference on Data Mining, 2005, New Orleans, LA: IEEE Computer Society.
Kudo M., Sklansky J. Comparison of algorithms that select features for pattern classifiers. Pattern Recognit., 2000, vol. 33 (pg. 25-41).
Kuncheva L.I., Whitaker C.J. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn., 2003, vol. 51 (pg. 181-207).
Lai Y., et al. Statistical method for identifying differential gene-gene coexpression patterns. Bioinformatics, 2004, vol. 20 (pg. 3146-3155).
Lee G., et al. Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene- and protein-expression studies. IEEE/ACM Trans. Comput. Biol. Bioinform., 2008, vol. 5 (pg. 368-384).
Liu H., et al. A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns. Genome Inform., 2002, vol. 13 (pg. 51-60).
Liu Y. Wavelet feature extraction for high dimensional microarray data. Neurocomputing, 2009, vol. 72 (pg. 985-990).
Luo L., et al. Methods of forward feature selection based on the aggregation of classifiers generated by single attribute. Comput. Biol. Med., 2011, vol. 41 (pg. 435-441).
Maglietta R., et al. Selection of relevant genes in cancer diagnosis based on their prediction accuracy. Artif. Intell. Med., 2007, vol. 40 (pg. 29-44).
Nanni L., Lumini A. Wavelet selection for disease classification by DNA microarray data. Expert Syst. Appl., 2011, vol. 38 (pg. 990-995).
Nanni L., et al. Local binary patterns variants as texture descriptors for medical image analysis. Artif. Intell. Med., 2010, vol. 49 (pg. 117-125).
Ojala T., et al. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell., 2002, vol. 24 (pg. 971-987).
Ojansivu V., Heikkila J. Blur insensitive texture classification using local phase quantization. International Conference on Image and Signal Processing, 2008, Cherbourg-Octeville, France: Springer (pg. 236-243).
Orsenigo C., Vercellis C. An effective double-bounded tree-connected isomap algorithm for microarray data classification. Pattern Recognit. Lett., 2011, vol. 33 (pg. 9-16).
Paliwal K.K., Sharma A. Improved direct LDA and its application to DNA microarray gene expression data. Pattern Recognit. Lett., 2010, vol. 31 (pg. 2489-2492).
Peng H., et al. Feature selection on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 2005, vol. 27 (pg. 1226-1238).
Peng Y. A novel ensemble machine learning for robust microarray data classification. Comput. Biol. Med., 2006, vol. 36 (pg. 553-573).
Petricoin E.F., et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet, 2002, vol. 359 (pg. 572-577).
Pomeroy S.L., et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 2002, vol. 415 (pg. 436-442).
Porto-Díaz I., et al. A study of performance on microarray data sets for a classifier based on information theoretic learning. Neural Netw., 2011, vol. 24 (pg. 888-896).
Pudil P., et al. Floating search methods in feature selection. Pattern Recognit. Lett., 1994, vol. 5 (pg. 1119-1125).
Roth V. The generalized LASSO. IEEE Trans. Neural Netw., 2004, vol. 15 (pg. 16-18).
Roweis S., Saul L. Nonlinear dimensionality reduction by locally linear embedding. Science, 2000, vol. 290 (pg. 2323-2326).
Shen L., Tan E.C. Dimension reduction-based penalized logistic regression for cancer classification using microarray data. IEEE/ACM Trans. Comput. Biol. Bioinform., 2005, vol. 2 (pg. 166-175).
Shi P., et al. Top scoring pairs for feature selection in machine learning and applications to cancer outcome prediction. BMC Bioinformatics, 2011, vol. 12 (pg. 375).
Shipp M.A., et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med., 2002, vol. 8 (pg. 68-74).
Singh D., et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 2002, vol. 1 (pg. 203-209).
Statnikov A., et al. A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics, 2008, vol. 9 (pg. 319).
Stiglic G., et al. Finding optimal classifiers for small feature sets in genomics and proteomics. Neurocomputing, 2010, vol. 73 (pg. 2346-2352).
Tahir M., et al. Protein subcellular localization of fluorescence imagery using spatial and transform domain features. Bioinformatics, 2011, doi:10.1093/bioinformatics/btr624.
Tamayo P., et al. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl Acad. Sci. USA, 1999, vol. 96 (pg. 2907-2912).
Tan A.C., Gilbert D. Ensemble machine learning on gene expression data for cancer classification. Appl. Bioinformatics, 2003, vol. 2 (pg. 75-83).
Tibshirani R., et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl Acad. Sci. USA, 2002, vol. 99 (pg. 6567-6572).
Tibshirani R., et al. Class prediction by nearest shrunken centroids, with application to DNA microarrays. Stat. Sci., 2003, vol. 18 (pg. 104-117).
Turashvilli G., et al. Novel markers for differentiation of lobular and ductal invasive breast carcinomas by laser microdissection and microarray analysis. BMC Cancer, 2007, vol. 7 (pg. 55).
van 't Veer L.J., et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature, 2002, vol. 415 (pg. 530-536).
Ye J. Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. J. Mach. Learn. Res., 2005, vol. 6 (pg. 483-502).

Author notes

Associate Editor: Martin Bishop
