Motivation: Systematic differences arising from experimental features are present in most large microarray data sets. Many different experimental features can cause biases, including different sources of RNA, different production lots of microarrays or different microarray platforms. These systematic effects present a substantial hurdle to the analysis of microarray data.

Results: We present here a new method for the identification and adjustment of systematic biases that are present within microarray data sets. Our approach is based on modern statistical discrimination methods and is shown to be very effective in removing systematic biases present in a previously published breast tumor cDNA microarray data set. The new method of ‘Distance Weighted Discrimination (DWD)’ is shown to be better than Support Vector Machines and Singular Value Decomposition for the adjustment of systematic microarray effects. In addition, it is shown to be of general use as a tool for the discrimination of systematic problems present in microarray data sets, including the merging of two breast tumor data sets completed on different microarray platforms.

Availability: Matlab software to perform DWD can be retrieved from https://genome.unc.edu/pubsup/dwd/

Supplementary information: Complete versions of the cluster diagrams in Figure 6, and of the other figures, are available at https://genome.unc.edu/pubsup/dwd/

Author notes

¹Department of Statistics and Econometrics, University of Carlos III, Madrid, Spain; ²Lineberger Comprehensive Cancer Center, ³Department of Genetics and ⁴Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599-7264, USA; ⁵Department of Molecular Medicine, Karolinska Institutet, S 17176 Stockholm, Sweden; and ⁶Department of Statistics, University of North Carolina, Chapel Hill, NC 27599-3260, USA