Motivation: We examine the effect of replication on the detection of apparently differentially expressed genes in gene expression microarray experiments. Our analysis is based on a random sampling approach using real data sets from 16 published studies. We consider both the ability to find genes that meet particular statistical criteria and the stability of the results in the face of changing levels of replication.

Results: While dependent on the data source, our findings suggest that stable results are typically not obtained until at least five biological replicates have been used. Conversely, for most studies, 10–15 replicates yield results that are quite stable, and there is less improvement in stability as the number of replicates is further increased. Our methods will be of use in evaluating existing data sets and in helping to design new studies.

Supplementary information: http://microarray.cpmc.columbia.edu/pavlidis/pub/gxrep

To whom correspondence should be addressed.
Formerly William Noble Grundy: see www.gs.washington.edu/~noble/name-change.html

Author notes

¹Columbia Genome Center, Columbia University, 1150 St. Nicholas Avenue, New York, NY 10032, USA, ²Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, New York, NY 10027, USA and ³Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Seattle, WA 98195, USA