Motivation: High-density oligonucleotide arrays (GeneChip, Affymetrix, Santa Clara, CA) have become a standard research tool in many areas of biomedical research. They quantitatively monitor the expression of thousands of genes simultaneously by measuring fluorescence from gene-specific targets or probes. The relationship between signal intensities and transcript abundance, as well as normalization issues, have been the focus of much recent attention (Hill et al., 2001; Chudin et al., 2002; Naef et al., 2002a). Researchers need the best possible analytical tools to make the most of the information that this powerful technology offers. At present there are three analytical methods available: the newly released Affymetrix Microarray Suite 5.0 (AMS) software that accompanies the GeneChip product, the method of Li and Wong (LW; Li and Wong, 2001), and the method of Naef et al. (FN; Naef et al., 2001). The AMS method is tailored for analysis of a single microarray, and can therefore be used with any experimental design. The LW method, on the other hand, depends on a large number of microarrays in an experiment and cannot be used for an isolated microarray. The FN method is particular to paired microarrays, such as those resulting from an experiment in which each ‘treatment’ sample has a corresponding ‘control’ sample. Our focus is on the analysis of experiments in which there is a series of samples. In this case only the AMS method, the LW method, and the method described in this paper can be used. The present method is model-based, like the LW method, but assumes multiplicative rather than additive noise, and eliminates statistically significant outliers for improved results. Unlike LW and AMS, we do not assume a probe-specific background (measured by the so-called mismatch probes). Rather, we assume a uniform background, whose level is estimated using both the mismatch and perfect match probe intensities.
Results: We present a new method for GeneChip analysis, based on a statistical model with multiplicative noise. We demonstrate that this method yields results superior to those obtained by the Affymetrix Microarray Suite 5.0 software and to those obtained by the model-based method of Li and Wong (Li and Wong, 2001). The present method eliminates the hard-to-interpret negative expression indices, and the binary ‘presence’ calls (present or absent) are replaced by the statistical significance (p-value) of gene expression. We have found that thresholding the p-values at the (0.1)16–level produces about the same number of ‘present’ calls as the AMS software. By testing our method on a pair of replicate GeneChips (hybridized with the same cRNA), we found that 95.6% of data points lie within the 1.25-fold interval. In other words, our method had a 4.4% type I error rate at the 1.25-fold level. The error rate of the LW method was 15%, and that of the AMS method was 29%. There were no points outside the 2-fold interval with the present method. Analysis of variance (ANOVA) of another experiment with multiple replicates shows that this reduction of variance is not accompanied by a corresponding reduction of signal. On the contrary, the signal-to-noise ratio (as measured by the distribution of F-statistics) of the present method is on average 3.4 times better than that of AMS, and 1.4 times better than that of Li and Wong.
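The replicate-based error rate quoted above can be illustrated with a minimal sketch. It assumes that the 1.25-fold interval means a replicate ratio between 1/1.25 and 1.25, and that all expression indices are strictly positive. The function name `fraction_outside_fold` and the sample values are hypothetical and are not taken from the authors' Fortran 90 code or data.

```python
import math


def fraction_outside_fold(rep_a, rep_b, fold=1.25):
    """Return the fraction of genes whose replicate ratio lies outside
    the interval [1/fold, fold].

    Assumption: both replicates hold strictly positive expression
    indices for the same genes, in the same order.
    """
    if len(rep_a) != len(rep_b) or not rep_a:
        raise ValueError("replicates must be non-empty and of equal length")
    log_fold = math.log(fold)
    outside = 0
    for a, b in zip(rep_a, rep_b):
        # On the log scale the fold interval is symmetric about zero.
        if abs(math.log(a / b)) > log_fold:
            outside += 1
    return outside / len(rep_a)


# Hypothetical expression indices for five genes on two replicate chips.
rep_a = [100.0, 200.0, 50.0, 400.0, 80.0]
rep_b = [110.0, 190.0, 70.0, 405.0, 79.0]

error_rate = fraction_outside_fold(rep_a, rep_b, fold=1.25)
print(f"fraction outside the 1.25-fold interval: {error_rate:.2f}")
```

Because the two chips are hybridized with the same cRNA, any gene falling outside the interval counts as a false positive, which is why this fraction can be read as a type I error rate at the chosen fold level.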
Availability: Fortran 90 source code for this method is available from the corresponding author upon request.