Microarray technology is a powerful technique that allows the simultaneous study of thousands of gene transcripts. During the past two years there has been an explosion of publications describing experiments utilizing microarray technology that range from original research findings from biological paradigms to mathematically modeled systems. However, neuroscientists using microarray technology face significant challenges due to high tissue complexity, low abundance transcripts, and small magnitude changes in transcript levels that have significant biological impact. This manuscript describes a series of studies designed to address issues regarding microarray sensitivity, ability of microarrays to detect subtle changes, and reproducibility of microarray experiments, all in the context of neuronal tissue. From the presentation of these studies, the authors argue that although microarray technology is limited with regard to sensitivity, the outcome of these experiments, if approached with appropriate skepticism, can be fruitful in the generation of hypotheses and seeding of future experiments.
DNA microarrays provide an opportunity for discovery of complex and coordinated transcriptional control in biological systems. Microarray technology has already advanced our understanding in areas such as cancer, development, microbial genetics, and drug discovery, to name a few (for reviews see Butte, 2002; Chung et al., 2002; Joyce et al., 2002; Stathopoulos and Levine, 2002). However, the power that microarrays bring to biological researchers is not without limitation. There are considerable issues regarding sensitivity and reproducibility of microarray studies and availability of genomic tools for comparative model systems. Neuroscience, in particular, presents several challenges to microarray experiments due to high tissue complexity, a large proportion of low abundance transcripts, and subtle changes in transcript levels that have significant biological impact. Furthermore, because of the large number of observations (transcript species being monitored) in any given microarray experiment, there is an unavoidably high number of false positives, and because of the limited sensitivity of microarray technology (Evans et al., 2002) there is also a false negative rate, which is difficult to measure. Most microarray experiments utilize confirmatory techniques such as northern blot, real-time PCR, or in situ hybridization (ISH) to confirm and expand microarray results. These confirmatory experiments are absolutely necessary to eliminate false positives from microarray data; however, they have their own limitations and should not be over-interpreted. For example, real-time PCR and northern blot analysis have no more anatomical resolution than the original dissection of material and cannot easily detect small differences in gene expression (as is also the case for microarrays).
ISH is more anatomically descriptive than these other techniques, but can have limitations with regard to sampling efficiency since usually only a few thin tissue sections are assumed to be representative of a larger structure. Thus the combination of multiple techniques is optimal. Several successful microarray studies in neuroscience have used microarrays as a filter to generate large lists of candidate genes and have then refined these lists with real-time PCR and ISH (e.g., Mirnics et al., 2000; Zhao et al., 2001; Zirlinger et al., 2001). It should be stated that the majority of studies using any of the techniques identified above have been qualitative and only examine relative transcriptional regulation. Quantification of gene expression requires a much larger battery of controls to determine specific transcript concentration, and although these experiments are possible they will not be discussed in this report.
In this manuscript we will focus largely on the issue of false negative rates. First, we will present data evaluating microarray sensitivity in a complex neuronal tissue (hippocampus). Second, we will present studies evaluating the performance of microarrays in the detection of known transcriptional changes in the hypothalamus of adrenalectomized rats. Finally, we will present data from microarray experiments replicated across independent laboratories, which address reproducibility of microarray experiments.
Male Sprague-Dawley rats were purchased either adrenalectomized or sham operated from Charles River Laboratories (Wilmington, MA). Adrenalectomized animals were given 0.9% saline in their drinking water. Animals were received 5 days post surgery and allowed to recover for another 5 days on site before being sacrificed 10 days post surgery. Brains were quickly removed and either dissected for microarrays and real-time PCR or snap frozen for ISH. Dissections were frozen on dry ice and stored at −80°C until RNA extractions. All animals were treated in strict accordance with guidelines of the animal ethics committee at the University of Michigan.
RESULTS AND DISCUSSION
All studies used total RNA as the starting material since we do not find any advantage to using poly A+ mRNA. The purification of poly A+ mRNA and the amount of material needed for the initial input actually requires approximately ten times more tissue and provides no detectable benefit (data not shown).
In order to estimate the sensitivity of microarrays in the detection of transcripts across a wide range of expression and in complex tissue, we utilized an existing study by Datson et al. that describes the hippocampal transcriptome using serial analysis of gene expression (SAGE) (Datson et al., 2001). SAGE is a nucleotide sequencing based technique, originally described by Velculescu et al. (Velculescu et al., 1995), that estimates expression levels of individual transcripts through modified, high throughput DNA sequencing. With this technique concatemerized cDNA is derived from a biological sample where each transcript is represented by short sequence “tags” that are generally long enough to uniquely identify the transcript in a sequence database. The abundance of these “tags” in the SAGE sequence data is proportional to the abundance of the transcript in the original biological sample and thus statements can be made about relative expression levels of each transcript species in the original biological source.
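The tag-counting principle behind SAGE can be illustrated with a minimal Python sketch. The tag sequences and the `relative_abundance` helper are hypothetical and invented for illustration; they are not taken from Datson et al. or Velculescu et al. The sketch only shows how the share of each tag among all sequenced tags serves as an estimate of relative transcript abundance.

```python
from collections import Counter


def relative_abundance(tags):
    """Return each tag's share of all sequenced tags (hypothetical helper)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Made-up short tags standing in for tags read from sequenced concatemers.
example_tags = ["AAACCTGGTA"] * 6 + ["TTTGGACCAT"] * 3 + ["CCGTAAGGTC"] * 1

abundance = relative_abundance(example_tags)

# List tags from most to least abundant.
for tag, share in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(tag, round(share, 2))
```

In a real analysis each tag would additionally be matched against a sequence database to identify its transcript, a step this sketch leaves out.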
By comparing data from 44 Gene Chips that were hybridized with material derived from rat hippocampus to Datson's SAGE data (also derived from rat hippocampus), we were able to estimate the percentage of hippocampal transcripts that are reliably detected by Gene Chip technology. This work was originally described in detail by Evans et al. (Evans et al., 2002). In this comparative study we showed that high abundance transcripts are reliably detected by Affymetrix Gene Chips but that intermediate abundance transcripts are less reliably detected and many low abundance transcripts are likely to go completely undetected (Fig. 1a). Furthermore, since there are relatively few high abundance transcripts, with low abundance transcripts representing the majority of unique transcriptional species in the hippocampus (Datson et al., 2001) (Fig. 1b), a large percentage of genes are likely to go undetected by current Gene Chip technology. By extrapolating the data described in both previous studies (Datson et al., 2001; Evans et al., 2002) we can estimate that approximately one third of hippocampal transcripts are reliably detected by Gene Chip technology, approximately one third are completely undetected, and the final third fall into a range of unreliable detection.
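The kind of comparison described here, asking what fraction of transcripts in each SAGE abundance class the arrays detect, might be sketched as follows. This is not the procedure of Evans et al.: the records are invented toy values, the function name is hypothetical, and the rule that a transcript counts as "reliably detected" when it is called present on at least 90% of arrays is an assumed cutoff chosen only for illustration.

```python
from collections import defaultdict


def detection_rate_by_class(records, min_present_fraction=0.9):
    """Fraction of transcripts per abundance class that count as reliably detected.

    records: iterable of (abundance_class, fraction_of_arrays_called_present).
    min_present_fraction is an illustrative cutoff, not a value from the study.
    """
    detected = defaultdict(int)
    total = defaultdict(int)
    for abundance_class, present_fraction in records:
        total[abundance_class] += 1
        if present_fraction >= min_present_fraction:
            detected[abundance_class] += 1
    return {c: detected[c] / total[c] for c in total}


# Invented toy records: (SAGE abundance class, share of arrays with a present call).
toy_records = [
    ("high", 1.0), ("high", 0.95),
    ("intermediate", 0.95), ("intermediate", 0.5),
    ("low", 0.1), ("low", 0.0), ("low", 0.92), ("low", 0.3),
]

rates = detection_rate_by_class(toy_records)
print(rates)
```

Because low abundance classes contain the most transcript species, even a moderate drop in their detection rate translates into a large number of undetected genes.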
Limited annotation of detected transcripts was also described by both of the above studies and it was shown that many transcripts of interest to neuroscientists (e.g., neuropeptides, neurotransmitter receptors, signal transduction molecules, etc.) are typically expressed at low levels. The implication of this is that many molecules of interest in relevant experimental paradigms are simply not being reliably detected. To exemplify this, the glutamate system, which is important to hippocampal function, was highlighted in the study by Evans and co-workers. This showed that while some relatively abundant glutamatergic transcripts (e.g., glutamine synthetase, GluRB, and glutamate transporter) were reliably detected in the hippocampal transcriptome, other less abundant, but biologically important glutamatergic transcripts (e.g., GluR1 and NMDAR2C) were not reliably detected (Evans et al., 2002).
The sensitivity limits of microarray technology described above obviously have an impact on how data from a complex biological source should be interpreted. If only a few low abundance transcripts are key regulation points in a given system, then a significant proportion of desired data is likely to be missed. If, however, intermediate to high abundance transcripts are regulated, then these data are more likely to be revealed by Gene Chip technology. A more realistic scenario is that multiple transcripts will be regulated, either primarily or secondarily, in a given experimental paradigm. In this case the identification of a few regulated transcripts by microarray technology may be sufficient to focus the investigator's attention on a given system that can be further studied by more sensitive or anatomically descriptive techniques, such as RT-PCR or ISH, respectively. The on-going advancements in bioinformatics tools certainly aid this approach. Gene Ontology databases (e.g., http://www.geneontology.org) relate transcript species to each other through common functional roles, metabolic databases (e.g., http://www.genome.ad.jp/kegg/metabolism.html) relate transcript species through biochemical pathways, and newer databases are being developed to relate transcripts to each other through networks that are extrapolated from automated literature mining. All of these tools can be utilized to identify underlying pathways or systems that are represented in the microarray data as being relevant to the experimental model.
Beyond the question of sensitivity, which addresses absolute detection, we also explored the question of performance, which addresses the ability to detect differential expression between experimental groups. For these studies we evaluated individual rat hypothalami from two groups of animals that had either been adrenalectomized (ADX) or sham operated. Adrenal steroids are transcription-regulating hormones that are important in homeostatic maintenance and are key players in stress responsiveness. The ADX model is well established in the study of adrenal steroid-responsive genes and a large body of literature exists that describes specific transcriptional changes following ADX, especially in the hypothalamus. Thus, these previous studies provide a framework against which we can judge the success of microarray technology in detecting known changes in hypothalamic transcript levels in our own adrenalectomized animals.
Table 1 lists several transcripts that have been previously reported to change following ADX and the results of the transcriptional changes found in our microarray data (Aguilera et al., 1995; Akabayashi et al., 1994a, b; Albeck et al., 1994; Davis et al., 1986; Day et al., 1999; Gozes et al., 1994; Grino et al., 1990; Hedlund et al., 1994; Kakucska et al., 1995; Lightman and Young, 1989; Makino et al., 2000, 1995, 1997; Pelletier, 1993; Ryan et al., 1997; Wardlaw et al., 1998; Wisialowski et al., 2000). The second column shows the concordance of the microarray data with that previously reported in the literature. Since there is no clear ideal method of data analysis for microarray data, several different analytical methods were evaluated in this study, including MAS 5.0 (Affymetrix, Inc.), d-Chip (Li and Wong, 2001; available at http://biosun1.harvard.edu/complab/dchip), and RMA (Irizarry et al., 2003; available at http://www.bioconductor.org). A “+” was assigned to a strong finding in the microarray data, where differential expression was observed by most analytical methods we employed. A “±” was assigned to weaker findings in the microarray data, where differential expression was observed by some analytical methods but was missed or only showed a trend toward significance by other established methods. A “−” was assigned if differential expression was not observed by any analytical method we employed.
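The scoring rule just described can be written out as a short Python sketch. The function name and the example calls are hypothetical, and reading "most" as a strict majority of the methods is an assumption made here for illustration; the text does not give an exact cutoff.

```python
def concordance_symbol(calls):
    """Map per-method differential-expression calls to '+', '±' or '−'.

    calls: dict of method name -> True if that method found differential expression.
    'Most' is taken to mean a strict majority of methods; this cutoff is an assumption.
    """
    hits = sum(calls.values())
    if hits == 0:
        return "−"  # no method observed differential expression
    if hits > len(calls) / 2:
        return "+"  # observed by most methods
    return "±"  # observed by some methods only


# Invented calls for three hypothetical transcripts.
strong = concordance_symbol({"MAS 5.0": True, "dChip": True, "RMA": True})
weak = concordance_symbol({"MAS 5.0": True, "dChip": False, "RMA": False})
absent = concordance_symbol({"MAS 5.0": False, "dChip": False, "RMA": False})
print(strong, weak, absent)
```

A rule of this kind makes explicit that the "±" category depends on which analytical methods are included, so the table is best read as a summary across methods rather than as the verdict of any single one.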
Immediately obvious in these data is that known-regulated transcripts with higher expression levels (as represented by mean signal intensity) were those observed to be differentially expressed in the microarray data. In fact, with the exception of corticotropin-releasing hormone (CRH), all of the transcripts in this table that were determined to be differentially expressed by microarray were of higher mean signal intensity than all those that showed no differential expression in the array data. This supports the above conclusions regarding sensitivity limitations of microarray technology.
There is, however, no distinct threshold for the limitation of sensitivity regarding differential expression. For example, CRH is strongly expressed in the paraventricular nucleus (PVN) of the hypothalamus but is a low abundance transcript in a gross hypothalamic dissection. Yet because this transcript is robustly regulated by ADX in all of the hypothalamic cells in which it is expressed (Makino et al., 1995), the change in expression was easily detected by microarray. Also, differential expression of angiotensinogen, which is globally regulated throughout the hypothalamus (Ryan et al., 1997), was easily detected by microarray. In contrast, differential expression of AVP, which is a relatively abundant transcript in the hypothalamus, was only observed with a subset of analytical algorithms. This is likely due to the fact that AVP regulation by ADX is restricted to the medial parvocellular subdivision of the PVN, while no change occurs in AVP-containing neurons in the supraoptic (SON) or suprachiasmatic (ScN) nuclei (Davis et al., 1986). Thus, the net change of total hypothalamic AVP following ADX is subtle. Galanin and NPY, however, which are regulated by ADX in opposing directions in different hypothalamic nuclei (Akabayashi et al., 1994a, b; Hedlund et al., 1994; Makino et al., 2000), were not found to be differentially expressed, even though they were reliably detected. This is most likely due to the fact that opposing directional changes in distinct hypothalamic nuclei would tend to cancel each other out in the context of a gross dissection.
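The dilution and cancellation effects described above follow from simple arithmetic, sketched here in Python. All numbers are invented for illustration and do not correspond to measured transcript amounts; `net_fold_change` is a hypothetical helper. The sketch treats the signal from a gross dissection as the sum of the contributions of its subregions.

```python
def net_fold_change(regions):
    """Fold change seen in a pooled dissection.

    regions: list of (baseline_transcript_amount, fold_change_after_treatment),
    one entry per subregion contributing to the dissection.
    """
    before = sum(amount for amount, _ in regions)
    after = sum(amount * fold for amount, fold in regions)
    return after / before


# Invented numbers. Regulation restricted to one subregion holding 10% of the
# transcript: a 2-fold local change becomes roughly a 1.1-fold change overall.
restricted = net_fold_change([(10, 2.0), (90, 1.0)])

# Two subregions with equal baseline amounts changing in opposite directions:
# the pooled signal shows no net change at all.
opposing = net_fold_change([(50, 1.5), (50, 0.5)])

print(round(restricted, 2), round(opposing, 2))
```

The first case mirrors the situation described for AVP, where a restricted change is diluted by unchanged tissue, and the second mirrors the situation described for galanin and NPY, where opposing changes offset each other.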
The performance studies we have described using the ADX model highlight the difficulty of detecting cell-type specific transcriptional regulation in a complex tissue. Obviously, enriched dissections that decrease tissue heterogeneity would improve such microarray analyses. In fact, microarray analysis of PVN punches increases the signal of the CRH transcript by approximately 10-fold over that from a gross hypothalamic dissection (R. C. Thompson, personal communication), as expected. Thus the further an investigator can refine a dissection of the source material, the more likely cell-type specific transcriptional regulation will be revealed. However, this is not always feasible if specific nuclei relevant to an experimental paradigm have not been identified. Conversely, care must be taken not to over-enrich a sample. With the advent of laser-capture microscopy, which allows the isolation of individual cells from tissue sections, it is becoming feasible to extract RNA and generate microarray data from a handful of cells. However, in these cases sampling efficiency may become a significant issue. The investigator must ensure that the cells selected are indeed representative of the larger population being targeted for study.
To evaluate reproducibility we have analyzed several samples at three independent laboratories. In these studies total RNAs, extracted from three structurally discrete regions (cerebellum (CB), anterior cingulate cortex (AnCg), and dorsolateral prefrontal cortex (DLPFC)) of post-mortem human brains, were aliquoted and distributed to each of the laboratories for independent RNA processing and microarray hybridizations. Previous reports describe a detailed analysis of these data for brain region differences (Evans et al., 2003) and gender differences (Vawter et al., 2003); in this report, however, we focus on technical variance.
Figure 2 shows a hierarchical cluster analysis (Pearson correlation) of all samples used in this study. The figure shows that the primary factor in this analysis is the biological difference between divergent brain regions (CB vs. neocortical samples), which fall into two distinct clusters. The second factor apparent in this analysis was processing site, where arrays tended to cluster with other arrays run in the same laboratory. Third, this analysis showed an individual subject effect, where similar samples (AnCg and DLPFC) from the same subject tended to cluster as nearest neighbors in most cases.
These reproducibility studies show that technical variance across laboratories is relatively high, so care should be taken in pooling or extrapolating data generated at different sites. However, since this study we have calibrated our laser scanners to be consistent with each other and have vastly improved the correlation between the same samples processed at different sites. In fact, in recent experiments technical replicates (same biological sample processed at 2 different sites) cluster as nearest neighbors more than 50% of the time (data not shown).
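One way to quantify how often technical replicates pair up is sketched below in Python. This is a simplification and not the analysis used in this study: instead of building a full hierarchical cluster tree, it treats the array with the highest Pearson correlation as the "nearest neighbor". The array names, expression profiles, and function names are all invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nearest_neighbor(name, profiles):
    """Name of the other array most correlated with the given array."""
    others = (o for o in profiles if o != name)
    return max(others, key=lambda o: pearson(profiles[name], profiles[o]))


def replicate_pairing_rate(profiles, sample_of):
    """Share of arrays whose nearest neighbor comes from the same biological sample."""
    hits = sum(
        sample_of[nearest_neighbor(a, profiles)] == sample_of[a] for a in profiles
    )
    return hits / len(profiles)


# Invented toy profiles: two biological samples, each processed at two sites.
profiles = {
    "S1_siteA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S1_siteB": [1.1, 2.0, 3.2, 3.9, 5.1],
    "S2_siteA": [5.0, 3.0, 4.0, 1.0, 2.0],
    "S2_siteB": [4.8, 3.1, 4.2, 1.2, 1.9],
}
sample_of = {name: name.split("_")[0] for name in profiles}

rate = replicate_pairing_rate(profiles, sample_of)
print(rate)
```

In these toy data every array pairs with its replicate; a strong site effect would instead pull arrays toward others processed in the same laboratory and lower the rate.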
Also apparent in these data is that similar samples processed in the same laboratory (AnCg and DLPFC from the same individual) have a tight correlation, since samples from the same subjects frequently clustered together. From this we can conclude that in a complex biological system the systematic error introduced by microarray technique within a laboratory is lower than the random error (biological variance). Based on our recent studies we can also conclude that with appropriate measures the systematic error introduced across laboratories can be reduced so that the biological variance becomes the dominant error factor.
In summary, we have described limitations of microarray technology that we hope will aid researchers in their design of microarray experiments. All examples discussed in this report were based on experiments with Affymetrix Gene Chips; however, we believe that the general conclusions are appropriate for other DNA microarray platforms, although precise thresholds will vary to some degree. In spite of the limitations outlined above, microarray technology is an incredibly powerful tool to investigate complex gene expression relationships. If the data are approached carefully and the false negative rate is understood so as not to over-interpret the lack of observed transcriptional regulation, then microarray experiments can be very fruitful. A few findings in a given molecular circuit or pathway are all that are necessary to direct the researcher's attention appropriately, and these results can seed future projects to examine systems by more classical methodology.
From the Symposium Contemporary Approaches to Endocrine Signaling presented at the Annual Meeting of the Society for Integrative and Comparative Biology, 4–8 January 2003, at Toronto, Canada.
This work was supported by NIH Program Project Grant #5P01MH42251 and the Pritzker Consortium for Severe Psychiatric Disorders, Pritzker Family Philanthropic Foundation.