Abstract

Motivation: Gene expression profiling is an important tool for gaining insight into biology. Novel strategies are required to analyze the growing archives of microarray data and extract useful information from them. One area of interest is in the construction of gene association networks from collections of profiling data. Various approaches have been proposed to construct gene networks using profiling data, and these networks have been used in functional inference as well as in data visualization. Here, we investigated a non-parametric approach to translate profiling data into a gene network. We explored the characteristics and utility of the resulting network and investigated the use of network information in analysis of variance models and hypothesis testing.

Results: Our work is composed of two parts: gene network construction and partitioning and hypothesis testing using sub-networks as groups. In the first part, multiple independently collected microarray datasets from the Gene Expression Omnibus data repository were analyzed to identify probe pairs that are positively co-regulated across the samples. A co-expression network was constructed based on a reciprocal ranking criteria and a false discovery rate analysis. We named this network Reference Gene Association (RGA) network. Then, the network was partitioned into densely connected sub-networks of probes using a multilevel graph partitioning algorithm. In the second part, we proposed a new, MANOVA-based approach that can take individual probe expression values as input and perform hypothesis testing at the sub-network level. We applied this MANOVA methodology to two published studies and our analysis indicated that the methodology is both effective and sensitive for identifying transcriptional sub-networks or pathways that are perturbed across treatments.

Contact:Nathan.Siemers@bms.com or Ruiru.Ji@bms.com

1 INTRODUCTION

The technique of transcriptional profiling, as well as related proteomic and metabonomic methodologies, is producing biological state information in a breadth and depth that remains challenging to interpret. A diverse arsenal of analytical tools are typically applied to analyze such data (Quackenbush, 2001). A variety of global, unsupervised learning methods such as principle component analysis (PCA) and various clustering algorithms are often used to initially characterize a dataset (Eisen et al., 1998; Tamayo et al., 1999). The discovery of significantly regulated transcripts through hypothesis testing, including the use of t-tests or more complex ANOVA models, is often an important goal. Linear or non-linear models can also be built to predict biological response (Golub et al., 1999; Quackenbush, 2006). Classification schemes such as the Gene Ontology (Ashburner and Lewis, 2002) are often utilized to macroscopically classify and conceptualize the microscopic behavior of probes, and in some cases provide superior sensitivity and interpretability

Recently, there has been a growing interest in producing gene networks from profiling data (Bergman et al., 2004; van Noort et al., 2004; Lee et al., 2004). In these representations, genes constitute the nodes of the network and the edges between nodes indicate gene associations (e.g. co-expression). As the application of graph methods and network inference algorithms to these data becomes more advanced, there may be some advantages to these approaches (Mao and Resat, 2004; Mehra et al., 2004; Rice et al., 2005; Pena et al., 2005).

In this study, we explored the use of a non-parametric, rank-based methodology to transform profiling data into a gene network. Our methodology exploits principles of reciprocal best match pairs, often used in the area of gene sequence analysis, now applied to transcriptional profiling data. In addition, we applied a False Discovery Rate (FDR) analysis step to characterize and control the signal to noise ratio in the network. Graph partitioning and clustering algorithms were then utilized to extract functionally and topologically coherent sub-networks. The effectiveness of these algorithms was investigated based on domain knowledge from curated biological pathway databases. We also investigated the utility of a gene network (built via analysis of a compendium of human data available in the NCBI Gene Expression Omnibus) in enhancing the analysis and interpretation of individual profiling experiments. Specifically, we proposed to use the sub-networks extracted from the association network, presumably representing biological functional units, as groups in set-based enrichment analyses.

The goal of gene set analysis is to determine whether genes of a defined set are differentially affected by a treatment. The major advantage of gene set analysis over single gene analysis is that the formal provides a unifying theme for data interpretation, often shedding light on the underlying biological mechanisms. To this date several gene set analysis approaches have been proposed. Some of them, such as GO Term Finder (Boyle et al., 2004) and Gene Set Enrichment Analysis (Subramanian et al., 2005), require a prior generation of a rank-ordered gene list. Another test, parametric analysis of gene set enrichment (PAGE), does not require a sorted gene list but need genes' fold changes or P-values from hypothesis testing as inputs (Kim and Volsky, 2005). Recently, Hotelling's T-square test has been proposed as a multivariate approach for network/pathway significance analysis (Lu et al., 2005). This method takes multiple dependent variables (i.e. genes from a particular set) and tests if there is a significant difference between two groups (e.g. treated versus control). Several studies (Lu et al., 2005; Vinciotti et al., 2006; Xiong, 2006) have successfully utilized this approach to detect differentially regulated genes or gene groups. There are three potential advantages to this type of approaches. First, no ordered gene list needs to be generated. Second, it is more robust to type I error (false positives). Third, the set significance test may be more sensitive as it is computed directly from expression data instead of using some kind of enrichment score. Nevertheless, the application of Hotelling's T-square test is limited as it can only deal with two treatment groups.

Here, we proposed and investigated the utility of a new, MANOVA-based statistical model that allows hypothesis testing across sub-networks extracted from the reference network. This methodology overcomes the limitation of Hotelling's T-square test and can be applied to a wider range of experimental designs.

2 METHODS

The overview of the proposed methodology is illustrated in Figure 1. In the first step, a reference gene association (RGA) network was constructed from publicly available microarray profiling data. The network was partitioned in step two to extract co-expression sub-networks. These sub-networks were subsequently fed into a MANOVA-based procedure for hypothesis testing in individual profiling experiments. Throughout this article, gene and probe are used interchangeably.

Fig. 1.

Schematic view of the proposed methodologies. In the first step, RGA network construction, transcriptional profiling projects downloaded from the Gene Expression Omnibus (GEO) were pooled and Pearson correlation coefficients were calculated for every probe pairs across all samples. Probe pairs were then sorted based on a reliability score calculated from reciprocal co-expression ranks. FDR analysis based on known pathway annotations was employed to find a suitable cutoff for inclusion of probe pairs in the resulting network. In the second step, RGA Network was partitioned to identify functional modules within the network. The sub-networks were annotated with their significantly enriched GO terms. In the last step, a new, MANOVA-based model was proposed for hypothesis testing at the sub-network level.

Fig. 1.

Schematic view of the proposed methodologies. In the first step, RGA network construction, transcriptional profiling projects downloaded from the Gene Expression Omnibus (GEO) were pooled and Pearson correlation coefficients were calculated for every probe pairs across all samples. Probe pairs were then sorted based on a reliability score calculated from reciprocal co-expression ranks. FDR analysis based on known pathway annotations was employed to find a suitable cutoff for inclusion of probe pairs in the resulting network. In the second step, RGA Network was partitioned to identify functional modules within the network. The sub-networks were annotated with their significantly enriched GO terms. In the last step, a new, MANOVA-based model was proposed for hypothesis testing at the sub-network level.

2.1 Data collection and pre-processing

We collected experiments within NCBI Gene Expression Omnibus (GEO) that were studied with the Affymetrix HG-U133A chip (Affymetrix Inc., Santa Clara, CA, USA). The starting data consisted of 525 human tissue and cell line samples from 26 independently conducted projects. We excluded projects that include primary tumor samples as we wanted to minimize the possible influence of gross cytogenetic abnormalities, amplifications and deletions on network construction. The resulting final dataset consisted of 393 human tissue and cell line samples from 21 different projects. The Affymetrix CEL files were analyzed with the MAS 5.0 algorithm from Affymetrix and standardized using quantile normalization (Bolstad et al., 2003), making use of the publicly available Bioconductor tools (www.bioconductor.org).

2.2 RGA network construction

In order to identify probes that are co-expressed across multiple projects, we first calculated Pearson correlation coefficients between every probe pair across all samples from the selected microarray projects. Formally, the Pearson correlation coefficient between probes pA and pB is defined as:  
formula
(1)
where Ai and Bi are the expression levels of pA and pB in the i th sample and A and B are the average expression levels of pA and pB, respectively. In this work, we based our analysis only on positive correlations as it guarantees to include the strongest correlations among gene pairs. In prior work, Goldenberg et al. (Goldenberg and Moore, 2004), demonstrated that positive correlations are much more significant than negative correlations, in sparsely connected networks. The implicit importance of positive correlation over the negative ones is also suggested in previous research in the context of microarray studies (Allocco et al., 2004). In our context, the goal is to derive a network that is relatively sparse, so in this sense by considering only the positive correlations we ensure that the strongest or most significant correlations among gene pairs are captured by our model.
A rank ordered list of co-expressed neighbors was enumerated for each probe with rank 1 being the most correlated. Note that, although the Pearson correlation scores are symmetric, rank order of probe A with respect to probe B (say, ROAB), does not necessarily equal to rank of probe B with respect to probe A (say, ROBA). Thus, two reciprocal rank values are associated with each probe pair (i.e. ROAB and ROBA). Accordingly, for any probe pair, pA and pB, we define its reliability score as:  
formula
(2)

Thus, probe pairs with smaller reciprocal rank orders have higher reliability scores. We systematically computed reliability scores for every probe pair based on the reciprocal ranks. Pairs were sorted with respect to their scores starting from the most reliable one.

Essentially the network construction step can be viewed as gradual additions of increasingly unreliable pairs. One can imagine that, with the increasing number of probe pairs in a network, the noise in the network is also increasing. To find a proper cutoff point, we adopted an FDR analysis step to quantify the signal to noise ratio in a given network. Two probes linked in the network can either be functionally relevant true positive (TP) or irrelevant false positive (FP). To quantify the relevance of probe pairs, we made use of curated pathway annotations from the KEGG, BioCarta and GenMAPP databases. Accordingly, a link between two probes that share pathway annotation in any of the databases is labeled as TP. Conversely, a link between two probes that are annotated with two non-overlapping sets of pathways is labeled as FP. Probe pairs where at least one probe is not annotated are defined as non-discriminatory (ND) as current knowledge is not sufficient to conclude whether they are functionally relevant. Such pairs are excluded from the FDR calculation.

Effect of the FDR cutoff on the resulting network and criteria taken into consideration while setting this parameter are discussed in Section 3.1. In order to include pairs labeled as ND, a reliability score threshold corresponding to the chosen FDR cutoff was obtained and all probe pairs that have higher reliability scores than this threshold were included in the final reference network.

To check the significance of the network, we generated random datasets using the random number generator tool from the Partek software package (www.partek.com). Random datasets of normal, gamma and uniform distributions, and of overall dimensions comparable to the experimental data were generated. These datasets were processed the same way as the real dataset and the sorted probe pair lists were analyzed and compared to that of RGA.

2.3 RGA network partitioning

Many published graph partitioning algorithms can be used to partition the co-expression network into densely- connected modules. Clustering algorithms can also be employed utilizing the reliability scores of probe pairs as the similarity values. To find an optimal algorithm, two graph partitioning algorithms, Graclus (Dhillon et al., 2005) and Metis,1 as well as a clustering algorithm, Agglomerative Hierarchical Clustering were employed. We utilized a fast implementation of agglomerative clustering from CLUTO, clustering toolkit.2

We utilized the GO Term Finder tool (Boyle et al., 2004) to evaluate whether the extracted sub-networks were enriched in genes of similar functions and/or involved in similar biological processes. This analysis served two purposes. First, each sub-network was annotated with the best GO terms that are significantly enriched. Second, the optimal algorithm was selected based on the number of annotated sub-networks and the significance of the annotations (i.e. P-values).

2.4 MANOVA-based sub-network significance testing

Sub-networks extracted from the RGA network usually consist of co-regulated probes involved in a common biological function or process. We adopted a MANOVA-based approach to assess transcriptional response at the sub-network level.

MANOVA is a statistical method that can be used to measure treatment effects across multiple dependent variables simultaneously. In our case, we treated each probe of a sub-network as a response variable. By combining all probes from the same sub-network, we can access the response of that sub-network as a whole to some experiment factor.

For experiments where there is only one independent variable (e.g. treatment), our MANOVA model is formulated as follows:  
formula
(3)
where pi is the i th probe in the sub-network and t is the treatment factor. Note that each probe is treated as a dependent variable. Using this model, we assessed the following hypotheses for each of the sub-networks.

Hnull = Sub-network is not affected by different treatments.

Ha = Sub-network is significantly affected by different treatments.

Note that the model may be expanded to accommodate more complex experimental designs and interactions between factors.

To validate the effectiveness of the approach, we applied it to two published studies on HIV protease inhibitors and cigarette smoking. Significantly affected sub-networks were analyzed and compared to known biological effects of the drugs.

3 RESULTS AND DISCUSSION

3.1 RGA network construction and significance verification

Microarray datasets generated from the Affymetrix HG-U133A platform were analyzed and a reference gene association network was constructed as described in Section 2. Primary tumor samples were excluded from the original data to minimize possible influence from cytogenetic abnormalities. For example, we often observed probe–probe correlations between probes mapped to same chromosomal regions commonly amplified or deleted in cancer if primary tumor samples were included in the sample set (data not shown).

We employed a rank-based approach that avoids any assumption of data distribution since correlation coefficients calculated from profiling data are not of the well known normal or t-distribution (Allocco et al., 2004). We used mutual rank orders to quantify the strength of co-expression between probe pairs. Every probe pair was assigned a reliability score calculated from the two reciprocal correlation ranks between the pair. These scores were used to sort the probes pairs starting from the most reliable ones. Networks were constructed by incrementally adding probe pair to the networks starting from the top of the sorted list.

To estimate the signal-to-noise ratio in a network, FDR analysis was adopted. As described in Section 2, each probe pair was given a label of TP, FP, or ND (non-discriminatory) based on the pathway annotations from the KEGG, BioCarta and GenMAPP databases. As depicted in Figure 2, FDR of the network increases with increasing number of probe pairs, indicating that reliability scores correlate well with the biological relevance of probe pairs.

Fig. 2.

Probe pairs calculated from the real expression data have significantly lower FDRs than those of random datasets. As described in Section 2, a sorted probe pair list was generated from each dataset and every probe pair was then given a label of TP, FP or ND based on the pathway annotations. False positive rates were calculated for the total probe pair numbers indicated on the graph. The x-axis indicates the total number of probe pairs considered starting from the top of the sorted probe pair list and the y-axis indicates the corresponding FDR for that set of probe pairs.

Fig. 2.

Probe pairs calculated from the real expression data have significantly lower FDRs than those of random datasets. As described in Section 2, a sorted probe pair list was generated from each dataset and every probe pair was then given a label of TP, FP or ND based on the pathway annotations. False positive rates were calculated for the total probe pair numbers indicated on the graph. The x-axis indicates the total number of probe pairs considered starting from the top of the sorted probe pair list and the y-axis indicates the corresponding FDR for that set of probe pairs.

We chose a relatively moderate 20% FDR cutoff for a couple of reasons. First, it yielded a reasonable coverage of probes in the network. Second, we must take into account the fact that the current pathway annotation is not complete due to our limited understanding of biology, and the probe pairs falling into the false discovery class are in fact a mixture of false positives and true positives missing in the databases. We expect that the FDR threshold could be made stricter as pathway annotation becomes more complete. Using the 20% FDR cutoff, a network composed of 10 314 probes and 9675 edges was generated.

To further assess the significance of probe–probe association in the RGA network, random datasets of normal, uniform and gamma distributions, were simulated as described in Section 2. For each of the datasets, Pearson correlations were computed for probe pairs and FDR analysis was performed. As shown in Figure 2, networks constructed from random datasets have much higher FDR than RGA at any given numbers of probe pairs.

Another metric that quantifies functional relevance of probe pairs as their GO term overlap was also utilized to compare the networks (Lee et al., 2004). In this analysis, the number of shared GO annotation including parent terms was calculated for each probe pair. For better comparison, we used the same number of probe pairs from each of the random networks as in the real network (i.e. 9675 pairs). As depicted in Figure 3, the real network showed greater functional relevance than any of the random networks. In summary, these results demonstrated that our network construction procedure is effective in selecting biologically related probe pairs.

Fig. 3.

Probe pairs calculated from the real expression data show greater functional relevance. The functional relevance of the top 9675 probe pairs, which correspond to 20% FDR in the real network, from each of sorted probe pair lists, were evaluated by examining the overlap of probe pairs' Gene Ontology (GO) annotations. The x-axis indicates the number of overlapping GO terms and the y-axis is the cumulative probability. The x-axis is truncated at 220.

Fig. 3.

Probe pairs calculated from the real expression data show greater functional relevance. The functional relevance of the top 9675 probe pairs, which correspond to 20% FDR in the real network, from each of sorted probe pair lists, were evaluated by examining the overlap of probe pairs' Gene Ontology (GO) annotations. The x-axis indicates the number of overlapping GO terms and the y-axis is the cumulative probability. The x-axis is truncated at 220.

3.2 Partitioning of RGA network

To further exploit the co-regulatory networks, it is desirable to partition the network into smaller pieces, with the overall intent of identifying functional biological modules within the larger network. Many published clustering algorithms can identify densely connected sub-networks in a network (Bader and Hoque, 2003; Brun et al., 2004; King et al., 2004). Given that each probe pair can be assigned a weight based on the reliability score, we selected three algorithms that can make use of the weights. The algorithms include two graph partitioning algorithms (Graclus and Metis) and one clustering algorithm (hierarchical clustering with average linkage).

We considered two criteria when picking the number (k) of partitions. First, we would like to obtain sub-networks that are comparable in size to actual biological pathways. Second, the sub-network size should be feasible for use in the next section, MANOVA-based data analysis. After a few trials, we picked a partition number of 500, which would result in an average number of 20 nodes per partition if nodes are evenly distributed among partitions. After partitioning, we identified connected sub-networks to be evaluated further.

If the partitioning algorithms could identify functional modules in the network, we would expect that probes of similar functions (thus likely to be co-expressed) would be partitioned into the same sub-network. The GO Term Finder tool was utilized to determine whether sub-networks are enriched in probes with similar GO term annotations and to compare the suitability of the partitioning and clustering algorithms.

A plot of the best GO P-values for each of the sub-networks is shown in Figure 4. As can be seen from this figure, the Graclus algorithm outperformed the other two algorithms, in terms of both quality and quantity of significantly annotated sub-networks. The hierarchical clustering method yielded the worst performance, likely due to the unbalanced sub-network sizes.

Fig. 4.

Histogram of the best P-values of GO match for sub-networks. Sub-networks were annotated using the GO Term Finder tool and were assigned the P-value of the best GO term match. The distributions of the P-values were analyzed to select a partitioning algorithm that best captures the functional modules in the RGA network.

Fig. 4.

Histogram of the best P-values of GO match for sub-networks. Sub-networks were annotated using the GO Term Finder tool and were assigned the P-value of the best GO term match. The distributions of the P-values were analyzed to select a partitioning algorithm that best captures the functional modules in the RGA network.

The box plots of the connected sub-network sizes when k was set to 500 are shown in Figure 5. It can be observed that the graph partitioning algorithms yielded sub-networks that are more balanced in size. By contrast, the hierarchical clustering algorithm tends to produce a few big sub-networks and many singletons.

Fig. 5.

Box-plots of sub-network sizes generated by selected network partitioning algorithms with k = 500. The algorithms are shown on the x-axis, and the log(sub-network size) values are indicated on the y-axis.

Fig. 5.

Box-plots of sub-network sizes generated by selected network partitioning algorithms with k = 500. The algorithms are shown on the x-axis, and the log(sub-network size) values are indicated on the y-axis.

Accordingly, the final partitioning was obtained using the Graclus algorithm and a k value of 500. Among these 500 partitions, we identified 2112 connected sub-networks. The average size of sub-networks is 5 probes per sub-network and the largest two sub-networks have 38 probes each.

Inspection of these sub-networks revealed many well-known areas of biological regulation. These areas include steroid biosynthesis, neuronal development, antigen processing and presentation, cell polarity maintenance, cytoskeletal components, the cell cycle, RNA splicing machinery and many others. However, it is also worthwhile to recognize the limitations of the underlying experimental data and their effects on network structure when interpreting the network and its partitions. One critical feature of the RNA profiling data generated from the U-133A array is that although (however many thousand) probes are independently measured within a single sample, significantly fewer independent signals can realistically be resolved from these data. Probe design has led to the intentional (and sometimes unintentional) construction of many redundant probes for RNA transcripts. We estimate that 2358 transcripts represented on the U-133A array have more than one probe set measuring them (data not shown). Although not technically an artifact, networks and network partitions containing these redundant probes have fewer independent genetic components than would be implied by the network connectivity.

Lack of specificity of probe hybridization characteristics (cross-hybridization) is another limitation of Affymetrix technology. However, the impact of cross-hybridization on network structure seems to be small. The U-133A array design assists in the identification and interpretation of cross-hybridization artifacts by attaching ‘_s’ and ‘_x’ suffixes in the nomenclature of probes to indicate higher risk of cross-hybridization across isoforms and gene families for these probes. We found that the RGA network is only modestly enriched in probes capable of cross-hybridization—53% versus 47% on the U-133A array. This result suggests that, for the majority of the probes, the cross-hybridization impact at the probe set level is limited.

The impact of the above probe design considerations may affect the network construction and our assessment of network robustness in various ways. In the cases where probes within a given network partition are non-redundant, we can somewhat safely ascribe the network to the biology we are trying to discover. The highly co-regulated elements of neuronal development and signaling fall into this category. Probes in this sub-network represent genes include ASTN1, CA11, CDK5R1, CSPG3, CSPG5, CTNNA2, DIRAS2, DNM1, GABRB1, GPM6A, GRIA2, HMP19, HPCAL4, NTRK2, OMG, PPP2R2B, RAB3A, RPIP8, RTN1, RUFY3, SCG3, SLC6A1, SNAP25, SNPH, SV2B and TUBB4 (Fig. 6a). These genes share only minor sequence homology. Two unannotated probes are also present in the sub-network: 213484_at and 213841_at. Interestingly, both probes represent transcripts that are highly expressed in brain tissues (Unigene, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene).

Fig. 6.

Example sub-networks. (a) The neuronal development and signaling cluster: 213841_at, 213484_at, 213197_at (ASTN1), 209726_at (CA11), 204995_at (CDK5R1), 205143_at (CSPG3), 39966_at (CSPG5), 205373_at (CTNNA2), 219619_at (DIRAS2), 215116_s_at (DNM1), 207010_at (GABRB1), 209469_at (GPM6A), 209470_s_at (GPM6A), 205358_at (GRIA2), 218623_at (HMP19), 219671_at (HPCAL4), 214680_at (NTRK2), 207093_s_at (OMG), 213849_s_at (PPP2R2B), 204974_at (RAB3A), 206196_s_at (RPIP8), 203485_at (RTN1), 203724_s_at (RUFY3), 219196_at (SCG3), 205152_at (SLC6A1), 202508_s_at (SNAP25), 205104_at (SNPH), 205551_at (SV2B), 212664_at (TUBB4). (b) The ribosomal protein cluster: 211942_x_at (RPL13A), 200022_at (RPL18), 200869_at (RPL18A), 200013_at (RPL24), 200003_s_at (RPL28), 200823_x_at (RPL29), 213969_x_at (RPL29), 200963_x_at (RPL31), 200002_at (RPL35), 219762_s_at (RPL36), 201406_at (RPL36A), 200936_at (RPL8), 214167_s_at (RPLP0), 200018_at (RPS13), 208645_s_at (RPS14), 211487_x_at (RPS17), 212433_x_at (RPS2), 200024_at (RPS5), 200082_s_at (RPS7), 217747_s_at (RPS9), 200651_at (GNB2L1), 217831_s_at (NSFL1C), 201600_at (PHB2), 201052_s_at (PSMF1).

Fig. 6.

Example sub-networks. (a) The neuronal development and signaling cluster: 213841_at, 213484_at, 213197_at (ASTN1), 209726_at (CA11), 204995_at (CDK5R1), 205143_at (CSPG3), 39966_at (CSPG5), 205373_at (CTNNA2), 219619_at (DIRAS2), 215116_s_at (DNM1), 207010_at (GABRB1), 209469_at (GPM6A), 209470_s_at (GPM6A), 205358_at (GRIA2), 218623_at (HMP19), 219671_at (HPCAL4), 214680_at (NTRK2), 207093_s_at (OMG), 213849_s_at (PPP2R2B), 204974_at (RAB3A), 206196_s_at (RPIP8), 203485_at (RTN1), 203724_s_at (RUFY3), 219196_at (SCG3), 205152_at (SLC6A1), 202508_s_at (SNAP25), 205104_at (SNPH), 205551_at (SV2B), 212664_at (TUBB4). (b) The ribosomal protein cluster: 211942_x_at (RPL13A), 200022_at (RPL18), 200869_at (RPL18A), 200013_at (RPL24), 200003_s_at (RPL28), 200823_x_at (RPL29), 213969_x_at (RPL29), 200963_x_at (RPL31), 200002_at (RPL35), 219762_s_at (RPL36), 201406_at (RPL36A), 200936_at (RPL8), 214167_s_at (RPLP0), 200018_at (RPS13), 208645_s_at (RPS14), 211487_x_at (RPS17), 212433_x_at (RPS2), 200024_at (RPS5), 200082_s_at (RPS7), 217747_s_at (RPS9), 200651_at (GNB2L1), 217831_s_at (NSFL1C), 201600_at (PHB2), 201052_s_at (PSMF1).

In other cases, both RNA sequence and functional biological conservation are contained within probes in a sub-network. Such a sub-network is fundamentally confounded, and even when we can give the sub-network a high Gene Ontology overlap significance, we lack the resolution to identify whether or not the score is due to our capture of biology or the fact that we are measuring many fewer independent signals within the sub-network than the number of probes would imply. The high degree of both sequence and functional similarity in the ribosome proteins (RPL13A, RPL18, RPL18A, RPL24, RPL28, RPL29, RPL31, RPL35, RPL36, RPL36A, RPL8, RPLP0, RPS13, RPS14, RPS17, RPS2, RPS5, RPS7, RPS9) is one example of this behavior, and is well represented within a sub-graph of the network (Fig. 6b).

Finally, although details of biological specimen preparation not intended in study design (in vitro cell confluency, culture temperature variation, etc.) are still valid biological effects, other technical artifacts of sample processing, hybridization and data acquisition can interfere with the robustness of the network and network partitions. Some of these technical effects can be readily identified in the networks. For instance, a collection of control probes including BioB, BioC, BioD and Cre control oligonucleotides are spiked into the cRNA mixtures appear in the resultant network, connected only to each other. Similarly, 18S and 28S ribosomal RNA contaminants carried through the probe preparation process also appear as an isolated sub-network of the total network.

3.3 MANOVA-based sub-network significance testing

Gene networks have been used in the area of gene function inference and gene–gene interaction/regulation studies (Jordan et al., 2004; Stuart et al., 2003). Allocco et al. have demonstrated that two genes are more likely to share a common transcriptional regulator if they are strongly co-expressed (Allocco et al., 2004). Since our sub-networks are composed of co-expressed genes, these sub-networks may be treated as a special transcriptional ontology where each sub-network is specifically under the control of one or a combination of transcription factors. For any individual microarray study, if the significantly affected sub-networks can be identified analytically, it will greatly improve our understanding of the underlying biological mechanism.

Naturally, sub-networks can be treated as gene sets that can be fed into gene set analysis tools. To this date several gene set significance testing approaches have been developed including GO Term Finder, GSEA, PAGE and Hotelling's T-square test (Boyle et al., 2004; Kim and Volsky, 2005; Lu et al., 2005; Subramanian et al., 2005). Here, we explored the use of a new, MANOVA (multivariate analysis of variance) based approach for network/pathway analysis.

MANOVA is an extension of ANOVA (analysis of variance) to cover cases where there is more than one dependent variables. It is often used when a set of correlated dependent variables are present whereas a single, overall statistical test is desired. Hotelling's T-square test is a special case of MANOVA to deal with data of two groups. MANOVA enjoys the same advantages as Hotelling's T-square test (see Introduction section), but it can be applied to a wider range of experimental designs since MANOVA can deal with data of any group number. Naturally, we can treat each probe as a dependent variable, and then derive a single statistic based on MANOVA for the sub-network as a unit. This approach is well-suited for our purposes because the sub-networks are consisted of co-expressed probes and MANOVA are especially designed to deal with correlated dependent variables.

We tested our MANOVA model using two published datasets. The first one is the study of HIV protease inhibitors including antazanavir, nelfinavir and ritonavir (Parker et al., 2005). It is well known that patients taking protease inhibitor drugs to treat HIV-AIDS often develop a lipodystrophy-like syndrome such as hyperlipidermia, peripheral lipoatrophy and central fat accumulation (Calza et al., 2004). Parker et al. (Parker et al., 2005), used Affymetrix chip technology to survey gene expression changes after drug treatment. Their analysis indicated that protease inhibitors could induce gene expression changes indicative of dysregulation of lipid metabolism, endoplasmic reticulum (ER) stress and metabolic disturbance. This result is consistent with the clinical observations and provides evidence for a molecular mechanism for the pathophysiology of protease inhibitor-induced lipodystrophy.

The top 15 scoring sub-networks with respect to treatment effect are listed in Table 1. The model P-values give a relative measurement of the significance of the treatment. One can immediately note that this list includes all the major targets of the HIV protease inhibitors including lipid metabolism, amino acid metabolism, gluconeogenesis, cellular respiration and endoplasmic reticulum.

Table 1.

Top 15 scoring sub-networks for the HIV data

Cluster IDMANOVA P-valueSignificant GO annotation
1186 1.67E−09 Regulation of glutamate-cysteine ligase activity 
495 4.08E−09 Aminosugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
1197 5.26E−08 Cellular respiration, TCA cycle, mitochondrion 
812 5.72E−08 No significant annotation 
164 7.92E−08 Aromatic amino acid transport, antiporter activity 
1520 8.10E−08 6-phosphofructokinase complex 
547 8.95E−08 Glycolysis, energy derivation by oxidation of organic compounds 
1302 2.82E−07 Glucose transport, integral to membrane 
1272 3.51E−07 Amino acid biosynthetic/metabolic process 
1862 5.67E−07 Amine metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobitic metabolic process 
1109 1.14E−06 Kinase activity 
1707 2.61E−06 Protein targeting to ER, endoplasmic reticulum 
1832 2.98E−06 Cell growth, phosphatase activity 
291 3.64E−06 Response to chemical stimulus, response to drug 
Cluster IDMANOVA P-valueSignificant GO annotation
1186 1.67E−09 Regulation of glutamate-cysteine ligase activity 
495 4.08E−09 Aminosugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
1197 5.26E−08 Cellular respiration, TCA cycle, mitochondrion 
812 5.72E−08 No significant annotation 
164 7.92E−08 Aromatic amino acid transport, antiporter activity 
1520 8.10E−08 6-phosphofructokinase complex 
547 8.95E−08 Glycolysis, energy derivation by oxidation of organic compounds 
1302 2.82E−07 Glucose transport, integral to membrane 
1272 3.51E−07 Amino acid biosynthetic/metabolic process 
1862 5.67E−07 Amine metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobitic metabolic process 
1109 1.14E−06 Kinase activity 
1707 2.61E−06 Protein targeting to ER, endoplasmic reticulum 
1832 2.98E−06 Cell growth, phosphatase activity 
291 3.64E−06 Response to chemical stimulus, response to drug 

Sub-networks are annotated with the top GO term hits in ‘Molecular Function’, ‘Biological Process’ and ‘Cellular Component’ ontologies.

Table 1.

Top 15 scoring sub-networks for the HIV data

Cluster IDMANOVA P-valueSignificant GO annotation
1186 1.67E−09 Regulation of glutamate-cysteine ligase activity 
495 4.08E−09 Aminosugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
1197 5.26E−08 Cellular respiration, TCA cycle, mitochondrion 
812 5.72E−08 No significant annotation 
164 7.92E−08 Aromatic amino acid transport, antiporter activity 
1520 8.10E−08 6-phosphofructokinase complex 
547 8.95E−08 Glycolysis, energy derivation by oxidation of organic compounds 
1302 2.82E−07 Glucose transport, integral to membrane 
1272 3.51E−07 Amino acid biosynthetic/metabolic process 
1862 5.67E−07 Amine metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobitic metabolic process 
1109 1.14E−06 Kinase activity 
1707 2.61E−06 Protein targeting to ER, endoplasmic reticulum 
1832 2.98E−06 Cell growth, phosphatase activity 
291 3.64E−06 Response to chemical stimulus, response to drug 
Cluster IDMANOVA P-valueSignificant GO annotation
1186 1.67E−09 Regulation of glutamate-cysteine ligase activity 
495 4.08E−09 Aminosugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
1197 5.26E−08 Cellular respiration, TCA cycle, mitochondrion 
812 5.72E−08 No significant annotation 
164 7.92E−08 Aromatic amino acid transport, antiporter activity 
1520 8.10E−08 6-phosphofructokinase complex 
547 8.95E−08 Glycolysis, energy derivation by oxidation of organic compounds 
1302 2.82E−07 Glucose transport, integral to membrane 
1272 3.51E−07 Amino acid biosynthetic/metabolic process 
1862 5.67E−07 Amine metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobitic metabolic process 
1109 1.14E−06 Kinase activity 
1707 2.61E−06 Protein targeting to ER, endoplasmic reticulum 
1832 2.98E−06 Cell growth, phosphatase activity 
291 3.64E−06 Response to chemical stimulus, response to drug 

Sub-networks are annotated with the top GO term hits in ‘Molecular Function’, ‘Biological Process’ and ‘Cellular Component’ ontologies.

The second dataset is from the study on the effects of cigarette smoking on human airway epithelial cell transcriptome (Spira et al., 2004). It is well established that cigarette smoking is the leading cause of lung cancer and other serious pulmonary diseases such as chronic obstructive pulmonary disease (COPD). Spira and colleagues analyzed three groups of samples from current smokers, previous smokers and non-smokers. Their results indicated that cigarette smoking could lead to expression changes in genes involved in xenobiotic metabolism, antioxidation, inflammation, cell adhesion function and oncogenesis, which are consistent with previous studies using different model system or experimental design (Gebel et al., 2004; Hackett et al., 2003). The top 15 sub-networks identified by the MANOVA approach are listed in Table 2. Xenobiotic metabolism, cytoskeleton elements, amino acid and lipid metabolisms, and protein kinase and phosphatase activities are found to be significantly differentiated among study groups. In addition, several immune system-related responses are also present in the list, which are presumably due to the inflammation response caused by smoking. Again, the MANOVA approach was able to identify all the major effects caused by cigarette smoking.

Table 2.

Top 15 scoring sub-networks for the cigarette smoking data

Cluster IDMANOVA P-valueSignificant GO annotation
1770 1.69E−12 Response to extracellular stimulus, cell ion homeostasis 
93 4.17E−10 Epithelial cell differentiation, phospholipid binding 
261 8.39E−10 Microtubule associated complex, GKAP/Homer scaffold activity 
495 4.08E−09 Amino sugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
2062 1.42E−08 Protein phosphatase binding, protein kinase CK2 activity 
1123 4.84E−08 Acute-phase response 
1369 7.04E−08 Fibril organization and biogenesis, extracellular matrix structural constituent 
1272 3.37E−07 Amino acid biosynthetic/metabolic process 
198 3.91E−07 Nitrogen compound catabolic process, organic acid metabolic process 
77 3.93E−07 Xenobitic metabolic process 
697 4.50E−07 Leukocyte activation, immune response 
1967 5.54E−07 Phosphaditic acid metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobiotic metabolic process 
434 1.13E−06 Myosin complex, actin cytoskeleton 
182 2.31E−06 Ciliary or flagellar motility, microtubule-based process 
Cluster IDMANOVA P-valueSignificant GO annotation
1770 1.69E−12 Response to extracellular stimulus, cell ion homeostasis 
93 4.17E−10 Epithelial cell differentiation, phospholipid binding 
261 8.39E−10 Microtubule associated complex, GKAP/Homer scaffold activity 
495 4.08E−09 Amino sugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
2062 1.42E−08 Protein phosphatase binding, protein kinase CK2 activity 
1123 4.84E−08 Acute-phase response 
1369 7.04E−08 Fibril organization and biogenesis, extracellular matrix structural constituent 
1272 3.37E−07 Amino acid biosynthetic/metabolic process 
198 3.91E−07 Nitrogen compound catabolic process, organic acid metabolic process 
77 3.93E−07 Xenobitic metabolic process 
697 4.50E−07 Leukocyte activation, immune response 
1967 5.54E−07 Phosphaditic acid metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobiotic metabolic process 
434 1.13E−06 Myosin complex, actin cytoskeleton 
182 2.31E−06 Ciliary or flagellar motility, microtubule-based process 

Sub-networks are annotated with the top GO term hits in ‘Molecular Function’, ‘Biological Process’ and ‘Cellular Component’ ontologies.

Table 2.

Top 15 scoring sub-networks for the cigarette smoking data

Cluster IDMANOVA P-valueSignificant GO annotation
1770 1.69E−12 Response to extracellular stimulus, cell ion homeostasis 
93 4.17E−10 Epithelial cell differentiation, phospholipid binding 
261 8.39E−10 Microtubule associated complex, GKAP/Homer scaffold activity 
495 4.08E−09 Amino sugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
2062 1.42E−08 Protein phosphatase binding, protein kinase CK2 activity 
1123 4.84E−08 Acute-phase response 
1369 7.04E−08 Fibril organization and biogenesis, extracellular matrix structural constituent 
1272 3.37E−07 Amino acid biosynthetic/metabolic process 
198 3.91E−07 Nitrogen compound catabolic process, organic acid metabolic process 
77 3.93E−07 Xenobitic metabolic process 
697 4.50E−07 Leukocyte activation, immune response 
1967 5.54E−07 Phosphaditic acid metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobiotic metabolic process 
434 1.13E−06 Myosin complex, actin cytoskeleton 
182 2.31E−06 Ciliary or flagellar motility, microtubule-based process 
Cluster IDMANOVA P-valueSignificant GO annotation
1770 1.69E−12 Response to extracellular stimulus, cell ion homeostasis 
93 4.17E−10 Epithelial cell differentiation, phospholipid binding 
261 8.39E−10 Microtubule associated complex, GKAP/Homer scaffold activity 
495 4.08E−09 Amino sugar metabolic process, N-acetylglucosamine 6-O-sulfotransferase activity 
2062 1.42E−08 Protein phosphatase binding, protein kinase CK2 activity 
1123 4.84E−08 Acute-phase response 
1369 7.04E−08 Fibril organization and biogenesis, extracellular matrix structural constituent 
1272 3.37E−07 Amino acid biosynthetic/metabolic process 
198 3.91E−07 Nitrogen compound catabolic process, organic acid metabolic process 
77 3.93E−07 Xenobitic metabolic process 
697 4.50E−07 Leukocyte activation, immune response 
1967 5.54E−07 Phosphaditic acid metabolic process 
568 5.96E−07 Steroid metabolic process, cholesterol homeostasis, xenobiotic metabolic process 
434 1.13E−06 Myosin complex, actin cytoskeleton 
182 2.31E−06 Ciliary or flagellar motility, microtubule-based process 

Sub-networks are annotated with the top GO term hits in ‘Molecular Function’, ‘Biological Process’ and ‘Cellular Component’ ontologies.

To compare this approach to a typical analysis flow, i.e. single gene analysis followed by a gene set analysis, we analyzed the two datasets using Ingenuity IPA tool (www.ingenuity.com), which uses the Fisher exact test to calculate gene set enrichment scores. For the HIV dataset, a list of 579 affected probes (10% FDR) identified through single gene ANOVA were analyzed by IPA. While lipid and amino acid metabolism were picked up as the main affected pathways, ER stress response and gluoconeogenesis were not identified (data not shown). For the cigarette smoking dataset, a list of 1190 affected probes (10% FDR) was analyzed. IPA identified xenobiotic metabolism, cytoskeleton structures and electron transport as the main affected targets but missed inflammation response completely.

The gained sensitivity of our approach may be due to two factors. First, no ranked gene list with an artificial cutoff was used. Therefore, genes having modest changes may be considered together to reveal significant group effect. Second, instead of an enrichment score that does not reflect the actual expression data, our approach attempts to find the best separation between treatments. Given that parametric method is generally considered more sensitive than its non-parametric counterpart, it was expected for our method to generate results that better fit the biological hypothesis.

However, cautions should also be taken when applying MANOVA and related Hotelling's T-square models. There are three assumptions when applying MANOVA. First, the dependent variables should be normally distributed. In general, the more samples in a project, the more likely the assumption is met. Second, the relationships among dependent variables are linear. Since our calculation of co-expression is based on linear correlation, it is likely that genes of the same sub-network follow a linear relationship. However, deviations from linear relationship will compromise the power of the test. Third, variances and co-variances of dependent variables should be homogeneous across the ranges of independent variables. Several tests may be applied to check the validity of this assumption (Tabachnick and Fidell, 1996).

In addition to the aforementioned assumptions, MANOVA is also sensitive to outliers, which may lead to type I or type II errors. There are tests available to check for univariate and multivariate outliers. Finally, MANOVA analysis can only be employed when the total number of samples is larger than the size of the gene set being tested. This is because one degree of freedom is lost for each dependent variable added (Bray and Maxwell, 1985). In another word, large gene sets cannot be tested by MANOVA or Hotelling's T-square for projects involving small numbers of samples. In fact, this is one of the criteria we used to pick the partition number k so that the sizes of the resulting sub-networks are feasible for MANOVA based hypothesis testing for most projects, which typically have 10 samples or more.

4 CONCLUSION

In this article, we showed how reciprocal rankings and FDR analysis can be applied to integrate signal from independently conducted microarray experiments into a meta-network, which we named RGA network. We used three different algorithms to identify densely connected sub-networks from the network. Examination of the sub-networks indicated that they may represent various units of biological processes. The homogeneity (gene function) of these sub-networks fits well with the assumption of MANOVA. We developed a MANOVA-based approach which allows interpreting expression profiling data at the gene set level. We conducted a thorough evaluation of our approach by applying it to two published studies. Although in this article, we limited our analysis to co-expression gene sub-networks, we strongly believe that our MANOVA model is expandable to other groups of related genes (e.g. a gene list containing genes mapped to the same chromosomal location). We expect this approach would serve as a novel method for analyzing transcriptional and proteomic profiling data at the gene set level.

ACKNOWLEDGEMENTS

The authors of this paper would like to thank Curtis Huttenhower and Hilary Coller for valuable discussion and review of the manuscript. Authors D.U. and S.P. were supported by the DOE Early Career Principal Investigator Award No. DE-FG02-04ER25611 and NSF CAREER Grant IIS-0347662.

Conflict of Interest: none declared.

REFERENCES

Allocco
DJ
, et al. 
Quantifying the relationship between co-expression,co-regulation and gene function
BMC Bioinformatics
2004
, vol. 
5
 pg. 
18
 
Ashburner
M
Lewis
SE
On ontologies for biologists: the Gene Ontology – uncoupling the web
Novartis Found. Symp
2002
, vol. 
247
 (pg. 
66
-
80
)
Bader
GD
Hogue
CW
An automated method for finding molecular complexes in large protein interaction networks
BMC Bioinformatics
2003
, vol. 
4
 pg. 
2
 
Bergman
S
, et al. 
Similarities and difference in genome-wide expression data of six organisms
PLoS Biol
2004
, vol. 
2
 (pg. 
85
-
93
)
Bolstad
BM
, et al. 
A comparison of normalization methods for high density oligonucleotide array data based on bias and variance
Bioinformatics
2003
, vol. 
19
 (pg. 
185
-
193
)
Boyle
EI
, et al. 
GO::TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes
Bioinformatics
2004
, vol. 
20
 (pg. 
3710
-
3715
)
Bray
JH
Maxwell
SE
Multivariate analysis of variance
Quantitative Applications in the Social Sciences Series. No. 54
1985
Thousand Oaks, CA
Sage Publications
Brun
C
, et al. 
Clustering proteins from interaction networks for the prediction of cellular functions
BMC Bioinformatics
2004
, vol. 
5
 pg. 
95
 
Calza
L
, et al. 
Dyslipidaemia associated with antiretroviral therapy in HIV-infected patients
J. Antimicrob. Chemother
2004
, vol. 
53
 (pg. 
10
-
14
)
Dhillon
I
, et al. 
A fast kernel-based multilevel algorithm for graph clustering
In Proceedings of the 11th ACM SIGKDD, Chicago, IL, August 21 - 24, pp.
2005
(pg. 
629
-
634
)
Eisen
MB
, et al. 
Cluster analysis and display of genome-wide expression patterns
Proc. Natl Acad. Sci. USA
1998
, vol. 
95
 (pg. 
14863
-
14868
)
Gebel
S
, et al. 
Gene expression profiling in respiratory tissues from rats exposed to mainstream cigarette smoke
Carcinogenesis
2004
, vol. 
25
 (pg. 
169
-
178
)
Goldenberg
A
Moore
A
Tractable learning of large Bayes net structures from sparse data
In Proceedings of the 21st ICML
2004
Golub
TR
, et al. 
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring
Science
1999
, vol. 
286
 (pg. 
531
-
537
)
Hackett
NR
, et al. 
Variability of antioxidant-related gene expression in the airway epithelium of cigarette smokers
Am. J. Respir. Cell Mol. Biol
2003
, vol. 
29
 (pg. 
331
-
43
)
Jordan
IK
, et al. 
Conservation and coevolution in the scale-free human gene coexpression network
Mol. Biol. Evol
2004
, vol. 
21
 (pg. 
2058
-
2070
)
Kim
SY
Volsky
DJ
PAGE: parametric analysis of gene set enrichment
BMC Bioinformatics
2005
, vol. 
6
 pg. 
144
 
King
AD
, et al. 
Protein complex prediction via cost-based clustering
Bioinformatics
2004
, vol. 
20
 (pg. 
3013
-
3020
)
Lee
HK
, et al. 
Coexpression analysis of human genes across many microarray data sets
Genome Res
2004
, vol. 
14
 (pg. 
1085
-
1094
)
Lu
Y
, et al. 
Hotelling's T2 multivariate profiling for detecting differential expression in microarrays
Bioinformatics
2005
, vol. 
21
 (pg. 
3105
-
3113
)
Mao
L
Resat
H
Probabilistic representation of gene regulatory networks
Bioinformatics
2004
, vol. 
20
 (pg. 
2258
-
2269
)
Mehra
S
, et al. 
A Boolean algorithm for reconstructing the structure of regulatory networks
Metabolic Eng
2004
, vol. 
6
 (pg. 
326
-
339
)
Quackenbush
J
Computational analysis of microarray data
Nat. Rev. Genet
2001
, vol. 
2
 (pg. 
418
-
427
)
Quackenbush
J
Microarray analysis and tumor classification
N. Engl. J. Med
2006
, vol. 
354
 (pg. 
2463
-
2472
)
Parker
RA
, et al. 
Endoplasmic reticulum stress links dyslipidemia to inhibition of proteasome activity and glucose transport by HIV protease inhibitors
Mol. Pharmacol
2005
, vol. 
67
 (pg. 
1909
-
1919
)
Pena
JM
, et al. 
Growing Bayesian network models of gene networks from seed genes
Bioinformatics
2005
, vol. 
21
 (pg. 
224
-
229
)
Rice
JJ
, et al. 
Reconstructing biological networks using conditional correlation analysis
Bioinformatics
2005
, vol. 
21
 (pg. 
765
-
773
)
Spira
A
, et al. 
Effects of cigarette smoke on the human airway epithelial cell transcriptome
Proc. Natl Acad. Sci. USA
2004
, vol. 
101
 (pg. 
10143
-
10148
)
Subramanian
A
, et al. 
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
Proc. Natl Acad. Sci. USA
2005
, vol. 
102
 (pg. 
15545
-
15550
)
Stuart
J
, et al. 
A gene coexpression network for global discovery of conserved genetic modules
Sci. Express
2003
, vol. 
302
 (pg. 
249
-
255
)
Tabachnick
BG
Fidell
LS
Using Multivariate Statistics
1996
New York
Harper Collins College Publishers
Tamayo
P
, et al. 
Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation
Proc. Natl Acad. Sci. USA
1999
, vol. 
96
 (pg. 
2907
-
2912
)
van Noort
V
, et al. 
The yeast co-expression network has a small-world, scale-free architecture and can be explained by a simple model
EMBO Re
2004
, vol. 
5
 (pg. 
280
-
284
)
Vinciotti
V
, et al. 
Exploiting the full power of temporal gene expression profiling through a new statistical test: application to the analysis of muscular dystrophy data
BMC Bioinformatics
2006
, vol. 
7
 (pg. 
183
-
194
)
Xiong
H
Non-linear tests for identifying differentially expressed genes or genetic networks
Bioinformatics
2006
, vol. 
22
 (pg. 
919
-
923
)

Author notes

Associate Editor: John Quackenbush

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data