Gene Coexpression Network Alignment and Conservation of Gene Modules between Two Grass Species: Maize and Rice

One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize ( Zea mays ) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice ( Oryza sativa ) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species. known phenotypic associations in rice with possible translation to maize.

The combination of genomics, genetics, and systemslevel computational methods provides a powerful approach toward insight into complex biological systems. Of particular significance is the discovery of genetic interactions that lead to desirable agricultural and economic traits in the Poaceae family (grasses). The Poaceae includes valuable crops such as rice (Oryza sativa), maize (Zea mays), wheat (Triticum spp.), and sugarcane (Saccharum officinarum), which are globally some of the most agriculturally and economically important crops (FAOSTAT, 2007). Understanding complex interactions underlying agronomic traits within these species, therefore, is of great significance, in particular to help with crop improvements to meet the challenges of plant and human health but also for basic understanding of complex biological systems.
In addition to their pivotal role in agriculture, grasses offer a powerful model system in that their genomes are closely conserved and functional genomic knowledge gained in one species can be hypothesized to occur in another syntenic region (translational func-tional genomics; Paterson et al., 2009). In cases of grass species with poorly resolved, polyploid genomes such as sugarcane, where genomic resources are not as far progressed as in other grasses (e.g. rice, sorghum [Sorghum bicolor], maize, etc.), translational functional genomics methods may be the most cost-effective strategy for crop improvement as well as for unraveling the functional consequences of polyploidy. Additionally, crops rich in genetically mapped loci deposited in sites like Gramene (Jaiswal, 2011) provide a rich source of systems genetic hypotheses that could in principle accelerate the translation of interacting gene sets associated with complex traits into grasses with poor genetic resources (Ayroles et al., 2009;Wang et al., 2010).
One method of identifying interacting gene sets is through the construction of a gene coexpression network, which is constructed through the discovery of nonrandom gene-gene expression dependencies measured across multiple transcriptome perturbations, often derived from a collection of microarray data sets. During coexpression network construction, the tendency of m transcripts to exhibit similar (or not) expression patterns across a set of n microarrays is determined. In the case where dependency is determined via a correlation metric (e.g. Pearson's r), a comprehensive m 3 m matrix of correlation values is generated, which represents expression similarity. The "similarity matrix" is then thresholded to form an "adjacency matrix," which represents an undirected graph where edges (coexpression) exist between two nodes (transcripts) when a correlation value in the matrix is above the significance threshold. Computational methods are then applied to circumscribe groups of network nodes that are highly connected (coexpressed gene "modules"; Langfelder and Horvath, 2008;Li and Horvath, 2009;Chang et al., 2010;Rivera et al., 2010;Xu et al., 2010). It has been shown that genes in these modules participate in similar biological processes; therefore, guilt-by-association inferences can be applied to module genes with no known function that are connected to module genes of known function (Wolfe et al., 2005;Aoki et al., 2007).
Global coexpression networks are those that incorporate expression data from a variety of tissues, developmental stages, and environmental conditions into a single network, the goal being to capture stable coexpression relationships across a diverse collection of experimental perturbations. Global gene coexpression networks maintain similar properties as other naturally occurring networks, such as human social networks and protein-protein interaction networks. These networks tend to be scale free, small world, modular, and hierarchical (Ravasz et al., 2002;Barabási and Oltvai, 2004). Detailed descriptions of these properties can be found in the report by Barabási and Oltvai (2004).
Given the recent and rapid increase of available biological networks, an important method is the identification of common patterns of connectivity between two networks. Internetwork comparisons are used for several purposes, including improved identification of functional orthologs between species (Bandyopadhyay et al., 2006) and identification of evolutionarily conserved subgraphs, or sets of highly connected genes that demonstrate conserved function (Stuart et al., 2003). Several different network comparison methods exist that perform either local or global comparisons. Local network alignments attempt to align small subsets of nodes between multiple networks, whereas global network alignments attempt to find the best alignment of all nodes in one network with another (Singh et al., 2008). Various heuristics exist for global alignment of two or more networks, and typically these methods first use homology to prioritize the alignment of nodes and then incorporate a measure of topology to refine alignments (Hu et al., 2005;Flannick et al., 2009;Kalaev et al., 2009;Liao et al., 2009;Zaslavskiy et al., 2009;Chindelevitch et al., 2010). Some methods strictly use topology to guide alignments (Kuchaiev et al., 2010), given that network motifs are often conserved in functionally related systems Shen-Orr et al., 2002). The majority of network alignment methods have been used to align protein-protein interaction networks, whereas one method has recently been published for the alignment of gene coexpression networks (Zarrineh et al., 2011).
This study adds to the growing compendium of systems-level knowledge for plants by first describing a maize gene coexpression network, and then through a global network alignment with a rice coexpression network (Ficklin et al., 2010) we identified common subgraphs of coexpressed gene sets between the two grass species. For network alignment, we applied a tool, IsoRankN (Liao et al., 2009), which incorporates both gene homology and network topology in its alignment algorithm. The use of homology contributes conservation of sequence, and topology contributes conservation of coexpression-both of which are associated with functional relatedness. We describe the discovery of multiple sets of modules between rice and maize that are both enriched for similar functional terms and that are potentially evolutionarily conserved between the two grasses. This functional similarity between modules in maize and rice seems to agree with the idea that function may be translated through the aligned nodes of two networks. This may serve as a method for identifying functional modules in other grass species. Phenotypic associations available in the rice network may also provide an initial glimpse at the possibilities of translational systems genetics from rice to maize and other cereals. In practice, this method may assist with the prioritization of genes for future mutational studies.

Maize Coexpression Network Construction
The maize coexpression network was constructed using 253 Affymetrix Maize GeneChip Genome Array microarray samples obtained from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) repository. A listing of these array accessions and the experimental conditions under which the transcriptome was measured can be found in Supplemental Table S1. Construction of the maize network was performed using the same method as published previously for rice (Ficklin et al., 2010). Maize microarray data sets were Robust Multichip Average (RMA) normalized (Irizarry et al., 2003), and 40 outlier arrays were removed using the R/array-QualityMetrics package (Kauffmann et al., 2009). Upon inspection, these outliers seemed to be a result of low-quality hybridizations or nonstandard experimental conditions and did not appear to derive from a common biological system. Next, all pairwise gene expression correlations were determined (Pearson's r). The resulting correlation (similarity) matrix was used as input into both the Weighted Correlation Network Analysis (WGCNA) soft-threshold (Langfelder and Horvath, 2008) and Random Matrix Theory (RMT) hard-threshold (Luo et al., 2007) methods for network construction. The WGCNA method identified a power of 6 to power raise the similarity matrix and later divided the network into 34 distinct gene modules, whereas 45 modules were detected for rice ( Table I).
The relationship between maize modules in terms of similarity of expression is shown in Supplemental Figure S1. The RMT method provided a hard-threshold cutoff value of 0.5781 for the WGCNA power-raised matrix. This is the point where the nearest-neighbor spacing distribution within the network transitions from what would appear as random noise to nonrandom signal (x 2 ; P . 0.001). The final maize network consisted of 31,983 edges between 2,708 probe sets (2,071 gene models), which corresponds to 15.4% of the original probe sets on the array (Table I). A global view of the maize coexpression network can be seen in Figure 1, where individual modules are distinctly colored. A detailed list of edges for the maize network can be found in Supplemental Table S2. The maize network is available online, along with the previously described rice network, for browsing and searching at http:// www.clemson.edu/genenetwork. Network properties, such as node degree and clustering coefficient distributions, can be found in Supplemental Figure S2.

Functional Enrichment and Clustering of Coexpressed Maize Gene Modules
Functional enrichment was performed for each of the 34 modules identified by WGCNA using annotation terms from Gene Ontology (GO; Ashburner et al., 2000), InterPro (Apweiler et al., 2001), and Kyoto Encyclopedia of Genes and Genomes (KEGG; Kanehisa et al., 2008) using an in-house method similar to the tool DAVID (for Database for Annotation, Visualization, and Integrated Discovery; Dennis et al., 2003;Huang et al., 2009). A total of 1,928 unique annotation terms were found to be enriched in the maize modules (Fisher's exact test; P , 0.1). Cofunctional clusters, or subsets of nodes within a module that share enriched functional annotation, were identified using the DAVID approach. The identified clusters were sorted first by average connectivity (,k.) and second by enrichment score (e-score), the geometric mean of enrichment P value. A total of 35 cofunctional gene clusters identified from 596 enriched terms were found within 10 modules. Detailed lists of probe sets and genomic loci within modules and cofunctional clusters, as well as enriched annotation terms, can be found in Supplemental Tables S3 to S6. A total of 383 maize loci are represented in the network with no known functional annotation (Supplemental Table S7). Of these, approximately 50%, or 193 of the 391 genes of unknown function, have 3,092 coexpressed edges with genes in cofunctional clusters (Supplemental Table S8). Therefore, it may be possible to infer function for these loci using the principle of guilt by association.
Interestingly, cofunctional clusters ordered first by ,k. in both the maize and rice networks seem quite similar. A list of the top-10 ordered clusters in both networks can be seen in Table II. For example, the highest ordered maize cluster by ,k. was enriched for functional terms related to the ribosome and translation (module ZmM6C25: ,k. = 20.2, e-score = 2.62). Similarly, the corresponding rice cluster was also enriched for terms related to the ribosome and translation (module OsM6C25: ,k. = 17.0, e-score = 1.98). The second highest maize cluster was enriched for seed storage activity (ZmM5C1: ,k. = 10.0, e-score = 3.38), as was the third highest rice cluster (OsM13C1: ,k. = 12.9, e-score = 12.22). Additional top-10 ordered clusters with similar annotation terms in both maize and rice, although not in the same order, include clusters enriched for photosynthesis, glycolysis, and microtubule activity.

Rice and Maize Coexpression Network Alignment
Using the constructed maize network, a comparison with the existing rice network was performed with the goal of identifying evolutionarily conserved coexpression patterns. A comparative summary of the statistics for both the rice and maize networks can be seen in Table I. To allow for direct comparison of various network alignment methods, the probe set-based networks were first condensed into a locus-based network that contained 2,071 loci for maize and 2,257 loci for rice (Supplemental Tables S9 and S10). IsoRankN (Liao et al., 2009) was used to perform global alignments between the maize and rice networks. To help clarify the meaning of the various modules, clusters, and subgraphs constructed with this analysis, we provide definitions as well as naming conventions (Table III).
IsoRankN provides three input parameters that affect the results of the network alignments. These parameters include an iteration parameter, a threshold parameter, and an a value. Documentation for IsoRankN indicates that the iteration parameter should vary between 10 and 30, the threshold between 1e-3 and 1e-5, and the a value between 0 and 1. The a value controls the contribution of homology and topology; a value of 0 would strictly use homology for alignments, whereas 1 would strictly use topology. A value of 0.5 would weight equally the contributions of both homology and topology. Therefore, to identify an adequate set of parameters for IsoRankN for alignment of the rice and maize networks, these parameters were varied from one extreme to the other within the suggested documented ranges. In total, 189 tests were performed. To measure the change of the biological signal caused by varying these parameters, we used k statistics to provide a measure of functional similarity between conserved subgraphs. Functional enrichment was performed for each conserved subgraph in both maize and rice. The similarity of terms enriched in corresponding conserved subgraphs of maize and rice is measured by the k score, where a value greater than 0 indicates that the conserved subgraph in rice is similar, more than could be expected by chance, to the corresponding subgraph in maize. A value of 1 indicates that the two are identical in terms of enriched terms. A plot of average k scores and subgraph counts across 20 a values, for an iteration value of 30 and threshold value of 1e-4, is shown in Figure 2. Aside from the extreme a values near 0 and 1, the functional similarity of conserved subgraphs is relatively consistent across the a values. The graph in Figure 2 was effectively identical for each combination of iteration and threshold we tested. This similarity indicates that convergence of the alignment occurs at low stringency and that selection of almost any parameter blend for those we selected for testing would be effective. The k score (or functional similarity) of the subgraphs in rice and maize at an a value of 0 is very high; however, the number of conserved subgraphs at that value is very low. The opposite is true for an a value of 1. It seemed that most parameter sets, aside from the extreme a values, would generate an adequate set of subgraphs with a reasonably high average similarity (average k), so we selected conserved subgraphs derived from alignments from IsoRankN using an a value of 0.8, an iteration value of 30, and a threshold of 1e-4, because this particular combination of parameters seemed to provide the highest average k. Because average k score and subgraph count were very similar across all parameter variations, we only present here a single representative result set. Using these parameters, we detected 1,173 aligned loci, which were later connected into 154 conserved subgraphs. These subgraphs preserved 4,758 edges in rice and 6,105 edges in maize (Supplemental Tables S11 and S12). Functional enrichment and clustering, identical to that performed for the network modules, was performed for these subgraphs as well. The cofunctional clusters of these conserved subgraphs can be found in Supplemental Tables S13 and S14.

Common Function in Conserved Maize-Rice Subgraphs
For the IsoRankN alignments, functional enrichment and k analysis of the conserved subgraphs yielded nine subgraphs with a perfect k score of 1, indicating an identical set of enriched terms. These include subgraphs enriched for early nodulin 93 proteins, hydrolase activities, DNA binding, peptidase, Clusters within a module are numbered sequentially and are prefixed with the letter C. Modules and clusters are prefixed with a species abbreviation: Os for rice and Zm for maize. Thus, cluster 1 from module 8 in rice is named OsM8C1. b ,k. is the average connectivity of the nodes in the cluster. c E-score is the enrichment score, or geometric mean of the Fisher's test enrichment P values of the cluster. Conserved subgraph A subgraph that is present in one network and has a corresponding subgraph in another network; these subgraphs share nodes that have been locally aligned using a network alignment tool subgraph_z a For naming, Sp indicates a two-letter species abbreviation; the M in Mx indicates the subgraph is a module where x is the module number; C indicates a cluster followed by the cluster number, y; and z is a four-digit number given to uniquely identify conserved subgraphs. nucleosome assembly, transcription factor activity, and others (Supplemental Table S15). However, these subgraphs are relatively small, with two to five edges. The four largest conserved subgraphs are enriched for terms involved in photosynthesis, DNA replication, the ribosome, and starch synthase (Table IV), all of which have a k score greater than 0.5. Incidentally, these four classes of enriched terms are also present in the top-10 list of enriched clusters as seen in Table II. For the rice network, phenotypic terms derived from the Tos17 retrotransposon insertion mutation studies (Hirochika et al., 1996;Miyao et al., 2003) were mapped to loci and included in the functional enrichment and clustering of network modules (Ficklin et al., 2010). Of the 154 conserved subgraphs, 20 conserved subgraphs from rice are enriched for Tos17 phenotypic terms, which include phenotypes such as "sterile," "pale yellow leaf," "high tillering," "vivipary," and more. A listing of these 20 conserved subgraphs can be seen in Table V. The subgraphs are ranked by descending order of k score, which indicates the similarity of functional annotations between the rice and maize conserved subgraphs. Several subgraphs have a k score of 1.0, indicating identical functional similarity, but overall, a high degree of similarity between most of the subgraphs is evident. Figure 3 shows the relationship between the global rice and maize networks (Fig. 3, A and C, respectively) with the conserved subgraphs of each (Fig. 3, B and D, respectively) as constructed using IsoRankN node alignments. Light gray lines between the global rice and maize networks simply map the location of nodes with their conserved counterparts. Light gray lines between the two conserved subgraph networks show node alignments provided by IsoRankN. Light red lines indicate node alignments with phenotypic associations in rice. Figure 4 shows a close-up view of conserved subgraph "subgraph_0107."

DISCUSSION
The purpose of this study was to identify conserved, coexpressed gene sets between two vital agricultural species: rice and maize. To identify these gene sets, we first constructed a maize coexpression network, de novo, and aligned it to a previously described rice coexpression network (Ficklin et al., 2010). Our hypothesis was that the discovery of conserved network nodes (genes) and edges would provide an initial framework  for the translation of complex functional genomic and genetic knowledge from one species to another. This strategy is complementary to traditional comparative genomic approaches where known function is translated between taxa via homology and/or synteny. Additionally, the WGCNA and RMT tools were selected to preserve a knowledge-independent approach. The networks were thresholded (using RMT) and modules were constructed (using WGCNA) without prior knowledge of the underlying gene functions.

The Global Maize Gene Coexpression Network
Here, we provide, to our knowledge, the first known maize gene coexpression network. This network facilitates research in maize by providing lists of interacting genes annotated for specific biological processes that provide clues to candidate gene (known and novel) involved in those processes. Additionally, 391 genes with unknown function (Supplemental Tables S7 and S8) are coexpressed within modules, and 194 of those unannotated genes are interconnected within 32 different cofunctional modules. For example, cluster ZmM5C1 is the fourth highest ordered cofunctional cluster by ,k. and contains nine loci. However, there are 24 directly connected neighboring genes that have no ascribed GO, KEGG, or InterPro function. The enriched functional terms for this cluster include seed storage activities. Guilt-by-association inferences would suggest that the 24 genes of unknown function in ZmM5C1 may be involved in seed storage or related processes. Therefore, these genes make interesting, perhaps novel, candidates for understanding the biological process associated with seed storage. In total, 194 genes of unknown function, through 3,092 edges, now suggest inferences for the biological processes summarized by 33 different cofunctional clusters (Supplemental Table S8).

The Small Size of Global Coexpression Networks
The maize network is small in comparison with the number of loci mapped to the probe sets present on the microarray. Using 32,540 gene models from the ZmB73 4a.53 release of the maize genome (Schnable et al., 2009), 14,792 (45%) of the known maize loci were measured on the microarray platform. Of those, only 2,071 loci (14%) were present in the global maize network. We observed a similar phenomenon in rice, where almost 86% of known rice transcripts mapped to the probe sets on the microarray platform; similarly, a low fraction of the measured loci (10%) were present in the final network. Therefore, with regard to the number of potential coexpression relationships, both networks are relatively smaller than what we would expect across the organism's life cycle. The RMT method (Luo et al., 2007) was specifically used to define the threshold to reduce random noise from the final network to ensure that the detected coexpression relationships were strong. Therefore, the small size of the network is most likely caused by relationships lost within the "noise" of the data set, combined with the fact that not all coexpression relationships from all conditions are represented by the data set.
Is it possible to boost the biological signal and increase the gene space fraction captured in coexpression networks? Lowering the significance threshold, even using reasonable methods designed to limit the number of false positives, would increase the number of loci in the network but could reduce the overall quality of the biological signal and possibly confound the interpretation of modules (Perkins and Langston, 2009). Usadel et al. (2009) discuss several reasons that significant coexpression correlations can be lost. These include sample selection, complex interaction types, and selection of normalization and correlation methods. Our data suggest that the rice and maize networks consist primarily of coexpression relationships derived from basal biological processes whose expression is most common across the samples used to build the network. It would seem that the global coexpression networks currently available for plants, including the rice and maize networks we have generated, are immediately useful for these common processes but lack representation of less frequent processes and other subtle interactions. For maize and rice, it may be that more significant coexpression relationships would be detected if (1) additional transcriptome measurements are made from tissue systems not present in the current network, which would increase the sampling frequency and the probability of detecting rarer coexpression relationships; (2) overlap from multiple tissue-specific transcriptomes on a single sample are reduced by segregating data sets to be tissue/condition specific; and (3) additional statistical methods are employed to identify coexpression relationships specific to unique tissues, conditions, or developmental stages, essentially dissecting the input data into subsystems. It should be noted that the detection of coexpression relationships between highly homologous transcripts including gene variants may require extensive transcriptome measurements from a non-hybridization-based platform (e.g. RNAseq) before the full potential of global coexpression networks, measured in the observed number of coexpression relationships, can be realized.

Conservation between Rice and Maize Coexpression Networks
From a qualitative perspective, the apparent collective role of genes in cofunctional clusters from both rice and maize networks, when ordered by average connectivity, were quite similar (Table II). Cofunctional clusters were ordered by connectivity under the premise that highly coexpressed genes are more likely involved in similar biological processes. As mentioned previously, functional terms from processes such as translation, seed storage, glycolysis, photosynthesis, and the cell cycle are all enriched in the top-10 functional clusters of both networks and provide good indication that the two coexpression networks, derived from independent microarray samples for two different species, demonstrate conservation in terms of the connectivity of coexpressed genes for common biological processes.
The apparent conservation of coexpression patterns between rice and maize is further bolstered through a formal global alignment of the two networks via IsoRankN and identification of conserved subgraphs. Many of the conserved subgraphs between rice and maize show a high degree of similarity of enriched functional terms, indicating a high level of conservation, which we quantified using k statistics (Table IV;  Supplemental Table S15). For example, the function of the top-10 conserved subgraphs by size is shown in Table IV. Many of these share similar function, especially when k scores are closest to 1. This is notable because the likelihood that nodes in the conserved subgraph would be significantly coexpressed, aligned together based on topology and homology, and have nonrandom chance of similarity between their respective enriched function is low. Moreover, modules and cofunctional clusters in maize also align to modules and cofunctional clusters in rice that have similar functional annotations. For example, Figure 5 shows the fourth largest conserved subgraph, "subgraph_0282," with 49 nodes from maize (bottom left) and 45 from rice (top right). Both the maize and rice loci from this subgraph are enriched for terms involved in seed storage, nutrient reservoir activity, and starch synthase, with a k score of 0.51. Not only are the functional enrichments similar between aligned nodes, but module coexpression relationships are also maintained. The majority of maize genes in subgraph_0282 belong to module ZmM5 (with only three from ZmM19), and all of the genes from rice belong to module OsM13. Also, cofunctional clusters show evidence of alignment as well. Within this same conserved subgraph, the orange nodes from rice in Figure 5 and the purple nodes from maize are from cofunctional clusters OsM13C1 and ZmM5C1, respectively. Both of these clusters are enriched for seed storage activities, and the nodes from these two clusters have direct alignments with the other.
It should be noted that low k scores between enriched terms of the rice and maize networks do not indicate that the node alignments are weak. k scores are based on functional similarity and are dependent on the underlying functional annotation of the loci. For instance, conserved orthologous loci in two genomes may not have been annotated identically yet are aligned due to sequence homology and network to- . Largest conserved subgraph with implied phenotypic associations. Shown is the largest conserved subgraph, subgraph_0107, between rice (blue nodes) and maize (green nodes). Dark edges in the subgraph are coexpression relationships. Light edges indicate alignments between the two subgraphs determined using IsoRankN. Light red edges indicate phenotypic associations with nodes in rice that are aligned to nodes in maize.
pology. Also, similar functions may be annotated in somewhat equivalent yet different functional terms. Therefore, a high functional similarity between conserved subgraphs was used to help validate the network alignments, but a lack of functional similarity does not indicate a poor alignment.

Translation of Function and Phenotype
As mentioned previously, conservation between coexpression networks is a powerful tool for validating the correctness of each aligned pair of networks. In essence, conservation reduces the noise within the network because it provides another layer of evidence for coexpression (Obayashi and Kinoshita, 2011). Moreover, the alignment between species strengthens the guilt-by-association inferences made for genes of unknown function. For example, cluster ZmM5C1 was described previously as containing coexpression with 24 genes of unknown function. Seven of the loci from ZmM5C1 appear in conserved subgraph_0282 (purple nodes in Fig. 5). Guilt-by-association inferences may be applied to these genes of unknown function; however, the inference is made stronger because the coexpression relationships are conserved.
A powerful application of alignments between rice and maize networks is the potential to translate gene sets with enriched phenotypes from rice to maize through conserved subgraphs. For instance, Table V provides a list of conserved subgraphs that have enriched phenotypes in rice. These terms are not only present in annotations of genes in the subgraph but enriched. Of particular note is subgraph_0065, with five genes in rice, four genes in maize, and six phenotypic terms: pale green leaf, long culm, albino, drooping leaf, yellow, and low tillering. This subgraph has a k score of 1, indicating perfect similarity between annotated terms, and is annotated as early nodulin 93 protein. It may be that this high level of similarity is due in part to the fact that sequence homology was employed in network alignment, and sequence homology is often used to transfer functional annotation from one species to another. However, network topology based on coexpression edges was weighted more strongly in the IsoRankN alignment, indicating that coexpression relationships are also maintained between rice and maize alignments. Therefore, it seems appropriate that these phenotypic associations from rice can be inferred to the four maize genes as well as connected neighbors in the subgraph. Figure 5. Subgraph_0282 from rice and maize. This subnetwork shows the coexpression edges and conserved alignments for all nodes of subgraph_0282 between maize and rice. Coexpression edges are gray lines, and network alignments are light blue lines. Nodes below the heavy diagonal line are from the maize network, and nodes above it are from rice. All of the rice nodes belong to module OsM13, and the majority of the maize nodes are from module ZmM5, with the exception of the three rightmost nodes in the bottom half, which belong to module Zm19. Yellow nodes in maize are for loci of unknown function. Purple nodes in maize are from cluster ZmM5C1, annotated for nutrient reservoir/seed storage activity. Orange nodes in rice are from cluster OsM13C1, also annotated for nutrient reservoir/seed storage activity. Nodes of other colors belong to the other functional clusters within the module. Gray nodes belong within the subgraph but are not part of a cofunctional cluster.

CONCLUSION
Gene coexpression network alignments coupled with genetic and functional genomic data provide a method for translation of gene function and genotypephenotype associations between species, and they are especially useful for species with limited genetic resources. Experimental evidence will be needed to determine the true predictive power of coexpression relationships (intranetwork and internetwork), but the functional similarity we observed in conserved subgraphs seems quite promising. Still, better quantitative measures of biological signal are needed to validate the coexpression relationships. If an in silico metric for biological signal can be identified, it would provide a means to calculate type I and type II errors under alternate network construction protocols. However, given that gene coexpression networks have already been used to successfully identify candidate genes for specific traits (Lee et al., 2010;Mutwil et al., 2010), it is natural to conclude that function and phenotype can also be transferred across species to help identify genes involved in complex traits. The power of this translational systems genetics approach will be increasingly more useful as more genetic data are made available for grasses, especially in the form of genomewide association studies. In particular, the translation of function and phenotype into large polyploid species, such as sugarcane, would be especially powerful because the capture of genetic associations can be difficult and expensive and genome resources tend to lag behind those of less complex species.

Maize Network Construction
The method used for construction of the maize (Zea mays) gene coexpression network was identical to that previously described for the rice (Oryza sativa) gene coexpression network (Ficklin et al., 2010). A total of 293 samples from the Affymetrix Maize GeneChip Genome Array microarray were obtained from NCBI's GEO repository. RMA normalization (Irizarry et al., 2003) using the software package RMAExpress (Bolstad, 2010) and outlier detection using the arrayQualityMetrics (Kauffmann et al., 2009) tool for Bioconductor (Gentleman et al., 2004) were used to remove outlier samples. Arrays that failed all three outlier tests were excluded from further analysis. Then, a similarity matrix was constructed by performing pairwise Pearson correlations for every probe set across all samples. We selected Pearson correlation because it was commonly supported by both the WGCNA and RMT tools. Next, the WGCNA package (Langfelder and Horvath, 2008) was used to convert the similarity matrix into an adjacency matrix by raising the similarity matrix to a power of 6. The power chosen is one that best approximates scale-free behavior in the resulting network and is selected by the software. Finally, the RMT algorithm (Luo et al., 2007) was used to select a hard threshold that limits the noise in the resulting network.

Functional Enrichment and Clustering
The gene models used for this study were from the maize B73 genome (Schnable et al., 2009) version 4.53a obtained from the maizesequence.org Web site. GO (Ashburner et al., 2000), InterPro (Apweiler et al., 2001), and KEGG (Kanehisa et al., 2008) terms were used for functional annotation of these gene models. In the case of GO and InterPro terms, these were obtained directly from the maizesequence.org Web site. KEGG terms were obtained by uploading maize coding sequences to the KEGG/KAAS server, which maps KEGG terms using a homology-based method (Moriya et al., 2007). An inhouse tool similar to the online DAVID tool (Dennis et al., 2003;Huang et al., 2009) was used to perform functional enrichment using a Fisher's exact test against each network module and the genome background. Modules were further subdivided into functional clusters using pairwise k statistics between all genes. Functional clusters were ordered by the geometric mean of the Fisher's P values, the e-score, or the ,k., which provides a measure of interconnectedness of the nodes in the functional cluster.

Maize-Rice Network Comparison
The maize network was compared with the previously described rice network (Ficklin et al., 2010). The maize and rice networks, as well as functional enrichment and cluster discovery, were constructed with an identical protocol. However, as a result of improvements to the in-house scripts that perform functional enrichment, the functional enrichment and clustering were performed again for the rice network before comparison. The maize and rice networks, including both the original and updated functional enrichment results for rice, are available online at http://www.clemson.edu/genenetwork. Before network comparisons were performed, nodes in both the rice and maize networks were converted from microarray probe sets to genomic loci. In some cases, these were one-to-one mappings between probe sets and genes. However, some microarray probe sets map to more than one genomic locus and vice versa. These mappings are ambiguous but were retained with the assumption that a significant edge to these nodes could be informative because one or more mapped genes would be producing the correlated transcript. During conversion from a probe set to a loci-based network, edges were placed between two loci whenever they mapped to connected probe sets. Edges were also preserved in cases where a single locus mapped to more than one probe set in a different module.
Network comparisons between the rice and maize gene coexpression networks were performed using IsoRankN (Liao et al., 2009), which provided a mixed topology and homology-based global alignment methodology. First, the maize and rice protein sequence data sets were obtained from the Michigan State University version 6.0 assembly for rice (Ouyang et al., 2007) and the maize B73 genome (Schnable et al., 2009) (2008) for selecting BLAST parameters for reciprocal best hits. The homolog scores derived from BLAST and the network edges list were used as input to IsoRankN. Several iterations were performed by varying the parameters for the software. IsoRankN's own iteration parameter was adjusted at values of 10, 20, and 30. The threshold parameter was adjusted at values of 1e-3, 1e-4, and 1e-5, and the a value, which controls the contribution of topology versus homology in aligning the networks, was varied from 0.0 to 1.0 in 0.1 increments. In total, 189 iterations were performed for IsoRankN. IsoRankN generates sets of one-tomany mappings, where in some cases multiple aligned loci are in a single set. Each pair or group was referred to as an alignment set. Conserved subgraphs were generated using these output files with an in-house Perl script. Conserved subgraphs were constructed in a two-step method. The first step selected edges that were conserved between the two networks, and the second step identified subgraphs of interconnected loci.
The process for selecting preserved edges was performed by comparing two loci from two different alignment sets in one network with two loci from the same alignment sets from the other network. If an edge existed in both networks using the four selected loci, then both edges were marked as conserved. The following pseudocode describes the process: S = the array of aligned sets of loci for each set in S as s i for each set in S as s j where s i is not s j for each locus l i in s i for each locus l j in s j for each locus k i in s i where k i is not l i for each locus k j in s i where k i is not k j if l i and l j are in the same network and connected and if k i and k j are in the same network and connected then mark both edges as conserved. Edges and nodes that were not marked as conserved were discarded, and the remaining networks, one for rice and the other for maize, became the "conserved" subnetworks.
Finally, conserved subgraphs within the conserved networks were identified by first selecting an edge from one conserved network to serve as a seed for the subgraph. The aligned loci in the other conserved network were also used as a seed. Thus, the process of defining subgraphs was performed in parallel in both networks. Next, the edges of all of the connected neighbors of the seed were added to the subgraph. The process was continued by iterating recursively through the neighbors and adding their edges until all possible edges were exhausted. Then, a new edge, which had not yet been added to a subgraph, was selected to act as the next seed until all edges in the conserved networks were placed in subgraphs. These subgraphs were labeled numerically, and a label for a subgraph in rice was the same for the corresponding conserved subgraph in maize and vice versa.
Functional enrichment was then performed for each subgraph using the same method described previously for modules in the global network. Subgraphs were then compared using k statistics. As described previously, k statistics were used to provide a measure of similarity between the functionally enriched terms of genes in a network module. Here, k statistics are used to provide a measure of similarity between the two conserved subgraphs of maize and rice that have the same label. Subgraphs are then ranked by k score from greatest to smallest. Conserved subgraphs are given a four-digit unique number prefixed with the word "subgraph."

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Maize network module clustering.
Supplemental Table S1. Microarray samples used in network construction.
Supplemental Table S2. Edges from the probe set-based maize network.
Supplemental Table S3. Maize locus identifiers in functionally enriched clusters.
Supplemental Table S4. Maize Affymetrix probe set identifiers in functionally enriched clusters.
Supplemental Table S5. Enriched functional terms in maize clusters.
Supplemental Table S6. Enriched functional terms in maize modules.
Supplemental Table S7. Maize genes in modules with no annotated function.
Supplemental Table S8. First neighbors of maize genes in modules with no annotated function.
Supplemental Table S9. Edges from the locus-based rice network.
Supplemental Table S10. Edges from the locus-based maize network.