Plant MetGenMAP: an integrative analysis system for plant systems biology

The information and resources generated from diverse ‘omics’ technologies provide opportunities for producing novel biological knowledge. It is essential to integrate various kinds of biological information and large-scale ‘omics’ datasets through systematic analysis in order to describe and understand complex biological phenomena. For this purpose, we have developed a web-based system, Plant MetGenMAP, which can comprehensively integrate and analyze large-scale gene expression and metabolite profile datasets along with diverse biological information. Using this system, significantly altered biochemical pathways and biological processes under given conditions can be retrieved rapidly and efficiently, and transcriptional events and/or metabolic changes in a pathway can be easily visualized. In addition, the system provides a unique function that can identify candidate promoter motifs associated with regulation of specific biochemical pathways. We demonstrate the functions and applications of the system using datasets from Arabidopsis and tomato. The results obtained by Plant MetGenMAP can aid in a better understanding of the mechanisms that underlie interesting biological phenomena and provide novel insights into the biochemical changes associated with them at the gene and metabolite levels. Plant MetGenMAP is freely available at http://bioinfo.bti.cornell


INTRODUCTION
The significance of pathway changes can be further corrected for multiple testing using the False Discovery Rate (FDR; Benjamini and Hochberg, 1995) or the Bonferroni correction.
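The two corrections named above differ in how they rescale each raw pathway p value. The following is a minimal Python sketch of both, written from their standard definitions rather than taken from the Plant MetGenMAP source (which this section does not show); the four p values are made up for illustration.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (FDR q values), returned in input order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four pathways.
raw = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the chance of any false positive and is therefore stricter; the Benjamini-Hochberg procedure controls the expected proportion of false positives among the pathways called significant.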
PromAnalyzer retrieves the promoter sequences of co-expressed genes in an altered pathway and identifies regulatory motifs that are enriched in those promoter sequences.
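The motif-finding algorithm behind PromAnalyzer is not described in this section, so the sketch below swaps in a deliberately simple word-counting approach to show the basic idea of "enriched in co-expressed promoters": count how many target promoters contain each k-mer, compare that with a smoothed background fraction, and rank. The function names, the pseudocount and the `min_target` filter are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def kmers_in(seq: str, k: int) -> Set[str]:
    """Distinct k-mers in one promoter sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def promoters_containing(seqs: Iterable[str], k: int) -> Counter:
    """For every k-mer, count how many promoters contain it at least once."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(kmers_in(s, k))
    return counts


def rank_kmers(target: List[str], background: List[str], k: int,
               min_target: int = 2, pseudocount: float = 1.0) -> List[Tuple[str, float]]:
    """Rank k-mers by (fraction of target promoters containing them) /
    (smoothed fraction of background promoters containing them)."""
    t_counts = promoters_containing(target, k)
    b_counts = promoters_containing(background, k)
    scored = []
    for kmer, t in t_counts.items():
        if t < min_target:  # ignore k-mers seen in too few co-expressed genes
            continue
        t_frac = t / len(target)
        b_frac = (b_counts[kmer] + pseudocount) / (len(background) + pseudocount)
        scored.append((kmer, t_frac / b_frac))
    # Highest enrichment first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored
```

A production tool would also scan the reverse strand, allow degenerate positions, and attach a significance test to each candidate rather than relying on a ratio alone.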
FunctAnnotator analyzes a list of up- and/or down-regulated genes under specific conditions and reports the GO terms that are significantly enriched among them. FunctAnnotator can also classify a list of genes into functional categories using a set of plant-specific GO slims, which are lists of high-level GO terms that give a broad overview of the ontology content (http://www.geneontology.org/GO.slims.shtml). A sample output of the functional classification generated by the system is shown in Figure 2C.
PathVisualizer provides an intuitive visualization of each individual pathway, with genes and metabolites colored to reflect the magnitude of the changes in their levels (e.g., ratios) and the significance of those changes (e.g., p values) (Figure 2D). Each member of a gene family is displayed separately on the pathway; this is more accurate than summarizing the family by an average or extreme value, because genes from the same family often behave differently. In addition, the system provides tables listing the absolute values of the quantitative changes, together with their significance, for all genes and metabolites in the pathway.
Plant MetGenMAP currently supports different expression profiling platforms for several major plant organisms, including: 1) the ATH1 genome array and TAIR AGI locus numbers for Arabidopsis; 2) the Affymetrix genome array and genome locus numbers for rice; and 3) the TOM1 array, TOM2 array, Affymetrix genome array and SGN unigenes for tomato.
More platforms from additional plant species can be easily added to the system. Plant MetGenMAP operates on a Linux system under an Apache web server and the majority of the functions in the system were implemented with Perl/CGI or R scripts.
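PathVisualizer's coloring combines two numbers per gene or metabolite: the ratio and its p value. The exact palette and thresholds the system uses are not given here, so the sketch below assumes a common convention (red for increases, green for decreases, grey when not significant, intensity scaled by |log2 ratio|); `change_color`, `alpha` and `max_log2` are hypothetical names and defaults.

```python
import math


def change_color(ratio: float, p_value: float,
                 alpha: float = 0.05, max_log2: float = 3.0) -> str:
    """Map a fold-change ratio and its p value to a hex colour.

    Increases are shaded red, decreases green, and changes that are not
    significant at `alpha` are grey. Colour intensity grows with |log2 ratio|
    and saturates at `max_log2`.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if p_value >= alpha:
        return "#c0c0c0"
    log2r = math.log2(ratio)
    strength = min(abs(log2r), max_log2) / max_log2  # 0 (no change) .. 1 (saturated)
    level = round(255 * (1 - strength))              # the other two RGB channels fade to 0
    if log2r > 0:
        return f"#ff{level:02x}{level:02x}"
    if log2r < 0:
        return f"#{level:02x}ff{level:02x}"
    return "#ffffff"


# Each gene-family member gets its own colour instead of a family average
# (hypothetical gene names and values).
family = {"geneA": (4.0, 0.001), "geneB": (0.5, 0.02), "geneC": (1.3, 0.4)}
for gene, (ratio, p) in family.items():
    print(gene, change_color(ratio, p))
```

Working on the log2 scale makes a two-fold increase and a two-fold decrease equally intense, which a raw ratio would not.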

Mapping Gene Expression Profiles to Metabolic Pathways in Arabidopsis
We first demonstrated the functions of the Plant MetGenMAP system using a publicly available expression dataset generated from Arabidopsis seedlings exposed to fourteen different light conditions: seven treatments lasted four hours (long-term light treatments) and seven lasted 45 minutes (short-term treatments) (Supplemental Table S1). The normalized and processed microarray dataset was uploaded into the Plant MetGenMAP system. Genes with fold changes greater than 1.5 and a corrected p value (FDR) less than 0.05 were regarded as differentially expressed. The system efficiently maps these genes onto each biochemical pathway and identifies significantly altered pathways under each condition.
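The selection rule above (fold change greater than 1.5 and FDR below 0.05) can be written as a small filter. This sketch assumes the fold change is applied symmetrically, so a gene is down-regulated when its ratio falls below 1/1.5; the text does not spell this out, and `GeneResult` and `split_differential` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneResult:
    gene_id: str
    fold_change: float  # treatment / control expression ratio (must be > 0)
    fdr: float          # multiple-testing corrected p value


def split_differential(results: List[GeneResult], fc_cutoff: float = 1.5,
                       fdr_cutoff: float = 0.05) -> Tuple[List[str], List[str]]:
    """Return (up, down) gene IDs passing both the fold-change and FDR cutoffs.

    A gene counts as down-regulated when its ratio is below 1 / fc_cutoff,
    i.e. it changed by more than fc_cutoff-fold in the downward direction.
    """
    up, down = [], []
    for r in results:
        if r.fdr >= fdr_cutoff:
            continue
        if r.fold_change > fc_cutoff:
            up.append(r.gene_id)
        elif r.fold_change < 1 / fc_cutoff:
            down.append(r.gene_id)
    return up, down
```

Keeping the up- and down-regulated lists separate matters downstream, since enrichment can be run on either list or on both combined.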
The significantly altered pathways under each of the fourteen light treatment conditions were retrieved with a p value cutoff of less than 0.05. The list of all the significantly altered pathways is provided in Supplemental Table S2. As expected, a number of known light-regulated metabolic pathways were among the list of the most highly altered pathways, including photosynthesis, Calvin cycle, and carotenoid biosynthesis pathways.
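This section does not state which statistic produces the per-pathway p value. A common choice for "are differentially expressed genes over-represented in this pathway?" is the one-sided hypergeometric test, sketched below under that assumption using only the standard library; `pathway_pvalues` and its arguments are hypothetical.

```python
from math import comb
from typing import Dict, Set


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes in total, K in the pathway, n drawn as DEGs)."""
    hi = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return hits / comb(N, n)


def pathway_pvalues(pathways: Dict[str, Set[str]], degs: Set[str],
                    universe: Set[str]) -> Dict[str, float]:
    """One-sided over-representation p value for each pathway.

    `universe` is every gene measured on the platform; pathway members and
    DEGs outside it are ignored so all counts refer to the same gene set.
    """
    N = len(universe)
    n = len(degs & universe)
    result = {}
    for name, genes in pathways.items():
        members = genes & universe
        k = len(members & degs)
        result[name] = hypergeom_upper_tail(k, N, len(members), n)
    return result
```

Restricting every count to the measured genes keeps the test honest: a pathway gene that is not on the array could never have been called differentially expressed. The resulting p values would then be corrected for multiple testing before applying the 0.05 cutoff.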
Significant differences in pathway changes between long-term and short-term light treatments were also observed. Table I lists pathways that were significantly altered only under either long-term or short-term light treatments, in at least four of the seven conditions. Several notable light-regulated pathways, including photosynthesis, photosynthesis (light reaction), and chlorophyllide a biosynthesis, were significantly altered under all seven long-term light treatment conditions, whereas none of them was significantly altered under short-term conditions. In addition, the Calvin cycle and salicylic acid biosynthesis pathways were also affected specifically by the long-term light treatments (Table I). Our analysis may provide a molecular-level explanation for the finding that ultraviolet light stimulates the accumulation of salicylic acid in plant leaves (Yalpani et al., 1993). As shown in Table I, pathways regulated specifically by short-term light treatments include anthocyanin biosynthesis, flavonoid biosynthesis, spermidine biosynthesis, spermine biosynthesis, stachyose biosynthesis, and polyamine biosynthesis. It is well known that light plays a critical role in the regulation of anthocyanin and flavonoid biosynthesis (Koes et al., 2005; Grotewold, 2006). In addition, through an integrated analysis of gene expression and metabolite profiling, Jumtee et al. (2008) found that the photoreceptor phytochrome A regulates the biosynthesis of polyamines, including spermidine and spermine. However, to our knowledge, no previous reports have described a possible involvement of light in stachyose biosynthesis.
Several reports have described the biochemical pathways affected by different qualities and quantities of light based on whole-genome expression profiling (Ma et al., 2001; Jiao et al., 2005). Our analysis identified a large number of previously described as well as novel light-regulated biochemical pathways from a comprehensive gene expression dataset (Supplemental Table S2). The functions implemented in Plant MetGenMAP retrieve highly affected pathways efficiently and comprehensively and allow detailed gene expression changes within a pathway to be visualized intuitively, which can provide insights into important biological processes that remain to be fully characterized.
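The Table I criterion (significant in at least four of the seven conditions of one treatment group, and only in that group) is a simple set computation over the per-condition results. The sketch assumes "only" means the pathway is not significant in any condition of the other group, which is one reading of the text; `group_specific` and the condition labels are hypothetical.

```python
from typing import Dict, List, Set


def group_specific(sig: Dict[str, Set[str]], group_a: List[str],
                   group_b: List[str], min_hits: int = 4) -> Set[str]:
    """Pathways significant in at least `min_hits` conditions of group_a
    and in none of the conditions of group_b.

    `sig` maps each condition label to the set of pathways called
    significant under that condition.
    """
    seen_in_b: Set[str] = set().union(*(sig[c] for c in group_b))
    hits: Dict[str, int] = {}
    for c in group_a:
        for pw in sig[c]:
            hits[pw] = hits.get(pw, 0) + 1
    return {pw for pw, n in hits.items() if n >= min_hits and pw not in seen_in_b}
```

Calling it twice with the groups swapped yields the long-term-specific and short-term-specific lists.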

Promoter Analysis of Co-expressed Genes in a Specific Pathway
Plants have evolved the ability to synthesize a large variety of metabolites to protect themselves against various attacks and to attract flower pollinators. The regulation of metabolite biosynthesis is coordinated by specific transcription factors (Grotewold, 2005).
A notable example is the regulation of the anthocyanin biosynthetic pathway by MYB transcription factors (Gantet and Memelink, 2002). Bioinformatics analysis has indicated that genes within the same pathway, especially those clustered together in the pathway structure, are usually highly co-expressed (Wei et al., 2006), which suggests that they might be regulated by common transcription factors. Experimental evidence also supports the idea that a subset of genes in the same pathway can be regulated by common transcription factors (Borevitz et al., 2000; Jin et al., 2000; van der Fits and Memelink, 2000). Based on these reports, we implemented a function in Plant MetGenMAP to identify over-represented motifs in the promoter sequences of a set of co-expressed genes in a specific metabolic pathway. Such motifs may play an important role in the transcriptional regulation of enzymes controlling specific pathways.
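One way to put a number on "highly co-expressed" is the average Pearson correlation over all pairs of genes in a pathway. The sketch below illustrates that idea only; it is not the method of Wei et al. (2006) or of Plant MetGenMAP, and the profile values in any real use would be expression levels across conditions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # raises ZeroDivisionError for a flat profile


def mean_pairwise_correlation(profiles: Dict[str, List[float]], genes: List[str]) -> float:
    """Average Pearson r over all pairs of pathway genes that have a profile."""
    present = [g for g in genes if g in profiles]
    pairs = list(combinations(present, 2))
    if not pairs:
        raise ValueError("need at least two genes with expression profiles")
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
```

On its own the average says little; it becomes informative when compared with the same statistic for randomly drawn gene sets of equal size.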
To demonstrate the efficiency of this function, we extracted over-represented motifs in six significantly altered pathways using the microarray datasets generated under long-term UV-A and short-term blue light treatments. The six pathways are photosynthesis, photosynthesis (light reaction), chlorophyllide a biosynthesis, leucine degradation, valine biosynthesis, and spermine biosynthesis. Among the over-represented motifs identified in these pathways, four are known to be related to light-responsive genes (Table II). The motif CACGTGGC was enriched in promoters of up-regulated genes in the photosynthesis (light reaction) pathway. This motif is similar to G-boxes, elements with the core CACGTG that are found repeatedly in light-regulated genes (Terzaghi and Cashmore, 1995). Another similar element, GmCACGTG, was also identified in the photosynthesis pathway. In addition, a motif (GCCACGTG) found in the photosynthesis (light reaction) pathway contains the computationally identified phyA-induced motif SORLIP 1 (GCCAC), which is over-represented in light-induced genes (Jiao et al., 2005). The element AGATAAGA was enriched in promoters of co-expressed genes in the leucine degradation pathway. This element contains an I-box motif (GATAAG), which has been reported to be conserved in the upstream sequences of light-regulated genes (Giuliano et al., 1988; Martinez-Hernandez et al., 2002) and can confer responsiveness to diverse light spectra, including far-red, red, and blue light (Chattopadhyay et al., 1998; Escobar et al., 2004). Similar elements, GATmAGnm, AGATAAGn and AGATAAGA, were also identified in the leucine degradation pathway under the far-red, red and blue light treatments, respectively (data not shown). In addition, our analysis identified a number of novel motifs that might have roles in regulating specific biochemical pathways under different light treatments.
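Motifs such as GmCACGTG and GATmAGnm use IUPAC ambiguity codes (m = A or C, n = any base). The sketch below shows one way to scan a promoter for such a motif on both strands with Python's `re` module; it is an illustration of degenerate-motif matching, not the code used by Plant MetGenMAP, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes, as used in motifs such as GmCACGTG or GATmAGnm.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def to_regex(motif: str) -> str:
    """Translate a (possibly degenerate) motif into a regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement, with ambiguity codes complemented as well."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(motif: str, promoter: str) -> List[Tuple[int, str]]:
    """Start positions (0-based, forward-strand coordinates) and strand of every
    match, including overlapping ones, on both strands."""
    promoter = promoter.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead lets re.finditer report overlapping matches.
        for m in re.finditer(f"(?={to_regex(pattern)})", promoter):
            hits.append((m.start(), strand))
    return sorted(hits)
```

For example, `find_motif("GATAAG", "AGATAAGA")` reports a single forward-strand hit at position 1, reflecting the I-box inside AGATAAGA, while the palindromic G-box core CACGTG is reported on both strands at the same position.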
Sequence logos of all the identified known and novel motifs are provided in Supplemental Figure S1. In summary, the motif identification tool provided in the Plant MetGenMAP system can help identify candidate transcriptional regulators that coordinately modulate the expression of a subset of genes in a specific metabolic pathway, and can thereby support efforts to engineer the production of important plant metabolites.
Large-scale expression profiling experiments such as microarray and RNA-seq often produce lists of differentially expressed genes that can contain hundreds or thousands of genes of interest. Translating such lists into biologically meaningful information is usually required to understand the underlying biological phenomena. This can be achieved in part by GO term enrichment analysis, which extracts a set of over-represented GO terms, representing highly affected biological processes, from a list of differentially regulated genes. Using the GO term enrichment analysis tool implemented in Plant MetGenMAP, we identified a total of 218 significantly enriched GO terms in the biological process category from the lists of up-regulated genes in the fourteen light treatment conditions, using a cutoff of Bonferroni-corrected p value <= 0.05 (Supplemental Figure S2). Figure 3A shows the most enriched GO terms (p <= 1.0e-10), two of which (response to radiation; response to light stimulus) are highly enriched in all 14 conditions. In addition, a number of GO terms that were highly enriched only in long-term or only in short-term light treatment conditions were identified (Figure 3A and Supplemental Figure S2), clearly showing differences in the plant response to different durations of light treatment.
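GO enrichment counts are usually computed after propagating each gene's annotations up the ontology (the "true-path rule"), which is also why a parent term such as response to radiation tends to be enriched together with its child response to light stimulus. Whether Plant MetGenMAP propagates annotations exactly this way is not stated here; the sketch assumes it, and the two-edge hierarchy is a simplified fragment for illustration.

```python
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """The term itself plus every term reachable through its parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations: Dict[str, Set[str]],
              parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Apply the true-path rule: a gene annotated to a term is also
    counted under all of that term's ancestors."""
    return {gene: set().union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}


# Simplified fragment of the biological process hierarchy (illustrative only).
parents = {
    "response to light stimulus": {"response to radiation"},
    "response to radiation": {"response to abiotic stimulus"},
}
print(propagate({"gene1": {"response to light stimulus"}}, parents))
```

After propagation, each term's gene count can be fed into an over-representation test, with a Bonferroni correction across all terms tested.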

Functional Analysis of Gene Expression
One of the major tasks in gene expression data analysis is to classify a list of genes of interest into different functional categories. In the Plant MetGenMAP system, we implemented a tool that uses a set of plant-specific GO slims to classify genes. Using this tool, we functionally categorized the up-regulated genes in each of the fourteen light treatments. As shown in Figure 3B, most of the light-induced genes fall into categories such as response to stress, response to abiotic stimulus, transcription, and metabolic process, indicating that light treatments trigger systems that help plants cope with light stress and suggesting that they cause significant changes in the levels of associated primary and secondary metabolites.
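Classifying genes with a GO slim amounts to walking up from each gene's annotated terms and keeping the slim terms that are reached. The sketch below assumes that approach, lets a gene fall into several categories, and counts unmapped genes as "other"; `classify`, `slim_categories` and the "other" bucket are hypothetical choices, not the system's documented behavior.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def slim_categories(terms: Iterable[str], parents: Dict[str, Set[str]],
                    slim: Set[str]) -> Set[str]:
    """GO slim terms reached from a gene's annotations by walking up parent links."""
    reached: Set[str] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t in reached:
            continue
        reached.add(t)
        stack.extend(parents.get(t, set()))
    return reached & slim


def classify(genes: Iterable[str], annotations: Dict[str, Set[str]],
             parents: Dict[str, Set[str]], slim: Set[str]) -> Counter:
    """Count genes per slim category; a gene may fall into several categories,
    and genes that map to none are counted as 'other'."""
    counts: Counter = Counter()
    for g in genes:
        cats = slim_categories(annotations.get(g, set()), parents, slim)
        counts.update(cats or {"other"})
    return counts
```

Because one gene can be counted in several categories, the category totals can exceed the number of genes classified.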
Tomato has long served as the primary physiological, biochemical, genetic and molecular model for fleshy fruit development and ripening (Giovannoni, 2001, 2004). A collection of tomato (Solanum lycopersicum) lines harboring single, defined, and overlapping introgressions from the wild species S. pennellii was developed by Eshed and Zamir (1994) and has proved to be a valuable resource for tomato QTL mapping and breeding.
Substantial line-to-line variation in various phenotypes (traits) and in transcript and metabolite levels has been observed among these introgression lines (Baxter et al., 2005; Schauer et al., 2006; Tieman et al., 2006). One of the many interesting lines in this collection, IL3-2, has a yellow fruit phenotype (Figure 4A). This line carries the S. pennellii introgression segment containing the r gene (fruit-specific phytoene synthase) (Fray and Grierson, 1993) and has very low levels of lycopene (Rousseaux et al., 2005).
To systematically understand transcript and metabolite changes in this line, we performed comparative transcriptome and targeted metabolite analyses on IL3-2 and its cultivated S. lycopersicum parent line, M82.
Tomato TOM1 cDNA arrays were used to investigate genome-wide transcript changes between ripening fruits of IL3-2 and the M82 S. lycopersicum control. The contents of a set of metabolites in the carotenoid biosynthesis pathway, including trans-lycopene, phytoene, phytofluene, cis-lycopene, gamma-carotene, beta-carotene, alpha-carotene, delta-carotene, and lutein, were also measured in ripening fruits of IL3-2 and M82. The normalized transcript and metabolite profiles were analyzed simultaneously using the Plant MetGenMAP system. Genes and metabolites with fold changes greater than two between IL3-2 and M82 were treated as significantly modified. We identified a number of significantly altered pathways (FDR < 0.05) in IL3-2 (Table III). As expected, the carotenoid biosynthesis pathway was highly altered. Figure 4B, which was generated automatically by the system from the expression and metabolite levels, visualizes the gene and metabolite changes in the carotenoid biosynthesis pathway in IL3-2. It clearly shows that the decreased expression of phytoene synthase, an upstream enzyme in the pathway, was associated with significant decreases in all of the downstream metabolites investigated, which accounts for the yellow fruit phenotype of IL3-2. Several other pathways were also significantly altered in IL3-2, including the sucrose degradation pathway, lipoxygenase pathway, jasmonic acid biosynthesis, glutamate degradation, and arginine degradation. β-fructofuranosidase (acid invertase), a major enzyme in the sucrose degradation pathway, cleaves sucrose and related sugars into hexoses such as glucose and fructose and controls sugar composition. The expression of β-fructofuranosidase is highly induced during tomato fruit ripening (Klann et al., 1993).
In IL3-2, we found that the expression of β-fructofuranosidase was significantly repressed. We therefore investigated changes in glucose and fructose levels in IL3-2.
Consistent with the changes in β-fructofuranosidase, the contents of glucose and fructose were also significantly decreased (Figure 5). Lipoxygenases (LOX) have been suggested to play roles in wound responses, pathogen attack, potato tuber enlargement (Feussner and Wasternack, 2002), and fruit flavor generation (Griffiths et al., 1999). LOX is also an enzyme in the jasmonic acid biosynthesis pathway (León and Sánchez-Serrano, 1999).
Several LOX genes have been identified in tomato; among them, TomloxA expression declines during fruit ripening, while TomloxB and TomloxC expression is enhanced (Griffiths et al., 1999). TomloxA expression has been reported to be negatively correlated with carotenoid accumulation, and TomloxA may provide an essential defense component in unripe fruit (Griffiths et al., 1999). Consistent with this report, we observed that the expression of TomloxA was significantly increased in IL3-2, which accumulates much lower levels of carotenoids. Glutamate decarboxylase, the key enzyme in the glutamate degradation pathway, has been reported to be down-regulated during fruit ripening (Gallego et al., 1995), while the relative content of glutamate, the substrate of glutamate decarboxylase, increases markedly in red fruits (Boggio et al., 2000; Pratta et al., 2004).
In IL3-2, we found that the expression of glutamate decarboxylase was significantly increased.
However, a deeper understanding of the changes in these pathways will require comprehensive metabolite profiles of the pathways involved.
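The carotenoid observation for IL3-2 (reduced phytoene synthase expression accompanied by lower levels of every downstream metabolite measured) amounts to a reachability check on the pathway graph. The sketch below assumes a simplified, hypothetical carotenoid topology with isomerization steps collapsed; it is not the pathway definition used by the system. The ratios would be IL3-2/M82 metabolite ratios, and the two-fold threshold mirrors the cutoff used in this analysis.

```python
from collections import deque
from typing import Dict, List, Set

# Simplified, illustrative topology downstream of phytoene synthase (PSY).
# Isomerisation steps and several intermediates are collapsed; this is an
# assumption for the example, not the pathway definition used by the system.
CAROTENOID_EDGES: Dict[str, List[str]] = {
    "phytoene": ["phytofluene"],
    "phytofluene": ["zeta-carotene"],
    "zeta-carotene": ["neurosporene"],
    "neurosporene": ["cis-lycopene"],
    "cis-lycopene": ["trans-lycopene"],
    "trans-lycopene": ["gamma-carotene", "delta-carotene"],
    "gamma-carotene": ["beta-carotene"],
    "delta-carotene": ["alpha-carotene"],
    "alpha-carotene": ["lutein"],
}


def downstream(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """`start` plus every metabolite reachable from it (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def all_downstream_decreased(start: str, edges: Dict[str, List[str]],
                             ratios: Dict[str, float], fold: float = 2.0) -> bool:
    """True if every *measured* metabolite from `start` onward dropped more
    than `fold`-fold (mutant / control ratio below 1 / fold)."""
    measured = [m for m in downstream(start, edges) if m in ratios]
    return bool(measured) and all(ratios[m] < 1 / fold for m in measured)
```

Only measured metabolites are checked, so unmeasured intermediates such as zeta-carotene in this sketch neither support nor contradict the pattern.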
In summary, using the