Advanced Data-Mining Strategies for the Analysis of Direct-Infusion Ion Trap Mass Spectrometry Data from the Association of Perennial Ryegrass with Its Endophytic Fungus, Neotyphodium lolii

Direct-infusion mass spectrometry (MS) was applied to study the metabolic effects of the symbiosis between the endophytic fungus Neotyphodium lolii and its host perennial ryegrass (Lolium perenne) in three different tissues (immature leaf, blade, and sheath). Unbiased direct-infusion MS using a linear ion trap mass spectrometer allowed metabolic effects to be determined free of any preconceptions and in a high-throughput fashion. Not only were the full MS 1 mass spectra (range 150–1,000 mass-to-charge ratio) obtained, but MS 2 and MS 3 product ion spectra were also collected on the most intense MS 1 ions as described previously (Koulman et al., 2007b). We developed a novel computational methodology to take advantage of the MS 2 product ion spectra collected. Several heterogeneous MS 1 bins (different MS 2 spectra from the same nominal MS 1) were identified with this method. Exploratory data analysis approaches were also developed to investigate how the metabolome differs in perennial ryegrass infected with N. lolii in comparison to uninfected perennial ryegrass. As well as some known fungal metabolites like peramine and mannitol, several novel metabolites involved in the symbiosis, including putative cyclic oligopeptides, were identified. Correlation network analysis revealed a group of structurally related oligosaccharides, which differed significantly in concentration in perennial ryegrass sheaths due to endophyte infection. This study demonstrates the potential of the combination of unbiased metabolite profiling using ion trap MS and advanced data-mining strategies for discovering unexpected perturbations of the metabolome, and generating new scientific questions for more detailed investigations in the future.

With the advent of metabolomics, methods for the simultaneous analysis of a large number of small molecules (metabolites) have been developed and improved, providing more details about the metabolism of complex biological systems (Sumner et al., 2003; Dettmer et al., 2007). Metabolite fingerprinting methods provide relatively unbiased and high-throughput information on complex biological systems and have been used for yeast (Saccharomyces cerevisiae) strain classification (Allen et al., 2003) and annotation of gene functions (Raamsdonk et al., 2001). Comprehensive and unbiased metabolite analysis is also an indispensable tool for systems biology together with transcriptomics and proteomics (see commentary by Sauer et al., 2007). Direct-infusion (without prior chromatographic separation) electrospray ionization (ESI) mass spectrometry (MS) was introduced as a tool for the identification of novel fungal metabolites in culture (Smedsgaard and Frisvad, 1996) and is now widely applied in metabolomics (for a recent review, see Dettmer et al., 2007). High resolution MS instrumentation such as time-of-flight MS (Dunn et al., 2005) or Fourier transform ion cyclotron resonance MS (FT-ICR-MS; Aharoni et al., 2002) is often preferred for the specificity provided by resolution of isobaric ions and highly accurate estimates of mass-to-charge (m/z) ratios. We recently applied direct-infusion ESI MS using an ion trap MS (DIMS n) to determine metabolic differences between endophyte-infected and endophyte-free perennial ryegrass (Lolium perenne) seed samples (Koulman et al., 2007b). This study established that DIMS n is a powerful tool for determining metabolic profiles, even though ion trap MS is a low resolution MS technology.
Its advantage lies in the capacity of the ion trap to rapidly collect fragmentation data on large numbers of ions (more than 200) selected from the MS 1 spectrum using automated data-dependent scanning, thereby facilitating structural classification and identification of metabolites of interest. Direct-infusion ESI MS/MS with an ion trap has been used to discover unknown drug metabolites (e.g. Tozuka et al., 2003), but its potential for metabolome investigations warrants further exploration and development. Here, we have employed DIMS n technology to detect and investigate a wide range of metabolites involved in an association between perennial ryegrass and its endophytic fungus, Neotyphodium lolii.
Symbiotic associations between fungal endophytes and grasses are widespread and have been estimated to occur in 20% to 30% of all grass species (Leuchtmann, 1992), and are therefore of wide interest to studies on plant-fungal interactions. The two most widely studied associations are the perennial ryegrass-N. lolii symbiosis in Australasia and tall fescue (Lolium arundinaceum)-Neotyphodium coenophialum in North America (Christensen et al., 1993). These two associations are of particular interest to agricultural pastoral systems because the fungal endophytes have been implicated in toxicoses of grazing livestock, including ryegrass staggers and fescue foot, but have also been found to confer a range of agronomic benefits on their grass hosts, mainly through toxicity and feeding deterrent activities toward invertebrate herbivores. These antiherbivore activities have been associated with specific alkaloids produced by the fungi within the plant (Bush et al., 1997; Lane et al., 2000; Malinowski and Belesky, 2000; Schardl et al., 2004). The major known alkaloids produced by N. lolii in perennial ryegrass are peramine, lolitrem B, and ergovaline. Peramine, a pyrrolopyrazine alkaloid, protects the grass host from herbivorous insects such as the Argentine stem weevil (Listronotus bonariensis; Rowan and Gaynor, 1986; Fletcher and Easton, 1997) and has been shown to be exuded in the guttation fluid of endophyte-infected perennial ryegrass (Koulman et al., 2007a). The indolediterpene lolitrem B acts as a neurotoxin and causes ryegrass staggers in grazing livestock (Gallagher et al., 1984). The peptide alkaloid ergovaline has been associated with heat stress and poor liveweight gains in livestock grazing endophyte-infected perennial ryegrass (Easton et al., 1986; Fletcher and Easton, 1997) and with fescue foot, a severe mammalian disorder that can lead to considerable productivity losses in livestock raised on endophyte-infected tall fescue (Lyons et al., 1986; Strickland et al., 1996).
Considerable research efforts have been focused on the biosynthesis, accumulation, and ecological consequences of these fungal alkaloids (for review, see Schardl, 2001; Clay and Schardl, 2002; Schardl et al., 2004). Much less is known about the impacts of fungal endophytes on general host plant performance and metabolism, and recent publications indicate that the effects of fungal endophytes on plant metabolism might be of importance to the understanding of ecosystem-wide impacts of the grass-endophyte symbiosis (Hunt et al., 2005; Cheplick, 2007; Krauss et al., 2007; Rasmussen et al., 2007, 2008). In this article, we present exploratory data analysis approaches to investigate how metabolites analyzed by DIMS n differ between endophyte-infected and uninfected ryegrass plants in three tissue samples corresponding to three developmental stages (immature leaf, blade, and sheath) of the symbiosis of perennial ryegrass and N. lolii. We have also taken advantage of the MS/MS spectral information to aid metabolite identification and determine the homogeneity of the spectra, and hence uniformity of the metabolite species across the samples. Available software for automating the processing of liquid chromatography (LC)-MS data, including commercial software such as MassFrontier (http://www.highchem.com/) and freeware and open source software such as MetAlign (http://www.metalign.nl), XCMS (Smith et al., 2006), and MZmine (Katajamaa and Orešič, 2005), is not applicable to infusion profiles of the type generated in our experiments, as these programs are designed to search for peaks in both the time and mass domain. Thus, computational tools for harnessing the raw data from DIMS n experiments have been developed in this study and will be made available upon request.

RESULTS
The approach to DIMS n data analysis in this article is as follows: (1) statistical analysis of MS 1 data, with the range of 150 to 1,000 m/z, to select nominal m/z bins that differ significantly in intensity between the groups of four samples from infected and uninfected plants within each of three tissue types (immature leaf, blade, and sheath); (2) analysis of the MS 2 spectra deriving from the same parent MS 1 m/z bin across all the samples using purpose-built computational tools to determine their similarity and aid identification of components in the fragmentation data; and (3) correlation network analysis of all MS 1 bins to identify metabolite relationships not revealed by feature selection based on the statistical ranking. The MS 1 spectrum of each sample was obtained from the raw data as described in "Materials and Methods" ("Data Analysis"). To handle the low resolution infusion data we were able to adopt simpler processing procedures (compare with Enot et al., 2006) than those required for handling high resolution LC-MS data (Hansen and Smedsgaard, 2004), where alignment in the time and mass domains can be challenging. The MS 1 spectra were collated in nominal m/z bins in keeping with the 1 m/z resolution of the machine (called MS 1 bins hereafter). After background subtraction, the median value of the ion abundance in each bin was determined. Each MS 1 bin is thus likely to include ion signals from more than one metabolite. On the other hand, each metabolite is likely to contribute to signals for more than one MS 1 bin due to the occurrence of isotopologue ions, hydrogen transfers in the source, and the formation of salt adduct ions. Both aspects must be taken into account for the interpretation of a nominal MS 1 bin.
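The binning step described above can be illustrated with a minimal Python sketch. It is not the software developed in this study: the function name, the input layout (a list of scans, each a list of (m/z, intensity) pairs), the nearest-integer binning rule, and the summing of signals within a bin are all assumptions made for illustration, and background subtraction is left out.

```python
from collections import defaultdict
from math import floor
from statistics import median


def bin_ms1_scans(scans, mz_min=150, mz_max=1000):
    """Collapse repeated MS 1 scans of one infusion into nominal m/z bins.

    scans: list of scans, each a list of (mz, intensity) pairs (hypothetical layout).
    Returns {nominal_mz: median over scans of the summed intensity in that bin}.
    """
    per_scan_totals = defaultdict(list)
    for scan in scans:
        totals = defaultdict(float)
        for mz, intensity in scan:
            # Assumption: bin to the nearest integer (halves round up).
            nominal = floor(mz + 0.5)
            if mz_min <= nominal <= mz_max:
                totals[nominal] += intensity
        # Record a value for every bin so that absent signals count as zero.
        for nominal in range(mz_min, mz_max + 1):
            per_scan_totals[nominal].append(totals.get(nominal, 0.0))
    return {mz: median(values) for mz, values in per_scan_totals.items()}


scans = [
    [(205.1, 10.0), (205.3, 5.0), (248.2, 7.0)],
    [(205.0, 11.0), (248.1, 9.0)],
    [(204.9, 30.0), (248.4, 8.0)],
]
bins = bin_ms1_scans(scans)
print(bins[205], bins[248])
```

Taking the median over scans, rather than the mean, keeps a single spiking scan (the third scan at m/z 205 in the toy data) from dominating the bin value.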
Our experiment was designed to analyze how the metabolome of the symbiosis changes upon endophyte infection in different tissues, i.e. immature leaf, blade, and sheath (see Supplemental Table S1 for sample description). Our first data exploration based on principal component analysis revealed that the main variations (73.45% of the total variation) were explained by metabolic differences between tissues. The infected (E1) and uninfected (E2) samples were not resolved in the PC1-PC2 score plot (Supplemental Fig. S1). We therefore used an empirical Bayes moderated t test (Smyth, 2004) to investigate which metabolites were differentially expressed between E1 and E2 in immature leaf, blade, and sheath tissue, respectively. This approach was devised to identify differentially expressed genes across specified conditions in designed microarray experiments, and it provides more stable inference when the sample size is small (Smyth, 2004). Significantly different MS 1 bins were selected based on an adjusted P value < 0.05 using Benjamini and Hochberg's false discovery rate to control false positives. Ten MS 1 bins were identified based on this criterion, of which seven (m/z 230, 248, 205, 231, 335, 189, and 554) were significantly different between E1 and E2 in sheath, four in immature leaf (m/z 209, 297, 223, and 230), but none in blade tissue. The distributions of the ion abundance of MS 1 bins of m/z 205, 248, 335, and 554 in different treatment groups are shown in Figure 1. See Supplemental Figure S2 for the distribution of the other significant MS 1 bins (m/z 230, 231, 189, 209, 223, and 297) and Table I for a summary of all the MS 1 bins identified as being significantly different between infected and uninfected perennial ryegrass tissues.
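The false discovery rate adjustment used for bin selection can be sketched as follows. This covers only the Benjamini and Hochberg step-up adjustment; the moderated t test itself (Smyth, 2004) is not reproduced. The function name and the P values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four MS 1 bins.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
selected = [i for i, p in enumerate(adj) if p < 0.05]
print(adj, selected)
```

A bin is retained when its adjusted P value falls below the chosen threshold (0.05 in the text).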
Based on the nominal mass of an MS 1 bin only, it is impossible to determine the chemical identity of a selected ion in a single experiment. For the MS 1 bins of m/z 230, 231, and 189, no MS 2 data were generated and no putative identification can be suggested. We have shown previously that DIMS n with our instrumentation allows MS 2 product ion spectra for selected MS 1 features to be checked manually to identify unique MS 1 ion species by their fragmentation pathway (Koulman et al., 2007b). Here, we developed a computational method to utilize the fragmentation data from all the samples where MS 2 data are available and circumvent manual comparison of individual spectra from different samples.

MS 2 Data Analysis and the Identification of Selected MS 1 Ions
As noted above, MS 1 bins of different m/z could arise from the same metabolite due to isotopic ions, hydrogen transfers, or salt adducts. To identify a metabolite we need to first identify the relevant monoisotopic (12C, 1H, 14N, 16O) MS 1 bin. This can be partially addressed using correlation analysis (Enot et al., 2006) as bins deriving from the same metabolite should be highly correlated, and in the mass range we are investigating, the monoisotopic MS 1 bin will be of highest intensity. Thus, to assist in the identification of ions in the significant MS 1 bins, we have considered concurrently correlated MS 1 bins of adjacent mass (possible isotopologues) as well as MS 1 bins corresponding to salt (e.g. K+ and Na+) adducts. We have also compared their MS 2 product ion spectra as an aid to identifying isotopologues, adducts, and binning anomalies.
Analysis of the MS 2 data can also assist in addressing the alternative scenario, namely, that any one MS 1 bin (a nominal unit m/z) is also likely to contain signals for a number of different metabolites not resolved by the instrument (and also binning artefacts). If the MS 1 bin of a nominal m/z contains ions from different metabolites in different samples, or from the same metabolites but in different concentrations, then the MS 2 product ion spectra are likely to differ between samples. Thus we have developed a method for automated comparison of MS 2 data across a sample set to investigate ion homogeneity within selected MS 1 bins.
Figure 1. The distribution of MS 1 ion abundance in different treatment groups. MS 1 bins m/z 205, 248, 335, and 554 differ significantly (false discovery rate adjusted P value < 0.05) between E1 and E2 in sheath tissue. The barplot is based on the median and median absolute deviation values of the replicates (n = 4, median ± median absolute deviation). Labels of E1 and E2 refer to presence and absence of endophyte, and I, B, and S refer to plant tissue immature leaf, blade, and sheath, respectively. See all sample names in Supplemental Table S1.
The method is based on the modified Manhattan distance as a measurement of the similarity of MS 2 spectra from ions in a given parent MS 1 bin. The procedure is described in detail in "Materials and Methods." In brief, MS 2 data for a parent MS 1 bin of interest were pairwise compared between samples. Only the 20 most intense fragment ions in each MS 2 spectrum were used for the comparison. The sum of the absolute values of the difference in normalized intensities was used as a distance score. From a set of pairwise differences, a distance matrix was obtained for statistical classification (hierarchical clustering or multidimensional scaling [MDS], etc.) to assess the homogeneity of each MS 1 bin. For most MS 1 bins, no distinct groups were seen, indicating that the MS 2 spectra were consistent, and thus these MS 1 bins are homogeneous and likely to be dominated by a single ion species. However, there were a number of MS 1 bins for which different MS 2 spectral patterns were observed for different samples. The differences between MS 2 spectra in samples can be visualized with clustering analysis methods such as hierarchical clustering or MDS. MDS preserves the distance metric, and the cluster structures are revealed in different directions in the manner of principal component analysis (Lattin et al., 2003). The m/z 209 MS 1 bin, which showed significant differences in ion abundance between E1/E2 in the immature tissues (see the plot in Supplemental Fig. S2), also exhibited different MS 2 patterns in E1 and E2 samples (Fig. 2, premz 209, where premz means precursor m/z in MS 2 plots). Inspection of individual MS 2 spectra from the MS 1 bin m/z 209 (Fig. 3) suggests there are at least two metabolites within the m/z 209 MS 1 bin present in different concentrations in the E1 and E2 samples (e.g. fragment ions m/z 191 and 192; in comparison with m/z 149, 177, and 181).
The MS 2 spectra for another significant MS 1 bin m/z 554 and the MS 1 bin of adjacent mass m/z 555 (see the following discussion) also showed different patterns between E1 and E2 samples (Fig. 2, premz 555). See Supplemental Figure S4 for the detailed MS 2 spectra of m/z 555.
Many distance metrics have been proposed to measure the similarity of spectra, such as the dot product (Stein and Scott, 1994) and correlation-based distance metrics such as 1 − correlation coefficient or 1 − cosine (Tabb et al., 2003). We observed that fragment masses in MS 2 spectra often showed mismatches between samples, and major fragment masses needed to be treated individually. The top 20 ions in each MS 2 spectrum were retained and compared in this study, but this number could be extended or decreased to any arbitrary number. However, McLafferty et al. (1999) reported that 18 peaks were 97% as effective as 150 peaks for searching and comparing electron impact mass spectra. In standard practice of chemical analysis, only a few major MS 2 product ions are used for confirmation of the identity of the parent MS 1 species (Allwood et al., 2006; Koulman et al., 2007b).
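The pairwise comparison of MS 2 spectra can be sketched in Python as below. The details of the modified Manhattan distance are given in "Materials and Methods" and are not reproduced here, so this sketch falls back on a plain Manhattan distance under stated assumptions: each spectrum is a dictionary of nominal fragment m/z to intensity, the top-n intensities are normalized to sum to 1, and a fragment missing from one spectrum counts as zero. All function names are hypothetical.

```python
def top_fragments(spectrum, n=20):
    """Keep the n most intense fragments and normalize them to sum to 1."""
    top = sorted(spectrum.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(intensity for _, intensity in top)
    if total <= 0:
        return {}
    return {mz: intensity / total for mz, intensity in top}


def manhattan_distance(spec_a, spec_b, n=20):
    """Sum of absolute differences in normalized intensity over all fragments."""
    a, b = top_fragments(spec_a, n), top_fragments(spec_b, n)
    return sum(abs(a.get(mz, 0.0) - b.get(mz, 0.0)) for mz in set(a) | set(b))


def distance_matrix(spectra, n=20):
    """Symmetric matrix of pairwise distances for a list of spectra."""
    size = len(spectra)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = manhattan_distance(spectra[i], spectra[j], n)
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy MS 2 spectra for one parent bin in three samples (made-up intensities).
sample_1 = {191: 80.0, 192: 20.0}
sample_2 = {191: 8.0, 192: 2.0}                # same pattern, lower signal
sample_3 = {149: 50.0, 177: 30.0, 181: 20.0}   # different fragments
dm = distance_matrix([sample_1, sample_2, sample_3])
print(dm)
```

With this normalization the score runs from 0 (identical patterns) to 2 (no shared fragments), and the resulting matrix can be passed to any hierarchical clustering or MDS routine.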
The predominant component of the m/z 248 MS 1 bin in the endophyte-infected samples is peramine, which is a known fungal alkaloid. Peramine has a guanidinium moiety that undergoes distinctive neutral losses of 17 and 42. Its MS 2 product ion spectrum is in accordance with previous findings (Koulman et al., 2007a); see also Supplemental Figure S7 in which m/z 248 is used as an example for MS 2 data handling.
The MS 2 fragmentation pattern of the m/z 205 MS 1 bin showed a clear water loss. Weakly basic metabolites such as alcohols are prone to form sodium adducts rather than [MH]+ ions (Jemal et al., 1997). We propose the actual nominal mass of the metabolite to be 182 (i.e. m/z 205 is [MNa]+). A logical candidate is mannitol or a related sugar alcohol. To test this hypothesis, we infused an aqueous solution of mannitol into the mass spectrometer and observed a sodiated ion of m/z 205 with a highly similar MS 2 spectrum to that of the m/z 205 MS 1 bin in the endophyte-infected samples (Supplemental Fig. S3). However, as there are several naturally occurring hexitols (10 possible stereoisomers) with similar MS 2 product ion spectra (based on data from triple quadrupole MS; see www.hmdb.ca and www.massbank.jp), the MS 2 spectrum may be derived from a combination of several unresolved sugar alcohols in the m/z 205 MS 1 bin.
No direct fragmentation data are available for the m/z 335 MS 1 bin that was detected at higher abundance in endophyte-free tissue (Fig. 1, m/z 335). However, consideration of MS 1 bins of adjacent masses suggested the ions of m/z 335 are mainly isotopologues of the ryegrass alkaloid perloline. The m/z 335 MS 1 bin is highly correlated with the MS 1 bins of m/z 333 (r = 0.86, where r is Pearson's correlation coefficient hereafter) and 334 (r = 0.93). The major ion detected at m/z 333 in positive ESI MS is assigned as the anhydrocation perloline (C20H17N2O3+), with predicted isotopologue ions at m/z 334 (13C1 and 15N1) and m/z 335 (13C1 15N1, 13C2, 15N2, and 18O1). The expected relative intensities of m/z 333, 334, and 335 are 1:0.24:0.03. The MS 1 bins of m/z 333, 334, and 335 were observed to have a mean relative intensity of 1:0.37:0.07. The measured ratios show higher abundances for the higher mass ions than the theoretical prediction, which suggests ions from additional compounds have also been detected in the m/z 335 MS 1 bin. Thus, although modeling isotopic distribution has been attempted for high resolution MS data (Böcker et al., 2006), for low resolution infusion MS data this may be confounded by interfering components isobaric with isotopologue peaks.
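The expected isotopologue ratios quoted above can be approximated with a short calculation. The sketch below is an illustration rather than the method used in the study: it assumes standard natural isotope abundances (values in the table are assumptions taken from commonly cited IUPAC figures), works at nominal mass only, and builds the pattern by repeated convolution of per-atom isotope distributions. The function names are hypothetical.

```python
# Assumed natural abundances, indexed by nominal mass offset (+0, +1, +2).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, max_len):
    """Multiply two isotope distributions, keeping only the first max_len peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def nominal_isotope_pattern(formula, n_peaks=3):
    """Relative intensities of M, M+1, M+2, ... scaled so that M = 1."""
    pattern = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element], n_peaks)
    return [p / pattern[0] for p in pattern]


# Perloline anhydrocation, C20H17N2O3+ (m/z 333, 334, 335).
perloline = {"C": 20, "H": 17, "N": 2, "O": 3}
expected = nominal_isotope_pattern(perloline)
observed = [1.0, 0.37, 0.07]  # mean relative intensities reported in the text
excess = [o - e for o, e in zip(observed, expected)]
print([round(x, 3) for x in expected], [round(x, 3) for x in excess])
```

The computed ratios should land close to the 1:0.24:0.03 given in the text, with small differences depending on the abundance table used. The positive excess at M+1 and M+2 is the signal attributed above to additional compounds sharing those bins.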
The m/z 554 MS 1 bin is correlated with the m/z 555 MS 1 bin with r = 0.72. MS 2 product ion spectra of m/z 554 were available only from four E1 samples and were very similar to the MS 2 product ion spectra of m/z 555 in E1 samples (m/z 555 represents a different compound in E2 samples as noted above; see Fig. 2; Supplemental Fig. S4). For both MS 1 bins, the MS 2 spectra in E1 samples show a series of product ions with a higher m/z than the parent ion (Supplemental Fig. S4), suggesting this is a doubly charged ion. Manual examination of these ions showed that the parent ion occurred at m/z 554.5 and its monoisotopologue at m/z 555.1. Due to the limited precision of the mass spectrometer, the measured m/z varied between 554.22 and 554.55 and therefore caused binning problems with bins of unit m/z. The doubly charged state was confirmed by the occurrence of high mass product ions in the MS 2 and MS 3 data. There was a significant product ion of m/z 904.3 and several other high mass ions in the MS 2 spectrum from m/z 554.5 and in the MS 3 spectrum from its major MS 2 product of m/z 516.7 (Supplemental Fig. S4, a and b). The exact structure of the compound remains to be elucidated, but the complex pattern of product ions suggests that it is a cyclic oligomer of amino acids.
Differentially expressed ions in E1 versus E2 immature leaves were observed in MS 1 bins of m/z 209, 297, 223, and 230 (230 is also different in E1/E2 sheaths). MS 2 spectra for the m/z 297 MS 1 bin were observed in only two samples. The dominant product ions were of m/z 104, 105, 237, and 238. This occurrence of pairs of fragment ions in the MS 2 spectrum suggested that the m/z 297 MS 1 bin comprised isotopologues of ions in the m/z 296 bin, and the m/z 297 MS 1 bin was highly correlated with the m/z 296 MS 1 bin with r = 0.82. The m/z 296 MS 1 bin abundance is higher in endophyte-free samples and its MS 2 spectrum showed a dominant m/z 104 ion as well as a clear m/z 59 loss. The MS 3 spectrum of the m/z 104 ion showed a major fragment of m/z 60. These fragmentations are all highly indicative of a choline group (http://metlin.scripps.edu). This appears to be a novel compound, as no plant or fungal compounds with a corresponding mass and a choline group have been reported. One possibility is a 5-hydroxyferulyl analog of sinapine. We also remain uncertain about the chemical identity of the major metabolites detected in the m/z 209 and 223 MS 1 bins, although MS 2 product ion spectra (data not shown) were obtained in this study.

Correlation Network Analysis of MS 1 Bins
Feature selection based on statistical ranking or machine learning algorithms is an important step in high-throughput data analysis. However, no golden rules exist for choosing a cutoff of P values or ranking scores, and alternative approaches other than statistical ranking may be of use. Correlations among variables are sometimes considered as redundancy and often one feature (variable) is selected from a correlative group for further analysis (Zou and Hastie, 2005). However, correlations between MS 1 bins deriving from different metabolites may provide insights into the functional dependency of these metabolites.
With the aid of network analysis tools (Carey et al., 2005), correlation (or relevance) networks (Butte et al., 2000) among all the measured MS 1 bins in all the samples were investigated. The correlation networks constructed in this way should reveal when a group of MS 1 bins exhibit the same pattern of relative concentration (ion abundance) across samples. Based on the criterion of r > 0.9, many isolated correlation units were found to be composed of MS 1 bins of adjacent mass, which are likely to be due to natural isotopologues. A large highly connected subgraph (clique) in the network was identified with 12 nodes (Fig. 4). In this subgraph component, some correlations between these MS 1 bins are due to naturally occurring isotopes. The m/z 543, 544, 545, and 546 MS 1 bins belong to one group, the m/z 705 and 706 MS 1 bins to another, and m/z 867, 868, 869, and 870 MS 1 bins to a third group. That these subgroupings (highlighted by different gray scale in Fig. 4) are due to isotopologues was also supported by MS 2 spectral information, as within each group the MS 2 spectra for the higher mass MS 1 bins were similar to that from the lowest mass (monoisotopic) MS 1 bin but with additional isotopologous fragments. The core correlation is between the three most intense MS 1 bins of the lowest mass in each group, corresponding to the monoisotopic ions (m/z: 543, 705, and 867; see Table I).
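The construction of a correlation network from MS 1 bins can be sketched in a few lines of Python. This is a simplified stand-in for the network tools cited above: it thresholds Pearson correlations at r > 0.9 and then groups bins into connected components, which is a weaker grouping than the highly connected subgraph (clique) discussed in the text. The function names and the toy abundances are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant bin: correlation undefined, treated as no edge
    return sxy / sqrt(sxx * syy)


def correlation_network(bins, threshold=0.9):
    """Adjacency sets linking bins whose abundances correlate above threshold."""
    edges = {mz: set() for mz in bins}
    for a, b in combinations(bins, 2):
        if pearson(bins[a], bins[b]) > threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def connected_components(edges):
    """Group nodes that are linked directly or through other nodes."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy abundances across four samples (made-up numbers, not study data).
bins = {
    543: [1.0, 2.0, 3.0, 4.0],
    544: [0.3, 0.6, 0.9, 1.2],
    705: [2.0, 4.0, 6.0, 8.1],
    300: [4.0, 1.0, 3.0, 2.0],
}
components = connected_components(correlation_network(bins))
print(components)
```

In a real data set, components made up only of adjacent nominal masses would be candidate isotopologue groups, whereas links between bins separated by a fixed mass difference point to adducts or structurally related metabolites.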
These three MS 1 bins differ by a mass of 162, and this corresponds to the mass of a hexose (180) − H2O. The MS 2 and MS 3 data showed consecutive losses of 162 from the high mass ions (Supplemental Fig. S5a). Ions of these m/z ratios have been reported by Enot et al. (2006) for potassiated fructan oligomers detected in DIMS of genetically modified potatoes (Solanum tuberosum; [504K]+, [666K]+, and [828K]+; potassiated trihexose, tetrahexose, and pentahexose, respectively). Perennial ryegrass is known to produce a range of fructan oligomers from degree of polymerization (DP) 3 to >8 (Pavis et al., 2001), and the identification of these MS 1 bins as deriving from fructan oligomers was supported by comparison of MS 2 and MS 3 spectra with those obtained by infusion and MS/MS analysis of aqueous KCl solutions of 1-kestose, 1,1-tetrakestose, and 1,1,1-pentakestose (Supplemental Fig. S5b).
We considered the possibility that the two other MS 1 bins m/z 779 and 941 in the correlation network might derive from glycerol adducts of the oligosaccharides, as they differed in mass from the m/z 705 and 867 species by 74 units. Glycosylglycerides have recently been reported by Yamamoto et al. (2006) as synthetic products of kojibiose phosphorylase from Thermoanaerobacter brockii. However, the MS 2 spectrum of the m/z 779 and 941 species in the plant extracts showed in each case only a neutral loss of 74, while synthetic glucosyl-, maltosyl-, and maltotriosylglycerol showed neutral losses of 92 and 162 on MS/MS analysis. Further, m/z 779 and 941 ions were also observed in the DIMS n profiles of 1,1-tetrakestose and 1,1,1-pentakestose, respectively, in aqueous KCl (above), and were accompanied by ions of lower intensity of m/z 781 and 943, respectively, in the DIMS n profiles of both the standard solutions and plant extracts. These higher mass ions fragmented with a sole neutral loss of 76. In each case the product ion underwent subsequent MS fragmentation as for the potassiated oligosaccharide. As neutral losses of 74 and 76 correspond to the two chlorine isotopologues of KCl, we conclude the m/z 779 and 941 species are KCl cluster adducts of the potassiated oligosaccharides of m/z 705 and 867. Similar adduct ions were present for the potassiated trihexose, although they were not detected as part of the correlation network.
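The nominal-mass arithmetic behind these assignments can be checked with a short worked example. The sketch uses only nominal masses (hexose 180, water 18, potassium 39, chlorine 35 and 37); the function names are hypothetical and serve only to make the bookkeeping explicit.

```python
HEXOSE, WATER, POTASSIUM = 180, 18, 39
CHLORINE_35, CHLORINE_37 = 35, 37


def potassiated_oligohexose_mz(dp):
    """Nominal m/z of a potassiated oligohexose with dp hexose units."""
    # Each glycosidic bond releases one water, i.e. +162 per added unit.
    neutral_mass = dp * HEXOSE - (dp - 1) * WATER
    return neutral_mass + POTASSIUM


def kcl_cluster_mz(dp):
    """Nominal m/z of the K35Cl and K37Cl cluster adducts of that ion."""
    base = potassiated_oligohexose_mz(dp)
    return (base + POTASSIUM + CHLORINE_35, base + POTASSIUM + CHLORINE_37)


series = [potassiated_oligohexose_mz(dp) for dp in (3, 4, 5)]
clusters = [kcl_cluster_mz(dp) for dp in (4, 5)]
print(series)    # potassiated trihexose, tetrahexose, pentahexose
print(clusters)  # KCl cluster adducts of the tetrahexose and pentahexose
```

The same bookkeeping shows why a glycerol adduct was a plausible first hypothesis: glycerol (92) minus water (18) also gives a shift of 74, so only the fragmentation data and the accompanying +2 isotopologue separate the two explanations.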
The levels of the oligohexoses were low in the blades, high in immature tissue, and intermediate in sheath tissue (Fig. 5). When considering the endophyte effect, concentrations of the oligosaccharides were significantly higher in the endophyte-infected sheaths by a simple t test, with P values of 0.0082, 0.0048, and 0.0075 for the m/z 543, 705, and 867 MS 1 bins, respectively. However, there were no significant (P values >0.05) endophyte infection effects in immature leaf and blade.

Metabolite Identification and Measurement
The identification of metabolites from raw signals detected by mass spectrometers is a challenging task in metabolomics that still demands considerable effort (Schauer and Fernie, 2006). Two types of MS technologies are in general use for high-throughput metabolomics analyses. One type is high-accuracy MS using ICR-MS or time-of-flight MS (Dettmer et al., 2007). Highly accurate m/z measurements narrow down the search space by providing a short list of chemical formulae (elemental compositions), although without additional isotope abundance data this list may remain extensive (Kind and Fiehn, 2006). The other type exploited in metabolomics is tandem MS (MS/MS or MS n) using an ion trap that can provide fragmentation data on the initial MS 1 ions, although in practice this capacity is often foregone (Enot et al., 2006). As demonstrated here, these fragmentation data are useful for the elucidation and classification of chemical structures. Technologies combining these two features, e.g. ion trap with FT-ICR-MS, are also available to provide high mass resolution MS 1 profiles and information on fragmentation patterns. However, applying this combination to obtain high resolution data on both MS 1 and MS 2 ions for large numbers of samples is impractical due to the slow scan speeds required by the FT-ICR-MS. All of these technologies generate complex data sets and there are many steps in translating raw signals into chemical entities. With access to the raw data now readily available in standard formats such as mzXML (Pedrioli et al., 2004), novel or improved algorithms can be employed in many steps of data analysis, such as data preprocessing (including baseline detection and removal, peak detection, peak or retention time alignment, etc.) and inference of the identity of components (e.g. Listgarten and Emili, 2005; for a recent review of LC-MS data processing for metabolomics and currently available software, see Katajamaa and Orešič, 2007).
We recently applied direct-infusion ESI MS using DIMS n to determine metabolic differences between endophyte-infected and endophyte-free ryegrass seed samples (Koulman et al., 2007b). In this study, we have extended this approach by developing tools to analyze the raw MS data from DIMS n and to compare MS 2 spectra derived from the same parent MS 1 bin from different samples. This has enabled us to handle the collected data appropriately. Rather than seeking alignment in the time domain as addressed by programs such as XCMS and MZmine, our software bins the data into unit m/z bins and finds the median intensity in each m/z bin over the course of the infusion. The challenges of mass alignment and binning of data collected at high mass resolution (e.g. Hansen and Smedsgaard, 2004) are also much less for data collected at unit m/z resolution. Developing methods for automating the handling of MS 2 data has enabled us to determine if the assignment of MS 1 signal variations to treatment effects on a specific metabolite is justified, as it has allowed us to distinguish whether these MS 1 signals were derived from the same metabolite(s) in all the samples or whether there were different metabolites in different samples detected within the same MS 1 bin. For chemical identification, an MS 2 product ion spectrum derived from an isotopically and chemically homogeneous MS 1 bin is desirable. An MS 2 spectrum derived from an MS 1 bin comprising a mixture of isotopologue ions will show an anomalous pattern, as noted above for the m/z 297 MS 1 bin. Thus prior to investigating the MS 2 data, it is useful to screen candidate MS 1 bins for the existence of highly correlated MS 1 bins of adjacent mass and higher intensity that are likely candidates for the monoisotopic species. Correlation analysis can also reveal the presence of ESI adducts such as the [K2Cl]+ cluster adducts reported here.
The development of software to automate the discovery of isotopologues and adducts is an area of current active research (Tautenhahn et al., 2007). While facilities for comparison of MS n data (ion trees) and construction of libraries are available in commercial software (e.g. MassFrontier), further refinement and extension of the tool developed here to automate the construction of MS n libraries to facilitate metabolite identification would be a valuable tool for the analysis of DIMS n data.
Metabolites have diverse physical and chemical properties and a wide range of concentrations in a biological system, so no single analytical technique can detect all the metabolites of biological relevance. Using DIMS n , we have identified or classified a number of metabolites known to be present in endophyte-infected grass samples, such as peramine and a sugar alcohol putatively annotated as mannitol (although we cannot exclude other sugar alcohols). Other well-known metabolites, e.g. the alkaloids lolitrem B and ergovaline, although of high biological significance, were not detected in this experiment. Indolediterpenes like lolitrem B are lipophilic and not sufficiently extracted by the extraction solvents used in this study. Ergopeptides (like ergovaline) are usually present at very low concentrations in the symbiotum and their MS signals are within the noise range of DIMS n data. Therefore, the analysis of these classes of metabolites requires the deployment of dedicated approaches (Lehner et al., 2005; Spiering et al., 2005). In this study we used only positive ionization and only one type of extraction procedure. We believe that alternative extraction procedures and ionization methods would deliver additional information on other classes of metabolites.
Quantitation in infusion ESI MS is subject to signal suppression or enhancement in the source (see Dettmer et al., 2007), and ionization from an infused mixture is likely to be selective and dependent on the ability of a molecule to capture a charge in the source. Thus the detection of species such as the peramine and perloline cations and the doubly charged putative peptide ion is not unexpected, and other species less prone to forming cations are likely to have been underrepresented in the profile. Developments in nanoscale ESI may provide improved performance in this regard. The experiments described here were carried out with a standard capillary and flow rates of 5 μL/min. Nanospray technology utilizing very low flow rates (<20 nL/min) can reduce and perhaps eliminate analyte suppression (Schmidt et al., 2003) and may provide a less biased profile. An implementation in a microchip-mounted microfluidic device has shown promise for drug metabolite discovery (Trunzer et al., 2007) and may provide advantages for metabolomics.
The other factor confounding quantitation in ion trap DIMS n is the presence of multiple components within each 1 m/z bin. Thus, while the endophyte effect on the m/z 248 MS 1 bin can be attributed to peramine, the differences in intensity between tissue types (Fig. 1) appear to derive from the unknown plant components also detected in this bin. Concentrations of peramine in these samples estimated by HPLC with photodiode array detection (L. Johnson, unpublished data) were similar in the three tissue types, as reported by Spiering et al. (2005).
Although we have clearly demonstrated the usefulness of fragmentation data for the classification and structural elucidation of metabolites, the method is of limited use for characterizing metabolites that do not fragment (e.g. m/z 230, 189). For such metabolites the MS 1 data provide a lead to further investigation using other methods. Indeed, for all putative novel metabolites, additional data such as accurate mass MS, and targeted isolation and structure elucidation by, for example, NMR spectroscopy, are necessary for their complete chemical characterization.

Biological Implications of Identified Metabolites
Several metabolites identified in this study have interesting implications for the metabolic regulation of the perennial ryegrass-endophyte symbiotum. As discussed in detail above, the hexitol (m/z 205) present in endophyte-infected plants only is probably mannitol. Mannitol appears to be a very common polyol in fungi (Lewis and Smith, 1967) and has been reported to accumulate in endophyte-infected tall fescue (Richardson et al., 1992) and perennial ryegrass plants (Harwood, 1954; Johnson et al., 2006; Rasmussen et al., 2008). Although mannitol has been implicated as an osmoprotectant in the resurrection plant Myrothamnus flabellifolia (Bianchi et al., 1993) and in transgenic Arabidopsis (Arabidopsis thaliana) expressing a celery (Apium graveolens) Man-6-P reductase (Sickler et al., 2007) as well as in Nicotiana tabacum expressing a mannitol-1-P dehydrogenase (Karakas et al., 1997), a study in tall fescue indicates that mannitol levels in endophyte-infected plants are not increased under drought stress (Richardson et al., 1992). A recent review (Solomon et al., 2007) also questions this role for mannitol as well as other claimed functions like fungal carbohydrate storage (Voegele et al., 2005) or NADPH regeneration (Hult and Gatenbeck, 1978; Hult et al., 1980). Solomon et al. (2007) conclude that the role and requirements for mannitol seem to differ depending on the species of fungus. In Aspergillus niger, mannitol is involved in conidial oxidative and high temperature stress protection (Ruijter et al., 2003), and in the wheat (Triticum aestivum) pathogen Stagonospora nodorum it is required for asexual sporulation (Solomon et al., 2006). Clearly, more studies are needed to understand the function of mannitol in endophyte-infected grasses.
Peramine (m/z 248) has been shown to be the likely agent conferring improved insect resistance on endophyte-infected plants, affecting Argentine stem weevil and a range of other insects (Rowan and Gaynor, 1986; Latch, 1993; Rowan, 1993; Rowan and Latch, 1994). Recently, it was shown that peramine is produced by an endophyte-specific two-module nonribosomal peptide synthetase (perA) and that an Epichloë festucae mutant deleted for perA lacks detectable levels of peramine (Tanaka et al., 2005). It was also shown that plant material containing this mutant endophyte was as susceptible to Argentine stem weevil feeding as endophyte-free plants, demonstrating unambiguously that peramine confers resistance to this insect. Peramine is also the most abundant alkaloid produced by this endophyte in infected plants; its concentration is usually an order of magnitude higher than that of the other endophyte-specific alkaloids (Spiering et al., 2005) and it is detectable in plant fluids from cut leaf and in guttation fluid (Koulman et al., 2007a).
Perloline (m/z 333), a diazaphenanthrene alkaloid, is produced by the grass plant and has been isolated from both ryegrass and tall fescue plants (Grimmett and Waters, 1943; Jeffreys, 1964; Bush and Jeffreys, 1975). Our results suggest that perloline concentrations are reduced in endophyte-infected mature blades and sheaths; the mechanism for this effect remains to be elucidated. Not much is known about the biosynthesis or function of perloline, although it has been implicated in effects on fall armyworm (Spodoptera frugiperda JE Smith) performance (Salminen et al., 2005) and in stimulating prolactin secretion in rats (Strickland et al., 1992). Earlier reports on the function of perloline, such as causing ryegrass staggers, are questionable because endophytes, unknown at the time, may have been present in the studied material (Fairbourn, 1962; Aasen et al., 1969).
The putative oligopeptide (m/z 554.5) identified in this study accumulates exclusively in endophyte-infected tissues and is therefore most probably an endophyte-produced metabolite. Recently, epichlicin, a novel cyclic peptide that inhibits spore germination of Cladosporium phlei, a pathogenic fungus of timothy grass (Phleum pratense), was isolated from timothy grass infected with Epichloë typhina (Seto et al., 2007), a relative of the N. lolii investigated in this study. Oligopeptides have been isolated from fungal endophytes previously, e.g. leucinostatin A, a phytotoxic, anticancer, and antifungal peptide (Arai et al., 1973), from Acremonium sp., a fungus infecting Taxus baccata. This mycotoxin causes necrotic symptoms in nonhost plants, presumably because these plants, unlike T. baccata, are not able to transform it into the less toxic leucinostatin A-β-di-O-glucoside (Strobel and Hess, 1997; Tan and Zou, 2001). The antimicrobial cyclic echinocandin peptides have been isolated from endophytic Cryptosporiopsis sp. and Pezicula sp. in Pinus sylvestris and Fagus sylvatica (Noble et al., 1991) and the antifungal cyclopeptide cryptocandin from the endophytic Cryptosporiopsis cf. quercina of redwood (Strobel et al., 1999). We are currently isolating the oligopeptide identified in this study from endophyte-infected perennial ryegrass to elucidate its structure and to test its potential antimicrobial activity.

Correlation Network Analysis and Its Biological Implications
Correlation analysis of metabolites has been used previously to explore the functional dependency of metabolites, and it was shown that this type of analysis allowed, for example, the reconstruction of the metabolic pathway leading to the biosynthesis of glucosinolates in Arabidopsis (Keurentjes et al., 2006). It has been proposed that the construction of correlation networks based on metabolic fingerprinting might help to uncover underlying enzymatic reaction networks (Steuer et al., 2003), although other origins of correlations between metabolites within a physiological state have also been discussed (Camacho et al., 2005). However, in this study comparing different plant tissues and endophyte infection status, physiological differences are likely to be the dominant factor.
In this study we have identified three MS 1 bins representing monoisotopic ions of different metabolites that correlate significantly in our sample set. Mass fragmentation indicates that these metabolites are potassiated tri-, tetra-, and pentahexosides, and their identification as fructans of DP 3, 4, and 5 was supported by comparison with DIMS n of solutions of standards in aqueous KCl. Many cool-season C3 grasses accumulate fructans (Suc-derived Fru polymers) as storage carbohydrates in their vegetative tissue, especially in mature sheaths (Pollock and Cairns, 1991). The genera Lolium and Festuca accumulate appreciable amounts of low DP fructans belonging to the inulin series, inulin neoseries, and levan neoseries, which differ in the position of Glc (terminal or internal) and the linkage type of Fru residues (β2,1 or β2,6; Pollock, 1982; Pavis et al., 2001). It was also shown that the proportion of low DP fructans (DP < 6) was more prominent in bases of elongating leaves than in leaf sheaths, and that mature leaf blades accumulate predominantly 6G-kestotriose and 1- and 6G-kestotetraose. The limited MS n data obtained here do not provide direct information on linkage and branching type of these Fru-containing polymers, but differences in the relative intensity of product ions in the MS 2 and MS 3 spectra between extracts and standards (Supplemental Fig. S5, a and b) may reflect the mixed isomer composition of the plant fructans. Recent developments in the elucidation of carbohydrate structures by MS n without chemical derivatization (e.g. Fang and Bendiak, 2007) suggest more extensive MS n analysis may provide additional structural information.
As was shown previously (Rasmussen et al., 2007, 2008), endophyte infection resulted in higher levels of some of the sugars, which might indicate that the increased sink strength in the infected tissue results in a higher turnover of high DP fructans with a concomitant increase in low DP oligosaccharides. Although the exact mechanism for this pattern of accumulation remains to be elucidated, it has been documented (for review, see Chalmers et al., 2005) that the base of the youngest leaves and the sheath of the more mature leaves represent the organs where fructosyltransferase activities, fructan accumulation, and remobilization in perennial ryegrass are most active, and where several fructan metabolism genes are expressed.

CONCLUSION
Our results extend current knowledge of the metabolites involved in the symbiosis of the fungus N. lolii and its host perennial ryegrass. Using unbiased metabolite profiling (DIMS n ) and advanced data-mining strategies, we have been able to uncover a number of unexpected perturbations of the metabolome upon endophyte infection. Based on the MS 1 spectra we have found several metabolites that were significantly different between endophyte-infected and endophyte-free samples. New methods for automated processing of MS 2 data have proved useful in detecting whether ions in a unit m/z MS 1 bin represent a single major component across a sample set, or a heterogeneous mixture. With the aid of the MS 2 product ion spectra we could readily identify some of the top-ranked MS 1 ions, such as the known metabolites peramine and mannitol. The analysis has also revealed some new metabolites that are present in endophyte-infected plants, such as putative cyclic oligopeptides, and plant compounds present at reduced levels in infected plants, such as a novel putative choline derivative. The identification of unknown MS 1 bins as being statistically significantly different in uninfected compared to infected tissues also provides justification for their further characterization using more targeted approaches.
Linear correlation network analysis revealed the effect of the endophyte on a range of oligosaccharides, giving us new clues on how the endophyte utilizes plant carbohydrates. The methodology has proved to be a powerful tool for discovering leads to novel chemistry associated with the symbiosis, and these demand further chemical and biological investigation.

Experimental Design and Sampling
Clonal perennial ryegrass plants (Lolium perenne 'Nui'), either infected with the fungal endophyte Neotyphodium lolii (strain Lp19) or endophyte free, were used in this study. Endophyte-free perennial ryegrass was obtained as described by Tanaka et al. (2005). A 2 × 3 factorial design was applied with endophyte-infected (E+) and endophyte-free (E−) plants, and three tissue types, namely, immature leaf, blade, and sheath. Four individual tillers, either all E+ or all E− perennial ryegrass, were planted in pots, to give four replicate pots of E+ and E− material. The three tissue types were dissected from each of the plants in each pot and pooled for analysis (see also Supplemental Table S1 for sample description). Thus, 24 ryegrass samples in total were examined in this study. These plants were grown in a controlled environment chamber with 14-h daylength (653 μmol m⁻² s⁻¹ of light intensity), a temperature of 20°C day/10°C night, and supplied with a modified Hoagland nutrient solution. The tissue samples were harvested and immediately frozen in liquid nitrogen, and stored at −80°C for subsequent analysis.

Direct-Infusion MS
Plant tissue samples were ground using pestle and mortar in liquid nitrogen and stored at −80°C. Fifty milligrams of ground samples were extracted with 1.5 mL of MeOH. The extract was partitioned between water and dichloromethane. The aqueous phase was lyophilized and redissolved in 1.5 mL of MeOH. The infusion solvent (MeOH) was pumped at 20 μL min⁻¹ flow to a T junction just in front of the ESI source where 5 μL min⁻¹ MeOH with 2% formic acid was added. A 100-μL aliquot of each sample was injected using an autosampler. After 10 min a MeOH blank was injected and run at 200 μL min⁻¹ flow rate for 3 min.
A linear ion trap mass spectrometer (Thermo LTQ) coupled to a Thermo Finnigan Surveyor HPLC system was used. Thermo Finnigan Xcalibur software (version 1.4) was used for data acquisition. The mass spectrometer was set for ESI in positive mode. Samples were infused through a polyimide-coated glass capillary (0.1 mm i.d., 0.19 mm o.d.) at a flow rate of 5 μL/min. The spray voltage was 5.0 kV and the capillary temperature 275°C. The ion optics were tuned using paxilline. The flow rates of sheath gas, auxiliary gas, and sweep gas were set (in arbitrary units/min) to 20, 5, and 12, respectively. For the first 0.9 min after injection only MS 1 spectra were recorded; for the period from 0.9 to 10 min the mass spectrometer was set up in data-dependent mode to collect one MS 1 spectrum, followed by the isolation (2 m/z) and fragmentation (35% CE; relative collision energy) of the most intense ion from the MS 1 spectrum, followed by the isolation (2 m/z) and fragmentation (35% CE) of the most intense ion from the MS 2 spectrum. A new MS 1 spectrum was then recorded, followed by the repetitive isolation (2 m/z) and fragmentation (35% CE) of the most intense ions from that MS 1 spectrum and the most intense MS 2 product ion. When an MS 1 ion with a specific mass had been isolated and fragmented for the second time, it was placed on an exclusion list for the duration of the run. In total, up to approximately 200 MS 2 spectra were recorded in an average run.

Data Analysis
The raw data (Xcalibur raw file, in centroid mode) were converted into mzXML data format (Pedrioli et al., 2004) using software ReAdw available from http://sashimi.sourceforge.net/software_glossolalia.html. mzXML is a standard data format for tandem mass spectrometric data and relatively easy to manipulate.

MS 1 Data Analysis
All the MS 1 scans were retrieved from mzXML and the original data could also be checked manually using the Thermo proprietary software Xcalibur. Given the 1 m/z resolution of the machine, we binned the data to unit nominal m/z. Thus in each sample, for a given ion, for example, m/z 248, all the measured m/z values (e.g. 247.98, 248.12, 248.35) were rounded to an integer value of 248 (equivalent to binning with 1 m/z width), and the median (rather than the average) of the abundance of corresponding ions within a sample was taken as a robust statistical estimate to reduce potential rounding (or binning) artefacts. The resulting bins are described here as MS 1 bins. For each MS 1 bin, the first 300 scans (see Supplemental Fig. S6) were removed because they were background signals from solvent (noise). For each MS 1 bin, the median value of these first 300 scans was then used as a baseline value, and subtracted from the ion abundance values for subsequent scans. Any negative values (below the background noise) and weak signals (less than the 10% quantile) were removed. The median value of the ion abundance of all the remaining scans in each sample was used as the representative MS 1 bin abundance for that sample. The ion abundance values for MS 1 bins over the range m/z 150 to 1,000 were determined for each sample to generate an MS 1 data matrix of 24 × 851 for statistical analysis.
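The binning and baseline-correction steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study (which was written in R): the function names are hypothetical, centroids falling in the same unit bin within one scan are summarized by their median, and the 10% quantile is computed on the values that remain after baseline subtraction and removal of negatives. The text does not specify these details.

```python
import numpy as np

def bin_scan(mz, intensity):
    """Round centroid m/z values of one scan to nominal m/z and take the
    median intensity per unit bin (within-scan median is an assumption)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    nominal = np.floor(mz + 0.5).astype(int)  # round half up to nearest integer
    return {int(b): float(np.median(intensity[nominal == b]))
            for b in np.unique(nominal)}

def representative_abundance(trace, n_background=300, weak_quantile=0.10):
    """Summarize one MS 1 bin in one sample.

    trace: intensities of the bin across all scans, in acquisition order.
    The first `n_background` scans define the baseline and are discarded.
    """
    trace = np.asarray(trace, dtype=float)
    baseline = np.median(trace[:n_background])
    signal = trace[n_background:] - baseline
    signal = signal[signal >= 0]              # drop values below background
    if signal.size == 0:
        return 0.0
    cutoff = np.quantile(signal, weak_quantile)
    signal = signal[signal >= cutoff]         # drop the weakest signals
    return float(np.median(signal))
```

Applying `representative_abundance` to each of the 851 bins in each of the 24 samples would yield a matrix of the shape described above.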
The abundance of each MS 1 bin was normalized against the median of the observations in all the samples, using log2[x(i)/median(x)], where vector x comprises the abundance measurements of each MS 1 bin, and x(i) is the abundance in each individual sample with i from 1 to 24.
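As a brief illustration, the normalization can be written in a few lines of Python. The function name is hypothetical, and the treatment of zero abundances, which would give infinite values, is not described in the text and is left to the caller in this sketch.

```python
import numpy as np

def normalize_bins(matrix):
    """Return log2(x(i) / median(x)) for every MS 1 bin.

    matrix: samples x bins array of positive abundances; the median is
    taken per bin (column) over all samples.
    """
    matrix = np.asarray(matrix, dtype=float)
    bin_median = np.median(matrix, axis=0, keepdims=True)
    return np.log2(matrix / bin_median)
```

After this transformation, a value of 1 means that a sample has twice the median abundance of that bin, and a value of -1 means half the median.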
Empirical Bayes moderated t statistics (Smyth, 2004) were applied to identify MS 1 ions that were differentially expressed between E+ and E− across the three developmental stages (immature leaf, blade, and sheath). The algorithms are implemented and available as R package Limma (Smyth, 2004). MS 1 data have been provided as Supplemental Data Set S1.

MS 2 Data Analysis
All the MS 2 data were retrieved by querying the mzXML data. MS 2 spectra derived from an MS 1 bin in one sample (e.g. 248.35, 248.39; see Supplemental Fig. S7) were merged by rounding to nominal m/z MS 2 bins and assigning the median abundance value for each bin. For each sample, only the top 20 most abundant ions (MS 2 bins) derived from an MS 1 bin were retained for comparisons in this report. For a given parent MS 1 bin, all derived MS 2 spectra in each sample, if available, were pairwise compared based on a customized distance metric. The measurement of similarity of MS 2 spectra is complicated not only by variation in ion abundance, but also by the occurrence of different fragment ions in spectra from the same MS 1 bin in different samples and by carryover from adjacent MS 1 bins of high intensity (as for isotopologues), because the isolation width of the ion trap was set to a range of 2 m/z for efficient capture and fragmentation of ions. The procedure for the distance metric of two MS 2 spectra is as follows: (1) normalize each spectrum to its maximum abundance value and sort each spectrum based on its intensity, retaining up to the 20 most intense fragment ions; (2) if the number of MS 2 bins is still different, low intensity ions in the longer spectrum are truncated; (3) for the matched nominal ions (MS 2 bins of same nominal m/z) occurring in both spectra, the Manhattan distance $d = \sum_{i=1}^{k} |x_i - y_i|$ is calculated, where the two spectra are denoted as $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$, and $k$ is the number of shared MS 2 bins [with $m, n \le 20$; $k \le \min(m, n)$]. For any unmatched MS 2 bins, the normalized intensity is added to the distance.
Step 3 provides a simplified way for m/z alignment. A similar idea was also employed by Zhang et al. (2005). The calculated pairwise distances for all samples were used for clustering analysis.
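The three steps of the distance calculation can be sketched in Python as below. This is a minimal illustration, not the authors' R implementation: the function names are hypothetical, each spectrum is assumed to be already merged to nominal MS 2 bins and supplied as a dictionary mapping nominal m/z to a positive abundance, and ties in intensity are broken by input order.

```python
def top_ions(spectrum, n):
    """Keep the n most intense ions of a spectrum, scaled so the base peak is 1."""
    if not spectrum:
        return {}
    base_peak = max(spectrum.values())
    ranked = sorted(spectrum.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return {mz: abundance / base_peak for mz, abundance in ranked}

def ms2_distance(spec_x, spec_y, max_ions=20):
    """Distance between two MS 2 spectra given as {nominal m/z: abundance}."""
    # Step 1: normalize to the maximum and keep up to `max_ions` ions.
    x = top_ions(spec_x, max_ions)
    y = top_ions(spec_y, max_ions)
    # Step 2: truncate the longer spectrum to the length of the shorter one.
    n = min(len(x), len(y))
    x, y = top_ions(x, n), top_ions(y, n)
    # Step 3: Manhattan distance over shared bins, plus the normalized
    # intensity of every bin found in only one of the two spectra.
    shared = x.keys() & y.keys()
    distance = sum(abs(x[mz] - y[mz]) for mz in shared)
    distance += sum(x[mz] for mz in x.keys() - shared)
    distance += sum(y[mz] for mz in y.keys() - shared)
    return distance
```

Two spectra with the same fragments in the same relative proportions give a distance of 0, whatever their absolute intensities. A matrix of such pairwise distances could then be passed to a hierarchical clustering routine.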
All the software functions for handling and analysis of MS 1 and MS 2 data, and for correlation network analysis, were written in R 2.5 (R Development Core Team, 2007) based on a number of R packages.

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. PCA analysis of normalized MS 1 data.
Supplemental Figure S2. Abundance of MS 1 bins in treatment groups.
Supplemental Figure S3. MS 2 spectra of parent MS 1 m/z 205 [182+Na]+ and mannitol standard.
Supplemental Figure S4. MS 2 and MS 3 spectra of parent m/z 555.
Supplemental Figure S5. MS 2 and MS 3 spectra of potassium adducts of oligosaccharide and oligokestose standards.
Supplemental Figure S6. Intensity plot of the nominal ion m/z 248 across all scans.
Supplemental Figure S7. Spectral processing of MS 2 using nominal MS 1 ion m/z 248.
Supplemental Table S1. Designation and description of experimental material.
Supplemental Data Set S1. MS 1 data matrix.