Deciphering genetic diversity and inheritance of tomato fruit weight and composition through a systems biology approach

Integrative systems biology proposes new approaches to decipher the variation of phenotypic traits. In an effort to link the genetic variation and the physiological and molecular bases of fruit composition, the proteome (424 protein spots), metabolome (26 compounds), enzymatic profile (26 enzymes), and phenotypes of eight tomato accessions, covering the genetic diversity of the species, and four of their F1 hybrids, were characterized at two fruit developmental stages (cell expansion and orange-red). The contents of metabolites varied among the genetic backgrounds, while enzyme profiles were less variable, particularly at the cell expansion stage. Frequent genotype by stage interactions suggested that the trends observed for one accession at a physiological level may change in another accession. In agreement with this, the inheritance modes varied between crosses and stages. Although additivity was predominant, 40% of the traits were non-additively inherited. Relationships among traits revealed associations between different levels of expression and provided information on several key proteins. Notably, the role of frucktokinase, invertase, and cysteine synthase in the variation of metabolites was highlighted. Several stress-related proteins also appeared related to fruit weight differences. These key proteins might be targets for improving metabolite contents of the fruit. This systems biology approach provides better understanding of networks controlling the genetic variation of tomato fruit composition. In addition, the wide data sets generated provide an ideal framework to develop innovative integrated hypothesis and will be highly valuable for the research community.


Introduction
Identifying the genes controlling the variation of complex traits is a key goal of evolutionary genetics and plant biology. Attempts to identify genetic variants underlying quantitative traits have been achieved by traditional linkage mapping and genome-wide association studies using molecular markers. However, resolution of quantitative trait loci (QTLs) is limited and the identification of the polymorphisms responsible for the variation is not straightforward. Furthermore, as several intermediate levels interact between the genotypes and the phenotypes, DNA sequence variation [single nucleotide polymorphisms (SNPs) or Indels] may not directly affect the traits. Intermediate molecular phenotypes such as gene expression, protein abundance, and metabolite concentration also vary in populations and are themselves quantitatively inherited (Rockman and Kruglyak, 2006). Nowadays, rapid technological advances in high-density experiments such as next-generation sequencing (NGS), RNA expression analysis through microarray or RNAseq, mass spectrometry (MS) coupled to gas chromatography (GC-MS) or to liquid chromatography (LC-MS), and nuclear magnetic resonance (NMR) metabolic profiling enable scientists to obtain large exhaustive data sets and analyse biological systems as a whole. Integration of the genome expression products at different levels should help in dissecting the genetic variation of a given quantitative trait.
Systems biology relates the variation analysed at different expression levels, from phenotype to metabolome and proteome, studying the behaviour of all the elements in a biological system (Gutierez et al., 2008;Saito and Matsuda, 2010). A bottom-up systems biology approach consists of integrating 'omic' resources (genomic, transcriptomic, proteomic, and metabolomic) and large physiological data sets, together with statistical network analysis in order to identify candidate genes underlying phenotypes and to construct complex regulation networks (Kliebenstein, 2010). This approach was first applied to yeast by combining DNA microarrays and quantitative proteomics to describe the galactose pathway (Ideker et al., 2001). It was then applied to gene expression analysis in Escherichia coli (Rosenfeld et al., 2002), and to Arabidopsis by Hirai et al. (2005) who elucidated gene to gene and metabolite to gene networks by integrating metabolomic and transcriptomic data. Systems biology has also been used to study the natural genetic variation at different levels, such as metabolomics (Keurentjes, 2009;Kliebenstein, 2009a), proteomics (Stylianou et al., 2008), and transcriptomics (Keurentjes et al., 2008;Kliebenstein, 2009b).
Tomato (Solanum lycopersicum) is the model species for the study of fleshy fruit development and composition (Giovannoni et al., 2004). It is a self-pollinated species and is derived from its closest wild ancestor Solanum pimpinellifolium (Nesbitt and Tanksley, 2002). Cherry tomato accessions (Solanum lycopersicum var. cerasiforme) have an intermediate position between these two species, as their genome is a mosaic of those from S. lycopersicum and S. pimpinellifolium (Ranc et al., 2008). During tomato domestication, the diversification of fruit aspect, as well as the adaptation to a wide range of environmental conditions was simultaneous with a strong reduction of molecular diversity (Miller and Tanksley, 1990;Blanca et al., 2012). This lack of genetic variation in cultivated species led geneticists to study trait variation mostly in distant crosses involving wild species, and thus limited the exploitation of intraspecific variation. Today the availability of the tomato genome sequence (Tomato Genome Consortium, 2012) and of a large number of SNP markers (Sim et al., 2012) allows a re-examination of the variation and inheritance of agronomical and fruit traits at the intraspecific level.
Systems biology approaches have been used in tomato to study fruit development. Carrari et al. (2006) and Mounet et al. (2009) analysed transcriptome and metabolome variation during fruit development. Garcia et al. (2009) combined phenotype, metabolome, transcriptome, and proteome profiles to study genes related to ascorbate metabolism in three transgenic lines. Wang et al. (2009) compared the transcriptome and metabolome to uncover the molecular events underlying fruit set, while Osorio et al. (2011) compared enzyme activity, metabolite and transcript profiles to analyse the connectivity between these groups of traits in fruit ripening mutants. However, these studies were only focused on a few mutants or on the effect of introgression in S. lycopersicum of wild species alleles. Little is known about the genetic variation in metabolic, enzymatic, and proteomic profiles contributing to phenotypic trait variation in the species.
In the present study, the aim was to decipher the complex relationships between several successive levels of omic profiles to characterize the genetic variation and physiological bases of quantitative traits in tomato fruit. For this purpose, the variation of eight genotypes representing a large range of phenotypic and genotypic diversity (four S. lycopersicum and four S. lycopersicum var. cerasiforme) and four of their corresponding hybrids was compared at two stages of fruit development [cell expansion (CE) and orange-red (OR)]. Their metabolic, enzymatic, and proteome profiles were characterized. Genetic variability was analysed for all traits, and inheritance patterns of traits that were significantly different among genotypes were assessed. Relationships among traits were analysed within and between each group of traits at each stage, and networks were constructed using sparse partial least square (sPLS) regression. This systems biology approach combining proteome, metabolome, and phenotypic analysis gave insights into the diversity and relationships of quantitative traits at different levels.
Plants were grown during spring 2010 under greenhouse conditions (16/20 °C) in Avignon (South of France). Plants were separated into two blocks; five plants per genotype were included in each block.
For proteome, metabolome, and enzymatic measurement, two stages of development, CE and OR, were selected. The CE stage was chosen as a representative stage of the growing tomato fruit; the OR stage was chosen because it is unequivocally determined and is the key step where enzyme and protein concentrations are changing and will determine the final characteristics of the fruit. The number of days after anthesis to reach cell expansion varied among genotypes depending on their fruit size. Thus CE sampling was done at 14, 20, or 25 d after anthesis for small (Cervil), medium (Criollo, Plovdiv 24A, Stupicke Polni Rane, and the four F 1 hybrids), or large (LA0147, Levovil, and Ferum) fruited accessions, respectively. OR sampling was done based on fruit colour change. Three biological replicates by stage were analysed. Each replicate included 7-20 fruits from both greenhouse blocks to buffer environmental variations. Fruit pericarps were collected, immediately frozen, ground in liquid nitrogen, and stored at -80 °C until analysis. For fruit phenotypic trait measurements, five fruits were harvested from the 10 plants of each genotype at the following six stages: (i) CE stage; (ii) CE+7 d; (iii) CE+14 d; (iv) CE+21 d; (v) OR stage; and (vi) red ripe. Fruits were evaluated for fresh weight (FW), fruit diameter (FD; measured using a caliper), and dry matter content (DMC). DMC (expressed in g 100 g -1 FW) was assessed after 5 d in a ventilated oven at 80 °C.

Metabolome and enzyme activity analysis
Metabolome analyses were performed at the Metabolome Facility of Bordeaux, using quantitative proton NMR ( 1 H-NMR) profiling of polar extracts and liquid chromatography quadrupole timeof-flight tandem mass spectrometry (LC-QTOF-MS) profiling of semi-polar extracts. 1 H-NMR profiling was performed as described in Deborde et al. (2009) with minor modifications. Briefly, polar metabolites were extracted on lyophilized powder (50 mg DW per biological replicate) with an ethanol-water series at 80 °C. The lyophilized extracts were titrated to pH 6 and lyophilized again. Each dried titrated extract was solubilized in 0.5 ml of D 2 O with (trimethylsilyl) propionic-2,2,3,3-d 4 acid (TSP) sodium salt (0.01% final concentration) for chemical shift calibration and EDTA (5 mM final concentration for the CE and 2 mM for the OR stage). 1 H-NMR spectra were recorded at 500.162 MHz on a Bruker Avance III spectrometer (Bruker, Karlsruhe, Germany) using an ATMA inverse 5 mm probe flushed with nitrogen gas and an electronic reference for quantification (ERETIC2). Sixty-four scans of 32 000 data points each were acquired with a 90 ° pulse angle, a 6000 Hz spectral width, a 2.73 s acquisition time, and a 25 s recycle delay. Two technological replicates were used per biological replicate. Preliminary data processing was conducted with TOPSPIN 3.0 software (Bruker Biospin, Wissembourg, France). The assignments of metabolites in the 1 H-NMR spectra were made by comparing the proton chemical shifts with values of the MeRy-B metabolomic database (Ferry-Dumazet et al., 2011), by comparison with spectra of authentic compounds recorded under the same solvent conditions, and/or by spiking the samples. The metabolite concentrations were calculated using AMIX (version 3.9.7, Bruker) software. The 144 1 H-NMR spectra of the data set were converted into JCAMP-DX format and deposited with associated metadata into the Metabolomics Repository of Bordeaux MeRy-B (Ferry-Dumazet et al., 2011, http://www.cbib.u-bordeaux2.fr/MERYB/view/project/34, accessed September 2013. LC-QTOF-MS profiling of aqueous methanol-0.1% formic acid extracts was performed from lyophilized powder (20 mg in 1 ml). For each biological replicate, two extractions were performed and two injections per extract were used. An Ultimate 3000 HPLC (Dionex, Sunnyvale, CA, USA) was used to separate metabolites on a reversed phase C18 column (150 × 2.0 mm, 3 µm; Phenomenex, Torrance, CA, USA) using a 30 min linear gradient from 3% to 95% acetonitrile in water acidified with 0.1% formic acid. Metabolites were detected using a QTOF mass spectrometer (Bruker). Electrospray ionization in positive mode was used to ionize the compounds. Scan rate for ions at m/z range 100-1500 was fixed at two spectra per second. Methyl vanillate was spiked in the extraction solvent and used as an internal standard.
One sample was used as a quality control sample and injected after every 10 injections. Raw data were processed in a targeted manner using QuantAnalysis 2.0 software (Bruker). This resulted in eight compounds identified based on accurate mass measurement and comparison with data from Gomez-Romero et al. (2010).
Ascorbic acid was measured using a spectrofluorometric method and values were expressed as total ascorbate (ascorbic acid+dehydroascorbate) as previously described by Stevens et al. (2007). The maximum activity (V max ) of 26 enzymes of primary metabolism was assayed using a robotized platform as described in Gibon et al. (2004) and in Steinhauser et al. (2010). Supplementary Tables S3 and S4 at JXB online present the lists of the primary and secondary metabolites and the enzyme activities analysed.

Proteome analysis
Methods for protein extraction, two-dimensional gel electrophoresis (2-DE), and protein identification and classification were as detailed in Xu et al. (2013a). Briefly, proteins were extracted using the phenol extraction method developed by Faurobert et al. (2007). Later, proteins were separated by 2-DE. After Coomassie colloidal staining, image analysis was performed with Samespot software (version 4.1), and the normalized spot volumes were obtained. Protein identification of 424 variable spots was performed at the proteome platform of Le Moulon (Gif-sur-Yvette) using a nano-LC-MS/MS method following the procedure described in Xu et al. (2013a). The database search was run against the International Tomato Annotation Group (ITAG) Release 2.3 of predicted proteins (SL2.40) database (http:// solgenomics.net/, accessed September 2013) with X!Tandem software (http://www.thegpm.org/TANDEM/, version 2010.12.01.1). The Fasta sequence of the identified proteins was employed to reannotate the proteins using the Blast2GO package (Conesa et al., 2005). Sequences were compared against the NCBI-nr (version April 9, 2012) database of non-redundant protein sequence using BLASTX with the default settings.

Statistical analysis and inheritance analysis
Metabolite contents and enzyme activities were expressed on a dry weight basis to be comparable. All the analyses were performed using R (R Development Core Team, 2012, http://www.R-project. org/, accessed September 2013).
Data were submitted to a two-way analysis of variance (ANOVA; P < 0.05) with genotype, stage, and interaction effect, and then to one-way ANOVA with genotype effect at each stage. In addition, to assess the mode of inheritance of the traits, one-way ANOVA was also performed with genotype effect for each cross (two parental lines and their hybrid) and stage.
Means and standard deviations were calculated for each trait (phenotypic traits, metabolite contents, enzyme activities, and protein spot volumes) in each genotype and stage. Significantly different (P < 0.05) traits in each cross were selected at each stage to estimate additive (A) and dominance (D) components of genetic variation. A is equivalent to half of the difference between two parental lines. The S. lycopersicum line was systematically the first parent in a cross. D is the difference between the hybrid value and the parental mean. The inheritance pattern of each trait was then assessed by the dominance/additivity (D/A) ratio and classified as over-recessive (OR; Means of metabolite contents, enzyme activities, and protein spot volumes were centred and scaled to variance unit and used for the rest of the analysis. Principal component analyses (PCAs) were performed for metabolites, enzymes, and phenotypic traits, as well as for protein spot volumes for both development stages and at each stage, with the 'pcaMethods' package (Stacklies et al., 2007). Pearson correlations and P-values were calculated between significantly variable traits at each stage. Correlations were considered to be significant when |r| >0.7 (P-value <0.01). Significant correlations were plotted using the R 'corrplot' package (Wei, 2012, http://CRAN.R-project. org/package=corrplot, accessed September 2013). To analyse the relationships among protein spot volumes and other traits, networks were reconstructed and visualized using sPLS correlation regression analysis with the 'mixOmics' package (Lé Cao et al., 2009). An arbitrary threshold of 0.7 was employed for network reconstruction. Nodes represent the different traits, and edges represent the relationships between variables belonging to different levels.

Results
To represent a large range of the genetic diversity, eight tomato accessions were chosen according to previous studies (Ranc et al., 2008;Xu et al., 2013b). The eight accessions and their four hybrids were characterized at phenotypic, metabolic, and proteomic levels. The final fruit weight of the eight parental lines and the four hybrids ranged from 5.3 g to 134.4 g. Fruit weights of the four hybrids were intermediate between the values of their parental lines throughout fruit development (Supplementary Fig. S2 at JXB online). Fruit diameter ( Fig. 1) was highly correlated to fruit weight, as fruits were round. DMC also showed a wide range of variation ( Fig. 1; Supplementary Table S5).

Metabolome, enzyme, and proteome profiles strongly differ among accessions
Metabolome profiling by 1 H-NMR and LC-QTOF-MS allowed the quantification of 18 metabolites from central carbon metabolism and eight secondary metabolites (Supplementary Table S3 at JXB online). In addition, 26 enzyme activities were assessed by robotized assays (Supplementary Table S4). These analyses provided a detailed characterization of sugar, organic acid, and amino acid metabolism pathways, as well as those of glycoalkaloids and phenolic compounds (Fig. 2). Proteins were isolated following 2-DE. A total of 1230 protein spots were detected. A subset of 424 spots whose abundance was significantly different between genotypes or stages were sequenced by LC-MS/MS. A total of 422 spots were identified (Xu et al., 2013a). Supplementary Table S5 lists the mean and standard deviation of every trait for each genotype and stage.
The 12 accessions differed for most of the metabolites and phenotypic traits according to the ANOVAs (Table 1). The means of most of the traits (27/29) were significantly different across stages. The content of glucose, fructose, citrate, asparagine, aspartate,and phenylalanine increased from CE to OR, while the other amino acids decreased. The interactions between stage and genotype were significant for 93% of 29 metabolite and phenotypic traits, and 50% of these traits showed a different trend according to the genotype. The data were thus analysed stage by stage (Table 1). A large range of variability was observed among the 12 genotypes at each stage as all the trait means were significantly different, except for that of crypto-chlorogenic acid. The fold change difference between genotypes reached values as high as 5.6 for threonine content at CE or 7.9 for malate at OR.
The activity of 26 enzymes from central carbon metabolism, including enzymes of the Calvin cycle, glycolysis, sucrose metabolism, the tricarboxylic acid (TCA) cycle, and amino acid metabolism, was quantified and expressed relative to dry weight to be comparable with the metabolome and proteome. Enzyme activities exhibited a lower range of variation than  Table 2). The greatest differences were found between stages, where all the enzyme activities difered except that of alanine aminotransferase, fumarase, and glyceraldehyde-3-phosphate dehydrogenase (NADP). The activity of 15 enzymes was greater at CE. Two and 13 enzyme activities were significantly different among accessions at CE and OR, respectively.
The volume of the 424 protein spots was compared among the 12 genotypes. The genes corresponding to most of these spots are identified (Xu et al., 2013a;Supplementary Table  S6 at JXB online). They include 133 protein spots related to primary metabolism. Several multispot proteins (one gene corresponding to several spots) were detected, such as acid invertase (seven spots), phosphoglucomutase (five spots), and enolase (five spots). These multispots may be caused by post-transcriptional and post-translational modifications or by allelic variations (Xu et al., 2013a). A large range of variability was observed among genotypes and between stages for all the protein spot amounts (Supplementary Table S6). As for metabolites and enzymes, the main differences were observed between stages (84% significantly variable spots; Supplementary Table S6), with 46% in a lower amount and 38% in a higher amount at OR. When the data were analysed stage by stage, 256/424 spot amounts were significantly different among genotypes at CE and 274/424 at OR.
The variation among the 12 accessions at the different levels was illustrated by PCA. When the phenotypic traits, metabolite, and enzyme profiles were analysed at both stages, two main groups corresponding to each stage of development were detected (Supplementary Fig. S3A at JXB online). Similar results were obtained for the protein spot volumes ( Supplementary Fig. S3B). PCAs were thus computed stage by stage (Fig. 3). In every case, Cervil (the accession with the smallest fruits) was separated from the other genotypes, and the large fruited accessions (Levovil, LA0147, and Ferum) were grouped together. Hybrids were usually located in between their parental lines.

Inheritance of traits is predominantly additive
The four F 1 hybrids were derived from crosses among the eight lines and corresponded to different distances among parental lines. The mode of inheritance of the traits that were significantly different for each cross is assessed separately ( Fig. 4; Supplementary Table S7 at JXB online).
The phenotypic traits fruit diameter and fruit weight were additive in the four crosses at each stage. The DMC was additive, over-recessive, or not significant according to the stage and cross (Fig. 1). Most of the metabolic contents were significantly variable at both stages. A large number of additive traits was found in the cross between the most distant lines (Levovil×Cervil) (Fig. 4A). Traits could exhibit different inheritance modes at the two stages for the same cross or in different crosses, as illustrated for the citrate content in Fig. 5.
Most of the enzyme activities were not significantly variable within one cross, so the inheritance mode of only a few enzyme activities was assessed. At CE, Ferum×LA1420 was the most variable cross, with four significantly variable enzymes, while for the other crosses, only one or no enzyme was variable. At OR, the predominant inheritance mode was additivity for Levovil×Cervil and Stupicke Polni Rane×Criollo, but not for the two other crosses.
The number of significantly variable protein spots varied among crosses at CE in relation to the genetic distance between the parental lines (Fig. 4B). As for the metabolites, proteins showed different inheritance patterns at the two stages in the same cross or in different crosses. On average, 40% of the variable traits showed a non-additive mode of inheritance without bias against recessivity or dominance.

Dissection of relationships among traits
Relationships among traits were only assessed among the traits which were significantly different between genotypes at each stage (Supplementary Tables S8, S9, S10 at JXB online). The significant correlations (summarized in Supplementary  Table S8) were more frequent than expected by chance, with an excess of positive correlations. Correlations among metabolites, phenotypic traits, and enzymes activities are illustrated in Fig. 6. At CE, sugars (glucose and fructose) were highly correlated with each other, and negatively with most amino acids, tomatine, and DMC. Amino acid contents were highly correlated with each other (Fig. 6A). Very few correlations were detected between metabolites and phenotypic traits at OR (Fig. 6B). For enzyme activities, correlations were significant between the glycolysis and TCA cycle enzymes at OR (Fig. 6B). Very few correlations were significant between  Supplementary Table S8 at JXB online and provided in Supplementary Tables S9 and S10. Correlations between protein spots corresponding to enzymes and their enzyme activities were analysed at OR. In total, 28 spots corresponding to eight enzymes were analysed (Supplementary Table S11). Significant correlations, ranging from r=0.877 to 0.715, were detected between enolase activity and the four spots annotated as two enolase genes (Solyc09g009020 and Solyc10g085550), aldolase activity, and one aldolase gene (Solyc09g009260), and acid invertase and two spots corresponding to the Solyc03g083910 acid invertase gene. For these enzymes, even when correlations were not significant (P <0.01), they were often positive, with a P-value <0.05 ( Supplementary Fig. S4). The other enzymes analysed, pyruvate kinase, glyceraldehyde-3-phosphate dehydrogenase (NAD), isocitrate dehydrogenase, malic enzyme (NADP), and malate dehydrogenase, were not significantly correlated with their corresponding spot volumes. Table 3 lists the protein spots whose volume was strongly correlated with fruit weight or DMC. The numbers of spots were equivalent at both stages, with seven of the 15 spots related to stress response (heat shock proteins, NifU like protein, and chaperonin). A larger proportion (13/32) of correlations was detected between DMC and spots related to primary metabolism (fructokinase, malate dehydrogenase, acid invertase, and enolase). Most of the correlations with DMC were positive, in contrast to those with fruit weight.

Reconstruction of networks integrating metabolic and protein profiles
Due to the large number of traits and correlations, sPLS regression was used for integrating protein expression data and metabolites, enzymes, and phenotypes. sPLS regression is a bidirectional multivariate regression method that allows separate modelling of covariance between two data sets. The main advantage of sparse methods over non-sparse methods is that it sets the contribution of noise variables to zero to improve the prediction or classification performance (Filzmoser et al., 2012). sPLS networks relating protein spot volumes to phenotypes, metabolites, and enzymes were constructed at each stage. The three levels (phenotypes, metabolites, and enzymes) considering they represent a global metabolic-related level, were grouped as being related to the proteome level. At CE, a network was constructed between metabolites, phenotypic traits, and enzyme activities (variable among genotypes) on one hand, and 77 variable protein spots related to primary and secondary metabolism and vitamin synthesis on the other hand. The network reconstructed connected eight traits and 26 proteins by >50 edges (Supplementary Fig. S5, Table S12 at JXB online), among which two fructokinase spots were connected to glucose, fructose, and DMC.
A network connecting variable metabolites, phenotypic traits, and enzyme activities with 87 protein spots (related to primary and secondary metabolism and vitamin synthesis) was also constructed for the OR stage. Two main networks were obtained, with more connections than for the CE stage ( Supplementary Fig, S6, Table S13 at JXB online). Sucrose and DMC played a pivotal role. They were linked to 11 and 17 proteins, including spots corresponding to acid invertase, enolase, malate dehydrogenase, and malic enzyme As the variation of proteins expressed during CE may influence the metabolome and activome at a later stage, the connections between protein variations at CE and fruit composition and enzyme activities at OR were analysed ( Fig. 7; Supplementary Table S14 at JXB online). A relationship was detected between sucrose content at OR and the volumes of two spots corresponding to fructokinase at CE. In addition, those spots were also related to the fructose content at CE (Supplementary Fig. S5). An isocitrate dehydrogenase spot was related to isocitrate activity at OR, and phosphoglycerate kinase from the Calvin cycle to shikimate dehydrogenase activity, an enzyme downstream of the erythrose-4P produced in that cycle. The amounts of cysteine sysnthase and fructokinase 3 proteins had a pivotal role, each being connected to several traits.
Networks were also constructed between phenotypes, metabolites, and enzyme activities and the proteins corresponding to other functions (data not shown). The most interesting relationship involved a chaperonin (JX383) that played an important role at CE, as it was related to six enzymes activities, and to glucose and fructose content at OR.

Discussion
The variation of tomato fruit composition has been widely studied, due to its role in sensory and nutritional value. However, until now the variation of metabolic compounds has been studied in tomato either during fruit development or according to environmental perturbations, mainly in one accession or in lines resulting from the introgression of genome fragments from a unique wild species (Schauer et al., 2006;Steinhauser et al., 2010). The results are subsequently supposed to represent the variation of the species. In the present study, the aim was to analyse the actual variation of the species, by comparing eight accessions selected to represent a large part of the phenotypic and molecular diversity of S. lycopersicum (Ranc et al., 2008). The variation, the inheritance, and the relationships among metabolic, enzymatic, and proteomic traits assessed at two developmental stages were described.

A large range of genetic variation is detected at all levels
A large range of variation was observed for most of the phenotypic and metabolic traits at least at one stage. Usually, the ratio of maximum to minimum values among genotypes varied in the range of 2-3, showing that there was wide variation present in the species. The secondary metabolites showed a higher range of variation, some of them being present in one line and almost absent in another. This may be due to the inclusion in the study of S. l. var cerasiforme accessions which are not fully domesticated, as domestication caused great alterations in those compounds (reviewed by Meyer et al., 2012). Among enzyme activities, significant differences were detected between stages, and the genetic variation was less significant at CE. Steinhauser et al. (2011) observed the same tendency, with enzyme activities having a lower heritability than metabolites, suggesting that metabolites have a tight regulation, while enzyme activities can be compensated by coordinated changes in other enzymes.
To date, proteome variation in tomato fruit and inheritance of protein amount were poorly documented (reviewed by Faurobert et al., 2013). Faurobert et al. (2007) described the proteome variation of tomato pericarp in one line during fruit development. They could identify the function of 90 spots. Previously 424 protein spots that were variable among stages or genotypes were studied (Xu et al., 2013a). Thanks to the release of the tomato genome sequence, the function of almost every spot was identified and 307 unique proteins corresponding to 424 spots were detected. Most of the spots which were variable at both stages showed the same tendency (an increase or decrease during fruit development) in all the genotypes. Nevertheless, 57% of the spots revealed significant genotype by stage interactions, indicating that the trend observed in one genotype at a given physiological level (stage) may change in another genotype.
The observed variation may be related to the genetic distance among accessions. In the PCA, all the large-fruited lines closely related at the molecular level were grouped together, while the small cherry tomatoes, genetically more diverse, were more spread out. In addition, Cervil, the line most distant from all the others at the molecular level, presented a very specific profile for every trait, leading to most of the extreme values (lowest fruit weight, and highest dry matter, sugar, and acid contents). It was also specific in terms of secondary metabolites, with a high content of chlorogenic acid, dehydrotomatin, and rutin. The large variation detected and the differences between genotypes during fruit development showed the important effect of genetic diversity in fruit composition and enhances the value of the presented data set.

Diversity in the modes of inheritance among crosses and traits
Hybrids are widely used in modern agriculture, either for heterosis (the advantage of a hybrid compared with both parents) or for the combination of dominant traits. Agronomical traits often show heterosis in the F 1 generation when distant cultivated accessions or cultivated and wild species are crossed (Springer and Stupar, 2007;Li et al., 2008). The molecular origin of heterosis has been studied for years and is usually related to a combination of dominance or overdominance effects and to epistatic interactions (Stuber, 2010).
In tomato, fewer traits than in maize, a highly heterotic crop, show a systematic heterosis trend (Lippman and Zamir, 2007). Steinhauser et al. (2011) studied the enzyme activities in introgression lines derived from the wild species S. pennellii and found an approximately equivalent ratio of QTLs showing additive, recessive, and dominant modes of inheritance, with only 5% showing overdominance. In the present study, the number of traits which were significantly variable within each cross (one hybrid and its two parents) differed from one cross to the other and was related to the genetic distance at the proteomic level. Regarding phenotypic traits, in accordance with the absence of heterosis, fruit weight and diameter were additive in the four crosses. Around 60% of the other traits showed an additive inheritance, with a number of traits exhibiting an overdominant or over-recessive mode of inheritance, but no specific trend towards either one of them. The higher rate of additivity in this study compared with previous studies involving S. pennellii introgression lines (Schauer et al., 2008;Steinhauser et al., 2011) may result from the smaller distance between the parental lines, which are all from the same species.
The inheritance mode in one cross was not systematically the same in another cross and appeared relatively independent from one stage to the other, as a consequence of the complex genetic control of the traits studied. Enzyme activities, for example, are suggested to be controlled by a network of trans-acting genes (Steinhauser et al., 2011); thus, dealing with different genetic backgrounds that carry different combinations of haplotypes may lead to different inheritance modes.

A systems approach revealed complex connectivity among the different levels analysed
The genetic variation was dissected at several levels, from phenotype to metabolite and proteome profiles, in eight unrelated tomato accessions and four F 1 hybrids at two developmental stages. In such an experimental design, a significant correlation may reveal the effect of a polymorphic gene acting on two related traits, but also fortuitous association, as a correlation between two traits may not be due to a causal relationship but to linkage disequilibrium between genes controlling the variation of both traits. Nevertheless, Osorio et al. (2011) in a similar approach described co-varying genes or proteins as 'guilty by association', as the closer the functions implicated the more meaningful were the relationships. In addition, the two stages studied in the present study correspond to very distinct physiological processes (Gillaspy et al., 1993;Giovannoni, 2004), increasing the complexity but also the impact of the study.
At every expression level, more significant correlations than expected by chance were detected. The bias towards positive correlations among enzyme activities and metabolites suggested a coordinated regulation of these traits. Some of the detected relationships were already established in previous studies, such as for instance the coordinated variation between several amino acids and sugars (fructose and glucose) at CE (Prudent et al., 2010).
Different studies have tried to uncover the relationships between different levels of traits (phenotypes, metabolites, enzymes, and transcripts) in tomato (Carrari et al., 2006;Schauer et al., 2006). As the enzyme activities assessed correspond to V max , and thus mainly reflect the corresponding protein amount, one might hypothesize that proteins and enzyme activities should be correlated. Correlations were only found between three out of eight enzymes and the protein spot amount corresponding to the same function at  Table 3. Spots highly correlated with fruit weight (a) and dry matter content (b) in eight tomato accessions and four F 1 hybrids Pearson correlations are indicated with spot volumes assessed at the cell expression (rCE) or orange-red (rOR) stage. Range of variation of the spots at the same stage (min and max) in the 12 genotypes. Only highly significantly correlated spots (P-value <0.001 and |r| >0.78) are indicated.

Spot
Gene Annotation Process  (Gibon et al., 2006;Morcuende et al., 2007;Steinhauser et al., 2010). This may be due to the fact that enzyme activities result from a combination of several proteins (subunits) or that most of the primary metabolism enzymes belong to multigene families. In addition, protein spots may also be the product of complex posttranslational modifications, where only one of the forms will be the functional one (Faurobert et al., 2007). One problem when dealing with omic data is that the number of traits is much larger than the number of samples. Sparse methods were developed for dealing with high-dimensional data. Such a method sets the contribution of noise variables to zero and thus improves the prediction of correlations or classification performance (Filzmoser et al., 2012). The networks reconstructed with sPLS methods showed complex patterns of connectivity, relating several nodes together and different pathways or metabolisms. In each network, a few hubs could be identified relating many different compounds or proteins.
At CE, several correlations with DMC and metabolite contents involved two of the protein spots coding for fructokinase, an enzyme participating in sugar phosphorylation. Fructokinase plays a role in sugar import and in starch biosynthesis (Dai et al., 2002). Several isoforms Fig. 7. Network reconstruction based on sPLS between protein spot volumes at the cell expansion stage (circular nodes) and metabolite contents, phenotypes, and enzyme activities at the orange-red stage (square nodes). Positive and negative relations are shown bu solid and dotted lines, respectively. Annotation of spots is detailed in Supplementary Table S14 at JXB online. (This figure is available in colour at JXB online.) were detected, being correlated with the variation of sugars.
The network constructed at OR revealed the key role of invertase in sucrose breakdown, as already documented (Faurobert et al., 2007). Two spots corresponding to this function were strongly related to the content in sucrose and DMC. In addition, they were also correlated with several enzyme activities and with fruit weight. Schauer et al. (2006) have also detected an association between phenotypic traits and metabolic compounds in tomato.
Previous studies in Arabidopsis suggested that the relationships between transcript modifications and enzyme activities showed greater agreement in the long term (Osuna et al., 2007). The relationships between protein amounts at CE and the other traits at OR were thus analysed. Relationships between isocitrate dehydrogenase enzyme activity and the protein spots corresponding to this protein were then detected. In addition, the interconnections suggested a possible role as a regulator for cysteine synthase whose amount at CE is correlated with several enzymes, glucose, and fruit diameter at OR. This enzyme has been suggested to play a key role in highly metabolically active cells (Wang et al., 2003). Those facts should be taken into account when trying to modify gene expression or protein contents in order to alter metabolite contents. According to the present findings, this approach will only work for a small subset of metabolites, so researchers should focus on proteins such as fructokinase or cysteine synthase that affect several metabolites, instead of just the enzymes that regulate the direct synthesis of a target compound.
The present analysis provided a detailed characterization of fruit metabolism, at several levels in a set of accessions representing a wide range of genetic variation, and led to interesting conclusions. First, the contents of primary and secondary metabolites are quite variable depending on the genetic background, while enzyme activities seem to be less variable, particularly at CE. In addition, significant genotype by stage interactions showed that the trends observed in one genotype at a physiological level may change in another genotype. In agreement with this, the inheritance modes varied between crosses and stages, showing the multigenic nature of the traits studied, although additivity was predominant. The network reconstruction revealed associations between different levels of expression and provided information on several key proteins that might be targets for improving metabolite contents. This study is the starting point of a broad experiment including the development of a multiallelic population derived from the eight parental lines. QTLs for fruit composition will be mapped in the population and will be related to the variations observed at various levels in the parental lines.

Supplementary data
Supplementary data are available at JXB online. Figure S1. Fruits of the eight tomato lines and four F 1 hybrids studied. Figure S2. Fruit fresh weight variation during fruit development in the eight lines and four F 1 hybrids. Figure S3. First plan of the principal component analysis showing the variation of 12 genotypes at two stages. Figure S4. Correlations between the volumes of protein spots corresponding to acid invertase and the enzyme activity. Figure S5. Network reconstruction based on sPLS between protein spot volumes and metabolites, phenotypes, and enzyme activities at cell expansion. Figure S6. Network reconstruction based on sPLS between protein spot volumes and metabolites, phenotypes, and enzyme activities at the orange-red stage. Table S1. Accession origins. Table S2. Polymorphism rate between the eight parental lines and Heinz1706 (the tomato reference genome) detected with 139 SNPs (from Xu et al., 2013b). Table S3. List of the 19 primary metabolites and eight secondary metabolites analysed. Table S4. List of the enzyme activities analysed. Table S5. Means and standard deviations for all the traits studied. Table S6. Annotation and analysis of variation of protein spots Table S7. Analysis of variation cross by cross at each stage. Table S8. Overview of the number of significant correlations within and among different levels of analysis. Table S9. Correlations at the cell expansion stage. Table S10. Correlations at the orange-red stage. Table S11. Correlations between enzyme activities and the spots coding for those enzymes at the orange-red stage. Table S12. Annotation of spots included in the cell expansion stage network ( Supplementary Fig. S5) Table S13. Annotation of spots included in the orange-red stage network ( Supplementary Fig. S6) Table S14. Annotation of spots included in the cell expansion and orange-red network (Fig. 7)