Quantitative Proteomics of Seed Filling in Castor: Comparison with Soybean and Rapeseed Reveals Differences between Photosynthetic and Non-photosynthetic Seeds

Seed maturation, or seed filling, is a developmental phase that plays a major role in determining the storage reserve composition of a seed. In many plant seeds, photosynthesis plays a major role in this process, although oilseeds such as castor (Ricinus communis) are capable of accumulating oil without the benefit of photophosphorylation to augment energy demands. To characterize seed filling in castor, a systematic quantitative proteomics study was performed. Two-dimensional gel electrophoresis was used to resolve and quantify Cy-dye-labeled proteins expressed at 2, 3, 4, 5, and 6 weeks after flowering in biological triplicate. Expression profiles for 660 protein spot groups were established, and of these, 522 proteins were confidently identified by liquid chromatography-tandem mass spectrometry by mining against the castor genome. Identified proteins were classified according to function, and the most abundant groups of proteins were involved in protein destination and storage (34%), energy (19%), and metabolism (15%). Carbon assimilatory pathways in castor were compared with previous studies of the photosynthetic oilseeds soybean and rapeseed. These comparisons revealed differences in abundance and number of protein isoforms at numerous steps in glycolysis. One such difference was the number of enolase isoforms and their summed abundance; castor had approximately six times as many isoforms as soybean and rapeseed. Furthermore, ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) was eleven-fold less prominent in castor than in rapeseed. These and other differences suggest that some aspects of carbon flow, carbon recapture, and ATP and NADPH production in castor differ from those in photosynthetic oilseeds.


INTRODUCTION
necessary for FAS in sunflower. Using metabolic flux analyses, they concluded that 91-95% of the carbon required for FAS was supplied by triose-phosphates produced from hexoses imported into the plastid. The origin of the energy and reducing equivalents needed for FAS also remains under investigation for non-photosynthetic seeds. Studies conducted on heterotrophic plastids suggest that two pathways are capable of providing adequate levels of reducing equivalents. One such pathway is the malate pathway described previously. Pleite et al. (2005) suggest that NADP-malic enzyme (ME) and the pyruvate dehydrogenase complex (PDC) produce adequate levels of acetyl-CoA and reducing equivalents to support FAS in isolated plastids from sunflower embryos.

Another pathway is the oxidative pentose phosphate pathway (OPPP), in which three molecules of Glc-6-phosphate enter the pathway and are oxidized to three molecules of ribulose-5-phosphate and three of CO2 while forming six NADPH molecules (Kruger and von Schaefer, 2003). Alonso et al. (2007) showed that the OPPP produces the majority of reducing equivalents for FAS in sunflower, compared with less than 4% produced by ME.
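The three-to-six stoichiometry above can be checked step by step. The balance below is the standard textbook form of the oxidative phase, not data from this study; G6PDH (Glc-6-phosphate dehydrogenase) and 6PGDH (gluconate-6-phosphate dehydrogenase) are shorthand introduced here, and the proton bookkeeping follows the usual convention. Each Glc-6-phosphate yields two NADPH and one CO2, so three molecules give six NADPH.

```latex
\begin{aligned}
\text{Glc-6-P} + \text{NADP}^+ &\xrightarrow{\text{G6PDH}} \text{6-phosphoglucono-}\delta\text{-lactone} + \text{NADPH} + \text{H}^+ \\
\text{6-phosphoglucono-}\delta\text{-lactone} + \text{H}_2\text{O} &\xrightarrow{\text{lactonase}} \text{gluconate-6-P} + \text{H}^+ \\
\text{gluconate-6-P} + \text{NADP}^+ &\xrightarrow{\text{6PGDH}} \text{ribulose-5-P} + \text{CO}_2 + \text{NADPH} \\[4pt]
\text{net } (\times 3)\colon\quad 3\,\text{Glc-6-P} + 6\,\text{NADP}^+ + 3\,\text{H}_2\text{O} &\longrightarrow 3\,\text{ribulose-5-P} + 3\,\text{CO}_2 + 6\,\text{NADPH} + 6\,\text{H}^+
\end{aligned}
```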

Castor proteins were isolated from each experimental seed stage in biological triplicate for separation by 2-DGE. A two-step approach was employed to produce 2-D gels with minimal spot overlap and gel distortion. First, castor protein extracts were labeled with N-hydroxysuccinimide-activated Cy5 (NHS-Cy5), a lysine-reactive fluorophore. Cyanine dyes have been used in two-dimensional difference gel electrophoresis (DIGE) to detect quantitative differences between two protein samples (Ünlü et al., 1997; Tonge et al., 2001). In this study, the advantage of using an NHS-activated Cy-dye was its specificity for lysine, an amino acid that is rare in storage proteins. Figure 2 compares NHS-Cy5-labeled and Coomassie Brilliant Blue (CBB)-stained proteins separated by 2-DGE. Compared with the CBB-stained gel (Fig. 2B), the NHS-Cy5 gel (Fig. 2A) shows few instances of spot overlap and gel distortion despite the preponderance of seed storage proteins. NHS-Cy5-labeled proteins were resolved for analytical spot quantitation on both wide-range (pH 3 to 10) and medium-range (pH 4 to 7) immobilized pH gradient (IPG) strips (Fig. S1). The use of medium-range IPG strips enhanced spot separation in a dense area (pI 4-7) of the protein map.

Gel images were scanned and analyzed with ImageMaster to detect, quantify, and match spots. Relative volume, defined as the ratio of each spot volume to the total spot volume, was calculated for each spot (Table SI). To merge data from medium- and wide-range gels, spot volumes were adjusted with a correction constant (Hajduch et al., 2005). Corrected relative volumes were averaged, graphed, and displayed as protein expression profiles. Spots from each gel were matched within biological replicates and between developmental stages (Fig. 3A). To be considered for identification, spots had to meet two criteria: 1) present in at least two of five developmental stages and 2) present in all three biological replicates of each developmental stage. Matched spots that met these criteria were termed spot groups. A total of 660 spot groups met these criteria. These spots were excised from colloidal CBB-stained reference gels to ensure sufficient protein for mass spectrometry and to prevent any misassignments caused by molecular weight shifting of labeled proteins (Ünlü et al., 1997). After excision, spots were trypsin digested, analyzed by LC-MS/MS, and searched against the TIGR castor translated genome using SEQUEST (Fig. 3B), resulting in identification of 522 (79%) of the 660 spot groups. Of these 522 proteins, 303 were non-redundant. Protein assignment information and expression profiles of these spot groups were deposited in the Oilseed Proteomics web database (http://oilseedproteomics.missouri.edu/).
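The relative-volume definition and the two spot-group criteria can be expressed as a short sketch. It is illustrative only: the function names are hypothetical, the volumes are invented, the gel-merging correction constant is not modeled, and it assumes that a stage counts toward criterion 1 only when the spot is detected in all three replicates at that stage.

```python
from typing import Dict, Set


def relative_volumes(spot_volumes: Dict[str, float]) -> Dict[str, float]:
    """Relative volume of each spot = its volume / total spot volume on the same gel."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: volume / total for spot, volume in spot_volumes.items()}


def is_spot_group(detections: Dict[int, Set[int]],
                  n_replicates: int = 3,
                  min_stages: int = 2) -> bool:
    """detections maps a stage (weeks after flowering) to the set of
    biological replicates in which the matched spot was detected.

    Assumption: a stage counts toward min_stages only when the spot is
    detected in all n_replicates replicates at that stage."""
    complete_stages = [stage for stage, reps in detections.items()
                       if len(reps) >= n_replicates]
    return len(complete_stages) >= min_stages


# Toy values (invented for illustration, not data from this study).
gel = {"spot_a": 20.0, "spot_b": 30.0, "spot_c": 50.0}
print(relative_volumes(gel))  # {'spot_a': 0.2, 'spot_b': 0.3, 'spot_c': 0.5}
print(is_spot_group({2: {1, 2, 3}, 3: {1, 2, 3}, 4: {1, 2}}))  # True
print(is_spot_group({2: {1, 2, 3}, 3: {1, 2}}))  # False
```

Because every relative volume is normalized to its own gel's total, values sum to 1 per gel, which is what allows spots to be compared across gels before averaging replicates.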

There were striking similarities in the distribution of proteins from the three oilseeds. Functional classes with the highest percentage of proteins were destination/storage, energy, and primary metabolism for all three oilseeds. As expected, soybean, a high-protein oilseed, had the highest percentage (44%) of proteins in the destination/storage class, followed by castor (34%) and rapeseed (25%). Rapeseed (23%) had the highest percentage of proteins involved in energy production, followed by castor (19%) and soybean (11%). All three oilseeds showed similar percentages of proteins in primary metabolism (castor 15%, rapeseed 17%, and soybean 16%).

To understand the progression of metabolic activities during the early seed-filling phase of castor seed development, composite expression profiles were created for each functional subclass (Hajduch et al., 2005). Pooled relative volumes were calculated by summing the relative volumes of identified proteins in each subclass, and the resulting expression profiles were graphed. For simplicity, only subclasses with ten or more proteins were included in this analysis. As a result, 79% of identified proteins were grouped into 12 subclasses (Fig. 5; Table SIV). Two of the three subclasses with the largest relative abundance belonged to the protein destination/storage functional class. The storage protein subclass had the largest relative abundance and showed an increasing expression pattern beginning at 3 WAF. The glycolysis subclass, a member of the energy functional class, was the second largest subclass and showed a steady decrease in expression. The third largest subclass was folding and stability, which showed an expression profile that increased at 3 WAF but declined steadily thereafter. Overall, the 12 subclasses were categorized into three groups based on their expression pattern during seed development (Fig. 5). The first group included 9 of the 12 subclasses and was composed of subclasses expressed mainly during the early stages of seed development. Group two, which included only the detoxification subclass, was expressed generally in the middle stages of development; its relative abundance showed little change during development. Finally, group three was expressed mainly in the later stages of the experimental phase and included the defense-related protein and storage protein subclasses.
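The composite-profile calculation described above (stage-wise sum of relative volumes per subclass, keeping only subclasses with at least ten proteins) can be sketched as follows. This is a minimal illustration under stated assumptions: `pooled_profiles` is a hypothetical helper, each protein is assumed to already carry one averaged relative volume per stage, and the toy values are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

STAGES_WAF = (2, 3, 4, 5, 6)  # sampled stages, weeks after flowering


def pooled_profiles(proteins: List[Tuple[str, List[float]]],
                    min_proteins: int = 10) -> Dict[str, List[float]]:
    """Composite expression profile per functional subclass.

    proteins: (subclass, [relative volume at each stage]) per identified protein.
    Returns the stage-wise sum of relative volumes for every subclass that
    contains at least min_proteins identified proteins."""
    members: Dict[str, List[List[float]]] = defaultdict(list)
    for subclass, profile in proteins:
        if len(profile) != len(STAGES_WAF):
            raise ValueError(f"expected {len(STAGES_WAF)} values, got {len(profile)}")
        members[subclass].append(profile)
    return {
        subclass: [sum(stage_values) for stage_values in zip(*profiles)]
        for subclass, profiles in members.items()
        if len(profiles) >= min_proteins
    }


# Toy input (invented values); min_proteins lowered to 2 so the example stays short.
toy = [
    ("glycolysis", [0.5, 0.4, 0.3, 0.2, 0.1]),
    ("glycolysis", [0.1, 0.1, 0.1, 0.1, 0.1]),
    ("storage protein", [0.0, 0.2, 0.4, 0.6, 0.8]),
]
pooled = pooled_profiles(toy, min_proteins=2)
print(sorted(pooled))  # ['glycolysis']; the single storage protein is filtered out
```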

Of the identified proteins, the twenty spot groups with the largest overall relative abundance are listed in Table I. To calculate the overall relative abundance throughout seed filling, the average relative volumes of a single protein spot at each experimental stage were summed. Because of their well-documented prominence in seeds, storage proteins were excluded from this analysis. Interestingly, eight of the twenty proteins were energy-related and specifically involved in glycolysis. These proteins include glyceraldehyde 3-phosphate dehydrogenase (GAPDH), enolase, Fru-bisphosphate aldolase (FBA), and phosphoglycerate kinase (PGK). Six of the twenty proteins were involved in protein folding and stability and were annotated as either protein disulfide isomerase (PDI) or heat shock proteins. A PDI protein (spot 185) was the most abundant non-storage protein expressed during castor seed filling. High levels of PDI and heat shock proteins may assist in the folding of large amounts of storage proteins (Houston et al., 2005). Together, these twenty proteins represent 11 to 20 percent of the pooled relative volume during castor seed filling.
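The ranking procedure above (sum over stages of the replicate-averaged relative volume, with storage proteins excluded) might look like this in code. The spot identifiers, subclass labels, and values are hypothetical and invented for illustration; the subclass string used to exclude storage proteins is an assumption.

```python
from statistics import mean
from typing import Dict, List, Tuple

# spot id -> (functional subclass, [[replicate relative volumes] per stage])
Spot = Tuple[str, List[List[float]]]


def overall_abundance(stages: List[List[float]]) -> float:
    """Sum over stages of the mean relative volume across replicates."""
    return sum(mean(replicates) for replicates in stages)


def top_non_storage(spots: Dict[str, Spot], n: int = 20) -> List[Tuple[str, float]]:
    """Rank spots by overall abundance, excluding storage proteins."""
    scored = [(spot_id, overall_abundance(stages))
              for spot_id, (subclass, stages) in spots.items()
              if subclass != "storage protein"]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


toy = {
    "spot_x": ("storage protein", [[0.50, 0.52, 0.48], [0.60, 0.61, 0.59]]),
    "spot_y": ("folding and stability", [[0.03, 0.03, 0.03], [0.04, 0.05, 0.03]]),
    "spot_z": ("glycolysis", [[0.02, 0.02, 0.02], [0.01, 0.01, 0.01]]),
}
for spot_id, score in top_non_storage(toy, n=2):
    print(spot_id, round(score, 3))
```

Averaging replicates before summing across stages gives each stage equal weight in the overall score, regardless of how variable the replicates are.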

To compare carbon assimilation in castor with that in the photosynthetic oilseeds soybean and rapeseed, metabolic enzymes were mapped to glycolysis and C4/C3 organic acid metabolism leading to FAS (Hajduch et al., 2005; Agrawal et al., 2008). In both figures (Figs. 6 and 7), enzymes are mapped to show the production of PEP in the cytosol and plastid, but from PEP onward there are two potential pathways. First, in the flow of C3 carbon, PK converts PEP into pyruvate in the cytosol or plastid; and second in the

These comparisons reveal differences in abundance and number of protein isoforms at many of these steps. The differences suggest that malate may have a more prominent role in intermediary metabolism in castor than in the photosynthetic oilseeds soybean and rapeseed.

(2008) suggest that FAS is primarily supported by glycolysis in B. napus embryo cultures, sunflower embryos, and soybean cotyledons. Classical glycolysis (i.e., via hexokinase, ATP-PFK, GAPC, and pyruvate kinase) has a net yield of two pyruvate, two ATP, and two NADH for every glucose molecule that enters the pathway (reviewed in Plaxton and Podestá, 2006). Glycolytic intermediates and products, such as Glc-6-phosphate (G-6-P), PEP, and pyruvate, may also yield ATP and reducing equivalents indirectly through the TCA cycle, malate synthesis, or the oxidative pentose phosphate pathway.
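The net yield quoted above follows from a simple per-glucose ledger: two ATP are invested (hexokinase, ATP-PFK), four are recovered (PGK and PK, each acting on two three-carbon molecules), and GAPC reduces two NAD+. The sketch below encodes textbook stoichiometry only; the step labels are illustrative, and alternative plant bypasses such as PPi-dependent PFK are not modeled.

```python
# Per-glucose ledger for classical glycolysis (hexokinase, ATP-PFK, GAPC, PGK, PK route).
# Textbook stoichiometry; each step: (label, ATP change, NADH change, pyruvate formed).
STEPS = [
    ("hexokinase: Glc -> Glc-6-P", -1, 0, 0),
    ("phosphoglucose isomerase: Glc-6-P -> Fru-6-P", 0, 0, 0),
    ("ATP-PFK: Fru-6-P -> Fru-1,6-bP", -1, 0, 0),
    ("FBA + triose-P isomerase: Fru-1,6-bP -> 2 GAP", 0, 0, 0),
    ("GAPC: 2 GAP -> 2 1,3-bPGA", 0, 2, 0),
    ("PGK: 2 1,3-bPGA -> 2 3-PGA", 2, 0, 0),
    ("PGAM + enolase: 2 3-PGA -> 2 PEP", 0, 0, 0),
    ("PK: 2 PEP -> 2 pyruvate", 2, 0, 2),
]


def net_yield(steps):
    """Sum ATP, NADH, and pyruvate over all steps."""
    atp = sum(step[1] for step in steps)
    nadh = sum(step[2] for step in steps)
    pyruvate = sum(step[3] for step in steps)
    return atp, nadh, pyruvate


atp, nadh, pyruvate = net_yield(STEPS)
print(f"net per glucose: {pyruvate} pyruvate, {atp} ATP, {nadh} NADH")
# prints: net per glucose: 2 pyruvate, 2 ATP, 2 NADH
```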

For example, PEP can be transported across plastid membranes and then used by pyruvate kinase (PK) to produce ATP and pyruvate (Ruuska et al., 2002). We therefore highlight selected glycolytic steps between triose-phosphate and pyruvate to investigate their contribution to FAS. Because glycolysis in plants occurs in both the cytosol and plastid, subscripts c (cytosol) and p (plastid) are used to indicate predicted subcellular localization when necessary.

The first glycolytic enzyme that was differentially expressed in castor compared with soybean and rapeseed is cytosolic FBA. This enzyme was 30% higher in castor than in rapeseed, while soybean expression levels were undetectable. FBA catalyzes the aldol cleavage of Fru-1,6-bisphosphate (F-1,6-bp) to the triose-phosphates dihydroxyacetone phosphate (DHAP) and glyceraldehyde 3-phosphate (GAP). Alonso et al. (2007) suggested that triose-phosphates are the major carbon source for FAS in sunflower embryos, another non-photosynthetic oilseed. The increase in FBAc in castor may be necessary to produce large amounts of triose-phosphates. Two of the eight FBAc isoforms (spots 344 and 356) were among the ten most abundant proteins overall. In contrast, four FBAp isoforms were identified. In green leaves of maize, wheat, spinach, and pea, most of the FBA activity (~90%) is chloroplastic (Schnarrenberger and Krüger, 1986; Pelzer-Reith et al., 1993), and a small decrease in FBAp activity in potato leaves resulted in the inhibition of photosynthesis and starch synthesis (Haake et al., 1998). However, in germinating castor seed, FBA activity is mostly cytosolic, representing two-thirds of the activity (Moorhead et al., 1994). In developing castor seed FBAc is five-fold more castor seed although the plastid fractionation was contaminated with other cytosolic enzymes.

Two other glycolytic activities that were more prominent in castor than in soybean and rapeseed are PGK and 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (iPGAM). First, PGK converts 1,3-bPGA to 3- volume in castor, soybean, and rapeseed, when compared with the previous steps catalyzed by FBA and GAPDH. The conservation of this decrease may be of regulatory importance in controlling the flow of carbon, or it may reflect a higher kcat for these two enzymes.

Miernyk and Dennis (1992) previously showed that enolase activity in castor seeds was highest during FAS and then decreased to undetectable levels. These results are reflected in this study. Enolase catalyzes the conversion of 2-PGA to PEP. Castor had six times as many enolase isoforms as soybean and rapeseed during the seed-filling phase. Two of these isoforms were among the most abundant proteins overall.

Collectively, the enolase isoforms had a relative abundance two to three times higher than in soybean and rapeseed. There is a paucity of literature to explain this high expression, but a simple explanation is that it is necessary to accommodate increased flux of triose-phosphate from both glycolysis and carbon dioxide recycling. This

In the final step of glycolysis, pyruvate kinase (PK) catalyzes the irreversible transfer of the phosphoryl group from PEP to ADP, yielding pyruvate and ATP. This step is considered a highly regulated glycolytic step, as PK is activated by pH changes, inhibited by Glu/Asp and ATP, and degraded after phosphorylation and ubiquitination (Podestá and Plaxton, 1991; Tang et al., 2003; Turner et al., 2005).

Among the oilseeds compared, PKc isoforms were identified in soybean and castor, while PKp was identified in rapeseed; however, in earlier studies, PKp activity and concentration were shown to coincide with fatty acid levels in developing castor seed (Plaxton, 1991; Negm et al., 1995). Castor and soybean expression patterns are very different. PKc expression in castor shows a steadily decreasing trend consistent with other glycolytic enzymes, while soybean shows a more anomalous trend, perhaps suggestive

The presence of malate dehydrogenase (MDH) and ME suggests a contribution of malate synthesis to intermediary metabolism in castor seed. MDH followed by ME converts OAA to pyruvate in two reactions. MDH catalyzes the reversible conversion of OAA to malate (Goward and Nicholls, 1994 the lowest number of cytosolic MDH isoforms and overall relative volume. However, castor has the only MDH isoform identified from the plastid. Expression of the plastid isoform suggests a role for malate in the transfer of reducing equivalents from the plastid to the cytosol. Unlike MDH, ME was identified only in castor seed. In developing castor seed, plastid NADP-ME activity was shown to peak at later stages of seed development as lipid production increased (Shearer et al., 2005); however, cytosolic ME had a higher relative volume than the plastid isoform and may also be important in supplying carbon and reducing equivalents for FAS in developing seed. Overall, it appears that the cytosolic pathway of malate synthesis is the major pathway in developing castor seed.
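The two-reaction route from OAA to pyruvate can be written as a worked balance. This assumes an NAD-dependent MDH and an NADP-dependent ME and uses standard textbook stoichiometry rather than data from this study. The net effect is to move reducing power from NADH to NADPH, the cofactor consumed by FAS, while releasing one CO2 and delivering pyruvate as a carbon source.

```latex
\begin{aligned}
\text{OAA} + \text{NADH} + \text{H}^+ &\overset{\text{MDH}}{\rightleftharpoons} \text{malate} + \text{NAD}^+ \\
\text{malate} + \text{NADP}^+ &\xrightarrow{\text{NADP-ME}} \text{pyruvate} + \text{CO}_2 + \text{NADPH} \\[4pt]
\text{net:}\quad \text{OAA} + \text{NADH} + \text{H}^+ + \text{NADP}^+ &\longrightarrow \text{pyruvate} + \text{CO}_2 + \text{NAD}^+ + \text{NADPH}
\end{aligned}
```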

but some studies indicate that the castor plastid lacks a complete or efficient OPPP. For example, studies indicate that castor seed plastids do not contain Glc-6-phosphate dehydrogenase, a key enzyme in the pentose phosphate pathway (Simcox et al., 1977; Nishimura and Beevers, 1979). Our analysis identified putative cytosolic and plastidal Glc-6-phosphate dehydrogenases but only cytosolic gluconate-6-phosphate lactonase and plastidal gluconate-6-phosphate dehydrogenase. All identified Glc-6-phosphate dehydrogenase proteins show a decreasing expression pattern during seed filling; however, the number of identified proteins and their relative abundance were higher for the putative cytosolic isoforms. Both NADPH-producing steps, catalyzed by Glc-6-phosphate dehydrogenase and gluconate-6-phosphate dehydrogenase, were identified in castor.

Of the identified OPPP proteins, the putative plastidal gluconate-6-phosphate dehydrogenase showed the largest relative abundance. The relative abundance of gluconate-6-phosphate dehydrogenase was 3-fold and 6-fold higher than in soybean and rapeseed, respectively.

CONCLUSION
This study provides a quantitative proteomic analysis of seed filling in a non-photosynthetic seed. The results of this study and comparison with parallel proteomic studies of photosynthetic oilseeds enhance our current knowledge of both seed filling and carbon metabolism in a non-photosynthetic tissue. Using the data from our proteomic analysis, we mapped the activities to various carbon assimilatory pathways. The data show an increase in number and abundance of five glycolytic proteins in castor compared with soybean and rapeseed. We also demonstrate the presence of proteins for a complete cytosolic malate synthesis pathway that was absent in prior parallel studies of rapeseed and soybean. Our data corroborate previous studies of isolated leucoplasts, PEPC, and ME from developing castor seed, suggesting that cytosolic glycolysis and malate synthesis are likely contributors of both carbon and reducing equivalents for FAS.

(2005). Protein abundance was expressed as relative volume according to the normalization method provided by ImageMaster software. Relative volumes were adjusted with correction constants to merge data from both pH 4 to 7 and pH 3 to 10 gels.

Protein spots represented in all three biological and technical replicates were excised from the corresponding preparative CBB-stained 2D gel and trypsin digested as  to cross-correlation scores (XCorr of at least 1.5, 2.0, and 2.5 for +1, +2, and +3 charged peptides, respectively) and peptide probability (0.05 or lower). For all protein assignments, a minimum of two unique peptides was required. All data from this investigation are available from the oilseed proteomics web database (http://oilseedproteomics.missouri.edu). Programming for the web database was performed as described previously (Hajduch et al., 2005). Data are viewable as active links from 2D gels and as a protein identification table.
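The acceptance rules above (charge-dependent XCorr minimums, peptide probability of 0.05 or lower, and at least two unique peptides per protein) can be sketched as a filter. This is a hypothetical illustration, not the SEQUEST or post-processing software actually used: the `PeptideHit` record and function names are invented, the peptide sequences are made up, and rejecting charge states outside +1 to +3 is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds stated in the text.
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}  # minimum XCorr by precursor charge
MAX_PEPTIDE_PROBABILITY = 0.05
MIN_UNIQUE_PEPTIDES = 2


@dataclass(frozen=True)
class PeptideHit:
    sequence: str
    charge: int
    xcorr: float
    probability: float


def peptide_passes(hit: PeptideHit) -> bool:
    threshold = XCORR_MIN.get(hit.charge)
    if threshold is None:  # assumption: charge states other than +1..+3 are rejected
        return False
    return hit.xcorr >= threshold and hit.probability <= MAX_PEPTIDE_PROBABILITY


def protein_accepted(hits: Iterable[PeptideHit]) -> bool:
    """Accept a protein assignment only with >= 2 unique passing peptides."""
    unique_sequences = {hit.sequence for hit in hits if peptide_passes(hit)}
    return len(unique_sequences) >= MIN_UNIQUE_PEPTIDES


# Invented peptide hits for illustration.
hits = [
    PeptideHit("AVLEGK", 2, 2.4, 0.01),
    PeptideHit("AVLEGK", 3, 2.7, 0.02),   # same sequence, counted once
    PeptideHit("GFDDLR", 1, 1.2, 0.01),   # fails: XCorr below 1.5 for +1
]
print(protein_accepted(hits))  # False
hits.append(PeptideHit("TLSEYK", 2, 2.1, 0.04))
print(protein_accepted(hits))  # True
```

Counting unique sequences rather than spectra prevents a single peptide observed at several charge states from satisfying the two-peptide rule on its own.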

We are grateful to Ganesh Agrawal for technical assistance. We also thank Jianjiong Gao for uploading the castor data to the Proteomics of Oilseeds website.

Percentages of average dry mass and water content equal 100% of the average fresh mass.

wide range gels, spot volumes were adjusted with a correction constant. The correction constant for each dataset (i.e., pI 4-7 and pI 3-10 datasets) was calculated according to corrected relative volumes.

(2005). These data were compared with parallel studies of seed filling in soybean and rapeseed (Hajduch et al., 2006; Agrawal et al., 2008). The histogram represents the percentage of identified proteins in each functional class. Numbers in parentheses correspond to functional classifications in Table SIII.

Figure 6. Characterization of carbon assimilation during seed filling in oilseeds. Proteins involved in sugar breakdown that lead to carbon sources for biosynthetic pathways are displayed. Expression profiles show the relative abundance for each enzyme at each stage of the early seed-filling phase. C (castor), S (soybean), or R (rapeseed) above the expression profile indicates the origin of the protein (Hajduch et al., 2006; Agrawal et al., 2008). The number above the graph shows the number of identified proteins. The number to the left of the graph shows the maximum value (y-axis) of the relative volume. Solid lines represent proteins identified in parallel proteomic studies of castor, soybean, or rapeseed, while unidentified proteins are represented by dashed lines. Checked squares indicate identified proteins with no expression profiles due to low volumes or presence in fewer than three stages. Brackets indicate the number of proteins identified by Sec-MudPIT. Grey boxes highlight differences in number (>2) or relative abundance (>2