C4GEM, a Genome-Scale Metabolic Model to Study C 4 Plant Metabolism 1[W][OA]

Leaves of C 4 grasses (such as maize [ Zea mays ], sugarcane [S accharum ofﬁcinarum ], and sorghum [ Sorghum bicolor ]) form a classical Kranz leaf anatomy. Unlike C 3 plants, where photosynthetic CO 2 ﬁxation proceeds in the mesophyll (M), the ﬁxation process in C 4 plants is distributed between two cell types, the M cell and the bundle sheath (BS) cell. Here, we develop a C 4 genome-scale model (C4GEM) for the investigation of ﬂux distribution in M and BS cells during C 4 photosynthesis. C4GEM, to our knowledge, is the ﬁrst large-scale metabolic model that encapsulates metabolic interactions between two different cell types. C4GEM is based on the Arabidopsis ( Arabidopsis thaliana ) model (AraGEM) but has been extended by adding reactions and transporters responsible to represent three different C 4 subtypes (NADP-ME [for malic enzyme], NAD-ME, and phospho enol pyruvate carboxykinase). C4GEM has been validated for its ability to synthesize 47 biomass components and consists of 1,588 unique reactions, 1,755 metabolites, 83 interorganelle transporters, and 29 external transporters (including transport through plasmodesmata). Reactions in the common C 4 model have been associated with well-annotated C 4 species (NADP-ME subtypes): 3,557 genes in sorghum, 11,623 genes in maize, and 3,881 genes in sugarcane. The number of essential reactions not assigned to genes is 131, 135, and 156 in sorghum, maize, and sugarcane, respectively. Flux balance analysis was used to assess the metabolic activity in M and BS cells during C 4 photosynthesis. Our simulations were consistent with chloroplast proteomic studies, and C4GEM predicted the classical C 4 photosynthesis pathway and its major effect in organelle function in M and BS. The model also highlights differences in metabolic activities around photosystem I and photosystem II for three different C 4 subtypes. Effects of CO 2 leakage were also explored. C4GEM is a viable framework for in silico analysis of cell cooperation between M and BS cells during photosynthesis and can be used to explore C 4 plant metabolism.

Many of the most productive crops in agriculture, such as maize (Zea mays), sorghum (Sorghum bicolor), and sugarcane (Saccharum officinarum), possess the Kranz leaf anatomy directly and causally associated with the C 4 photosynthetic pathway (Laetsch, 1974;Hatch and Kagawa, 1976). Unlike C 3 plants, where photosynthetic CO 2 fixation proceeds in a single tissue, the mesophyll (M), in C 4 plants, this process is distributed between M and bundle sheath (BS) cells (Edwards and Walker, 1983). In the Kranz anatomy, a ring of M, containing a few small chloroplasts concerned with the initial fixing of carbon dioxide, surrounds a sheath of parenchyma cells (BS), which have large chloroplasts involved in the Calvin-Benson cycle. BS cells have thick cell walls and contain centrifugally arranged chloroplasts with large starch granules and unstacked thylakoid membranes, whereas the M cells contain randomly arranged chloroplasts with stacked thylakoids and little or no starch (Kennedy et al., 1977;Edwards and Walker, 1983;Spilatro and Preiss, 1987). The fully differentiated BS and M chloroplasts each accumulates a distinct set of photosynthetic enzymes and proteins that enable them to cooperate in carbon fixation.
All C 4 species operate on the same basic theme of pumping CO 2 via C 4 acids from M tissue, where phosphoenolpyruvate carboxylase (PEPC) activity is enhanced, to the BS layer, where Rubisco is localized and C 4 acids are decarboxylated. Other than this, the only common feature shared by C 4 plants is a reduction in the ratio of M to BS cells when compared with C 3 plants. Because diffusion of organic acids between the M and BS must be relatively rapid, M cells in the C 4 plants are rarely more than one cell distant from BS cells. As a result, M-to-BS ratios are between 1 and 2 in C 4 plants, while they are over 4 in most laminate C 3 leaves. Beyond these common features, C 4 plants exhibit substantial variation in how they accomplish CO 2 concentration. These variations result from three distinct decarboxylation modes and multiple patterns of anatomical modification. Biochemically, the major distinguishing feature of C 4 subtypes is the enzyme used for the decarboxylation step in BS (Hatch, 1974(Hatch, , 1992Hatch and Kagawa, 1976). The three carboxylation modes are NADP-malic enzyme (ME; present in maize, sugarcane, and sorghum), NAD-ME (present in Panicum miliaceum and Amaranthus spp.), and phosphoenolpyruvate carboxykinase (PCK; present in Panicum maximum and Amaranthus spp.). These different carboxylation modes are represented in Figure 1. It is important to note that these different mechanisms lead to different patterns of intercellular movements of C 3 and C 4 compounds. During C 4 photosynthesis, atmospheric CO 2 is initially fixed in M cells by PEPC to form C 4 dicarboxylic acids. The movement of photosynthetic intermediates between M and BS cells is restricted largely or entirely to the plasmodesmata and transpirational water movement through the cell walls (Evert et al.,Figure 1. Schematic representation of the photosynthetic metabolism of three C 4 subtypes distinguished according to the decarboxylating enzyme. Numbers refer to enzymes: (1) PEPC, (2) NADPmalate dehydrogenase, (3) NADP-ME, (4) pyruvate-Pi dikinase, (5) Rubisco, (6) PCK, (7) Ala aminotransferase, (8) Asp aminotransferase, (9) NAD-malate dehydrogenase, (10) NAD-ME. Some steps were hidden for the sake of simplicity. 3PGA, 3-Phosphoglycerate; DHAP, dihydroxyacetonephosphate; OAA, oxaloacetate; RuBP, ribulose-1, 5-bisphosphate. 1977). The C 4 dicarboxylic acids are transferred to Kranz cells (BS cells) and decarboxylated there. The released CO 2 is refixed via ribulose-1,5-bisphosphate carboxylase of Kranz cell chloroplasts and incorporated into the Calvin-Benson cycle (Hatch and Kagawa, 1973, 1976. The transfer of atmospheric CO 2 to BS cells is unique to C 4 photosynthesis, since it leads to concentration of CO 2 at the site of the reaction of Rubisco (Furbank et al., 1989(Furbank et al., , 1990b. This CO 2 -concentrating mechanism, together with specific properties of photosynthetic enzymes, configurations of intracellular and intercellular transporters, and modifications in leaf anatomy, allows C 4 plants to achieve higher photosynthetic efficiency than C 3 plants (Bull, 1969). C 4 photosynthesis allows fast biomass accumulation with high nitrogen and water use efficiency (Smith, 1976;Sage, 2004).
Attempts have been made to engineer C 4 metabolism in C 3 plant species, such as rice (Oryza sativa), by overexpressing C 4 enzymes (from maize; Suzuki et al., 2000Suzuki et al., , 2006Ku et al., 2001;Agarie et al., 2002). However, significant increase in photosynthetic efficiency in the C 3 mutants has yet to be achieved.
The annotated genomes, along with literature data, define the known biological components of organisms. From these components, biological networks can be reconstructed and the properties of the system explored. In silico genome-scale metabolic reconstruction remains the most tangible and applicable systems biology approach for metabolic engineering (Blazeck and Alpert, 2010;Dietz and Panke, 2010;Tyo et al., 2010). A well-curated genome-scale network reconstruction is a common denominator for those studying systems biology of an organism. For these metabolic networks, a stoichiometric formalism (Edwards and Palsson, 2000;Thiele and Palsson, 2010) is natural and underpins a powerful set of analytical tools for exploring relationship between genotype and phenotype, predicting product yields and growth rates under different environments, and discovering global effects of genetic manipulations (Edwards and Palsson, 2000;Famili et al., 2003;Price et al., 2003;Quek and Nielsen, 2008;Oberhardt et al., 2009;Thiele and Palsson, 2010).
The most represented domain for genome-scale models is bacteria, with 25 species reconstructed (Oberhardt et al., 2009). Genome-scale metabolic networks have also been reconstructed for mouse (Sheikh et al., 2005;Quek and Nielsen, 2008), human (Mo et al., 2007;Sigurdsson et al., 2009), and more recently for Arabidopsis (Arabidopsis thaliana; Poolman et al., 2009;de Oliveira Dal'Molin et al., 2010). These reconstructions vary both in detail and in scope. For example, the Arabidopsis reconstruction of Poolman et al. (2009) considered two organelles (cytosol and mitochondria) to represent the plant metabolic network and hence was limited to describing heterotrophic cell culture. In contrast, our model (AraGEM; de Oliveira Dal'Molin et al., 2010) describes metabolism across cytosol, plastid, mitochondrion, peroxisome, and vacuole, which enabled the exploration of metabolic differences in photosynthetic and nonphotosynthetic cells and the examination in greater metabolic detail of cells undergoing photosynthesis, photorespiration, b-oxidation, or respiration. While both models covered the full range of known Arabidopsis reactions, validation was limited to primary metabolism (i.e. the synthesis of major biomass components, such as amino acids, nucleotides, lipid, starch, cellulose, and some vitamins). Nevertheless, both models are important contributions redressing the conspicuous lack of plant metabolic reconstructions (Oberhardt et al., 2009). Genome-scale model reconstruction and validation of eukaryotes is certainly more challenging than in prokaryotic organisms, given the multiple organelles. However, such an approach applied for complex, multitissue plant systems would be of potentially even greater value, since direct measurement of fluxes using pulsechase studies is practically impossible in plants due to signal dispersion in the large network as well as metabolite and pathway replication in multiple organelles. In this work, we attempted to use this systemsbased framework to understand the interplay of metabolic networks in complex systems, like C 4 plant metabolism. This, to our knowledge, is the first reconstruction used to formulate an in silico genome-scale model describing the interactions between the two tissue types, M and BS, involved in C 4 metabolism. The predicted differences in flux distribution between M and BS cells were in agreement with observed proteomic studies and C 4 metabolism features reported in the literature, suggesting that the model can be used to further explore C 4 plant metabolism.

Characteristics of the Genome-Scale Model
The metabolic network reconstruction includes associations between genes, enzymes, and reactions to represent C 4 plant metabolism based on the best available online resources (Table I).
Draft models were constructed from all reactions supported in the Kyoto Encyclopedia of Genes and Genomes (KEGG) for the three C 4 plants sorghum, maize, and sugarcane. Inconsistencies in the KEGG database (e.g. multiple identifiers for single metabolites and unbalanced reactions) were resolved as described previously (Quek and Nielsen, 2008;de Oliveira Dal'Molin et al., 2010).
Subcellular compartmentalization plays an important role in plant metabolism. An initial attempt to use WoLF PSORT (http://wolfpsort.org/), a protein subcellular localization predictive tool (Horton et al., 2007), to assign compartments for each gene failed, as most calls were ambiguous or inconsistent across the three species. Instead, AraGEM (de Oliveira Dal'Molin et al., 2010) was used to make the compartmentalization calls for each reaction. AraGEM is manually curated for subcellular localization (cytosol, mitochondrion, plastid, peroxisome, or vacuole), and we have not identified any inconsistencies between localization in Arabidopsis and the limited number of proteins in C 4 plants for which experimental localization data are available. The compartmentalized draft model covers 3,557 genes and 13,114 gene-reaction association entries for sorghum, 11,623 genes and 38,829 associations for maize, and 3,881 genes and 13,593 associations for sugarcane (Table II). The large number of gene-reaction associations reflects that, in the absence of localization information, genes associated with a reaction found in multiple compartments must be associated with each compartment.
A functional model of primary metabolism, C4GEM, was derived from the draft model through manual curation. The model describes 1,588 unique reactions involving 1,755 metabolites (Table II). Fortyseven biomass drain equations describe the accumulation of carbohydrates, amino acids, fatty acid (palmitic acid only), cellulose, and hemicellulose, representing the major biomass drains for a plant cell, as well as some vitamins and cofactors (Table III). Eighteen intercellular exchange reactions (cytoplasm extracellular) have been included to describe the uptake of light (photons), assimilation/secretion of inorganic compounds (CO 2 , H 2 O, O 2 , NO 3 , NH 3 , H 2 S, SO 4 2-, PO 4 3-), translocation of sugars (Suc, Glc, Fru, and maltose), and amino acids (Gln, Glu, Asp, Ala, and Ser). Together with biomass drains, the intercellular exchangers define the broad physiological domain of the model, that is, the curated aspects of primary C 4 plant metabolism captured by C4GEM. A total of 83 interorganelle transporters were introduced in the model to achieve the desired functionality. The active, validated scope of C4GEM includes glycolysis (plastidic and cytosolic), pentose phosphate pathway (plastidic and cytosolic), tricarboxylic acid (TCA) cycle, light and dark reactions (Calvin cycle), NADH/ NADPH redox shuttle between the subcellular compartments, fatty acid synthesis, b-oxidation, and glyoxylate cycle.
Although genome annotation can be used to derive a majority of plant metabolic reactions, there are missing reactions where the genome has not been fully elucidated. During the reconstruction, gaps were found in the primary metabolism for the three C 4 models: 131, 135, and 156 gaps for sorghum, maize, and sugarcane, respectively. Some of these gaps could be filled by combining all of the C 4 plant models. The remaining 100 essential reactions were all found in AraGEM (for a list of covered reactions, see Supplemental Table S1). The reconstruction allowed a functional annotation of the C 4 plant models to ensure that reaction gaps are filled in the context of metabolic functions of nucleotide, carbohydrate, amino acids, and nitrogen metabolism, citrate cycle (TCA), carbon fixation, and pentose phosphate pathway.
The final model has 446 singleton or dead-end metabolites (i.e. internal metabolites only used in a single reaction; Supplemental Table S2). A total of 512 reactions are linked to the use or production of dead-end metabolites. These reactions will by definition have zero flux in flux balance analysis. There are several causes of singletons. Most are linked to vitamins, cofactors, and secondary metabolites not currently included in the model (i.e. not described by biomass drains or intercellular transporters). Some result from true gaps in the network, where one or more essential reactions have not yet been assigned. Finally, some result from KEGG's automatic annotation, in which every reaction known to be catalyzed by a particular Enzyme Commission enzyme in some organism will be included by default, whether or not other reactions required for functionality of the particular reaction are present. Continued curation efforts will focus on resolving these singletons.
In its current form, the model has 183 degrees of freedom. Of these, 47 are associated with the production of biomass components and reduce to a single degree of freedom (growth rate) when a fixed biomass composition is assumed. Another 18 degrees of freedom are associated with intercellular transport. The remaining 118 degrees of freedom represent the maximum cellular scope for using alternative pathways to achieve identical outcomes in terms of growth rate and net transport (e.g. the use of cytosolic or chloroplastic glycolysis). The real scope is further limited by irreversibility constraints as well as regulatory constraints. C4GEM is available in System Biology Markup Language (SBML) format for maize, sorghum, and sugarcane (Supplemental Files S1-S3). Coefficients were estimated based on literature data (Guinn, 1966;Poorter and Bergkotte, 1992). Although individual drains for vitamins and cofactors were tested, they were not added in the stoichiometric biomass equation, as they account for minor compounds. Coefficients for biomass drain are as follows: 0.001 maltose + 0.096 Suc + 0.053 Glc + 0.053 b-D-Fru + 0.265 starch + 0.305 cellulose + 0.282 D-Xyl + 0.04 4-coumaryl alcohol + 0.048 coniferyl alcohol + 0.056 sinapyl alcohol + 0.026 hexadecanoic acid + 0.107 L-Ala + 0.39 L-Asn + 0.162 L-Asp + 0.024 L-Cys + 0.39 L-Glu + 0.35 L-Gln + 0.14 Gly + 0.06 L-Ile + 0.14 L-Leu + 0.002 L-Met + 0.08 L-Phe + 0.01 L-Pro + 0.21 L-Ser + 0.086 L-Thr + 0.04 L-Trp + 0.03 L-Tyr + 0.1 L-Val + 0.001 L-Lys + 0.0027 dATP + 0.0017 dGTP + 0.0019 dCTP + 0.0028 dTTP + 0.0029 ATP + 0.0039 GTP + 0.0033 CTP + 0.0029 UTP + 30 ATP = biomass + 30 ADP + 30 orthophosphate.

Two-Tissue Model
The active scope of C4GEM is similar to AraGEM (de Oliveira Dal' Molin et al., 2010). We have previously demonstrated how AraGEM can be used to explore metabolism in a single tissue (e.g. photorespiration in C 3 M cells). The purpose of this study, however, is to explore the interactions between two distinct tissues, M and BS. Significantly, it represents the first step to represent a plant multitissue model.
C4GEM encapsulates the full complement of primary metabolic reactions available to any cell in a C 4 plant. Tissue specificity is realized through differential gene expression (i.e. by constraining what part of the network is available to the individual tissues). Thus, M and BS tissue will be described as two instances of C4GEM differing only in terms of the constraints applied to the models (see below). The Kranz anatomy is modeled by connecting these two instances, M and BS, through transporters representing plasmodesmata. The movement of photosynthetic intermediates between M and BS cells ( Fig. 1) is largely or entirely limited to the plasmodesmata (Evert et al., 1977). While plasmodesmata are capable of transporting a broad spectrum of low molecular mass (less than 800-900 D) metabolites, significant transport requires high concentration, and transporters were only added for major substrate carriers involved in the C 4 photosynthetic pathway (subtypes NADP-ME, NAD-ME, and PCK), such as malate, pyruvate, phosphoglycerate, triose phosphates, Asp, Ala, and phosphates (pyrophosphate and orthophosphate; Furbank et al., 1989Furbank et al., , 1990bFig. 1).
The final C 4 model is described pictorially in Figure  2A. The model is based on the annotated genome for sorghum but uses AraGEM for compartmentalization and the 131 missing, essential reactions in primary metabolism. The model has been extended with 11 intercellular transporters (plasmodesmata), including Suc translocation and gases efflux.
Mathematically, the large network is broken down into a stoichiometric matrix, S, with a row for each intracellular metabolite and a column for each reaction. Component balances governing a metabolic network of N = {1,.., n} chemical transformations and M = {1,.., m} chemical entities are where C i denotes the concentration of chemical entity i, S ij is the stoichiometric coefficient of chemical entity i in transformation j, and r j is the corresponding flux of transformation j. The time scale of metabolic reactions is short compared with regulation and growth dynamics, allowing us to invoke the steady-state assumption In addition, some reactions are irreversible, so With two tissues communicating, the model is duplicated and the metabolites in plasmodesmata, which are external (unbalanced) for the individual tissues, become internal and balanceable in the two-tissue model. In matrix form, the two-tissue model reads 2 where P M and P BS have a row for each of the 11 metabolites transported via plasmodesmata and a "1" entry in the column for the corresponding transporter. Thus, the third row dictates that the sum of metabolites exported into the plasmodesmata is 0, or, in other words, that whatever is exported from M must be taken up by BS (negative export).
The two-tissue model describes the network topology connecting inputs (e.g. photons, inorganic compounds) to outputs (e.g. biomass). The network is redundant (i.e. there are several paths through the network connecting any given input to any given output). This redundancy is inherent in all biological networks and confers flexibility and robustness to biological systems. The actual path chosen is defined by enzyme kinetics and regulation, and the current model does not consider these.
Equations 3 and 4 define the feasible flux space (i.e. the combinations of fluxes in M and BS that satisfy flux balance and irreversibility constraints). By careful formulation of additional assumptions and, in some instances, an optimality criterion, it is possible to explore the feasible space and formulate testable biological hypotheses, such as the following. Given a certain photon supply, what is the maximum rate of photosynthate production? What is the minimum number of photons required to support plant metabolism, and what is the optimal flux distribution? Is a particular gene deletion expected to affect growth and/or photosynthate production?
It is also possible to incorporate expression data from transcriptomics or proteomics to reduce the feasible solution space by deleting nonexpressed reactions. In the following sections, we will illustrate the use of constraint-based analysis.

Simulations
In this study, we are interested in exploring (1) if the observed differences between M and BS metabolism in C 4 plants are consistent with an assumption that the plant has evolved to use the network in an optimal manner, and (2) if the different C 4 subtypes differ in their potential efficiency. For a photosynthetic system, a logical optimality criterion is photon efficiency (i.e. a plant network will operate with the feasible flux distribution that minimizes photon uptake for fixed rates of biomass syn-thesis and export of carbon assimilates to other tissues). While no proof exists for the validity of this criterion, we observed for AraGEM that this criterion accurately predicted the classical photorespiration pathway as the most photon-efficient way to handle the effect of the Rubisco oxygenation reaction and that the predicted cost of photorespiration compared with pure photosynthesis was a 40% increase in photon usage for a 3:1 carboxylation:oxygenation ratio, which is consistent with experimental data (de Oliveira Dal'Molin et al., 2010).

Assumptions and Constraints
As a minimum definition of tissue specificity, Rubisco and PEPC activity are constrained to BS and M tissues, respectively, based on gene expression, enzyme activity, and proteome studies (Ghosh et al., 1994;Gutteridge and Gatenby, 1995;Dong et al., 1998;Patel and Berry, 2008; Table IV). These constraints are common to all three C 4 subtypes. Mathematically, the tissuespecific activities are captured by equality constraints: To represent each C 4 subtype, the enzyme used to release CO 2 from the C 4 acids was specified (NADPH-ME, NAD-ME, or PCK; Table IV). Similarly, the active C 3 -C 4 transport systems have been specified based on subtypes. Finally, the drain of biomass components, starch accumulation, and export of carbon assimilates (Suc) was fixed based on average measured values.
It should be stressed that this is a minimum definition of C 4 subtype metabolism used in this study, because we seek to explore questions around optimality. More detailed constraints (e.g. based on transcriptome analysis) may be used when exploring other questions.

Preferential Starch Accumulation in BS Cannot Be Explained by Photon Efficiency
In C 4 plants, starch is principally synthesized in BS, although the amount of M starch biosynthesis is species dependent (Black and Mollenha, 1971;Spilatro and Preiss, 1987). In maize, for example, starch biosynthesis occurs only in the BS during normal photosynthesis, but plants growing in continuous light do accumulate starch in the M, suggesting that photosynthate availability is the limiting factor (Downton and Hawker, 1973).
One explanation for preferential starch biosynthesis in BS could be that it is "less expensive" (i.e. requires less photons) to accumulate starch in BS than in M, and hence BS stores are filled before M stores of starch. We tested this hypothesis by comparing the minimum photon requirement predicted with all starch produced in M to the minimum photon requirement for the standard model with all starch produced in BS. We did not observe an energy efficiency advantage for producing starch in BS rather than in M; thus, preferential starch production in BS cannot be argued to be the result of an evolutionary pressure for higher energy efficiency. Starch accumulation in BS only was added as an experimentally based constraint for subsequent analysis.

Suc Translocation
To sustain other plant tissues, fixed carbon must move out of the photosynthetic cells and into phloem (Braun and Slewinski, 2009). For most plants, this occurs by loading Suc into the phloem and transporting it from source tissues (net exporters) to sink tissues (net importers), where Suc is unloaded. Suc drain from BS to vascular parenchyma has been added to represent the translocation of carbon for the sustenance of the sink tissues. It is unclear if Suc produced in M or BS is transported through plasmosdesmata during C 4 photosynthesis or if malate and pyruvate (for NADP-ME subtypes like maize) are the main carbon flow during the carbon assimilation. We tested this hypothesis in silico by allowing Suc exchange between both cells during C 4 photosynthesis. According to the flux variability analysis (see "Materials and Methods"), Suc can be exchanged during this cooperative process, but malate and pyruvate show a much greater flux in magnitude (about 100-fold). Suc exchange between both cells did not affect the main contrasts in metabolic activity by comparing BS and M.

In Silico Flux Profile Correlates with Maize Chloroplast Proteome Studies
The optimal flux distribution predicted by the twotissue model shows a number of reactions through which there is a significant difference in the flux observed in BS and M (Fig. 3). While the magnitude of flux is not directly linked to the amount of enzyme present, one might reasonably expect that a significant increase in flux in a tissue in most cases is matched by a significant increase in the expression of the corresponding enzyme. We compared metabolic flux predictions with a large data set comparing BS and M chloroplast proteomes in maize (Majeran et al., 2005(Majeran et al., , 2008Friso et al., 2010). Since the optimality criterion does not necessarily yield a unique flux distribution, flux variability analysis (see "Materials and Methods") was used to establish the range of values for each flux for which optimal photon usage could be achieved. All but four fluxes (ATP/ADP and oxaloacetate/malate transporters, glyceraldehyde 3-phosphate dehydrogenase, and triose phosphate isomerase) were uniquely defined by the optimality criterion.
For 50 of the 66 proteins or protein complexes annotated in the proteome assigned to enzymatic reactions, differential expression (or absence thereof) was consistent with predicted flux differences (Table V). Of these 50, two predictions were a direct consequence of the model assumptions for Rubisco and starch synthesis ( Table IV). Twenty of the reactions were assigned to fatty acid biosynthesis, and the average and median BSto-M ratios for proteins involved in fatty acids synthesis were 0.91 and 0.94, respectively, supporting the assumption that the demand for fatty acids is similar in M and BS.
The simulations show that flux through enzymes involved in carbon fixation, pentose phosphate pathway, and starch synthesis is up-regulated in BS, while flux through enzymes involved in purine, pyruvate, and nitrogen metabolism and nitrogen assimilation is up-regulated in M. The model also highlights contrasts in flux distribution through isoenzymes distributed in multiple organelles, such as higher flux through malate dehydrogenase in M plastids and higher flux through ME in BS plastids. Interestingly, in silico analysis shows no difference in flux through chloroplastic pyruvate dehydrogenase during photosynthesis, indicating an essential need to maintain malate and pyruvate pools required for operation of the C 4 cycle. These flux predictions agree with the proteome studies.
Of the 16 reactions not accurately predicted, six are involved in chlorophyll and isoprenoid synthesis. While C4GEM has been validated for its ability to synthesize chlorophyll, the biomass content of chlorophyll was not specified in the model. Since BS chloroplasts have lower chlorophyll b levels, most of the predictions would agree with the proteome data if the content was accurately specified in the biomass equation. The reason for higher protoporphyrinogen oxidase activity in BS is unclear, given that the enzyme catalyzes an early step in chlorophyll biosynthesis. Another seven of the reactions not predicted in the simulated fluxes relate to the expected overexpression of oxidative stress response activities in M tissues. The model currently does not describe the generation of reactive oxygen species or their downstream products, and hence it does not predict the need for these activities. Similarly, the overexpression of three enzymes involved in degradation pathways was not predicted. The photon-optimal solution for making biomass clearly will not predict degradation, which is a wasteful process. The flux through maintenance and turnover processes needs to be expressed explicitly to be accounted for in the model.

The Model Predicts Cyclic Electron Flow in Bundle Chloroplast for NADPH-ME Subtype
Whereas linear electron transport in photosynthesis produces both ATP and NADPH, the cyclic electron transport (CET) around PSI has been shown to produce only ATP (Shikanai, 2007;Endo et al., 2008). Two alternative routes have been shown for CET: NAD(P) H dehydrogenase-dependent and ferredoxin:plastoquinone oxidoreductase-dependent flows, but their physiological relevance has not been elucidated in detail. Meanwhile, because C 4 photosynthesis requires more ATP than does C 3 photosynthesis to concentrate CO 2 , it has not been clear how the extra ATP is produced. Linear electron transport alone cannot produce enough ATP to satisfy the stoichiometry required for CO 2 fixation. The shortage of ATP may be compensated for by the function of CET. Interestingly, two subtypes of C 4 photosynthesis, NAD-ME and NADP-ME, have been shown to have different cell-specific ATP requirements. The ATP/NADPH ratio required in NAD-ME species is higher in M cells than in BS cells, but the opposite is true for NADP-ME species (Edwards and Walker, 1983;Takabayashi et al., 2005). This cell type-specific ATP requirement suggests that the activity of CET should be higher in M cells of NAD-ME species and in BS cells of NADP-ME species, if CET plays a critical role in supplying the additional ATP needed in C 4 photosynthesis.
C4GEM predicts conventional linear electron transfer in M chloroplasts and CET in BS chloroplast for NADP-ME species (Fig. 4). One explanation for preferential CET in BS could be that it requires less photons to get the extra ATP in BS chloroplast than in M chloroplast. We tested this hypothesis by comparing the minimum photon requirement predicted with CET   (Majeran et al., 2005(Majeran et al., , 2008Friso et al., 2010). For flux (F) analyses (BS/M), data show relative flux distribution (this study). Green indicates increased protein accumulation (P) or increased flux (F); red indicates decreased protein accumulation (P) or decreased flux (F); gray indicates no change; white indicates not predicted. GAPDH, Glyceraldehyde 3-phosphate dehydrogenase; LET, linear electron flow; PPP, pentose phosphate pathway.

Enzymes EC No. Metabolism Proteomics Flux
in M chloroplast with the minimum photon requirement for the model with CET in BS chloroplast. C4GEM highlights that CET in BS is energetically more efficient (about 25% less photons required) in NADP-ME-type metabolism than if CET is active in M. Our simulations are consistent with the proteome data showing high NAD(P)H dehydrogenase activity in BS (Table V), which is backed by previous studies showing that ndh gene expression is up-regulated in BS for NADP-ME subtypes (Dariel et al., 2006;Shikanai, 2007).

The Model Predicts Little Difference in Theoretical Quanta Requirement for the C 4 Subtypes
The costs of concentrating CO 2 in BS in NADP-MEand NAD-ME-type plants consist of two ATP per CO 2 fixed for regeneration of PEP, plus the amount of ATP required to pump extra CO 2 (overcycle CO 2 ) to compensate for the CO 2 that leaks out of the BS. In PCK type, one extra ATP and 0.5 extra NADPH are required in addition to the amount required for overcycling.
Our flux simulations show little difference in the theoretical quantum yield of photosynthesis among the three C 4 decarboxylation types (Fig. 4), also discussed in the literature (Shikanai, 2007). Such results indicate that these distinct biochemical subtypes among C 4 plants represent different, equally optimal biochemical solutions to the same problem.
Although the theoretical yield is similar for all the C 4 subtypes, in practice, differences in quantum yield are observed (Hatch et al., 1995). The reasons for the differences in quantum yield between decarboxylation  types have proved elusive to identify. Our results suggest that these differences are not due the distinct C 4 decarboxylation pathways. It has been suggested that if CO 2 leakage (from BS back into M) occurs, this would change quantum yield between C 4 plants (Furbank et al., 1989). However, estimates of leak rates show little correlation between C 4 subtype and quantum yield. Our flux simulations show little change in total quanta requirement under the assumption that leakage occurs (Fig. 4), but the simulation shows considerable differences in activity of linear electron transport and CET around PSI and PSII for each of the C 4 species. These results are consistent with the differences in photochemical activity in C 4 subtypes observed in the literature (Edwards and Walker, 1983;Takabayashi et al., 2005;Romanowska and Drozak, 2006). As discussed previously, photosynthetic electron transport can involve either a linear flow from water to NADP, via PSII and PSI, or a cyclic flow just involving PSI. Because C 4 subtypes have different cellspecific ATP requirements (Edwards and Walker, 1983;Dariel et al., 2006), there are differences in activity of PSI and PSII (Romanowska and Drozak, 2006). It was found, for example, that NAD-ME and PCK species have substantial PSII activity in the BS tissues, while NADP-ME species have little PSII activity. CONCLUSION C4GEM, to our knowledge, is the first genome-scale metabolic reconstruction of C 4 plants. The two-tissue model based on C4GEM, to our knowledge, is the first attempt at constructing and integrating contextspecific metabolic networks in multicellular interactions. The use of this model for in silico flux analysis illustrates the potential of using genome-scale models to explore complex and compartmentalized networks to derive nontrivial hypotheses. The in silico-predicted differences in the metabolic activity in BS and M during C 4 photosynthesis agreed with observed differences in proteome data for many plastidic enzymatic reactions. The fact that in silico fluxes predicted assuming optimal photon use agree with observed proteome differences and C 4 metabolic features reported in the literature lends support to the assumption that gene expression in BS and M has evolved to realize photon optimality.

Metabolic Reconstruction
The reconstruction process employed here is the same as the ones used for mouse (Quek and Nielsen, 2008) and Arabidopsis (Arabidopsis thaliana) GEMs (de Oliveira Dal'Molin et al., 2010). The workflow for the development and refinement of the C 4 genome-scale model (C4GEM) is illustrated in Figure 5. The process of the metabolic reconstruction naturally lends itself to an iterative approach, and subsequent rounds of model refinement facilitate the testing of hypotheses.
The network reconstruction followed these steps. (1) Extraction of the metabolic information. A gene-centric organization of metabolic information was adopted, in which each known metabolic gene is mapped to one or several reactions. The metabolic elements contained in C4GEM (genomic metabolic reactions) were collected from KEGG (release 49.0, August 3, 2009;Kanehisa et al., 2008) for three C 4 plant models: maize (Zea mays), sorghum (Sorghum bicolor), and sugarcane (Saccharum officinarum; stored in an Excel spreadsheet).
(2) Manual curation. The metabolic information was merged into a single generic plant cell model and subsequently curated using literature information. AraGEM was used as a core model to cover gaps found in primary metabolism of the C 4 plant models. Curation was performed in Excel, accounting for cellular enzyme localization.
(3) Enzyme localization. Enzymes were assigned to different compartments according to literature evidence or enzyme localization databases (e.g. PPDB, a plant proteome database for Arabidopsis and maize).
(4) Gene association. In many cases, there are many genes for a particular enzyme (i.e. isoenzymes) found in multiple compartments. Attempts to assign localization based on in silico predictions failed (see "Results and Discussion"), and generally, all genes for a particular reaction had to be assigned to all organelles to which the reaction had been assigned in (3).
(5) Stoichiometric matrix. The set of unique reaction identifiers were extracted and stored as a stoichiometric matrix (Java application). In this step, multiple entries for a reaction in a particular compartment appearing in the Excel gene-enzyme-reaction table are collapsed to a single reaction entry. The stoichiometric matrix is used for flux balance analysis.

Intercellular and Extracellular Transport Elements
A multitissue model capturing the interactions between BS and M was formulated by combining two instances of the C4GEM and adding the movement of photosynthetic intermediates between M and BS cells (Fig. 1), which are restricted largely or entirely to the plasmodesmata (Evert et al., 1977). Transport through the bridge channel plasmodesmata was restricted to the main metabolites that give the characteristics of the C 4 photosynthetic pathway inferred from the literature (Furbank et al., 1989(Furbank et al., , 1990b: that is, malate, pyruvate, phosphoglycerate, triose phosphates, and phosphates (pyrophosphate and orthophosphate; Fig. 1). The extracellular transporters were included for water and gases exchange flux through the cell wall (Evert et al., 1977), biomass drains for M and BS, and Suc drain from BS to vascular parenchyma to represent the translocation of carbon for the sustenance of the sink tissues (Braun and Slewinski, 2009). We have also considered gases efflux (CO 2 and oxygen) to be exchanged through plasmodesmata to test the hypothesis, for example, that some percentage of CO 2 that is released from the C 4 acids in BS leaks back to M, as reported in the literature (Farguhar and Hatch, 1983;Furbank et al., 1990a;Hatch et al., 1995; see "Results and Discussion").

Optimality Criterion
For the C 4 metabolic model, the corresponding photon minimization solution can be expressed as subject to The last constraint can be used to define the previously mentioned irreversibility constraints as well as tissue specificity (e.g. zero flux for Rubisco in M). It is also used to define the fixed output of the system, in the form of biomass drains. To simplify comparisons, biomass synthesis rates were assumed to be the same for the two tissue types (except for starch; see below) and estimated based on literature values (Poorter and Bergkotte, 1992). Inorganic carbon was assumed to be limited in BS (no CO 2 uptake) and free in M, where CO 2 is initially fixed (by PEPC) to form C 4 dicarboxylic acids. It was further assumed that the two tissues can freely exchange other inorganic compounds with the environment, while the exchange rate for organic compounds is assumed to be zero except for exchange via plasmodesmata.

Flux Simulations
Following manual curation, flux balance analysis was applied to confirm that each biomass component could be synthesized and in the subsequent model validation. C4GEM was compiled and curated as described for the compartmentalized C 3 plant model (de Oliveira Dal'Molin et al., 2010). From the data collection, a two-dimensional reaction-centric SBML (www.sbml.org) database was generated. The stoichiometric matrix, S, and reversibility constraints (defining V min ) were extracted from the SBML database in MATLAB (version 9; The MathWorks), and the linear programming problems were solved using the MOSEK Optimization Toolbox for MATLAB (version 6). Constraints were applied in MATLAB (C4Flux tool box) to represent cell type interactions (Fig. 2B). Reactions were activated or deactivated based on gene and enzyme activity studies (Westhoff et al., 1991;Bassi et al., 1995) for the specified cells (e.g. Rubisco is not present in M but present in BS chloroplasts). Biomass drains (see the following section for more details) were also specified based on literature values (Poorter and Bergkotte, 1992). Optimum flux distribution was simulated by linear programming and visualized on a metabolic flux map (which represents the central metabolism of a compartmentalized plant cell) drawn in Excel (Fig. 3).

Biomass Synthesis and Network Functionality
In the final step of the manual curation process, network gaps were identified based on the model's ability to produce biomass components from substrates. Biomass drain reactions are incorporated into C4GEM as the accumulation terms of the biomass precursors (e.g. "Starch = Starch_biomass") in order to simplify the task of uncovering the pathway gaps in the each of the biosynthetic routes separately. The list of biomass components considered is shown in Table II and includes major structural and storage components as well as trace elements such as vitamins.
For each biomass component, the following linear programming problem was formulated and solved maximize subject to where n i is the corresponding biomass drain reaction. In this study, the problem was solved for leaf tissue (photons as energy source, CO 2 as carbon source, and nitrate as nitrogen source) to represent biomass synthesis during C 4 photosynthesis. The model can also be used for nonphotosynthetic tissues, when the minimization criterion is applied for carbon source usage (e.g. Suc).

Flux Variability Analysis
Due to the lack of experimental constraints, flux solutions generated by linear programming for large-scale models are often not unique. Despite having identical network topology and maintaining the same set of constraints and objective function, it is possible to have different flux distributions that give rise to the same objective function value, with each flux distribution representing a different metabolic state (e.g. different reaction utilization and/ or directionality; Lee et al., 2000;Papin et al., 2002). Instead of a single flux distribution, it is therefore important to characterize the flux space encompassing all alternative optimal solutions before making reasonable interpretation of whether a metabolic phenotype is sufficiently different between two systems (e.g. BS versus M). An efficient linear programming-based strategy was used to quantify the range and variability of flux values, which involves minimizing and subsequently maximizing the flux value of all reactions in the network while keeping the same optimum objective function value (Mahadevan and Schilling, 2003). This method identifies the range of fluxes that is possible within all possible alternative optimal solutions. The following linear programming problem was formulated and solved maximize subject to and minimize subject to where v i represents each reaction in the network, c is the objective function vector, and Z obj is the objective function value calculated from the first optimization instance. This method was used to identify and interpret the main contrasts between the two systems, M and BS, as shown in Table V (see "Results and Discussion").

Supplemental Data
The following materials are available in the online version of this article.
Supplemental Table S1. Reactions with no open reading frames in all C 4 models, covered by AraGEM.