- Split View
-
Views
-
Cite
Cite
Zahra Razaghi-Moghadam, Zoran Nikoloski, GeneReg: a constraint-based approach for design of feasible metabolic engineering strategies at the gene level, Bioinformatics, Volume 37, Issue 12, June 2021, Pages 1717–1723, https://doi.org/10.1093/bioinformatics/btaa996
- Share Icon Share
Abstract
Large-scale metabolic models are widely used to design metabolic engineering strategies for diverse biotechnological applications. However, the existing computational approaches focus on alteration of reaction fluxes and often neglect the manipulations of gene expression to implement these strategies.
Here, we find that the association of genes with multiple reactions leads to infeasibility of engineering strategies at the flux level, since they require contradicting manipulations of gene expression. Moreover, we identify that all of the existing approaches to design gene knockout strategies do not ensure that the resulting design may also require other gene alterations, such as up- or downregulations, to match the desired flux distribution. To address these issues, we propose a constraint-based approach, termed GeneReg, that facilitates the design of feasible metabolic engineering strategies at the gene level and that is readily applicable to large-scale metabolic networks. We show that GeneReg can identify feasible strategies to overproduce ethanol in Escherichia coli and lactate in Saccharomyces cerevisiae, but overproduction of the TCA cycle intermediates is not feasible in five organisms used as cell factories under default growth conditions. Therefore, GeneReg points at the need to couple gene regulation and metabolism to design rational metabolic engineering strategies.
Supplementary data are available at Bioinformatics online.
1 Introduction
Computational modeling of metabolic networks has become a prominent approach in the design of cell factories that accomplish a desired goal, e.g. improving yield of a chemical compound (Burgard et al., 2003; Kamp and Klamt, 2017; Patil et al., 2005; Pharkya and Maranas, 2006; Pharkya et al., 2004; Ranganathan et al., 2010; Rocha et al., 2010). The design of cell factories is achieved by metabolic engineering strategies which involve a combination of knockouts, over- or under-expression of native genes of a host organism along with insertion of genes from other organisms and even synthetic enzymes which do not occur in nature (Erb et al., 2017). The creation of metabolic manipulation strategies has largely been propelled by the developments in the constraint-based modeling framework (Burgard et al., 2003; Kamp and Klamt, 2017; Patil et al., 2005; Pharkya and Maranas, 2006; Pharkya et al., 2004; Ranganathan et al., 2010; Rocha et al., 2010). The existing computational approaches for prediction of metabolic engineering strategies can be roughly grouped into optimization-based or enumeration-based. The former use optimization-based formulation of the engineering task and can accommodate the gene manipulations, while the latter enumerate set of reactions whose knockouts achieve a desired goal (Maia et al., 2015).
Common to both types of approaches is that they result in engineering strategies that specify how the fluxes through selected reactions are to be manipulated to achieve the desired goal. For instance, OptKnock uses bilevel programming to identify reaction knockouts which improve a desired goal while ensuring growth (Burgard et al., 2003). In addition, OptReg approach also uses bilevel programming to identify up- or downregulation of reaction fluxes and reach a metabolic engineering goal (Pharkya and Maranas, 2006). In contrast, the concept of minimal cut sets can be used to enumerate reaction knockouts of specific size that disrupt a function in the network. The latter has been recently employed to demonstrate that growth-coupled overproduction is feasible for almost all metabolites in five organisms used in metabolic engineering (Kamp and Klamt, 2017). Yet, what a majority of the existing approaches neglects to determine is whether or not the predicted strategies can, in fact, be realized by appropriate manipulation of the underlying set of genes, included in the models via the gene–protein–reaction (GPR) rules.
GPR rules characterize the genes and respective proteins that catalyze reactions. They can either be included as additional annotation of reactions (Reed et al., 2003) or as part of the stoichiometric matrix that represents the stoichiometry of the reactions included in the model (Machado et al., 2016). GPR rules specify whether a protein catalyzes a reaction on its own or if there are isoenzymes, enzyme complexes or their combinations (Reed et al., 2003). For instance, a reaction catalyzed by two isoenzymes, encoded by the genes and , is denoted by the GPR rule ( OR ); further, a reaction catalyzed by an enzyme complex composed of two units, encoded by and , is denoted by ( AND ). We refer to GPR rules that include more than one gene as complex rules. For instance, in the metabolic network in Figure 1A, the metabolite is taken up as a substrate from the environment to produce metabolite , the undesired by-product , and the metabolite , which denotes the engineering objective. The GPR rules in this example are such that is a reaction catalyzed by an enzyme complex composed of two proteins encoded by the genes of and . Further, to carry flux, reaction requires any of the two isoenzymes encoded by the genes and .
The participation of genes in multiple GPR rules and the inclusion of complex GPR rules may lead to conflicts on the level of gene manipulations. For instance, in the network presented in Figure 1A and B, with the objective of maximizing the production of metabolite , the pathway through and yield the optimal solution (Fig. 1C). On the other hand, maximizing the production of requires flux via the reaction and blocking any flux through reactions and (Fig. 1D). On the level of gene manipulations, can carry flux if the proteins encoded by the genes and are present. However, to remove the reaction , it is necessary to knockout genes and , which encode the associated isoenzymes. Therefore, this leads to an infeasible design for the objective of maximizing the production of due to contradicting gene manipulation—the need for simultaneous presence and absence of the product encoded by the gene .
We hypothesize that such infeasibilities are likely to occur in metabolic engineering strategies obtained with the existing computational approaches in genome-scale metabolic models. For instance, by using a metabolic model of Escherichia coli, OptKnock predicts that 63% increase in production of succinate in comparison to the wild-type is possible by knocking out pyruvate formate lyase and lactate dehydrogenase (Burgard et al., 2003). Looking closer at the GPR rules of pyruvate formate lyase (EC2.3.1.54) in the E.coli iJR904 model (Reed et al., 2003), we observe that it involves five genes (three enzymes) in a complex GPR rule: (b3951 AND b3952) OR (b0902 AND b0903) OR b3114. Knocking out this reaction requires that the three isoenzymes are knockedout. However, the last two terms form the GPR rule of the 2-oxobutanoate formate lyase, which will be also knockedout although the metabolic engineering strategy must not alter the flux through the respective reaction. As another example, OptReg proposes a metabolic manipulation strategy that leads to 94.3% of the maximum theoretical yield of ethanol production of 19.87 mmol/(gDW h). In this design, the overproduction is possible by knocking out pyruvate formate lyase, which is likewise also infeasible (Pharkya and Maranas, 2006). Moreover, Machado et al. (2016) tested the feasibility of strain designs proposed by the enumeration-based method (Kamp and Klamt, 2017) and the results show that fewer than 10% of the proposed solutions are actually feasible, when GPR rules are considered. All these examples demonstrate that complex GPR rules often lead to contradicting gene manipulations, referred to as gene conflicts, and that there is a need for overcoming this important problem in the design of metabolic engineering strategies.
Two constraint-based approaches, OptGene (Patil et al., 2005) and OptORF (Kim and Reed, 2010), facilitate manipulation of genes. However, OptGene suffers from non-linear constraints, which render it impractical for usage with large-scale models. This issue is partly addressed by using an evolutionary programming approach, which, however, allows only to approximate the solution. In contrast, OptORF allows only gene knockouts and off–on overexpression, while finer manipulations are not considered. Although OptORF includes linear constraints that capture the environmental regulation of metabolism, it is challenging to implement this approach since it requires additional information about condition-specific activation of enzyme-coding genes that is not readily available even for model organisms that experience multi-factorial environmental cues (see Section 3.2, for further details). There is a modified version of OptORF which is based on bilevel programming that only considered knockout manipulations while claiming to couple reaction and gene manipulations (Kim et al., 2011). This variant can be seen as analogous to OptKnock (Burgard et al., 2003), with a modified objective that prefers fewer gene knockout. While this variant of OptORF is feasible to implement (see https://github.com/MonaRazaghi/GeneReg), the considered constraints that couple reaction flux and its activity [on–off, see constraint A.3 in Supplementary Text of Kim et al. (2011)] leads to the case in which a reaction does not carry flux (since the lower flux bound is often set to zero), although it is considered active. Therefore, we recognize that there is no working solution that resolves gene conflicts while allowing fine tuning of gene manipulations, which is the problem we address in this study.
Another elegant way to approach this problem is to rely on logical transformation of model (LTM) that enforces gene–reaction associations. These associations can then be combined with constraint-based approaches for metabolic engineering (Zhang et al., 2015). LTM is based on inclusion of pseudo-reactions and metabolites that facilitate the simulation of complex GPR rules (i.e. isoenzymes and protein complexes). As a result, the metabolic models extended based on LTM have usually 2- to 5-fold more reactions and metabolites in comparison to the original models, while leaving the number of gene intact (Zhang et al., 2015). However, LTM only allows the simulation and design of knockout strategies, while fine-tuned up- and downregulations of genes are not considered. The question that remains is whether the metabolic engineering strategies feasible at the gene level can be achieved with a considerably smaller number of variables, rendering the models applicable with more involved constrained-based approaches (e.g. considering integer-value variables).
Moreover, and most importantly, both approaches with and without LTM when applied to design knockout strategies do not ensure that the engineering goal is achieved only by gene knockout (i.e. without any other transformations, such as up- or downregulation). For instance, in the metabolic network in Figure 1A and B, with the objective of maximizing the production of metabolite , knockout strategies only propose blocking flux through reactions and (Fig. 1D) and ignore the fact that the engineering goal is impossible without upregulation of . This possibility is also not precluded by the elegant LTM approach.
Here, we first investigate how complex the GPR rules are across organisms, including those relevant for biotechnological applications, for which there are high-quality metabolic models available. We then propose an optimization-based approach, called GeneReg, which predicts gene manipulation strategies while avoiding gene conflicts. Finally, we show that the approach is applicable to genome-scale metabolic networks of E.coli and Saccharomyces cerevisiae, and pinpoint the differences in comparison to the contenders. Applications of GeneReg brings into question recent claims about feasibility of growth-coupled overproduction of key metabolites in major production organisms, namely E.coli, the Gram-positive bacterium Corynebacterium glutamicum, the filamentous fungus Aspergillus niger and the cyanobacterium S. sp. PCC6803 (Kamp and Klamt, 2017). Altogether, GeneReg provides the means for rational and feasible design of cell factories.
2 Materials and methods
Equations (4) and (5) show that in case of upregulation, i.e. when , the flux range is [,] and [,], otherwise. Similarly, Eqs. (6) and (7) restrict the flux range in response to downregulation. The constraints in Eqs. (2)–(7) are motivated by OptReg (Pharkya and Maranas, 2006), and resolve the issue of this approach so that down- and upregulated reactions in an engineering strategy have fluxes which do not fall in the FVA ranges at optimal growth (i.e. those of the wild-type) if they are declared as altered.
To overcome the shortcomings of reaction-based strategies, we opt to design a manipulation strategy that is feasible at the gene level. Therefore, we introduce novel constraints that capture the coupling between the modulation of the reactions and the respective genes. Our main contribution is the encoding of constraints for the different GPR rules in a form of a mixed-integer linear program (MILP).
Equations (12) and (13) model the coupling between gene manipulation and the up-/downregulation of reaction , i.e. implies that must be manipulated, and implies that remains unchanged. To prevent conflicts, for every gene we add the constraint that .
3 Results
3.1 Complex GPR rules across organisms
To determine whether gene conflicts may arise in the design of metabolic engineering strategies, we first investigate the occurrence of complex GPR rules in 21 metabolic models across different species from the BiGG database (King et al., 2016) (Supplementary Table S1). We find that between 48.5% and 93.1% of reactions in the models are associated to at least one gene. Of these reactions, between 12.4% and 54.8% have complex rules, with an average number of rules per gene ranging from 1.4 to 13.7 (Fig. 2 and Supplementary Table S1). In addition, we observe that the maximum number of GPR rules in which a gene participates varies from 8 to 1040. Since a gene can be associated with multiple reactions, it can be included in different rules. Calculating the maximum percentage gene occurrence in rules over all genes in a model, we find that it varies between 1.5% and 28.5% over the considered models (Fig. 2 and Supplementary Table S1). We also observe that some of the GPR rules for one reaction are embedded as part of other, more complex rules, and refer to these as rule repetitions. We find that the average rule repetition ranges from 0.02% to 0.4%, while the maximum rule repetition ranges from 1.4% to 25.1% across the models (Fig. 2 and Supplementary Table S1). Therefore, we conclude that complex rules are present to a considerable degree in the analyzed metabolic models and, therefore, gene conflicts are expected to be present in the design of metabolic engineering strategies. As a result, to arrive at metabolic engineering strategies that are feasible at the gene level, we rely on the GeneReg approach explained in detail in Section 2.
3.2 Feasible strategy for overproduction of ethanol in E.coli
To assess the performance of the proposed optimization approach, GeneReg, we employed it on the iJR904 metabolic model for E.coli, to determine a modification strategy for overproducing ethanol feasible at the gene level. This model includes 761 metabolites participating in 1075 reactions (821 irreversible and 254 reversible reactions), along with 904 genes associated to 873 GPR rules. The maximum possible flux for ethanol synthesis () is 20 mmol/(gDW h), while the maximum flux of ethanol synthesis () at the optimal growth (of 0.922 per hour) is 0.046 mmol/(gDW h). We are interested in designing a metabolic engineering strategy that results in 30% of the maximum production of ethanol at reduction in growth not larger than 50% of the optimal. To ensure the existence of a feasible solution, the tunable parameter is set to 0.01, and no limit is placed on the total number of reaction manipulations. With this setting, GeneReg identifies a modification strategy which involves 85 genes and 300 reactions, of which 114 include GPR rules (see Supplementary Tables S2 and S3).
The modified reactions take part in 14 metabolic subsystems. The eight metabolic subsystems with the largest number of modified reactions include the cell envelope biosynthesis pathway (28), cofactor and prosthetic groups (24), nucleotide salvage pathway (12), histidine metabolism (9), membrane lipid metabolism (7), tyrosine, tryptophan and phenylalanine metabolism (7), valine, leucine and isoleucine metabolism (6), methionine metabolism (6) (Supplementary Table S3).
It can be expected that avoiding conflicts in gene-based strategies may lead to a larger number of manipulations in comparison to the number of manipulations in reaction-based designs. To compare the strategy proposed by GeneReg with that of reaction-based designs with respect to the number of reactions manipulated, we applied a modified version of OptReg. The modification to optReg include: (i) flux ranges in response to up-/downregulations (see Section 2) and (ii) the minimization of the number of reaction manipulations, while guaranteeing that a certain factor of optimal growth is achieved. The strategy predicted by the modified OptReg implies the manipulation of 99 reactions, of which 74 are associated with GPR rules (see Supplementary Table S4). This implies that, with the comparable number of manipulations based on the reaction associated with genes, GeneReg proposes a feasible strain design of similar size while addressing the shortcomings of the reaction-based design.
To assess GeneReg and the modified version of OptReg, we examine the reactions associated with GPR rules proposed for manipulation by both approaches. To increase the production of ethanol, the solution of GeneReg includes 114 reactions with GPR rules, 110 of which need to be downregulated and the other four are upregulated. Respectively, out of 74 reactions considered by the modified OptReg, 73 have to be downregulated and the only reaction to be upregulated is fatty acid oxidation (n-C16:0). The complex GPR rule corresponding to fatty acid oxidation (n-C16:0) comprises four genes that also participate in GPR rules of three other reactions, whose flux must remain unchanged. This again indicates at the infeasibility of the strategy proposed by modified OptReg. In comparison, the reactions proposed by GeneReg to be upregulated involve three genes b1262, b1263 and b1264, which participate in no other reactions (see Supplementary Table S5).
Taking a closer look at the list of reactions to downregulate shows that the strategy devised by GeneReg specifies 37 more reactions in addition to the 73 proposed by the modified OptReg. To illustrate the issue, we consider a reaction that is predicted to be downregulated by both approaches, namely UDP-sugar hydrolase. This reaction is associated with gene b0480, which itself is present in the GPR rules of 11 other reactions. While all of these reactions must have unaltered flux, according to OptReg, they are all considered as downregulated in the strategy proposed by GeneReg (see Supplementary Tables S3 and S4).
Comparison with OptGene is not possible due to the mentioned non-linear constraints involved, rendering the application to genome-scale models impractical. Comparison to OptORF is not feasible due to the lack of public implementation and availability of fully parameterized model, detailing the active reactions as input to this approach. For a comparison on a small example, akin to that shown in Figure 1, please, see Supplementary Material.
3.3 General observations for the tunable parameter
As previously explained, the tunable parameter determines the degree of deviation from the wild-type flux values at the optimal growth. To investigate the effect of this parameter on the proposed strategy for overproducing ethanol, we used five different values of in the optimization formulation, and determine the effect of tuning this parameter on the number of reactions and genes required to be manipulated (Fig. 3). As shown in Figure 3, the higher value of the tunable parameter results in a larger number of genes and reactions (with or without GPR rules) to be manipulated. For values of the parameter of at least 0.05, no feasible solution can be identified. The question is then how general are these observations?
The parameter determines the range in which a particular reaction is considered up- or downregulated. By increasing the value of parameter the ranges for up-/downregulations, determined by FVA, become smaller. Therefore, the number of reactions and genes that need to be manipulated increases. Importantly, in the case of knockouts, the sets of genes and reactions that need to be knockedout for a given value of are always subsets of the sets proposed for a higher value of the . However, this is not guaranteed for up-/downregulation, since the changes in the ranges of up-/downregulation have effect on the definition of correspondingly regulated reactions. Therefore, GeneReg looks for new sets of upregulated reactions and genes (see Supplementary Table S6).
To determine which value of to use, we recommend that a search in the interval [0, 1], with increments of 0.01, is conducted up to reaching infeasibility for the MILP formulation of GeneReg and based on the strength of the modifications (e.g. promoters, silencing) used in the engineering strategy. Following this suggestion, we also conducted empirical tests for the time required to design an engineering strategy. Our results indicate that the running time on the different values for ranges from 18.8 to 992.8 min, for overproducing ethanol in E.coli (see Supplementary Table S6). The timing in other models is expected to depend on the size of the model and the engineering strategy to be reached.
3.4 Feasible strategy for overproduction of lactate in S.cerevisiae
The findings from the E.coli model have motivated us to examine the performance of GeneReg on a larger metabolic model. To this end, we employed GeneReg on the metabolic model of S.cerevisiae, yeast 7.6 (Aung et al., 2013), to identify a metabolic engineering strategy that increases the production of lactate—a compound that has diverse applications in food-processing and industry (Chang et al., 1999). We used the yeast 7.6 model that contains 2220 metabolites participating in 3493 reactions, of which 2302 are associated with GPR rules accounting for 909 genes. The optimal growth is 23.73 per hour associated with the maximum flux toward lactate synthesis is 56.66 mmol/(gDW h), while the maximum possible flux toward lactate synthesis is 1000 mmol/(gDW h). Here, we are interested in achieving 20% of the maximum production of lactate, while guaranteeing that 80% of optimal growth is achieved. The tunable parameter is set to 0.001 and no limit is placed on the total number of reaction manipulations. GeneReg predicts a modification strategy in which 134 reactions needed to be manipulated, out of which 41 are associated with GPR rules. The minimum number of genes required to implement these manipulations is 36 (see Supplementary Table S7).
3.5 Comparison of model size between GeneReg and LTM
LTM provides an elegant way to fully capture the gene–reaction association while considering the specific types of GPR rules. While LTM leads to the increase in the number of metabolites and reactions (Zhang et al., 2015), it is also important to determine the number of variables and constraints used in the design of a metabolic engineering strategy. These numbers determine the size of the MILPs used. Therefore, we examined the number of variables as well as equality and inequality constraints resulting from the application of GeneReg and LTM to models of five organisms: E.coli [iJR904 (Reed et al., 2003)], S.cerevisiae [yeast 7.6 (Aung et al., 2013)], the Gram-positive bacterium C.glutamicum [iJM658 (Mei et al., 2016)], the filamentous fungus A.niger [iMA871 (Andersen et al., 2008)] and the cyanobacterium S. sp. PCC6803 [iJN678 (Nogales et al., 2012)]. For the case when only knockouts are considered, as shown in Table 1, the number of variables in LTM in comparison to GeneReg is 1.7-fold larger for the yeast 7.6 model and as much as 3.2-fold larger for the model of A.niger. Similarly, the number of constraints is at least 2-fold larger in LTM in comparison to GeneReg across all five models. Therefore, we concluded that GeneReg consistently results in smaller models. Moreover, for the case of GeneReg with up- and downregulation of reaction fluxes, the number of variables in LTM is 1.1-fold larger for the yeast 7.6 model and as much as 2-fold larger for the
. | Number . | E.coli (iJR904) . | S.cerevisiae (yeast 7.6) . | C.glutamicum (iJM658) . | A.niger (iMA871) . | S. sp. PCC6803 (iJN678) . |
---|---|---|---|---|---|---|
Original | Metabolites | 761 | 2220 | 982 | 1049 | 795 |
Reactions | 1329 | 4821 | 1452 | 1988 | 1087 | |
Genes | 904 | 909 | 658 | 731 | 622 | |
LTM | Metabolites | 2103 | 5112 | 2461 | 4862 | 1828 |
Reactions | 2930 | 8630 | 3408 | 7225 | 2313 | |
Variables | 6764 | 18 169 | 7474 | 15 181 | 5305 | |
Eq. constraints | 5033 | 13 742 | 5869 | 12 087 | 4141 | |
In. constraints | 5860 | 17 260 | 6818 | 14 450 | 4626 | |
GeneReg | Variables | 3614 | 10 622 | 3593 | 4707 | 2886 |
Eq. constraints | 1539 | 4426 | 1838 | 1513 | 1404 | |
In. constraints | 3324 | 10 986 | 3484 | 5026 | 2650 |
. | Number . | E.coli (iJR904) . | S.cerevisiae (yeast 7.6) . | C.glutamicum (iJM658) . | A.niger (iMA871) . | S. sp. PCC6803 (iJN678) . |
---|---|---|---|---|---|---|
Original | Metabolites | 761 | 2220 | 982 | 1049 | 795 |
Reactions | 1329 | 4821 | 1452 | 1988 | 1087 | |
Genes | 904 | 909 | 658 | 731 | 622 | |
LTM | Metabolites | 2103 | 5112 | 2461 | 4862 | 1828 |
Reactions | 2930 | 8630 | 3408 | 7225 | 2313 | |
Variables | 6764 | 18 169 | 7474 | 15 181 | 5305 | |
Eq. constraints | 5033 | 13 742 | 5869 | 12 087 | 4141 | |
In. constraints | 5860 | 17 260 | 6818 | 14 450 | 4626 | |
GeneReg | Variables | 3614 | 10 622 | 3593 | 4707 | 2886 |
Eq. constraints | 1539 | 4426 | 1838 | 1513 | 1404 | |
In. constraints | 3324 | 10 986 | 3484 | 5026 | 2650 |
Note: The comparison is performed for five models of organisms used as cell factories. Eq. and In. constraints stand for Equality and Inequality constraints, respectively. The number of reactions considers the splitting of reversible reactions.
. | Number . | E.coli (iJR904) . | S.cerevisiae (yeast 7.6) . | C.glutamicum (iJM658) . | A.niger (iMA871) . | S. sp. PCC6803 (iJN678) . |
---|---|---|---|---|---|---|
Original | Metabolites | 761 | 2220 | 982 | 1049 | 795 |
Reactions | 1329 | 4821 | 1452 | 1988 | 1087 | |
Genes | 904 | 909 | 658 | 731 | 622 | |
LTM | Metabolites | 2103 | 5112 | 2461 | 4862 | 1828 |
Reactions | 2930 | 8630 | 3408 | 7225 | 2313 | |
Variables | 6764 | 18 169 | 7474 | 15 181 | 5305 | |
Eq. constraints | 5033 | 13 742 | 5869 | 12 087 | 4141 | |
In. constraints | 5860 | 17 260 | 6818 | 14 450 | 4626 | |
GeneReg | Variables | 3614 | 10 622 | 3593 | 4707 | 2886 |
Eq. constraints | 1539 | 4426 | 1838 | 1513 | 1404 | |
In. constraints | 3324 | 10 986 | 3484 | 5026 | 2650 |
. | Number . | E.coli (iJR904) . | S.cerevisiae (yeast 7.6) . | C.glutamicum (iJM658) . | A.niger (iMA871) . | S. sp. PCC6803 (iJN678) . |
---|---|---|---|---|---|---|
Original | Metabolites | 761 | 2220 | 982 | 1049 | 795 |
Reactions | 1329 | 4821 | 1452 | 1988 | 1087 | |
Genes | 904 | 909 | 658 | 731 | 622 | |
LTM | Metabolites | 2103 | 5112 | 2461 | 4862 | 1828 |
Reactions | 2930 | 8630 | 3408 | 7225 | 2313 | |
Variables | 6764 | 18 169 | 7474 | 15 181 | 5305 | |
Eq. constraints | 5033 | 13 742 | 5869 | 12 087 | 4141 | |
In. constraints | 5860 | 17 260 | 6818 | 14 450 | 4626 | |
GeneReg | Variables | 3614 | 10 622 | 3593 | 4707 | 2886 |
Eq. constraints | 1539 | 4426 | 1838 | 1513 | 1404 | |
In. constraints | 3324 | 10 986 | 3484 | 5026 | 2650 |
Note: The comparison is performed for five models of organisms used as cell factories. Eq. and In. constraints stand for Equality and Inequality constraints, respectively. The number of reactions considers the splitting of reversible reactions.
model of A.niger. In this case, the number of constraints in GeneReg and LTM is similar (see Supplementary Table S8). Most importantly, LTM allows the design of engineering strategies that involve knockouts, but does not facilitate the development of strategies that include up- and downregulation of genes, which is the principal novelty of GeneReg.
3.6 Infeasibility of overproduction for key metabolites in five major production organisms
Kamp and Klamt (2017) used the concept of minimum cut set to design optimal reaction-based strategies, and showed that it is feasible to overproduce almost all metabolites in genome-scale metabolic models of the five organisms, enumerated above, even at no growth. To investigate the feasibility of gene-based strategies, we applied GeneReg to increase the production of metabolites participating in the tricarboxylic acid (TCA) cycle as well as amino acids present in the five models analyzed above (see Supplementary Tables S9 and S10). Looking at the structure of GPR associations in each organism, we find that more than a fifth of the rules involve at least two genes (Fig. 4). Therefore, this result suggests infeasibility of optimizing the production of metabolites at the core of central metabolism, like the intermediates of the TCA cycle.
To increase the likelihood of finding a solution, the tunable parameter is set to 0.005 and 0, allowing small flux changes to be considered, and we place no limit on the total number of reaction manipulations. A metabolite is considered as an optimization objective, if there is room for improvement of the metabolite synthesis, i.e. the ratio of to is bigger than 1 for its sink (or exchange) reaction. The results show that feasible metabolic strategies at the gene level exist only for the increased production of malate, citrate, aspartate, cysteine, glutamine and methionine in the E.coli model and the increased production of pyruvate in S.cerevisiae, when the parameter is set to 0.005 and at least 20% of optimal growth is achieved (see Supplementary Tables S9 and S10).
We would like to emphasize that the metabolic engineering strategies from constraint-based modeling approaches depend strongly on the constraints and the model used for their design. For instance, the overproduction of pyruvate, succinate, serine and threonine is not feasible when the upper-bound of 20 mmol/(gDW h) for oxygen uptake is imposed and glucose (the default carbon source) is the sole carbon source used in the iJR904 E.coli model (see Supplementary Table S11). While GeneReg predicts that an increase in production of fumarate is not feasible under the previously mentioned constraints, the usage of glycerol as a carbon source allows the possibility for overproduction of this compound, in line with experimental evidence (Li et al., 2014). By considering three different environments (see Supplementary Table S11), we found that only for three amino acids, i.e. tryptophan, tyrosine and valine, GeneReg reports infeasibility for increase of production by 10% in these environments in contrast to feasible engineering strategies documented in the literature (Chávez-Béjar et al., 2008; Panichkin et al., 2016; Park et al., 2007). However, closer look at this strategy point out they consider alteration (e.g. removal) or allosteric regulation effects of metabolites on enzymes, for which the constraint-modeling framework remains underdeveloped. In addition, the parameter also will have effect on the reported infeasibilities. For instance, when , GeneReg predicts that fumarate production can be increased by 20% even under default constraints of iJR904 E.coli model. Therefore, GeneReg demonstrates the need to couple gene regulation and metabolism to understand the limits of metabolic engineering under specific assumptions of the constraint-based modeling framework.
4 Conclusion
Here, we proposed GeneReg, a constraint-based approach based on MILP formulation to design gene manipulation strategies that facilitate under- and overexpression of genes at a genome-scale level. GeneReg formulation of up- and downregulation of reactions is motivated by OptReg, but is modified to account for differences between wild-type and mutant ranges. The novelty of GeneReg consists of constraints that ensures coupling of the reaction and gene manipulations as well as constraints that avoid conflict at the gene level. As a result, GeneReg facilitates the generation of manipulation strategies feasible at the gene level.
Our findings demonstrate that GeneReg is applicable with large-scale metabolic networks in model organisms as well as organisms used as cell factories. In contrast to the results about growth-coupling of the overproduction of almost every metabolite, we found that there is no strategy feasible at the gene level that leads to overproduction of any intermediates in the TCA cycle as well as amino acids present in the analyzed models (with few exceptions in E.coli and S.cerevisiae) under the default growth conditions. However, we also identify that changes in environmental cues (e.g. carbon used) has an effect on the proposed engineering strategies, in line with experimental evidence. Therefore, coupling of gene and reaction levels, along with context-dependent activity and allosteric regulation of enzymes, should be given further emphasis in future studies aimed at design of feasible and more accurate metabolic engineering strategies, and motivates the integration of gene regulatory and metabolic networks for improved prediction of complex traits.
Funding
This work was supported by the MELICOMO project 031B0358B of the German Federal Ministry of Science and Education to Z.N.
Conflict of Interest: none declared.