Evangelos Simeonidis, Nathan D Price, Genome-scale modeling for metabolic engineering, Journal of Industrial Microbiology and Biotechnology, Volume 42, Issue 3, 1 March 2015, Pages 327–338, https://doi.org/10.1007/s10295-014-1576-3
Abstract
We focus on the application of constraint-based methodologies and, more specifically, flux balance analysis in the field of metabolic engineering, and enumerate recent developments and successes of the field. We also review computational frameworks that have been developed with the express purpose of automatically selecting optimal gene deletions for achieving improved production of a chemical of interest. The application of flux balance analysis methods in rational metabolic engineering requires a metabolic network reconstruction and a corresponding in silico metabolic model for the microorganism in question. For this reason, we additionally present a brief overview of automated reconstruction techniques. Finally, we emphasize the importance of integrating metabolic networks with regulatory information—an area which we expect will become increasingly important for metabolic engineering—and present recent developments in the field of metabolic and regulatory integration.
Special Issue: Metabolic Engineering.
Introduction
Organisms natively use metabolic, mostly enzymatically catalyzed reactions to convert raw materials into the essential substances that are needed for the survival of their cells. As such, they represent a tremendous resource of existing biological machinery to carry out biochemical transformations. Metabolic engineering involves the process of modifying the metabolic potential and genetics of a microorganism to our advantage to increase the production of a specific substance of interest [91]. The objective of metabolic engineering is thus to reroute metabolism towards a pathway of interest to improve production of commercially valuable chemicals on an industrial scale. This has been achieved for several commodities, including fuels, pharmaceuticals, drinks such as wine and beer, fine chemicals and diesels. In short, many biotechnological products are being produced using microbial strains as cell factories [3, 9, 19, 37, 53, 79], with an increasing number on the horizon [35, 70, 104, 107, 109].
Traditionally, metabolism was altered using classical breeding and random mutagenesis, followed by selection and screening [65]. More recently, however, the introduction of recombinant DNA techniques has allowed the application of targeted genetic changes [47, 111] through gene knockouts, overexpression, and expression of heterologous genes [50]. In large part owing to the advent of genomics and systems biology, we nowadays have a number of new tools that generate a wealth of data for analysis, contributing to our understanding of metabolism and cellular behavior. Improved knowledge and new analytical tools [14, 67, 68] are increasingly available for use in the development of novel microbial strains with phenotypes that allow production of various bulk chemicals [74, 113]. Successful applications, for example using the model organisms Escherichia coli, Saccharomyces cerevisiae and Corynebacterium glutamicum (for amino acid production mainly) as production hosts, have been reported widely in the literature [52, 103].
Metabolic engineering focuses on altering the function of enzymes, transporters, or regulatory proteins informed by existing knowledge of the metabolic network, enzymes, their encoding genes, and overall regulation [59]. Strategies focus on either introducing new metabolic enzyme functions and pathways or altering existing metabolic pathways to optimize production of the chemical of interest [47]. For either strategy, detailed understanding of the network and a way to determine the distribution of flux [96] are necessary. Metabolic analysis methods are powerful analytical tools that can be utilized extensively in metabolic engineering, as they allow exploration and detailed consideration of the structure and design of a metabolic network [83]. Stoichiometric methods in particular, which are based on collecting all the available biochemical knowledge surrounding a particular metabolic network of an organism, have helped to construct a collection of metabolic models for an expanding number of microorganisms based on annotated genome sequences. Such models allow researchers to conduct simulations based on all known reactions occurring in the metabolic network of an organism using only the knowledge of the stoichiometry of the network as input and, thus, make computational predictions for achievable metabolic states of an organism under varying conditions. These predictions can encompass the outcomes of genetic manipulations, including but not limited to removal or addition of reactions to the network. The capability to perform such manipulations and simulate the results computationally forms the basis for rational metabolic engineering [61] and provides an aid for prospective study design [30, 44].
Here, we review applications and successes of genome-scale modeling for metabolic engineering, provide an overview of the metabolic reconstruction process (particularly the tools for automated reconstructions), and briefly offer our view on future developments of the field.
The flux balance analysis (FBA) formulation

Conceptual illustration of flux balance analysis formulation and solution. a Reconstruction of a genome-scale metabolic network is performed by mathematically representing the flux through the reactions of the network. b The stoichiometric matrix for the system is constructed to represent the stoichiometry of all reactions, and the mathematical formulation for FBA is based on the steady-state condition. These stoichiometric constraints coupled with minimum and maximum bounds on reaction rates define the steady-state solution space. c FBA provides a method for calculating achievable fluxes through the system (c2), based only on the knowledge of the stoichiometry of a metabolic network (c1). Through simulations, alternative solutions can also be identified and/or the effects of alterations to the network, such as gene deletions or additions, can be predicted (c3). The “1” in the graph signifies that a reaction is “on”, i.e., there is flux through it
In the FBA objective function, the vector w incorporates weights that represent the relative contribution of each reaction to the objective. FBA formulations constitute linear programming (LP) problems, which makes the FBA approach suitable for application to very large metabolic networks. Typically, genome-scale metabolic networks consist of hundreds to a few thousand reactions, whereas LP solvers are capable of solving problems with tens of thousands of variables or more. The optimal value of the objective function in an FBA problem is unique, but the flux distribution through the reactions of the system that achieves it is generally not unique (except in trivial cases). Subsequently, patterns of consumption and production of each metabolite can be determined for systems with thousands or tens of thousands of components. Crucially, kinetic information or enzyme concentrations are not required for the analysis, although such information can be incorporated for increased accuracy. This lack of a high number of parameters greatly reduces the opportunities for overfitting models (although some overfitting certainly still exists, for example in the choices made during model construction) and makes the resulting models amenable to very broad use across a wide range of organisms at the genome scale. Additional methods such as Flux Variability Analysis [55] or Monte Carlo sampling of solution spaces [4, 73] can address the variability possible in each of these reaction fluxes, providing insight into the full range of achievable metabolic states of a system given physico-chemical constraints and a finite set of biological measurements.
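For reference, the linear program described by the figure caption and the paragraph above can be written in the standard FBA form below. This is a sketch in conventional notation rather than a reproduction of the original display: S and w are the stoichiometric matrix and weight vector named in the text, while v (the flux vector), v^min and v^max (its lower and upper bounds), and Z* (the optimal objective value) are symbol names assumed here.

```latex
% Standard FBA linear program (notation partly assumed, see lead-in)
\begin{aligned}
Z^{*} = \max_{v} \quad & w^{\mathsf{T}} v \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}

% Flux Variability Analysis: range of each flux v_j at the optimum
\begin{aligned}
\min_{v} \ \text{or} \ \max_{v} \quad & v_j \\
\text{subject to} \quad & S\,v = 0, \\
& v^{\min} \le v \le v^{\max}, \\
& w^{\mathsf{T}} v = Z^{*}
\end{aligned}
```

The second problem is solved twice per reaction, once as a minimization and once as a maximization, which is how the non-uniqueness of the optimal flux distribution can be quantified.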
Genome-scale reconstructions
The reconstruction of genome-scale metabolic models requires the construction of an S matrix that closely represents the biochemistry of the organism. Models for an ever-increasing number of bacteria have been published in recent years [58] (examples: [6, 12, 25, 31, 32, 34, 57, 60, 63, 82, 86, 110]) and more papers describing both new reconstructions and improvements upon previous iterations are published regularly. Most reconstructions are now available in a standard format such as the systems biology markup language (SBML) [36]. The SBML files can easily be imported into most software applications for FBA, such as the COBRA Toolbox [10]. Nevertheless, wherever a pre-existing model is not readily available (including when the existing model is not of the necessary quality or does not cover the required elements of metabolism for the intended analysis), a new reconstruction is needed. This process is data intensive and involves gathering species-specific information from genome annotations, high-throughput experiments, the literature and/or publicly available databases, such as KEGG [41], EcoCyc [42], BKM-react [46], or BRENDA [84]. Gap-filling methodologies are subsequently applied [13, 75] to improve connectivity to the point where the model can simulate phenotypes. As labor intensive as manual reconstruction is, the process has been well developed and described [95].
Automated reconstructions
As pointed out above, the construction of a genome-scale model is a complex task, but tools for improving and accelerating this process are becoming increasingly available. To reduce the painstaking process of manual annotation, draft metabolic models can be built by utilizing and integrating the resources available in various biological databases in an automated manner. Several such automated methods have been reported in the literature; for example, Model SEED [24, 33] is an online resource designed to simplify the construction of a genome-scale model by utilizing an automated framework. Model SEED can be used to create genome-scale metabolic models in a high-throughput manner, by automating the annotation of the genome, producing a preliminary reconstruction of the metabolic network, performing automatic gap filling of reactions necessary for cellular growth, and, when such data are available, incorporating array and gene essentiality data to improve the quality of the reconstruction. BioNetBuilder [8] is a Cytoscape plugin with a user-friendly interface to create biological networks integrated from several databases. ReMatch [71] is a web-based framework that reconstructs a metabolic network by integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and ChEBI. The SuBliMinaL Toolbox [93] is a framework for reconstructing metabolic networks by providing independent modules that can be used individually or in a pipeline, and can perform tasks that are common in every reconstruction process, such as generating a draft, determining metabolite protonation states, mass-balancing reactions, compartmentalizing the cell, adding transport reactions, creating a biomass function and exporting the reconstruction in a format readable by software packages (typically SBML). Reyes et al. [77] presented an automatic method for the reconstruction of genome-scale metabolic models for any organism, implemented in COPABI. Dale et al. [23] developed a method for predicting metabolic pathways that relies on machine learning approaches to reconstruct the network of an organism. In addition to automated tools, semi-automated tools have also been reported in the literature; for example, RAVEN (reconstruction, analysis and visualization of metabolic networks) [2] is a toolbox for semi-automated reconstruction of genome-scale models, which accesses published models and the KEGG database to build a draft reconstruction, coupled with extensive gap filling and quality control. MicrobesFlux [28] and a method presented by Zhou [112] both make extensive use of KEGG to achieve the construction of a draft metabolic model. Finally, Benedict et al. [13] presented a likelihood-based gap filling method that can automatically improve the quality of metabolic reconstructions by incorporating alternative potential gene annotations. This method assigns a score to gene annotations based on sequence homology, selects the most likely pathways for gap filling using a mixed integer linear programming (MILP) formulation and identifies orphaned reactions. The likelihood-based approach performs better both quantitatively and qualitatively when compared to pre-existing algorithms.
While automated methods significantly decrease the time and effort required for reconstructing a new metabolic model, there is still need for user feedback and manual curation to improve the quality and accuracy of the metabolic model. This is especially true during the final stages of the reconstruction, as the resulting model is being validated against experimental data. The curator is responsible for assessing the precision and accuracy of the model, and for evaluating if there is further need for gap filling, removing futile cycles and improvement of the biomass reaction. Semi-automated methods permit greater flexibility for user intervention during the reconstruction process and constitute a good compromise for refining an initial draft model to further elevate the quality of the reconstruction up to the required standards.
Once a working model has been constructed and improved to a satisfactory level, in silico experiments can predict flux distribution ranges and phenotypic behavior under conditions of the user’s choice. Targets for possible genetic manipulation to improve strain performance can be identified through comparative studies under both genetic and environmental perturbations. The model can then be used to calculate knockout lethality or growth rates, and results can be compared to experimental observations, which allows for the model to be iteratively tested and improved [40]. Several computational approaches for network manipulation and phenotypic simulation have been developed, such as the COBRA Toolbox for MATLAB [10], a popular FBA simulator.
Successes of genome-scale modeling
Flux balance analysis and related constraint-based methods can be used to predict the optimal set of gene knockout and overexpression targets to increase an organism’s ability to produce a chemical of interest. Here, we present various applications of genome-scale modeling to gauge the impact this computational approach has had on metabolic engineering efforts. Table 1 summarizes examples of successes of genome-scale modeling in the context of metabolic engineering.
Table 1. Examples of recent developments and successes of genome-scale modeling in metabolic engineering
Publication | Year | Target | Organism |
---|---|---|---|
Lee et al. [49] | 2002 | Succinic acid production | E. coli |
Alper et al. [5] | 2005 | Lycopene production | E. coli |
Bro et al. [16] | 2006 | Decrease glycerol and increase ethanol yield | S. cerevisiae |
Lee et al. [48] | 2007 | Threonine production | E. coli |
Park et al. [66] | 2007 | L-valine production | E. coli |
Song et al. [90] | 2008 | Optimize media and succinic acid production | M. succiniciproducens |
Meijer et al. [56] | 2009 | Succinic acid production | A. niger |
Ohno et al. [62] | 2013 | Butanol, propanol, propanediol production | E. coli |
Sun et al. [93] | 2014 | Terpenoid biosynthesis | S. cerevisiae |
Borodina et al. [15] | 2015 | 3-Hydroxypropionic acid biosynthesis | S. cerevisiae |
An exhaustive search of all feasible knockouts in an organism, especially with an experimental approach, to identify the exact genotype with the optimal production profile for a substance of interest, is a painstakingly tedious and often practically infeasible process. Genome-scale metabolic models can be a valuable tool for understanding the inner workings of metabolic networks, which cannot always be intuitively discerned. Such insight may be used to design strains with specific properties many orders of magnitude faster, which makes the computational approach much more desirable. Genome-scale modeling has been applied in various metabolic engineering contexts and has been successfully used to predict genetic modifications for improved strains.
Lee et al. [49] constructed a metabolic model for E. coli, which was successfully used to develop and implement a strategy for increased succinic acid production. The authors proposed optimal metabolic pathways for the production of succinic acid based on the results of the metabolic flux analyses, and selected the pyruvate carboxylation pathway as optimal for increasing production in E. coli. The proposed pathway was validated experimentally by comparing its succinic acid yield with that of traditional succinic acid producing pathways. The experimental results suggested that the novel pathway selected through the computational analysis is more efficient than conventional pathways.
Alper et al. [5] used a genome-scale model for E. coli and identified and experimentally confirmed seven gene deletion strains that showed increased lycopene production. The E. coli iJE660 model [76] served as the basis for this approach. Targets for single gene knockouts were initially selected, and the ones that resulted in the highest production of lycopene were chosen as candidates. Then, a second knockout was computationally predicted and then performed on the best performing single gene mutants, and the double mutants with the highest yield were selected once more. This process produced knockout mutants with progressively increasing yields. The selected single, double, and triple knockout strains were constructed experimentally and were shown to significantly improve the yield of lycopene, with the top selected strain producing a yield almost 40 % higher than an engineered, high-producing parental strain.
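The sequential strategy described above (pick the best single knockout, then the best second knockout on top of it, and so on) amounts to a greedy search. The following is a minimal sketch of that idea, not the authors' implementation: the function `greedy_knockouts`, the gene names, and the `toy_yield` scoring function are all hypothetical, and in practice the score would come from an FBA simulation of the knockout strain.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def greedy_knockouts(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    rounds: int,
) -> List[Tuple[FrozenSet[str], float]]:
    """Add one knockout per round, keeping the best-scoring set each time."""
    remaining = set(candidates)
    current: FrozenSet[str] = frozenset()
    history: List[Tuple[FrozenSet[str], float]] = []
    for _ in range(rounds):
        best_gene, best_score = None, score(current)
        for gene in sorted(remaining):  # sorted for reproducible tie-breaking
            trial_score = score(current | {gene})
            if trial_score > best_score:
                best_gene, best_score = gene, trial_score
        if best_gene is None:  # no single additional knockout improves the score
            break
        current = current | {best_gene}
        remaining.discard(best_gene)
        history.append((current, best_score))
    return history


def toy_yield(knockouts: FrozenSet[str]) -> float:
    """Hypothetical stand-in for an FBA-predicted product yield (not real data)."""
    gains = {"geneA": 0.10, "geneB": 0.06, "geneC": 0.02}
    value = 1.0 + sum(gains.get(g, 0.0) for g in knockouts)
    if {"geneA", "geneC"} <= knockouts:  # assumed synergy between two deletions
        value += 0.15
    return value


result = greedy_knockouts(["geneA", "geneB", "geneC", "geneD"], toy_yield, rounds=3)
for knockout_set, predicted in result:
    print(sorted(knockout_set), round(predicted, 2))
```

A greedy search evaluates far fewer combinations than an exhaustive one, at the cost of possibly missing combinations whose benefit only appears when several genes are deleted together.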
Bro et al. [16] used an FBA model of Saccharomyces cerevisiae to identify a strategy for metabolic engineering of the redox metabolism that would lead to decreased glycerol and increased ethanol yields on glucose under anaerobic conditions. Several mutants that eliminated formation of glycerol and increased ethanol yield were suggested computationally. One of the most promising was selected and constructed experimentally. The resulting strain showed a 40 % decrease in glycerol yield and a 3 % increase in ethanol yield, without affecting the maximum specific growth rate.
Lee et al. [48] reported a strategy for increased threonine production in E. coli. A threonine producing strain was re-engineered based on transcriptome profiling and flux analysis simulations. The resulting strain produced threonine with a high yield of 0.393 g per gram of glucose and reached 82.4 g/l threonine in fed-batch culture. Similarly, Park et al. [66] constructed a genetically well-defined E. coli strain based on known metabolic information, transcriptome analysis, and in silico genome-scale knockout simulation. The authors identified the necessary gene knockouts for the construction of an E. coli strain with increased L-valine production. Genes ilvA, leuA, and panB were deleted to make more precursors available for L-valine biosynthesis; lrp and ygaZH were overexpressed; and aceF, mdh, and pfkA were identified as knockout targets using gene knockout simulation. The resulting strain produced L-valine at a high yield of 0.378 g per gram of glucose, which is higher than that of industrial strains developed through random mutation and selection.
Another useful application of FBA is to identify optimal media composition for the growth of an organism and production of a desired metabolite [90]. Song et al. used a genome-scale metabolic network and flux balance analysis to identify two amino acids and four vitamins as essential compounds to be supplemented to a minimal medium that would improve the growth of Mannheimia succiniciproducens and the production of succinic acid. The optimized medium increased the yield of succinic acid by 15 % compared to growth on a complex medium. The optimal, chemically defined medium also lowered by-products by 30 %.
Meijer et al. [56] presented a metabolic engineering approach for increased production of succinic acid with Aspergillus niger, a microorganism that is well established industrially, making it an interesting target for engineering of the production of specific chemicals. A deletion strategy was devised based on simulations with a genome-scale stoichiometric model of the organism, and the gene encoding citrate lyase (acl) was identified as a deletion target through these in silico tests. The authors found that the mutant strain tripled the yield of succinic acid compared to the wild type, along with an overall increase in the production of organic acids.
In 2013, Ohno et al. [62] demonstrated that the production of many valuable compounds, such as 1-butanol, 1-propanol, and 1,3-propanediol, can be improved using a triple gene knockout strategy. In silico screening was performed and the metabolic potential of all possible sets of triple knockouts was evaluated using a reduced metabolic model of Escherichia coli, based on the iAF1260 genome-scale model [27]. The use of a reduced model was preferred in this study, as it significantly lowered the computational costs. The results demonstrated the applicability of multiple deletion strategies, since in many cases the effects of the deletions were only observable when multiple genes were simultaneously disrupted. Traditional screening methods would have missed these opportunities. Such results are indicative of the possibility to develop industrially viable strains through metabolic engineering that utilizes genome-scale modeling.
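The computational argument for a reduced model can be made concrete by counting candidate knockout sets: the number of distinct triples grows roughly with the cube of the number of reactions considered. The short sketch below illustrates this with binomial coefficients. The reaction counts (2000 and 100) are hypothetical round numbers chosen for illustration, not the sizes of the models used in the study.

```python
from math import comb


def count_knockout_sets(n_reactions: int, k: int) -> int:
    """Number of distinct sets of k reactions that can be knocked out together."""
    return comb(n_reactions, k)


# Hypothetical model sizes, for illustration only.
full_model = count_knockout_sets(2000, 3)
reduced_model = count_knockout_sets(100, 3)

print(f"triple knockouts, 2000 reactions: {full_model:,}")
print(f"triple knockouts,  100 reactions: {reduced_model:,}")
```

Since each candidate set requires at least one LP solve, shrinking the candidate list by several orders of magnitude is what makes an exhaustive triple-knockout screen tractable.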
Sun et al. [93] presented a study that identified knockout targets for improving terpenoid biosynthesis in S. cerevisiae. Terpenoids have important pharmacological activity, but the production of sufficient amounts is challenging. A constraint-based approach was used to identify knockout sites with the potential to improve terpenoid production (specifically, the sesquiterpene amorphadiene). Based on the simulation results, single mutants were constructed and engineered to produce amorphadiene. Production of amorphadiene was measured to assess the effects of gene deletions on the production of terpenoids. Ten novel gene knockout targets were described. The yield of amorphadiene produced by most single mutants increased 8- to 10-fold compared to the wild type.
Borodina et al. [15] engineered a synthetic pathway for de novo biosynthesis of 3-hydroxypropionic acid, using a genome-scale model of S. cerevisiae to evaluate the metabolic capabilities of two promising routes. 3-Hydroxypropionic acid (3HP) is a potential chemical building block for sustainable production of superabsorbent polymers and acrylic plastics. Simulations suggested β-alanine biosynthesis as the most economically attractive route. A synthetic pathway for de novo biosynthesis of β-alanine and its subsequent conversion into 3HP was engineered, using a novel β-alanine-pyruvate aminotransferase discovered in Bacillus cereus. The expression of the critical enzymes in the pathway was optimized and aspartate biosynthesis was increased to obtain a high 3HP producing strain.
In addition to the growing number of studies that demonstrate the applicability of genome-scale modeling to rational metabolic engineering efforts by performing analyses and producing strains that improve the production of chemicals of interest, several computational approaches for automatic selection of gene knockout candidates have been developed. Such frameworks make FBA a tool that is now available to a much wider audience. In Zomorrodi et al. [114], the authors review computational tools that utilize mathematical optimization and were designed to assist in metabolic network analyses and redesign of metabolism. For example, OptKnock [18] is a framework that exploits duality theory to search for multiple gene knockout candidates, by solving a bi-level optimization problem: the inner problem optimizes biomass production, while the outer problem optimizes target chemical yield. The problem is formulated as a single MILP problem. Sets of gene knockouts for improved succinate, lactate, and propanediol production in E. coli were predicted by the authors.
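The bi-level structure described above can be sketched as follows. This is a simplified rendering under stated assumptions, not the exact formulation of [18]: the binary variables y_j (1 if reaction j is kept, 0 if it is knocked out), the knockout limit K, and the bound symbols are notation assumed here, and additional constraints of the original framework (for example on substrate uptake or a minimum biomass yield) are omitted.

```latex
\begin{aligned}
\max_{y} \quad & v_{\text{chemical}} \\
\text{subject to} \quad
& \left[
\begin{aligned}
\max_{v} \quad & v_{\text{biomass}} \\
\text{subject to} \quad & S\,v = 0, \\
& v_j^{\min}\, y_j \le v_j \le v_j^{\max}\, y_j \quad \forall j
\end{aligned}
\right] \\
& \sum_{j} (1 - y_j) \le K, \qquad y_j \in \{0, 1\} \quad \forall j
\end{aligned}
```

Because the inner problem is a linear program, it can be replaced by its primal and dual constraints together with the condition that the primal and dual objectives are equal, which is how duality theory collapses the two levels into the single MILP mentioned in the text.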
The OptKnock framework suffers from certain limitations, for example the intractability of the problem when very large sets of knockouts are considered. To address such issues, researchers have developed extended and improved frameworks that identify deletion candidates, such as OptGene and RobustKnock. OptGene [67] utilizes a genetic algorithm to rapidly identify gene deletion strategies for optimization of a strain. The advantages of OptGene are that it also allows the optimization of nonlinear objective functions, and can be much faster than an MILP approach, but unlike with MILP formulations, the identified solution is not guaranteed to be a global optimum. OptGene has been used to predict sets of gene knockouts for improved production of vanillin, succinate, and glycerol in S. cerevisiae. RobustKnock [94] extends OptKnock by accounting for the presence of competing pathways in the network that may reroute metabolic flux away from the chemical of interest. The framework removes reactions from the network, so that the production of the chemical of interest becomes part of the model’s biomass production requirement. RobustKnock was used to predict sets of gene knockouts for improving the production of hydrogen, acetate, formate and fumarate in E. coli.
Although frameworks like OptKnock and OptGene are powerful in their ability to predict knockouts, the possible modifications are restricted by the selection of reactions included in the metabolic reconstruction. The possibility of adding new reactions that are not part of the original metabolic network is not considered with these methods. OptStrain [68] overcomes this problem with the use of a database of known biotransformations to maximize the yield of a pathway from substrate to target product, by including heterologous reactions. The number of non-native reactions is minimized, and the selected non-native reactions are incorporated into the host. In addition to the above tools, OptReg [69] and EMILiO (Enhancing Metabolism with Iterative Linear Optimization) [108] are frameworks that not only identify gene targets selected for deletion, but also identify genes that can be up- or downregulated. Such computational tools have been used for several metabolic engineering applications, including the production of lactic acid in E. coli [29], vanillin production in yeast [17] and sesquiterpene production in S. cerevisiae [7]. For researchers and engineers who wish to apply genome-scale modeling methods and the automated gene knockout selection frameworks described here, several software options are now freely available, including the COBRA Toolbox [10], OptFlux [78], CellNetAnalyzer [45] and the Systems Biology Research Tool [106], to name but a few.
Transcriptional regulation
Genome-scale modeling is not without its limitations; one of the major issues with the predictions made with this analysis method is that it does not consider the effects of gene regulation. In reality, however, the effect of regulation is very significant and one of the major reasons for failed predictions of the metabolic effect of gene modifications. For this reason, there is great motivation to look beyond just the metabolic network and attempt to integrate the effects of regulation on the metabolic reactions of an organism. Integrated models can significantly improve prediction accuracy, though again there is still much room for improvement. Machado and Herrgård have performed a systematic comparison of methods of transcriptomic data integration with genome-scale modeling [54].
In its simplest form, transcriptional regulation can be added to a stoichiometric model using a Boolean representation to map the effects of transcription factors (activating or repressing) on the expression of enzyme encoding genes. Such a representation forces the specific enzyme-catalyzed reaction to be either on or off, depending on the presence or absence of the controlling transcription factors. The implementation of this idea is known as regulatory Flux Balance Analysis (rFBA) [22]. rFBA offers the possibility of considering some basic regulatory effects on the metabolic network, but it is constrained by the fact that the genes that are controlled by transcription factors can only be either fully active or completely off. This prohibits good predictions in cases where a transcription factor knockout only has a partial effect on target genes. Another limitation of rFBA is that it arbitrarily chooses one metabolic steady state from a space of possible solutions, excluding a whole space of possible profiles. In contrast, Steady-state Regulatory Flux Balance Analysis, or SR-FBA [88], enables a comprehensive characterization of steady-state behaviors in an integrated model of metabolism and regulation. SR-FBA was used to characterize the flux distribution and gene expression levels of Escherichia coli across different growth media. Around 50 % of metabolic genes’ flux activity was found to be determined by metabolic constraints, whereas regulatory constraints determined the flux activity of 15–20 % of genes. The integrated model was then used to identify specific genes for which regulation is not optimally tuned for cellular flux demands.
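The Boolean on/off idea can be illustrated with a short sketch that switches reaction bounds to zero when the rule for the associated gene evaluates to false. This is a toy illustration of the concept rather than the rFBA implementation of [22]: the function `regulated_bounds`, the reaction, gene and transcription factor names, and the one-gene-per-reaction mapping are all assumptions made for brevity.

```python
from typing import Callable, Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]
TFState = Dict[str, bool]


def regulated_bounds(
    bounds: Bounds,
    gene_rules: Dict[str, Callable[[TFState], bool]],
    reaction_gene: Dict[str, str],
    tf_state: TFState,
) -> Bounds:
    """Return flux bounds with reactions forced off when their gene is not expressed."""
    new_bounds = dict(bounds)
    for reaction, gene in reaction_gene.items():
        rule = gene_rules.get(gene)
        # A gene without a rule is treated as unregulated (always on).
        if rule is not None and not rule(tf_state):
            new_bounds[reaction] = (0.0, 0.0)
    return new_bounds


# Hypothetical network: R1 needs geneX (activated by tfA),
# R2 needs geneY (repressed by tfR), R3 is unregulated.
bounds = {"R1": (0.0, 10.0), "R2": (-5.0, 5.0), "R3": (0.0, 8.0)}
reaction_gene = {"R1": "geneX", "R2": "geneY"}
gene_rules = {
    "geneX": lambda s: s["tfA"],      # expressed only if the activator is present
    "geneY": lambda s: not s["tfR"],  # expressed only if the repressor is absent
}

constrained = regulated_bounds(
    bounds, gene_rules, reaction_gene, {"tfA": False, "tfR": False}
)
print(constrained)
```

The resulting bounds would then be passed to an ordinary FBA solve. The all-or-nothing switch in this sketch is exactly the limitation noted above: a partial effect of a transcription factor cannot be represented.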
Probabilistic regulation of metabolism (PROM) [20, 89] is another method that overcomes the limitations of rFBA by implementing a probabilistic approach for predicting the state of a gene, based on the level of expression of a transcription factor. The probability for the state of a gene is determined based on microarray data information, and the bounds on the flux of the relevant reaction are adjusted using this probability estimation. In addition, PROM requires little manual annotation compared to rFBA, because the process can be automated to a large degree. Still, the accuracy of all such methods needs to be improved, and there is substantial need to expand the repertoire of captured regulatory events related to metabolism beyond simple transcriptional effects.
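The probabilistic idea behind PROM can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the published algorithm: the expression data are assumed to be already binarized into on/off calls, the sample values and the maximum flux of 10.0 are invented, and the bound is tightened directly, whereas the real method embeds such constraints in an optimization problem.

```python
# Simplified sketch of the PROM idea: estimate P(target gene on | TF off)
# from expression samples and use it to scale a reaction's flux bound.
# The data and the v_max value below are hypothetical.

def prob_on_given_tf_off(tf_on, gene_on):
    """Estimate P(gene on | TF off) from paired, already-binarized samples."""
    off_samples = [g for t, g in zip(tf_on, gene_on) if not t]
    if not off_samples:
        raise ValueError("no samples with the transcription factor off")
    return sum(off_samples) / len(off_samples)


def knockout_upper_bound(v_max, probability):
    """Scale the maximum flux of the target reaction by the probability."""
    return probability * v_max


# One entry per microarray sample: is the TF / the target gene expressed?
tf_on = [True, True, False, False, False, False, True, False]
gene_on = [True, True, False, True, False, False, True, False]

p = prob_on_given_tf_off(tf_on, gene_on)
upper = knockout_upper_bound(10.0, p)
print(p, upper)
```

Here the target gene is on in one of the five samples in which the transcription factor is off, so a simulated knockout of the factor would restrict the reaction to a fifth of its maximum flux rather than switching it off completely.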
Similarly, E-Flux [21] is an approach that incorporates transcript level measurements into the reaction flux constraints that define the maximum achievable flux through each reaction. The bounds on the fluxes of the system are determined based on the level of expression of the corresponding coding gene. The method was tested on Mycobacterium tuberculosis to predict the impact of drugs, drug combinations, and nutrient conditions. E-Flux predicted seven of the eight known fatty acid inhibitors and made accurate predictions regarding the specificity of these compounds for fatty acid biosynthesis.
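A minimal sketch of expression-derived flux bounds in the spirit of E-Flux is given below. The specific mapping (normalizing each expression value by the highest value in the dataset and multiplying by a flux cap) is an assumption made for illustration; the reaction names, expression values and the cap of 1.0 are hypothetical.

```python
# Sketch of E-Flux-style bounds: higher expression of the coding gene
# allows a larger maximum flux. The normalization scheme is an assumption.

def eflux_bounds(expression, reversible, v_cap=1.0):
    """Map per-reaction expression levels to (lower, upper) flux bounds."""
    peak = max(expression.values())
    bounds = {}
    for rxn, level in expression.items():
        upper = v_cap * level / peak  # scale relative to the highest level
        lower = -upper if reversible.get(rxn, False) else 0.0
        bounds[rxn] = (lower, upper)
    return bounds


# Expression of the gene coding for each reaction (arbitrary units).
expression = {"R1": 200.0, "R2": 50.0, "R3": 100.0}
reversible = {"R2": True}

bounds = eflux_bounds(expression, reversible)
print(bounds)
```

Unlike a Boolean on/off rule, this keeps every reaction available but makes weakly expressed reactions proportionally harder to use, so no expression threshold has to be chosen.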
An important disadvantage of the previous methods is that they often require a user-defined expression threshold above (or below) which a gene is considered “on” (or “off”). Metabolic adjustment by differential expression (MADE) [38] aims to overcome the problem of selecting arbitrary thresholds by comparing measurements across multiple conditions. MADE uses statistically significant changes in gene expression measurements across sequential conditions to determine instances of high and low expression for various reactions. For this reason, MADE requires expression data from more than one experimental condition. The problems for all conditions are solved simultaneously to maximize agreement with the predicted patterns.
Other approaches for integrated simulation use mRNA expression data to construct a functional metabolic model for the organism. Gene Inactivity Moderated by Metabolism and Expression (GIMME) [11] utilizes user-supplied gene expression data, a genome-scale model and presupposed metabolic objectives to produce a context-specific reconstruction. GIMME performs an FBA run on the starting metabolic model to identify the maximum possible flux through the network. Then, experimental mRNA transcript levels are compared to a threshold and any reactions that fall below this threshold are removed from the network, unless their removal impacts the metabolic objectives, in which case an LP problem is solved that reintroduces inactive reactions in a way that minimizes deviation from the expression data. The algorithm also provides a quantitative inconsistency score indicating how consistent a set of gene expression data is with a particular metabolic objective.
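The inconsistency score reported by GIMME can be illustrated with a short sketch. It assumes the commonly described form of the score, in which each reaction whose expression falls below the threshold is penalized by the shortfall multiplied by the absolute flux it carries. The flux values, expression values and threshold are hypothetical, and the flux distribution is taken as given rather than computed by an LP.

```python
# Sketch of a GIMME-style inconsistency score for a given flux distribution.
# Assumed form: sum over reactions of max(0, threshold - expression) * |flux|.

def inconsistency_score(fluxes, expression, threshold):
    """Penalize flux through reactions whose expression is below threshold."""
    score = 0.0
    for rxn, flux in fluxes.items():
        level = expression.get(rxn)
        if level is None:
            continue  # no expression data for this reaction: no penalty
        shortfall = max(0.0, threshold - level)
        score += shortfall * abs(flux)
    return score


fluxes = {"R1": 5.0, "R2": -2.0, "R3": 0.0, "R4": 1.0}
expression = {"R1": 12.0, "R2": 3.0, "R3": 1.0}  # R4 has no measurement

score = inconsistency_score(fluxes, expression, threshold=5.0)
print(score)
```

Only R2 contributes in this example: it is expressed below the threshold yet carries flux. R3 is also below the threshold but is unused, so it adds nothing. A score of zero would mean that the metabolic objective can be met using well-expressed reactions alone.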
The integrative Metabolic Analysis Tool (iMAT) [115], on the other hand, is a web-based tool based on Shlomi et al. [87], which does not require prior knowledge of a defined metabolic functionality. iMAT predicts metabolic states in specific conditions by integrating gene or protein expression data with a genome-scale metabolic model. The web tool outputs a prediction for the flux state and a set of confidence values for all the reactions in the network. Additionally, iMAT can report genes that are predicted to be upregulated or downregulated post-transcriptionally. The main difference from GIMME is that instead of presupposed metabolic objectives, iMAT requires the existence of a minimum flux through reactions that correspond to the highly expressed genes in the dataset. This difference gives iMAT an advantage in cases where clear metabolic objectives cannot be established.
The first model that can be considered “whole-cell” was developed for Mycoplasma genitalium [43], a human pathogen, by combining all the biochemical components and all the interactions in the system. Modules with diverse characteristics were built, representing distinct cellular functions and combined into a dynamic framework. This integrative approach enabled the inclusion of physiologically and mathematically diverse processes and experimental measurements. The model was used to examine areas of cellular function that had not been studied in conjunction before, such as protein–DNA associations and the interactions between DNA replication and the initiation of replication. This whole-cell model represents an important advancement in the development of integrated genome-scale modeling.
The more biochemically accurate a model is, the more detailed the simulations of an organism’s phenotypic behavior we should be able to produce by varying genetic and environmental parameters. By combining Metabolism and gene Expression, an ME model was produced for Thermotoga maritima [51]; this integrated model considerably improves the prediction accuracy of the genome-scale metabolic model of the organism and adds the capability of predicting gene expression. The ME model represents the next generation of constraint-based models: stoichiometric models of metabolism that also explicitly consider gene transcription and translation. Thanks to the integration of additional levels of biological information, ME models can provide a basis for considering mRNA transcription, protein translation, protein complexing, reaction catalysis or molecule formation within the framework of genome-scale modeling. ME models represent a significant step in the effort to bridge the gap between molecular biology and cellular physiology.
Another important application of integrating transcriptome, proteome, and phenotypic data with metabolic reconstructions is to contextualize generic metabolic reconstructions of higher organisms, so that they capture those aspects of metabolism that are present in a particular tissue or cell type. A number of automatic reconstruction approaches have been built to achieve this. One such algorithm, the Model Building Algorithm (MBA) [39], was employed in the construction of a tissue-specific hepatic model from the generic human RECON1 model [26], integrating tissue-specific molecular data. The hepatic model was validated with flux measurements across various hormonal and dietary conditions. The advantage of MBA is that it eliminates superfluous metabolic reactions and streamlines the metabolic model to consist of metabolic reactions that are functional in the cell. Similarly, a method called metabolic Context-specificity Assessed by Deterministic Reaction Evaluation (mCADRE) [101] is able to infer a tissue-specific network based on gene expression data and metabolic network topology, along with evaluation of functional capabilities during model building. mCADRE produces models with similar functionality and achieves a dramatic computational speed-up over MBA by using the network topology to set a deterministic ordering for reaction removal, rather than computing a large ensemble of models based on random orderings. Using this method, draft genome-scale metabolic models were reconstructed for 126 human tissue and cell types. Finally, another approach is the INIT (integrative network inference for tissues) algorithm [1], which uses cell-type-specific information about protein abundances as its main source of evidence.
INIT is formulated as an MILP problem and relies on evidence from the Human Protein Atlas [97] and tissue-specific gene expression data to decide on the presence or absence of metabolic enzymes in each cell type, while metabolomics data from the Human Metabolome Database [105] are used as constraints that force the ability to produce a specific metabolite by adding the necessary reactions, if said metabolite has been observed in a tissue. INIT was used to generate genome-scale models for 69 healthy human cell types and 16 cancer cell types.
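The deterministic pruning strategy described above for mCADRE can be illustrated with a toy sketch. It departs from the published method in several ways that should be kept in mind: the evidence scores are invented, reactions are simply ranked by score, and the functional check is a plain reachability test on metabolites rather than a flux-based evaluation. All reaction and metabolite names are hypothetical.

```python
# Toy sketch of deterministic, evidence-ordered reaction pruning.
# The functional check is a simple reachability test (an assumption),
# not the flux-based checks used by the published methods.

def producible(reactions, seeds, target):
    """Can `target` be reached from `seeds` using the given reactions?"""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return target in available


def prune(reactions, core, evidence, seeds, target):
    """Remove non-core reactions, least supported first, as long as the
    target metabolite stays producible."""
    model = dict(reactions)
    order = sorted((r for r in model if r not in core),
                   key=lambda r: (evidence[r], r))  # fixed, reproducible order
    for rxn in order:
        trial = {k: v for k, v in model.items() if k != rxn}
        if producible(trial, seeds, target):
            model = trial  # removal keeps the model functional: accept it
    return model


# Each reaction: (set of substrates, set of products).
reactions = {
    "R1": ({"glc"}, {"g6p"}),
    "R2": ({"g6p"}, {"pyr"}),
    "R3": ({"glc"}, {"pyr"}),      # bypass with weak evidence
    "R4": ({"pyr"}, {"lac"}),      # side branch
    "R5": ({"pyr"}, {"biomass"}),
}
core = {"R1", "R5"}                       # strongly supported, never removed
evidence = {"R2": 0.6, "R3": 0.1, "R4": 0.3}

tissue_model = prune(reactions, core, evidence, {"glc"}, "biomass")
print(sorted(tissue_model))
```

Because the removal order is fixed by the evidence ranking, a single pass yields one model, which is the source of the speed-up over approaches that build an ensemble from many random orderings. In the toy case, R3 and R4 are removed, whereas R2 is retained because its removal would disconnect the core reactions.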
Cells contain thousands of molecular components including transcripts, proteins and metabolites, and regulation plays a very important role in every cellular process (gene expression, protein translation, enzymatic reactions). For these reasons, precise estimation of the metabolic states and comprehension of the way regulation works are crucial factors for accurate simulation of cellular processes. Approaches that integrate transcriptional regulation with more traditional constraint-based metabolic simulation make several assumptions, particularly since the transcription of genes and the way it correlates with flux are still not perfectly understood. As a result, predictions made with these approaches are not highly accurate, and while these methods have been successfully applied to specific example organisms, wide application is still problematic. Nevertheless, integrated approaches constitute an initial step in the effort to effectively correlate genotype with phenotype and often offer improved predictions compared to stand-alone FBA simulations.
Conclusions
In the current microbial metabolic engineering field, many tools and applications have been developed that facilitate genetic engineering of model organisms. Here, we summarized the genome-scale modeling approach, which, thanks to its simplicity and the fact that it offers large amounts of biochemical information for an organism’s reactions, is well suited for application in systematic metabolic engineering for bio-production using microorganisms. Metabolic design using genome-scale modeling is already widely used, as it enables the prediction of target genes for knockout or amplification to enhance productivity. In this review, we offered an overview of genome-scale modeling and flux balance analysis, and focused particularly on the challenge of metabolic reconstructions, and on the progress achieved by the various efforts towards automatic reconstruction. We reviewed several successful studies in the area of genome-scale modeling for metabolic engineering. Techniques for metabolome analysis have made progress in recent years, and researchers now have direct access to several tools that automate the selection of gene deletions, additions and modifications to produce mutants that would facilitate the production of specific chemicals. Finally, we summarized the importance of studying and understanding the regulatory mechanisms of the cell and presented studies that focused on integration of regulation and metabolism. In the future, we expect that integrated models of metabolism will become particularly important in the field of metabolic engineering.
Acknowledgments
The authors gratefully acknowledge funding from the Luxembourg Centre for Systems Biomedicine (ES), and the DOE ARPA-E program (DE-AR0000426), an NIH Center for Systems Biology (2P50 GM076547) and the Camille Dreyfus Teacher-Scholar Program (NDP). We also thank Julie Bletz and Ben Heavner for critical readings of the manuscript, and James Eddy for assistance with the illustrations.
References
Heavner BD, Smallbone K, Price ND, Walker LP (2013) Version 6 of the consensus yeast metabolic network refines biochemical coverage and improves model performance. Database 9 (10)
Yin J, Chen J-C, Wu Q, Chen G-Q (2014) Halophiles, coming stars for industrial biotechnology. Biotechnology Advances (in press)