Abstract

The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, we also describe the concepts that a database does not represent. Which aspects of the metabolic network need to be available in a structured format, and in what detail, differs per application. For example, for in silico phenotype prediction, a detailed representation of gene–protein–reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, which we further illustrate with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have so far not been resolved by the exchange formats in which knowledge representation is standardized.

INTRODUCTION

Our understanding of metabolism is ever expanding, as evidenced by the increasing amount of bibliomic data. Pathway databases have been built to collect and capture this knowledge. Besides serving as knowledge repositories, the databases aim to represent the metabolic network in a digital format in such a way that it can be used for computational analyses. This has enabled numerous analyses ranging from the prediction of phenotypes [1], studying evolution [2], to the analysis and interpretation of high-throughput data [3]. The number of pathway databases describing the metabolic network for one or more organisms continues to grow [4, 5].

From the perspective of a researcher used to the compact representation of biological knowledge on metabolism in a pathway, it may seem trivial to represent the metabolic network in an electronic form. However, the biology of the metabolic network is complex, and the terminology used by biologists changes over time and varies among biologists [6]. Furthermore, a pathway database needs to accommodate a wide range of users with different requirements. Database developers and curators need to make numerous choices on how to represent and relate each component of the metabolic network (Figure 1). Important considerations here are what needs to be described in a structured and standardized form to enable computational analyses and what can be described as background information in unstructured text fields.

Figure 1:

Gluconeogenesis. Selection of the information that needs to be stored to accurately represent the gluconeogenesis pathway [7].

We selected five frequently used pathway databases and compared their approaches to representing the human metabolic network in a digital format. Furthermore, we discuss how the databases deal with knowledge that is uncertain or missing. To illustrate the challenges faced in making biological knowledge amenable to computational analyses, we give a detailed description of how each database represents three complex metabolic processes: fatty acid beta oxidation, oxidative phosphorylation and the pyruvate dehydrogenase reaction. Finally, we discuss the challenges that the differences in knowledge representation pose for analyses across pathway databases and for the exchange of knowledge between databases.

With this article, we intend to raise awareness of the complexity of representing the current knowledge on metabolism. A detailed understanding of how knowledge is represented is crucial for users of pathway databases, as differences in representation can affect the outcome of computational analyses. As argued by Green and Karp [8], the pathway definition alone may already influence analysis results. Moreover, the choices made in how to represent the metabolic network affect the ability of the database to capture every detail. By pointing out the current limitations, our research will also aid (future) database developers, knowledge curators and domain experts in their quest to further improve knowledge representation.

RESULTS

To illustrate the differences in representation of the metabolic network, we selected the following five databases: H. sapiens Recon 1 [9] from BiGG [10] (referred to as Recon 1 in the rest of the article), HumanCyc [11] from BioCyc [12], EHMN [13], KEGG [14] and Reactome [15] (Table 1). Our analysis of how knowledge is represented in these databases was based on the descriptions given by the pathway database curators themselves in articles and online manuals (if available); when necessary we contacted the database curators for additional details. Moreover, for a more detailed insight we also examined the data files provided by each database, which contain the actual representation of the metabolic network (Supplementary Table S1). Note that we did not consider concepts that were only represented on the website of the database or knowledge that was only provided indirectly by references to other databases, e.g. metabolite databases. For each database, we analyzed how and to what detail knowledge is represented on the level of: (i) the entire network, (ii) its reactions and (iii) the enzymes and their encoding genes. On all three levels, the databases have made different choices (Table 2). We focused in our review on how knowledge is represented rather than how it is collected, although both issues are intimately linked to how accurately the metabolic network is captured.

Table 1:

Metabolic pathway databases compared

| Database | No. of organisms (a) | Name human network | Version | File formats and computational access |
|---|---|---|---|---|
| BiGG | | H. sapiens Recon 1 | | Tab-delimited files, SBML |
| BioCyc | 1700 | HumanCyc | 15.5 | Pathway Tools (export flat files), API, BioPAX, SBML |
| EHMN | | EHMN | | Excel file, SBML |
| KEGG | 1646 | – | 61 | KGML, API, dbget, flat files |
| Reactome | 46 | – | 39 | MySQL dump, API, BioPAX, SBML |

(a) Note that numbers should not be directly compared, since the level of curation may differ per organism, both within and between databases. BiGG, biochemically, genetically and genomically structured genome-scale metabolic network reconstructions; EHMN, Edinburgh Human Metabolic Network; KEGG, Kyoto Encyclopedia of Genes and Genomes; SBML, Systems Biology Markup Language; API, application programming interface; BioPAX, Biological Pathway Exchange; KGML, KEGG Markup Language.

Table 2:

Representation of concepts in metabolic pathway databases

| Level | Concept | EHMN | Recon 1 | HumanCyc | KEGG | Reactome |
|---|---|---|---|---|---|---|
| Network | Type: metabolism | ✓ | ✓ | ✓ | ✓ | ✓ |
| Network | Type: signaling | ✗ | ✗ | – | ✓ | ✓ |
| Network | Type: genetic information processing | ✗ | ✗ | – | ✓ | ✓ |
| Network | Compartmentalization | ✓ | ✓ | – | ✗ | ✓ |
| Network | Division into pathways | ✓ | ✓ | ✓ | ✓ | ✓ |
| Reaction | Reaction type | ✗ | ✗ | ✓ | ✓ | ✗ |
| Reaction | Linking of reactions | ✗ | ✗ | ✓ | ✓ | ✓ |
| Reaction | Physiological direction | ✗ | ✓ | ✓ | ✓ | ✓ |
| Reaction | Metabolite: type | ✗ | ✗ | ✓ | ✓ | ✗ |
| Reaction | Metabolite: protonation state | ✗ | ✓ | ✓ | ✗ | ✗ |
| Enzyme | Isozymes | ✗ | ✓ | ✓ | ✗ | ✓ |
| Enzyme | Isoforms | ✗ | ✓ | ✗ | ✗ | ✓ |
| Enzyme | Complexes: heteromers | ✗ | ✓ | ✓ | – | ✓ |
| Enzyme | Complexes: homomers | ✗ | ✗ | ✓ | ✗ | ✓ |
| Enzyme | Prosthetic groups/cofactors | ✗ | ✗ | – | ✓ | ✓ |

Check mark, concept is represented; Cross, concept is not represented; Bar, database is able to represent the concept, but this is only done to a limited extent.

Representation of the network

Types of networks

An important difference at the network level is whether a database describes only metabolic processes (EHMN, Recon 1) or also other types of biological processes such as signaling and genetic information processing (HumanCyc, KEGG, Reactome; Tables 2 and 3). Since the different types of processes are intertwined in Reactome, it is non-trivial to retrieve only the metabolic network, which is needed, for instance, to carry out flux balance analyses [16, 17]. Note that all five databases describe what is referred to as the global human metabolic network, in which all possible reactions are combined even though they may not take place in every tissue or cell type. Defining tissue-specific models is left to algorithms like the one designed by Jerby et al. [1] or to more specialized (manually) curated networks like HepatoNet1 [18].

Table 3:

Representation differences on network level

| Level | Aspect | EHMN | Recon 1 | HumanCyc | KEGG | Reactome |
|---|---|---|---|---|---|---|
| Type | Multiple types of processes (a) | No, only metabolic reactions | No, only metabolic reactions | Yes, all types of reactions represented in the same way | Yes, different types of reactions represented differently | Yes, all types of reactions represented in the same way and are intertwined |
| Compartmentalization | # compartments (b) | | | 27 (c) | Not applicable | 47 |
| Compartmentalization | Ontology | GO cellular component | No ontology used | Cell Component Ontology | Not applicable | GO cellular component |
| Compartmentalization | Specified on the level of the | Metabolites and/or reaction (d) | Metabolites or reaction (e) | Metabolites, reaction and enzyme | Not applicable | Metabolites, reaction, enzyme and pathway |
| Compartmentalization | If compartment is unknown | ‘Uncertain’, but for practical purposes also provided as ‘cytosol’ | Cytosol | ‘NIL’ (few cases) or nothing is specified, in which case ‘cytosol’ is the default | Not applicable | Not applicable (reaction not included in database if compartment is unknown) |
| Pathway | Definition | Re-division of the pathways of KEGG and EMP: less overlap between pathways; small functionally related pathways are grouped; human-specific | Definition of KEGG, but human-specific | Guidelines used: a single biological process; evolutionary conserved; regulated as a unit; boundaries at stable and high-connectivity metabolites | Centered on the synthesis and/or degradation of one or more related substrates | A series of reactions, connected by their participants, leading to a biological outcome |
| Pathway | Categorization | Categories of KEGG | Categories of KEGG | Hierarchy, first classified according to type of process (e.g. degradation) and next on the type of metabolites | 11 categories (e.g. amino acid metabolism), based on the type of metabolites involved | Hierarchy, based on the type of metabolites involved |
| Pathway | Reactions without pathway? | Yes, labeled ‘isolated’ | Yes, labeled ‘other’ or ‘miscellaneous’ | Yes, no link to instance of the pathway frame | Yes | No |

(a) Metabolic, signaling and gene information processing. (b) Includes extracellular space. (c) Includes generic compartments like ‘in’ and ‘membrane’ and the non-human compartment ‘inner membrane (sensu Gram-negative bacteria)’. (d) If it is a transport reaction, the compartment is only indicated at the metabolite level. (e) If it is a transport reaction, the compartment is indicated at the metabolite level, otherwise only for the entire reaction. GO, Gene Ontology; EMP, Enzymes and Metabolic Pathways database.

Compartmentalization

Different cellular compartments have distinct metabolic functions. KEGG does not provide any information on compartments. In HumanCyc, compartmentalization is work in progress, but their cell component ontology [19] does allow for a detailed representation. EHMN, Recon 1 and Reactome provide a fully compartmentalized network. Reactome has the most fine-grained compartmentalization of the latter three databases and thereby conveys the most detailed knowledge (Table 3). Moreover, Reactome indicates the compartment not only for each reaction but also for each enzyme and even for some pathways. Recon 1 and EHMN account for the same set of eight compartments, but handle subcellular locations not included in this set differently. In Recon 1, the intermembrane space of the mitochondrion, for example, is merged with the cytosol [9]. EHMN uses the hierarchy of the Gene Ontology (GO) to determine which of their eight compartments is the ancestor of the subcellular location in question. In the example above, the ancestor is the mitochondrion. Such a more coarse-grained compartmentalization will result in a less accurate representation of, for example, oxidative phosphorylation (Supplementary Text S1).
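The ancestor-based mapping described for EHMN can be illustrated with a minimal sketch. The hierarchy below is a hand-written toy with a single parent per term, and the compartment set is a hypothetical subset; neither is an excerpt of the Gene Ontology or of EHMN, and the function name is our own.

```python
from typing import Dict, Optional, Set

# Toy child -> parent links (illustrative assumption, not real GO content).
TOY_PARENT: Dict[str, str] = {
    "mitochondrial intermembrane space": "mitochondrial envelope",
    "mitochondrial envelope": "mitochondrion",
    "mitochondrial matrix": "mitochondrion",
    "peroxisomal matrix": "peroxisome",
}

# Hypothetical subset of the compartments a model distinguishes.
MODEL_COMPARTMENTS: Set[str] = {"cytosol", "mitochondrion", "peroxisome"}


def nearest_model_compartment(
    location: str,
    parent: Dict[str, str] = TOY_PARENT,
    compartments: Set[str] = MODEL_COMPARTMENTS,
) -> Optional[str]:
    """Walk up the hierarchy until a compartment of the model is reached."""
    seen = set()
    current: Optional[str] = location
    while current is not None and current not in seen:
        if current in compartments:
            return current
        seen.add(current)  # guards against accidental cycles in the toy data
        current = parent.get(current)
    return None  # no ancestor belongs to the model's compartment set


print(nearest_model_compartment("mitochondrial intermembrane space"))
```

In this toy hierarchy the intermembrane space resolves to the mitochondrion, which mirrors the EHMN behaviour described above. A real ontology is a graph in which a term can have several parents, so a production version would need to decide how to choose among multiple candidate ancestors.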

Division into pathways

All five databases divided their network into pathways to provide insight into the functional organization of the metabolic network. Although this division into pathways is not arbitrary and is based on biological criteria in each of the databases, there is no generally accepted definition of a pathway. Consequently, each database defines the boundaries of its pathways differently (Table 3). This results in a large difference in the number of pathways, the average number of reactions per pathway and the overlap between pathways (Table 4). As described by Green and Karp [8] for BioCyc and KEGG, the pathway definition might influence the outcome of pathway-based analyses.

Table 4:

Pathway statistics

 EHMN Recon 1 HumanCyc KEGG Reactome 
Number of pathways 69 96 257 84 171 
Average number of reactions per pathway 52 30 23 
% Of reactions occurring in >1 pathwaya 1% 9% 13% 10% 12% 

Statistics based upon the same data we used previously [20] for a comparison of the content of the five databases. Only the metabolic pathways of HumanCyc, KEGG and Reactome are considered. For Reactome and HumanCyc the lowest level in the hierarchy was used when counting the number of pathways. If reactions only differ in direction and/or compartments they are counted as one. (a) With respect to the total number of reactions assigned to at least one pathway.
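As an illustration of how the three statistics in Table 4 are defined, the following sketch computes them for a toy pathway assignment. The pathway names and reaction identifiers are invented, and the sketch assumes that reactions differing only in direction or compartment have already been collapsed to a single identifier.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def pathway_statistics(pathways: Dict[str, Set[str]]) -> Tuple[int, float, float]:
    """Return (number of pathways, mean reactions per pathway,
    % of reactions assigned to more than one pathway)."""
    if not pathways:
        raise ValueError("at least one pathway is required")
    # How many pathways each reaction is assigned to.
    membership = Counter(r for reactions in pathways.values() for r in reactions)
    if not membership:
        raise ValueError("pathways contain no reactions")
    mean_size = sum(len(r) for r in pathways.values()) / len(pathways)
    shared = sum(1 for n in membership.values() if n > 1)
    # Denominator: reactions assigned to at least one pathway.
    pct_shared = 100.0 * shared / len(membership)
    return len(pathways), mean_size, pct_shared


toy = {
    "pathway A": {"R1", "R2", "R3"},
    "pathway B": {"R2", "R3", "R4"},
    "pathway C": {"R5", "R6"},
}
print(pathway_statistics(toy))
```

In the toy data two of the six distinct reactions occur in more than one pathway, so the overlap is one third. The denominator matches the table footnote: only reactions assigned to at least one pathway are counted.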

Reactome defines a pathway as a series of reactions, connected by their participants, leading to a biological outcome [21]. In HumanCyc, stricter guidelines are used, i.e. a pathway is a single biological process that should be evolutionarily conserved and regulated as a unit [8]. This partly explains the low average size of a pathway in HumanCyc. Moreover, variants of the same metabolic process are considered as separate pathways, thereby increasing the overlap between pathways. In contrast, in EHMN the emphasis is on the functional relationships between reactions, and overlapping metabolic processes are merged into a single pathway [22]. The pathways of KEGG are a mosaic of reactions that take place in any of the organisms included in KEGG and are substrate-centric [8]. The organism-specific version of a KEGG pathway consists of those reactions to which a gene of the organism of interest has been linked. This approach can result in artifacts. For example, ‘lysine biosynthesis’ is part of the human metabolic network in KEGG, although human metabolism lacks the ability to synthesize lysine. Recon 1 uses the same pathway definitions and categories as KEGG; however, only human-specific pathways are included, so that, for example, lysine biosynthesis is not part of Recon 1. Ultimately, pathways cannot be studied in isolation as the entire network is connected.

Representation of metabolic reactions

A metabolic reaction can be defined as the synthesis or degradation of chemical compounds, which may or may not be a reversible process. The type of reaction, e.g. an ‘oxidation–reduction’ reaction, is indicated by an Enzyme Commission (EC) number in all databases, although in Reactome a link to GO is preferred. KEGG and HumanCyc also have their own reaction ontology (Table 5). The ontology of HumanCyc enables selecting, for instance, only small molecule reactions. The exact representation of a metabolic reaction differs per database. For example, each database uses a different terminology in its data model to refer to the metabolites before the arrow, e.g. ‘substrates’ or ‘input’, and after the arrow, e.g. ‘products’ or ‘output’ (Figure 2). In addition, the level of detail in which a conversion is described varies within and between databases (see ‘Case Study’ section and Supplementary Text S2). Differences in detail between databases can reflect a disagreement on the number of steps required for the conversion. Intermediate steps may, however, also have been left out because of a lack of evidence or to simplify the description of a process. For these reasons, an apparent disagreement on the underlying biology between multiple descriptions of the metabolic network could also be caused by different decisions on how to represent the same knowledge. If intermediate steps have been left out for simplification only, a mechanism to retrieve these steps should be provided to allow users to determine themselves the level of detail that is required for the application at hand. Only HumanCyc enables its curators to indicate ‘subreactions’, but this option has not been used yet (release 15.5). The reasons for leaving out intermediate steps are not indicated in a structured way in any of the databases.

Figure 2:

Last, irreversible, step of glycolysis. Words in ‘italic’ indicate reserved terms in a database. The term used to indicate the side of the metabolite in the reaction is shown using braces. The term used to describe the reversibility of the reaction is shown in a solid-lined box. Each database indicates the direction of the reaction differently as shown in the dotted-lined boxes. For example, KEGG indicates for the main metabolites of a reaction whether it is a substrate or product, thereby implying the direction of the reaction. HumanCyc, on the other hand, explicitly indicates the direction with, in this example, ‘RIGHT-TO-LEFT’. (A) EHMN, KEGG and HumanCyc store the reaction in the direction defined by NC-IUBMB. (B) Recon 1 and Reactome store the reaction in the physiological direction.

Table 5:

Representation differences on reaction level

| Level | Aspect | EHMN | Recon 1 | HumanCyc | KEGG | Reactome |
|---|---|---|---|---|---|---|
| Reaction | Type | Indirectly (link to NC-IUBMB) | Indirectly (link to NC-IUBMB) | Two parallel reaction ontologies, i.e. classification by conversion type (a) or by substrate | Classified in KEGG BRITE (a), a collection of functional hierarchies | Indirectly (link to GO Biological Process) |
| Reaction | Linked to next/previous reaction? | No | No | Yes, to previous reaction(s) within a pathway; links to other pathways via a metabolite | Yes, successive steps within a pathway are linked; links to other pathways via a metabolite | Yes, to preceding reaction(s) and/or pathway(s) |
| Reaction | Direction of writing: irreversible reactions | As stored in NC-IUBMB | Physiological direction | As stored in NC-IUBMB | As stored in NC-IUBMB | Direction it has in the pathway |
| Reaction | Direction of writing: reversible reactions | As stored in NC-IUBMB | Direction it has in the pathway | As stored in NC-IUBMB | As stored in NC-IUBMB | Both directions, stored separately |
| Metabolites | Type | Indirectly (links to metabolite databases) | Indirectly (links to metabolite databases) | Classified in their own compound ontology | Classified in KEGG BRITE, a collection of functional hierarchies | Indirectly (links to metabolite databases) |
| Metabolites | Protonation state | Always the neutral form | Most common state at pH level 7.2 | Most common state at pH level 7.3 | Always the neutral form | Always the neutral form |

NC-IUBMB, Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. (a) Includes the NC-IUBMB classification.

Linking of reactions

For various types of network analysis, the network first needs to be constructed from the individual reactions in a pathway database [23]. In HumanCyc, KEGG and Reactome, reactions are explicitly linked to the preceding and/or following steps, both within and across pathways (Table 5). For each reaction, KEGG also stores its main compounds, which connect consecutive reactions. In HumanCyc, the main compounds are deduced automatically per pathway for most reactions [24], and in Reactome main compounds are only captured in the graphical representation of the pathway. Note that in Reactome a metabolic reaction can be preceded by the activation of an enzyme catalyzing this reaction. This gives a more complete view of a biological process, compared to only describing its metabolic component. However, as stated above, retrieving solely the metabolic processes from Reactome is difficult. How to link reactions to each other is not explicitly indicated in EHMN and Recon 1. In this case, reactions are generally linked based on the substrates or products they have in common. This strategy makes network construction more difficult due to ‘currency’ metabolites (ATP, H+, etc.) connecting unrelated reactions [25]. The number of possible connections can be restricted by only linking reactions via metabolites that are assigned to the same compartment.
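The linking strategy for databases without explicit links can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular tool: the reaction identifiers, the data layout and the currency list are invented for the example, and every reaction is treated as running in the direction in which it is written.

```python
from typing import Dict, FrozenSet, Set, Tuple

Species = Tuple[str, str]  # (metabolite name, compartment code)

# Hypothetical currency list; real analyses choose this set per application.
CURRENCY: FrozenSet[str] = frozenset({"ATP", "ADP", "H+", "H2O", "NAD+", "NADH"})

# Toy reactions (invented identifiers); "c" = cytosol, "m" = mitochondrion.
REACTIONS: Dict[str, Dict[str, Set[Species]]] = {
    "R_hex": {"substrates": {("glucose", "c"), ("ATP", "c")},
              "products": {("glucose 6-phosphate", "c"), ("ADP", "c")}},
    "R_pgk": {"substrates": {("1,3-bisphosphoglycerate", "c"), ("ADP", "c")},
              "products": {("3-phosphoglycerate", "c"), ("ATP", "c")}},
    "R_pgm": {"substrates": {("3-phosphoglycerate", "c")},
              "products": {("2-phosphoglycerate", "c")}},
    "R_mito": {"substrates": {("3-phosphoglycerate", "m")},
               "products": {("2-phosphoglycerate", "m")}},
}


def link_reactions(reactions, currency=CURRENCY) -> Set[Tuple[str, str]]:
    """Return directed edges (upstream, downstream) between reactions that
    share a non-currency metabolite in the same compartment."""
    edges = set()
    for up_id, up in reactions.items():
        for down_id, down in reactions.items():
            if up_id == down_id:
                continue
            # Species are (name, compartment) pairs, so the intersection
            # only matches metabolites located in the same compartment.
            shared = up["products"] & down["substrates"]
            if any(name not in currency for name, _ in shared):
                edges.add((up_id, down_id))
    return edges


print(sorted(link_reactions(REACTIONS)))
print(sorted(link_reactions(REACTIONS, currency=frozenset())))
```

With the currency list in place only the edge from `R_pgk` to `R_pgm` remains. Without it, ATP and ADP additionally connect `R_hex` and `R_pgk` in both directions, which illustrates how currency metabolites link otherwise unrelated steps. The mitochondrial copy stays unconnected because its metabolite sits in a different compartment.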

Physiological direction

Knowing the physiological direction of reactions is crucial when, for example, building an in silico model to predict phenotypes. All databases indicate the direction in slightly different ways (Figure 2, Table 5). In general, reactions in EHMN, HumanCyc and KEGG are stored in the direction defined by NC-IUBMB, which is not necessarily the physiological direction in which the reaction takes place in human. For example, the last step of glycolysis, the formation of pyruvate, is given in the direction opposite to the one in which it takes place (Figure 2A). EHMN only indicates that the reaction is irreversible and thus does not provide the correct physiological direction. In KEGG, (ir)reversibility of a reaction is indicated independently of the specific organism, thereby ignoring that whether a reaction is reversible or not varies among species. Note that in Reactome reversibility is only indicated by providing a link to the reaction in the opposite direction if the reaction is reversible. Finally, only in HumanCyc can the direction be defined in the context of: (i) a specific enzyme that catalyzes the reaction, (ii) the pathway or (iii) only the reaction itself. For only 4% of the reactions, mainly transport reactions, the direction is specified in the context of the enzyme. Furthermore, for ∼40% of the reactions the direction is not specified at all. These reactions are also not included in any pathway.
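A consumer of a database that stores reactions in the NC-IUBMB direction together with a separate direction flag has to reorient each reaction before use. The sketch below shows one way to do this. Only the flag ‘RIGHT-TO-LEFT’ is taken from Figure 2; the flags ‘LEFT-TO-RIGHT’ and ‘REVERSIBLE’, the function name and the tuple layout are assumptions made for illustration, and protons are omitted from the example reaction for brevity.

```python
from typing import Optional, Tuple

Side = Tuple[str, ...]


def physiological_form(
    left: Side, right: Side, direction: Optional[str]
) -> Optional[Tuple[Side, Side, bool]]:
    """Return (substrates, products, reversible) in the physiological
    direction, or None when the database gives no direction."""
    if direction is None:
        return None  # direction unspecified: the reaction cannot be oriented
    if direction == "LEFT-TO-RIGHT":
        return left, right, False
    if direction == "RIGHT-TO-LEFT":
        return right, left, False  # stored direction is opposite to physiology
    if direction == "REVERSIBLE":
        return left, right, True
    raise ValueError(f"unknown direction flag: {direction!r}")


# Last step of glycolysis, written in the stored (NC-IUBMB) direction.
stored_left = ("ATP", "pyruvate")
stored_right = ("ADP", "phosphoenolpyruvate")
print(physiological_form(stored_left, stored_right, "RIGHT-TO-LEFT"))
```

Returning `None` for a missing flag, rather than guessing a default, keeps reactions with an unspecified direction visible to the caller, who can then decide whether to exclude them or treat them as reversible.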

Metabolites

The identity of a metabolite is determined by its name and by identifiers from specialized metabolite databases such as ChEBI [26]. An identifier is meant to unambiguously designate a metabolite across multiple resources. Identifiers enable, for example, mapping of experimental data onto the network. All five databases try to link a metabolite to at least one specialized metabolite database. The number of metabolite databases referred to differs (Supplementary Table S2). To minimize ambiguity, it is advisable that pathway databases provide a link to a single, common metabolite database for every metabolite. However, in practice this is not possible yet, as metabolite databases are far from complete.

Metabolite databases use different criteria for assigning IDs to metabolites. Consequently, the type of ID chosen affects the characterization of a metabolite in a pathway database and the level of distinction that can be made. For example, in ChEBI, both a base and its conjugate acid are assigned separate IDs, whereas in KEGG Compound no distinction between these two is made and they are combined in a single entry with one ID. KEGG, EHMN and Reactome prefer to state the neutral form of the metabolite. This choice may result in a different ChEBI ID compared to the one assigned to a metabolite by Recon 1 and HumanCyc, which specify the most common protonation state of a metabolite at a pre-defined and fixed pH level. Indicating the correct protonation state is important to be able to build a charge-balanced network. As the pH level varies between compartments, metabolites may have multiple protonation states. Ideally, databases should also be able to indicate these multiple states, which is currently not the case in any of the databases.

The pathway databases have made different choices with respect to how much information about metabolites is contained in the pathway database itself. EHMN, Recon 1 and Reactome contain the least information on metabolites and refer to specialized metabolite databases for additional information. Aside from referring to metabolite databases, KEGG and HumanCyc themselves also provide detailed information about metabolites (Table 6). KEGG even has its own metabolite database (KEGG Compound), to which many other pathway databases refer. HumanCyc and KEGG also provide their own hierarchical classification of the metabolites (Table 5, Supplementary Figure S1), although not as extensively as ChEBI. These ontologies are a powerful way to provide some level of abstraction, and at the same time explicitly define what is meant by the abstract term. HumanCyc uses compound classes, e.g. ‘an alcohol’, in reactions as a level of abstraction. Such generic metabolites are, in general, linked to specific metabolites and used to represent the broad substrate specificity of an enzyme (see ‘Case Study’ section). If for a generic metabolite no specific metabolites are provided at all, the substrate specificity of the enzyme is likely to be undetermined [27]. Especially when constructing a computational model, it is important to be able to instantiate such generic metabolites and derive specific reactions. For HumanCyc, a mechanism for instantiation of generic reactions has recently been added to their Pathway Tools software for the purpose of building flux balance analysis models [16]. Instantiation is done using the compound ontology by selecting those combinations of specific instances of the generic metabolites that lead to a mass-balanced reaction. However, appropriate instances that fulfill this requirement do not exist for every generic reaction. Moreover, some of the generic reactions cannot be instantiated because multiple products are possible for a given substrate (and vice versa). Recon 1 was specifically built to serve as an in silico model capable of predicting phenotypes. For this purpose, very generic reactions do not provide enough information and were therefore not included in Recon 1. The (broad) substrate specificity of an enzyme is, consequently, not explicitly captured in this database. In Reactome, some level of abstraction is given by grouping metabolites that undergo the same conversion into a set (Supplementary Figure S2).
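The idea of instantiating a generic reaction by mass balance can be sketched as follows. This is a simplified illustration and not the Pathway Tools implementation: the class membership, the hand-entered elemental compositions and the function names are assumptions made for the example.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Elemental compositions entered by hand for illustration.
FORMULA: Dict[str, Counter] = {
    "ethanol": Counter(C=2, H=6, O=1),
    "methanol": Counter(C=1, H=4, O=1),
    "acetaldehyde": Counter(C=2, H=4, O=1),
    "formaldehyde": Counter(C=1, H=2, O=1),
    "NAD+": Counter(C=21, H=28, N=7, O=14, P=2),
    "NADH": Counter(C=21, H=29, N=7, O=14, P=2),
    "H+": Counter(H=1),
}

# Hypothetical excerpt of a compound ontology: class -> specific instances.
INSTANCES: Dict[str, List[str]] = {
    "an alcohol": ["ethanol", "methanol"],
    "an aldehyde": ["acetaldehyde", "formaldehyde"],
}


def composition(side: List[str]) -> Counter:
    """Sum the elemental composition of all metabolites on one side."""
    total: Counter = Counter()
    for metabolite in side:
        total.update(FORMULA[metabolite])
    return total


def instantiate(
    substrates: List[str], products: List[str], instances=INSTANCES
) -> List[Tuple[List[str], List[str]]]:
    """Enumerate instance combinations and keep the mass-balanced ones."""
    generic = [m for m in substrates + products if m in instances]
    balanced = []
    for choice in product(*(instances[g] for g in generic)):
        mapping = dict(zip(generic, choice))
        subs = [mapping.get(m, m) for m in substrates]
        prods = [mapping.get(m, m) for m in products]
        if composition(subs) == composition(prods):
            balanced.append((subs, prods))
    return balanced


for subs, prods in instantiate(["an alcohol", "NAD+"], ["an aldehyde", "NADH", "H+"]):
    print(" + ".join(subs), "->", " + ".join(prods))
```

Of the four possible combinations, only the two that pair an alcohol with the aldehyde of the same carbon count pass the mass-balance check. The sketch does not address the harder case mentioned above, in which several products are possible for one substrate, since mass balance alone cannot then single out one specific reaction.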

Table 6:

Characteristics metabolites

| Characteristic | EHMN | Recon 1 | HumanCyc | KEGG | Reactome |
|---|---|---|---|---|---|
| Formula | ✗ | ✓ | ✓ | ✓ | ✓ |
| Charge | ✗ | ✓ | ✓ | ✗ | ✗ |
| Mass | ✗ | ✗ | ✓ | ✓ | ✗ |
| Gibbs free energy of formation | ✗ | ✗ | ✓ | ✗ | ✗ |
| Structure: InChI | ✗ | ✗ | ✓ | ✓ | ✗ |
| Structure: SMILES | ✗ | ✗ | ✓ | ✗ | ✗ |
| Structure: other (a) | ✗ | ✗ | ✓ | ✓ | ✓ |

Check mark, information is present; Cross, information is not available. aOther structure formats: mol file (KEGG), structure atoms and bonds (HumanCyc), atomicConnectivity and chemical fingerprint (Reactome).

Representation of enzymes and their encoding genes

A metabolic reaction is nearly always catalyzed by an enzyme, which in turn is encoded by one or more genes (Figure 1). An enzyme may be a single protein or a complex consisting of multiple copies of the same protein (homomer) or of multiple different proteins (heteromer). The concepts of an enzyme and a gene are represented differently in each database or, in some cases, not at all (Figure 3). The same holds for the gene–protein–reaction relationship.

Figure 3:

Representation differences at the level of enzymes and encoding genes. Boxes with a dotted line indicate where in the data model the specific identifier is provided. KEGG: the KEGG MODULE database describes four types of modules, including structural complexes. Only the complexes of the electron transport chain and oligosaccharyltransferase are represented in this way.

Enzymes

The concept of an enzyme, as defined above, is not explicitly represented in KEGG. Information on a protein, e.g. its sequence, is indicated at the gene level. Furthermore, aside from a few exceptions, complexes are not indicated in KEGG. EHMN merely represents an enzyme by the UniProt ID(s) assigned to the protein(s) constituting the enzyme; complexes are not represented in EHMN either. In Recon 1, heteromeric complexes are represented using a Boolean expression. However, this is done not at the protein level, as one would expect, but at the gene level. Homomeric complexes are not represented at all. HumanCyc and Reactome do have a separate enzyme level, and both types of complexes, i.e. heteromers and homomers, are represented.

Genes

A gene is defined by its Entrez Gene ID in EHMN, KEGG and Recon 1, while HumanCyc has its own definition, which is closer to that of Ensembl. Reactome focuses more on proteins. Genes are not represented as a single entity in its MySQL database, but as a collection of identifiers from different genome databases (Figure 3). The various identifiers provided by Reactome are only united through their link to the same protein entry. As with metabolite databases, genome databases also use different criteria for assigning an identifier. The answer to the seemingly simple question of how many genes are involved in the human metabolic network according to each pathway database therefore depends on which type of identifier one counts, or on whether one follows the convention of the pathway database itself.

Gene–protein–reaction relationship

There is not necessarily a one-to-one relation between a reaction and its catalyst, e.g. multiple enzymes may catalyze the same reaction (isozymes) [28]. EHMN and KEGG do not specify whether the products of multiple genes linked to the same reaction are isozymes, which can separately catalyze the reaction, or whether the products together form a complex. This could result in incorrect conclusions about the feasibility of a reaction when studying the effect of a protein deficiency [10]. Furthermore, in KEGG the EC number and KEGG Orthology number are used instead of a specific enzyme to connect a gene to the corresponding reaction. Moreover, only the gene level is organism-specific; the relation between ‘enzyme activity’ (EC number) and reaction is not. To retrieve species-specific reactions, the gene coding for the enzyme that catalyzes a reaction needs to be known. In Recon 1, the relation between a protein and the encoding gene is only available via its website. Isozymes are represented using a Boolean expression and are defined at the gene level in Recon 1. Reactome groups isozymes at the enzyme level into a set, each member of which can catalyze the reaction it is linked to. In HumanCyc, isozymes are implied when multiple proteins are separately linked to the same reaction (Figure 3). None of the five databases indicates tissue specificity of isozymes and isoforms, aside from statements in unstructured comment fields, which cannot easily be used in computational analyses.
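
Why the distinction between isozymes and complexes matters for deficiency studies can be made concrete with a small Python sketch. It assumes a Boolean gene rule in which ‘or’ joins isozymes and ‘and’ joins the subunits of a complex, in the spirit of the Boolean expressions used in Recon 1; the nested-tuple encoding, the function name `is_catalyzed` and the gene labels `G1` and `G2` are hypothetical.

```python
def is_catalyzed(rule, deficient):
    """Evaluate a Boolean gene rule given a set of deficient genes.

    A rule is either a gene label (str) or a tuple ("and" | "or", operand, ...).
    """
    if isinstance(rule, str):
        return rule not in deficient
    operator, *operands = rule
    outcomes = [is_catalyzed(operand, deficient) for operand in operands]
    if operator == "and":   # complex: every subunit is required
        return all(outcomes)
    if operator == "or":    # isozymes: any one of them suffices
        return any(outcomes)
    raise ValueError(f"unknown operator: {operator!r}")


# The same two hypothetical genes, linked to a reaction in two different ways.
isozymes = ("or", "G1", "G2")
complex_subunits = ("and", "G1", "G2")

deficient = {"G1"}
print(is_catalyzed(isozymes, deficient))          # reaction remains feasible
print(is_catalyzed(complex_subunits, deficient))  # reaction is lost
```

When a database only lists both genes for a reaction without stating which of the two rules applies, the outcome of such a simulated deficiency cannot be decided from the structured data alone.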

Representation of uncertain and missing knowledge

Our current knowledge of human metabolism is incomplete and based on different types of evidence, such as biochemical, genetic or sequence data and studies on other organisms. It is, therefore, important that databases explicitly indicate the source of a piece of knowledge and the degree of confidence associated with it. All databases except EHMN commonly cite scientific articles as the source of evidence. HumanCyc and Recon 1 also indicate the type of evidence available (Table 7). HumanCyc uses an evidence ontology [29] with 160 terms such as ‘Inferred from experiment’, which can be combined with a probability that the evidence is correct. However, evidence codes are available for <18% of the reactions and their catalyzing enzymes; at the pathway level, evidence codes are available for each pathway as a whole. In Recon 1, the type of evidence is indicated for each reaction. Five types of evidence are discerned, each of which is assigned a confidence score (Table 7) [30].

Table 7:

Evidence description

EHMN
  Type of evidence:
    - Source of transport reactions is indicated: dead-end analysis [13]; H. sapiens Recon 1; TransportDB and GO annotation in UniProt
  Degree of confidence:
    - If the compartment of a protein is unknown it is set to uncertain

Recon 1
  Type of evidence:
    - Five types of evidence are discerned: biochemical; genetic; sequence homology; physiological; modeling (a)
    - Literature references
  Degree of confidence:
    - Five confidence scores, ranging from 0 (low) to 4 (high), reflecting the information and evidence currently available: 4, biochemical data; 3, genetic data; 2, sequence homology or physiological data; 1, modeling data (a); 0, unevaluated
    - If multiple types of evidence are available scores are added
    - Remarks in comment fields

HumanCyc
  Type of evidence:
    - Evidence ontology containing 160 terms, main evidence types: inferred from computation; inferred from experiment; inferred by curator; author statement
    - Basis for assignment of a protein to a reaction, e.g. EC number or decision of curator
    - Literature references
  Degree of confidence:
    - Probability that the evidence is correct
    - Orphan reaction: reaction for which no enzyme that catalyzes the reaction has been sequenced
    - Unbalanced reactions are labeled
    - Reaction may be labeled as hypothetical (b)
    - Remarks in comment fields

KEGG
  Type of evidence:
    - Literature references
  Degree of confidence:
    - Remarks in comment fields

Reactome
  Type of evidence:
    - Data supported by evidence from other organisms is indicated
    - Literature references
  Degree of confidence:
    - CandidateSet: enzymes hypothesized to catalyze the reaction
    - BlackBoxEvents used for: reactions that have imbalances for various reasons; complex processes of which not all details are known; summarizing a complex process in a single step for which each step is known
    - OtherEntity: entities that curators are unable or unwilling to describe in chemical detail
    - GenomeEncodedEntity: polypeptide or polynucleotide whose sequence is unknown
    - Remarks in comment fields

(a) Reaction is included because it improved the performance of the in silico model. (b) The presence of the substrates, products or catalyst of the reaction has not yet been demonstrated.
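
As a minimal illustration of how the Recon 1 scheme in Table 7 could be applied to structured data, the Python sketch below maps each evidence type to its score and, following the table's statement that scores are added when multiple types of evidence are available, sums them. The dictionary, the function name `confidence` and the lower-case evidence labels are assumptions of this sketch, not part of Recon 1 itself.

```python
# Scores per evidence type as listed for Recon 1 in Table 7.
EVIDENCE_SCORE = {
    "biochemical": 4,
    "genetic": 3,
    "sequence homology": 2,
    "physiological": 2,
    "modeling": 1,
}


def confidence(evidence_types):
    """Return a confidence score for a reaction.

    An empty list means the reaction is unevaluated (score 0). Summing the
    scores follows the reading of Table 7 that scores are added when several
    types of evidence are available.
    """
    return sum(EVIDENCE_SCORE[evidence] for evidence in set(evidence_types))


print(confidence(["biochemical"]))
print(confidence(["genetic", "sequence homology"]))
print(confidence([]))
```

A user could, for instance, restrict an analysis to reactions whose score is at or above a chosen threshold, something that is not possible when evidence is only described in free-text comment fields.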

It is also important to explicitly indicate a complete lack of information; this is, however, rarely done. In Recon 1, for instance, ‘cytosol’ is used as the default value if the compartment is unknown, instead of explicitly indicating that there is a lack of knowledge, as is done in EHMN (Table 7). Similarly, of the five databases, only HumanCyc explicitly distinguishes a spontaneous reaction from a reaction for which the enzyme is unknown. In KEGG, spontaneous reactions are only indicated in unstructured comment fields. Moreover, spontaneous reactions cannot be retrieved for an organism-specific network, as retrieval requires the presence of a gene. The same holds for reactions for which the corresponding gene is unknown for the organism of interest. Finally, it is not always indicated, or only in a comment field, that intermediate steps have been left out. In Reactome, a ‘BlackBoxEvent’ can be used for this, which, however, does not necessarily mean that the intermediary steps are unknown.
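
The difference between a default value and an explicit statement of missing knowledge can be sketched in a few lines of Python. The data model below is hypothetical and does not correspond to any of the five databases: the catalysis of a reaction is one of three explicit states, and an unknown compartment is stored as `None` rather than as a default such as ‘cytosol’.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Catalysis(Enum):
    ENZYMATIC = "enzymatic"      # a catalyzing enzyme is known
    SPONTANEOUS = "spontaneous"  # known to proceed without an enzyme
    UNKNOWN = "unknown"          # an enzyme is expected but not known


@dataclass
class Reaction:
    name: str
    catalysis: Catalysis
    compartment: Optional[str]   # None means "not known", never a default


def knowledge_gaps(reactions):
    """Return the names of reactions with an explicit knowledge gap."""
    return [
        reaction.name
        for reaction in reactions
        if reaction.catalysis is Catalysis.UNKNOWN or reaction.compartment is None
    ]


# Placeholder reactions, for illustration only.
reactions = [
    Reaction("R1", Catalysis.ENZYMATIC, "mitochondrion"),
    Reaction("R2", Catalysis.SPONTANEOUS, "cytosol"),
    Reaction("R3", Catalysis.UNKNOWN, "cytosol"),
    Reaction("R4", Catalysis.ENZYMATIC, None),
]
print(knowledge_gaps(reactions))
```

With such a representation a spontaneous reaction (R2) is not mistaken for a gap, while reactions with a missing enzyme or compartment remain retrievable as such.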

CASE STUDY

We selected three complex metabolic processes to further illustrate the different challenges faced by the pathway databases in accurately representing the metabolic network in silico. Here, we discuss fatty acid beta oxidation [31] in more detail. Two more examples, oxidative phosphorylation [32] and the pyruvate dehydrogenase reaction [33], are discussed in Supplementary Text S1 and S2, respectively. These three case studies show the implications of different design decisions for the ability of the pathway databases to represent a biological process in full detail.

Fatty acid beta oxidation

We focused on the beta oxidation of saturated fatty acids with a straight chain of even length. One particular challenge in representing this pathway is the repetitive nature of this process. The chain of an activated fatty acid is shortened by two carbons via four subsequent reactions, yielding one unit of acetyl-CoA. Several chain-length-specific isozymes are available for each cycle. For the complete oxidation of a fatty acid, this cycle is repeated until only acetyl-CoA is left. The number of cycles needed depends on the chain length of the fatty acid. There is a wide range of fatty acids of which the majority, i.e. short-chain, medium-chain and long-chain fatty acids, are degraded in the mitochondrion. In mammals and many fungi, very-long-chain fatty acids are first shortened in the peroxisome after which they may be transported to the mitochondrion for further oxidation [34]. The exact enzymes involved differ in the two compartments, but the reactions are the same except for the co-substrates of the first step of a cycle.
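
The repetitive structure described above can be made explicit with a short Python sketch that enumerates the cycles for a saturated, even-chain acyl-CoA. The function name `cycles` and the parameter `stop_at` are assumptions of this sketch; each cycle stands for the four reactions that shorten the chain by two carbons.

```python
STEPS_PER_CYCLE = 4  # four subsequent reactions per cycle


def cycles(chain_length, stop_at=2):
    """List (substrate length, product length) for each beta-oxidation cycle.

    Handles saturated, straight-chain acyl-CoAs of even length only.
    With stop_at=2 the chain is degraded completely to acetyl-CoA.
    """
    if chain_length % 2 or stop_at % 2:
        raise ValueError("only even chain lengths are handled")
    if not 2 <= stop_at < chain_length:
        raise ValueError("stop_at must be at least 2 and shorter than the chain")
    return [(n, n - 2) for n in range(chain_length, stop_at, -2)]


# Complete oxidation of a C16 acyl-CoA (e.g. palmitoyl-CoA).
complete = cycles(16)
print(len(complete), "cycles,", len(complete) * STEPS_PER_CYCLE, "reactions")

# Partial degradation from C16 down to C8.
partial = cycles(16, stop_at=8)
print(len(partial), "cycles,", len(partial) * STEPS_PER_CYCLE, "reactions")
```

A database that describes every step must therefore store a number of reactions that grows with chain length for each fatty acid, whereas a generic description stores the four reactions of a single cycle once.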

Representation

Each database has its own strengths and limitations in representing fatty acid beta oxidation (Table 8). KEGG and HumanCyc make no distinction between the peroxisomal and mitochondrial pathway, which emphasizes the similarities but disregards the differences. For KEGG, it is difficult to separate the two pathways, since KEGG does not provide information on compartments. Moreover, as mentioned above, pathways in KEGG are not species-specific and the distinction between the peroxisomal and the mitochondrial pathway does not hold for the majority of organisms described in KEGG. In HumanCyc, the repetitive nature of this metabolic process is captured by describing a single cycle, using generic metabolites to describe the oxidation for the complete range of saturated fatty acids. Furthermore, a ‘polymerization link’ explicitly connects the fatty acid before and after the removal of the 2-carbon acetyl-CoA (Figure 4). Which combinations of instances of the generic metabolites together form the specific reactions is, however, not explicitly indicated. Moreover, in this particular example, several of the required metabolite instances for each of the four steps are lacking. The chain length specificity of the enzymes is also not captured in HumanCyc. Reactome chooses to represent the repeating cycles of the mitochondrial pathway as separate subpathways and to describe each step of every cycle. In this way, Reactome is able to represent the chain length specificity of each enzyme. Note, though, that all four steps in the peroxisomal pathway are described for one cycle only; the next eight cycles are lumped into a single step. No mechanism is provided that allows users to retrieve the intermediate steps of these eight cycles. Similarly, in Recon 1 the repetitive nature of this process is not captured: the conversion of, for example, palmitoyl-CoA into octanoyl-CoA is described in a single step instead of 16 steps (four cycles of four reactions each).
Also, the four steps of a single cycle are not described for any of the fatty acids. This decision is indicated in the comment field, but the intermediate steps cannot be retrieved. Describing each step is, however, important to be able to simulate enzyme deficiencies that lead to the abnormal build-up of intermediate products of a cycle of beta oxidation [35–38]. In EHMN and KEGG, every step is described for fatty acids with a chain of length 16 or shorter, but not for those with a longer chain length. A disadvantage of describing every step is that, in contrast to the generic approach of HumanCyc, each of the highly similar steps needs to be specified separately for the whole range of possible fatty acids. Moreover, the intermediate steps are not of interest for all applications. A mechanism that allows database users to switch between a high-level and a more detailed representation would be preferred.

Figure 4:

Fatty acid beta oxidation in HumanCyc. Part of the ‘fatty acid β-oxidation I’ pathway in HumanCyc, only the reactions of the cycle itself are shown.

Table 8:

Representational challenges fatty acid beta oxidation

                                                                                    EHMN  Recon 1  HumanCyc  KEGG  Reactome
Repetitive process
    The four steps of a cycle                                                       ✓     ✗        ✓         ✓     ✓
    Cycle described multiple times                                                  ✓     ✗        ✗         ✓     ✓
    Degradation described for the complete range of fatty acids                     ✗     ✓        –(a)      ✗     ✗
    Every step of the complete degradation of fatty acids included in the database  ✓     ✗        –(b)      ✓     –(c)
Mitochondrial versus peroxisomal pathway
    Differences                                                                     ✓     ✓        ✗         ✗     ✓
    Similarities                                                                    ✗     ✗        ✓         ✓     ✗
Chain length specificity of enzymes                                                 ✓     ✗        ✗         ✓     –(c)

Check mark, concept is represented; Cross, concept is not represented; Bar (–), concept is not represented in all cases. (a) Not all possible fatty acids are specified as instances for each of the generic metabolites. (b) By using generic metabolites to represent one cycle all possibilities are captured, but not all possible specific instances can be deduced (Figure 4). (c) For the mitochondrial pathway each step of all cycles is described. For the peroxisomal pathway eight of the nine cycles are summarized in a single reaction (‘BlackBoxEvent’).

CHALLENGES

Within a database

It remains a challenge to capture every detail of the knowledge on the human metabolic network, both for relatively straightforward processes like gluconeogenesis (Figure 1) and for the more complex processes illustrated by the three case studies. On the other hand, the question of which level of detail is required to perform a wide range of possible computational analyses does not always have a clear-cut answer. For example, to represent a (de)polymerization process it is not always possible, nor always strictly necessary, to specify each step. The degradation of glycogen, for instance, consists of thousands of steps [39], but not every intermediate product may be of interest. Furthermore, it is also important for users to have some degree of abstraction, such as the ontologies provided by some databases, to see how everything fits into the bigger picture.

Unstructured text fields in the databases, which cannot be easily used in computational analyses, frequently contain more detailed information, such as the tissue specificity of an enzyme. As argued by Khatri et al. [40], tissue- and cell-specific information is essential to improve the accuracy and relevance of pathway analyses. In the Biological Connection Markup Language format [41] that was recently proposed, this information can be stored, but this format has not yet been adopted by the major databases. Ultimately, there are even more factors to consider to accurately capture the complete (human) physiology in a digital format. This includes the inherent dynamics of metabolic processes and the multiple levels at which these processes are being controlled. The complexity of metabolism is further increased by its multi-scale nature ranging from cellular compartments and cells to organs. These issues are not (yet) addressed by the pathway databases selected in this review. However, several large-scale projects have taken up this challenge. For example, the goal of the Virtual Liver project (http://www.virtual-liver.de) [42] is to construct a multi-scale representation of liver physiology.

The ability to indicate that a piece of knowledge is missing is another desirable characteristic of a pathway database. This is, however, not yet done in each database. A further extension of the pathway databases would be to provide not only affirmative evidence, as done by HumanCyc and Recon 1, but also ‘negative evidence’, such as a statement that a reaction cannot take place in humans. This would enable users to distinguish such cases from knowledge gaps. This information is highly valuable, especially given that the (human) metabolic network is not yet complete. Note that although we focused on the human network, most observations also hold for the other organisms the five databases describe (when applicable).

Across databases

Efforts to reconcile different descriptions of the metabolic network for a specific species are hampered by the representation differences discussed in this article [43–46]. Similar problems arise in analyses that require the metabolic network of multiple organisms from different databases, e.g. to study evolution [2]. For these purposes, it is essential to be aware of which differences could be caused by a difference in representation rather than a true difference in opinion on the underlying biology. One example is the number of steps in which a process is described (see ‘Case Study’ section and Supplementary Text S2). Furthermore, it is important to realize that even the smallest difference in terminology or in the definition of a concept needs to be accounted for.

To simplify the exchange of knowledge between databases, several standards have been proposed, such as SBML and BioPAX, each with its own advantages [47]. However, these do not resolve all the representational differences that we discussed. HumanCyc and Reactome provide their network in the BioPAX (Level 3) format [48]. Each, however, followed its own representation when converting its data into this format. For example, in Reactome a reversible reaction is stored separately in both directions in its own data model and also in its BioPAX file. In HumanCyc’s data model, a reversible reaction is stored only once and its direction is indicated as ‘reversible’. The same is done in its BioPAX file. Semantic standards like BioPAX also cannot enforce the level of detail in which a process needs to be described by a curator or how a pathway is defined. Curators will, therefore, need to adhere to strict guidelines to make the data in these exchange formats more easily comparable. Alternatively, rules could be formulated to translate one representation into the other. Based on Figure 3, for example, one could develop more precise rules to translate the different ways of representing the relations between gene products. The results of our comparison and the accompanying overviews provide useful insights into the road ahead to further simplify integration of the knowledge contained in pathway databases. Integration will also enable the construction of a more accurate in silico representation of metabolic networks. Endeavors in this direction have already been undertaken for multiple organisms [43, 45], including human (I. Thiele et al., submitted for publication).
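
As an example of a translation rule of the kind suggested above, the following Python sketch converts a representation in which a reversible reaction is stored once per direction into one in which it is stored once with an explicit direction label. The data structures, the function name `merge_directions` and the label ‘left-to-right’ are assumptions made for illustration; the sketch does not reflect the actual export code of either database.

```python
def merge_directions(directed):
    """Merge pairs of opposite reactions into single reversible reactions.

    `directed` is a list of (substrates, products) pairs, each a frozenset
    of metabolite names. Returns (substrates, products, direction) triples.
    """
    present = set(directed)
    handled = set()
    merged = []
    for substrates, products in directed:
        if (substrates, products) in handled:
            continue  # already covered, e.g. as the reverse of an earlier entry
        if (products, substrates) in present:
            merged.append((substrates, products, "reversible"))
            handled.add((products, substrates))
        else:
            merged.append((substrates, products, "left-to-right"))
        handled.add((substrates, products))
    return merged


# Placeholder metabolites A to D, for illustration only.
a_to_b = (frozenset({"A"}), frozenset({"B"}))
b_to_a = (frozenset({"B"}), frozenset({"A"}))
c_to_d = (frozenset({"C"}), frozenset({"D"}))

for substrates, products, direction in merge_directions([a_to_b, b_to_a, c_to_d]):
    print(sorted(substrates), sorted(products), direction)
```

Such a rule only works if both sources identify metabolites in the same way, which underlines that differences in terminology and identifiers need to be resolved before representations can be compared.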

DISCUSSION

The five pathway databases have each made different decisions on how and in what detail to represent the metabolic network. Of these five databases, EHMN provides the least detailed information and HumanCyc the most (Table 2). At the same time, not every aspect of the data model of HumanCyc is used yet. Filling in every detail will likely require a lot of time and effort from the curators. Moreover, the lack of knowledge on human metabolism may currently preclude the use of every feature of HumanCyc’s data model. Which aspects of the metabolic network are important, and in what detail they need to be represented in a structured format, also depends on the application at hand. The overviews given in this article provide insight into the differences between the databases and can help to make a well-informed decision on which database to use. The different choices the database developers have made each have their own advantages and disadvantages. For example, to perform in silico simulations and predict metabolic phenotypes, Recon 1 may be preferred. This network is fully compartmentalized and fully mass and charge balanced. Also, the relation between gene products is provided, which is important for simulating the effect of, for example, gene defects. On the other hand, for pathway enrichment analyses [3] each of the five databases may suffice and the determining factor is the pathway definition used. Factors other than the representation of the network will also play a role in selecting the database that best fits the application, such as the coverage of the metabolic network [20]; e.g. EHMN contains a more extensive description of lipid metabolism than other databases. Finally, retrieving and capturing every detail of the (human) metabolic network in a digital format is a huge challenge and will require a joint effort of a broad scientific community.

SUPPLEMENTARY DATA

Supplementary data are available online at http://bib.oxfordjournals.org/.

KEY POINTS

  • It remains a challenge to represent the biology of the metabolic network in full detail such that it: (i) accurately reflects the current knowledge and (ii) can be used for computational analyses.

  • Metabolic pathway databases differ in what knowledge they represent, how it is represented and in what detail.

  • It depends on the application at hand which aspects of the metabolic network are important and in what detail they need to be represented in a structured format.

  • Differences between the representations of the metabolic network are not easily solved by semantic standards.

ACKNOWLEDGEMENTS

We would like to thank the reviewers for their helpful comments and suggestions for improving the presentation and comprehensibility of the article.

FUNDING

This research was carried out within the BioRange programme (project SP1.2.4) of The Netherlands Bioinformatics Centre (NBIC; http://www.nbic.nl), supported by a BSIK grant through The Netherlands Genomics Initiative (NGI) and within the research programme of the Netherlands Consortium for Systems Biology (NCSB), which is part of the NGI/Netherlands Organization for Scientific Research.

References

1. Jerby L, Shlomi T, Ruppin E. Computational reconstruction of tissue-specific metabolic models: application to human liver metabolism. Mol Syst Biol 2010;6:401.
2. Tanaka T, Ikeo K, Gojobori T. Evolution of metabolic networks by gain and loss of enzymatic reaction in eukaryotes. Gene 2006;365:88-94.
3. Antonov AV, Dietmann S, Mewes HW. KEGG spider: interpretation of genomics data in the context of the global gene metabolic network. Genome Biol 2008;9:R179.
4. Karp PD, Caspi R. A survey of metabolic databases emphasizing the MetaCyc family. Arch Toxicol 2011;85:1015-33.
5. Oberhardt MA, Palsson, Papin JA. Applications of genome-scale metabolic reconstructions. Mol Syst Biol 2009;5:320.
6. Karp PD, Mavrovouniotis ML. Representing, analyzing, and synthesizing biochemical pathways. IEEE Expert Intell Syst Appl 1994;9:11-21.
7. Berg JM, Tymoczko JL, Stryer L. Glycolysis and gluconeogenesis. In: Biochemistry. 7th edn. New York: W.H. Freeman and Company, 2012, 469-513.
8. Green ML, Karp PD. The outcomes of pathway database computations depend on pathway ontology. Nucleic Acids Res 2006;34:3687-97.
9. Duarte NC, Becker SA, Jamshidi N, et al. Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci USA 2007;104:1777-82.
10. Schellenberger J, Park JO, Conrad TM, et al. BiGG: a Biochemical Genetic and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 2010;11:213.
11. Romero P, Wagg J, Green ML, et al. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol 2004;6:R2.
12. Karp PD, Ouzounis CA, Moore-Kochlacs C, et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 2005;33:6083-9.
13. Hao T, Ma HW, Zhao XM, et al. Compartmentalization of the Edinburgh human metabolic network. BMC Bioinformatics 2010;11:393.
14. Kanehisa M, Goto S, Sato Y, et al. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 2012;40:D109-14.
15. Croft D, O’Kelly G, Wu G, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 2011;39:D691-7.
16. Latendresse M, Krummenacker M, Trupp M, et al. Construction and completion of flux balance models from pathway databases. Bioinformatics 2012;28:388-96.
17. Orth JD, Thiele I, Palsson. What is flux balance analysis? Nat Biotech 2010;28:245-8.
18. Gille C, Bölling C, Hoppe A, et al. HepatoNet1: a comprehensive metabolic reconstruction of the human hepatocyte for the analysis of liver physiology. Mol Syst Biol 2010;6:411.
19. Zhang P, Foerster H, Tissier CP, et al. MetaCyc and AraCyc. metabolic pathway databases for plant research. Plant Physiol 2005;138:27-37.
20. Stobbe MD, Houten SM, Jansen GA, et al. Critical assessment of human metabolic pathway databases: a stepping stone for future integration. BMC Syst Biol 2011;5:165.
22. Ma H, Sorokin A, Mazein A, et al. The Edinburgh Human Metabolic Network reconstruction and its functional analysis. Mol Syst Biol 2007;3:135.
23. Lacroix V, Cottret L, Thebault P, et al. An introduction to metabolic networks and their structural analysis. IEEE/ACM Trans Comput Biol Bioinform 2008;5:594-617.
24. Karp PD, Paley SM. Representations of metabolic knowledge: pathways. Proc Int Conf Intell Syst Mol Biol 1994;2:203-11.
25. Huss M, Holme P. Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks. IET Syst Biol 2007;1:280-5.
26. de Matos P, Alcántara R, Dekker A, et al. Chemical entities of biological interest: an update. Nucleic Acids Res 2010;38:D249-54.
27. BioCyc. Curator’s Guide for Pathway/Genome Databases.
28. Karp PD, Riley M. Representations of metabolic knowledge. Proc Int Conf Intell Syst Mol Biol 1993:207-15.
29. Karp PD, Paley S, Krieger CJ, Zhang P. An evidence ontology for use in pathway/genome databases. Pacific Symposium on Biocomputing 2004;9:190-201.
30. Thiele I, Palsson. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protocols 2010;5:93-121.
31. Berg JM, Tymoczko JL, Stryer L. Fatty acid metabolism. In: Biochemistry. 7th edn. New York: W.H. Freeman and Company, 2012, 663-96.
32. Berg JM, Tymoczko JL, Stryer L. Oxidative phosphorylation. In: Biochemistry. 7th edn. New York: W.H. Freeman and Company, 2012, 543-84.
33. Berg JM, Tymoczko JL, Stryer L. The citric acid cycle. In: Biochemistry. 7th edn. New York: W.H. Freeman and Company, 2012, 515-42.
34. Cornell MJ, Alam I, Soanes DM, et al. Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi. Genome Res 2007;17:1809-22.
35. Das AM, Illsinger S, Lücke T, et al. Isolated mitochondrial long-chain ketoacyl-CoA thiolase deficiency resulting from mutations in the HADHB gene. Clin Chem 2006;52:530-4.
36. Wanders RJA, Ijlst L, Poggi F, et al. Human trifunctional protein deficiency: A new disorder of mitochondrial fatty acid β-oxidation. Biochem Biophys Res Commun 1992;188:1139-45.
37. Wanders RJA, Ijlst L, van Gennip AH, et al. Long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency: identification of a new inborn error of mitochondrial fatty acid β-oxidation. J Inherit Metab Dis 1990;13:311-4.
38. Molven A, Matre GE, Duran M, et al. Familial hyperinsulinemic hypoglycemia caused by a defect in the SCHAD enzyme of mitochondrial fatty acid oxidation. Diabetes 2004;53:221-7.
39. Berg JM, Tymoczko JL, Stryer L. Glycogen metabolism. In: Biochemistry. 7th edn. New York: W.H. Freeman and Company, 2012, 637-61.
40. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 2012;8:e1002375.
41. Beltrame L, Calura E, Popovici RR, et al. The Biological Connection Markup Language: a SBGN-compliant format for visualization, filtering and analysis of biological pathways. Bioinformatics 2011;27:2127-33.
42. Holzhütter H-G, Drasdo D, Preusser T, et al. The virtual liver: a multidisciplinary, multilevel challenge for systems biology. WIREs Syst Biol Med 2012;4:221-35.
43. Herrgård MJ, Swainston N, Dobson P, et al. A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat Biotechnol 2008;26:1155-60.
44. Radrich K, Tsuruoka Y, Dobson P, et al. Integration of metabolic databases for the reconstruction of genome-scale metabolic networks. BMC Syst Biol 2010;4:114.
45. Thiele I, Hyduke DR, Steeb B, et al. A community effort towards a knowledge-base and mathematical model of the human pathogen Salmonella Typhimurium LT2. BMC Syst Biol 2011;5:8.
46. Chindelevitch L, Stanley S, Hung D, et al. MetaMerge: scaling up genome-scale metabolic reconstructions, with application to Mycobacterium tuberculosis. Genome Biol 2012;13:R6.
47. Strömbäck L, Lambrix P. Representations of molecular pathways: an evaluation of SBML, PSI MI and BioPAX. Bioinformatics 2005;21:4401-7.
48. Demir E, Cary MP, Paley S, et al. The BioPAX community standard for pathway data sharing. Nat Biotechnol 2010;28:935-42.
