-
PDF
- Split View
-
Views
-
Cite
Cite
S. A. Rahman, P. Advani, R. Schunk, R. Schrader, Dietmar Schomburg, Metabolic pathway analysis web service (Pathway Hunter Tool at CUBIC), Bioinformatics, Volume 21, Issue 7, 1 April 2005, Pages 1189–1193, https://doi.org/10.1093/bioinformatics/bti116
Close -
Share
Abstract
Motivation: Pathway Hunter Tool (PHT), is a fast, robust and user-friendly tool to analyse the shortest paths in metabolic pathways. The user can perform shortest path analysis for one or more organisms or can build virtual organisms (networks) using enzymes. Using PHT, the user can also calculate the average shortest path (Jungnickel, 2002 Graphs, Network and Algorithm. Springer-Verlag, Berlin), average alternate path and the top 10 hubs in the metabolic network. The comparative study of metabolic connectivity and observing the cross talk between metabolic pathways among various sequenced genomes is possible.
Results: A new algorithm for finding the biochemically valid connectivity between metabolites in a metabolic network was developed and implemented. A predefined manual assignment of side metabolites (like ATP, ADP, water, CO2 etc.) and main metabolites is not necessary as the new concept uses chemical structure information (global and local similarity) between metabolites for identification of the shortest path.
Availability: PHT is accessible at http://www.pht.uni-koeln.de
Contact:d.schomburg@uni-koeln.de
INTRODUCTION
With the advent of the ‘omics’ era, more and more system-based approaches to biological functions are being developed. Metabolome analysis and metabolomics are gaining significance, as they help us to understand the complexity of the underlying cellular networks in organisms. The completion of a large number of genomes has made the comparative study of genomes possible at different levels. One way to gain a better understanding of the sequenced genomes is to analyse the underlying metabolic network and its topology in different genomes. Several databases provide information about metabolic pathways. We have used KEGG (Kanehisa et al., 2004) as the basic database for our analysis apart from BRENDA (Schomburg et al., 2004) and PROSITE (Hulo et al., 2004). A global view of the connectivity in metabolic pathways, and the contribution and usage of certain metabolites in these pathways is highly instructive. Shortest path analysis (Arita, 2004) is one of the most comprehensive methods to analyse a graph (Metabolic Pathways) at different levels (Reactions) in terms of local and global connectivity between metabolites. With the Pathway Hunter Tool (PHT) it is also possible to calculate statistical information (Barabasi and Oltvai, 2004) from the topology arising from the interacting molecules in order to capture the nature of connectivity.
METHODS
Whereas a number of well-established methods exist for the analysis of shortest paths in graphs, the situation in metabolic networks is a little more complicated. In the example using reactions given in Figure 1, a shortest path algorithm for metabolic pathways is required to follow the path of the thick lines with the result that there exists a path between phosphate and ATP via glucose-6-phosphate (thick line). However, there is no way to produce fructose from phosphate to fructose (thin line). In the third reaction of the scheme, the algorithm has to decide in which direction to go, depending on the starting point being either glucose or phosphate.
Therefore it is important to connect two metabolites in a reaction with respect to their structural similarity. We have used the fingerprint algorithm from the Chemistry Development Kit (CDK) (Steinbeck et al., 2003) to convert the 2-dimensional chemical structure information to a 1-dimesional binary stream as a fingerprint for faster similarity search (Whittle et al., 2003). Using the fingerprints, the similarity between two molecules was calculated using a normalized scoring function obtained by combination of the atomic mass value of the metabolites and the Tanimoto algorithm (Xue et al., 2003). This allows avoidance of the false connectivity in the metabolic pathway thus made the path search algorithm more robust.
In order to calculate the shortest path between two metabolites, the breadth-first-search (BFS) algorithm (Jungnickel, 2002) is used in PHT. Higher-order horn logic (HOHL) (Nadathur and Miller, 1990) has been used to satisfy the constraints. Our new algorithm automatically discriminates between side metabolites (like ATP, ADP, water, CO2, etc.) and main metabolites while finding the shortest path without the need to predefine those. Predefined exclusion of small metabolites in the metabolic pathway may lead to broken links in the network or longer connectivity. This implies that at each reaction step the algorithm should be able to decide which metabolite to choose for further connectivity in the pathway and which to skip.
ALGORITHM
In this section the new algorithm used in PHT to find the shortest path in the biochemical network is described.
1 Definition of the metabolite mapping scoring function
2 Local mapping metabolites in reactions
The derived scoring function was used to find a suitable mapping between substrate molecules and product molecules. We use a slightly modified form of game theory (http://www.gametheory.net/) in order to map the substrate to the product metabolite. The method consists of construction of a matrix with substrates as rows and products as columns with the score defined above as matrix elements. The score between any substrate or product whose extension is smaller than three bonds is set to zero. A substrate is mapped to a product when the score dominates all other scores in the present row or column respectively. By this procedure we keep track of the maximum structural similarity between two interacting metabolites. Figure 2 illustrates the outcome of our mapping procedure when applied to a reaction.
3 Shortest path between two metabolites
For the calculation of the shortest paths, the two biochemical criteria ‘local’ and ‘global’ structural similarity are used, where ‘local similarity’ is defined as the similarity between two intermediate molecules and ‘global similarity’ is defined as the amount of conserved structure found between the source metabolite and the destination metabolites after a series of reaction steps (Fig. 3).
The only potential drawback of this method is that not all metabolites in the metabolite databases have structures (e.g. macromolecules like proteins or nucleic acids, or generic molecules like ‘an alcohol’). In these cases the user may miss some connectivity due to lack of structural information. It is possible to cross-check this result by switching off the ‘Atom Mapper’ (local similarity) and ‘Atom Tracer’ (global similarity) options thereby performing the search on the ligand-number-based mapping obtained from the KEGG reaction database. On the other hand, the power and biochemical relevance of having local similarity and global similarity is very high. In the future we plan to provide non-standard structural information for these metabolites in order to allow the inclusion of such reactions.
Complexity of the algorithm
The shortest path between source and destination metabolite is the minimum number of reaction steps between them (Fig. 4). We consider the metabolic pathway in our system as a directed graph with all edges (reactions) sharing the same cost (here 1). Hence this does not lead us to an NP-complete problem as one can calculate the k-shortest path between two metabolites using the BFS (breadth first search) algorithm. HOHL (Nadathur and Miller, 1990) has been used to satisfy the constraints (similarity) with the BFS algorithm in order to calculate k-shortest paths between two metabolites (source and destination). This means that the runtime of the tool depends on the metabolites and reactions present in an organism. We are able to generate all possible k-shortest paths between two metabolites under given criteria of global and local similarity.
Program options
Presently PHT has four options.
Find k-shortest path to convert one metabolite into another in a given network (organism-specific or general metabolic network).
Find k-shortest paths from a substrate metabolite to all feasible metabolites in a given network (organism-specific or general).
Find k-shortest path to a product metabolite from all feasible substrate metabolites in a given network (organism-specific or general).
Statistical analysis of the metabolic pathways like average path length, diameter of the network, average node connectivity, loose ends in the network, hubs in a given network (organism-specific or general).
User defined constraints
There are sets of user-defined constraints, which can be used for an in-depth network analysis without affecting the biochemical/biological relevance.
While traversing through the metabolic pathway it is possible to set the similarity measure score (Atom Mapper) between interacting molecules and to define the amount of structure change with respect to this reference molecule at each reaction step (Atom Tracer).
By setting the Minimum path length and Maximum path length, the path between two metabolites in the network can be altered. For example, if the minimum path length is set to 6, then the algorithm will drop paths below it and report the next possible shortest path above or equal to 6, which is the shortest possible path under the given constraint.
It is possible to choose via Metabolite, not via Metabolites and not via Enzymes options for use of a particular set of pathways.
Under Build Virtual Organism it is possible to add one's own set of enzymes and perform further analysis. This is very useful for identification of the missing links in the network.
RESULTS
We performed a shortest path analysis (Fig. 4) in Escherichia coli K-12 between beta-d-glucose and pyruvate, which was found to be nine steps long. We considered global similarity and local similarity while traversing the path. The algorithm automatically identifies the correct connectivity between the metabolites at each reaction step.
We also performed a comparative study between the KEGG reaction reference map, Corynebacterium glutamicum, E.coli K-12 and Mycobacterium tuberculosis (Fig. 5). We were interested in finding the shortest path between ‘d-erythrose 4-phosphate’ and ‘chorismate’, which was found to be in seven reaction steps in all these cases. Looking closely at Figure 5, it is clear that different pathways are possible to convert ‘d-erythrose 4-phosphate’ to ‘chorismate’ in the reference map, or in C.glutamicum, E.coli K-12 and M.tuberculosis (score 4 on the edge). Some organisms may use enzyme ‘1.1.99.25’ (blue colour) to perform the same conversion (score 1 on the edge).
OUTPUT FORMAT
PHT generates three kinds of output:
A Text-based output can be viewed immediately in the browsers and is supplied with hyperlinks to other database, like BRENDA, KEGG and PROSITE.
A Graphical view of the output is generated for ‘Metabolic Pathways’ and ‘Enzyme’ connectivity as graph modeling language (GML) (http://www.infosun.fmi.uni-passau.de/Graphlet/GML/) files. These portable files can be saved on the client's system and can be viewed later in any dynamic layout software that can read the GML format (e.g. the yEd (http://www.yworks.com/products/yed/ graphical editor).
An ‘enzyme–enzyme’ connectivity matrix, which can be used for pathway alignment and other studies. The ‘reaction-organism matrix’ highlights the presence of reactions in organisms by binary 1 and 0 for absence.
This figure exemplifies the problem of finding a valid shortest path in the biochemical network.
This figure exemplifies the problem of finding a valid shortest path in the biochemical network.
Metabolite mapping obtained from our new algorithm shows that ATP maps to ADP (dashed line) and d-glucose maps to d-glucose-6phosphate (dotted line).
Metabolite mapping obtained from our new algorithm shows that ATP maps to ADP (dashed line) and d-glucose maps to d-glucose-6phosphate (dotted line).
Shortest path between metabolities β-d-glucose to d-xylulose 5-phosphate is in three steps and only 45% of the structural is common between them globally.
Shortest path between metabolities β-d-glucose to d-xylulose 5-phosphate is in three steps and only 45% of the structural is common between them globally.
The shortest path between metabolites β-d-glucose and pyruvate in E.coli K-12 is nine reaction steps long.
The shortest path between metabolites β-d-glucose and pyruvate in E.coli K-12 is nine reaction steps long.
Enzyme–enzyme connectivity map highlights the shortest path (seven reaction steps) between ‘d-erythrose 4-phosphate’ and ‘chorismate’ in the KEGG reference map and C.glutamicum, E.coli K-12 and M.tuberculosis. The weights given at the connections reflect the number of occurrences of this step in the queried pathways. 1.1.99.25 is found only in the reference map (originating from Acinetobacter calcoaceticus).
Enzyme–enzyme connectivity map highlights the shortest path (seven reaction steps) between ‘d-erythrose 4-phosphate’ and ‘chorismate’ in the KEGG reference map and C.glutamicum, E.coli K-12 and M.tuberculosis. The weights given at the connections reflect the number of occurrences of this step in the queried pathways. 1.1.99.25 is found only in the reference map (originating from Acinetobacter calcoaceticus).
The authors are grateful for financial support by the German Federal Ministry for Education and Research (BMBF).
REFERENCES
Arita, M.
Barabasi, A.L. and Oltvai, Z.N.
Hulo, N., Sigrist, C.J., Le Saux, V., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., De Castro, E., Bucher, P., Bairoch, A.
Jech, T.J. and Jech, T.
Jungnickel, D.
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M.
Nadathur, G. and Miller, D.
Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D.
Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., Willighagen, E.
Whittle, M., Willett, P., Klaffke, W., van Noort, P.
Willet, P., Barnard, J.M., Downs, G.M.
Xue, L., Godden, J.W., Stahura, F.L., Bajorath, J.





