The Virtual Metabolic Human database: integrating human and gut microbiome metabolism with nutrition and disease

Abstract A multitude of factors contribute to complex diseases and can be measured with ‘omics’ methods. Databases facilitate data interpretation for underlying mechanisms. Here, we describe the Virtual Metabolic Human (VMH, www.vmh.life) database encapsulating current knowledge of human metabolism within five interlinked resources ‘Human metabolism’, ‘Gut microbiome’, ‘Disease’, ‘Nutrition’, and ‘ReconMaps’. The VMH captures 5180 unique metabolites, 17 730 unique reactions, 3695 human genes, 255 Mendelian diseases, 818 microbes, 632 685 microbial genes and 8790 food items. The VMH’s unique features are (i) the hosting of the metabolic reconstructions of human and gut microbes amenable for metabolic modeling; (ii) seven human metabolic maps for data visualization; (iii) a nutrition designer; (iv) a user-friendly webpage and application-programming interface to access its content; (v) user feedback option for community engagement and (vi) the connection of its entities to 57 other web resources. The VMH represents a novel, interdisciplinary database for data interpretation and hypothesis generation to the biomedical community.


INTRODUCTION
Metabolism plays a crucial role in human health and disease, and it is modulated by intrinsic (e.g. genetic) and extrinsic (e.g. diet and gut microbiota) factors. When considered individually, these factors do not sufficiently explain the development and progression of many complex noncommunicable diseases, including metabolic syndrome and neurodegenerative diseases. Hence, a systems approach is necessary to elucidate the contribution of each of these factors and to enable the development of efficient, novel treatment strategies.
Such a systems approach requires the easy sharing of knowledge and experimental data generated by different research communities. Databases represent a compelling method of storing, connecting, and making available a vast variety of information derived from primary literature, experimental data, and genome annotations. In fact, biological databases have become valuable tools for facilitating knowledge distribution and enabling research endeavors.
Figure 1. The VMH database is divided into two interfaces, and its database contains five distinct but connected resources. Users can interact with the database using the two available interfaces: (i) a user-friendly web interface and (ii) an application-programming interface that allows programmatic access to the information contained in the database. At the core of the database is the representation of reconstructions as sets of reactions. The database connects the five resources through shared nomenclature: (i) the 'Human metabolism' and 'Gut microbiome' resources share metabolites and reactions, (ii) the nutrients in the 'Nutrition' resource are mapped to metabolites that can be shared by the human and gut microbes and (iii) the diseases in the 'Disease' resource include affected genes and metabolite biomarkers present in the 'Human metabolism' resource. Finally, the 'ReconMaps' resource is connected to the 'Human metabolism' resource via metabolites and reactions.
There is a wealth of biochemical databases (1); however, a database that explicitly connects human metabolism with genetics, human-associated microbial metabolism, nutrition, and diseases has not yet been developed. One reason for the lack of such a database may be the use of nonstandardized nomenclature, which complicates data integration. Moreover, manual curation of database content is time-consuming and requires expert domain knowledge.
Genome-scale metabolic reconstructions represent the full repertoire of known metabolism occurring in a given organism and describe the underlying network of genes, proteins and biochemical reactions (2). High-quality reconstructions go through an intensive manual curation process that follows established protocols to ensure high standards and coverage of the information available on the organism (3). Thus, metabolic reconstructions are valuable knowledge bases that summarize current information on metabolism within organisms. Genome-scale metabolic reconstructions have been generated for representatives of all domains of life, including humans (4) and gut microbes (5–8). Importantly, these metabolic reconstructions can be converted into computational models using condition-specific information, e.g. transcriptomic (9) or metabolomic data (10,11). Open-access, community-developed toolboxes, such as the Constraint-Based Reconstruction and Analysis (COBRA) Toolbox (10), facilitate simulations with metabolic models that permit us to address a variety of biomedical and biotechnological questions in silico (12,13).
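As a minimal sketch of what turning a reconstruction into a computational model involves in constraint-based modeling, the toy example below encodes three hypothetical reactions as a stoichiometric matrix and checks whether a flux vector satisfies the steady-state condition S·v = 0 within flux bounds. The metabolite names, reaction names, bounds and flux values are illustrative assumptions rather than VMH content, and the code does not use the COBRA Toolbox.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network (not taken from the VMH):
#   EX_A:   -> A   (uptake of A)
#   R1:   A -> B   (conversion of A to B)
#   EX_B: B ->     (secretion of B)
METABOLITES: List[str] = ["A", "B"]
REACTIONS: List[str] = ["EX_A", "R1", "EX_B"]

# Stoichiometric matrix S: one row per metabolite, one column per reaction.
S: List[List[float]] = [
    [1.0, -1.0, 0.0],  # A: produced by EX_A, consumed by R1
    [0.0, 1.0, -1.0],  # B: produced by R1, consumed by EX_B
]

# Illustrative (lower, upper) flux bounds per reaction.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "EX_A": (0.0, 10.0),
    "R1": (0.0, 1000.0),
    "EX_B": (0.0, 1000.0),
}


def is_feasible(fluxes: List[float], tol: float = 1e-9) -> bool:
    """Return True if the fluxes satisfy S*v = 0 and respect all flux bounds."""
    # Steady state: net production of every metabolite must be zero.
    for row in S:
        if abs(sum(coef * v for coef, v in zip(row, fluxes))) > tol:
            return False
    # Each flux must lie within its bounds.
    for rxn, v in zip(REACTIONS, fluxes):
        lower, upper = BOUNDS[rxn]
        if not (lower - tol <= v <= upper + tol):
            return False
    return True
```

In this toy network, a flux vector such as [5, 5, 5] is feasible, whereas [5, 4, 5] leaves metabolite A accumulating and [20, 20, 20] exceeds the assumed uptake bound.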
Here, we describe the Virtual Metabolic Human (VMH, https://vmh.life) database, which consists of five interconnected resources: 'Human metabolism', 'Gut microbiome', 'Disease', 'Nutrition' and 'ReconMaps'. These resources are interlinked based on shared nomenclature and database entries for metabolites, reactions and genes (Figure 1). Given the extensively curated, diverse information captured in the VMH database, this resource represents a unique, comprehensive and multi-faceted overview of human and human-associated microbial metabolism.
Figure 2. Overview of the VMH functionalities. Users can search all resources using the Quick Search bar (1), or specific resources through the 'Browse' button (2) or the resource panels available on the main page (3). At any point in time, it is possible to provide feedback or report issues with the VMH through the feedback button (4). If a user performs a quick search (e.g. 'h2o'), different result grids will be available. Each type of entity will be displayed in its corresponding grid (5). Each detail page (6) contains additional information and connections with other resources (both internal and external; 7). For instance, by selecting 'Associated human reactions' a user can then navigate to a reaction detail page (8) and from there to other associated entities, such as human genes (9). The VMH also allows the visualization of metabolic pathways through the 'ReconMaps' resource (10). Users can search for a metabolite using the side bar of the map interface (11) and get results as locations in the map panel (12). It is also possible to search for specific reactions, making it easier to investigate specific pathways of interest (13), and to upload simulation or experimental data (14) through the interface or the COBRA Toolbox (10). With the nutrition resource, the VMH offers the ability to design in silico diets that can be used to perform simulations. In this interface, users can search foods from the 'Available foods' panel (15) and add them to the 'Selected foods' panel by specifying the portion size in grams (16). During this process, the top of the 'Selected foods' panel will automatically update information about the diet (17). When this process is completed, the user can download the flux values to integrate into their experiments (18).

DATABASE DESCRIPTION
The VMH database contains 17 730 unique reactions, 5180 unique metabolites, 3695 human genes, and 632 685 microbial genes as well as 255 diseases, 818 microbes and 8790 food items. Unique features of the VMH database include (i) metabolic reconstructions of human and gut microbes that can be used as a starting point for simulations; (ii) seven comprehensive maps of human metabolism that permit a visualization of omics data and simulation results; (iii) a nutrition designer that allows researchers to design personal dietary plans for computational simulations; (iv) a user-friendly web interface for browsing, querying and downloading the VMH database content; (v) a well-documented representational state transfer application-programming interface (API) for easy access to the database content; and (vi) user feedback integration through the feedback button accessible on all pages of the website and the ReconMaps interface, which allows users to leave comments on specific reactions and metabolites (Figure 2). Great emphasis has been placed on collecting a comprehensive set of database-dependent and database-independent identifiers to allow for the identification of each entry of the different resources. Additionally, we cross-reference the entries to more than 30 external resources (Table 1), thereby facilitating the access to further metabolic, genetic, clinical, nutritional and toxicological information.

'Human metabolism' resource
The VMH database hosts the most recent version of the human metabolic network reconstruction, Recon3D (4), which describes the underlying network of 13 543 metabolic reactions distributed across 104 subsystems, 4140 unique metabolites and 3288 genes expressed in at least one human cell. The content of Recon3D has been assembled through an extensive literature review over the past decade by the systems biology community (4,14–16). Individual pages are dedicated to each reaction, metabolite and gene. These pages contain information on literature-based evidence as well as the relations of each entity with other entities in the VMH database (Figures 1 and 2). Novel features of Recon3D include molecular structures and atom mappings (4,17), which are visualized on the metabolite and reaction pages, respectively, in addition to thermodynamic information (18).

'ReconMaps' resource
The ReconMaps resource consists of seven human metabolic maps drawn manually using CellDesigner (19) and hosted within the web service Molecular Interaction NEtwoRks VisuAlization (MINERVA) (20). Six of these maps correspond to the cellular organelles found in human cells: the mitochondrion, nucleus, Golgi apparatus, endoplasmic reticulum, lysosome and peroxisome. On the organelle level, reactions and pathways are drawn based on the defined subsystems, thus allowing the user to perform focused analyses of metabolism occurring in a particular cellular compartment. The seventh map, which is named ReconMap3, accounts for all six organelle maps plus the human metabolic reactions occurring in the cytosol and the extracellular space. Currently, ReconMap3 covers 8151 of the 13 543 (60%) metabolic reactions and 2763 of the 4140 metabolites (67%) captured in Recon3D.
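The coverage percentages quoted above follow directly from the stated counts. The short sketch below reproduces that arithmetic; the helper name coverage_percent is hypothetical and introduced only for illustration.

```python
def coverage_percent(covered: int, total: int) -> int:
    """Return the covered share of total as a percentage, rounded with round()."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * covered / total)


# Counts as stated in the text for ReconMap3 versus Recon3D.
reaction_coverage = coverage_percent(8151, 13543)    # reactions drawn on the map
metabolite_coverage = coverage_percent(2763, 4140)   # metabolites drawn on the map
```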
The maps support low-latency content queries and custom dataset visualizations; datasets are either provided as a text file or automatically uploaded from the COBRA Toolbox (10,21). Tutorials have been developed demonstrating the visualization of data and simulation results onto the ReconMaps (https://opencobra.github.io/cobratoolbox/stable/tutorials/tutorialRemoteVisualisation.html). Users can submit feedback through the map interfaces by right-clicking on specific elements. From each map entity, users can access the corresponding entry in the VMH database and obtain further information from external resources, such as the HMDB (22), KEGG (23) and CHEBI (24). Where possible, the VMH connects ReconMaps with the Parkinson's disease map, PDMap (25), which visualizes cellular processes known to be involved in Parkinson's disease, through the 'Biochemical and disease maps' section on the metabolite page. We have identified 168 metabolites that are shared between these maps, providing a connection between general human metabolism and Parkinson's disease-related cellular pathways. Similarly, ReconMaps have been connected to the Atlas of Cancer Signaling Network resource (ACSN) (26), which visualizes pathways known to be deregulated in cancer cells, through 252 shared proteins implicated in 22 functional modules of ACSN and in 51 subsystems of ReconMaps. Further disease maps are currently being assembled by the community (27), and we will continue to increase the connectivity of the VMH and the ReconMaps to these valuable resources. The disease map connections with ReconMaps enable data analysis and visualization beyond metabolism.

'Gut microbiome' resource
The 'Gut microbiome' resource currently contains 818 manually curated genome-scale metabolic reconstructions for microbes (5) commonly found in the human gastrointestinal tract (28) and belonging to 227 genera and 667 species. All microbial reconstructions were based on literature-derived experimental data and comparative genomics. On average, a reconstruction contains 774 genes (standard deviation 275), 1218 reactions (standard deviation 249) and 944 metabolites (standard deviation 143). We provide detailed information for each strain and reconstruction. The gene, metabolite and reaction content is available on each microbe detail page. In addition, for each microbe, we have compiled a list of metabolites that can be used as carbon sources or that are products of fermentation, including supporting references. Importantly, this resource shares metabolite and reaction nomenclature with the other resources, thus allowing for an integrative analysis of microbial metabolism with host metabolism and nutrition.
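Because reconstructions share one reaction nomenclature, comparing them reduces to set operations on reaction identifiers. The sketch below illustrates this idea under stated assumptions: the organism names and reaction abbreviations are hypothetical placeholders, not entries retrieved from the VMH, and real reconstructions contain on the order of a thousand reactions each.

```python
from typing import Dict, Set

# Hypothetical reconstructions, each represented as a set of reaction abbreviations.
reconstructions: Dict[str, Set[str]] = {
    "human": {"HEX1", "PGI", "PFK", "EX_glc_D(e)"},
    "microbe_1": {"HEX1", "PGI", "EX_glc_D(e)", "EX_ac(e)"},
    "microbe_2": {"PGI", "EX_ac(e)", "EX_but(e)"},
}


def shared_reactions(first: str, second: str) -> Set[str]:
    """Reactions present in both reconstructions (set intersection)."""
    return reconstructions[first] & reconstructions[second]


def reactions_unique_to(name: str) -> Set[str]:
    """Reactions found in one reconstruction and in none of the others."""
    others = [rxns for other, rxns in reconstructions.items() if other != name]
    return reconstructions[name] - set().union(*others)
```

With these placeholder sets, shared_reactions("human", "microbe_1") returns the three reactions common to both, and reactions_unique_to("microbe_2") isolates the one reaction that only that organism carries. The same pattern applies to metabolite sets.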

'Disease' resource
The 'Disease' resource aims to connect diseases with their metabolic features. We have, so far, focused on inborn errors of metabolism (IEMs), by linking 255 diseases (29) to the genes present in the 'Human metabolism' resource. A total of 288 unique genes and 1872 unique VMH reactions are associated with these IEMs and provide biochemical and genetic descriptions. We have compiled clinical presentations, genotype-phenotype relationships and the affected organ systems for the IEMs from the primary and review literature. Additionally, we connect each entry with up to 21 external resources, thus providing further information on the diseases, genetic testing and ongoing clinical trials. In the future, we envision expanding this resource both by adding more information on the diseases already included and by adding other diseases with metabolic components. The VMH database also hosts the Leigh Map (30), which represents a computational gene-to-phenotype diagnosis support tool for mitochondrial disorders. The Leigh Map consists of 87 genes and 234 phenotypes expressed in Human Phenotype Ontology (HPO) terms (31), which provide sufficient phenotypic and genetic variation to test the network's diagnostic capability. The Leigh Map is a first step in integrating diagnosis tools within the VMH. Further development of this resource will provide a detailed, multi-layered overview of the connection between clinical features, genetic mutations and metabolic pathways, facilitating better understanding of the underlying mechanisms of complex diseases.

'Nutrition' resource
The 'Nutrition' resource consists of two parts: (i) a food database mapped onto the metabolites present in the VMH and (ii) a diet database listing the nutritional composition of 11 pre-defined diets. The food database was built by integrating the molecular composition information for 8790 food items distributed in 25 food groups obtained from the USDA National Nutrient Database for Standard Reference (32). Of the 150 nutritional constituents, 100 could be mapped onto the metabolites present in the VMH database. Most of the remaining unmapped constituents represent general metabolite classes (e.g. fibers). The resource can be queried based on food items as well as their nutritional constituents.
The diet database contains 11 diets that were formulated based on real-life examples and literature. For instance, an 'EU diet' was designed based on information from an Austrian survey (33). The diets consist of a one-day meal plan and include information on the energy content, fatty acids, amino acids, carbohydrates, dietary fibers, vitamins, minerals and trace elements. The composition of each meal is given in appropriate portion sizes. The information for the nutritional composition of each food item and dish was obtained from the 'Österreichische Nährwerttabelle' (http://www.oenwt.at/content/naehrwert-suche/). The molecular composition of a diet can be downloaded in grams per person (70 kg) per day or as a flux rate (in millimoles per person per day), which can be directly integrated with the human metabolic model (4,29) using the COBRA Toolbox (10).
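The relationship between the two download units can be illustrated with a simple mass-to-amount conversion: dividing grams per day by the molar mass gives moles per day, and multiplying by 1000 gives millimoles per day. The sketch below shows only this arithmetic; the function name, the example intake of 50 g and the use of D-glucose are illustrative assumptions, and whether the VMH applies further scaling or a sign convention for uptake is not stated in the text.

```python
def grams_to_mmol(grams: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass in grams to an amount in millimoles."""
    if molar_mass_g_per_mol <= 0:
        raise ValueError("molar mass must be positive")
    return grams / molar_mass_g_per_mol * 1000.0


# Illustrative example: 50 g of D-glucose (C6H12O6) per person per day.
GLUCOSE_MOLAR_MASS = 180.156  # g/mol
glucose_mmol_per_day = grams_to_mmol(50.0, GLUCOSE_MOLAR_MASS)
```

Under these assumptions, 50 g of glucose per day corresponds to roughly 277.5 mmol per person per day.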

'Diet designer'
The 'Diet designer' tool allows users to design their diets (Figure 2.D). The interface is divided into two lists: 'Available foods' and 'Selected foods'. Users can search and select any of the available 8790 foods and add them to the list of selected foods by specifying a portion size. As the user designs the diet, the overall information is updated for total calories, lipids, proteins, carbohydrates and portion weight. The user can then see and download the corresponding molecular composition and flux values for the uptake rate of metabolite-mapped nutrients. These flux values can be a starting point for modeling host-microbiome interactions but do not take into consideration differences in absorption along the gastro-intestinal tract. It is also worth mentioning that not all nutrient amounts are converted to metabolite amounts due to the lack of detailed molecular composition information of the food items.
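The running totals described above can be sketched as a sum over the selected foods, each scaled by its portion size. The example assumes that composition values are stored per 100 g of food; the food names and all numbers are hypothetical placeholders rather than VMH or USDA data, and the function is an illustration, not the VMH implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical composition per 100 g of food (placeholder values).
FOODS: Dict[str, Dict[str, float]] = {
    "food_a": {"kcal": 50.0, "lipids_g": 0.2, "proteins_g": 0.3, "carbohydrates_g": 14.0},
    "food_b": {"kcal": 250.0, "lipids_g": 3.0, "proteins_g": 9.0, "carbohydrates_g": 49.0},
}


def diet_summary(selected: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum energy, macronutrients and weight over (food, portion in grams) pairs."""
    totals = {"weight_g": 0.0, "kcal": 0.0, "lipids_g": 0.0,
              "proteins_g": 0.0, "carbohydrates_g": 0.0}
    for food, grams in selected:
        if grams < 0:
            raise ValueError("portion size must be non-negative")
        totals["weight_g"] += grams
        # Scale the per-100 g composition to the chosen portion size.
        for nutrient, per_100g in FOODS[food].items():
            totals[nutrient] += per_100g * grams / 100.0
    return totals


summary = diet_summary([("food_a", 200.0), ("food_b", 50.0)])
```

For these placeholder foods, a diet of 200 g of food_a and 50 g of food_b totals 250 g and 225 kcal. Nutrient totals of this kind could then be converted to flux values for those nutrients that map onto metabolites.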

Resources connectedness
We have focused the design of the VMH on the ability to navigate all its resources seamlessly. From any detail page in the VMH, it is possible to access related entities through links in association grids (Figure 2.B). In addition, each entity of the database contains a list of links to external resources with different purposes and focus (e.g. chemistry, nutrition and clinical). We continuously verify the integrity of our links and where possible, use the resolving system Identifiers.org (34). Overall, the VMH connects to 57 different external databases (Table 1). This focus on connectedness will continuously increase the amount and the depth of knowledge that can be accessed through the VMH, thereby increasing the database's utility to the scientific community beyond the systems biology community.
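As a hedged illustration of how a resolving system keeps external links stable, the sketch below builds Identifiers.org links from a namespace prefix and an accession, assuming the compact-identifier URL form https://identifiers.org/prefix:accession. The helper name and the example accessions are illustrative, and the text does not state that the VMH constructs its links in exactly this way.

```python
from urllib.parse import quote

IDENTIFIERS_ORG = "https://identifiers.org"


def resolver_url(prefix: str, accession: str) -> str:
    """Build a resolver link of the assumed form <base>/<prefix>:<accession>."""
    if not prefix or not accession:
        raise ValueError("prefix and accession must be non-empty")
    # quote() percent-encodes characters that are unsafe in a URL path.
    return f"{IDENTIFIERS_ORG}/{quote(prefix)}:{quote(accession)}"


# Illustrative accessions for D-glucose in two external databases.
hmdb_link = resolver_url("hmdb", "HMDB0000122")
kegg_link = resolver_url("kegg.compound", "C00031")
```

The appeal of this design is that only the prefix and accession need to be stored per entity, while the resolver handles the redirection to the current location of the external record.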

The VMH beyond computational modeling
A growing number of studies link microbial composition with diet and disease (35,36). The generation of novel hypotheses about the functional implications of observed correlations, e.g. between microbial abundances in disease states, is hindered by the lack of online databases to facilitate such work. The VMH database can help to fill this gap. In particular, the 'diet designer' tool in conjunction with computational modeling permits the generation of in silico hypotheses that could then be experimentally tested. Moreover, the use of synthetic microbial communities is of great value for hypothesis testing, and the VMH database could facilitate the design of defined microbial communities with specified metabolic capabilities.

Accessing the API
The VMH API can be reached at https://www.vmh.life/_api. This page displays some of the available resources that can be used to retrieve data. Each of these is reachable through a Uniform Resource Identifier (URI), which provides data in different formats, such as HTML, JSON or flat file format (CSV). For each of these identifiers, additional query parameters can be applied to refine the search further (e.g. to search for a metabolite with a given HMDB identifier). All API endpoints and query parameters are detailed at https://www.vmh.life/_api/docs, where users can test the API usage and get code templates, in different programming languages, to integrate access to the VMH in their applications or scripts.
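A minimal sketch of programmatic access, using only the Python standard library, might look as follows. Several names here are assumptions that should be checked against the API documentation: the base URL is reconstructed from the address given above, the resource name metabolites and the query parameter hmdb are hypothetical, and the format parameter relies on the default format override of the Django REST framework being enabled.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base address of the API; verify against the API documentation.
BASE_URL = "https://www.vmh.life/_api"


def build_query_url(resource: str,
                    params: Optional[Dict[str, str]] = None,
                    base_url: str = BASE_URL,
                    fmt: str = "json") -> str:
    """Compose a resource URI with query parameters and a response format."""
    query = dict(params or {})
    query["format"] = fmt  # assumed format-override parameter
    return f"{base_url.rstrip('/')}/{resource.strip('/')}/?{urlencode(query)}"


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Perform the HTTP GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical query: look up a metabolite by its HMDB identifier.
url = build_query_url("metabolites", {"hmdb": "HMDB0000122"})
```

Calling fetch_json(url) would send the request; it is left uncalled here so that the sketch runs without network access. Separating URL construction from the request also makes the query logic easy to check before any data are retrieved.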

DATABASE IMPLEMENTATION
The VMH database was implemented with MySQL 5.6 (https://dev.mysql.com/). The front-end is reachable via web browser at https://vmh.life and was developed in Sencha ExtJS 5.1 (https://www.sencha.com/). The API was developed in Python 2.7 using the Django framework and the Django REST framework package.
The diagram editor CellDesigner (version 4.4) (19) was used to manually draw the metabolic maps of the 'ReconMaps' resource. Continuous quality control was achieved using dedicated MATLAB (MathWorks, Inc.) code for map correction and manipulation. This code and the corresponding tutorial are freely available in the COBRA Toolbox (10) (https://opencobra.github.io/cobratoolbox/).

CONCLUSION
The VMH database captures information on human and gut microbial metabolism and links this information to hundreds of diseases and nutritional data. Therefore, the VMH database addresses an increasing need to facilitate rapid analyses and interpretations of complex data arising from large-scale biomedical studies. Three key features distinguish the VMH database. First, the VMH database is a comprehensive, interdisciplinary database that permits complex queries. Second, the VMH database provides a graphical representation of the 'Human metabolism' resource through the 'ReconMaps' resource, thus allowing for the analysis of complex, multi-faceted omics data in the context of the biochemical knowledge captured in the VMH database. Third, the VMH database represents a starting point for computational modeling of human and microbial metabolism in healthy and diseased states by providing information and simulation constraints and being fully compatible with the COBRA Toolbox (10). While the front-end of the VMH database permits complex, interdisciplinary queries by the general user, the comprehensive API enables programmers to perform many complex searches on the database content. As such, the VMH database provides a novel research tool by increasing the availability of diverse data along the diet-gut-health axis to the biomedical community.

DATA AVAILABILITY
The VMH database and its content are freely available at https://www.vmh.life. Metabolic reconstructions and additional materials are available in the 'Download' section, and search results are directly downloadable from the grid interfaces. Users can provide feedback through the different platforms on the website. Detected issues will be addressed and integrated into the database in subsequent releases. The API can be accessed by third-party applications and is also accessible via web browser at https://www.vmh.life/_api. Detailed documentation for the API is available at https://www.vmh.life/_api/docs.