Metingear: a development environment for annotating genome-scale metabolic models

Summary: Genome-scale metabolic models often lack annotations that would allow them to be used for further analysis. Previous efforts have focused on associating metabolites in the model with a cross reference, but this can be problematic if the reference is not freely available, multiple resources are used or the metabolite is added from a literature review. Associating each metabolite with a chemical structure provides unambiguous identification of the components and a more detailed view of the metabolism. We have developed an open-source desktop application that simplifies the process of adding database cross references and chemical structures to genome-scale metabolic models. Annotated models can be exported to the Systems Biology Markup Language open interchange format.
Availability: Source code, binaries, documentation and tutorials are freely available at http://johnmay.github.com/metingear. The application is implemented in Java with bundles available for MS Windows and Macintosh OS X.
Contact: johnmay@ebi.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.

Chemical File Formats Metingear can load structures from several file formats, including Chemical Markup Language (CML) (Kuhn et al., 2007), Mol (Warr, 2011), the IUPAC International Chemical Identifier (InChI) and SMILES (Warr, 2011). Models can be exported in SBML with annotations of cross-references and InChI. Mol and CML files preserve the exact depiction drawn by the original author of the structure, whilst InChI and SMILES require a new structure diagram to be generated. Direct drawing of structures is currently not available, as the JChemPaint (http://jchempaint.github.com/) (Krause et al., 2000) application uses an incompatible version of the CDK library.
Services Metingear currently provides services to access ChEBI (Matos et al., 2010), MetaCyc (Caspi et al., 2012), KEGG Compound (Kanehisa et al., 2012), LIPID Maps (Sud et al., 2007), the Human Metabolome Database (HMDB) (Wishart et al., 2009), PubChem-Compound (Bolton et al., 2008) and UniProt (The UniProt Consortium, 2012). Each service, except PubChem, has a loader that allows the user to update the resource to the latest available version. When the download is available and small enough, the resource can be updated automatically. For KEGG and MetaCyc, where a fee or registration is required, the file can instead be specified as a location on the local file system (see https://github.com/johnmay/metingear/wiki/Resources). The loaders create a searchable index in the specified folder. The services check for this index when a new tool is opened; if the index has been created, the service is available. As with the cross-references, each service is linked with MIRIAM registry (Juty et al., 2012) information. This allows Metingear to recognise which resource a metabolite is annotated with and to try to locate a required service (e.g. name search, structure download). If no local index is loaded for the service, Metingear defaults to a web-service query. The services load dynamically at runtime, so it is possible to add custom services that connect to in-house databases or web services and provide specialised compounds. This feature also makes it easy to integrate new resources (such as the reconciled databases) and to keep up to date with existing resources. For example, when HMDB recently changed its download format, only a new loader for the format was required. The existing loader has been kept for legacy files and is still usable, and the existing HMDB service access did not need to be changed.
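The fallback from a local index to a web-service query can be illustrated with a minimal sketch. This is not Metingear's code (the application is written in Java); the `Service` class, its fields and the two lookup callables are hypothetical names introduced only for illustration, and the "index exists" test is assumed to be a simple check for a non-empty folder.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Service:
    """Hypothetical lookup service for one resource (e.g. ChEBI)."""
    resource: str                        # resource key, e.g. a MIRIAM namespace
    index_dir: Path                      # folder a loader would write the index to
    local_lookup: Callable[[str], str]   # query against the local index
    web_lookup: Callable[[str], str]     # query against a remote web service

    def available_locally(self) -> bool:
        # Assumption: the index counts as created if the folder exists
        # and contains at least one file.
        return self.index_dir.is_dir() and any(self.index_dir.iterdir())

    def lookup(self, accession: str) -> str:
        # Prefer the local index; otherwise default to the web service.
        if self.available_locally():
            return self.local_lookup(accession)
        return self.web_lookup(accession)
```

Because the decision is made per call, an index created by a loader after start-up would be picked up by the next lookup without changing the calling code.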
Inconsistencies Annotating previously published models revealed inconsistencies that could not be easily identified in the original spreadsheet. A model of Lactobacillus plantarum WCFS1 (Teusink et al., 2006) was found to be missing a reaction equation for reaction UGMDDS2 (Fig. S1). In a model of Bacillus subtilis (iBsu1103) (Henry et al., 2009), three reactions were found to reference a metabolite not found in the metabolites table (Fig. S2a and S2b). These inconsistencies demonstrate the usefulness of specialised software in the curation of larger reconstructions. They were identified automatically when the models were loaded from Excel. Other consistency checks are carried out in the background of the model but do not declare an error. The mass and charge balance of a reaction is indicated by a scales icon, which tips towards whichever side is heavier and is level when the reaction is balanced. Structures attached to metabolites are checked against the encoded formulas and charges (which can be imported and extracted). This indication serves only as a hint that something might be wrong, because the charge and formula annotations may be absent.
Additional Features In addition to handling metabolites and reactions, there is also support for genes and gene products. These can be imported from the European Nucleotide Archive (ENA) XML (Amid et al., 2012) and FASTA formats. When importing models from a spreadsheet, the locus of the reaction is often annotated. This locus annotation can be paired with the gene/gene product information to provide a model that is enriched with sequence as well as chemical information. Although it is not the primary purpose, Metingear can also run a homology search using a locally installed BLAST (Altschul et al., 1990) instance and transfer the annotations from homologous sequences. We are currently focused on metabolite annotation, but in future we will improve the gene and gene product linking to be more automated.
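The two kinds of inconsistency reported above for published models (a reaction with no equation, and a reaction that references a metabolite absent from the metabolites table) can be detected with a simple pass over the imported data. The following is a minimal sketch, not Metingear's implementation; the function name and the data layout (a set of metabolite identifiers and a mapping from reaction identifier to participant identifiers) are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def check_model(metabolite_ids: Set[str],
                reactions: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (reaction id, problem) pairs for a loaded model.

    `reactions` maps each reaction id to the metabolite ids in its
    equation; an empty list stands for a missing equation.
    """
    problems = []
    for rxn_id, participants in reactions.items():
        if not participants:
            problems.append((rxn_id, "missing equation"))
            continue
        for met_id in participants:
            if met_id not in metabolite_ids:
                problems.append((rxn_id, f"unknown metabolite {met_id}"))
    return problems
```

Called with a metabolites table that lacks cpd00498 and a reaction that uses it, the function would report that reaction together with the missing identifier.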
A real-time search, undo/redo edit support, star rating and sub-collections help in general navigation. All entities can be easily renamed, merged and split, allowing flexibility when editing a reconstruction. Each entity can have its name and abbreviation changed, and the primary identifier is assigned automatically. Each model is encoded with taxonomy information and, when available, compartments are annotated with Gene Ontology terms (Camon, 2003) in the SBML output.
Each metabolite that has a structure, molecular formula and charge displays a structural validity indicator showing whether the structure matches the given formula and charge. The formula and charge can often be imported from the spreadsheets or SBML notes and provide a check as to whether the attached structure is correct. Reactions indicate whether their participants are balanced (mass only) and whether they are transport reactions.
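A reaction balance check of this kind can be sketched from molecular formulas and charges alone. The code below is an illustrative sketch under stated assumptions, not the method used in Metingear: the function names are hypothetical, formulas are assumed to be flat (no parentheses, isotopes or generic groups), and each participant is given as a (coefficient, formula, charge) tuple. Mass and charge are reported separately.

```python
import re
from collections import Counter

# One element symbol followed by an optional count, e.g. "C6", "H", "Cl2".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in _ELEMENT.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Sum atoms and charge over (coefficient, formula, charge) tuples."""
    atoms, charge = Counter(), 0
    for coefficient, formula, q in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coefficient * n
        charge += coefficient * q
    return atoms, charge


def balance(reactants, products):
    """Return (mass_balanced, charge_balanced) for one reaction."""
    left_atoms, left_charge = side_totals(reactants)
    right_atoms, right_charge = side_totals(products)
    return left_atoms == right_atoms, left_charge == right_charge
```

For example, glucose + ATP(4-) giving glucose 6-phosphate(2-) + ADP(3-) + H+ balances in both mass and charge when the formulas are written in their charged forms; omitting the proton breaks both checks.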
Internally a binary format is used for the reconstructions; this format provides very rapid loading and saving. Draft reconstructions from the model-SEED (Henry et al., 2010) can be imported directly via the spreadsheet format without having to select which fields are present. Metingear can also create a stoichiometric matrix and export it to a tabular file or to a '.sif' file, which can be loaded in Cytoscape (http://www.cytoscape.org/). The chemical structures of metabolites in the models can be exported to a single structured-data file (SDF) (Warr, 2011).
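To make the export step concrete, the following sketch builds a stoichiometric matrix from signed coefficients and writes it both as tab-separated text and as Cytoscape SIF lines (source, interaction type, target). It is a sketch under assumptions rather than Metingear's exporter: the function names are hypothetical, and the interaction labels `substrate_of` and `produces` are invented for illustration, since the labels Metingear writes are not stated here.

```python
def stoichiometric_matrix(reactions):
    """Build a matrix with metabolites as rows and reactions as columns.

    `reactions` maps reaction id -> {metabolite id: signed coefficient},
    with negative coefficients for consumed metabolites.
    """
    metabolites = sorted({m for coefs in reactions.values() for m in coefs})
    rxn_ids = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in rxn_ids] for m in metabolites]
    return metabolites, rxn_ids, matrix


def to_tsv(metabolites, rxn_ids, matrix):
    """Render the matrix as tab-separated text with a header row."""
    lines = ["\t".join([""] + rxn_ids)]
    for met, row in zip(metabolites, matrix):
        lines.append("\t".join([met] + [str(v) for v in row]))
    return "\n".join(lines) + "\n"


def to_sif(reactions):
    """Render the network as SIF: one 'source<TAB>type<TAB>target' per line."""
    lines = []
    for rxn_id, coefs in reactions.items():
        for met, coef in coefs.items():
            if coef < 0:
                lines.append(f"{met}\tsubstrate_of\t{rxn_id}")
            elif coef > 0:
                lines.append(f"{rxn_id}\tproduces\t{met}")
    return "\n".join(lines) + "\n"
```

Treating each reaction as its own node keeps the SIF output bipartite (metabolite to reaction to metabolite), which is one common way to view a metabolic network in Cytoscape; other encodings are equally possible.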
A primitive but functional dialog plugin framework allows users to extend Metingear with their own tools (https://github.com/johnmay/metingear/wiki/Plugable-Dialogs).

InChI: The IUPAC International Chemical Identifier line notation of chemical structure
SMILES: The simplified molecular-input line-entry specification representation of chemical structure
Charge: The charge of this metabolite
Exact Mass: The sum of the masses of the atoms in a molecule, using the most abundant isotope for each element
Molecular Formula: The chemical formula of a metabolite

Fig. S2. 2a) Reactions (Table S2-Reaction Data) reference a metabolite, cpd00498 (highlighted red); however, information about this metabolite is missing from the metabolites sheet (Table S1-Compound Data) (Henry et al., 2009). 2b) Expected location of the missing metabolite cpd00498 in the metabolites table (marked with a red line). Using the named reaction (not shown), it is possible to see that the name of the missing metabolite is 2-aceto-2-hydroxybutanoate, but no other details of this metabolite are provided.