Updates in Rhea—a manually curated resource of biochemical reactions

Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.


AIMS AND SCOPE OF RHEA
Rhea is a manually curated resource of biochemical reactions for the functional annotation of enzymes and the description of genome-scale metabolic networks (1). Rhea provides stoichiometrically balanced descriptions for enzyme-catalyzed reactions, transport reactions and spontaneously occurring reactions using chemical species from the Chemical Entities of Biological Interest (ChEBI) ontology (2), specifying reaction constituents, their stoichiometric coefficients and relative locations. This information is manually curated from peer-reviewed literature by experts. Each Rhea reaction is assigned a unique identifier, with uniqueness ensured by the calculation of a fingerprint for each reaction which considers the constituent compounds, their stoichiometry and localization. Reaction constituents are represented by the major micro-species at pH 7.3 (verified using the Marvin pKa calculator from ChemAxon, version 6.2.0, http://www.chemaxon.com). All reactions are stoichiometrically balanced for both mass and charge, which facilitates the use of Rhea for the construction, analysis, comparison and reconciliation of genome-scale metabolic models (3,4). More details on the representation of reactions can be found in our preceding paper (1). Rhea provides metabolic reactions for a number of other biological data and knowledge resources including the EBI Enzyme Portal (5), the reference layer of the MetaboLights resource (6), the metabolic model analysis and reconciliation platform of MetaNetX.org (7,8), the microbial genomic annotation platform MicroScope (9) and IntEnz, a reference for the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzymes (10). Rhea reactions are also used by tools such as EC-BLAST, a tool to automatically search and compare biochemical reactions (11), as well as Metabolic tinker (12), an online tool for guiding the design of synthetic metabolic pathways. Interactions between Rhea and other resources are described in Figure 1.
Figure 1. [...] (2), to biochemical reactions of EcoCyc (19), MetaCyc (20), KEGG (18), MACiE (16), Reactome (17) and UniPathway (21), to EC numbers of IntEnz (10) and to protein sequences of UniProtKB (15). The Gene Ontology (GO) is closely aligned with ChEBI, and GO molecular functions describing enzymatic reactions cross-reference Rhea (29). Rhea is one of the reaction repositories employed by MicroScope (9), an integrated resource for the curation and comparative analysis of genomic and metabolic data of microbes. Rhea also provides metabolic reactions for a number of other resources including the EBI Enzyme Portal (5), the reference layer of the MetaboLights resource (6), the metabolic model analysis and reconciliation platform of MetaNetX.org (7,8), EC-BLAST (11) and Metabolic tinker (12).
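The fingerprint and balancing ideas described in this section can be illustrated with a minimal Python sketch. It is not Rhea's actual implementation: the `Participant` class, the use of a SHA-256 hash, and the choice to make the fingerprint independent of reaction direction are all assumptions made for illustration, and compound names stand in for real identifiers. The example reaction is ATP hydrolysis written with the major micro-species near neutral pH.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass


@dataclass
class Participant:
    compound_id: str   # placeholder name; a real record would carry a ChEBI identifier
    coefficient: int   # stoichiometric coefficient
    location: str      # relative location, e.g. "in" or "out" (hypothetical labels)
    formula: dict      # element symbol -> atom count
    charge: int        # net charge of the micro-species


def side_key(side):
    # Order-independent description of one side of the reaction.
    return tuple(sorted((p.compound_id, p.coefficient, p.location) for p in side))


def fingerprint(left, right):
    # Assumption of this sketch: the two sides are sorted so that a reaction
    # and its reverse receive the same fingerprint.
    sides = sorted([side_key(left), side_key(right)])
    return hashlib.sha256(repr(sides).encode("utf-8")).hexdigest()


def totals(side):
    # Sum atoms and charges over one side, weighted by stoichiometry.
    atoms, charge = Counter(), 0
    for p in side:
        for element, count in p.formula.items():
            atoms[element] += count * p.coefficient
        charge += p.charge * p.coefficient
    return atoms, charge


def is_balanced(left, right):
    # Balanced for mass (atom counts) and charge.
    return totals(left) == totals(right)


left = [
    Participant("ATP(4-)", 1, "in", {"C": 10, "H": 12, "N": 5, "O": 13, "P": 3}, -4),
    Participant("H2O", 1, "in", {"H": 2, "O": 1}, 0),
]
right = [
    Participant("ADP(3-)", 1, "in", {"C": 10, "H": 12, "N": 5, "O": 10, "P": 2}, -3),
    Participant("hydrogenphosphate", 1, "in", {"H": 1, "O": 4, "P": 1}, -2),
    Participant("H(+)", 1, "in", {"H": 1}, 1),
]
```

In this sketch, two curated entries with the same participants, coefficients and locations would collide on the same fingerprint, which is one plausible way to detect redundancy; changing only the location of a participant (as in a transport reaction) yields a different fingerprint.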

EXTENDING RHEA TO COMPLEX MACRO-MOLECULES AND POLYMERS
Computational models of cellular metabolism generally include instances of complex biological macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI, which deals with small molecules and metabolites. To permit the representation of reactions involving such entities in Rhea, we have introduced generic compounds ('Rhea generics') and polymers ('Rhea polymers'). Rhea generics represent complex biological macromolecules such as proteins, nucleic acids and complex polysaccharides. Rhea polymers represent compounds that appear on both sides of a given reaction with different relative polymerization indices, such as 'n' and 'n + 1'.
We describe first the use of Rhea generics. Each Rhea generic has a unique identifier and a name that specifies the nature of the biological macromolecule under consideration. Residues and functional groups that are modified during the course of the reaction are represented explicitly using entities from ChEBI, which allows stoichiometric balancing for mass and charge. An example of the usage of Rhea generics is provided by modification reactions involving acyl carrier protein (ACP), which plays an essential role in fatty acid biosynthesis. Before ACP can accept acyl chains for elongation, the protein must be activated by ACP synthase, which attaches a phosphopantetheine group from coenzyme A (CoA) to a conserved serine residue of ACP, releasing adenosine 3',5'-bisphosphate (13). In Rhea, this post-translational modification is described by RHEA:12071 [...].
Rhea polymers differ from Rhea generics. Rhea polymers have been introduced in order to allow balancing of polymerization reactions that include different abstract polymerization indices for polymers, such as 'n' and 'n + 1', as in this example: [...]. ChEBI contains only a single instance of each abstract polymer, with a single unknown polymerization index. Each Rhea polymer has an identifier (prefixed by 'POLYMER'), a name, a link to the corresponding ChEBI polymer and a relative polymerization index. Several Rhea polymers may share the same ChEBI entry, but they must have different polymerization indices, which are used in reaction balancing. The use of Rhea polymers in the context of polymerization reactions is shown in Figure 3.
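To make the role of relative polymerization indices in balancing concrete, the following Python sketch checks mass and charge balance for a reaction whose polymer appears with indices 'n' and 'n + 1'. It is an illustration under stated assumptions, not Rhea's algorithm: the `Participant` class, the encoding of an index as a pair `(a, b)` meaning `a*n + b` repeat units, and the chosen example (elongation of a (1->4)-alpha-D-glucan by UDP-alpha-D-glucose) are ours, and names are used in place of real 'POLYMER' identifiers.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    fixed: dict                                # atoms outside the repeat unit (end groups, or a whole small molecule)
    charge: int = 0
    unit: dict = field(default_factory=dict)   # atoms of one repeat unit; empty for non-polymers
    index: tuple = (0, 0)                      # (a, b): the repeat unit occurs a*n + b times
    coef: int = 1                              # stoichiometric coefficient


def side_total(side):
    # Atom counts are linear in n, so they are kept as two parts:
    # the coefficient of n (per_n) and the constant term (const).
    per_n, const, charge = Counter(), Counter(), 0
    for p in side:
        a, b = p.index
        for element, count in p.fixed.items():
            const[element] += p.coef * count
        for element, count in p.unit.items():
            per_n[element] += p.coef * a * count
            const[element] += p.coef * b * count
        charge += p.coef * p.charge

    def strip(counter):
        # Drop zero entries so that comparisons ignore absent elements.
        return {element: count for element, count in counter.items() if count != 0}

    return strip(per_n), strip(const), charge


def is_balanced(left, right):
    # Balanced for every n only if both the n-dependent and constant parts agree.
    return side_total(left) == side_total(right)


glucosyl = {"C": 6, "H": 10, "O": 5}   # one anhydroglucose repeat unit
ends = {"H": 2, "O": 1}                # terminal H and OH of the chain

left = [
    Participant("glucan(n)", ends, unit=glucosyl, index=(1, 0)),
    Participant("UDP-alpha-D-glucose(2-)", {"C": 15, "H": 22, "N": 2, "O": 17, "P": 2}, charge=-2),
]
right = [
    Participant("glucan(n+1)", ends, unit=glucosyl, index=(1, 1)),
    Participant("UDP(3-)", {"C": 9, "H": 11, "N": 2, "O": 12, "P": 2}, charge=-3),
    Participant("H(+)", {"H": 1}, charge=1),
]
```

The sketch shows why a single polymer entry with one unknown index is not enough for balancing: if both sides referred to the same index, the extra glucosyl unit on the right would be unaccounted for and the check would fail.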

RHEA CONTENT
Rhea has grown steadily since our last report through the expert curation of new chemical entities and reactions from peer-reviewed literature. At the time of writing, Rhea (release 53) includes 7044 unique reactions involving 5927 unique reaction participants, and cites 2766 unique PubMed identifiers (see http://www.ebi.ac.uk/rhea/statistics.xhtml for details). This corresponds to a 63% increase in the number of unique reactions and a 161% increase in the number of unique citations since our last publication in 2012 (Rhea release 24, containing 4321 unique reactions, 3788 unique reaction participants and citing 1058 unique PubMed identifiers).
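The growth figures quoted above follow directly from the release counts. As a small check, the Python snippet below recomputes them; the dictionary and function names are ours, and the value for reaction participants is derived from the stated counts rather than quoted in the text.

```python
def percent_increase(old, new):
    # Relative growth from old to new, expressed in percent.
    return 100.0 * (new - old) / old


release_24 = {"reactions": 4321, "participants": 3788, "citations": 1058}
release_53 = {"reactions": 7044, "participants": 5927, "citations": 2766}

growth = {
    key: round(percent_increase(release_24[key], release_53[key]))
    for key in release_24
}
# growth == {"reactions": 63, "participants": 56, "citations": 161}
```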
The value and utility of Rhea reactions are enhanced by extensive cross-references to other public resources (Table 1). The cross-references are manually curated and cross-checked, and curators of Rhea regularly exchange information on possible corrections and clarifications with curators of the other resources. In addition, cross-references are automatically added from Rhea to UniProtKB/Swiss-Prot (15) protein records (through Enzyme Commission (EC) numbers in IntEnz) and to reaction descriptions in MACiE (16) and Reactome (17) (through shared participants).

SUBMISSIONS TO RHEA
Rhea welcomes submissions describing new reactions or suggesting updates to existing reactions. All submissions should be posted on our SourceForge Reaction Requests/Updates tracker (http://sourceforge.net/p/rheaebi/reaction-requests-updates) with relevant information (name, 2D structure, etc.) for each reaction participant and cross-references to other relevant databases and source literature where available. Reactions requested for a publication under review are assigned preliminary status during the peer-review process and acquire approved status once the manuscript has been accepted.

RHEA AVAILABILITY
The Rhea web server (http://www.ebi.ac.uk/rhea) provides programmatic access as well as browsing, searching and download facilities. Details of common search options (including compound names, compound and reaction identifiers, reaction equations, EC numbers, UniProtKB/Swiss-Prot accession numbers, bibliographic citations and identifiers from external cross-referenced resources such as KEGG (18), EcoCyc (19), MetaCyc (20), UniPathway (21), MACiE or Reactome) are provided in our last publication (1). Searches with compound identifiers may be prefixed with CHEBI, POLYMER or GENERIC to specify the desired type of molecule. Rhea generics and polymers may also be retrieved by searching for the associated ChEBI residue/group or compound (e.g. 'CHEBI:29999') or by name (e.g. 'ACP'). It is possible to link to reactions in the public web site using the URL template http://www.ebi.ac.uk/rhea/reaction.xhtml?id=, adding the numerical reaction identifier as in this example: http://www.ebi.ac.uk/rhea/reaction.xhtml?id=10499. All Rhea data is available for free download (http://www.ebi.ac.uk/rhea/download.xhtml) in BioPAX level 2 (biopax2) (22), RXN and RD (23) formats. In the BioPAX level 2 distribution of Rhea all reaction participants are defined by the class 'physicalEntityParticipant'. Cross-references to other databases such as ChEBI, EcoCyc, IntEnz, KEGG, MACiE, MetaCyc, Reactome, UniPathway and UniProtKB are also available as tab-separated text files. The 2D structures of chemical compounds used in Rhea are available for download either as individual molfiles or as a Structure-Data File (SDF). These chemical formats are specified by Accelrys (formerly Molecular Design Limited, MDL) (23).
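The linking template and the identifier prefixes described above lend themselves to a small helper. The Python sketch below builds a reaction link from the URL template given in the text and splits a prefixed compound query into its type and identifier. The function names are hypothetical, and the use of a colon as separator for POLYMER and GENERIC queries is an assumption extrapolated from the 'CHEBI:29999' example.

```python
import re

# URL template taken from the text; the numerical reaction identifier is appended.
REACTION_URL = "http://www.ebi.ac.uk/rhea/reaction.xhtml?id="

# Compound-type prefixes named in the text.
COMPOUND_PREFIXES = ("CHEBI", "POLYMER", "GENERIC")


def reaction_url(rhea_id):
    """Build a link to a reaction page from 10499, "10499" or "RHEA:10499"."""
    match = re.fullmatch(r"(?:RHEA:)?(\d+)", str(rhea_id).strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a Rhea reaction identifier: {rhea_id!r}")
    return REACTION_URL + match.group(1)


def split_compound_query(query):
    """Return (prefix, rest) for a prefixed query, or (None, query) for a plain term.

    Assumption: prefix and identifier are separated by a colon, as in 'CHEBI:29999'.
    """
    prefix, separator, rest = query.strip().partition(":")
    if separator and prefix.upper() in COMPOUND_PREFIXES:
        return prefix.upper(), rest
    return None, query.strip()
```

For example, `reaction_url("RHEA:10499")` reproduces the example link given in the text, and `split_compound_query("ACP")` leaves a plain name search untouched.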
Rhea RESTful web services allow reactions to be retrieved in BioPAX level 2 (22), CMLReact (cmlreact) (24) or RXN CTfile (rxn) (23) formats by querying for their identifier or other terms. Example queries are provided in the online documentation. Rhea also provides a BioJS component (BioJS.Rheaction, http://www.ebi.ac.uk/Tools/biojs/registry/Biojs.Rheaction.html), which can be used to display (and possibly modify the layout of) a Rhea reaction in an external web page given only its Rhea ID. The EBI Enzyme Portal makes use of this BioJS component to display reactions (example: http://www.ebi.ac.uk/enzymeportal/search/P45850/reactionsPathways).

FUTURE DIRECTIONS
We are actively developing Rhea as a vocabulary for the functional annotation of enzymes in UniProtKB. This annotation is currently provided using the enzyme classification (EC numbers) of the Enzyme Nomenclature committee of the IUBMB and textual reaction descriptions sourced from the ENZYME database (25) (itself derived from IntEnz). Our current work involves the translation of all outstanding IUBMB reactions (the majority of which involve generic compounds or polymers) into Rhea. We are also expanding Rhea to cover the hundreds of enzyme activities that are not yet described by the IUBMB classification (26-28), many of which already have textual reaction descriptions annotated in UniProtKB (one example being the aforementioned GDP-α-D-mannose hydrolysis reaction RHEA:28105 described in UniProtKB/Swiss-Prot record P32056). We will also exploit the underlying ontology of ChEBI in order to provide a logical reaction classification based on the curated relations between reaction participants. This will serve as a useful complement to the classification of enzymatic activities by the IUBMB.