MACiE (which stands for Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/. This article presents the release of Version 3 of MACiE, which not only extends the dataset to 335 entries, covering 182 of the EC sub-subclasses with a crystal structure available (∼90%), but also incorporates greater chemical and structural detail. This version of MACiE represents a shift in emphasis for new entries, from non-homologous representatives covering EC reaction space to enzymes with mechanisms of interest to our users and collaborators with a view to exploring the chemical diversity of life. We present new tools for exploring the data in MACiE and comparing entries as well as new analyses of the data and new searches, many of which can now be accessed via dedicated Perl scripts.
Enzymes make the wonderful diversity of life possible, from thermophiles that exist under incredibly harsh conditions to the complexity of higher organisms, such as humans. However, despite their importance and our continued fascination with these often complex proteins, we still have a relatively limited understanding of how they function. Since 1964, when the Enzyme Commission (EC) first published its rules for enzyme nomenclature and its system to classify the overall reaction that an enzyme performs (1), over 5000 EC numbers have been assigned, although 836 have subsequently been either transferred to other EC numbers or deleted (data correct as of June 2011). The first proteins with a fully defined sequence and assigned identifier from the curated portion of UniProtKB (Swiss-Prot) (2) were deposited in the 1980s, and the first crystal structures relating to an enzyme were deposited in the wwPDB (3) in the early 1970s. Since then, the growth of information has been persistent (Figure 1A); however, there are still some significant gaps in our knowledge (Figure 1B).
Of the 4528 currently active EC numbers, only 2792 have a sequence in Swiss-Prot that has a fully assigned EC number (i.e. a catalytic activity with all four levels of the EC number assigned), and of those only 1761 also have an associated structure deposited in the wwPDB, although not all of these will have a reliable mechanism published in the primary literature. Despite this apparent lack of data, there is a great deal of knowledge available, including structures, gene sequences, mechanisms, metabolic pathways and kinetic data. However, these data tend to be spread between many different databases and throughout the literature. Most web resources relating to enzymes [such as BRENDA (4), KEGG (5), SABIO-RK (6), the IUBMB Enzyme Nomenclature website (1) and IntEnz (7)] focus on the overall reaction, accompanied in some cases by a textual or graphical description of the mechanism. MACiE (8,9), which stands for Mechanism, Annotation and Classification in Enzymes, is a collaboration between the Thornton group (EMBL-EBI), Mitchell group (University of St Andrews, Scotland) and Bertini group (University of Florence, Italy) and was designed to provide a computational description of mechanism by including detailed stepwise mechanistic information for a wide coverage of both chemical space and the protein structure universe. First published in 2005 (9), MACiE usefully complements both the mechanistic detail of the Structure–Function Linkage Database (SFLD) (10) which provides information for a small number of rather ‘promiscuous’ enzyme superfamilies, and the wider coverage with less chemical detail provided by EzCatDB (11) and the Catalytic Site Atlas (CSA) (12). Entries in MACiE are linked, where appropriate, to all of these related data resources. MACiE is also proving a useful resource for understanding how enzymes catalyse the vast array of chemistry with such a (relatively) limited repertoire of catalytic entities (13–16).
This new release of MACiE retains all the original features of previous releases, but includes enriched data content through the extension of data entries (next section), new tools for exploring the diversity of biochemical reactions in MACiE (‘New Methods for Characterizing and Comparing Enzyme Mechanisms’ section) as well as new searches and database statistics (see Supplementary Data). Each biologically meaningful search allows the user to not only access the individual entries, but also view the data in a comparative overview page. Many of these are now available as separate links and visualization of the database online has also been updated (‘Updates to MACiE Website’ section).
DATA CONTENT AND NEW ANNOTATIONS IN MACiE
This release of MACiE represents the addition of 133 new entries since the previous major release (bringing the total number of entries to 335). We now cover >90% (182) of the EC sub-subclasses with an available crystal structure, representing 321 distinct EC numbers. When we include related enzymes as defined using the distant homology described in the CSA, MACiE covers over 800 distinct EC numbers and over 17 000 PDB codes; with a stricter definition, statistically significant similarity using SSEARCH, an implementation of Smith-Waterman, MACiE covers over 600 EC numbers and over 7000 PDB codes. We have also incorporated new annotations, which will be described in the following subsections. With the incorporation of many homologues and functional analogues into MACiE, we have constructed some pre-defined datasets for users interested in specific aspects of MACiE, including datasets relating to the EC classification, diversity in structure and function, mechanistic diversity and other aspects such as cofactor requirements. For more detail on these, please see the Supplementary Data.
Cofactors in MACiE
In previous releases of MACiE (8,9), cofactor annotation was largely neglected. This has now been addressed, and two basic types of cofactor are annotated in MACiE: metal and organic cofactors. Metal cofactors are primarily handled by Metal–MACiE (17), a sister database and collaboration with the Bertini group at CERM in Florence, Italy. Approximately half of all the entries in MACiE contain at least one metal ion (182 MACiE entries, covering 178 distinct EC numbers, have a corresponding Metal–MACiE entry; a complete list can be found at: http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/MACiE/listBy.pl?by=metal). There is significant cross-talk between Metal–MACiE and MACiE, with Metal–MACiE relying upon MACiE for the mechanism annotation, and MACiE taking the metal cofactor annotation from Metal–MACiE. We have created a detailed overview page for each metal involved in a reaction that displays the structural and chemical data for a specific metal ion on a single page. It is possible to retrieve a Metal–MACiE entry directly from MACiE, and also to go directly to the Metal–MACiE entry for a given metal ion from the overview page within MACiE.
We now handle organic cofactors (those small molecules which are mainly composed of non-metal atoms) in a manner analogous to the amino acid residues in MACiE. Thus, we have annotated the function of the cofactor in the individual steps within MACiE, and these data are now displayed on the overview and step information pages, as with the catalytic amino acid residues. As part of this remediation process, we have developed the CoFactor database (18), which describes in detail the 27 different cofactors currently identified, from the perspective of the cofactors themselves rather than the enzymes in which they function. Where appropriate, MACiE links out to CoFactor from the overview page.
Structural data and displaying MACiE in 3D
In order to begin to understand how the local environment of the catalytic amino acid residues affects their function, we have added information on the protein structure. This section (accessible from the overview page under the ‘Structural Overview’ option on the side menu or from the ‘Display structure information’ in the general information section of the overview page) displays the biological unit representative crystal structure for the MACiE entry in an animated Jmol (19) applet (which is distinct from the reaction animations available for some entries) that shows the catalytic domains and catalytic species as a movie. We also identify the different catalytic sites present in the protein [from the CSA (12)].
For each catalytic residue, the residues contacting it have been calculated using HBPlus (20), and are shown, again in a Jmol applet with the display centred on the catalytic residue in focus. The contact information generated using HBPlus has also been used to create a query that allows a user to identify catalytic dyads and triads present in MACiE.
Furthermore, we describe the flexibility of the catalytic residues; this is assessed using the B factors as a crude measure of flexibility. Each residue in the representative PDB code is assigned an average B factor by taking the mean B factor of all the atoms in the residue. In order to cope with the large potential variation in the average B factors and to report these data in a consistent manner, the normalized B Factor (a value between 0 and 1) within the protein structure is created by ordering the average B factors in increasing size and then dividing the ranked position by the total number of residues (21). Both the average B factor and the ranked value are displayed. This section also describes the relative solvent accessibility (RSA) of the residue. This is calculated using NACCESS (22) and is shown as a percentage. Both the B factor and RSA have been added to update the analysis previously performed on a much smaller sub-set of enzymes (21). Finally, this section includes information on the number of hydrogen bond acceptor and donor contacts to the catalytic residue.
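The rank-based normalization described above can be sketched as follows. This is a minimal illustration, not the code used by MACiE: the function name, the input layout (one `(residue_id, b_factor)` pair per atom) and the use of a 1-based rank are assumptions, and ties are broken arbitrarily.

```python
from collections import defaultdict

def normalized_b_factors(atoms):
    """Return {residue_id: (mean_b, normalized_rank)}.

    atoms: iterable of (residue_id, b_factor) pairs, one per atom
    (hypothetical input layout chosen for this sketch).
    """
    per_residue = defaultdict(list)
    for residue_id, b_factor in atoms:
        per_residue[residue_id].append(b_factor)

    # Average B factor of each residue: mean over all of its atoms.
    means = {r: sum(v) / len(v) for r, v in per_residue.items()}

    # Order residues by increasing average B factor, then divide the
    # ranked position (assumed 1-based here) by the number of residues.
    ordered = sorted(means, key=lambda r: means[r])
    n = len(ordered)
    return {r: (means[r], (i + 1) / n) for i, r in enumerate(ordered)}
```

With a 1-based rank the most flexible residue receives a value of 1 and the least flexible a value of 1/n, so the values are comparable across structures with very different absolute B factors.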
Other new annotations
Each reaction now has a reversibility tag added to the overall reaction, which makes no inference on the biological reversibility of the reaction. This reversibility is determined automatically and depends on whether one or more steps are annotated as being unknown, irreversible, or reversible. If one or more steps are annotated with the ‘unknown’ reversibility tag, then the overall reaction is annotated with an unknown reversibility, irrespective of what annotations the other steps have. If one or more steps are annotated with the irreversible tag, then the overall reaction is listed as irreversible, otherwise (i.e. if all steps are annotated as reversible) the overall reaction is listed as reversible.
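The precedence rule above (unknown first, then irreversible, otherwise reversible) can be written as a short function. This is a sketch only; the function name and the plain-string tags are assumptions made for illustration.

```python
def overall_reversibility(step_tags):
    """Derive the overall reversibility tag from per-step tags.

    step_tags: iterable of "unknown", "irreversible" or "reversible"
    (string values assumed for this sketch).
    """
    tags = set(step_tags)
    if "unknown" in tags:
        # Any unknown step makes the overall reaction unknown,
        # irrespective of the other annotations.
        return "unknown"
    if "irreversible" in tags:
        return "irreversible"
    # Reached only when every step is annotated as reversible.
    return "reversible"
```

For example, a reaction with steps tagged reversible, irreversible and unknown is reported as unknown, because the unknown tag takes precedence.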
We have also manually added a brief, textual description of the events of a reaction step. This is displayed from the entry overview page and above the image of the step's reaction on the step page.
Furthermore, we have automated the annotation of CATH domains, based upon the latest release of CATH (v 3.4.0) (23) and the links to both EzCatDB and the SFLD.
NEW METHODS FOR CHARACTERIZING AND COMPARING ENZYME MECHANISMS
MACiE is unique in containing detailed information not only on the overall reaction being performed by an enzyme, but also on the step-wise mechanism and the catalytic residues and cofactors involved in that transformation. The criterion for inclusion into MACiE is that the enzyme is distinct at some level of one or more of these aspects (mechanism, overall reaction or catalytic machinery). In order to define the similarity between enzyme reactions, we thus first define similarity (calculated using a Tanimoto similarity score) for each of these three aspects separately, and then combine them to get an ‘overall’ entry similarity.
Catalytic machinery similarity
The catalytic machinery that is carrying out the reaction is defined for the purposes of this measure as the catalytic residues and those residues binding the metal cofactor ions (to include those cases where there are only metal ions acting in the mechanism). We do not currently include the metal and organic cofactors themselves because they are often not present in the representative crystal structure used for the 3D superimposition. The simplest method to compare this machinery is to consider the complement of the catalytic amino acid residues. However, because the number of amino acid residues annotated as catalytic varies widely (from none in M0204 up to 13 in M0143, with the average entry containing only four), a simple fingerprint, in which each amino acid residue type is considered independently and counted, can produce skewed results. In order to compensate for this, we also compare the 3D coordinates of the catalytic machinery by performing a superimposition of the residues using IsoCleft (24). The final similarity is calculated by combining both the complement and superimposition measures in a 9:1 ratio.
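As an illustration of the two ingredients described above, the sketch below builds a residue-type count fingerprint and combines two already-computed scores in a 9:1 ratio. The function names are hypothetical, the superimposition score is taken as an input (IsoCleft is not called here), and reading the ratio as complement:superimposition = 0.9:0.1 is an assumption.

```python
from collections import Counter

def residue_complement(residues):
    """Count each catalytic residue type, e.g. ["Ser", "His", "Asp"]."""
    return Counter(residues)

def machinery_similarity(complement_sim, superimposition_sim,
                         complement_weight=0.9):
    """Combine the two similarity scores (each between 0 and 1).

    complement_weight=0.9 encodes the assumed 9:1 weighting of the
    complement score over the superimposition score.
    """
    return (complement_weight * complement_sim
            + (1.0 - complement_weight) * superimposition_sim)
```

The count fingerprints of two entries would be compared with a Tanimoto score to give `complement_sim` before the two scores are combined.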
Overall reaction similarity
MACiE contains the manual annotation of the bonds formed, cleaved and ‘changed in order’ for the overall and step reactions, and we have turned this annotation into a weighted (i.e. we count both the number and type of bond changed) bond change fingerprint. We have created two types of fingerprint, one that is direction-dependent (i.e. it is important that we know the C–O bond is formed), and another that is essentially direction-independent (i.e. we don't distinguish the exact nature of the bond change, just that the C–O bond is modified during the reaction). At this point stereochemistry is only annotated at the overall reaction level.
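A weighted bond change fingerprint of this kind can be represented as a simple count table. The sketch below is illustrative only: the function name, the `(bond, change)` input tuples and the bond labels are assumptions, not the MACiE data model.

```python
from collections import Counter

def bond_change_fingerprint(changes, directional=True):
    """Count bond changes by type.

    changes: iterable of (bond, change) pairs such as ("C-O", "formed")
    (hypothetical encoding chosen for this sketch).
    directional=True keeps the nature of the change in the key;
    directional=False records only that the bond was modified.
    """
    fingerprint = Counter()
    for bond, change in changes:
        key = (bond, change) if directional else bond
        fingerprint[key] += 1
    return fingerprint
```

For a reaction in which one C–O bond is formed and another is cleaved, the direction-dependent fingerprint holds two separate entries with a count of 1 each, whereas the direction-independent fingerprint holds a single C–O entry with a count of 2.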
The fingerprints describing the bond changes in the overall reaction can then be compared between entries to give an estimation of overall reaction similarity. We currently do not include any measure of the substrate/product similarity, as this information is encoded in the EC number to some extent and it is interesting to observe the cases where very different EC numbers result in almost identical bond change profiles, or cases where similar EC numbers contain very different bond change profiles independent of the substrate/product similarity.
While the similarity of the overall reactions is relatively trivial to calculate, the similarity of the ‘mechanism’ is more difficult. In order to simply capture the similarity between two entries at the step level, we consider the ‘mechanism’ as the sum of all the bond changes involved in all the steps, which we call the ‘composite bond change’ fingerprint. We use this, rather than the more complicated approaches used previously (25–27), as this calculation can be performed quickly on the fly, and also effectively hides differences in how annotators have marked up the reaction, e.g. an elimination followed by a proton transfer happening in two successive steps in one entry and in a concerted manner in another, and reaction sequence timings, e.g. two reactions occur in parallel in the biological system but are annotated as occurring in sequence in MACiE. In the following, when we refer to composite reaction similarity, it is this measure to which we are referring.
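The composite fingerprint amounts to summing the per-step count tables. A minimal sketch, assuming each step fingerprint is a mapping from a bond change key to its count (the function name and key format are illustrative):

```python
from collections import Counter

def composite_fingerprint(step_fingerprints):
    """Sum the bond change counts over all steps of a mechanism."""
    total = Counter()
    for fingerprint in step_fingerprints:
        total.update(fingerprint)  # adds counts key by key
    return total

# The same chemistry annotated as two successive steps or as one
# concerted step gives the same composite fingerprint.
two_steps = [{"C-O cleaved": 1}, {"O-H formed": 1}]
concerted = [{"C-O cleaved": 1, "O-H formed": 1}]
same = composite_fingerprint(two_steps) == composite_fingerprint(concerted)
```

Because the order and grouping of steps are discarded, `same` is `True` here, which is the property that hides differences in how annotators have split a reaction into steps.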
Defining the ‘overall’ entry similarity
Each fingerprint thus created can be compared using a Tanimoto similarity score for continuous variables (28), which may take a value between 0 and 1, where 0 indicates no bits in common and 1 indicates that the two fingerprints are identical. The final similarity between two entries can thus be calculated according to the following formula, in which the mechanism is considered the most important, followed by the catalytic machinery and finally the overall reaction chemistry occurring:
(0.65 * ‘composite reaction’) + (0.25 * ‘catalytic machinery’) + (0.10 * ‘overall bond change’)
The weights chosen are arbitrary and are designed to define the similarity based mostly on the composite reaction information, while also being informed by the catalytic machinery and overall reaction. However, each of the measures of similarity can also be investigated individually, and all four measures are displayed on the comparative overview pages.
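The comparison and weighting can be sketched as below. The Tanimoto function uses the commonly quoted form for continuous variables, sum(a·b) / (sum(a²) + sum(b²) − sum(a·b)); that this is exactly the variant of reference 28 is an assumption, as are the function names and the choice to return 0 when both fingerprints are empty.

```python
def tanimoto(a, b):
    """Continuous Tanimoto score between two count fingerprints.

    a, b: mappings from a fingerprint key to a non-negative count.
    """
    keys = set(a) | set(b)
    ab = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    aa = sum(v * v for v in a.values())
    bb = sum(v * v for v in b.values())
    denominator = aa + bb - ab
    # Two empty fingerprints are scored 0 here (an arbitrary choice).
    return ab / denominator if denominator else 0.0

def entry_similarity(composite, machinery, overall):
    """Weighted 'overall' entry similarity with the weights given above."""
    return 0.65 * composite + 0.25 * machinery + 0.10 * overall
```

For example, fingerprints `{"C-O": 2, "O-H": 1}` and `{"C-O": 1, "O-H": 1}` give 3 / (5 + 2 − 3) = 0.75.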
Exploring the data in MACiE
In order to examine the differences between such sets of entries, we have developed the dataset overview pages, which display a comparative analysis of the data available within MACiE for all the entries in the set. This includes an overview of the CATH domains annotated, the number of steps involved, the catalytic machinery and overall reactions as well as the composite reaction similarity and involvement of cofactors.
Each entry now includes a section detailing sequence homologues to the current MACiE entry using the homologues as determined by the CSA [the same as previously reported (8)] and also now using a non-iterative search [using SSEARCH (29)] for a stricter definition of homology (see Supplementary Data for more detail). Furthermore, this section includes details on other MACiE entries with the same EC number (identical to the fourth level) and CATH domains where entries have at least one catalytic CATH domain in common. We also offer the option to view all similar reactions using the overall reaction bond change similarity and the composite reaction similarity, which is available from the side bar menu. Where there are similar entries at the EC or CATH domain level the similarity at the composite reaction and catalytic machinery level is shown and there is the option to compare two reactions, or to view the dataset comparison (where there are three or more entries available).
All entries in MACiE now also include links to view similar reactions from the overview page (for the overall reaction and composite bond change perspectives) and step details page (for the reaction steps). In all cases, only reactions with a Tanimoto similarity score of greater than or equal to a specific cut-off are shown. In the case of the individual reaction fingerprint, this cut off is 0.75, in the case of the composite reaction fingerprint, this cut-off is 0.65. These cut-off values are somewhat arbitrary and have been chosen to show the most similar reactions only. The cut-off value is one of the parameters of the Perl-CGI display script, and so can be altered in the HTML address of the results page by the user.
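The cut-off filtering can be illustrated with a few lines of code. This sketch is not the Perl-CGI script itself; the function name and the dictionary of scores are hypothetical, and only the "greater than or equal to" comparison is taken from the text.

```python
def similar_reactions(scores, cutoff):
    """Return (entry, score) pairs with score >= cutoff, best first.

    scores: mapping from an entry identifier to its Tanimoto score
    against the query reaction (hypothetical input layout).
    """
    hits = [(entry, s) for entry, s in scores.items() if s >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Calling it with a cut-off of 0.65 corresponds to the composite reaction fingerprint case, and 0.75 to the individual reaction fingerprint case.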
In the following subsections, specific examples are used to highlight some of the new features available for the comparative overview of sets of entries.
The diversity within an EC number—the chloroperoxidases (EC 1.11.1.10)
Recently (30) we investigated the number of evolutionary families present in each EC number, and found that on average each EC number had emerged approximately twice independently. Thus, there is potentially a great deal of mechanistic variability within a single EC number. While some of this variability might be related to substrate specificity for those EC numbers that are somewhat generic (e.g. EC 188.8.131.52), there are also cases where the mechanism and catalytic machinery are obviously very different. One such example is the chloroperoxidases (EC 1.11.1.10), for which there are three MACiE entries (M0014, M0248 and M0250), representing three evolutionarily unrelated families.
For this set of entries, the dataset overview pages do not display the overall reaction analysis or the coverage of the EC classification, as these are identical for all three entries. They do display the mechanisms, some of which are shown in Figure 2.
In the MACiE entries for EC 1.11.1.10, the exact method of producing the hypohalous acid (the common reactive intermediate) from a halide and hydrogen peroxide is different in all three cases. Each enzyme utilizes different catalytic CATH domains and different catalytic machinery, both in terms of amino acid residues and cofactors. These differences are reflected in the composite bond change similarities, which fall in a relatively wide range (0.3–0.58), despite the overall reactions being identical.
Table 1 shows a selection of homologues to the entry M0248 (one of the chloroperoxidases in MACiE, UniProtKB accession O31168) within UniProtKB. The protein sequence used is taken from the PDB code used as the representative in MACiE (1a7u), and the sequence is fully annotated with the catalytic residues, their location of function and their activity. The results can highlight where changes in the annotated residues might be related to a change in EC number and hence protein function.
The final columns of the table represent the conservation of the catalytic residues, the top line is the residue number in the sequence of the representative PDB file, the second line denotes the location of function and activity (which utilizes the following symbols: % = main chain spectator, * = side chain reactant, & = side chain spectator) followed by the single letter abbreviation for the residue. Conservative mutations are shown in green text and non-conservative mutations shown in red text.
The diversity within a catalytic motif—entries in MACiE containing a catalytic triad
One of the new searches added to MACiE (among several other new searches described in Supplementary section S.2 of the Supplementary Data) allows the user to search for catalytic dyads and triads. These motifs are defined as two or three residues which are hydrogen bonded to one another, and are determined automatically using HBPlus. One potential application of this search might be to identify all the entries in MACiE that utilize a Ser–His–Asp triad, as described below.
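A triad search over hydrogen bond contacts can be sketched as a small graph query. The sketch assumes that a triad is a chain in which the middle residue is hydrogen bonded to both of the others, and that the contacts are available as pairs of residue identifiers (for instance parsed from HBPlus output). The function name, input layout and residue identifiers are hypothetical.

```python
def find_triads(hbonds, residue_types, pattern=("Ser", "His", "Asp")):
    """Find residue triples matching pattern, linked via the middle residue.

    hbonds: iterable of (residue_a, residue_b) hydrogen bonded pairs.
    residue_types: mapping from residue identifier to residue type.
    Returns a set of (first, middle, last) identifier tuples.
    """
    # Build an undirected neighbour table from the contact pairs.
    neighbours = {}
    for a, b in hbonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    triads = set()
    for middle, contacts in neighbours.items():
        if residue_types.get(middle) != pattern[1]:
            continue
        for first in contacts:
            if residue_types.get(first) != pattern[0]:
                continue
            for last in contacts:
                if last != first and residue_types.get(last) == pattern[2]:
                    triads.add((first, middle, last))
    return triads
```

A further filter, requiring at least one of the three residues to be annotated as catalytic, would then restrict the hits to the kind of entries discussed below.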
There are five entries in MACiE with a Ser–His–Asp triad where at least one of the residues is annotated as being catalytic. While the majority of these entries are in the hydrolase class of enzymes (EC 3), there are examples in the oxidoreductases (the cofactor-free chloroperoxidase, EC 1.11.1.10, M0248) and lyases (hydroxynitrilase, EC 22.214.171.124, M0217). Despite the fact that all these entries contain a Ser–His–Asp triad, these enzymes perform a distinct set of overall reactions (at the bond change only level) and have different catalytic machinery profiles, as can be seen from Figure 3. The difference in catalytic machinery may be partly related to the fact that although all these enzymes have an oxyanion hole (to stabilize the covalently attached oxyanion tetrahedral intermediate), this hole is usually made up of main chain amide groups (except in the case of M0094, where the side chain of Asn104 is one of the residues making up the oxyanion hole), and the identities of these residues differ widely (including Met, Phe, Leu, Glu, Gly and Tyr).
Except for the lyase example (M0217), the mechanisms are similar, and indeed contain at least four identical steps: formation of the enzyme–substrate covalently attached tetrahedral intermediate, initial elimination to re-form the carbonyl group, addition of water to the covalently attached intermediate, and cleavage of the product from the enzyme. The variation is often either in the steps that follow (as with the chloroperoxidase) or in the substrates involved. However, in the case of hydroxynitrilase, the catalytic triad is not acting in this manner, nor does the enzyme appear to have the standard oxyanion hole, as the substrate lacks the carbonyl group common to the reactants of the other reactions. Indeed, in this enzyme the serine is simply acting as a proton shuttle and not in covalent catalysis.
The diversity within an evolutionarily related family
Another question that we can now address is to investigate the diversity of entries relating to a family of enzymes. We have recently shown, using the phosphatidylinositol–phosphodiesterase and Ntn-type amide hydrolase families, (N. Furnham et al., submitted for publication) that there is often a good deal of variability within a family of enzymes (as represented by a single CATH domain) at the overall reaction level, as well as the structural level. This variability can be analysed in terms of the overall reaction, mechanism, composite reaction and catalytic machinery using the new overview pages. We are also starting a long-term collaboration with the SFLD, a database of ‘promiscuous’ enzyme superfamilies, so that all reactions in that database that fulfil the criteria for inclusion into MACiE are incorporated into our dataset. Version 3 of MACiE already incorporates a total of 26 entries from the SFLD, with all 10 structurally characterized families in the crotonase superfamily already included into MACiE.
UPDATES TO MACiE WEBSITE
Version 2.0 of MACiE (8) was based on static HTML pages. We have since moved to a model in which all the pages relating to the data content of MACiE (i.e. the lists of entries by one of the EC number, PDB code, CATH code or MACiE identifier) are generated, on the fly, by Perl CGI scripts and thus are updated automatically whenever the database is updated. Other minor changes to the online content of MACiE include the addition of mouse-over descriptions of the amino acid residue functions, mechanisms and mechanism components. These descriptions are linked to the MACiE dictionaries. We have added navigation buttons to the reaction steps, to allow users to cycle through the steps. Finally, we have added in GO terms for each entry, based on the primary PDB code and the associated UniProt accession code (31).
MACiE is an actively developing resource, and we are continuously extending its coverage. As part of this, and as mentioned before, we are working closely with the SFLD to extend coverage in MACiE of evolutionarily related superfamilies. We are beginning to work towards a new data entry system, which will be online and as automated as possible and will allow the enzyme community to add data to MACiE. We are also working on allowing users to search the intermediates in the database as well as the substrates and products, not only textually (as is currently the case), but also through substructure similarity. Furthermore, we are working on ways to handle alternative mechanisms and enzyme promiscuity more robustly. Finally, we will continue to use MACiE to attempt to understand enzymes and how they function.
Supplementary Data are available at NAR online: Supplementary Sections 1–3, Supplementary Figure 1 and Supplementary References [8,9,13,21,25,30].
Ministero Istruzione Università Ricerca (RBLA032ZM7, PRIN 2007M5MWM9 to C.A.); National Library of Medicine (LM04969 to W.R.P.); National Institutes of Health (R01 GM60595 to D.E.A.). Funding for open access charge: EMBL.
Conflict of interest statement. None declared.
G.L.H. would like to thank Prof. Janet Thornton and Dr John Mitchell for their useful discussions and continued support. We would also like to thank the Wellcome Trust, EMBL and IBM (G.L.H., J.D.F., S.A.R., S.T.W.). J.D.F. is part of the European Molecular Biology Laboratory studentship programme, and is also affiliated with Cambridge University Department of Biology.