Abstract

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.

DESCRIPTION OF CATH HIERARCHY AND CURRENT POPULATION OF DATABASE

The CATH database is a hierarchical classification of protein domain structures, using manual curation aided by a variety of classification and prediction algorithms; for example, structural comparison (1) and hidden-Markov model (HMM)-based methods (2). Each protein structure is checked to ensure it meets the selection criteria before it is split into its constituent chains. These chains are, in turn, split into one or more individual domains and then classified into homologous superfamilies according to structure and function.

At the top of the hierarchy is the Class, or C-level, where the domains are classified on the basis of their secondary structure content—i.e. whether they are mostly alpha-helical (Class 1), mostly beta-sheet (Class 2), contain a significant amount of both alpha-helical and beta-sheet secondary structure elements (Class 3) or have very little secondary structure (Class 4).

Within its class, each domain is then classified according to its Architecture (A-level), that is, similarities in the arrangement of secondary structures in 3D space. Each architecture is sub-divided into one or more topologies, or fold groups (T-level), where the connectivity between these secondary structures is taken into account. Finally, the domains are classified into their respective Homologous superfamilies (H-level), according to similarities in structure, sequence and/or function. Sequence clustering within the H-level produces sequence families at 35% sequence identity (S-level), 60% (O-level), 95% (L-level) and 100% (I-level).
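The nesting of the C, A, T and H levels is reflected in the dotted CATH codes used later in this article (e.g. 3.20.20.70). The following is a minimal sketch, not part of the CATH software, of how such a code can be split into its cumulative levels; the names `CathCode`, `CLASS_NAMES` and `parse_cath_code` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """Cumulative identifiers for the four levels of a CATH superfamily code."""
    class_id: str
    architecture: str
    topology: str
    superfamily: str


# Short labels for the four classes described in the text (labels are illustrative).
CLASS_NAMES = {
    "1": "mostly alpha",
    "2": "mostly beta",
    "3": "alpha and beta",
    "4": "few secondary structures",
}


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted four-part code into its C, A, T and H identifiers."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(
        class_id=parts[0],
        architecture=".".join(parts[:2]),
        topology=".".join(parts[:3]),
        superfamily=".".join(parts),
    )


aldolase = parse_cath_code("3.20.20.70")
print(aldolase.topology, "|", CLASS_NAMES[aldolase.class_id])
```

Two domains share a fold group when their `topology` fields match, and they are homologous relatives when their `superfamily` fields match.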

For our latest release, CATH v3.5, we have classified 80 new folds, 240 new superfamilies and over 44 000 new domains compared with the release reported in our last NAR update article (CATH v3.3). This is a nearly 50% increase in the size of the resource since CATH v3.3 (see Tables 1 and 2).

Table 1.

This shows the current population of different levels in the CATH hierarchy

Class    Architectures    Topology    Homologous superfamily    S35 family
1        5                386         875                       2917
2        20               229         520                       2618
3        14               594         1113                      6183
4        1                104         118                       208
Total    40               1313        2626                      11 926
Table 2.

This shows the increase in new folds and total number of domains in recent releases of CATH

CATH version                 v3.2           v3.3           v3.4           v3.5
Number of new folds (%)      26 (2.3)       123 (10)       49 (4.2)       31 (2.4)
Number of new domains (%)    20 330 (18)    14 473 (11)    24 232 (16)    20 616 (12)

COMPARISONS BETWEEN CATH AND SCOP

CATH and Structural Classification of Proteins (SCOP) (3) are the two most comprehensive protein structure classification resources. Both are in active development. The latest release of SCOP (v1.75) classifies 110 800 domains (38 221 PDB entries) compared with over 173 000 (51 334 PDB entries) for CATH. Currently, CATH has 1313 folds classified compared with 1195 for SCOP, but comparisons at this level are problematic, as more subjective criteria are used in fold classification.

Recent analysis has shown that, if one applies relatively conservative thresholds to identify equivalent superfamilies between the two resources (i.e. a 60% overlap between matching domains identified in the same PDB chain and 60% of these matching domains grouped into equivalent superfamilies), ∼800 superfamilies correspond between SCOP and CATH. A new initiative, Genome3D, is enabling collaboration between the SCOP and CATH groups to refine the identification of equivalent superfamilies and to present information on philosophical differences between the resources that lead to alternative ways of grouping relatives. There is much less agreement at the fold level, again because of the subjective manner in which fold is defined.

FOCUS ON STRUCTURAL GENOMICS STRUCTURES TO DISCOVER NOVEL FOLDS

From CATH v3.3 onwards (4), we concentrated our efforts on classifying more novel SG structures. Recent figures show that, although there was an initial jump in the number of new folds classified in CATH v3.3 (128), this number has decreased steadily since and, at 31 new folds in our latest release, we are now discovering roughly the same number of new folds as we did before CATH v3.3 (see Table 2).

Similarly, Andreeva and Murzin (5) have recently reported that the increase in the number of new folds has been lower than expected; although a few new architectures and folds have been discovered, a significant portion of SG structures have been found to have structures similar to already known folds. Past analysis of CATH domain annotations in Gene3D has shown that a significant proportion of domain sequences (up to 70–80%) in completely sequenced genomes can be assigned to a structural family in CATH if highly sensitive methods are used [e.g. HMM-HMM (6)]. This proportion has not changed significantly despite the expansion of CATH over the past 2 years. This suggests that the remaining domain sequences in organisms are largely membrane associated or contain a high proportion of disordered regions. Transmembrane proteins are still under-represented in the PDB.

IMPROVED FUNCTIONAL FAMILY CLASSIFICATION

As we have expanded the CATH superfamilies, we have also been able to use increasing functional information from public resources [e.g. Gene Ontology (GO) (7), Enzyme Commission (EC)(8)] to develop our knowledge of functional divergence within them. We have aimed to use this expanding knowledge to provide functional sub-classification of relatives, which can help biologists understand the structural mechanisms by which functions evolve.

SCOP sub-classifies superfamilies into functionally coherent families through manual analysis using information available in the literature and functional annotation databases [e.g. SwissProt (9), GO, EC, etc.]. However, recent analyses by Gough et al. suggest that these groupings correspond more closely to taxonomic groups than to functional groups (10).

A functional family (FunFam) layer within all CATH superfamilies was first introduced in CATH-Gene3D v10 (11). Predicted domain sequences for CATH superfamilies from Gene3D are now explicitly included in CATH. Domain sequences identified in UniProt (12) and Ensembl (13) currently expand CATH from 173 536 domain structure entries to 16 297 076 known and predicted domain structure entries. CATH sequence data within each superfamily are sub-classified into FunFams to group together relatives likely to have similar structures and functions.

The original protocol to establish these functional families used a profile-based sequence clustering algorithm together with a fixed generic granularity threshold (14). This corresponds to vertically ‘cutting’ the domain sequence similarity tree of a superfamily at a fixed level to derive a set of FunFams, an ‘unsupervised’ protocol.
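The idea of cutting a similarity tree at one fixed level can be sketched as follows. This is a toy illustration under stated assumptions, not the published clustering algorithm (14): the tree is assumed to be given as a list of pairwise merges, each with the E-value at which it occurred, and the function name `cut_tree`, the domain labels and all E-values are made up for the example.

```python
def cut_tree(leaves, merges, threshold):
    """Return the clusters obtained by keeping only merges with e_value <= threshold.

    merges is an iterable of (leaf_a, leaf_b, e_value) tuples, where leaf_a and
    leaf_b are members of the two clusters that were joined at that e_value.
    """
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, e_value in merges:
        if e_value <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for leaf in leaves:
        clusters.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical domains and merge E-values, for illustration only.
leaves = ["d1", "d2", "d3", "d4"]
merges = [("d1", "d2", 1e-80), ("d3", "d4", 1e-60), ("d1", "d3", 1e-5)]
print(cut_tree(leaves, merges, threshold=1e-40))
```

A stricter (smaller) threshold yields more, finer families; a looser one merges them into fewer, coarser families. The same threshold is applied to every superfamily, which is what makes the protocol unsupervised.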

A modified version of the FunFam protocol that exploits available GO annotation data to determine the right ‘cut’ of the sequence tree, instead of using a fixed threshold, was used to generate domain families for protein function prediction in Critical Assessment of Protein Function Annotation (CAFA) 2010 (BMC Bioinformatics submitted). This has been extended by a mechanism to detect and account for instances of functional ‘chaining’ in the clustering dendrogram, that is, cases of incongruence between domain sequence similarity and overall protein function similarity. As a whole, this is dubbed the ‘supervised’ protocol.

When dealing with families of (domain) sequences, it quickly becomes apparent that different use cases often suggest and require entirely different levels of family granularity. For example, in the large superfamily that represents the PDZ domain (CATH 2.30.42.10), a promiscuous peptide-binding module, two entirely different sets of families can be identified depending on the ‘point of view’. On the one hand, all domain sequences could be put into a single family, given that the domain always fulfils the same partial function within a diverse set of parent proteins and their different overall functions. On the other hand, the PDZ domains appearing in parent proteins of the same type (e.g. an orthologous group of proteins) will commonly be more similar to each other than to all other domains in the superfamily. These observations lead to two possible sets of families for the same superfamily, one ‘coarse’ and one ‘fine’.

Coarse FunFams in the above-described sense primarily lend themselves to broad evolutionary studies, for example, to track instances of domain shuffling (15). They are also the most intuitive kind of families in the context of a domain-based resource such as CATH-Gene3D, as they clearly focus on domain function, not whole-protein function. At the same time, applications like the detailed study of conserved residues (e.g. in active and binding sites) may require the use of finer FunFams. Ultimately, the choice is highly user dependent, and this realization is what governed our strategy.

As a pragmatic attempt to account for the above-described dichotomy, the current Gene3D FunFam protocol uses a hybrid approach: FunFams are first identified in a given superfamily using the latest supervised protocol, including the detection of chaining. As the latter feature is still somewhat experimental, and as finer families may sometimes be required regardless of whether domain function is conserved (see above), a second set of families (‘FineFams’) is then identified, using the original unsupervised protocol. For this, a generic threshold setting of 1e10 E-value (16) was determined in benchmarking EC4 conservation on over 400 enzyme-domain containing superfamilies in Gene3D (data not shown), underlining the focus on whole-protein function at this level. Whenever no high-quality GO annotation data are available for a superfamily, only the FineFam layer is generated.

PROVIDING MANAGEABLE MULTIPLE SEQUENCE ALIGNMENTS

Some FunFams are highly populous, with many sequences and structures. We are able to generate multiple sequence alignments (MSAs) that contain all of the domain sequences and structures classified in the superfamily. These large MSAs, however, are difficult to visualize or to use for post-processing, such as phylogenetic analysis. Therefore, we have developed a protocol for providing MSAs that represent sequence, structural, taxonomic and functional diversity but are kept to a manageable size. The FunFam MSAs generated are first filtered to remove all fragment sequences. An iterative process is then applied that removes sequences sharing the same taxonomic group, multi-domain architecture, sequence similarity (defined by commonality of UniProtKB identifier) and functional annotation as a parsimoniously chosen representative sequence. The representative sequence is selected as the first unique occurrence of a given combination of taxonomic group, multi-domain architecture, sequence similarity and functional annotation. This filtering continues, changing the level of taxonomic definition at each step, until fewer than 500 sequences remain in the MSA. It is important to note that sequences for which a structure is classified in CATH are preferentially retained over sequences for which no structural data are available.
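The filtering procedure described above can be sketched roughly as follows. This is one possible reading of the protocol, not the production code: the `Seq` record, its field names and the function `thin_alignment` are hypothetical, the taxonomy is assumed to be available as a lineage tuple ordered from broad to specific, and placing structure-backed sequences first is an assumed way of making them the preferred representatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seq:
    seq_id: str
    uniprot_id: str
    lineage: tuple          # taxonomy from broad to specific
    mda: str                # multi-domain architecture label
    function: str           # functional annotation label
    is_fragment: bool = False
    has_structure: bool = False


def thin_alignment(seqs, max_size=500):
    """Reduce a FunFam to fewer than max_size representative sequences."""
    # Step 1: remove all fragment sequences.
    kept = [s for s in seqs if not s.is_fragment]
    # Stable sort so that sequences with a classified structure come first
    # and are therefore chosen as representatives (an assumption of this sketch).
    kept.sort(key=lambda s: not s.has_structure)

    # Step 2: collapse redundant sequences, coarsening the taxonomic level
    # on each pass until the alignment is small enough.
    depth = max((len(s.lineage) for s in kept), default=0)
    while len(kept) >= max_size and depth >= 0:
        seen = set()
        representatives = []
        for s in kept:
            key = (s.lineage[:depth], s.mda, s.uniprot_id, s.function)
            if key not in seen:      # first unique occurrence becomes the representative
                seen.add(key)
                representatives.append(s)
        kept = representatives
        depth -= 1
    return kept


# Tiny made-up example with a threshold of 3 instead of 500.
demo = [
    Seq("a", "P1", ("Bacteria", "E. coli"), "mda1", "f1"),
    Seq("b", "P1", ("Bacteria", "E. coli"), "mda1", "f1", has_structure=True),
    Seq("c", "P1", ("Bacteria", "Salmonella"), "mda1", "f1"),
    Seq("d", "P2", ("Eukaryota", "Human"), "mda2", "f2"),
    Seq("e", "P3", ("Eukaryota", "Human"), "mda2", "f2", is_fragment=True),
]
print([s.seq_id for s in thin_alignment(demo, max_size=3)])
```

If the sequences still differ in architecture, identifier or function once the taxonomy has been fully coarsened, this sketch simply returns what remains; how the real protocol handles that case is not stated in the text.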

FUNFAM ANNOTATION SERVICE

To support user enquiries, the CATH sequence domain search facility has been extended to provide FunFam annotations for each recognized domain. The service currently accepts single sequences in FASTA format. Domains are identified using the in-house HMMER 3/DomainFinder protocol described in (17,18) and are then submitted to a new service to determine their FunFam membership (if any). As an extra HMM search is required for each recognized domain, the service is slightly slower for larger proteins. However, provided there is not a long job queue, it should complete within a few seconds.

The service on the CATH front page provides a simple table of results with the domain location, superfamily code and functional family name, the last two of which are also links to the respective CATH pages. The underlying web service is hosted at http://gene3d.biochem.ucl.ac.uk/Gene3DScanSvc/FunFamScan. Search modes are provided for searching a single domain against a FunFam library, and for searching the complete protein. Currently, the complete protein service provides a ‘text/csv’ (plain text CSV file) response, whereas the domain searches can provide Javascript Object Notation (JSON) and Extensible Markup Language (XML) as well. The services are implemented using a simple RESTful interface and as such can be easily accessed programmatically or from the *Nix command line using tools like wget or curl.
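As an illustration of consuming the 'text/csv' response programmatically, the sketch below parses a CSV table into Python dictionaries. The request parameters and the actual column layout of the service are not specified here, so the column names (`domain_location`, `superfamily`, `funfam`), the `start-end` location format and the example row are all assumptions made for this sketch; no network request is made.

```python
import csv
import io


def parse_funfam_csv(text):
    """Parse a CSV table of domain hits into a list of dictionaries.

    Assumes (hypothetically) a header row with the columns
    domain_location, superfamily and funfam, and locations written as start-end.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(text)):
        start, end = row["domain_location"].split("-")
        hits.append(
            {
                "start": int(start),
                "end": int(end),
                "superfamily": row["superfamily"],
                "funfam": row["funfam"],
            }
        )
    return hits


# Made-up response body in the assumed layout.
example_response = (
    "domain_location,superfamily,funfam\n"
    "12-330,3.20.20.70,Delta-am\n"
)
for hit in parse_funfam_csv(example_response):
    print(hit["superfamily"], hit["funfam"], hit["start"], hit["end"])
```

In practice the response text would be obtained from the web service with a tool such as wget or curl, as noted above, and the parsing would need to be adapted to the columns actually returned.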

INTRODUCING NEW HOMOLOGOUS SUPERFAMILY PAGES

The CATH website has undergone a significant redesign since the last release. We have continued to concentrate on the development of a single web-based portal for CATH and Gene3D, and on the development of improved web pages displaying the functional data and conserved sequence features associated with FunFams in each superfamily.

The new superfamily pages give a wealth of structural and functional information about the family in an easily accessible format. Some pages are presented below for a highly populated and functionally diverse superfamily in CATH, the aldolase class I superfamily (CATH code 3.20.20.70), members of which adopt a TIM barrel alpha/beta structure.

Figure 1 shows a snapshot of the redesigned superfamily page for this superfamily. Statistics for the superfamily are given on the right-hand side of the page, and section (a) confirms that it is highly populated, with 119 sequence families (clustered at 35% sequence identity) and a total of 2559 domains. An indication of structural diversity in the aldolase superfamily is shown in section (b); the user is given the option of scrolling through the smallest, largest and representative structures (according to the number of residues) within the aldolase superfamily. There is considerable functional diversity across the superfamily, with a total of 445 unique GO terms and 233 unique EC terms [see sections (c) and (d) on the superfamily homepage]. Species diversity is shown in section (e), revealing that this superfamily is found in all kingdoms of life.

Figure 1.

Screenshot of the CATH superfamily page for the aldolase superfamily. Sections displaying different types of data are labelled (a)–(i).

Sequence diversity across a superfamily correlates with structural diversity of relatives and also functional diversity (see Figure 2). If we examine CATH enzyme superfamilies, Figure 2 shows that the majority of superfamilies in CATH (∼90%) have <10 sequence subfamilies (at 30% sequence identity) and 10 enzyme functions, whereas some of the remaining superfamilies (<5%, corresponding to <100 superfamilies) can diverge significantly in sequence, structure and function. The new superfamily pages have been designed to improve the presentation of information on this diversity, particularly by capturing more informative data for functional families within each superfamily.

Figure 2.

Plot showing, for each enzyme superfamily in CATH, the number of unique EC terms, FunFams and structural clusters (SCs).

Clicking on the EC link on the superfamily home page [see section (d), Figure 1] gives a listing of enzyme functions exhibited by different relatives in this superfamily. About 35% of enzymes within the aldolase class I superfamily are lyases. Other enzyme types include transferases (∼26%), isomerases (∼19%) and oxidoreductases (∼19%). Fructose-bisphosphate aldolase (EC number 4.1.2.13) is the most prevalent enzyme, accounting for 5.2% of all enzymes in this superfamily. Fructose-bisphosphate aldolase catalyses the reversible reaction that splits fructose 1,6-bisphosphate into dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. This reaction occurs in glycolysis, gluconeogenesis and the Calvin cycle (19). Another prevalent enzyme is orotidine 5′-phosphate decarboxylase (EC number 4.1.1.23), which catalyses the last step in the de novo biosynthesis of pyrimidines (20).

Section (h) on the superfamily home page (see Figure 1) lists all the FunFams that are identified for this superfamily using the FunFam classification method described above. By mousing over each node, it is possible to see a functional description of the FunFam. As the names describing each functional family can be long, we provide abbreviated FunFam names, which are simply the first eight characters of the GO annotation associated with the majority of relatives in the FunFam.
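The abbreviation rule described above (the first eight characters of the GO annotation shared by the majority of relatives) is simple enough to sketch directly. The function name `funfam_short_name` and the annotation strings in the example are illustrative assumptions, not taken from the CATH code base or from GO.

```python
from collections import Counter


def funfam_short_name(go_annotations, length=8):
    """Return the first `length` characters of the most common annotation."""
    if not go_annotations:
        raise ValueError("a FunFam needs at least one annotation to be named")
    majority_annotation, _count = Counter(go_annotations).most_common(1)[0]
    return majority_annotation[:length]


# Illustrative annotation strings, one per relative in a hypothetical FunFam.
annotations = [
    "Delta-aminolevulinic acid dehydratase",
    "Delta-aminolevulinic acid dehydratase",
    "Delta-aminolevulinic acid dehydratase",
    "Porphobilinogen synthase",
]
print(funfam_short_name(annotations))
```

Because the name is a plain prefix, two FunFams whose majority annotations start with the same eight characters would receive the same abbreviation; the text does not say how such collisions are resolved.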

FunFams are also grouped into structural clusters if their structures can be superimposed within a root mean square deviation (RMSD) threshold of 9 Å. Again, section (h) shows the nodes associated with structural clusters in the superfamily. Mousing over these nodes shows a summary of information. A multiple structural alignment is also viewable, together with a 2DSEC plot (21) showing common secondary structure features and a JMol superposition of the representatives from each FunFam (see Figure 3).
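For readers unfamiliar with the measure, the sketch below computes the RMSD between two sets of equivalent atom positions and compares it with the 9 Å threshold quoted above. It assumes the two structures have already been superposed and that residue equivalences are known, which in practice is the job of a structural alignment program; the function names and the coordinates are hypothetical, and treating the threshold as inclusive is also an assumption.

```python
import math

RMSD_THRESHOLD = 9.0  # in angstroms, the value quoted in the text


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def same_structural_cluster(coords_a, coords_b, threshold=RMSD_THRESHOLD):
    """True if the superposed structures deviate by no more than the threshold."""
    return rmsd(coords_a, coords_b) <= threshold


# Two made-up, pre-superposed fragments that differ by a 3 angstrom shift along z.
fragment_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fragment_b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(fragment_a, fragment_b), same_structural_cluster(fragment_a, fragment_b))
```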

Figure 3.

2DSEC plot and structural superpositions of the structural representatives of a structural cluster in the aldolase superfamily. Structural features common to all the domains in the SC are shown in light blue on the superposition.

Other sections on the superfamily home page provide links to resources established at the European Bioinformatics Institute (EBI). Section (f), Figure 1 links to ArchSchema, which displays information on the different multi-domain contexts in which this domain superfamily is observed. Section (g), Figure 1 provides a link to the FunTree (22) resource displaying phylogenetic information for enzyme superfamilies in CATH. FunTree is the product of a collaboration between the Thornton and Orengo groups and was developed and is managed by Dr Nicholas Furnham at the EBI. It links phylogenetic trees, displaying the evolution of relatives within structural clusters in a CATH superfamily, with comprehensive information on function and chemistry extracted from several resources in the Thornton Group, e.g. CSA (23) and MACiE (24).

There are 14 different structural clusters for the aldolase superfamily. Mousing over the structural cluster nodes for this superfamily shows the structural variations observed between different clusters. These are mainly different helical decorations to the eight-stranded beta barrel that forms the core of the structures in this superfamily.

USING THE CATH WEB PAGES TO EXPLORE DIFFERENCES BETWEEN FUNFAMS IN A SUPERFAMILY

To illustrate the information available through the new FunFam pages, we can compare two different FunFams within the aldolase superfamily.

FunFam ‘Delta-am’ comprises relatives that function as 5-aminolaevulinic acid dehydratases (ALAD) (EC 4.2.1.24) involved in the biosynthesis of tetrapyrroles (25). These enzymes catalyse the condensation of two 5-aminolaevulinic acid molecules to form pyrrole porphobilinogen. By contrast, FunFam ‘Tryptoph’ contains dihydroorotate dehydrogenases (DHODH) (EC 1.3.5.2) involved in the biosynthesis of pyrimidines (26).

By viewing the representatives from these FunFams, it can be seen that there is a large common core shared by structures from the two FunFams. There are some embellishments to this core in both FunFams, and these lie close to the active site (see Figure 4).

Figure 4.

Figure showing the protein domains 1h7oA00 (ALAD, left) and 1d3gA00 (DHODH, right) with common structural features coloured blue, embellishments green, substrates black and catalytic residues orange.

The functional family–specific pages can be used to view differences between FunFams and known functional residues. By clicking on the FunFam node, the user is taken to a FunFam summary page displaying some options similar to those on the superfamily homepage [i.e. summary statistics on the number of relatives, listing of functions (GO, EC terms), species distribution]. In addition, there is a link to view an MSA for the FunFam. Alignments are generated by the FunFam classification algorithm described above. These pages also display a JMol image of a representative structure from the FunFam. There is an option to colour highly conserved residues and known catalytic site residues in the multiple alignment and in the JMol image (Figure 5). The user can choose to see the 3D structure of the domain in relation to the whole protein, in relation to its chain, or on its own.

Figure 5.

Screenshot of the FunFam page associated with ALAD.

By comparing the FunFam alignment pages for the two FunFams (‘Delta-am’ and ‘Tryptoph’) side by side on the screen, it is possible to determine if there are differences in the nature or location of catalytic residues in the active site.

To obtain a direct structural comparison between the FunFams, it is possible to submit representative structures from each FunFam to the Sequential Structure Alignment Program (SSAP) (27) server accessible through the CATH home page.

The SSAP server can be accessed at http://www.cathdb.info/cgi-bin/SsapServer.pl.

In this example, a structural alignment of the representative domains 1h7oA00 (from FunFam ‘Delta-am’) and 1d3gA00 (from FunFam ‘Tryptoph’) reveals that, although the two proteins are globally structurally similar (pairwise SSAP score of 70 out of 100), their catalytic residues are in different locations within the active site pocket and have different properties (see Figure 6).

Figure 6.

Structural alignment of the two protein domains 1h7oA00 and 1d3gA00. The catalytic residues are highlighted according to their properties (aromatic residues are in red, polar residues in green and those with a positive charge are in purple).

DETERMINING WHETHER A SUPERFAMILY IS STRUCTURALLY VARIABLE

Finally, section (i) of the superfamily home page (Figure 1) can be used to see whether a particular superfamily is highly structurally diverse. Section (i) shows a plot of sequence diversity (i.e. number of sequence clusters at 30% sequence identity) versus structural diversity (number of structural clusters) to give an indication of how diverse this superfamily is compared with others in CATH. Using the aldolase superfamily example again, the red dot highlights the position of the aldolase superfamily in the plot and mouse-overs of other points on the plot will reveal the CATH codes of other superfamilies. The aldolase superfamily is more diverse than most in CATH, but is not as diverse as some of the most highly populated superfamilies adopting Rossmann and immunoglobulin folds.

SUMMARY

In summary, CATH has increased in size by nearly 50% since publication of our last NAR update article. It now includes 1313 folds and 2626 superfamilies. Our FunFam protocol has been improved and the CATH sequence domain search facility extended to return FunFam annotations. The CATH website has been redesigned and now displays additional functional data and conserved sequence features data associated with the FunFams in each superfamily. The functionally diverse aldolase superfamily (3.20.20.70) has been used to demonstrate the functionality of the new CATH superfamily pages. A new tutorial taking the user through the new web pages can be accessed at http://www.cathdb.info/wiki.

FUNDING

BBSRC (to A.L.C., T.E.L.); MRC (to N.L.D.); Wellcome Trust (to I.S., N.F.); IMI (to J.G.L.); SNSF (to R.A.S.); NIH (to D.L., C.Y.). Funding for open access charge: Wellcome Trust.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

This project has been funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN272201200026C. RAS acknowledges funding from the Fondation du 450ème anniversaire de l’Université de Lausanne and Swiss National Science Foundation [132476 and 136477].

REFERENCES

1. Redfern OC, Harrison A, Dallman T, Pearl FM, Orengo CA. CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures. PLoS Comput. Biol., 2007, vol. 3 (pg. e232)
2. Sillitoe I, Dibley M, Bray J, Addou S, Orengo C. Assessing strategies for improved superfamily recognition. Protein Sci., 2005, vol. 14 (pg. 1800-1810)
3. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res., 2008, vol. 36 (pg. D419-D425)
4. Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res., 2011, vol. 39 (pg. D420-D426)
5. Andreeva A, Murzin AG. Structural classification of proteins and structural genomics: new insights into protein folding and evolution. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun., 2010, vol. 66 (pg. 1190-1197)
6. Yeats C, Lees J, Reid A, Kellam P, Martin N, Liu X, Orengo C. Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Res., 2008, vol. 36 (pg. D414-D418)
7. Gene Ontology Consortium. The gene ontology: enhancements for 2011. Nucleic Acids Res., 2012, vol. 40 (pg. D559-D564)
8. Bairoch A. The ENZYME database in 2000. Nucleic Acids Res., 2000, vol. 28 (pg. 304-305)
9. Apweiler R. Functional information in SWISS-PROT: the basis for large-scale characterisation of protein sequences. Brief Bioinform., 2001, vol. 2 (pg. 9-18)
10. Pethica RB, Levitt M, Gough J. Evolutionarily consistent families in SCOP: sequence, structure and function. BMC Struct. Biol., 2012, vol. 12 (pg. 27)
11. Lees J, Yeats C, Perkins J, Sillitoe I, Rentzsch R, Dessailly BH, Orengo C. Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. Nucleic Acids Res., 2012, vol. 40 (pg. D465-D471)
12. UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res., 2012, vol. 40 (pg. D71-D75)
13. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2012. Nucleic Acids Res., 2012, vol. 40 (pg. D84-D90)
14. Lee DA, Rentzsch R, Orengo C. GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains. Nucleic Acids Res., 2010, vol. 38 (pg. 720-737)
15. Kawashima T, Kawashima S, Tanaka C, Murai M, Yoneda M, Putnam NH, Rokhsar DS, Kanehisa M, Satoh N, Wada H. Domain shuffling and the evolution of vertebrates. Genome Res., 2009, vol. 19 (pg. 1393-1403)
16. Miller T, Krogan NJ, Dover J, Erdjument-Bromage H, Tempst P, Johnston M, Greenblatt JF, Shilatifard A. COMPASS: a complex of proteins associated with a trithorax-related SET domain protein. Proc. Natl Acad. Sci. USA, 2001, vol. 98 (pg. 12902-12907)
17. Yeats C, Redfern OC, Orengo C. A fast and automated solution for accurately resolving protein domain architectures. Bioinformatics, 2010, vol. 26 (pg. 745-751)
18. Lees J, Yeats C, Redfern O, Clegg A, Orengo C. Gene3D: merging structure and function for a thousand genomes. Nucleic Acids Res., 2010, vol. 38 (pg. D296-D300)
19. Berry A, Marshall KE. Identification of zinc-binding ligands in the class II fructose-1,6-bisphosphate aldolase of Escherichia coli. FEBS Lett., 1993, vol. 318 (pg. 11-16)
20. Harris P, Navarro Poulsen JC, Jensen KF, Larsen S. Structural basis for the catalytic mechanism of a proficient enzyme: orotidine 5′-monophosphate decarboxylase. Biochemistry, 2000, vol. 39 (pg. 4217-4224)
21. Reeves GA, Dallman TJ, Redfern OC, Akpor A, Orengo CA. Structural diversity of domain superfamilies in the CATH database. J. Mol. Biol., 2006, vol. 360 (pg. 725-741)
22. Furnham N, Sillitoe I, Holliday GL, Cuff AL, Rahman SA, Laskowski RA, Orengo CA, Thornton JM. FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies. Nucleic Acids Res., 2012, vol. 40 (pg. D776-D782)
23. Porter CT, Bartlett GJ, Thornton JM. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res., 2004, vol. 32 (pg. D129-D133)
24. Holliday GL, Andreini C, Fischer JD, Rahman SA, Almonacid DE, Williams ST, Pearson WR. MACiE: exploring the diversity of biochemical reactions. Nucleic Acids Res., 2012, vol. 40 (pg. D783-D789)
25. Battistuzzi G, Petrucci R, Silvagni L, Urbani FR, Caiola S. delta-Aminolevulinate dehydrase: a new genetic polymorphism in man. Ann. Hum. Genet., 1981, vol. 45 (pg. 223-229)
26. Rowland P, Björnberg O, Nielsen FS, Jensen KF, Larsen S. The crystal structure of Lactococcus lactis dihydroorotate dehydrogenase A complexed with the enzyme reaction product throws light on its enzymatic function. Protein Sci., 1998, vol. 7 (pg. 1269-1279)
27. Orengo CA, Taylor WR. SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol., 1996, vol. 266 (pg. 617-635)

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
