Abstract

Summary: Progress in structural biology depends on several key technologies. In particular, tools for the alignment and superposition of protein structures are indispensable. Here we describe the use of the TopMatch web service, an effective computational tool for protein structure alignment, for the visualization of structural similarities, and for highlighting relationships found in protein classifications. We provide several instructive examples.

Availability: TopMatch is available as a public web service at http://services.came.sbg.ac.at

Contact: sippl@came.sbg.ac.at

Today we face an explosion of newly determined protein structures, fueled in part by the various protein structure initiatives. As a result, the public repository (PDB) will soon surpass 50 000 entries (Berman et al., 2000). This database represents our knowledge of protein molecules, but the amount of information is overwhelming. To make progress, the structures need to be organized, classified and quantified in various ways. For this task, and for the subsequent retrieval, analysis and visualization of the often intricate relationships, structure comparison techniques are indispensable.

Michael Levitt and coworkers (Kolodny et al., 2005) recently presented a comprehensive analysis of the major structure alignment programs. They remark that comparing the various programs is a delicate task and, by highlighting the limitations of existing methods, conclude that there is a need for better structural alignment methods. It is indeed surprising that, after half a century of protein structure research, no generally accepted standards for protein structure alignment have emerged.

A particular difficulty is that, as long as an existing structural similarity remains undetected, we cannot check whether any particular method is able to recognize that relationship. According to Kolodny et al. (2005), such difficult examples may be found in existing protein structure classifications by searching for similarities among distinct SCOP (Andreeva et al., 2007) folds or distinct CATH (Greene et al., 2007) architectures or topologies. Here we take up this suggestion and provide a small selection of examples drawn from ongoing classification projects. In these projects we make extensive use of a suite of structure alignment techniques called TopMatch. TopMatch is the successor of ProSup, a program previously used in several large-scale structure comparison projects (e.g. Sippl et al., 2001).

We have now completed a web service that makes the TopMatch program accessible to the structural biology community. The quality of alignments is essential, but ease of use, speed and, in particular, proper visualization are also important ingredients in the interpretation and analysis of structure alignments. The chief goal of this communication is to demonstrate the use of this service with a set of instructive examples drawn from ongoing structure classification initiatives (Suhrer et al., 2007a, b).

In the description of alignments we call the first structure the query (q) and the second structure the target (t). In general, a query and a target can be aligned in many different ways (Feng and Sippl, 1996). Hence, TopMatch reports a ranked list of alignments. Each alignment is characterized by a small set of parameters. The most significant of these is the length of the alignment, i.e. the number of residue pairs that are structurally equivalent. We call this the absolute similarity S(q, t). From the alignment we compute a sequence score using a structure-derived substitution matrix (Prlic et al., 2000). If this score is positive, it is added to S(q, t), and the combined score is used to rank the alignments. Additional useful parameters are the root-mean-square error of superposition (RMS), the percentage of sequence identity (Identity), the relative similarity s(q, t) = 100 × 2 S(q, t)/(Lq + Lt), and the relative query and target cover, defined as cq = 100 × S(q, t)/Lq and ct = 100 × S(q, t)/Lt, respectively (here Lq and Lt are the respective sequence lengths). Relative similarity and relative cover are simple and intuitive measures describing the extent of mutual similarity between two structures.
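
The parameters defined above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in the text, not the TopMatch implementation: the `Alignment` container, the function names and the numbers in the usage example are hypothetical, and the sequence score is assumed to have been computed elsewhere from the substitution matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical container for one query/target alignment (not a TopMatch type)."""
    S: int            # absolute similarity: number of structurally equivalent residue pairs
    seq_score: float  # sequence score, assumed to be precomputed elsewhere


def relative_similarity(S: int, Lq: int, Lt: int) -> float:
    """Relative similarity s(q, t) = 100 * 2 S / (Lq + Lt)."""
    return 100.0 * 2.0 * S / (Lq + Lt)


def query_cover(S: int, Lq: int) -> float:
    """Relative query cover cq = 100 * S / Lq."""
    return 100.0 * S / Lq


def target_cover(S: int, Lt: int) -> float:
    """Relative target cover ct = 100 * S / Lt."""
    return 100.0 * S / Lt


def ranking_score(alignment: Alignment) -> float:
    """Combined score: the sequence score is added to S only when it is positive."""
    return alignment.S + max(alignment.seq_score, 0.0)


def rank_alignments(alignments):
    """Return the alignments sorted by combined score, best first."""
    return sorted(alignments, key=ranking_score, reverse=True)


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the formulas.
    S, Lq, Lt = 50, 100, 200
    print(relative_similarity(S, Lq, Lt), query_cover(S, Lq), target_cover(S, Lt))

    candidates = [Alignment(S=80, seq_score=-5.0), Alignment(S=78, seq_score=4.0)]
    for a in rank_alignments(candidates):
        print(a, ranking_score(a))
```

In the made-up ranking example the shorter alignment comes first, because its positive sequence score raises its combined score above the length of the longer alignment, whose negative sequence score is ignored.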

Figure 1 illustrates the application of TopMatch using a small set of examples. We first demonstrate that, for the investigation of structural similarities, it is often necessary, and also convenient, to take into account the manifold of distinct alignments. We then present several examples that may be considered difficult in the sense of Kolodny et al. (2005): the respective structures reside in distinct SCOP folds or CATH topologies although they share extensive structural similarity.

Fig. 1.

Structure alignments of SCOP and CATH domains. The figure shows five structural alignments (a–e) that are difficult in the sense that in SCOP or CATH they are assigned to distinct folds or topologies. A counterexample is (f), where the two domains are in the same SCOP superfamily although the similarity is comparatively low. The query is always in blue, the target in green, and the regions of similar structure are colored red (query) and orange (target). Table 1 shows the parameters for the respective alignments. (a) and (b) show two distinct solutions for the structural alignment of 1eud-A and 1ccw-A. The first alignment (a) relates 1ccw-A to the C-terminal part of 1eud-A, the second (b) relates 1ccw-A to the N-terminal part of 1eud-A. This implies considerable structural similarity within 1eud-A. This is indeed the case, as shown in (c): in SCOP, 1eud-A is represented by two domains, d1euda1 and d1euda2, corresponding to these regions, which are classified as distinct folds (classification codes c.2.1.8 and c.23.4.1). (d) The SCOP domains d1gt8a4 and d1mo9a1 are classified as two distinct folds (c.4.1.1 and c.3.1.5, respectively). This has to be contrasted with (f), where two domains of considerably less similarity are classified within the same superfamily. (e) Superposition of CATH domains 1te2B02 and 1zolA02. The two domains belong to the two distinct topologies 1.10.150 (1te2B02) and 1.10.164 (1zolA02). (f) Superposition of SCOP domains d1lt3a_ and d1efya2. The two domains reside in the same SCOP superfamily, called ADP-ribosylation (d.166.1), but in the two distinct SCOP families called ADP-ribosylating toxins (d.166.1.1) and Poly-ADP-ribose polymerase, C-terminal domain (d.166.1.2), respectively.

Table 1.

Parameters for alignments shown in Figure 1

Figure  Query    Target   S    s   cq  ct  RMS  Identity
(a)     1eud-A   1ccw-A   110  50  36  80  2.9  11
(b)     1eud-A   1ccw-A   82   37  27  60  3.0  12
(c)     d1euda1  d1euda2  84   55  65  48  3.0
(d)     d1gt8a4  d1mo9a1  143  63  73  55  2.4  19
(e)     1te2B02  1zolA02  60   80  83  77  2.0  12
(f)     d1lt3a_  d1efya2  80   36  35  37  3.0  10
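
As a consistency check on Table 1, the definitions of s, cq and ct given in the text imply that the relative similarity is the harmonic mean of the two covers. The short derivation below follows from those definitions alone and is not taken from any TopMatch documentation. Because the tabulated values are rounded, agreement can be expected only up to rounding.

```latex
\begin{align*}
c_q &= \frac{100\,S(q,t)}{L_q}, \qquad c_t = \frac{100\,S(q,t)}{L_t}
\;\Longrightarrow\;
L_q = \frac{100\,S(q,t)}{c_q}, \quad L_t = \frac{100\,S(q,t)}{c_t}, \\[4pt]
s(q,t) &= \frac{100 \cdot 2\,S(q,t)}{L_q + L_t}
        = \frac{2}{\dfrac{1}{c_q} + \dfrac{1}{c_t}}
        = \frac{2\,c_q\,c_t}{c_q + c_t}.
\end{align*}
```

For the first alignment of 1eud-A and 1ccw-A (cq = 36, ct = 80) this gives 2 × 36 × 80/116 ≈ 49.7, consistent with the tabulated s = 50. For 1te2B02 and 1zolA02 (cq = 83, ct = 77) it gives 2 × 83 × 77/160 ≈ 79.9, consistent with s = 80.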

We note that the 2D projections shown in Figure 1 do not fully reveal the often complex, intricate, or obscure relationships. We therefore encourage the interested reader to examine these examples in 3D using the TopMatch service. We have made considerable efforts to make the use of this service as convenient as possible. For example, whereas computing and visualizing structural alignments of SCOP and CATH domains generally requires the user to supply the domain definitions, TopMatch recognizes the domain names automatically. Additional information on the efficient use of TopMatch and the proper interpretation of the results is provided by the web service.

ACKNOWLEDGEMENTS

The structure superposition program TopMatch is provided by Proceryon GmbH. Figure 1 was prepared using PyMOL (http://www.pymol.org).

Conflict of Interest: none declared.

REFERENCES

Andreeva A, et al. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res, 2007, doi:10.1093/nar/gkm993
Berman HM, et al. The Protein Data Bank. Nucleic Acids Res, 2000, vol. 28 (pg. 235-242)
Feng ZK, Sippl MJ. Optimum superimposition of protein structures: ambiguities and implications. Fold. Des, 1996, vol. 1 (pg. 123-132)
Greene LH, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res, 2007, vol. 35 (pg. D291-D297)
Kolodny R, et al. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J. Mol. Biol, 2005, vol. 346 (pg. 1173-1188)
Prlic A, et al. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng, 2000, vol. 13 (pg. 545-550)
Sippl MJ, et al. Assessment of the CASP4 Fold Recognition Category. Proteins, 2001, vol. 45 (pg. 55-67)
Suhrer SJ, et al. QSCOP-BLAST–fast retrieval of quantified structural information for protein sequences of unknown structure. Nucleic Acids Res, 2007, vol. 35, Web Server issue (pg. W411-W415)
Suhrer SJ, et al. QSCOP–SCOP quantified by structural relationships. Bioinformatics, 2007, vol. 23 (pg. 513-514)

Author notes

Associate Editor: Burkhard Rost
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
