Abstract

Summary: The availability of advanced profile–profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools that are not presently available. We introduce an approach built upon the concept of HMM logos. The method illustrates the similarities between pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one above the other. The aligned states are then highlighted and connected.

Availability: A web interface offering online creation of pairwise HMM logos is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Furthermore, software developers may download a Perl package that includes methods for creation of pairwise HMM logos locally.

Contact: bsb@sanger.ac.uk

INTRODUCTION

The problem of profile–profile comparison has a long history and has recently received renewed attention (Söding, 2004; Lyngsø et al., 1999; Madera, 2005; Edgar and Sjölander, 2004a). This is a result of the growing number of well-characterized protein families in databases such as Pfam (Bateman et al., 2004). Because they add information about properties of the entire family, profile–profile methods have been shown to increase sensitivity significantly compared with profile–sequence comparison (Edgar and Sjölander, 2004b). Several different concepts for profile–profile comparison have been reported. We focused on the visualization of HMM–HMM alignments. The algorithms behind all currently available HMM alignment programs are very similar. Newer approaches differ mainly in details of the scoring function and in the transitions that are taken into account. The general approach is to find a sequence of state-to-state pairings that maximizes the probability of both HMMs emitting the same sequence (frequently called the co-emission probability). This can be done efficiently by creating a pair HMM (Durbin et al., 1998; Söding, 2004) from the two source HMMs and using the standard forward or Viterbi algorithm to search for an optimal solution. Nevertheless, the raw output of the alignment tools can be difficult to understand. From the state-to-state pairings alone, it is not immediately obvious which features the two protein families have in common. Our aim was to develop a graphical representation of HMM–HMM alignments that resolves this issue.
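The search for an optimal sequence of state-to-state pairings can be illustrated with a deliberately simplified sketch. It is not the algorithm implemented in PRC or HHsearch. It assumes profiles that consist of match-state emission distributions only, a log-odds co-emission score per pair of states, a fixed linear gap penalty and a local, Viterbi-style dynamic programme. The function names column_score and align_profiles and the parameter gap are hypothetical and chosen for illustration.

```python
import math


def column_score(p, q, background):
    """Log-odds co-emission score of two emission distributions (assumed scoring)."""
    return math.log(sum(p[a] * q[a] / background[a] for a in background))


def align_profiles(prof_a, prof_b, background, gap=-1.0):
    """Local best-path alignment of two match-only profiles.

    Returns (best_score, pairs), where pairs is a list of 1-based
    (state_in_a, state_in_b) pairings along the best-scoring path.
    """
    n, m = len(prof_a), len(prof_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = column_score(prof_a[i - 1], prof_b[j - 1], background)
            candidates = [
                (0.0, None),                          # start a new local alignment
                (score[i - 1][j - 1] + s, "pair"),    # pair state i with state j
                (score[i - 1][j] + gap, "skip_a"),    # leave state i of A unaligned
                (score[i][j - 1] + gap, "skip_b"),    # leave state j of B unaligned
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell until the local alignment starts.
    pairs = []
    cell = best_cell
    while cell is not None and back[cell[0]][cell[1]] is not None:
        i, j = cell
        move = back[i][j]
        if move == "pair":
            pairs.append((i, j))
            cell = (i - 1, j - 1)
        elif move == "skip_a":
            cell = (i - 1, j)
        else:
            cell = (i, j - 1)
    pairs.reverse()
    return best, pairs
```

The returned list of pairings corresponds to the kind of raw state-to-state output that a graphical representation then has to make interpretable.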

FEATURES

Pairwise HMM Logos can currently be accessed in two different ways. First, they can be created online at http://www.sanger.ac.uk/Software/analysis/logomat-p. Second, they can be constructed locally by downloading and installing the Perl sources. In the near future, pairwise HMM Logos will also be added to the Pfam website. A typical pairwise HMM Logo is shown in Figure 1. We designed pairwise HMM Logos to look as similar to HMM Logos as possible, which should make them easier to understand for users accustomed to HMM Logos. Therefore, we draw two HMM Logos, one for each aligned family. To illustrate individual aligned states, they are framed and connected by a block. Unaligned states are shaded in grey. In a local alignment, positions before the first and after the last aligned states are not shown. A brief summary of the features of simple HMM logos is given in the caption to Figure 1. A more detailed description can be found in Schuster-Böckler et al. (2004).
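The display rules described above can be sketched in a few lines. This is an illustration only and not code from the HMM Perl package. The function name classify_states and the representation of an alignment as a list of 1-based (state in first HMM, state in second HMM) pairs are assumptions made for this example.

```python
def classify_states(pairs, side):
    """Return the states of one HMM that are drawn, with their display status.

    pairs: list of (i, j) tuples of aligned 1-based state indices.
    side:  0 for the first HMM, 1 for the second HMM.
    States before the first and after the last aligned state are not shown.
    States inside that range are either 'aligned' (framed and connected)
    or 'unaligned' (shaded in grey).
    """
    aligned = {pair[side] for pair in pairs}
    if not aligned:
        return []
    first, last = min(aligned), max(aligned)
    return [
        (state, "aligned" if state in aligned else "unaligned")
        for state in range(first, last + 1)
    ]
```

For example, with the pairings (3, 10), (4, 11) and (6, 12), states 3 to 6 of the first HMM are drawn and state 5 is shaded in grey, while states 10 to 12 of the second HMM are all framed.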

In our previous work (Schuster-Böckler et al., 2004), we introduced the HMM Perl package. It provides generalized methods to access and modify HMMs. Emission and transition probabilities are stored and retrieved as multidimensional matrices using PDL, the Perl Data Language. HMMER files can be parsed and written. The package also allows the creation of HMM logos from profile HMMs. To this existing framework we added a class called HMM::Alignment, which acts as an abstraction layer for the HMM alignment program PRC (Madera, 2005; http://supfam.mrc-lmb.cam.ac.uk/PRC/). It can parse and write PRC output, and it can run PRC directly if PRC is installed on the system. As it integrates into the HMM package, it takes HMM::Profile objects, HMMER files, Pfam IDs or combinations thereof as arguments for creating alignment objects.

REQUIREMENTS

On-the-fly creation of pairwise HMM Logos from HMMER files, multiple sequence alignments or Pfam IDs is available from the website http://www.sanger.ac.uk/Software/analysis/logomat-p. Uploaded HMMs are aligned directly using PRC. Multiple alignments in ClustalW, MSF or SELEX format are used to create HMMs using HMMER before aligning them. The plain PRC output can be downloaded separately. Local installation of the HMM Perl package requires the PDL and Imager packages to be installed on the system together with a working PRC binary. Both Perl packages can be downloaded from http://www.cpan.org. PRC is available from http://supfam.mrc-lmb.cam.ac.uk/PRC/. This software was tested against PRC version 1.5.2.

Fig. 1

Alignment of the Toxin_7 Pfam family against the Toxin_9 Pfam family. For each family, an HMM logo is drawn. The numbers above and below each logo show state positions in the HMM. The overall height of each letter stack represents the information content of the state; the relative height of each letter corresponds to its emission probability. The column width denotes the relative contribution of the state, defined as the product of the probability that the state is traversed and the expected number of self transitions for that state. This accounts for the varying length of insertions. Insert states are drawn in red. Their relative contribution is frequently very small, making them hard to see. In this figure, narrow insert states can be found, for example, at positions 27 and 28 of the Toxin_7 family. The aligned states in each HMM are framed and connected by a block. Unaligned states are shaded in grey.
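The quantities named in the caption can be made concrete with a small sketch. It rests on assumptions that the caption does not state: information content is taken as the relative entropy in bits between the emission distribution and a background distribution, each letter receives a share of the stack height proportional to its emission probability, and the expected length of a stay in a state with self-transition probability t is taken as 1 / (1 - t), so that a state without a self transition has length 1. All function names are hypothetical.

```python
import math


def information_content(emissions, background):
    """Stack height: relative entropy (bits) of emissions versus background."""
    return sum(
        p * math.log2(p / background[a])
        for a, p in emissions.items()
        if p > 0
    )


def letter_heights(emissions, background):
    """Split the stack height among letters in proportion to emission probability."""
    total = information_content(emissions, background)
    return {a: p * total for a, p in emissions.items()}


def relative_contribution(hit_probability, self_transition=0.0):
    """Column width: probability of traversing the state times its expected length.

    The expected length 1 / (1 - self_transition) is an assumption of this sketch.
    """
    return hit_probability / (1.0 - self_transition)
```

Under these assumptions, an insert state that is rarely entered and has a low self-transition probability receives a narrow column, which matches the observation that insert states are often hard to see.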

ACKNOWLEDGEMENTS

We would like to thank Martin Madera and Robert Finn for the valuable information about theoretical and practical aspects of PRC. Johannes Söding kindly answered numerous questions about his HHsearch algorithm. The authors are grateful for the valuable suggestions and corrections made by the reviewers. B.S.-B. is funded by the Wellcome Trust.

REFERENCES

Bateman, A., et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G. (1998) Biological Sequence Analysis. Cambridge University Press, Cambridge, UK.
Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
Eddy, S.R. (2001) HMMER User's Guide: Biological Sequence Analysis Using Profile Hidden Markov Models, Version 2.2. Washington University School of Medicine, http://hmmer.wustl.edu.
Edgar, R.C. and Sjölander, K. (2004a) COACH: profile–profile alignment of protein families using hidden Markov models. Bioinformatics, 20, 1309–1318.
Edgar, R.C. and Sjölander, K. (2004b) A comparison of scoring functions for protein sequence profile alignment. Bioinformatics, 20, 1301–1308.
Lyngsø, R., et al. (1999) Metrics and similarity measures for hidden Markov models. Proc. Int. Conf. Intell. Syst. Mol. Biol., 1999, 178–186.
Madera, M. (2005) PRC—the profile comparer.
Schneider, T.D. and Stephens, R. (1990) Sequence logos: A new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
Schuster-Böckler, B., Schultz, J., Rahmann, S. (2004) HMM Logos for visualization of protein families. BMC Bioinformatics, 5, 7.
Söding, J. (2005) Protein homology detection by HMM–HMM comparison. Bioinformatics, 21, 951–960.
