Summary: The availability of advanced profile–profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools that are not presently available. We introduce an approach built upon the concept of HMM logos. The method illustrates the similarities of pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one above the other. The aligned states are then highlighted and connected.
Availability: A web interface offering online creation of pairwise HMM logos is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Furthermore, software developers may download a Perl package that includes methods for creation of pairwise HMM logos locally.
The problem of profile–profile comparison has a long history and has recently received renewed attention (Söding, 2004; Lyngsø et al., 1999; Madera, 2005; Edgar and Sjölander, 2004a). This is a result of the growing number of well-characterized protein families in databases such as Pfam (Bateman et al., 2004). By incorporating information about properties of the entire family, profile–profile methods have been shown to significantly increase sensitivity compared with profile–sequence comparison (Edgar and Sjölander, 2004b). Several different concepts for profile–profile comparison have been reported. We focused on the visualization of HMM–HMM alignments. The algorithms behind all currently available HMM alignment programs are very similar. Newer approaches differ mainly in details of the scoring function and in the transitions that are taken into account. The general approach is to find a sequence of state-to-state pairings that maximizes the probability of both HMMs emitting the same sequence (frequently called the co-emission probability). This can be done efficiently by creating a pair HMM (Durbin et al., 1998; Söding, 2004) from the two source HMMs and using the standard forward or Viterbi algorithm to search for an optimal solution. Nevertheless, the raw output of the alignment tools can be difficult to understand. From the state-to-state pairings alone, it is not immediately obvious which features the two protein families have in common. It was our aim to develop a graphical representation of HMM–HMM alignments that resolves this issue.
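To make the idea of state-to-state pairings concrete, the following is a minimal sketch of a Viterbi-style local alignment between two profiles. It is a deliberately simplified illustration and not the algorithm implemented in PRC or HHsearch: only match-state emission distributions are considered, transition probabilities are replaced by a fixed gap penalty, and the column score is a log-odds co-emission term. The function names, the gap value and the background distribution are assumptions made for this example.

```python
import math

def column_score(p, q, background):
    """Log-odds score for two states emitting the same residue.

    p, q and background map each residue to a probability.
    """
    return math.log(sum(p[a] * q[a] / background[a] for a in background))

def align_profiles(prof_a, prof_b, background, gap=1.0):
    """Local alignment of two lists of emission distributions.

    Returns the best score and the aligned state pairs (1-based).
    The fixed gap penalty is an illustrative stand-in for HMM transitions.
    """
    n, m = len(prof_a), len(prof_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, None
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = column_score(prof_a[i - 1], prof_b[j - 1], background)
            candidates = [
                (0.0, None),                        # start a new local alignment
                (score[i - 1][j - 1] + pair, "pair"),
                (score[i - 1][j] - gap, "skip_a"),  # state i of A left unaligned
                (score[i][j - 1] - gap, "skip_b"),  # state j of B left unaligned
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    # Trace back from the best cell to recover the state-to-state pairings.
    pairs = []
    cell = best_cell
    while cell is not None and back[cell[0]][cell[1]] is not None:
        i, j = cell
        move = back[i][j]
        if move == "pair":
            pairs.append((i, j))
            cell = (i - 1, j - 1)
        elif move == "skip_a":
            cell = (i - 1, j)
        else:
            cell = (i, j - 1)
    pairs.reverse()
    return best, pairs
```

The returned list of pairs corresponds to the kind of raw output discussed above: it states which states are aligned, but not which features the two families share.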
Pairwise HMM Logos can currently be accessed in two different ways. First, they can be created online at http://www.sanger.ac.uk/Software/analysis/logomat-p. Second, they can be constructed locally by downloading and installing the Perl sources. In the near future, pairwise HMM Logos will also be added to the Pfam website. A typical pairwise HMM Logo is shown in Figure 1. We designed pairwise HMM Logos to look as similar to HMM Logos as possible, which should make them easier to understand for users accustomed to HMM Logos. We therefore draw two HMM Logos, one for each aligned family. To illustrate individual aligned states, they are framed and connected by a block. Unaligned states are shaded in grey. In a local alignment, positions before the first and after the last aligned state are not shown. A brief summary of the features of simple HMM logos is given in the caption to Figure 1. A more detailed description can be found in Schuster-Böckler et al. (2004).
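The drawing rules described above can be summarized in a short sketch. The function below is hypothetical and is not part of the HMM Perl package. Given the aligned state pairs, it decides for each profile which states are drawn and whether each is marked as aligned (framed and connected) or unaligned (shaded grey). Showing all states when the alignment is not local is an assumption of this example; the text only specifies the trimming for local alignments.

```python
def logo_layout(pairs, len_a, len_b, local=True):
    """Classify the states of both profiles for drawing.

    pairs: list of (i, j) aligned state numbers, 1-based.
    Returns two dicts mapping state number -> "aligned" or "unaligned".
    In local mode, states before the first and after the last aligned
    state are omitted from the result, i.e. they are not drawn.
    """
    aligned_a = {i for i, _ in pairs}
    aligned_b = {j for _, j in pairs}

    def classify(aligned, length):
        if local:
            if not aligned:
                return {}
            first, last = min(aligned), max(aligned)
        else:
            first, last = 1, length
        return {
            state: "aligned" if state in aligned else "unaligned"
            for state in range(first, last + 1)
        }

    return classify(aligned_a, len_a), classify(aligned_b, len_b)
```

For example, with the pairs (2, 1) and (4, 2) in a local alignment, state 3 of the first profile lies between two aligned states and is therefore drawn in grey, whereas states 1 and 5 are not drawn at all.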
In our previous work (Schuster-Böckler et al., 2004), we introduced the HMM Perl package. It provides generalized methods to access and modify HMMs. Emission and transition probabilities are stored and retrieved as multidimensional matrices using PDL, the Perl Data Language. HMMER files can be parsed and written. The package also allows the creation of HMM logos from profile HMMs. To this existing framework we added a class called HMM::Alignment, which works as an abstraction layer for the HMM alignment program PRC (Madera, 2005; http://supfam.mrc-lmb.cam.ac.uk/PRC/). It can parse and write PRC output as well as run PRC directly if it is installed on the system. As it integrates into the HMM package, it takes HMM::Profile objects, HMMER files, Pfam IDs or combinations thereof as arguments for creating alignment objects.
On-the-fly creation of pairwise HMM Logos from HMMER files, multiple sequence alignments or Pfam IDs is available from the website http://www.sanger.ac.uk/Software/analysis/logomat-p. Uploaded HMMs are aligned directly using PRC. Multiple alignments in ClustalW, MSF or SELEX format are used to create HMMs using HMMER before aligning them. The plain PRC output can be downloaded separately. Local installation of the HMM Perl package requires the PDL and Imager packages to be installed on the system together with a working PRC binary. Both Perl packages can be downloaded from http://www.cpan.org. PRC is available from http://supfam.mrc-lmb.cam.ac.uk/PRC/. This software was tested against PRC version 1.5.2.
We would like to thank Martin Madera and Robert Finn for the valuable information about theoretical and practical aspects of PRC. Johannes Söding kindly answered numerous questions about his HHsearch algorithm. The authors are grateful for the valuable suggestions and corrections made by the reviewers. B.S.-B. is funded by the Wellcome Trust.