Summary: Production of high-quality multiple sequence alignments of structured RNAs relies on an iterative combination of manual editing and structure prediction. An essential feature of an RNA alignment editor is the facility to mark up the alignment based on how it matches a given secondary structure prediction, but few available alignment editors offer such a feature. The RALEE (RNA ALignment Editor in Emacs) tool provides a simple environment for RNA multiple sequence alignment editing, including structure-specific colour schemes, helper applications for structure prediction, and many more conventional editing functions. This is accomplished by extending the commonly used text editor, Emacs, which is available for Linux, most UNIX systems, Windows and Mac OS.
Availability: The ELISP source code for RALEE is freely available from http://www.sanger.ac.uk/Users/sgj/ralee/ along with documentation and examples.
Non-coding RNA (ncRNA) genes often produce structured RNA products, some of the best known of which are involved in essential ribonucleoprotein complexes, such as the ribosome and the spliceosome. Such structured RNAs are often poorly conserved in sequence, but conserve a secondary structure with patterns of base covariation. This covariation forms the basis of several algorithms for de novo prediction of ncRNA genes (Rivas and Eddy, 2001; di Bernardo et al., 2003). Statistical models incorporating both sequence and structure information [termed covariance models, or stochastic context-free grammars (Eddy, 2002)] have recently allowed the Rfam database of ncRNA families to be established (http://www.sanger.ac.uk/Software/Rfam/, Griffiths-Jones et al., 2003).
Computational alignment of ncRNAs is a challenging problem, because the correct alignment is often not evident without knowledge of the secondary structure. However, the best secondary structure predictions rely on comparative analysis of good multiple sequence alignments. Algorithms that align sequence and structure simultaneously are starting to emerge (http://dart.sourceforge.net/stemloc/, Holmes and Rubin, 2002; Gorodkin et al., 1997; Mathews and Turner, 2002), but are in their infancy and are often prohibitively expensive in both time and memory. Production of high-quality alignments of structured RNAs is thus a laborious and iterative process of manual alignment and structure prediction.
Several excellent multiple sequence alignment editors are available, including BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), GeneDoc (http://www.psc.edu/biomed/genedoc), DCSE (http://rrna.uia.ac.be/dcse/) and Seaview (http://pbil.univ-lyon1.fr/software/seaview.html, Galtier et al., 1996). Of particular note, Jalview (http://www.jalview.org/, Clamp et al., 2004) provides extensive functionality for editing alignments of both proteins and nucleic acids. However, few editors cater specifically to the problem of aligning structured RNAs. A simple but effective solution to the problem is presented here, using ELISP extensions to the widely used, multi-platform text editor, Emacs (http://www.gnu.org/software/emacs/).
The primary requirement of an RNA alignment editing tool is to be able to mark up the alignment based on the prediction of its consensus secondary structure. Such annotation, in the form of a structure-based colouring scheme, allows the user to quickly and intuitively identify regions of the alignment that do not match the structure well, and thus to refine both the alignment and structure manually. RALEE (RNA ALignment Editor in Emacs) provides such a colouring scheme (shown in Fig. 1), as well as allowing the user to colour the alignment more conventionally by conservation or residue identity. Other RNA-specific features of RALEE include the ability to integrate secondary structure predictions of arbitrary sequences in the alignment (using an external package such as ViennaRNA; http://www.tbi.univie.ac.at/~ivo/RNA/, Hofacker, 2003), and to test how the alignment matches the new structure prediction. Helper applications also allow the user to view depictions of predicted secondary structures. Standard alignment editing methods, such as insertion and deletion of whole columns of gaps, trimming the alignment at either end and removing columns that contain only gaps, are also accessible through user-customized control character combinations, or by using the menus provided. A split-screen mode facilitates the viewing and editing of base-paired regions that may be far apart in sequence (see Fig. 1).
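The idea behind a structure-based mark-up can be illustrated with a short sketch. The Python below is not RALEE's ELISP code, and the rules RALEE actually uses to choose colours are not described here; it is a minimal illustration that assumes the consensus structure is written in a bracket notation (for example `<` and `>` for paired columns) and that "matching the structure" means forming a Watson-Crick or G-U pair. The function names `pair_table` and `mismatched_pairs` are hypothetical.

```python
# Hypothetical sketch: flag base pairs in one aligned sequence that do not
# fit a consensus secondary structure. Assumes an RNA alphabet (A, C, G, U).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure, opening="<([{", closing=">)]}"):
    """Map each paired column index to the index of its partner column."""
    stacks = {o: [] for o in opening}          # one stack per bracket type
    close_to_open = dict(zip(closing, opening))
    partners = {}
    for col, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(col)
        elif symbol in close_to_open:
            stack = stacks[close_to_open[symbol]]
            if not stack:
                raise ValueError(f"unbalanced structure at column {col}")
            left = stack.pop()
            partners[left] = col
            partners[col] = left
    if any(stacks.values()):
        raise ValueError("unbalanced structure: unclosed opening bracket")
    return partners


def mismatched_pairs(sequence, partners):
    """Return (left, right) column pairs where the sequence has no canonical pair.

    Gap characters are not in CANONICAL, so a gapped pair is reported too.
    """
    bad = []
    for left, right in sorted(partners.items()):
        if left < right:                       # visit each pair once
            pair = (sequence[left].upper(), sequence[right].upper())
            if pair not in CANONICAL:
                bad.append((left, right))
    return bad


if __name__ == "__main__":
    partners = pair_table("<<<..>>>")
    print(mismatched_pairs("GGCAAGCC", partners))  # []
    print(mismatched_pairs("GACAAGCC", partners))  # [(1, 6)]
```

An editor could colour the columns returned by `mismatched_pairs` differently from well-paired columns, which is the kind of visual cue that lets a curator spot a misaligned helix quickly.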
Using an available and well-used text editor such as Emacs as the basis for the RALEE tool has a number of advantages:
Many core features are already available and well tested, including file handling, cut and paste, deep undo and menus. Development can therefore concentrate on useful user-driven features.
The interface is familiar to a large user base. RALEE additions conform to the Emacs look and feel.
Emacs is available for a wide range of platforms, including GNU/Linux, most UNIX systems, Windows and Mac OS.
RALEE extends GNU Emacs 21 (http://www.gnu.org/software/emacs/) and the vast majority of the functionality is also compatible with XEmacs 21 (http://www.xemacs.org/). If available, the ViennaRNA package (http://www.tbi.univie.ac.at/~ivo/RNA/, Hofacker, 2003) can be used as a helper application for on-the-fly structure prediction and display. RALEE at present reads alignments in Stockholm format (http://www.cgr.ki.se/cgb/groups/sonnhammer/Stockholm.html), which is the format in which the Rfam database distributes alignments of RNA families (http://www.sanger.ac.uk/Software/Rfam/). Future development should allow import and export of alignments in a variety of formats, as well as the facility to handle mark-up of pseudoknot interactions. RALEE is being used actively by Rfam curators to improve the quality of alignments in the database.
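For readers unfamiliar with Stockholm format, the following Python sketch shows roughly what reading such a file involves. It is an illustration only, not RALEE's reader: it assumes a single alignment per file, treats every non-annotation line as a name followed by a sequence chunk, and keeps only the `#=GC SS_cons` consensus structure annotation while ignoring all other mark-up. The function name `read_stockholm` is hypothetical.

```python
def read_stockholm(text):
    """Parse one Stockholm alignment into (sequences, consensus_structure).

    sequences maps each sequence name to its aligned string; chunks of an
    interleaved alignment are concatenated in order of appearance.
    """
    sequences = {}
    ss_cons = []
    for line in text.splitlines():
        line = line.rstrip()
        if line == "//":                 # end-of-alignment marker
            break
        if not line:
            continue
        if line.startswith("#=GC"):
            fields = line.split()
            if len(fields) == 3 and fields[1] == "SS_cons":
                ss_cons.append(fields[2])
        elif line.startswith("#"):
            continue                     # header and other annotation lines
        else:
            name, chunk = line.split(None, 1)
            sequences[name] = sequences.get(name, "") + chunk.strip()
    return sequences, "".join(ss_cons)


if __name__ == "__main__":
    example = """# STOCKHOLM 1.0
seq1          GGCAAGCC
seq2          GACAAGCC
#=GC SS_cons  <<<..>>>
//
"""
    seqs, structure = read_stockholm(example)
    print(seqs)       # {'seq1': 'GGCAAGCC', 'seq2': 'GACAAGCC'}
    print(structure)  # <<<..>>>
```

Keeping the consensus structure alongside the sequences is what makes structure-aware editing possible: every column edit has to be applied to the `SS_cons` line as well as to each sequence.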
I would like to thank Simon Moxon, Alex Bateman, Gayle McEwan, Tobias Mourier and Ashwin Hajarnavis for testing features and providing feedback. S.G.-J. is funded by the Wellcome Trust.