Abstract

Summary: Production of high quality multiple sequence alignments of structured RNAs relies on an iterative combination of manual editing and structure prediction. An essential feature of an RNA alignment editor is the facility to mark-up the alignment based on how it matches a given secondary structure prediction, but few available alignment editors offer such a feature. The RALEE (RNA ALignment Editor in Emacs) tool provides a simple environment for RNA multiple sequence alignment editing, including structure-specific colour schemes, utilizing helper applications for structure prediction and many more conventional editing functions. This is accomplished by extending the commonly used text editor, Emacs, which is available for Linux, most UNIX systems, Windows and Mac OS.

Availability: The ELISP source code for RALEE is freely available from http://www.sanger.ac.uk/Users/sgj/ralee/ along with documentation and examples.

Contact:sgj@sanger.ac.uk

INTRODUCTION

Non-coding RNA (ncRNA) genes often produce structured RNA products, some of the best known of which are involved in essential ribonucleoprotein complexes, such as the ribosome and the spliceosome. Such structured RNAs are often poorly conserved in sequence, but conserve a secondary structure with patterns of base covariation. This covariation forms the basis of several algorithms for de novo prediction of ncRNA genes (Rivas and Eddy, 2001; di Bernardo et al., 2003). Statistical models incorporating both sequence and structure information [termed covariance models, or stochastic context free grammars (Eddy, 2002)] have recently allowed the Rfam database of ncRNA families to be established (http://www.sanger.ac.uk/Software/Rfam/, Griffiths-Jones et al., 2003).

Computational alignment of ncRNAs is a challenging problem, because the correct alignment is often not evident without knowledge of the secondary structure. However, the best secondary structure predictions rely on comparative analysis of good multiple sequence alignments. Algorithms that align sequence and structure simultaneously are starting to emerge (http://dart.sourceforge.net/stemloc/, Holmes and Rubin, 2002; Gorodkin et al., 1997; Mathews and Turner, 2002), but are in their infancy and are often prohibitively expensive in both time and memory. Production of high quality alignments of structured RNAs is thus a laborious and iterative process of manual alignment and structure prediction.

Several excellent multiple sequence alignment editors are available, including BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), GeneDoc (http://www.psc.edu/biomed/genedoc), DCSE (http://rrna.uia.ac.be/dcse/) and Seaview (http://pbil.univ-lyon1.fr/software/seaview.html, Galtier et al., 1996). Of particular note, Jalview (http://www.jalview.org/, Clamp et al., 2004) provides extensive functionality for editing alignments of both proteins and nucleic acids. However, few editors cater specifically to the problem of aligning structured RNAs. A simple but effective solution to the problem is presented here, using ELISP extensions to the widely used, multi-platform text editor, Emacs (http://www.gnu.org/software/emacs/).

FEATURES

The primary requirement of an RNA alignment editing tool is to be able to mark-up the alignment based on the prediction of its consensus secondary structure. Such annotation, in the form of a structure-based colouring scheme, allows the user to quickly and intuitively identify regions of the alignment that do not match the structure well, and thus to refine both the alignment and structure manually. RALEE (RNA ALignment Editor in Emacs) provides such a colouring scheme (shown in Fig. 1), as well as allowing the user to colour the alignment more conventionally by conservation or residue identity. Other RNA specific features of RALEE include the ability to integrate secondary structure predictions of arbitrary sequences in the alignment (using an external package such as ViennaRNA—http://www.tbi.univie.ac.at/~ivo/RNA/, Hofacker, 2003), and to test how the alignment matches the new structure prediction. Helper applications also allow the user to view depictions of predicted secondary structures. Standard alignment editing methods, such as insertion and deletion of whole columns of gaps, trimming the alignment at either end and removing columns that contain only gaps, are also accessible through user-customized control character combinations, or by using the menus provided. A split-screen mode facilitates the viewing and editing of base-paired regions that may be far apart in sequence (see Fig. 1).

Using an available and well-used text editor such as Emacs as the basis for the RALEE tool has a number of advantages:

  • Many core features are already available and well tested, including file handling, cut and paste, deep undo and menus. Development can therefore concentrate on useful user-driven features.

  • The interface is familiar to a large user base. RALEE additions conform to the Emacs look and feel.

  • Emacs is available for a wide-range of platforms, including GNU/Linux, most UNIX systems, Windows and Mac OS.

REQUIREMENTS

RALEE extends GNU Emacs 21 (http://www.gnu.org/software/emacs/) and the vast majority of the functionality is also compatible with XEmacs 21 (http://www.xemacs.org/). If available, the ViennaRNA package (http://www.tbi.univie.ac.at/~ivo/RNA/, Hofacker, 2003) can be used as a helper application for on-the-fly structure prediction and display. RALEE at present reads alignments in Stockholm format (http://www.cgr.ki.se/cgb/groups/sonnhammer/Stockholm.html), which is the format in which the Rfam database distributes alignments of RNA families (http://www.sanger.ac.uk/Software/Rfam/). Future development should allow import and export of alignments in a variety of formats, as well as the facility to handle mark-up of pseudoknot interactions. RALEE is being used actively by Rfam curators to improve the quality of alignments in the database.

Fig. 1

Editing a multiple sequence alignment of members of the U1 spliceosomal RNA family in GNU Emacs using RALEE mode. The structure colour scheme is based on the consensus secondary structure shown as nested pairs of ‘<’ and ‘>’ symbols in the SS_cons line. Bases of the same colour are part of the same helix—for example, those highlighted in black in the top panel pair with those in the bottom panel. The split-screen view allows editing of paired regions that are far apart in sequence.

Fig. 1

Editing a multiple sequence alignment of members of the U1 spliceosomal RNA family in GNU Emacs using RALEE mode. The structure colour scheme is based on the consensus secondary structure shown as nested pairs of ‘<’ and ‘>’ symbols in the SS_cons line. Bases of the same colour are part of the same helix—for example, those highlighted in black in the top panel pair with those in the bottom panel. The split-screen view allows editing of paired regions that are far apart in sequence.

I would like to thank Simon Moxon, Alex Bateman, Gayle McEwan, Tobias Mourier and Ashwin Hajarnavis for testing features and providing feedback. S.G.-J. is funded by the Wellcome Trust.

REFERENCES

Clamp, M., Cuff, J., Searle, S.M., Barton, G.J.
2004
The Jalview Java alignment editor.
Bioinformatics
 
20
426
–427
di Bernardo, D., Down, T., Hubbard, T.
2003
ddbRNA: detection of conserved secondary structures in multiple alignments.
Bioinformatics
 
19
1606
–1611
Eddy, S.R.
2002
A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure.
BMC Bioinformatics
 
3
18
Galtier, N., Gouy, M., Gautier, C.
1996
SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny.
Comput. Appl. Biosci.
 
12
543
–548
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R.
2003
Rfam: an RNA family database.
Nucleic Acids Res.
 
31
439
–441
Gorodkin, J., Heyer, L.J., Stormo, G.D.
1997
Finding the most significant common sequence and structure motifs in a set of RNA sequences.
Nucleic Acids Res.
 
25
3724
–3732
Hofacker, I.L.
2003
Vienna RNA secondary structure server.
Nucleic Acids Res.
 
31
3429
–3431
Holmes, I. and Rubin, G.M.
2002
Pairwise RNA structure comparison using stochastic context-free grammars.
Proceedings, Pacific Symposium on Biocomputing
 
163
–174
Mathews, D.H. and Turner, D.H.
2002
Dynalign: an algorithm for finding the secondary structure common to two RNA sequences.
J. Mol. Biol.
 
317
191
–203
Rivas, E. and Eddy, S.R.
2001
Noncoding RNA gene detection using comparative sequence analysis.
BMC Bioinformatics
 
2
8

Comments

0 Comments