The RNA editing process in protozoan parasites is controlled by small RNA molecules known as guide RNAs (gRNAs). The gRNA database is a comprehensive compilation of published guide RNA sequences from eight different kinetoplastid organisms. In addition to the RNA primary sequences, information on the gene localization, the experimental verification of the transcripts, and literature citations are provided. Accessory information includes the secondary structures of four Trypanosoma brucei gRNAs as well as a computer-modelled three-dimensional gRNA structure. The database is available as a hypertext document accessible via the World Wide Web (WWW) or from the authors in printed form.
Guide RNAs (gRNAs) are small, metabolically stable mitochondrial transcripts identified only in kinetoplastid organisms such as Trypanosoma, Leishmania or Crithidia. The molecules carry out a central function during the unusual mitochondrial RNA processing reaction known as kinetoplastid (k) RNA editing (for recent reviews see 1,2). During editing, uridylate residues are inserted into and deleted from mitochondrial transcripts, thus completing the sequence information of these mRNAs. Guide RNAs provide the information for the U insertion/deletion process by base pairing to pre-edited mRNAs. They are encoded on the mitochondrial mini- or maxicircle DNA elements in kinetoplastid organisms, and the RNAs are presumably primary transcripts. Guide RNAs have an average length of 50–70 nucleotides (nt) with a strong A/U nucleotide bias. The primary sequence of gRNAs can be divided into three functional domains: first, a region of complementarity located at the 5′-end, termed the anchor sequence, which is thought to create the initial contact with the pre-edited mRNA; second, an informational sequence domain which presumably directs the editing reaction; and third, a posttranscriptionally added 3′ oligo(U) extension, sometimes >20 nt in length. More than 200 different gRNAs have been estimated to be required for the editing of all encrypted genes in Trypanosoma brucei (3), and there is an ∼3-fold higher coding capacity for gRNA genes in that organism. Thus, in addition to the large number of different gRNAs, the potential for gRNA redundancy exists (4). Guide RNAs have been suggested to fold into simple secondary structures, comprising two consecutive stem loop elements with both terminal ends in a single-stranded conformation (5).
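Two of the sequence features described above, the A/U bias and the 3′ oligo(U) extension, are easy to inspect computationally. The following Python sketch is illustrative only and is not part of the database: the function names, the `min_tail` threshold and the toy sequence are assumptions made for this example. Note that a simple trailing-U trim cannot distinguish encoded 3′-terminal U residues from posttranscriptionally added ones.

```python
from typing import Tuple


def normalize(seq: str) -> str:
    """Upper-case the sequence and write it as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def split_oligo_u_tail(seq: str, min_tail: int = 5) -> Tuple[str, str]:
    """Split a sequence into (body, trailing U run).

    min_tail is a hypothetical threshold: shorter U runs are treated as
    part of the body rather than as an oligo(U) extension.
    """
    seq = normalize(seq)
    body = seq.rstrip("U")
    tail = seq[len(body):]
    if len(tail) < min_tail:
        return seq, ""
    return body, tail


def au_fraction(seq: str) -> float:
    """Fraction of A and U residues in the sequence (0.0 if empty)."""
    seq = normalize(seq)
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)


# Toy sequence invented for illustration; not a real gRNA.
toy = "AUAUGGAC" + "U" * 10
body, tail = split_oligo_u_tail(toy)
print(body, len(tail), au_fraction(body))
```

Trimming the tail before computing the A/U fraction keeps the oligo(U) extension from inflating the apparent bias of the encoded sequence.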
Description of the Database
Release 1.0 of the database contains 235 gRNA sequence entries including published sequences through September 30, 1996. The sequences stem from eight different kinetoplastid species: Trypanosoma brucei, Trypanosoma cruzi, Trypanosoma congolense, Trypanosoma equiperdum, Leishmania tarentolae, Leishmania infantum, Leishmania gymnodactyli and Crithidia fasciculata. The compilation is arranged in tabular form, listing for each entry: organism and name of the gRNAs, their primary sequences [not including the 3′ oligo(U) extension] and their localization on the mitochondrial genome (see Fig. 1 for an example). The gRNAs are listed from left to right with reference to the linear maxicircle map given in (6). The nomenclature of gRNAs differs depending on the laboratory involved, and the molecules are listed in a 5′ to 3′ order: the gRNA required to edit a 5′ region of an mRNA sequence is listed before that which is involved in editing a 3′ region. The amount of sequence shown for a gRNA may exceed the actual length of the gRNA. This is because in many cases the 5′ and 3′ termini have not been determined experimentally or because heterogeneity has been observed when gRNAs have been analyzed by primer extension or cDNA sequencing. For 159 of the 235 gRNAs, the existence of the molecules within RNA preparations has been experimentally verified by Northern blotting, primer extension, direct cDNA cloning, or isolation as part of a gRNA/mRNA chimera. The remaining sequences have to be considered putative gRNAs based on their base complementarity to fully edited mRNA sequence domains. Since all sequences were collected from published information, the corresponding references are provided in an associated hypertext document including MEDLINE identification numbers. In most cases these references will provide an alignment of the gRNAs with their cognate mRNAs.
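For readers who wish to work with the compilation programmatically, the fields listed for each entry map naturally onto a simple record type. The Python sketch below is a hypothetical representation, not the format used by the database itself: the class name, field names, entry names, sequences, localizations and reference strings are all placeholders invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GuideRNAEntry:
    """One table row; all field names are assumptions for this sketch."""
    organism: str
    name: str
    sequence: str       # primary sequence without the 3' oligo(U) extension
    localization: str   # e.g. minicircle or maxicircle
    verified: bool      # True if the transcript was detected experimentally
    references: List[str] = field(default_factory=list)


def count_verified(entries: List[GuideRNAEntry]) -> Tuple[int, int]:
    """Return (number of verified entries, total number of entries)."""
    return sum(1 for e in entries if e.verified), len(entries)


def group_by_organism(
    entries: List[GuideRNAEntry],
) -> Dict[str, List[GuideRNAEntry]]:
    """Group entries by organism, preserving the listing order."""
    groups: Dict[str, List[GuideRNAEntry]] = defaultdict(list)
    for entry in entries:
        groups[entry.organism].append(entry)
    return dict(groups)


# Placeholder entries: names, sequences and references are not real data.
toy_entries = [
    GuideRNAEntry("Trypanosoma brucei", "gEXAMPLE-1", "AUAUAGGA",
                  "minicircle", True, ["placeholder reference"]),
    GuideRNAEntry("Trypanosoma brucei", "gEXAMPLE-2", "UAUAAGAC",
                  "minicircle", False),
    GuideRNAEntry("Leishmania tarentolae", "gEXAMPLE-3", "AAUAUGGA",
                  "maxicircle", True),
]

verified, total = count_verified(toy_entries)
print(f"{verified} of {total} toy entries are marked as verified")
```

Applied to Release 1.0, the same count would yield the 159 verified entries out of 235 reported above, with the remainder flagged as putative.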
The database contains accessory information such as the experimentally verified secondary structures of four gRNAs from T. brucei (gA6-14, gA6-48, gND7-506, gCyb-558) (5), which are presented in Graphics Interchange Format (GIF) (see Fig. 2A for an example), and a three-dimensional model for T. brucei guide RNA gND7-506 (Hermann and Göringer, unpublished) (Fig. 2B).
The gRNA database is accessible via the URL: http://www.biochem.mpg.de/~goeringe/. A printed version can be obtained upon request from any of the authors, who can be contacted by electronic mail or by mail at the address given above. Users of the database should cite this publication. Corrections of errors and omissions, new entries, and other materials for inclusion in the database are welcome. Submissions of new information will be accepted in any form. Unpublished data will be held confidential if required.
This work was supported by grants from the German ministry for education and research (BMBF) and the German research foundation (DFG) to H.U.G.