The Antwerp database on small ribosomal subunit RNA now offers more than 6000 nucleotide sequences (August 1996). All these sequences are stored in the form of an alignment based on the adopted secondary structure model, which is corroborated by the observation of compensating substitutions in the alignment. Besides the primary and secondary structure information, literature references, accession numbers and detailed taxonomic information are also compiled. For ease of use, the complete database is made available to the scientific community via the World Wide Web at URL http://rrna.uia.ac.be/ssu/.
Contents of the Database
Over 10 years, the number of small ribosomal subunit RNA (hereafter abbreviated as SSU rRNA) sequences compiled by our research group in Antwerp has increased from 57 (1) to 6085. In 1986, the complete or nearly complete SSU rRNA sequence was known for 14 eukaryotes, 13 bacteria, nine archaebacteria, four plastids and 16 mitochondria. In August 1996, 1484 eukaryotic, 4155 bacterial, 142 archaeal, 88 plastid and 216 mitochondrial sequences were available. All SSU rRNA sequences included in our database are stored in the form of an alignment and contain the postulated secondary structure pattern in encoded form. Partial SSU rRNA sequences are included only if the combined length of the sequenced segments amounts to at least 70% of the estimated chain length of the molecule. The chain length of a partially determined sequence is estimated by comparing it with a complete sequence of a close relative.
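The inclusion rule for partial sequences can be sketched as a simple check. This is a minimal illustration only, not the curation code used for the database; the function name, its parameters and the example lengths are hypothetical, and the estimated chain length is assumed to have been obtained beforehand from a complete sequence of a close relative.

```python
def meets_inclusion_criterion(segment_lengths, estimated_chain_length,
                              threshold_percent=70):
    """Return True if the sequenced segments together cover at least
    `threshold_percent` of the estimated chain length.

    segment_lengths: lengths (in nucleotides) of the sequenced segments.
    estimated_chain_length: chain length estimated from a close relative.
    Integer arithmetic is used so the boundary case is exact.
    """
    if estimated_chain_length <= 0:
        raise ValueError("estimated chain length must be positive")
    combined = sum(segment_lengths)
    return 100 * combined >= threshold_percent * estimated_chain_length


# Hypothetical example: two segments of a molecule estimated at 1800 nt.
print(meets_inclusion_criterion([900, 400], 1800))  # 1300 of 1800 nt
print(meets_inclusion_criterion([600, 500], 1800))  # 1100 of 1800 nt
```

The first call covers about 72% of the estimated length and passes; the second covers about 61% and does not.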
Table 1 lists the different eukaryotic taxa and the number of representatives in the database. The taxonomic classification of the species is according to Brusca and Brusca (2) for the Animalia, according to Cronquist (3) for the higher plants, according to Ainsworth et al. (4) for the zygomycetes and ascomycetes, according to Moore (5) for the basidiomycetes and ustomycetes, and according to Margulis et al. (6) for the remaining eukaryotes, viz. the Protoctista.
Table 2 covers the prokaryotic SSU rRNA sequences. The classification of prokaryotes is based on the construction of evolutionary trees. In short, new sequences retrieved from the EMBL (7) and/or GenBank (8) nucleotide sequence libraries are aligned with their presumed closest relative. Evolutionary trees are then constructed by the neighbor-joining method (9), and according to the phylogenetic position observed, the species are assigned to one of the taxa described by Woese and co-workers (10,11) and our research group (12,13). In the case of the Bacteria, no hierarchical distinction is made between divisions and subdivisions such as the α, β, γ, δ and ε subdivisions of the division Proteobacteria, since these subdivisions do not always form a monophyletic cluster together in evolutionary trees. It should also be noted that tree construction shows the Proteobacteria γ subdivision to be a paraphyletic taxon, comprising the Proteobacteria β subdivision. For the Archaea, a distinction is made between the divisions Crenarchaeota and Euryarchaeota (14). The latter division is further subdivided into eight subdivisions.
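For readers unfamiliar with the neighbor-joining method (9), the clustering step can be sketched as follows. This is a simplified illustration under stated assumptions, not the software used to build the trees for the database: it takes a ready-made matrix of pairwise distances, returns only the unrooted topology as nested tuples, and omits branch lengths. The function names and the five-taxon distance matrix are hypothetical.

```python
def symmetric(pairs):
    """Build a symmetric nested dict from {(a, b): distance} entries."""
    dist = {}
    for (a, b), value in pairs.items():
        dist.setdefault(a, {})[b] = value
        dist.setdefault(b, {})[a] = value
    return dist


def neighbor_joining(names, dist):
    """Return an unrooted topology as nested tuples (no branch lengths).

    names: list of distinct taxon labels.
    dist: symmetric nested dict, dist[a][b] = distance between a and b.
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in nodes}  # working copy of the distances
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                # Neighbor-joining criterion: the pair with minimal Q is joined.
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        joined = (a, b)
        d[joined] = {}
        for c in nodes:
            if c != a and c != b:
                # Distance from the new internal node to each remaining node.
                d[joined][c] = d[c][joined] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [joined]
    return tuple(nodes)


# Hypothetical distances between five taxa, for illustration only.
example = symmetric({
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
})
print(neighbor_joining(["a", "b", "c", "d", "e"], example))
```

In this example, taxa a and b are joined first, and d and e end up as neighbors. Ties in the criterion are broken by input order, which affects only how the same unrooted tree is nested.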
Secondary Structure and Nucleotide Variability
The secondary structure models adopted for prokaryotic and eukaryotic SSU rRNAs were originally derived (15) by comparison of six eucaryal, one archaeal, four bacterial, two plastidial and one mitochondrial SSU rRNA sequences available in 1984 and by surveying 13 secondary structure models proposed at the time in papers listed in (15). Gradual improvements were made to the models, as reported in subsequent papers describing our database on SSU rRNA structure (1,16–19,12,20,21), taking into account compensating substitutions observed in our sequence alignments (22) and the results of studies by others (reviewed in 23).
Secondary structures encoded in the sequences are based either on the prokaryotic model, which is applicable to Bacteria, Archaea, plastids and mitochondria, or on the eukaryotic model applicable to all Eucarya. The two models are slightly different, each containing a number of structural elements specific for the group. The prokaryotic model is essentially identical to those distributed by Gutell (24), but the model followed for eukaryotic SSU rRNAs includes a secondary structure pattern in certain variable areas left undefined in the models of the latter author. The eukaryotic model is illustrated in Figure 1 with the SSU rRNA nucleotide sequence of the chlorarachniophyte Chlorarachnion reptans.
Helices in the SSU rRNA secondary structure model are given a different number if separated by a multibranched loop (e.g. helices 9 and 10), by a pseudoknot loop (e.g. helices 1 and 2), or by a single-stranded area that does not form a loop (e.g. helices 2 and 32). A single number is given to 50 ‘universal’ helices, which are present in all SSU rRNAs from Archaea, Bacteria and plastids known to date. The 50 ‘universal’ helices are also present in all known eukaryotic SSU rRNAs except in those of Microsporidia (Microspora), where some of these helices are missing. Helices specific to the eukaryotic model are numbered Ea-b, where a is the number of the preceding universal helix and b sequentially numbers all helices inserted between universal helices a and a+1. Helices specific to the prokaryotic model are similarly given composite numbers of the form Pa-b. Mitochondrial sequences show extreme variability in length and in the number of helices present. Examples of secondary structure models for prokaryotic and mitochondrial SSU rRNAs have been given in previous papers on our database (12,20,21).
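The helix numbering scheme described above can be sketched as a small label parser. This is an illustration under assumptions, not part of the database software: it assumes that labels are written exactly as a plain number for universal helices and as E or P followed by a, a hyphen and b for model-specific helices (for example "E23-1"); the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A plain number, or E/P followed by "a-b" (assumed textual form of the labels).
_LABEL = re.compile(r"^(?:(?P<model>[EP])(?P<a>\d+)-(?P<b>\d+)|(?P<u>\d+))$")


@dataclass(frozen=True)
class Helix:
    kind: str                 # "universal", "eukaryotic" or "prokaryotic"
    universal_number: int     # own number, or preceding universal helix a
    index: Optional[int]      # b for specific helices, None for universal ones


def parse_helix_label(label):
    """Parse a helix label such as '9', 'E23-1' or 'P21-2'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized helix label: {label!r}")
    if match.group("u") is not None:
        return Helix("universal", int(match.group("u")), None)
    kind = "eukaryotic" if match.group("model") == "E" else "prokaryotic"
    return Helix(kind, int(match.group("a")), int(match.group("b")))


print(parse_helix_label("9"))
print(parse_helix_label("E23-1"))
```

Under this reading, a label such as "E23-1" denotes the first eukaryote-specific helix inserted between universal helices 23 and 24.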
Recently, a new method was developed for measuring the relative substitution rate of individual sites in a nucleotide sequence alignment on the basis of the estimated evolutionary distance between sequences (25,26). By dividing nucleotides into five variability subsets, and giving a different color to each of the subsets, color maps superimposed on the secondary structure of SSU rRNAs can be constructed, as explained in detail in Van de Peer et al. (27). In the latter study, variability color maps are presented for bacterial 5S rRNA, SSU rRNA and large ribosomal subunit rRNA (hereafter abbreviated as LSU rRNA). The color map of Figure 2 of the current paper is superimposed on the eukaryotic SSU rRNA secondary structure model of Saccharomyces cerevisiae (19). Nucleotide variabilities were computed on the basis of 500 sequences of species belonging to the eukaryotic ‘crown taxa’ (28). The taxonomic groups and the number of their representatives used for this analysis are listed in Table 3. Color maps for bacterial 5S rRNA, SSU rRNA, LSU rRNA (27) and eukaryotic SSU rRNA (Fig. 2) can also be consulted via the Internet at URL http://bioc-www.uia.ac.be/u/yvdp/.
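The idea of dividing sites into five variability subsets can be sketched as follows. The actual procedure is described in (27) and is not reproduced here; this sketch assumes, purely for illustration, that per-site relative rates are already available and that the subsets are formed by rank so that each holds roughly the same number of sites. The function name and the example rates are hypothetical.

```python
def variability_subsets(rates, n_subsets=5):
    """Assign each site to a subset 0..n_subsets-1, lowest rates first.

    rates: one relative substitution rate per alignment site.
    Sites are ranked by rate and split into subsets of (nearly) equal size;
    this equal-count split is an assumption made for the sketch.
    """
    if not rates:
        return []
    # Site indices ordered from least to most variable (stable for ties).
    order = sorted(range(len(rates)), key=lambda i: rates[i])
    subsets = [0] * len(rates)
    for rank, site in enumerate(order):
        subsets[site] = rank * n_subsets // len(rates)
    return subsets


# Hypothetical relative rates for ten sites.
example_rates = [0.1, 2.0, 0.5, 1.2, 0.05, 3.0, 0.9, 0.3, 1.5, 0.7]
print(variability_subsets(example_rates))
```

Each subset index could then be mapped to one color when drawing the secondary structure. Because the split is by rank, sites with identical rates may fall into adjacent subsets.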
Availability of the Data
Each SSU rRNA sequence is stored in a separate file, in order to simplify access to the data. Each file contains primary and secondary structure information, as well as annotations such as accession number, literature reference and detailed taxonomic specifications. The SSU rRNA database is made available via the World Wide Web at URL http://rrna.uia.ac.be/ssu/. Through the WWW, sequences can be selected one by one, by taxonomic group, or by a combination of both. Recently, new search tools were added to make it easier to find specific sequences (see De Rijk et al., this issue). Sequences can be retrieved in different formats. On-line information about the database is also available.
If problems occur in connecting to the server or in retrieving data, the authors can be contacted by electronic mail to firstname.lastname@example.org or email@example.com. Users publishing results based on data retrieved from our database are requested to cite this paper.
Our research is supported by the BIOTECH programme of the Commission of the European Communities (contract BIO2-CT94-3098), by the Programme on Interuniversity Poles of Attraction of the Office for Scientific, Cultural, and Technical Affairs of the Belgian State (contract 23), by the National Fund for Scientific Research and by the Special Research Fund of the University. We thank Sabine Chapelle for the computer drawing of the secondary structure model. Yves Van de Peer is Research Assistant of the National Fund for Scientific Research.