The miRNA Registry provides a service for the assignment of miRNA gene names prior to publication. A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/), and all sequence and annotation data are freely available for download. Release 2.0 of the database contains 506 miRNA entries from six organisms.
Received July 24, 2003; Revised August 26, 2003; Accepted September 3, 2003
MicroRNAs (miRNAs) are a class of non‐coding RNA gene whose products are ∼22 nt sequences that play important roles in the regulation of translation and degradation of mRNAs through base pairing to partially complementary sites in the untranslated regions (UTRs) of the message. Since the discovery of the founding members of the class, let‐7 and lin‐4 miRNAs in Caenorhabditis elegans (reviewed in 1), more than 300 miRNAs have been found in animals and plants (2–19). In animals, the expression of miRNAs has been shown to involve at least two processing steps (20). miRNAs are transcribed as long primary transcripts (pri‐miRNAs), which may contain more than one miRNA. The primary transcript is processed in the nucleus to give one or more hairpin precursor sequences (pre‐miRNAs). This processing step defines one end of the mature miRNA sequence, which is contained in one arm of the hairpin precursor. The hairpin precursor is exported to the cytoplasm where the mature miRNA is excised by the RNase III‐like enzyme Dicer, suggesting a relationship with RNA interference (RNAi) (21–23). The criteria for distinguishing miRNAs from other classes of RNA, such as small interfering RNAs (siRNAs), have recently been agreed by a number of miRNA scientists (24).
The rapid rate of miRNA gene discovery has led to two basic needs for the miRNA community. To avoid inadvertent overlap, it is important for miRNA researchers to have access to an independent arbiter of gene names. In addition, a comprehensive and up‐to‐date repository for published miRNA sequences and annotation greatly facilitates the rapid development of computational approaches for the prediction of miRNA genes and targets, as well as aiding sequence and genome annotation. Several groups have recently published work on prediction of miRNAs in C.elegans (14,16) and human (17), and reports of prediction and verification of the mRNA targets of a number of miRNAs have started to emerge (12,25).
AIMS OF THE miRNA REGISTRY
The primary aims of the miRNA Registry are two‐fold. The first is to assign unique names to distinct miRNAs prior to publication of their discovery. A web interface has been developed to facilitate the submission of miRNA sequences for naming. To avoid accidental overlap of gene names, and to minimize ‘pre‐booking’ of assignments, the Registry will assign a name only after a paper describing the sequence has been accepted for publication. Authors are advised to use temporary names in the initial submission of articles to journals for peer review. On acceptance, final names are discussed and agreed with the corresponding author. The miRNA Registry maintains complete confidentiality for pre‐publication data.
miRNAs are given numerical identifiers based on sequence similarity. At the time of writing, the last assigned name is miR‐318 from Drosophila melanogaster. The next miRNA with no similarity to previously identified sequences will receive the name miR‐319. It is desirable for homologues in different organisms to receive the same name. Names are based on the similarity of the excised ∼22 nt sequence to previously identified miRNAs. Identical mature sequences are assigned the same name; if they originate from separate genomic loci in a given organism they are given numerical suffixes, such as mir‐6‐1 and mir‐6‐2 from D.melanogaster (4). Sequences with one or two base changes are assigned suffixes of the form miR‐181a and miR‐181b (17). Homology between sequences with more base differences may be suggested by sequence similarity in the hairpin portion of the primary transcript, and such cases are discussed and names agreed with the corresponding author. Some miRNA hairpin precursors give rise to two excised miRNAs, one from each arm. Different naming conventions have been used to describe these sequences. Where cloning studies have allowed researchers to determine which arm of the precursor gives rise to the predominantly expressed miRNA, an asterisk has been used to denote the less predominant form, as in miR‐56 and miR‐56* from C.elegans (2). Previous reports have also denoted miRNAs from opposite arms of the hairpin precursor as, for example, miR‐142‐s (5′ arm) and miR‐142‐as (3′ arm) (5). Current opinion favours using names of the form miR‐142‐5p and miR‐142‐3p to designate miRNAs from the 5′ and 3′ arms, respectively, until the data are sufficient to confirm which is predominantly expressed (T. Tuschl and D. Bartel, personal communication).
Capitalisation of names should not be relied upon to confer meaning, but historically, mir‐16 has been used to designate the gene (and also the predicted stem–loop portion of the primary transcript), whereas miR‐16 signifies the excised ∼22 nt sequence. Plant gene names follow a slightly different convention, of the form MIR156 (10).
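The naming conventions described above can be illustrated with a short Python sketch that splits a name into its parts. This is not code used by the Registry; the function `parse_mirna_name`, the `MirnaName` record and its field names are hypothetical and introduced here only for illustration. The sketch assumes ASCII hyphens (typographic hyphens are normalised first), covers only the numerical, lettered, asterisk and 5p/3p forms, and does not handle the older ‐s/‐as suffixes or plant names of the form MIR156.

```python
import re
from typing import NamedTuple, Optional


class MirnaName(NamedTuple):
    """Hypothetical record for the parts of an animal miRNA name."""
    prefix: str             # "mir" (gene / stem-loop) or "miR" (excised sequence)
    number: int             # numerical identifier, e.g. 181
    variant: Optional[str]  # letter suffix for near-identical sequences, e.g. "a"
    locus: Optional[int]    # numerical suffix for separate genomic loci, e.g. 1
    arm: Optional[str]      # "5p" or "3p" when the hairpin arm is given
    minor: bool             # True when an asterisk marks the less predominant form


# Assumed pattern: prefix, number, optional letter, optional locus,
# optional arm, optional asterisk.
_NAME = re.compile(
    r"^(?P<prefix>mir|miR)-(?P<number>\d+)(?P<variant>[a-z])?"
    r"(?:-(?P<locus>\d+))?(?:-(?P<arm>[53]p))?(?P<minor>\*)?$"
)


def parse_mirna_name(name: str) -> Optional[MirnaName]:
    """Return the parsed parts of a name, or None if it does not fit the pattern."""
    # Normalise the typographic hyphen (U+2010) often found in typeset text.
    match = _NAME.match(name.strip().replace("\u2010", "-"))
    if match is None:
        return None
    locus = match.group("locus")
    return MirnaName(
        prefix=match.group("prefix"),
        number=int(match.group("number")),
        variant=match.group("variant"),
        locus=int(locus) if locus is not None else None,
        arm=match.group("arm"),
        minor=match.group("minor") is not None,
    )


if __name__ == "__main__":
    for example in ("mir-6-1", "miR-181a", "miR-56*", "miR-142-5p"):
        print(example, parse_mirna_name(example))
```

Because capitalisation should not be relied upon to confer meaning, the `prefix` field is recorded here but would be best treated as informational only.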
The second aim of the miRNA Registry is to provide a comprehensive and searchable database of all published miRNA sequences. To this end, submitted sequences are moved to the public sections of the database on their publication. The website includes a browsable list of miRNA entries, name, keyword and publication searches, and allows the user to search a sequence against the database of predicted hairpins and mature miRNAs. Each database entry represents a predicted stem–loop containing the miRNA, with the bounds of the excised sequence(s) reported. The publication describing the discovery of the miRNA is cited as the primary reference. A brief description of the genomic location, homologous sequences and possible targets is provided, with links to literature references for more information. Cross‐links to nucleotide databases, model organism databases and RNA family databases are given. Hairpin base‐paired structures are depicted as predicted by the RNAfold program from the ViennaRNA package (26). A typical entry page is shown in Figure 1.
A commitment to the long‐term curation of the miRNA Registry ensures the rapid dissemination of new sequence data and annotation. Each database entry is identified by a stable accession number in addition to the miRNA gene name. This enables the rationalisation of gene names as more data become available, whilst maintaining information for tracking changes from initial published names and descriptions. At the time of writing, the database contains only published miRNA loci, but miRNA annotation guidelines allow for the computational identification of homologues of validated miRNA sequences (24). The size of the database is likely to increase significantly as such sequences are curated by us and others. As more information becomes available about the biogenesis of miRNAs, we predict that it will become desirable to curate sequence information for the primary transcript and the hairpin precursor, as well as the excised mature miRNA. Close integration with the Rfam database (27) facilitates the classification of related miRNA sequences into families.
The database is hosted by the Rfam (UK) website at http://www.sanger.ac.uk/Software/Rfam/mirna/ and is freely available to all. Predicted stem–loop and mature miRNA sequence data are also available for download from the FTP site in FASTA format, and complete with annotation in EMBL format. Release 2.0 of the database (July 2003) contains 506 entries from C.elegans, Caenorhabditis briggsae, D.melanogaster, human, mouse and Arabidopsis thaliana. Queries and feedback, including data revisions, are welcomed by email to firstname.lastname@example.org.
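As an illustration of working with the FASTA downloads mentioned above, the following Python sketch reads FASTA‐formatted lines into a dictionary keyed by the first word of each header. The function name `read_fasta` is hypothetical, the exact header layout of the Registry files is not described here and is therefore an assumption, and the sequences in the usage example are arbitrary placeholders rather than real miRNA entries.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records as {identifier: sequence}.

    The identifier is taken as the first whitespace-delimited word after
    '>' (an assumption about the header layout). Sequence lines belonging
    to one record are concatenated; blank lines are ignored.
    """
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    # Store the final record once the input is exhausted.
    if name is not None:
        records[name] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Placeholder records, not real database entries.
    demo = [">seq-1 demo entry", "ACGU", "UGCA", "", ">seq-2", "GGCC"]
    for identifier, sequence in read_fasta(demo).items():
        print(identifier, len(sequence), sequence)
```

To read a downloaded file, an open file handle can be passed directly, for example `read_fasta(open(path))`, where `path` points to a local copy of the FASTA file.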
I would like to thank Mhairi Marshall for web design and database support, and Simon Moxon for annotating many entries in the database. I am grateful to David Bartel, Tom Tuschl, Victor Ambros, Sean Eddy and Alex Bateman for their support and useful discussions, and David Bartel for critical manuscript reading.