Abstract

The Rfam database aims to catalogue non-coding RNAs through the use of sequence alignments and statistical profile models known as covariance models. In this contribution, we discuss the pros and cons of using the online encyclopedia, Wikipedia, as a source of community‐derived annotation. We discuss the addition of groupings of related RNA families into clans and new developments to the website. Rfam is available on the Web at http://rfam.sanger.ac.uk.

INTRODUCTION

The Rfam database maintains alignments, consensus secondary structures, covariance models (CMs) and corresponding annotation for RNA families. Each family represents a set of RNA sequences that function at the RNA level and share a clear common ancestor. Some examples are tRNA, microRNAs, spliceosomal RNAs, riboswitches, CRISPR elements and thermosensors. The primary purpose of the Rfam database is the automated, accurate annotation of non-coding RNAs (ncRNAs) in genomic sequences. Rfam is also frequently used as a source of high-quality alignments for training and benchmarking RNA sequence analysis software tools ( 1–5 ). Additionally, in the absence of a well-curated and up-to-date general RNA sequence database, equivalent to UniProt in the protein coding world, Rfam is also often used as a source of individual ncRNA sequences.

As described in previous Rfam publications, the database is built upon well‐curated seed alignments of representative members of an RNA family ( 6–8 ). These are used to build CMs, statistical models of a family's conserved sequence and secondary structure, using the Infernal suite of analysis tools ( 9 ). The resultant covariance models are used to scan a large database of nucleotide sequences that is derived from the EMBL nucleotide archive ( 10 ). The searches return a list of putative homologs, or hits, ranked by bit-scores derived from the CMs. A hit's bit-score is the log odds ratio of the probability the hit was generated by the CM versus a random model of background sequence. An expert curator provides a threshold that in their opinion best discriminates between bona fide homologs to the seed sequences and the background distribution of false hits. Subsequently, all sequences with a bit-score above the threshold are included in an automatically generated alignment to the CM.
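The scoring and thresholding steps described above can be sketched in a few lines. This is a minimal illustration rather than the Rfam pipeline itself: the function names, the hit names and the scores are invented for the example, and treating the cutoff as inclusive is an assumption.

```python
import math

def bit_score(p_model: float, p_null: float) -> float:
    """Log-odds score in bits: log2 of P(hit | CM) over P(hit | background)."""
    return math.log2(p_model / p_null)

def hits_above_threshold(hits, threshold):
    """Keep (name, bit_score) pairs that reach the curated threshold, best first.

    Using an inclusive cutoff (>=) is an assumption made for this sketch.
    """
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)

# Invented hits for illustration only.
hits = [("seqA/1-70", 52.3), ("seqB/10-81", 18.9), ("seqC/5-77", 34.0)]
selected = hits_above_threshold(hits, threshold=30.0)
```

A hit that is eight times more probable under the model than under the background scores 3 bits, and only the hits in `selected` would go forward into the automatically generated alignment.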

NEW DEVELOPMENTS

The Rfam 10.0 “decimal” release

In order to keep Rfam as up-to-date as possible we aim to make regular releases of the database. These releases are snap-shots of the live, internal version of the database that are made publicly available via the websites and ftp. We have two types of release. A major release (indicated by an integer and a ‘.0’ in the version number e.g. ‘10.0’) usually involves updating the underlying sequence database, Rfamseq, to the latest version of EMBL and remapping all the seed sequences to the new databases. All the families are subsequently searched against the new database and, if necessary, re-thresholded. Minor releases are indicated by ‘.1’, ‘.2’, etc. in the version number e.g. ‘10.1’. These are usually made after adding many new families to the database built on the same underlying sequence database.

Rfam 10.0 was released in early 2010. This release included a major update to the underlying search algorithm, switching to a new version of Infernal, v1.0 ( 9 ). This required individually re-thresholding each Rfam family due to an important change in Infernal’s underlying scoring scheme from maximum likelihood alignment scores to summed scores over all possible alignments [i.e. switching from using the CYK algorithm to the Inside algorithm ( 11 )]. Additionally, the new version of Infernal reports estimates of the statistical significance of hits (E-values) returned from database searches using Rfam 10.0 CM files. We also mapped all the families to, and searched, a new version of Rfamseq based on EMBL 100 ( 10 ). Together, these and other internal improvements to our pipeline produced a 178% increase in the number of regions that Rfam covers, which contrasts with the rather modest 40% increase in the size of Rfamseq. This has caused some of our alignments to become very large. For example, the tRNA full alignment now contains more than 1 million sequences. The amount of compute required for this release was roughly 5 CPU months to calibrate the models, 1 CPU year to run BLAST, 3 CPU years to run CM searches (cmsearch) and 15 CPU days to produce CM-derived multiple sequence alignments (cmalign).
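The difference between the two scoring schemes can be shown with a toy calculation. The sketch below is not Infernal's dynamic programming: it assumes we are simply handed the bit-scores of a few alternative alignments of one hit (invented numbers), and compares taking the best one (CYK-like) with summing over all of them (Inside-like).

```python
import math

def cyk_score(alignment_bits):
    """Bit-score of the single best alignment (maximum over alternatives)."""
    return max(alignment_bits)

def inside_score(alignment_bits):
    """Bit-score of the summed odds over all alternatives.

    Each bit-score s corresponds to an odds ratio of 2**s, so the summed
    score is log2(sum(2**s)), computed here relative to the maximum for
    numerical stability.
    """
    top = max(alignment_bits)
    return top + math.log2(sum(2.0 ** (s - top) for s in alignment_bits))

# Invented bit-scores for three alternative alignments of one hit.
toy = [20.0, 19.0, 15.0]
best = cyk_score(toy)
summed = inside_score(toy)
```

Because the summed score is never lower than the best single-alignment score, thresholds chosen under one scheme do not carry over directly to the other, which is consistent with the need to re-threshold every family.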

Evaluating the success of the Wikipedia community annotation model

One of the fundamental problems facing any biocuration effort is keeping the annotation of the entities stored in a database up to date with the current literature. Typically, the annotation of existing entries changes less quickly than new data are added, so entries become rapidly out-of-date.

In mid-2007, Rfam began experimenting with using Wikipedia as a means for storing and curating the textual annotation of RNA families. Three years on, the RNA family pages have received more than 9000 edits from more than 1000 unique users. Slightly over 1% of these edits have been recognized as possible vandalism ( Figure 1 ). The resulting marked-up annotation and curated references have dramatically improved the content of the Rfam database compared with the pre-2007 static text. The Wikipedia entries also help drive users to the Rfam website. Approximately 15% of all the web traffic to http://rfam.sanger.ac.uk now comes via Wikipedia. As has been observed by others, a typical Google search for a biological term returns a Wikipedia entry among the top hits ( 12 , 13 ). From a curator’s viewpoint, Wikipedia is an excellent model to take advantage of, as it includes a large community of contributors and comes with a number of user-friendly tools that help with basic editing, maintaining references and automated updates to pages with programs called bots. The large community also has other benefits, such as the well-documented long-tail effect, where the majority of new content is added by a large number of editors, each of whom makes just a few edits ( 12 , 13 ). There are also dedicated editors who attend to small but important details that an average curator may not have time for, such as consistency of style, grammar and spelling. Other editors are dedicated to reverting obvious non-constructive edits, commonly referred to as ‘vandalism’, which are usually recognized and reverted within seconds. It is important to note that all edits are reviewed before appearing on the Rfam website, so no overt vandalism reaches Rfam. Given our positive experiences, we highly recommend that other curation efforts turn to Wikipedia for their annotation. However, it must be borne in mind that Wikipedia is built by consensus, and to gain its benefits you will lose the tight control of the data allowed by in-house curation.

Figure 1.

Edits for Wikipedia articles on RNA families. The cumulative number of edits since 1st January 2007 for the 733 Wikipedia articles that are associated with Rfam entries is shown in black. The total number of edits that were reverted or labeled as vandalism is shown in red. To mid-2010, there were just 106 of these. However, some reverted edits may have been well-intentioned but were deemed inappropriate for Wikipedia.


Rfam clans

One of the fundamental quality control steps that Rfam employs is that no two families can annotate the same nucleotide. This rule prevents us building two or more families for essentially the same entity. When building new Rfam families or extending an existing family, we sometimes find ourselves artificially increasing the threshold to avoid overlaps with another family or trimming the ends of families that have incorrect boundaries. We also find that a single alignment may not capture all the diversity of a group of homologous RNAs. To resolve some of these issues, we have borrowed the concept of a clan from the MEROPS and Pfam databases ( 14 , 15 ).

We have added 99 clans for the Rfam 10.0 release. These clans describe explicit relationships between families that either clearly share a common ancestor but are too divergent to be reasonably aligned, or that could be aligned but have clearly distinct functions and therefore should be kept as separate families. For example, the RNase P clan contains five homologous families: RNase MRP, archaeal RNase P, nuclear RNase P and the bacterial RNase P types a and b. These RNAs are ribozymes involved in processing of pre-tRNA and pre-rRNA sequences. The RNase Ps are, however, notoriously difficult to align to each other. Furthermore, RNase P and RNase MRP are functionally distinct molecules ( 16 ). Another clan of interest is Glm; this clan contains two homologous but functionally distinct bacterial small RNAs, GlmY and GlmZ, which act in a hierarchical fashion to regulate the translation of the glmS coding gene. GlmY activates expression of GlmZ, which in turn de-sequesters the glmS Shine-Dalgarno sequence via an anti-antisense interaction ( 17 ). The new clans mean that some of the internal quality control measures that Rfam uses can be relaxed for the clanned families. Primarily, this means we can ignore our no-overlap rule within a clan; in the past, this rule forced some of these families to have artificially high thresholds to avoid overlapping a related but distinct family.
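The no-overlap rule and its relaxation for clans can be expressed as a simple check. The following is a sketch under stated assumptions, not Rfam's quality-control code: the `Hit` record, the family and clan labels and the coordinates are all hypothetical, strand is ignored, and `start <= end` is assumed.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    family: str   # family label (hypothetical names below)
    seq_id: str   # identifier of the annotated sequence
    start: int    # 1-based inclusive start, assumed <= end
    end: int      # 1-based inclusive end

def overlaps(a: Hit, b: Hit) -> bool:
    """True if both hits share at least one nucleotide of the same sequence."""
    return a.seq_id == b.seq_id and a.start <= b.end and b.start <= a.end

def forbidden_overlaps(hits, clan_of):
    """Return overlapping hit pairs from different families not in one clan."""
    bad = []
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if a.family == b.family or not overlaps(a, b):
                continue
            clan_a, clan_b = clan_of.get(a.family), clan_of.get(b.family)
            if clan_a is not None and clan_a == clan_b:
                continue  # overlaps are tolerated between members of one clan
            bad.append((a, b))
    return bad

# Hypothetical hits and clan membership for illustration.
hits = [
    Hit("RNaseP_nuc", "chr1", 100, 400),
    Hit("RNase_MRP", "chr1", 150, 380),
    Hit("tRNA", "chr1", 390, 460),
]
clan_of = {"RNaseP_nuc": "RNaseP_clan", "RNase_MRP": "RNaseP_clan"}
violations = forbidden_overlaps(hits, clan_of)
```

Here the two RNase P clan members may overlap freely, while the overlap between the first hit and the unclanned tRNA hit is still reported.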

In order to help assess the likelihood of a relationship between two or more families, we used a number of independent lines of evidence. These included sequence analysis based upon a SCOOP-like analysis for comparing overlapping hits from both profile hidden Markov model (HMM) and covariance model searches ( 18 ), the profile-profile comparison tool PRC ( 19 ) and literature searches for functional and evolutionary relationships. For the snoRNA and miRNA families, we were able to utilize some additional sources of information in order to establish homology. For the snoRNAs, we used some of the specialized snoRNA databases to confirm whether families targeted orthologous regions of rRNA; for many snoRNAs, this helped to confirm a relationship between the families ( 20–23 ). For the miRNAs, we used the annotated seed region of the mature miRNA ( 24 ). If two or more miRNA families shared a significant amount of similarity in the seed region, and if they had further similarities identified by the sequence analysis tools, then these too were added to clans.
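As a rough illustration of the seed-region comparison, the sketch below measures positional identity between the seeds of two mature miRNA sequences. It rests on assumptions not stated in the text: the seed is taken as positions 2 to 8 of the mature sequence (a common convention), similarity is plain positional identity, and the sequences are toy strings rather than database entries.

```python
def seed(mature: str, start: int = 2, end: int = 8) -> str:
    """Return the seed of a mature miRNA (1-based, inclusive positions).

    Positions 2-8 are an assumed convention for this sketch.
    """
    return mature[start - 1:end]

def seed_identity(a: str, b: str) -> float:
    """Fraction of seed positions at which two mature sequences agree."""
    seed_a, seed_b = seed(a), seed(b)
    matches = sum(x == y for x, y in zip(seed_a, seed_b))
    return matches / len(seed_a)

# Toy mature sequences differing at one seed position.
mir_a = "UGAGGUAGUAGG"
mir_b = "UGAGCUAGUAGG"
identity = seed_identity(mir_a, mir_b)
```

A high seed identity on its own would only be one line of evidence; as described above, families were clanned only when the other sequence analysis tools also supported the relationship.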

Species labels

The new set of seed and full alignments available via the website uses descriptive species labels for sequence names rather than the more cryptic EMBL accessions and coordinates that were previously provided. The provenance of the sequence data is maintained by using ‘#=GS’ tags from Stockholm format ( 25 ) to provide a mapping back to EMBL accessions ( Figure 2 ). Stockholm is a versatile markup format for biological sequence alignments. It allows the markup of general file information, including references, comments and cross-links. It also allows regions of an alignment that cannot be aligned to be marked with tildes in the ‘#=GC RF’ lines.
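A short parser shows how the species labels can be mapped back to accessions. This is a sketch under assumptions: it presumes the accession is stored in a per-sequence ‘#=GS <name> AC <accession>’ line, and the alignment text, sequence names and accessions below are invented placeholders rather than real Rfam or EMBL entries.

```python
def parse_gs_accessions(stockholm_text: str) -> dict:
    """Map sequence names to the value of their '#=GS <name> AC <value>' line."""
    mapping = {}
    for line in stockholm_text.splitlines():
        if not line.startswith("#=GS"):
            continue
        parts = line.split(None, 3)  # tag, sequence name, feature, free text
        if len(parts) == 4 and parts[2] == "AC":
            mapping[parts[1]] = parts[3].strip()
    return mapping

# Invented toy alignment; names and accessions are placeholders.
example = """# STOCKHOLM 1.0
#=GF ID toy_family
#=GS Species_one.1 AC X00001.1/10-40
#=GS Species_two.1 AC X00002.1/5-35
Species_one.1 GGCAUCC
Species_two.1 GGCGUCC
#=GC SS_cons  <<...>>
//
"""
accessions = parse_gs_accessions(example)
```

The resulting dictionary lets a tool display the readable label while still reporting the underlying accession and coordinates.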

Figure 2.

An example Stockholm alignment for the UPSK pseudoknot from turnip yellow mosaic virus. The Stockholm alignment format is flexible enough to allow generic mark-up of file information with ‘#=GF’ lines, sequence information with ‘#=GS’ lines and column information with ‘#=GC’ lines. Each is followed by at least a two-letter code giving an indication for what follows e.g. ‘ID’ implies ‘identifier’, ‘AC’ implies ‘accession’, ‘AU’ implies ‘author’, etc. All the commonly used tags are documented in the Wikipedia article for Stockholm alignment ( 25 ).


Ontologies

An important feature for any biocuration effort is linking to related resources, for example, primary sequence databases, genomes and specialized resources such as miRBase and the snoRNA databases. Recently, a number of groups have started developing controlled vocabularies for describing biological entities. Two efforts of particular relevance to Rfam are the sequence ontology (SO) and the gene ontology (GO) ( 26 , 27 ). For the majority of Rfam families, we have now added cross-links to both the SO and the GO. Many of these were provided by researchers at the functional RNA database ( 28 ). In the near future, we plan to introduce more ncRNA terms back into the ontologies. Until then, the mapping will remain rather coarse-grained and closely related to the existing types Rfam uses as annotation ( 6 ). This mapping groups the RNAs into three main groups: ‘cis-reg’, ‘gene’ and ‘intron’, with subtypes such as ‘riboswitch’, ‘miRNA’ and ‘snoRNA’.

FUTURE DEVELOPMENTS

New families in Rfam 10.1

For the forthcoming minor release of Rfam, we have added a number of new and notable families. Of particular note are the direct submissions of Stockholm formatted alignments and corresponding Wikipedia articles from the RNA community via the RNA families track at RNA Biology ( 8 ). This track has relieved our curators of much of the burden of building these new families, and the families produced have been built and annotated by experts and are therefore of high quality. Updated families from this route include RNase MRP, SRP, tmRNA and the U3 snoRNA ( 29–32 ). In addition, several families missing from past Rfam releases have been published, including the SmY RNA, the cyanobacterial RNA Yfr2, several Trypanosomatid snoRNAs, the self-splicing ribozyme GIR1, an influenza pseudoknot, the Staphylococcus small RNA RsaOG and a putative RNA antitoxin, ptaRNA1 ( 33–39 ). The ptaRNA1 article alerted us to the fact that Rfam contains none of the published and well-characterized RNA antitoxins such as sok and symE ( 40 ). These omissions will be remedied in Rfam 10.1. A growing class of cis-regulatory elements are the environmental sensors. These are generally structured 5′ UTR elements that change conformation in response to environmental changes such as temperature or pH; this change subsequently influences the expression of the protein encoded in the host mRNA. We have added the first examples of a cold sensor and a pH sensor ( 41 , 42 ). Finally, we have received a large number of submissions from a recent bioinformatic screen that was followed by a thorough analysis of the predictions, largely based upon genomic context. This has resulted in more than 80 new additions to the database ( 43 ). Fortunately, the authors kindly provided both Stockholm formatted alignments and Wikipedia articles for these new families.

Covariance model pre-filters

A pressing issue for Rfam is the replacement of WU-BLAST as a pre-filter for searching the Rfamseq database. The legal rights to up-to-date versions of WU-BLAST were recently acquired by a commercial entity and the software can no longer be considered free in any meaningful sense. However, there have been several developments that should allow profile HMMs to be used as effective pre-filters for covariance model searches ( 44 ). Accelerated profile HMM searches are now available through the HMMER package ( 45–47 ). In the near future, Rfam will therefore be in a position to replace the current BLAST-based filters with accelerated profile HMMs.
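The role of a pre-filter can be sketched generically: a cheap score discards most candidates so that the expensive score is computed only for the survivors. The code below is a conceptual sketch only. The two toy scoring functions are stand-ins invented for the example and are not BLAST, HMMER or Infernal; the cutoffs and sequences are likewise illustrative.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def toy_fast_score(seq: str) -> int:
    """Cheap stand-in for a sequence-only filter: count 'GC' dinucleotides."""
    return seq.count("GC")

def toy_slow_score(seq: str) -> int:
    """Costlier stand-in for a structure-aware score: count positions that can
    base-pair with their mirror position (a hairpin-like toy model)."""
    half = len(seq) // 2
    return sum((seq[i], seq[-1 - i]) in PAIRS for i in range(half))

def filtered_search(candidates, fast_score, slow_score, filter_cutoff, final_cutoff):
    """Run the slow score only on candidates that pass the fast filter.

    Returns the final hits and the number of candidates that reached the
    slow stage.
    """
    survivors = [c for c in candidates if fast_score(c) >= filter_cutoff]
    scored = [(c, slow_score(c)) for c in survivors]
    final_hits = [(c, s) for c, s in scored if s >= final_cutoff]
    return final_hits, len(survivors)

candidates = ["GCGCAAGCGC", "AAAAAAAAAA", "GCAAAAAAAA"]
final_hits, n_slow = filtered_search(
    candidates, toy_fast_score, toy_slow_score, filter_cutoff=1, final_cutoff=3
)
```

The trade-off is the usual one for filters: a stricter `filter_cutoff` saves more compute but risks discarding true homologs before the sensitive search ever sees them.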

Scale

Sequencing projects such as the Genome 10K ( 48 ) and other attempts to fill sequencing gaps in the tree of life ( 49 ) mean that most Rfam families will dramatically increase in depth in the near future. Large alignments already pose a considerable challenge when it comes to displaying or distributing the alignments themselves, or building and displaying related data such as species and phylogenetic trees. Novel techniques will need to be developed in order to deal with these and many other issues of scale. We look forward to working with the wider community to develop these new tools and techniques.

FUNDING

Wellcome Trust (grant number WT077044/Z/05/Z) (to P.P.G., J.D., J.T., I.H.O., B.M. and A.B.); Howard Hughes Medical Institute (R.D.F., E.P.N., D.L.K. and S.R.E.); University of Manchester (S.G.J.). Funding for open access charge: The Wellcome Trust (grant number WT077044/Z/05/Z).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

Many thanks to Guy Coates, James Beal and Peter Clapham for assistance with improving the performance of computational and software infrastructure. The authors received invaluable feedback at the 2009 Benasque RNA Workshop.

REFERENCES

1. Holmes I. A probabilistic model for the evolution of RNA structure. BMC Bioinformatics, 2004, vol. 5, pg. 166.
2. Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 2006, vol. 22 (pg. e90-e98).
3. Yao Z, Weinberg Z, Ruzzo WL. CMfinder–a covariance model based RNA motif finding algorithm. Bioinformatics, 2006, vol. 22 (pg. 445-452).
4. Sun Y, Buhler J. Designing secondary structure profiles for fast ncRNA identification. Comput. Syst. Bioinformatics Conf., 2008, vol. 7 (pg. 145-156).
5. Yusuf D, Marz M, Stadler PF, Hofacker IL. Bcheck: a wrapper tool for detecting RNase P RNA genes. BMC Genomics, 2010, vol. 11, pg. 432.
6. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res., 2003, vol. 31 (pg. 439-441).
7. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res., 2005, vol. 33, Database issue (pg. D121-D124).
8. Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, et al. Rfam: updates to the RNA families database. Nucleic Acids Res., 2009, vol. 37, Database issue (pg. D136-D140).
9. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics, 2009, vol. 25 (pg. 1335-1337).
10. Leinonen R, Akhtar R, Birney E, Bonfield J, Bower L, Corbett M, Cheng Y, Demiralp F, Faruque N, Goodgame N, et al. Improvements to services at the European Nucleotide Archive. Nucleic Acids Res., 2010, vol. 38, Database issue (pg. D39-D45).
11. Eddy SR. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 2002, vol. 3, pg. 18.
12. Huss JW, Orozco C, Goodale J, Wu C, Batalov S, Vickers TJ, Valafar F, Su AI. A gene wiki for community annotation of gene function. PLoS Biol., 2008, vol. 6, pg. e175.
13. Huss JW, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. The Gene Wiki: community intelligence applied to human gene annotation. Nucleic Acids Res., 2010, vol. 38, Database issue (pg. D633-D639).
14. Rawlings ND, Barrett AJ. Evolutionary families of peptidases. Biochem. J., 1993, vol. 290, Pt 1 (pg. 205-218).
15. Finn RD, Mistry J, Schuster-Böckler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al. Pfam: clans, web tools and services. Nucleic Acids Res., 2006, vol. 34, Database issue (pg. D247-D251).
16. Ellis JC, Brown JW. The RNase P family. RNA Biol., 2009, vol. 6 (pg. 362-369).
17. Urban JH, Vogel J. Two seemingly homologous noncoding RNAs act hierarchically to activate glmS mRNA translation. PLoS Biol., 2008, vol. 6, pg. e64.
18. Bateman A, Finn RD. SCOOP: a simple method for identification of novel protein superfamily relationships. Bioinformatics, 2007, vol. 23 (pg. 809-814).
19. Madera M. Profile Comparer: a program for scoring and aligning profile hidden Markov models. Bioinformatics, 2008, vol. 24 (pg. 2630-2631).
20. Samarsky DA, Fournier MJ. A comprehensive database for the small nucleolar RNAs from Saccharomyces cerevisiae. Nucleic Acids Res., 1999, vol. 27 (pg. 161-164).
21. Brown JW, Echeverria M, Qu LH, Lowe TM, Bachellerie JP, Hüttenhofer A, Kastenmayer JP, Green PJ, Shaw P, Marshall DF. Plant snoRNA database. Nucleic Acids Res., 2003, vol. 31 (pg. 432-435).
22. Li SG, Zhou H, Luo YP, Zhang P, Qu LH. Identification and functional analysis of 20 Box H/ACA small nucleolar RNAs (snoRNAs) from Schizosaccharomyces pombe. J. Biol. Chem., 2005, vol. 280 (pg. 16446-16455).
23. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res., 2006, vol. 34, Database issue (pg. D158-D162).
24. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res., 2008, vol. 36, Database issue (pg. D154-D158).
25. Stockholm format. http://en.wikipedia.org/wiki/Stockholm_format (19 June 2010, date last accessed).
26. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol., 2005, vol. 6, pg. R44.
27. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 2000, vol. 25 (pg. 25-29).
28. Mituyama T, Yamada K, Hattori E, Okida H, Ono Y, Terai G, Yoshizawa A, Komori T, Asai K. The Functional RNA Database 3.0: databases to support mining and annotation of functional RNAs. Nucleic Acids Res., 2009, vol. 37, Database issue (pg. D89-D92).
29. Dávila López M, Rosenblad MA, Samuelsson T. Conserved and variable domains of RNase MRP RNA. RNA Biol., 2009, vol. 6 (pg. 208-220).
30. Rosenblad MA, Larsen N, Samuelsson T, Zwieb C. Kinship in the SRP RNA family. RNA Biol., 2009, vol. 6 (pg. 508-516).
31. Mao C, Bhardwaj K, Sharkady SM, Fish RI, Driscoll T, Wower J, Zwieb C, Sobral BW, Williams KP. Variations on the tmRNA gene. RNA Biol., 2009, vol. 6 (pg. 355-361).
32. Marz M, Stadler PF. Comparative analysis of eukaryotic U3 snoRNA. RNA Biol., 2009, vol. 6 (pg. 503-507).
33. Jones TA, Otto W, Marz M, Eddy SR, Stadler PF. A survey of nematode SmY RNAs. RNA Biol., 2009, vol. 6 (pg. 5-8).
34. Gierga G, Voss B, Hess WR. The Yfr2 ncRNA family, a group of abundant RNA molecules widely conserved in cyanobacteria. RNA Biol., 2009, vol. 6 (pg. 222-227).
35. Doniger T, Michaeli S, Unger R. Families of H/ACA ncRNA molecules in trypanosomatids. RNA Biol., 2009, vol. 6 (pg. 370-374).
36. Nielsen H, Johansen SD. Group I introns: moving in new directions. RNA Biol., 2009, vol. 6 (pg. 375-383).
37. Gultyaev AP, Olsthoorn RC. A family of non-classical pseudoknots in influenza A and B viruses. RNA Biol., 2010, vol. 7 (pg. 125-129).
38. Marchais A, Bohn C, Bouloc P, Gautheret D. RsaOG, a new staphylococcal family of highly transcribed non-coding RNA. RNA Biol., 2010, vol. 7 (pg. 116-119).
39. Findeiss S, Schmidtke C, Stadler PF, Bonas U. A novel family of plasmid-transferred anti-sense ncRNAs. RNA Biol., 2010, vol. 7 (pg. 120-124).
40. Fozo EM, Hemm MR, Storz G. Small toxic proteins and the antisense RNAs that repress them. Microbiol. Mol. Biol. Rev., 2008, vol. 72 (pg. 579-589).
41. Giuliodori AM, Di Pietro F, Marzi S, Masquida B, Wagner R, Romby P, Gualerzi CO, Pon CL. The cspA mRNA is a thermosensor that modulates translation of the cold-shock protein CspA. Mol. Cell, 2010, vol. 37 (pg. 21-33).
42. Nechooshtan G, Elgrably-Weiss M, Sheaffer A, Westhof E, Altuvia S. A pH-responsive riboregulator. Genes Dev., 2009, vol. 23 (pg. 2650-2662).
43. Weinberg Z, Wang JX, Bogue J, Yang J, Corbino K, Moy RH, Breaker RR. Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes. Genome Biol., 2010, vol. 11, pg. R31.
44. Weinberg Z, Ruzzo WL. Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 2006, vol. 22 (pg. 35-39).
45. Eddy SR. A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput. Biol., 2008, vol. 4, pg. e1000069.
46. Eddy SR. A new generation of homology search tools based on probabilistic inference. Genome Inform., 2009, vol. 23 (pg. 205-211).
47. Johnson LS, Eddy SR, Portugaly E. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinformatics, 2010, vol. 11, pg. 431.
48. Genome 10K Community of Scientists. Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J. Hered., 2009, vol. 100 (pg. 659-674).
49. Wu D, Hugenholtz P, Mavromatis K, Pukall R, Dalin E, Ivanova NN, Kunin V, Goodwin L, Wu M, Tindall BJ, et al. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 2009, vol. 462 (pg. 1056-1060).
50. Brown JW, Birmingham A, Griffiths PE, Jossinet F, Kachouri-Lafond R, Knight R, Lang BF, Leontis N, Steger G, Stombaugh J, et al. The RNA structure alignment ontology. RNA, 2009, vol. 15 (pg. 1623-1631).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
