Abstract

MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ∼22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for candidate hairpins, complemented by a search for homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.

INTRODUCTION

MicroRNAs (miRNAs) are an abundant class of short non-protein coding RNAs that direct post-transcriptional regulation of metazoan genes through repression of mRNA translation or transcript degradation. Since their initial discovery in Caenorhabditis elegans, miRNAs have been recognized as widespread regulators, implicated in processes such as cell differentiation and cancer (1–6). Intensive studies have begun to unravel the mechanisms and characteristics of these single-stranded, ∼22-nt long RNA molecules, which are processed from genome-encoded precursor genes with a defining stem-loop RNA structure. Nevertheless, the discovery and characterization of novel miRNA genes have proved challenging both experimentally and computationally, and the miRNA gene repertoire therefore remains largely unexplored. Among the rapidly growing numbers of known miRNA genes, the human genome leads, with several hundred now cataloged in miRBase, the database of published miRNA sequences (7), and many more estimated (8,9).

High-throughput experimental approaches usually identify only the short mature segments of miRNA genes, together with other types of endogenous small RNAs (10,11) and degradation products of mRNAs or structural RNAs. Robust computational post-processing of the experimentally derived sequences is therefore essential to identify the underlying miRNA genes. The widely applied requirement that the putative precursor forms a characteristic stem-loop structure is, however, insufficient on its own, as hairpin structures are common in eukaryotic genomes and are not unique to miRNAs (12). Fortunately, the rapid accumulation of genome sequencing data provides an additional, evolutionary line of evidence through comparative sequence analysis.

Computational screening methods that rely heavily on sequence conservation criteria, such as MirScan (13), were among the first to appear. These characteristically exhibit high specificity [e.g. predicting 35 new miRNA candidates in C. elegans (13) and 107 in human (14), many of which were experimentally confirmed], but low sensitivity, i.e. limited ability to predict novel or divergent homologs in other organisms. Methods that relax sequence conservation requirements in favor of conservation patterns specific to miRNAs (such as a more diverged loop sequence and a more conserved hairpin stem) gained substantially higher sensitivity: e.g. srnaloop was used to predict 214 candidate miRNAs in C. elegans (15) and miRSeeker (16) to predict 48 candidate miRNAs in Drosophila melanogaster. A related approach, phylogenetic shadowing, takes into account the shape of the conservation profiles of known miRNAs (17,18). The 7 nt starting at the second position from the 5′-end of the mature miRNA (nt 2-8), termed the seed sequence, are presumed to be critical for the interaction between the miRNA and its targets (19–22). The intra-species abundance or inter-species conservation of such potential seeds has also been proposed as an alternative starting point for miRNA gene hunting (23,24).
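
As a minimal illustration of the seed definition above, the following Python sketch extracts nucleotides 2-8 of a mature miRNA and compares the seeds of two sequences. The function names are hypothetical, and the let-7a mature sequence is used only as an example input. Grouping miRNAs by identical seeds is one common convention, not a step of the miROrtho pipeline.

```python
def seed_region(mature: str) -> str:
    """Return nucleotides 2-8 (1-based, inclusive) of a mature miRNA."""
    mature = mature.upper().replace("T", "U")  # accept DNA or RNA alphabet
    if len(mature) < 8:
        raise ValueError("mature sequence is shorter than 8 nt")
    return mature[1:8]  # 0-based slice [1, 8) == 1-based positions 2..8


def same_seed(mature_a: str, mature_b: str) -> bool:
    """True if two mature miRNAs share an identical seed sequence."""
    return seed_region(mature_a) == seed_region(mature_b)


# let-7a mature sequence, used here purely as an example input
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_region(let7a))  # GAGGUAG
```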

Secondary structure thermodynamic stability is another important characteristic that can be used to distinguish miRNAs from other hairpins (25). The recently developed software RNAz combines thermodynamic stability and conservation of secondary structure to predict non-coding RNAs from multiple alignments of orthologous regions (26). Methods relying on phylogenetic conservation of miRNA structure and sequence are, by definition, limited to miRNAs conserved across the compared species. To overcome this limitation, several groups have developed ab initio approaches (12,27–32) to predict novel, non-conserved genes; however, these approaches often suffer from high false-positive rates.
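
Thermodynamic stability is often quantified by comparing the minimum free energy (MFE) of a candidate with the MFEs of shuffled versions of the same sequence. The Python sketch below computes a z-score and a pseudocount-corrected empirical P-value from such values. The MFEs are hypothetical inputs that would have to come from an external folding program, and the P-value formula is one common convention, not necessarily the exact computation performed by RNAz (26) or RANDFOLD (25).

```python
from statistics import mean, stdev


def mfe_z_score(native_mfe: float, shuffled_mfes: list[float]) -> float:
    """Standard score of the native MFE relative to shuffled sequences.

    Strongly negative values mean the native sequence folds more stably
    than composition-matched random sequences (needs >= 2 shuffles)."""
    return (native_mfe - mean(shuffled_mfes)) / stdev(shuffled_mfes)


def empirical_p_value(native_mfe: float, shuffled_mfes: list[float]) -> float:
    """Fraction of shuffled sequences folding at least as stably as the
    native one, with a +1 pseudocount so the estimate is never zero."""
    as_stable = sum(1 for e in shuffled_mfes if e <= native_mfe)
    return (as_stable + 1) / (len(shuffled_mfes) + 1)


# Hypothetical MFEs in kcal/mol; real values would come from folding the
# native sequence and each of its shuffled versions.
native = -40.0
shuffled = [-20.0, -22.0, -18.0, -25.0, -21.0]
print(round(mfe_z_score(native, shuffled), 2))
print(empirical_p_value(native, shuffled))
```

With 100 randomizations, the smallest attainable P-value under this convention is 1/101, so a threshold such as P ≤ 0.05 can be reached.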

Aiming to fuel further studies of microRNA’omes, we present here a database of computationally derived miRNA gene candidates, obtained with a novel comparative genomics approach coupled with machine-learning techniques and applied consistently to a comprehensive set of available metazoan genomes. The three-tier pipeline consists of: (i) a custom-designed SVM-based ab initio predictor, plus screening for homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the multiple sequence alignments of the putative orthologs. These data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences (7). High-throughput experimental exploration of small RNAs requires rigorous follow-up bioinformatic analyses before the presence of microRNA genes can be claimed. Because they are derived independently of experimental data, the miROrtho predictions provide independent supporting evidence for the numerous ongoing experimental investigations of microRNA’omes.

MATERIALS AND METHODS

Ab initio predictors

The first tier of our analysis pipeline is a novel ab initio miRNA prediction procedure. We scanned the genomic sequences with RNALfold (33) for locally stable hairpins characteristic of miRNA precursors, requiring a length of 60–120 nt, a minimum free energy (MFE) below −15 kcal/mol, a stem of 20–60 bp, a maximal interior loop size of 8 nt and a maximal bulge size of 5 nt. The terminal loop, however, was allowed to include short stem-loops (e.g. hsa-let-7b). These criteria accommodate the vast majority of experimentally validated miRNAs, although there are exceptions (e.g. dme-mir-31b and dme-mir-1017). As stem-loop structures are abundant and not exclusive to miRNA genes, this step yields hundreds of millions of candidates across all genomes analyzed: 1.3 million for the ∼170 Mb genome of the fruit fly Drosophila melanogaster alone. The many experimentally validated miRNAs now available have revealed that, although miRNA stem-loops show biases in biophysical properties compared with non-miRNA sequences, such as higher thermodynamic stability (25), no clearly discriminating feature has yet been identified. We therefore investigated a number of the most discriminating features, such as the minimum free energy index (34) and the mean base-pair distance in the ensemble of structures, and trained an SVM (support vector machine) classifier on a total of 253 features using LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm) with the radial basis function (RBF) kernel. The training data comprised 1000 experimentally verified animal pre-miRNAs from miRBase (7) as positives and 3000 candidate stem-loops from other confirmed ncRNAs in Rfam (35) as negatives; the training set was made non-redundant with CD-HIT-EST (36) at a 90% sequence identity cutoff. Optimal RBF kernel parameters (C-SVC C = 2.0, gamma = 0.03125) were estimated with the heuristic grid search implemented in grid.py, part of the LIBSVM package.
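
The structural part of this first screen can be sketched in Python as a filter on a candidate's sequence, dot-bracket structure and MFE, as produced by a local folding program such as RNALfold. This is an illustration under stated assumptions, not the pipeline's code: the function names are hypothetical, the interior loop size is taken as the total number of unpaired nucleotides on both sides, and the stem is walked from the outermost base pair inwards until the terminal loop or the first branch, so that loops containing short stem-loops are tolerated.

```python
def pair_table(db: str) -> dict[int, int]:
    """Map each '(' position to its matching ')' position."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs[stack.pop()] = pos
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def stem_profile(db: str) -> tuple[int, list[int], list[int]]:
    """Walk the outermost stem inwards and return
    (number of stem base pairs, bulge sizes, interior loop sizes)."""
    pairs = pair_table(db)
    if not pairs:
        return 0, [], []
    i = min(pairs)
    j = pairs[i]
    n_pairs, bulges, interior = 1, [], []
    while True:
        # first base pair opened after position i (it lies inside (i, j))
        k = next((p for p in range(i + 1, j) if db[p] == "("), None)
        if k is None:
            break  # reached the terminal loop
        l = pairs[k]
        if any(c != "." for c in db[l + 1:j]):
            break  # a branch: treat the rest as a (structured) loop
        left, right = k - i - 1, j - l - 1
        if left and right:
            interior.append(left + right)
        elif left or right:
            bulges.append(left + right)
        n_pairs += 1
        i, j = k, l
    return n_pairs, bulges, interior


def passes_hairpin_screen(seq: str, db: str, mfe: float) -> bool:
    """Length 60-120 nt, MFE < -15 kcal/mol, 20-60 bp stem,
    interior loops <= 8 nt and bulges <= 5 nt."""
    if len(seq) != len(db):
        raise ValueError("sequence and structure lengths differ")
    n_pairs, bulges, interior = stem_profile(db)
    return (60 <= len(seq) <= 120
            and mfe < -15.0
            and 20 <= n_pairs <= 60
            and max(interior, default=0) <= 8
            and max(bulges, default=0) <= 5)
```
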
We tested the SVM on an independent test set of 237 miRNA sequences and 568 non-miRNA stem-loops that were not used for training. At an SVM posterior probability cutoff of 0.5, the accuracy was 95.03%, with a sensitivity of 0.84 and a specificity of 0.97; the area under the receiver operating characteristic (ROC) curve (AUC) was 0.984. A 10-fold cross-validation on the training data gave an average AUC of 0.982. Where candidate hairpins overlapped by >70% of their sequence at the same locus, the one with the lower SVM score was discarded.
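
The redundancy rule in the last sentence (discard the lower-scoring of two candidates overlapping by >70% at the same locus) can be implemented greedily, as in this Python sketch. The Hairpin record, the use of the shorter hairpin as the denominator of the overlap, and comparing only candidates on the same strand (so that sense/antisense pairs are both retained) are assumptions made for illustration.

```python
from typing import NamedTuple


class Hairpin(NamedTuple):
    chrom: str
    strand: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    score: float  # SVM score of the candidate


def overlap_fraction(a: Hairpin, b: Hairpin) -> float:
    """Overlap length relative to the shorter hairpin
    (0 if on a different chromosome or strand)."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / min(a.end - a.start, b.end - b.start)


def drop_overlapping(candidates: list[Hairpin],
                     max_overlap: float = 0.7) -> list[Hairpin]:
    """Visit candidates from best to worst SVM score and discard any that
    overlaps an already kept one by more than max_overlap."""
    kept: list[Hairpin] = []
    for cand in sorted(candidates, key=lambda h: h.score, reverse=True):
        if all(overlap_fraction(cand, k) <= max_overlap for k in kept):
            kept.append(cand)
    return kept
```

Visiting candidates in decreasing score order guarantees that every discarded hairpin overlaps a kept, higher-scoring one; in long chains of overlapping candidates the result can differ slightly from a purely pairwise rule.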

This single-sequence SVM filter reduces the space of likely candidates by about 95%, yet still yields high numbers of gene candidates: 42 000 for D. melanogaster. The miRNA structure itself likely contributes to these elevated numbers: the two arms of a miRNA stem-loop are complementary, so the reverse complement of a precursor often also folds into a stable RNA hairpin. Nevertheless, we did not require a choice between the sense and antisense candidates (if both passed the other filters), as there is evidence of miRNA loci where both strands yield a functional miRNA, e.g. dme-mir-iab-4 and dme-mir-iab-4as.
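
Why the antisense strand so often folds as well can be checked directly: under reverse complementation a Watson-Crick pair (i, j) maps to another Watson-Crick pair, whereas a G·U wobble becomes an A·C mismatch. The Python sketch below (hypothetical helper names) maps each base pair of a sense hairpin onto the reverse-complement strand and counts how many remain canonical.

```python
_COMPLEMENT = str.maketrans("ACGUN", "UGCAN")
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA (or DNA, converted to RNA) sequence."""
    return seq.upper().replace("T", "U").translate(_COMPLEMENT)[::-1]


def base_pairs(db: str) -> list[tuple[int, int]]:
    """List the (open, close) positions of a balanced dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return pairs


def mirrored_pairs(seq: str, db: str) -> tuple[int, int]:
    """Count how many base pairs of (seq, db) are still canonical when the
    same stem is read on the reverse-complement strand.

    Pair (i, j) on the sense strand corresponds to pair (n-1-j, n-1-i)
    on the antisense strand."""
    rc = reverse_complement_rna(seq)
    n = len(seq)
    pairs = base_pairs(db)
    kept = sum(1 for i, j in pairs
               if (rc[n - 1 - j], rc[n - 1 - i]) in CANONICAL)
    return kept, len(pairs)
```

For example, a stem of three Watson-Crick pairs keeps all three on the antisense strand, while replacing one of them with a G·U wobble leaves only two.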

Homology-based predictor

Screening for homologs of currently known miRNAs (miRBase 11.0) captures putative miRNAs that either did not pass the stem-loop screen, e.g. 13 (8%) of the known D. melanogaster miRNAs, or failed the ab initio SVM classification, another 19 (13%). Our procedure first performs a WU-BLAST (http://blast.wustl.edu) search with default parameters plus the DUST filter and the hspsepSmax = 30 option, which sets the maximal distance separating two high-scoring pairs; this allows for a variable loop while still matching the better-conserved 5′ and 3′ arms. Next, BLAST hits longer than 20 nt are extended at both ends to match the length of the query sequence. These hits are further filtered with an MFE filter (≤ −15 kcal/mol) and a RANDFOLD (25) filter (P ≤ 0.05 on 100 sequence randomizations). We also evaluated the RNAshapes (37) filter, which predicts the probability that a sequence folds into a simple stem-loop-like structure, but did not use it because several known miRNAs, e.g. hsa-let-7a-1, would not pass. The candidate miRNAs were then aligned to the query sequence using MAFFT (38), and the conservation of the seed region was calculated by mapping the known mature miRNA region of the query onto the alignment. The hits were then required to have a 100% conserved seed region, >90% conservation of the putative mature part and a total hairpin identity of >65%. As close homologs (like hsa-let-7, mmu-let-7, etc.) can map to the same locus when searched against a single genome (e.g. chimpanzee), the matches were clustered using GALAXY (http://main.g2.bx.psu.edu), and the representative with the lowest E-value among all queries was chosen.
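
The three acceptance criteria can be expressed as identity calculations on the pairwise alignment of the query precursor and a candidate hit. In this Python sketch the identity definitions are assumptions: gaps count as mismatches, seed and mature identities are computed over the contiguous alignment span of the query's seed and mature region (so that insertions in the hit are penalized), and hairpin identity is computed over all alignment columns. The function names are hypothetical.

```python
def query_columns(aligned_query: str) -> list[int]:
    """Alignment column of each (ungapped) query nucleotide."""
    return [col for col, ch in enumerate(aligned_query) if ch != "-"]


def identity(aq: str, ah: str, cols: list[int]) -> float:
    """Fraction of the given columns where query and hit share the same
    nucleotide; columns with a gap count as mismatches."""
    if not cols:
        return 0.0
    same = sum(1 for c in cols if aq[c] == ah[c] and aq[c] != "-")
    return same / len(cols)


def passes_homology_criteria(aq: str, ah: str,
                             mature_start: int, mature_end: int) -> bool:
    """aq/ah: aligned query precursor and candidate hit (equal length).
    mature_start/mature_end: 0-based, end-exclusive mature coordinates
    on the ungapped query."""
    if len(aq) != len(ah):
        raise ValueError("aligned sequences must have equal length")
    mature = query_columns(aq)[mature_start:mature_end]
    seed = mature[1:8]  # mature positions 2-8
    mature_span = list(range(mature[0], mature[-1] + 1))
    seed_span = list(range(seed[0], seed[-1] + 1))
    return (identity(aq, ah, seed_span) == 1.0             # 100% seed
            and identity(aq, ah, mature_span) > 0.90       # >90% mature
            and identity(aq, ah, list(range(len(aq)))) > 0.65)  # hairpin
```
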

Orthology delineation

Groups of likely orthologous genes were identified automatically using a strategy previously employed for protein-coding genes (39), based on all-against-all sequence comparisons with the ParAlign algorithm (40) and the NT2 substitution matrix. This was followed by clustering of best reciprocal hits (BRHs), proceeding from the highest-scoring hits down to an E-value cutoff of 10^-6 for triangulating BRHs, or 10^-10 for unsupported BRHs, and requiring a sequence alignment overlap of at least 20 nt across all members of a group. The orthologous groups were further expanded by genes that are more similar to each other within a genome than to any gene in any of the other species, and by very similar copies sharing over 97% sequence identity, initially identified using CD-HIT (36). The orthology filter reduced the space of miRNA candidates by a further 92%. Passing the orthology filter provides evolutionary support for the predicted miRNAs; however, detailed inspection highlighted the need for further rigorous classification to remove questionable predictions.
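
The core of the orthology step, best reciprocal hits, is sketched below in Python from a table of precomputed inter-species E-values. This is a simplified illustration, not the OrthoDB code: only the 10^-10 cutoff for unsupported BRHs is applied, while the triangulation with the 10^-6 cutoff, the 20-nt overlap requirement and the expansion by within-genome paralogs and near-identical copies are omitted. Gene identifiers are hypothetical (species, gene) tuples.

```python
Gene = tuple[str, str]  # (species, gene_id)


def best_hits(evalues: dict[tuple[Gene, Gene], float]
              ) -> dict[tuple[Gene, str], tuple[Gene, float]]:
    """For every gene and every other species, keep the lowest-E-value hit."""
    best: dict[tuple[Gene, str], tuple[Gene, float]] = {}
    for (query, subject), e in evalues.items():
        if query[0] == subject[0]:
            continue  # only inter-species hits are considered here
        key = (query, subject[0])
        if key not in best or e < best[key][1]:
            best[key] = (subject, e)
    return best


def reciprocal_best_hits(evalues: dict[tuple[Gene, Gene], float],
                         cutoff: float = 1e-10) -> set[tuple[Gene, Gene]]:
    """Pairs of genes that are each other's best hit, both directions
    passing the E-value cutoff."""
    best = best_hits(evalues)
    brh: set[tuple[Gene, Gene]] = set()
    for (query, _species), (subject, e) in best.items():
        back = best.get((subject, query[0]))
        if back is not None and back[0] == query and max(e, back[1]) <= cutoff:
            brh.add(tuple(sorted((query, subject))))
    return brh
```
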

Multi-species conservation classifier

We further analyzed the R-COFFEE (41) multiple sequence alignments of orthologous groups of putative miRNA sequences. From the alignments we extracted the 13 most descriptive features of sequence, energy and structure conservation, such as GC content, number of taxa, mean pairwise sequence identity, number of consistent mutations and conservation of the mature part. These descriptors were chosen from a larger set of features to best describe the typical conservation profile of a miRNA gene family and to reduce false-positive predictions. Alignments that mapped to at least one known miRNA from miRBase 11.0 were used as the positive training and test sets (344 and 100 alignments, respectively). From the alignments that did not map to any known miRNA family, we randomly selected (with manual checking) the negative training and test sets (344 and 100 alignments, respectively). The GIST SVM software package (http://www.cs.columbia.edu/compbio) was used with default parameters for training, testing and classification. The final set of newly predicted miRNAs was selected from all alignments with an alignment SVM score ⩾ 0.5, a 100% conserved seed, a mature part >90% conserved and representatives in at least four taxa. On the independent test set, the alignment SVM had an accuracy of 91%, an AUC of 0.97, and a sensitivity and specificity of 0.90 and 0.92, respectively. The average AUC in 10-fold cross-validation on the training data was 0.998. The alignment SVM filter reduced the space of miRNA candidates by a further 98%, and was followed by limited manual curation of the novel miRNA candidates.
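
The performance figures reported for both SVMs follow from standard definitions, sketched here in Python: sensitivity, specificity and accuracy at a score cutoff (score ⩾ 0.5 called positive, as above), and the AUC computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. The function names and toy data are hypothetical.

```python
def threshold_metrics(labels: list[int], scores: list[float],
                      cutoff: float = 0.5) -> dict[str, float]:
    """Sensitivity, specificity and accuracy when score >= cutoff is
    called positive (labels: 1 = miRNA, 0 = non-miRNA)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        called = s >= cutoff
        if y and called:
            tp += 1
        elif y:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / (tp + fn + tn + fp)}


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the Mann-Whitney probability P(positive > negative)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

As a consistency check, with 100 positive and 100 negative test alignments, a sensitivity of 0.90 and a specificity of 0.92 correspond to (90 + 92)/200 = 91% accuracy, matching the value reported for the alignment SVM.
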
We further analyzed the multiple alignments of novel miRNAs (those without known homologs) to predict the mature part, scanning the 5′ and 3′ arms with a 23-nt sliding window for the region with the highest information content. These predictions should, however, be treated with caution in the absence of experimental support.
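
The sliding-window search for the mature region can be sketched in Python as follows. Per-column information content is taken as 2 bits minus the Shannon entropy of the nucleotides, scaled by the fraction of non-gap cells. This gap handling, the function names and the way arm boundaries are passed in (as column ranges, e.g. derived from the consensus structure) are assumptions for illustration.

```python
from collections import Counter
from math import log2


def column_information(column: str) -> float:
    """Information content (bits) of one alignment column: 2 minus the
    Shannon entropy of its nucleotides, scaled by the non-gap fraction."""
    bases = [c for c in column.upper().replace("T", "U") if c in "ACGU"]
    if not bases:
        return 0.0
    n = len(bases)
    entropy = -sum((k / n) * log2(k / n) for k in Counter(bases).values())
    return (2.0 - entropy) * n / len(column)


def best_window(alignment: list[str], window: int, region: tuple[int, int]):
    """(start column, summed information) of the best window lying entirely
    within region = (first column, end column exclusive), or None."""
    ncol = len(alignment[0])
    info = [column_information("".join(seq[c] for seq in alignment))
            for c in range(ncol)]
    start, end = region
    best = None
    for s in range(start, min(end, ncol) - window + 1):
        total = sum(info[s:s + window])
        if best is None or total > best[1]:
            best = (s, total)
    return best


def predict_mature(alignment: list[str], arm5: tuple[int, int],
                   arm3: tuple[int, int], window: int = 23):
    """Compare the best 5' and 3' arm windows; return (arm, start, score)."""
    candidates = []
    for arm, region in (("5p", arm5), ("3p", arm3)):
        hit = best_window(alignment, window, region)
        if hit is not None:
            candidates.append((hit[1], arm, hit[0]))
    if not candidates:
        return None
    score, arm, start = max(candidates)
    return arm, start, score
```
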

DATABASE CONTENT

The miROrtho database (http://cegg.unige.ch/mirortho) presents computationally predicted putative miRNA genes for a comprehensive set of sequenced animal genomes (a selection of genomes is listed in Table 1), obtained with an in-house pipeline combining SVM-based classifiers with an orthology delineation procedure adapted from OrthoDB (39). The alignments shown on the website were calculated using R-COFFEE (41), which combines MUSCLE (42), Probcons4RNA (43), MAFFT (38) and the secondary structures predicted by RNAplfold (33). From these alignments, consensus secondary structures, color-coded according to consistent and compensatory mutations, were calculated using RNAalifold (44), which incorporates a RIBOSUM scoring matrix suited to aligned RNA sequences. The database aims to provide a comprehensive comparative perspective on the animal repertoire of miRNA genes, with direct reference to the putative ortholog multiple alignments and RNA secondary structure conservation. As there seem to be numerous lineage-specific miRNAs and miRNA-like sequences that are difficult to tell apart without experimental evidence, we see miROrtho as complementary to miRBase, the repository of experimentally verified miRNA sequences. Overall, miROrtho contains 7887 putative miRNA genes that are homologous to known miRNAs in miRBase 11.0, and 1437 confident predictions that as yet have no experimental support or homology to known miRNAs. Most experimental surveys provide support for mature miRNA sequences, while the identities of the underlying precursor genes remain somewhat uncertain. In contrast, computational procedures rely on recognizing characteristic sequence and structural properties of the precursors, and even approximate prediction of mature miRNAs is rarely possible.
This complementarity extends further: computational predictions at different stringencies can be used either to prioritize experimental verification or as direct independent support for miRNAs identified in high-throughput experimental screens. Although miRBase accepts annotation of very close homologs of experimentally supported miRNAs, its comparative perspective is heavily biased towards popular experimental model species. miROrtho avoids such a bias by applying the same procedures consistently across all available genomes, delineating groups of orthologous miRNAs across distantly related organisms. The miROrtho methodology has also been applied to miRNA gene annotation in several ongoing initial genome analyses, and this database will provide the supporting information for these predictions.
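
The distinction between consistent and compensatory mutations used in the color coding of the consensus structures can be illustrated with a short Python sketch. It is not RNAalifold's implementation: for each base pair of a consensus dot-bracket structure it collects the nucleotide pairs of all aligned sequences and labels the column as conserved (a single pair type), consistent (pair types differing on one side only, e.g. G-C and G-U), compensatory (some pair types differing on both sides, e.g. G-C and A-U) or incompatible (at least one sequence cannot form the pair). The names are hypothetical.

```python
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(db: str) -> list[tuple[int, int]]:
    """Sorted (open, close) positions of a balanced dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


def classify(types: set[str]) -> str:
    """Label a set of observed (all canonical) pair types."""
    if len(types) == 1:
        return "conserved"
    both_sides = any(a[0] != b[0] and a[1] != b[1]
                     for a in types for b in types)
    return "compensatory" if both_sides else "consistent"


def annotate_pairs(alignment: list[str],
                   consensus_db: str) -> list[tuple[int, int, str]]:
    """Label every consensus base pair of an alignment."""
    result = []
    for i, j in base_pairs(consensus_db):
        observed = [(seq[i] + seq[j]).upper().replace("T", "U")
                    for seq in alignment]
        if any(p not in PAIRS for p in observed):
            label = "incompatible"  # includes gaps and mismatches
        else:
            label = classify(set(observed))
        result.append((i, j, label))
    return result
```
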

Table 1.

Analyzed genomes

Species name  Abbreviation  Size (Mb)  Number of miRNA genes: Homologs (a), New (b), miRBase 11.0  Source
Aedes aegypti Aaeg 1384 58 AaegL1 
Anopheles gambiae Agam 273 55 45 AgamP3 
Apis mellifera Amel 235 60 54 Amel_4.0 
Bombyx mori Bmor 397 33 21 SW_scaffold_ge2k 
Caenorhabditis elegans Cele 100 149 154 WB170 
Canis familiaris Cfam 2532 383 138 203 CanFam 2.0 
Ciona intestinalis Cint 173 25 34 JGI2 
Danio rerio Drer 1626 324 22 337 ZFISH6 
Drosophila ananassae Dana 230 108 12 CAF1 
Drosophila erecta Dere 152 136 16 CAF1 
Drosophila grimshawi Dgri 200 110 13 CAF1 
Drosophila melanogaster Dmel 129 153 15 152 CAF1 
Drosophila mojavensis Dmoj 194 98 14 CAF1 
Drosophila persimilis Dper 188 108 16 CAF1 
Drosophila pseudoobscura Dpse 153 106 15 76 CAF1 
Drosophila sechellia Dsec 167 139 16 CAF1 
Drosophila simulans Dsim 142 131 15 CAF1 
Drosophila virilis Dvir 206 101 14 CAF1 
Drosophila willistoni Dwil 237 112 12 CAF1 
Drosophila yakuba Dyak 169 135 16 CAF1 
Gallus gallus Ggal 1100 168 49 149 WASHUC2 
Gasterosteus aculeatus Gacu 462 320 12 BROAD S1 
Homo sapiens Hsap 3665 626 151 678 NCBI36 
Macaca mulatta Mmul 3097 530 145 464 MMUL_1 
Monodelphis domestica Mdom 3606 205 82 119 monDom5 
Mus musculus Mmus 2661 505 117 472 NCBIM36 
Ornithorhynchus anatinus Oana 2073 207 57 Oana-5.0 
Pan troglodytes Ptro 3524 546 147 100 PanTro 2.1 
Rattus norvegicus Rnor 2719 440 110 287 RGSC 3.4 
Strongylocentrotus purpuratus Surc 907 13 Spur_v2.1 
Takifugu rubripes Trub 393 250 13 131 FUGU4 
Tetraodon nigroviridis Tnig 402 282 14 132 TETRAODON7 
Tribolium castaneum Tcas 200 37 Tcas_2.0 
Xenopus tropicalis Xtro 1511 351 24 184 JGI4.1 

(a) Homologs of miRBase 11.0 miRNAs.

(b) New predictions that do not show homology to any annotated miRNA.

It should be noted that there is still no defining feature that clearly discriminates bona fide miRNA precursors from other abundant genomic sequences capable of similar hairpin folding. Classification filters will therefore inevitably produce false negatives and false positives (see Materials and Methods section for estimates), and these errors accumulate at each step of the pipeline. Even the most inclusive initial screen for locally stable stem-loop structures misses some miRNAs reported in miRBase as experimentally validated (e.g. dme-mir-1017). Despite the strict 97% specificity of our ab initio SVM, false positives are clearly abundant and overload the orthology filter: because nearly all of the 1.3 million D. melanogaster hairpins are not miRNAs, a 3% false-positive rate alone leaves roughly 39 000 of them, the bulk of the 42 000 candidates that pass the SVM. Computational methods developed for miRNA gene discovery are constantly improving, and will continue to do so as our knowledge of experimentally validated miRNAs grows.

WEB INTERFACE

The miROrtho database presents all predicted miRNA genes within the context of family groups of orthologous miRNAs. For each family, we provide (Figure 1): (i) a table of annotated miRNA names and genomic coordinates, (ii) a multiple alignment of the miRNA sequences displaying RNA structure conservation, (iii) the minimum energy consensus miRNA hairpin fold and (iv) FASTA sequences and multiple alignment files. Color coding of the alignments and of the depicted folds highlights compensatory and consistent mutations within a given miRNA family. The mature miRNA sequences are underlined: as annotated in miRBase for known miRNAs, or as predicted for novel families. Furthermore, we provide detailed folding information for individual pre-miRNAs, including the minimum free energy fold, the partition function fold and the centroid structure of the stem-loop. Three images show the secondary structure of a single pre-miRNA with the mature part annotated in red, color-coded according to base-pairing probabilities and positional entropy. The data can be browsed by species tree, or queried by annotation such as known families (e.g. let-7), identifiers or chromosomes. The predictions can also be searched by sequence homology using WU-BLAST (http://blast.wustl.edu).

Figure 1.

miROrtho screenshot showing a novel miRNA gene family. The results page consists of three parts: (i) a table with detailed information about the individual miRNAs; (ii) a multiple sequence alignment with the consensus secondary structure displayed above in dot-bracket format and conservation profile bars displayed below, with the sequence of the mature miRNAs underlined; (iii) the consensus secondary structure of the orthologous sequences. Both alignment and consensus secondary structure are color-coded according to consistent and compensatory base changes.

FUNDING

Swiss National Science Foundation (SNF PDFMA3-118375 and 3100A0-112588). Funding for open access charges: Swiss National Science Foundation (SNF 3100A0-112588).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank R.M. Waterhouse for help with the article, and the Vital-IT facility (http://www.vital-it.ch/vitalit-intro.htm). We would also like to acknowledge the sequencing centers that made the genome sequences used in this study available before publication: The Baylor College of Medicine (www.hgsc.bcm.tmc.edu), the Washington University School of Medicine (genome.wustl.edu), the Broad Institute (www.broad.mit.edu), the J. Craig Venter Institute (www.jcvi.org), the DOE Joint Genome Institute (www.jgi.doe.gov), the Sanger Center (www.sanger.ac.uk), the Institute for Genomic Research (www.tigr.org), Celera Genomics (www.celera.com), and Genoscope (www.genoscope.cns.fr).

REFERENCES

1. Ambros,V. (2004) The functions of animal microRNAs. Nature, 431, 350-355.
2. Bartel,D.P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116, 281-297.
3. Du,T. and Zamore,P.D. (2005) microPrimer: the biogenesis and function of microRNA. Development, 132, 4645-4652.
4. Calin,G.A. and Croce,C.M. (2006) MicroRNA signatures in human cancers. Nat. Rev. Cancer, 6, 857-866.
5. Zhang,B., Pan,X., Cobb,G.P. and Anderson,T.A. (2007) microRNAs as oncogenes and tumor suppressors. Dev. Biol., 302, 1-12.
6. Barbarotto,E., Schmittgen,T.D. and Calin,G.A. (2008) MicroRNAs and cancer: profile, profile, profile. Int. J. Cancer, 122, 969-977.
7. Griffiths-Jones,S., Saini,H.K., van Dongen,S. and Enright,A.J. (2008) miRBase: tools for microRNA genomics. Nucleic Acids Res., 36, D154-D158.
8. Miranda,K.C., Huynh,T., Tay,Y., Ang,Y.S., Tam,W.L., Thomson,A.M., Lim,B. and Rigoutsos,I. (2006) A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell, 126, 1203-1217.
9. Berezikov,E., van Tetering,G., Verheul,M., van de Belt,J., van Laake,L., Vos,J., Verloop,R., van de Wetering,M., Guryev,V., Takada,S. et al. (2006) Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis. Genome Res., 16, 1289-1298.
10. Kim,V.N. and Nam,J.W. (2006) Genomics of microRNA. Trends Genet., 22, 165-173.
11. Aravin,A. and Tuschl,T. (2005) Identification and characterization of small RNAs involved in RNA silencing. FEBS Lett., 579, 5830-5840.
12. Bentwich,I., Avniel,A., Karov,Y., Aharonov,R., Gilad,S., Barad,O., Barzilai,A., Einat,P., Einav,U., Meiri,E. et al. (2005) Identification of hundreds of conserved and nonconserved human microRNAs. Nat. Genet., 37, 766-770.
13. Lim,L.P., Lau,N.C., Weinstein,E.G., Abdelhakim,A., Yekta,S., Rhoades,M.W., Burge,C.B. and Bartel,D.P. (2003) The microRNAs of Caenorhabditis elegans. Genes Dev., 17, 991-1008.
14. Lim,L.P., Glasner,M.E., Yekta,S., Burge,C.B. and Bartel,D.P. (2003) Vertebrate microRNA genes. Science, 299, 1540.
15. Grad,Y., Aach,J., Hayes,G.D., Reinhart,B.J., Church,G.M., Ruvkun,G. and Kim,J. (2003) Computational and experimental identification of C. elegans microRNAs. Mol. Cell, 11, 1253-1263.
16. Lai,E.C., Tomancak,P., Williams,R.W. and Rubin,G.M. (2003) Computational identification of Drosophila microRNA genes. Genome Biol., 4, R42.
17. Boffelli,D., McAuliffe,J., Ovcharenko,D., Lewis,K.D., Ovcharenko,I., Pachter,L. and Rubin,E.M. (2003) Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science, 299, 1391-1394.
18. Berezikov,E., Guryev,V., van de Belt,J., Wienholds,E., Plasterk,R.H. and Cuppen,E. (2005) Phylogenetic shadowing and computational identification of human microRNA genes. Cell, 120, 21-24.
19. Lewis,B.P., Shih,I.H., Jones-Rhoades,M.W., Bartel,D.P. and Burge,C.B. (2003) Prediction of mammalian microRNA targets. Cell, 115, 787-798.
20. Doench,J.G. and Sharp,P.A. (2004) Specificity of microRNA target selection in translational repression. Genes Dev., 18, 504-511.
21. Brennecke,J., Stark,A., Russell,R.B. and Cohen,S.M. (2005) Principles of microRNA-target recognition. PLoS Biol., 3, e85.
22. Stark,A., Brennecke,J., Russell,R.B. and Cohen,S.M. (2003) Identification of Drosophila MicroRNA targets. PLoS Biol., 1, E60.
23. Xie,X., Lu,J., Kulbokas,E.J., Golub,T.R., Mootha,V., Lindblad-Toh,K., Lander,E.S. and Kellis,M. (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 434, 338-345.
24. Weaver,D.B., Anzola,J.M., Evans,J.D., Reid,J.G., Reese,J.T., Childs,K.L., Zdobnov,E.M., Samanta,M.P., Miller,J. and Elsik,C.G. (2007) Computational and transcriptional evidence for microRNAs in the honey bee genome. Genome Biol., 8, R97.
25. Bonnet,E., Wuyts,J., Rouze,P. and Van de Peer,Y. (2004) Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics, 20, 2911-2917.
26. Washietl,S., Hofacker,I.L. and Stadler,P.F. (2005) Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. USA, 102, 2454-2459.
27. Sewer,A., Paul,N., Landgraf,P., Aravin,A., Pfeffer,S., Brownstein,M.J., Tuschl,T., van Nimwegen,E. and Zavolan,M. (2005) Identification of clustered microRNAs using an ab initio prediction method. BMC Bioinformatics, 6, 267.
28. Xue,C., Li,F., He,T., Liu,G.P., Li,Y. and Zhang,X. (2005) Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics, 6, 310.
29. Nam,J.W., Kim,J., Kim,S.K. and Zhang,B.T. (2006) ProMiR II: a web server for the probabilistic prediction of clustered, nonclustered, conserved and nonconserved microRNAs. Nucleic Acids Res., 34, W455-W458.
30. Helvik,S.A., Snove,O. Jr and Saetrom,P. (2007) Reliable prediction of Drosha processing sites improves microRNA gene prediction. Bioinformatics, 23, 142-149.
31. Ng,K.L. and Mishra,S.K. (2007) De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures. Bioinformatics, 23, 1321-1330.
32. Jiang,P., Wu,H., Wang,W., Ma,W., Sun,X. and Lu,Z. (2007) MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res., 35, W339-W344.
33. Hofacker,I.L., Priwitzer,B. and Stadler,P.F. (2004) Prediction of locally stable RNA secondary structures for genome-wide surveys. Bioinformatics, 20, 186-190.
34. Zhang,B.H., Pan,X.P., Cox,S.B., Cobb,G.P. and Anderson,T.A. (2006) Evidence that miRNAs are different from other RNAs. Cell Mol. Life Sci., 63, 246-254.
35. Griffiths-Jones,S., Moxon,S., Marshall,M., Khanna,A., Eddy,S.R. and Bateman,A. (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res., 33, D121-D124.
36. Li,W. and Godzik,A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22, 1658-1659.
37. Steffen,P., Voss,B., Rehmsmeier,M., Reeder,J. and Giegerich,R. (2006) RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics, 22, 500-503.
38. Katoh,K. and Toh,H. (2008) Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform., 9, 286-298.
39. Kriventseva,E.V., Rahman,N., Espinosa,O. and Zdobnov,E.M. (2008) OrthoDB: the hierarchical catalog of eukaryotic orthologs. Nucleic Acids Res., 36, D271-D275.
40. Saebo,P.E., Andersen,S.M., Myrseth,J., Laerdahl,J.K. and Rognes,T. (2005) PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology. Nucleic Acids Res., 33, W535-W539.
41. Wilm,A., Higgins,D.G. and Notredame,C. (2008) R-Coffee: a method for multiple alignment of non-coding RNA. Nucleic Acids Res., 36, e52.
42. Edgar,R.C. (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 5, 113.
43. Do,C.B., Mahabhashyam,M.S., Brudno,M. and Batzoglou,S. (2005) ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res., 15, 330-340.
44. Hofacker,I.L., Fekete,M. and Stadler,P.F. (2002) Secondary structure prediction for aligned RNA sequences. J. Mol. Biol., 319, 1059-1066.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
