Some plant microRNAs have been shown to be generated de novo by inverted duplication of their target genes. Subsequent duplication events potentially generate multigene microRNA families. In this article we provide evidence supporting the inverted duplication model of plant microRNA evolution. First, we report that the precursors of four Arabidopsis thaliana microRNA families, miR157, miR158, miR405 and miR447, share nearly identical nucleotide sequences between family members throughout the whole miRNA precursor. The extent and degree of sequence conservation suggest recent evolutionary duplication events. Furthermore, we found that sequence similarities are not restricted to the transcribed part but extend into the promoter regions. Thus the duplication events most probably included the promoter regions as well. Conserved elements in upstream regions of miR163 and its targets were also detected. This implies that the inverted duplication of target genes, at least in certain cases, included the promoters of the target genes. Sequence conservation within promoters of miRNA families, as well as between a miRNA and its potential progenitor gene, can be exploited for understanding the regulation of microRNA genes.
MicroRNAs (miRNAs) are ∼21 nt long, endogenous non-coding RNAs that regulate a large number of genes at the post-transcriptional level (Bartel, 2004). In plants, miRNAs usually have nearly perfect matches to coding regions of their target messenger RNAs. Binding of miRNAs to these regions leads to transcript cleavage and degradation through the RISC complex (Llave et al., 2002).
The extensive conservation of miRNAs between different species and over long evolutionary distances, such as between mosses and flowering plants, has been reported (Axtell and Bartel, 2005; Xie et al., 2005). Plant miRNA genes are transcribed by RNA polymerase II (Lee et al., 2004). The primary miRNA transcripts (pri-miRNAs) are cleaved by RNase III-like enzymes to generate miRNA precursors (pre-miRNAs) and subsequently mature miRNAs (Kurihara and Watanabe, 2004). Pre-miRNAs contain a hairpin structure in which the miRNA is base paired to its counterpart, miRNA*. In plants, it has recently been shown that pre-miRNAs are more conserved in the miRNA:miRNA* portion of the hairpin, while the rest of the hairpin has diverged even among miRNAs from the same microRNA family (Jones-Rhoades and Bartel, 2004).
Currently 117 Arabidopsis thaliana miRNAs have been identified (miRBase, release 8.2, Griffiths-Jones, 2006). These can be classified into 46 miRNA families, of which 21 are represented by single genes and 25 are multigene families. For 13 (61.9%) of the single miRNA genes (MIR161, MIR163, MIR170, MIR173, MIR391, MIR400–404 and MIR406–408) no homologous counterpart in Oryza sativa has been described. In contrast, for only five (20%) of the multigene families (MIR157, MIR158, MIR165, MIR405 and MIR447) have no homologous counterparts in O. sativa been reported. We will refer to these five miRNA families as non-conserved miRNA families throughout the rest of the article.
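The family counts and percentages quoted above can be rechecked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers and gene names given in the text; the variable and function names are illustrative and not part of the original analysis.

```python
# Counts as stated in the text (miRBase release 8.2).
single_gene_families = 21
multigene_families = 25

# Single miRNA genes without a reported O. sativa homolog, expanding the
# ranges MIR400-404 and MIR406-408 given in the text.
single_without_rice_homolog = (
    ["MIR161", "MIR163", "MIR170", "MIR173", "MIR391"]
    + [f"MIR{n}" for n in range(400, 405)]
    + [f"MIR{n}" for n in range(406, 409)]
)

# Multigene families without a reported O. sativa homolog.
multigene_without_rice_homolog = ["MIR157", "MIR158", "MIR165", "MIR405", "MIR447"]


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_families = single_gene_families + multigene_families
single_pct = percent(len(single_without_rice_homolog), single_gene_families)
multi_pct = percent(len(multigene_without_rice_homolog), multigene_families)

print(total_families, single_pct, multi_pct)
```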
Allen et al. (2004) investigated the sequence similarity between 91 Arabidopsis miRNAs and their targets. They found that the two foldback arms of MIR163 and MIR161 show extensive similarity to some of their target genes (Allen et al., 2004). In addition, these two miRNA genes are located close to some of their target genes. Based on these observations, an evolutionary scenario involving an inverted duplication event and active expansion of target gene families has been suggested. After its de novo creation, a miRNA may have evolved into a multigene miRNA family by subsequent duplication events. The initial duplication event may also have included the promoter of the primordial gene (Allen et al., 2004). This scenario has been described as the Inverted Duplication Model for miRNA gene evolution in plants (Allen et al., 2004).
Gene duplication has played a vital role in the evolution of genes (Hurles, 2004). It has been shown that duplication events most probably involve not only the transcribed part of the respective loci but also at least part of the promoter regions (Haberer et al., 2004; Hurles, 2004). Regulatory diversification through acquisition or derivation of distinct control elements may trigger divergence of regulatory specificity between closely related family members (Haberer et al., 2004; Xie et al., 2005). Evolutionarily young miRNA families are appropriate candidates for studying the effects of duplication and the retention of conserved elements within promoter regions. The absence of orthologous counterparts in monocotyledonous plants, as is the case for the non-conserved Arabidopsis miRNA families, is a prime indicator of a recently arisen miRNA family.
In this article we report findings on the degree of sequence conservation within non-conserved Arabidopsis multigene miRNA families, namely miR157, miR158, miR165, miR405 and miR447. In addition to the transcribed region, we analyzed the promoter regions of the miRNA families for sequence similarity and potentially conserved sequence motifs. We report extensive sequence similarities and short, highly conserved sequence motifs between the promoters of miRNA family members. Furthermore, we show that the 5′ upstream sequence of Arabidopsis MIR163 and the promoters of its targets are partially conserved. Our findings support the inverted duplication model for miRNA gene evolution. Our results and analysis pave the way to a comprehensive in silico assisted study of miRNA promoters and their relationship to their respective target genes.
For promoter comparison and analysis we took 1000 bp upstream of the miRNA precursors. The extent of the primary transcript is not known for all miRNA families under investigation. To ensure that equivalent parts of the upstream sequences were compared, we therefore did not exclude the primary transcript from the analysis. However, known primary transcript sequences are marked in upper case letters and are therefore displayed in dark blue in the graphical output (see Fig. 1 and online Supplementary Material), in contrast to the remaining part of the upstream sequences (lower case letters and light blue). This makes it easier to keep track of the different parts of the studied sequences.
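The upstream extraction and case marking described above can be sketched as follows. This is an illustrative sketch, not the pipeline actually used: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive on the forward strand, and the known part of the primary transcript is assumed to be the stretch immediately adjacent to the precursor.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, start, end, strand, length=1000):
    """Return up to `length` bp upstream of a precursor.

    `start` and `end` are assumed to be 1-based inclusive coordinates of the
    precursor on the forward strand; `strand` is "+" or "-".
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    # On the minus strand, "upstream" lies after `end` on the forward strand.
    return reverse_complement(chrom_seq[end:end + length])


def mark_primary_transcript(upstream, transcribed_len):
    """Lower-case the promoter part and upper-case the known transcribed part.

    `transcribed_len` is the number of bases at the 3' end of `upstream`
    (next to the precursor) known to belong to the primary transcript.
    """
    cut = max(0, len(upstream) - transcribed_len)
    return upstream[:cut].lower() + upstream[cut:].upper()


# Toy example with a 16 bp "chromosome" and a precursor at positions 9-12.
chrom = "AAAACCCCGGGGTTTT"
plus_upstream = upstream_region(chrom, 9, 12, "+", length=4)
minus_upstream = upstream_region(chrom, 9, 12, "-", length=4)
marked = mark_primary_transcript("acgtacgt", 3)
print(plus_upstream, minus_upstream, marked)
```

Keeping the transcribed part in the window, but marking it by case, means that every family member contributes a window of the same length even when the transcription start site is unknown.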
We took 1000 bp upstream of the assumed transcription start site as the default promoter sequence of target genes. As evidence for the position of the transcription start site we used full-length cDNAs annotated in the MIPS A. thaliana database (MAtDB; Schoof et al., 2002).
miRNA gene coordinates were obtained from miRBase, release 8.0 (Griffiths-Jones, 2006). The upstream sequences of miRNAs and corresponding target genes were retrieved from MAtDB (Schoof et al., 2002; index.html).
Similarity comparisons for the sequences under investigation were carried out using DIALIGN (Morgenstern, 2004), a local alignment tool. Alignment algorithms are not able to detect conserved regions after rearrangement events (e.g. inversions or extensive deletions and insertions). Furthermore, these algorithms are designed to align long and less-conserved sequence regions rather than to detect short and highly conserved motifs. To compensate for this limitation we also used MotifSampler (Thijs et al., 2001), a motif discovery algorithm based on Gibbs sampling, which does not rely on colinearity of conserved sequences and is able to identify very short but highly conserved sequence motifs. MotifSampler is a stochastic algorithm, so the results of different runs of the program may vary but tend to cluster at conserved sequence motifs. For that reason we carried out 50 repeated runs of MotifSampler for each analysis and visualized the results as the percentage of runs in which a nucleotide was detected as part of a conserved motif (see Fig. 1 and online Supplementary Material). Both the DIALIGN and the MotifSampler analyses were performed using a modified version of CREDO (Hindemitt and Mayer, 2005), a web-based tool that integrates the analysis and results of these two programs, amongst others. Detailed results and parameters are available online.
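The per-nucleotide summary of repeated stochastic runs can be sketched as below. This is a minimal illustration under stated assumptions: the motif hits of each run are assumed to be already parsed into 0-based, half-open (start, end) intervals on one upstream sequence, and the function name `motif_coverage` is hypothetical. Parsing of the actual MotifSampler output is deliberately left out.

```python
def motif_coverage(seq_len, runs):
    """Return, for each position, the percentage of runs reporting a motif there.

    `runs` is a list with one entry per run; each entry is a list of
    (start, end) intervals (0-based, half-open) on a sequence of `seq_len` bp.
    A position counts at most once per run, even if several motifs overlap it.
    """
    counts = [0] * seq_len
    for hits in runs:
        covered = set()
        for start, end in hits:
            covered.update(range(max(0, start), min(seq_len, end)))
        for pos in covered:
            counts[pos] += 1
    n_runs = len(runs)
    if n_runs == 0:
        return [0.0] * seq_len
    return [100.0 * c / n_runs for c in counts]


# Toy example: four runs on a 6 bp sequence (the study used 50 runs per analysis).
example_runs = [
    [(0, 3)],
    [(1, 4)],
    [(1, 3), (2, 3)],
    [],  # a run that reported no motif on this sequence
]
coverage = motif_coverage(6, example_runs)
print(coverage)
```

Positions that are reported in most runs stand out as peaks, which is the behaviour the text describes as results clustering at conserved motifs.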
3 RESULTS AND DISCUSSION
Evolutionary conservation of miRNAs has been extensively exploited for the in silico prediction and detection of plant miRNAs (Bonnet et al., 2004; Jones-Rhoades and Bartel, 2004; Wang et al., 2004). Although the majority of Arabidopsis miRNAs are conserved in O. sativa, there are individual Arabidopsis miRNAs (e.g. MIR163) and even miRNA families (MIR157, MIR158, MIR165, MIR405 and MIR447) for which orthologs in O. sativa have not been reported. These non-conserved multigene miRNA families may have evolved and expanded after the separation of the monocotyledonous and dicotyledonous plant lineages. Alternatively, these individual miRNAs or parts of them may not have been retained within O. sativa or the monocotyledonous lineage in general.
Since non-functional portions of miRNA precursors are likely to be under different evolutionary pressure than the miRNA:miRNA* part, we hypothesize that for miRNA gene families the degree of conservation within the whole miRNA precursor is an indicator of the age of the underlying duplication event. To test this hypothesis, we analyzed the degree of conservation between precursors of members of the non-conserved multigene families by multiple sequence alignments. The alignments of MIR157a, b, MIR158a, b, MIR405a, b, d and MIR447a, b, c are shown in Supplementary Table S1. In contrast to what has been found for conserved miRNA precursors (Table S2, Jones-Rhoades and Bartel, 2004), sequence identities among the individual miRNA gene family members are not restricted to the miRNA:miRNA* parts but extend strikingly to the remaining stem–loop regions. Conserved miRNA families usually contain diverged stem–loop sequences besides the conserved miRNA:miRNA* fraction (Jones-Rhoades and Bartel, 2004). The extensive similarity between individual miRNA family members and the extension of this similarity beyond the conserved miRNA:miRNA* fraction support the hypothesis that non-conserved miRNA multigene families arose and evolved by recent duplication events.
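The contrast drawn above, identity inside the miRNA:miRNA* part versus identity in the rest of the stem–loop, can be illustrated with a small calculation on a pairwise alignment. The sketch below is an assumption-laden illustration, not the procedure used for Table S1: the sequences are invented, the duplex columns are supplied by hand, and columns containing a gap are simply ignored.

```python
def identity(aln_a, aln_b, columns):
    """Percent identity over the given alignment columns, ignoring gapped columns."""
    matches = 0
    compared = 0
    for i in columns:
        a, b = aln_a[i].upper(), aln_b[i].upper()
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def split_identity(aln_a, aln_b, duplex_columns):
    """Return (identity inside the duplex columns, identity outside them)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    duplex = set(duplex_columns)
    inside = identity(aln_a, aln_b, sorted(duplex))
    outside = identity(aln_a, aln_b, [i for i in range(len(aln_a)) if i not in duplex])
    return inside, outside


# Invented toy alignment; columns 0-3 stand in for the miRNA:miRNA* part.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCG-AC"
inside_pct, outside_pct = split_identity(seq_a, seq_b, range(0, 4))
print(inside_pct, outside_pct)
```

Under the hypothesis in the text, recently duplicated family members would show high values for both numbers, whereas older families would keep a high value only inside the duplex columns.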
For protein-coding genes there is increasing evidence that duplication events are not restricted to transcribed regions but also involve flanking promoter regions (Haberer et al., 2004; Hurles, 2004). We therefore investigated whether sequence similarities extend to the promoters of the non-conserved multigene families. Upstream promoter sequences from MIR157, MIR158, MIR165, MIR405 and MIR447 precursors were subjected to a similarity and motif analysis using DIALIGN and MotifSampler. The results are shown in Figure 1 and in the web Supplementary Material. Within the miR157 family, the upstream regions of MIR157a and MIR157b share extensive sequence similarity (Fig. 1). MIR157c and MIR157d show less similarity (15.6% alignable sequence) than MIR157a and MIR157b (44.8% alignable sequence). This observation is in agreement with the observed conservation within the precursors of the MIR157 family. MIR157a and MIR157b share nearly identical stem–loop sequences (Table S1), while MIR157c and MIR157d show sequence similarity only in the area surrounding the miRNA:miRNA* region (data not shown). MIR157a and MIR157b are located within 7.8 kb of each other on Chromosome 1. An expressed gene, At1g66790, lies between MIR157a and MIR157b. Two flanking LTR/copia repeat elements indicate that At1g66790 was probably inserted into this genomic region by a transposon. It is therefore likely that this insertion caused the recent duplication which created MIR157a and MIR157b. In contrast, MIR157c and MIR157d are located in segmentally duplicated regions on Chromosome 1 and Chromosome 3, respectively. This segmental duplication event may be older than the duplication event which created MIR157a and MIR157b.
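A figure such as "percent alignable sequence" can be derived from a set of aligned segments on an upstream region. The sketch below shows one plausible way to compute it, by merging overlapping segments and dividing the covered length by the region length. How the percentages in the text were actually computed is not stated, so this definition, the function name and the example intervals are all assumptions made for illustration.

```python
def percent_alignable(seq_len, segments):
    """Percentage of a sequence covered by aligned segments.

    `segments` are (start, end) intervals, 0-based and half-open, on a
    sequence of `seq_len` bp. Overlapping or adjacent segments are merged so
    that no position is counted twice.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start for start, end in merged)
    return 100.0 * covered / seq_len


# Invented segments on a 1000 bp upstream region (not data from the study).
example_segments = [(0, 100), (50, 150), (600, 750)]
alignable = percent_alignable(1000, example_segments)
print(alignable)
```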
Conservation in promoter regions is also observed for the MIR158 family and the MIR165 family (web Supplementary Material). While MIR405b and MIR405d share nearly 50% alignable sequence within their promoter regions (web Supplementary Material), the upstream sequence of MIR405a contains only two alignable regions as identified by DIALIGN, and these are not well conserved. The MIR447 family shows the most striking similarity in its promoter sequences: more than 60% of the promoter sequence is alignable (web Supplementary Material).
Sequence conservation between the MIR gene family members and the extensive sequence similarities within promoter sequences support comparatively recent duplication events which led to the formation of the non-conserved Arabidopsis miRNA families. With the exception of the regions flanking the miRNA:miRNA*, the precursor sequences of the miR165 family have diverged. Nevertheless, the upstream sequences within the miR165 family contain significant motifs which support a common origin of MIR165a and MIR165b (web Supplementary Material).
Recent findings on MIR163 led to the hypothesis that plant miRNAs may be created de novo by inverted duplication of their targets, followed by evolutionary drift leading to functional miRNAs in their present form (Allen et al., 2004). Our results given above imply that duplication of miRNAs also involves at least part of the promoters. Thus far it has not been shown whether the initial formation of miRNAs by a mechanism involving inverted duplication also involves duplication of parts of the promoters, and to what extent promoter elements and regions are retained between the target gene promoter and the respective miRNA promoter (Allen et al., 2004). To answer these questions we aligned the pri-miRNA163 plus 1000 bp of promoter sequence to the respective UTRs and promoter sequences of the miR163 target genes At1g66690, At1g66700, At1g66720 and At3g44860 (Figure 2). Numerous conserved regions as well as conserved short sequence motifs are present between the 5′ upstream sequences of MIR163 and its target genes (a detailed view is provided in the web Supplementary Material). Besides the conserved transcribed region, additional conserved regions located within the promoter were detected. The region depicted in red (Figure 2) harbours the TATA box sequence, which is found in all target genes as well as in MIR163. Our findings on the localisation of the TATA box are consistent with recent results published by Xie et al. (2005). Many additional promoter regions shared between the target genes and MIR163 were detected (Fig. 2). These regions have presumably been retained since the initial inverted duplication event that formed MIR163. They therefore most likely represent functionally important regions which might be required for correct transcriptional regulation of MIR163 as well as of the target genes. However, further experiments are needed to elucidate the functionality of these motifs.
In contrast to MIR163, we could not detect any significant motifs for MIR161 and its target genes, although the precursor of MIR161 has also been shown to have similarity to its target genes (Allen et al., 2004). We also performed the same promoter analysis for the non-conserved miRNA multigene families MIR157, MIR158 and MIR165 and their corresponding targets as listed in the ASRP Database (Gustafson et al., 2005). The results are included in the web Supplementary Material. No significant sequence similarities were detected by DIALIGN between MIR157, MIR158 and MIR165 and their respective targets. We then compared the promoter similarity within non-conserved miRNA families with the promoter similarity within all miRNA families and with that between non-conserved miRNA families and their targets. The similarity between the promoter regions of all miRNA families and random promoter regions served as the control group. The similarity of the promoter regions within non-conserved families is considerably higher than in the other groups (Supplementary Figure S1), which supports our hypothesis that the duplication of non-conserved miRNAs included the promoter regions of the miRNAs.
The role of miRNAs as part of the gene regulatory machinery has been demonstrated by recent intensive research (Bartel, 2004; Baulcombe, 2005). However, the transcriptional regulation of miRNA genes themselves is less well investigated. Our analysis, which detected similarities within the precursors as well as conserved regions within the promoter sequences, provides additional evidence to support the inverted duplication model for plant microRNA evolution. In addition, the detected conserved miRNA upstream motifs represent prime candidates for studying the regulation of Arabidopsis MIR157, 158, 163, 165, 405 and 447. Our findings suggest that the initial inverted duplication event, which created the miRNA from its targets, included the promoter region of the target genes. For expanded miRNA families, regulatory diversification through modulation of cis regulatory elements might be a mechanism leading to fixation of sequence-redundant miRNA genes. In analogy to duplicated protein-coding genes (Haberer et al., 2004), promoter elements seem to be shared among paralogous MIR genes. These shared elements suggest an in silico approach that could guide both the experimental and the computational analysis of microRNA promoters.
The authors would like to thank Georg Haberer for insightful discussions, Louise Riley and Sindy Neumann for proofreading the manuscript, Manuel Spannagl for technical assistance with sequence retrieval and the anonymous reviewers for comments which greatly improved the manuscript.
Conflict of Interest: none declared.