Abstract

Motivation

MicroRNAs (miRNAs) are short (∼24nt), non-coding RNAs, which downregulate gene expression in many species and physiological processes. Many details regarding the mechanism which governs miRNA-mediated repression continue to elude researchers.

Results

We elucidate the interplay between the coding sequence and the 3′UTR, by using elastic net regularization and incorporating translation-related features to predict miRNA-mediated repression. We find that miRNA binding sites at the end of the coding sequence contribute to repression, and that weak binding sites are linked to effective de-repression, possibly as a result of competing with stronger binding sites. Furthermore, we propose a recycling model for miRNAs dissociated from the open reading frame (ORF) by traversing ribosomes, explaining the observed link between increased ribosome density/traversal speed and increased repression. We uncover a novel layer of interaction between the coding sequence and the 3′UTR (untranslated region) and suggest the ORF has a larger role than previously thought in the mechanism of miRNA-mediated repression.

Availability and implementation

The code is freely available at https://github.com/aescrdni/miRNA_model.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

A MicroRNA (miRNA) is a short, non-coding, single-stranded RNA with a central role in post-transcriptional regulation (Bartel, 2018). miRNAs bind to mRNAs, lncRNAs (Jalali et al., 2013) and other ceRNAs (reviewed in Thomson et al., 2016); this binding can be canonical, i.e. using base-pairing complementarity to a certain sequence at the 5′ end of the miRNA, called the seed; or non-canonical, i.e. incorporating mismatches and bulges, and/or being complementary to other regions of the miRNA (Bartel, 2009). The miRNA is inserted into the RNA-induced silencing complex (RISC) and guides it to its target mRNA, resulting in gene silencing by mRNA degradation, translation inhibition or both. The first miRNA binding sites discovered resided in the 3′UTR of the mRNA; later studies focused mainly on this region, although functional sites in the ORF are also known to exist (Lewis et al., 2005). Because of their importance as RNAi instigators, and their link to a wide array of diseases, understanding miRNA action determinants and predicting their influence on gene expression is of a particular interest.

Broadly speaking, miRNA prediction models can be divided into two types: (a) models which predict miRNA target sites (i.e. miRNA-mRNA interactions), such as RNAhybrid (Rehmsmeier et al., 2004), RNA22 (Miranda et al., 2006), DIANA-microT-CDS (Hatzigeorgiou et al., 2012), MIRZA-G (Gumienny et al., 2015) and TarPmiR (Hu et al., 2016); and (b) models which predict change in mRNA levels induced by miRNA binding, such as miRmap (Vejnar et al., 2012) and TargetScan (Agarwal et al., 2015). In this study we aim at developing a new type (b) model. While there is some overlap between the two types, they are distinct, as miRNA-mRNA interactions do not always lead to mRNA repression (e.g. they may affect only translation, or have no effect): thus, even perfect knowledge of all mRNA site locations and binding probabilities is not sufficient to predict change in mRNA levels. On the other hand, even perfect prediction of change in mRNA levels is not sufficient to find all miRNA-mRNA interactions (i.e. all miRNA binding sites), as not all such interactions result in repression. This is supported by recent analyses of CLASH datasets which suggest that non-canonical sites do not mediate repression despite binding the miRNA (Agarwal et al., 2015; Wang, 2014).

The features used in such models usually include folding energy; thermodynamic accessibility of the site; AU content around the site; conservation of the site; 3′UTR length; and the site’s position in the 3′UTR.

In most previous type (b) studies, the coding region is given a secondary role, or simply not considered. For example, miRmap and TarPmiR do not consider the ORF; TargetScan uses the number of miRNA sites in the ORF, its length and its AU content. On the other hand there are type (a) models—such as DIANA-microT-CDS, which takes into consideration ORF sites, treating them equally to 3′UTR sites, and applies the aforementioned ‘classic’ thermodynamic, evolutionary and sequential features to the ORF as well as the 3′UTR. RNA22 also predicts sites in both translated and untranslated regions as well. However, none of these models address the unique aspect of the ORF as a ‘ribosome railway’, and the effect thereof on miRNA action during mRNA degradation, by introducing translation-related features or via the study of translation measurements such as Ribo-Seq. Here, we present a miRNA-mediated repression prediction model, with an emphasis on the interplay between sites in the ORF and 3′UTR, and with additional determinants proposed in recent years (Fig. 1). We show our model improves upon previous models and suggest new insights related to the interplay between mRNA translation and miRNA-mediated mRNA degradation.

The steps and rationale behind the study
Fig. 1.

The steps and rationale behind the study

2 Materials and methods

Full details appear in the Supplementary Methods.

2.1 Model features

The features of the model are based on TargetScan, miRmap and new studies on miRNA action determinants. We have divided them into four categories: Thermodynamic, Conservation, Sequence and Translation-related (Fig. 1g–j). The features appear in Supplementary Table S1. Values of the ‘global’ features, i.e. not directly concerning specific short RNAs (sRNAs) or binding sites, used for all genes appear in Supplementary Table S2.

2.2 Ribo-Seq data generation

Ribo-Seq data was generated from (Stumpf et al., 2013), using Bowtie (Langmead et al., 2009) via the method described in (Diament et al., 2016).

2.3 Training set

To train the model, we have used the normalized HeLa transfection data from (Agarwal et al., 2015), keeping mRNAs that appear in the Ensembl v.94 database (Yates et al., 2019) (Fig. 2I). The final training set (Fig. 2a) consisted of 3912 mRNAs (Fig. 2I, Supplementary Table S2) and 74 sRNAs (Fig. 2II)—11 miRNAs and 63 siRNAs

The pipeline of data processing and model training
Fig. 2.

The pipeline of data processing and model training

2.4 Feature selection and training

In order to conduct feature selection and training of the model (Fig. 1n), we have employed the elastic net method, where the L1 and L2 regularizations are given equal relative weights (i.e. α = 0.5). We trained a separate linear regression model, using 10-fold CV, for each of the 8 single-sites subsets (Fig. 2d). The model was implemented in MATLAB R2019a.

2.5 Assessing the model using an independent test set

To objectively assess our model, we have used the normalized HCT116 transfection data from (Agarwal et al., 2015) as an independent test set. The set is based on the microarray datasets published in (Linsley et al., 2007); this time we used the annotation from the RefSeq (O’Leary et al., 2015) database (GRCh38.p12), and filtered them similarly to the training set. This set (Fig. 2e) consisted of 4477 mRNAs (Fig. 2I, Supplementary Table S2) and 7 miRNAs, different from the 74 sRNAs used in the training set (Fig. 2II).

Thus, the training and assessment test sets are taken from (i) different cell types and (ii) different miRNAs.

2.6 Ranking feature importance

We used the random forest method in order to score the new translation-related feature and compare them to ‘classic’ thermodynamic, evolutionary and sequential features. This is an ensemble method which uses n decision trees (in our case, n = 1000), each one trained on a bootstrapped sample set (generated from the original set), and a random subset of predictors in each step, in order to create regression models and/or assess feature importance while controlling for overfitting. The final score is given in the interval [0,1].

3 Results

3.1 Developing a miRNA-mediated repression prediction model

In order to investigate the relationship between miRNA action and the translated ORF, we used the short RNA (sRNA) transfection datasets published in TargetScan.org and developed a linear regression model that predicts miRNA-mediated gene repression, given an mRNA and a miRNA.

The TargetScan datasets consist of a HeLa dataset and an HCT116 dataset (12). These are normalized microarray transfection datasets, collected from different papers and experiments. We have used the HeLa dataset as a training set, and the HCT116 as an independent test set.

First we identified single sites in the HeLa dataset: For each (mRNA, sRNA) pair, we searched the ORF and 3′UTR sequences for binding sites cognate to the sRNA seed, according to the pairing rules of canonical seed types (6mer, 7mer-A1, 7mer-m8, 8mer). We then kept all pairs for which only one binding site was found. This was done in order to isolate the effect of each seed type, since the interaction between different site types is only partly understood; including that interaction in the model would introduce an additional layer of complexity.

The next step was calculating the predictor features. For each single site, features from different categories were calculated: thermodynamic features, sequential features and evolutionary features. Most of these features are as used in miRmap and TargetScan. We also used, for the first time, a novel category of protein translation-related features such as codon usage bias, ribosomal travel speed and amino acid charge (Supplementary Table S1). All in all, we used eight new features, six of which are translation-related.

We then used elastic net to train a submodel for each of the eight single site subsets (2 regions×4 seed types). Elastic net regularization is a regression algorithm combining L1-regularization (i.e. LASSO regression) and L2-regularization (i.e. ridge regression), thereby balancing their strengths and weaknesses. Either regularization method is useful in preventing overfitting, but L1 regularization tends to eliminate correlated features (even if using both can be beneficial), whereas L2 regularization does not perform feature selection, minimizing certain feature weights instead of setting them at 0. Combining the two allows for feature selection while retaining correlated features.

The final model accepts an mRNA and a miRNA, finds canonical target sites in the mRNA and predicts a partial repression for each site (using the appropriate submodel). The upper bound of repression for each site is 0 (meaning no change in mRNA levels, i.e. miRNA sites will not be predicted to cause upregulation). The per-site predicted repression values are then summed up to produce the final, total repression. The repression is log2 fold-change of mRNA levels (e.g. a repression of -1 indicates halving of the mRNA levels).

3.2 Improvement over TargetScan

To assess our model (Fig. 1p), we used the HCT116 dataset as an independent test set. This set consists of a different cell line and different sRNAs from those in the training set, so the effect of overfitting is minimal.

TargetScan was shown in its paper to have improved performance relative to 17 other miRNA prediction models and (to the best of our knowledge) specifically all known type (b) models; because of this, and since TargetScan is a highly influential and popular model, we decided to gauge our improvement relatively to it.

First, we examined the robustness of the submodels: for each (region, seed type) pair, we trained 1000 models using a random 50% of the corresponding single site types (for example, 3′UTR 8mer) in the HeLa dataset. We then used each model to predict the repression of the other

50% single sites in the training (HeLa) set (Fig. 3A). The average correlations for the 3′UTR 7mer-A1, 7mer-m8 and 8mer submodels were 0.267, 0.351 and 0.413, respectively.

Performance assessment of the model. (A) Pearson correlations between submodel predicted repression and measured repression (based on single sites in the training dataset). For each (region, site type) pair, a random 50% of the corresponding single sites in the HeLa set were taken as a training set, and the other 50% as a validation set. A submodel was trained on the training set, was used to predict the repression of the validation set, and then was compared with the measured repression. This process was repeated 1000 times, each time taking a different random 50% of the HeLa set. The average correlation µ is shown as a black vertical line. (B) Histogram: Pearson correlations, produced by 1000 jackknife repeats, between the predicted and measured repression of test set (mRNA, sRNA) pairs with at least one 7-8nt site in the 3′UTR. Similarly to (A), in each jackknife iteration a different 50% of the training set and of the test set were taken. Vertical lines: Average correlation µ of the 1000 repeats (black); correlation between predicted and measured repression of test set (mRNA, sRNA) pairs with at least one 7-8nt in the 3′UTR, when taking 100% of the training and test sets and using our model (green) or TargetScan (red). (C) Scatter plot of measured repression versus predicted repression of the test set, when using the final model (i.e. trained on 100% of the training set). Only (mRNA, sRNA) pairs with at least one 7-8nt site in the 3′UTR were taken. r is the Pearson correlation
Fig. 3.

Performance assessment of the model. (A) Pearson correlations between submodel predicted repression and measured repression (based on single sites in the training dataset). For each (region, site type) pair, a random 50% of the corresponding single sites in the HeLa set were taken as a training set, and the other 50% as a validation set. A submodel was trained on the training set, was used to predict the repression of the validation set, and then was compared with the measured repression. This process was repeated 1000 times, each time taking a different random 50% of the HeLa set. The average correlation µ is shown as a black vertical line. (B) Histogram: Pearson correlations, produced by 1000 jackknife repeats, between the predicted and measured repression of test set (mRNA, sRNA) pairs with at least one 7-8nt site in the 3′UTR. Similarly to (A), in each jackknife iteration a different 50% of the training set and of the test set were taken. Vertical lines: Average correlation µ of the 1000 repeats (black); correlation between predicted and measured repression of test set (mRNA, sRNA) pairs with at least one 7-8nt in the 3′UTR, when taking 100% of the training and test sets and using our model (green) or TargetScan (red). (C) Scatter plot of measured repression versus predicted repression of the test set, when using the final model (i.e. trained on 100% of the training set). Only (mRNA, sRNA) pairs with at least one 7-8nt site in the 3′UTR were taken. r is the Pearson correlation

Next, we combined the submodels inferred from all the different runs to create 1000 models, and calculated for each of them the correlations of their predicted repression to the measured repression of a corresponding random 50% of the test (HCT) set (Fig. 3B). Similarly to TargetScan, here we only considered (mRNA, sRNA) pairs with at least one 7-8nt site in the 3′UTR, as their repression is significant and can be detected on their own (whereas other binding sites have a more marginal effect). The average correlation was 0.386. The z-score of TargetScan’s correlation, compared to the 1000 correlations received from partial models, was -4.05. Finally, we trained the final model, using eight submodels trained on the whole training set, and used it to predict the repression of all (mRNA, sRNA) pairs with at least one 7-8nt site in the 3′UTR (Fig. 3C). The Pearson correlation between the predicted and measured repression was 0.399 (P = 2.5×10−232), which is a 22% increase from TargetScan's correlation on the test set (0.326) and a 49% increase in explained variation.

We also examined using only the most repressive site’s predicted repression, as opposed to taking the sum of all sites; this approach improved upon TargetScan’s correlation as well, but was still lower than the sum-approach, with a correlation of 0.363 (P = 2.4×10−189).

In the predictions calculated by the final model on the whole test dataset, predicted repressions ranged from 0 to -1.6 (Fig. 3C). The average predicted repression was -0.12, with a standard deviation of 0.12.

3.3 Validation of predicted mRNA-miRNA interactions using experimental data

In order to assess our model's capability to predict functional canonical mRNA-miRNA interactions, we used miRTarBase, a database of experimentally validated interactions (Chou et al., 2018). We performed chi-squared test to evaluate enrichment of predicted and validated predictions, either based on all available methods or only CLIP/CLASH (Fig. 4A). An (mRNA, miRNA) pair was classified as having a predicted interaction if the predicted repression was lower than -0.1 (the average predicted repression in the test set).

Enrichment of validated mRNA-miRNA interactions in predicted interactions. (A) Mosaic plots of mRNA-miRNA interactions among (mRNA, miRNA) pairs in the test dataset with at least one 7-8nt 3′UTR binding site. Only mRNAs and miRNAs which appear in both the test set and miRTarBase dataset were used. P-values and χ2 were calculated using chi-squared test. (B) Contingency table of mRNA-miRNA interactions among (mRNA, miRNA) pairs in the CLASH dataset published in (Helwak et al., 2013) with at least one 7-8nt 3′UTR binding site. P-value and χ2 were calculated using chi-squared test. (C) Stacked bar chart of the columns in (B), i.e. the number of validated and predicted mRNA-miRNA interactions. The left y-axis corresponds to the left bar (validated interactions), and the right y-axis corresponds to the right bar (non-validated interactions)
Fig. 4.

Enrichment of validated mRNA-miRNA interactions in predicted interactions. (A) Mosaic plots of mRNA-miRNA interactions among (mRNA, miRNA) pairs in the test dataset with at least one 7-8nt 3′UTR binding site. Only mRNAs and miRNAs which appear in both the test set and miRTarBase dataset were used. P-values and χ2 were calculated using chi-squared test. (B) Contingency table of mRNA-miRNA interactions among (mRNA, miRNA) pairs in the CLASH dataset published in (Helwak et al., 2013) with at least one 7-8nt 3′UTR binding site. P-value and χ2 were calculated using chi-squared test. (C) Stacked bar chart of the columns in (B), i.e. the number of validated and predicted mRNA-miRNA interactions. The left y-axis corresponds to the left bar (validated interactions), and the right y-axis corresponds to the right bar (non-validated interactions)

We considered only mRNAs and miRNAs which appear in both the test dataset and the miRTarBase dataset. In both cases enrichment was significant (P = 3.03×10−67, odds ratio = 3.58; and P = 1.88×10−43, odds ratio = 3.24, respectively), and was robust to selection of different repression thresholds (i.e. the repression value for which stronger repression means classification as ‘predicted interaction’ and vice versa; Supplementary Table S3). The most significant enrichment was observed for repression thresholds of -0.1 to -0.3.

Next, we examined more closely the human CLASH data published in Helwak et al. (2013). We used our model to predict miRNA-mediated repression for all possible (mRNA, miRNA) pairs in this set (Fig. 4B, C), and once again used chi-squared test to assess enrichment of mRNA-miRNA interactions. We note that the number of possible interactions in this dataset is higher by two orders of magnitude relative to the test set (6729 mRNAs and 393 miRNAs in the CLASH dataset, 2665 mRNAs and 7 miRNAs in the test dataset), and thus more prone to false positives by the computational model (which considers all possible such interactions). Nevertheless, enrichment was significant in this case as well (P = 3.12×10−20, odds ratio = 1.91), and the results were robust to repression threshold selection (Supplementary Table S3). The most significant enrichment was observed for repression thresholds of -0.2 to -0.3.

3.4 Translation features perform comparably to ‘classic’ miRNA features

In order to better understand the interaction between ORF and 3′UTR sites (Fig. 1q), we examined the coefficients of the translation features in the 1000 models. We focused on single 7-8nt submodels in the 3′UTR; as explained above, these sites can cause significant repression even by themselves, and thus their data is the most reliable. We gauged their contribution using the random forest method (Fig. 5). When comparing the generated scores to six well-established repression determinants—the PhastCons score of the site, length of the ORF/3′UTR, the AU content of the ORF/3′UTR and ΔG Open—the results of the established and new features were on par with the performance of these ‘classic’ features, and in some cases (e.g. CAI, TDR, RiboSeq) exceeded them.

Importance of translation features. Feature importance of the novel translation features (CAI, tAI, AA Charge, Slow AA, TDR and ribosome profiling) compared to the ranking of well-established miRNA model features. Feature importance was calculated using the random forest method with 1000 forests, on the single-site sets of 3′UTR 7mer-A1, 7mer-m8 and 8mer seed types
Fig. 5.

Importance of translation features. Feature importance of the novel translation features (CAI, tAI, AA Charge, Slow AA, TDR and ribosome profiling) compared to the ranking of well-established miRNA model features. Feature importance was calculated using the random forest method with 1000 forests, on the single-site sets of 3′UTR 7mer-A1, 7mer-m8 and 8mer seed types

3.5 ORF sites and translation features contribute to miRNA modelling, especially sites at the end of the ORF

Most of miRNA research focuses on binding sites in the 3′UTR, showing these sites are significantly more effective than possible sites in the ORF, possibly because of ribosome traversal which may impede miRNA functionality (Grimson et al., 2007; Gu et al., 2009). However, other studies have shown ORF sites can still impart some repression (Hausser et al., 2013). We explicitly included in our model miRNA binding sites that reside in the ORF; we also added features quantifying the speed of ribosome traversal, calculated on the whole ORF (a ‘global translation speed’) and in a window centred around ORF binding sites (a ‘local translation speed’). These features utilized indices and determinants shown to affect translation speed, such as the Codon Adaptation Index (CAI) (Sharp et al., 1987); tRNA Adaptation Index (tAI) (dos Reis et al., 2004; Sabi et al., 2016; Tuller et al., 2011); Typical Decoding Rate (TDR) (Dana et al., 2014); amino acid charge (Charneski et al., 2013; Sabi et al., 2015; Tuller et al., 2011); and the presence of slowly translated codons (Sabi et al., 2015) (details in the Supplementary Methods). In another new set of features, we incorporated the number of recognition sequences for RNA-Binding Proteins (RBPs), such as PUM1 (a regulator of protein translation), which were shown to interact with miRNA action. We also added MRE (MicroRNA Recognition Element) features, in order to model non-canonical binding sites, especially sites that were not characterized before. The MRE features are described in the Supplementary Methods.

We wanted to investigate whether considering ORF sites and features improved the model's performance (Fig. 1q); we trained several models using different parts of the mRNA and compared their performance (Fig. 6), using for all of them the pipeline described in Section 2 (Fig. 2). The comparison of correlation distributions, using one-tailed t-test, showed a hierarchy of position importance: using the ORF sites and novel translation features improved the models; and model performance further improved when taking ORF sites closer to the 3′UTR. The best models were those trained on sites residing at the 3′UTR and the last third of the ORF. In addition, their average explained variance was 6.17% higher than models that were trained using only 3′UTR sites and lack the translation-related features, with P = 2.93×10−71 (one-tailed t-test). Conducting the same process to compare between these subsets when including/excluding translation-related features revealed similar results—model performance was improved when considering translation-related features and downstream ORF binding sites (Supplementary Table S4).

Hierarchy of site subsets. Performance of models trained on different sets of sites and features, arranged according to their average r2. P-values taken from one-tailed t-tests executed on pairs of models, trained on different sets of sites. All models included the translation-related features. Also shown are the relative changes (Δ) between the average r2 of the sets. From top to bottom, the sets are: sites in the 3′UTR and last third of the ORF; sites in the 3′UTR only; sites in the ORF and 3′UTR; sites in the 3′UTR and the middle third of the ORF; and sites in the 3′UTR and first third of the ORF
Fig. 6.

Hierarchy of site subsets. Performance of models trained on different sets of sites and features, arranged according to their average r2. P-values taken from one-tailed t-tests executed on pairs of models, trained on different sets of sites. All models included the translation-related features. Also shown are the relative changes (Δ) between the average r2 of the sets. From top to bottom, the sets are: sites in the 3′UTR and last third of the ORF; sites in the 3′UTR only; sites in the ORF and 3′UTR; sites in the 3′UTR and the middle third of the ORF; and sites in the 3′UTR and first third of the ORF

The miRNA recycling hypothesis. A diagram of the miRNA recycling hypothesis. (1) The ribosome travels along the ORF, encountering a miRNA with relatively weak binding to the ORF. (2) The miRNA dissociates from the mRNA, staying in its vicinity. (3) The miRNA binds to a stronger binding site downstream, at the 3′UTR, strengthening the repression
Fig. 7.

The miRNA recycling hypothesis. A diagram of the miRNA recycling hypothesis. (1) The ribosome travels along the ORF, encountering a miRNA with relatively weak binding to the ORF. (2) The miRNA dissociates from the mRNA, staying in its vicinity. (3) The miRNA binds to a stronger binding site downstream, at the 3′UTR, strengthening the repression

3.6 Non-canonical site features support competition between binding sites

It has been suggested that low-affinity sites can effectively cause derepression, since they compete with high-affinity sites while causing minimal repression (Denzler et al., 2016). We used the MRE features in order to look for such a trend (Fig. 1r, Supplementary Fig. S1). Since the model output is the signed fold change in mRNA levels, a positive feature coefficient here indicates an increase in number of sites is correlated with an increase of mRNA levels, i.e. weaker repression. Focusing on submodels trained on 7-8nt 3′UTR single sites, we found that features counting weak sites (-15 ≤ ΔG  ≤  0), especially in the ORF, tended to be assigned with negative coefficients (indicating an increase in number of sites is linked to a weaker repression) while strong sites (-30 ≤ ΔG < -15) displayed the opposite behaviour. This finding supports and enhances the mentioned competition model: high-affinity sites contribute to repression, while low-affinity sites compete with them, thus contributing to derepression.

3.7 Translation features suggest positive correlation between translation speed/density and repression strength

Lastly, since the translation features were found to provide strong contribution to the repression prediction, we wanted to examine directly the relation between ribosomal features and repression strength (Supplementary Fig. S2). Here, we used again 7-8nt 3′UTR single site submodels. Indeed, we found that in almost all submodels and features (except for the tAI feature in 7mer-m8 submodels), the coefficients of these features indicate a positive correlation between ribosome speed and density and repression strength.

4 Discussion

In this study, we demonstrated our model's improvement over TargetScan, combining determinants featured in existing models with recent insights from studies of miRNA-mediated repression. We put an emphasis on the role of the ORF and translation features, often neglected in the discussion of miRNAs.

Since our model predicts change in mRNA levels, rather than miRNA binding sites (see Section 1), we believe the best point of reference is models of the same type, of which there are not many; most models predict binding sites rather than the direct effect of the miRNA. While there is some overlap between the two, they are distinct, as miRNA-mRNA interactions do not always lead to mRNA repression (specifically they may affect only translation, or have no known effect), and repression may not be fully captured via binding sites estimation tools (many of which are based on NGS experiments).

The most recent type-(b) model to predict change in mRNA levels is TargetScan 7, which show in their paper that their model's superiority over almost all other major models (except for TarPmiR, which was published 2 years later; but this model does not predict the outcome of miRNA binding, only the binding sites themselves, i.e. it is a type-(a) model. See Section 1). Thus, we believe that TargetScan's paper, performing a comparison with more than 20 miRNA model versions and already cited thousands of times, is compelling evidence to the current superiority of TargetScan for predicting mRNA level changes over other type-(b) models.

We concede that our model is not necessarily better at finding miRNA-mRNA targets than dedicated miRNA-mRNA target prediction models, either in the UTR or the coding sequence.

We have specifically shown the importance of translation features and ORF sites; our computational analysis also revealed a spatial pattern of ORF sites: sites closer to the 3′UTR yielded better results, to the point where taking only the sites in the last third of the ORF improved upon both models using only 3′UTR sites, and models using sites in the whole ORF and 3′UTR. This complements the improvement over TargetScan, which primarily identifies 3′UTR binding sites—providing additional evidence to the importance of ORF sites.

mRNA targeting by miRNAs was shown to vary across tissues (Clark et al., 2014), increasing the challenge in developing a general model for such targeting. Unfortunately, the number of large-scale datasets, which include changes in mRNA levels for thousands of mRNAs and for a few different miRNAs, is lacking; we know of no such set other than the TargetScan data. Thus, our ability to validate our model on different tissues is limited. However, we note that our training and test sets were measured in two different cell lines (HeLa and HCT116), and that our additional validation using miRTarBase (Fig. 4) includes several different tissue types, including HeLa, HCT116, HEK293, hESC, BC-3, BJAB and MCF7. We believe these results support the suggestion that our model is general and is expected to be useful for the analyses of various tissues and different miRNAs. We note that this study deals only with human cells, as miRNAs may behave differently in other organisms.

We also provided computational support to the competition between weak and strong binding sites, showing weak sites are linked with derepression whereas strong sites are linked with repression. Interestingly, a rough threshold of transition between ‘weak’ and ‘strong’ sites can also be estimated from the results, namely 15-20 kcal/mol. Future studies could examine this threshold estimation more thoroughly, in order to better understand competition between miRNA binding sites.

As mentioned above, ORF sites were suggested to be less effective because of ribosome traversal, as the ribosome both causes bound miRNAs to dissociate and blocks additional miRNAs from binding to the mRNA. We found that higher translation speed and ribosomal density were associated with a stronger overall repression; according to the relation between ribosome and miRNA mentioned above, we would have expected higher ribosomal speed and density to lower the efficiency of individual ORF sites, thereby decreasing overall repression. This is contrary to our results. We thus propose the miRNA recycling hypothesis (Figs 1s and 7): when a ribosome encounters a bound miRNA and causes it to dissociate, the miRNA remains in the vicinity of the mRNA and may re-bind to a more effective site downstream (either towards the end of the ORF or at the 3′UTR). This recycling results in a net increase of repression. Thus, a higher ribosomal flux causes more bound miRNAs to dissociate and recycle, thereby strengthening repression.

Explicit appraisal of the miRNA recycling hypothesis may prove difficult, because of the many different determinants at work when considering miRNA-mediated repression; the most direct approach would be design of dedicated synthetic biology experiments, controlling both (i) the quantity, quality and position of different binding sites and (ii) the ribosomal density (via modulating the initiation rate).

Computational analysis of large-scale miRNA datasets entails various challenges. One major problem is that the repression is measured as the total effect of a miRNA on an mRNA, so pinning down the repression induced by each site, and the interplay between the different sites, requires additional assumptions and modelling and will not yield exact results.

We also note the inherent ambiguous nature of interpreting miRNA transfection microarray data, which may result from secondary effects rather than direct effect of the miRNA on a specific mRNA (e.g. the transfected miRNA silences a suppressor of the specific mRNA). We believe the combined effect of the relatively large-scale data and different site features mitigate this problem to some degree. Nevertheless, this secondary effect is likely to exist for some part of the dataset.

Another challenge is the assumption, here and in previous studies in the field, of a unified model for all miRNAs and genes, whereas the influence of the determinants used may vary across different genes, miRNAs and tissues. This problem may be somewhat relieved by using a large-scale dataset, as we and others have done, thereby attempting to capture the average determinants' influence.

A larger issue, not considered here, is interaction between miRNAs and ceRNAs (including lncRNAs). This interaction is likely to influence miRNA-mediated repression as well (Jalali et al., 2013; Thomson et al., 2016), and is an interesting possible topic of study and implementation into computational models in the future.

Finally, we have also considered inclusion of 5′UTR canonical sites, either as features or explicitly as separate models (similarly to ORF sites). In both cases, we found no significant improvement (Supplementary Table S4). 5′UTR targeting was suggested to be related to miRNA-mediated translation activation (Ørom et al., 2008). The exact mechanism of this activation, and its prevalence relative to translational repression, are unknown; however, we hypothesize that sites residing close to the start codon (both in the 5′UTR and ORF) will tend to be related to translation repression (perhaps via influence on translation initiation), whereas sites residing close to the stop codon (both in the ORF and 3′UTR) will tend to be related to mRNA degradation. This relation is by no means suggested to be exclusive or universal, as 3′UTR targeting was shown to cause translation repression as well (reviewed in Fabian et al., 2010); but it is congruent with our results regarding the larger importance of sites at the ORF end. Assessment of this suggested link requires translation repression data (e.g. change in protein abundance, in addition to change in mRNA levels).

The dataset used here—and indeed, most miRNA datasets and computational models—is based on mRNA levels rather than protein abundance; mRNA degradation is believed to be the more prevalent form of miRNA-mediated repression, though translation inhibition may account for as many as 34% of the general repression (Eichhorn et al., 2014). Thus we take the changes in mRNA levels as representative of miRNA-mediated repression. Ribosome-miRNA interaction precedes miRNA-mediated repression (since the miRNA may encounter the ribosome before even binding to the mRNA). We argue that translation dynamics affect miRNA binding, which, in turn, affects mRNA levels or even translation dynamics themselves. The intragenic ORF-3′UTR relationship shown here, as well as the recent discovery of translation-inhibiting MREs in the ORF (Zhang et al., 2018), shed new light on the subject. But in order to better investigate miRNA’s two-pronged approach to repression, more empirical data is needed; specifically, a wide miRNA transfection dataset that includes measurements of mRNA levels, protein abundance and ribosome footprints. This data can then be used in a similar manner to our study, in order to better model the ORF-3′UTR competition and their influence on overall repression via both mechanisms.

Our model and hypothesis can be evaluated by future experimental studies. For example, insertion of miRNA binding sites at different parts of the ORF, and measurement of their effect on the effectivity of mRNA degradation and translation inhibition, can aid in assessing the importance of location within the ORF, the strength of their effect, and their link to mRNA degradation and translation inhibition. In another possible experiment, assessing the miRNA recycling hypothesis can be done by inducing silent mutations into the ORF in order to decrease ribosomal flux near the end of the ORF; our hypothesis suggests that this will lead to a decrease in mRNA degradation (since less miRNAs will bind to the 3′UTR). Finally, large scale measurements that include not only the effect of mRNA-miRNA interactions on mRNA degradation but also the effect on translation and protein levels (e.g. based on techniques used in Schwanhäusser et al. (2011) and Ingolia et al. (2012)) will enable estimating the relative contribution of ORF miRNA binding sites to these different types of regulatory mechanisms.

The aspect of ORF-3′UTR interplay may have eluded previous research and is an exciting possible facet in the pursuit of understanding the mechanism of miRNA action.

Acknowledgements

The thank Shimshi Atar for providing data and advice regarding Ribo-Seq.

Funding

This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University.

Financial Support: none declared.

Conflict of Interest: none declared.

References

Agarwal
V.
 et al. (
2015
)
Predicting effective microRNA target sites in mammalian mRNAs
.
Elife
,
4
,
e05005
.

Bartel
D.P.
(
2009
)
MicroRNAs: target recognition and regulatory functions
.
Cell
,
136
,
215
233
.

Bartel
D.P.
(
2018
)
Metazoan MicroRNAs
.
Cell
,
173
,
20
51
.

Charneski
C.A.
 et al. (
2013
)
Positively charged residues are the major determinants of ribosomal velocity
.
PLoS Biol
.,
11
,
e1001508
.

Chou
C.-H.
 et al. (
2018
)
miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions
.
Nucleic Acids Res
.,
46
,
D296
D302
.

Clark
P.M.
 et al. (
2014
)
Argonaute CLIP-Seq reveals miRNA targetome diversity across tissue types
.
Sci. Rep
.,
4
,
5947
.

Dana
A.
 et al. (
2014
)
Mean of the typical decoding rates: a new translation efficiency index based on the analysis of ribosome profiling data
.
G3 (Bethesda)
,
5
,
73
80
.

Denzler
R.
 et al. (
2016
)
Impact of MicroRNA levels, target-site complementarity, and cooperativity on competing endogenous RNA-regulated gene expression
.
Mol. Cell
,
64
,
565
579
.

Diament
A.
 et al. (
2016
)
Estimation of ribosome profiling performance and reproducibility at various levels of resolution
.
Biol. Direct
.,
11
,
24
.

dos Reis
M.
 et al. (
2004
)
Solving the riddle of codon usage preferences: a test for translational selection
.
Nucleic Acids Res
.,
32
,
5036
5044
.

Eichhorn
S.W.
 et al. (
2014
)
mRNA destabilization is the dominant effect of mammalian MicroRNAs by the time substantial repression ensues
.
Mol. Cell
,
56
,
104
115
.

Fabian
M.R.
 et al. (
2010
)
Regulation of mRNA translation and stability by microRNAs
.
Annu. Rev. Biochem
.,
79
,
351
379
.

Grimson
A.
 et al. (
2007
)
MicroRNA targeting specificity in mammals: determinants beyond seed pairing
.
Mol. Cell
,
27
,
91
105
.

Gu
S.
 et al. (
2009
)
Biological basis for restriction of microRNA targets to the 3′ untranslated region in mammalian mRNAs
.
Nat. Struct. Mol. Biol
.,
16
,
144
.

Gumienny
R.
 et al. (
2015
)
Accurate transcriptome-wide prediction of microRNA targets and small interfering RNA off-targets with MIRZA-G
.
Nucleic Acids Res
.,
43
,
1380
1391
.

Hatzigeorgiou
A.G.
 et al. (
2012
)
Functional microRNA targets in protein coding sequences
.
Bioinformatics
,
28
,
771
776
.

Hausser
J.
 et al. (
2013
)
Analysis of CDS-located miRNA target sites suggests that they can effectively inhibit translation
.
Genome Res
.,
23
,
604
615
.

Helwak
A.
 et al. (
2013
)
Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding
.
Cell
,
153
,
654
665
.

Hu
H.
 et al. (
2016
)
TarPmiR: a new approach for microRNA target site prediction
.
Bioinformatics
,
32
,
2768
2775
.

Ingolia
N.T.
 et al. (
2012
)
The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments
.
Nat. Protoc
.,
7
,
1534
1550
. doi: 10.1038/nprot.2012.086.

Jalali
S.
 et al. (
2013
)
Systematic transcriptome wide analysis of lncRNA-miRNA interactions
.
PLoS One
,
8
,
e53823
.

Langmead
B.
 et al. (
2009
)
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
.
Genome Biol
.,
10
,
R25
.

Lewis
B.P.
 et al. (
2005
)
Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets
.
Cell
,
120
,
15
20
.

Linsley
P.S.
 et al. (
2007
)
Transcripts targeted by the MicroRNA-16 family cooperatively regulate cell cycle progression
.
Mol. Cell. Biol
.,
27
,
2240
2252
.

Miranda
K.C.
 et al. (
2006
)
A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes
.
Cell
,
126
,
1203
1217
.

O’Leary
N.A.
 et al. (
2015
)
Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
.
Nucleic Acids Res
.,
44
,
D733
D745
.

Ørom
U.A.
 et al. (
2008
)
MicroRNA-10a binds the 5′UTR of ribosomal protein mRNAs and enhances their translation
.
Mol. Cell
,
30
,
460
471
.

Rehmsmeier
M.
 et al. (
2004
)
Fast and effective prediction of microRNA/target duplexes
.
RNA
,
10
,
1507
1517
.

Sabi
R.
 et al. (
2015
)
A comparative genomics study on the effect of individual amino acids on ribosome stalling
.
BMC Genomics
,
16
,
S5
.

Sabi
R.
 et al. (
2016
)
stAIcalc: tRNA adaptation index calculator based on species-specific weights
.
Bioinformatics
,
33
,
589
591
.

Schwanhäusser
B.
 et al. (
2011
)
Global quantification of mammalian gene expression control
.
Nature
,
473
,
337
342
.

Sharp
P.M.
 et al. (
1987
)
The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications
.
Nucleic Acids Res
.,
15
,
1281
1295
.

Stumpf
C.R.
 et al. (
2013
)
The translational landscape of the mammalian cell cycle
.
Mol. Cell
,
52
,
574
582
.

Thomson
D.W.
 et al. (
2016
)
Endogenous microRNA sponges: evidence and controversy
.
Nat. Rev. Genet
.,
17
,
272
283
.

Tuller
T.
 et al. (
2011
)
Composite effects of gene determinants on the translation speed and density of ribosomes
.
Genome Biol
.,
12
,
R110
.

Vejnar
C.E.
 et al. (
2012
)
miRmap: comprehensive prediction of microRNA target repression strength
.
Nucleic Acids Res
.,
40
,
11673
11683
.

Wang
X.
(
2014
)
Composition of seed sequence is a major determinant of microRNA targeting patterns
.
Bioinformatics
,
30
,
1377
1383
.

Yates
A.D.
 et al. (
2019
)
Ensembl 2020
.
Nucleic Acids Res
.,
48
,
D682
D688
.

Zhang
K.
 et al. (
2018
)
A novel class of microRNA-recognition elements that function only within open reading frames
.
Nat. Struct. Mol. Biol
.,
25
,
1019
1027
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Associate Editor: Jan Gorodkin
Jan Gorodkin
Associate Editor
Search for other works by this author on:

Supplementary data