Abstract

Motivation

Circular RNAs (circRNAs) are long noncoding RNAs (lncRNAs) often associated with diseases and considered potential biomarkers for diagnosis and treatment. Among other functions, circRNAs have been shown to act as microRNA (miRNA) sponges, preventing the role of miRNAs that repress their targets. However, there is no pipeline to systematically assess the sponging potential of circRNAs.

Results

We developed circRNA-sponging, a nextflow pipeline that (i) identifies circRNAs via backsplicing junctions detected in RNA-seq data, (ii) quantifies their expression values in relation to their linear counterparts spliced from the same gene, (iii) performs differential expression analysis, (iv) identifies and quantifies miRNA expression from miRNA-sequencing (miRNA-seq) data, (v) predicts miRNA binding sites on circRNAs, (vi) systematically investigates potential circRNA–miRNA sponging events, (vii) creates a network of competing endogenous RNAs and (viii) identifies potential circRNA biomarkers. We showed the functionality of the circRNA-sponging pipeline using RNA sequencing data from brain tissues, where we identified two distinct types of circRNAs characterized by a specific ratio of the number of the binding site to the length of the transcript. The circRNA-sponging pipeline is the first end-to-end pipeline to identify circRNAs and their sponging systematically with raw total RNA-seq and miRNA-seq files, allowing us to better indicate the functional impact of circRNAs as a routine aspect in transcriptomic research.

Supplementary information

Supplementary data are available at Bioinformatics Advances online.

1 Introduction

Circular RNAs (circRNAs) are classified as long noncoding RNAs (lncRNAs), even though a few have been reported to encode proteins (Miao et al., 2021). circRNAs are characterized by their loop structure, which makes them less prone to degradation (Jeck and Sharpless, 2014; Yu and Kuo, 2019). The biogenesis of circRNAs is explained by the occurrence of a backsplicing event (see Supplementary Fig. S1) during the alternative splicing process of precursor messenger RNA (pre-mRNA), where the 5ʹ terminus of an upstream exon and the 3ʹ terminus of a downstream exon are covalently joined (Yu and Kuo, 2019). The difference between circRNAs and linear RNAs is the lack of a 5ʹ cap and a 3ʹ polyadenylation [poly(A)] tail along with its circular shape, which makes circRNAs more stable and, in most cases, resistant to exonuclease activity (Lasda and Parker, 2014; Mester-Tonczar et al., 2020; Suzuki et al., 2006). These circular molecules can be made up of exonic and intronic regions of its spliced pre-mRNA and are thus found in a variety of sizes, ranging from 100 to >4000 nucleotides (Lasda and Parker, 2014; Szabo and Salzman, 2016). circRNAs are conserved across species, and their expression is tissue- and disease-specific (Jeck et al., 2013; Jeck and Sharpless, 2014; Zhang et al., 2018). Hence, they can play an important role as biomarkers and therapeutic targets (Kristensen et al., 2018; Qu et al., 2015; Zhang et al., 2018). Another type of noncoding RNA is microRNA (miRNA) which plays a role in post-transcriptional gene regulation (Hoffmann et al., 2021; List et al., 2019) and is involved in many biological processes and diseases (Kartha and Subramanian, 2014). miRNAs bind their target genes via the RNA-induced silencing complex causing their degradation or preventing their translation (Weinstein et al., 2013).

The possible interplay between circRNAs, miRNAs, messenger RNAs (mRNAs) that code for proteins, and other types of RNA that share miRNA binding sites gives rise to a large regulatory network. Salmena et al. (2011) proposed that any RNA that carries miRNA binding sites [e.g. mRNAs, circRNAs, pseudogenes, transcripts of 3ʹ-untranslated regions (UTRs) and lncRNAs] can act as a competing endogenous RNA (ceRNA) that competes for the limited pool of available miRNAs in a cell. As a result of this competition, an overexpressed RNA can sponge away miRNAs required for the regulation of other RNAs, which can explain why noncoding RNAs, such as circRNAs, can be implicated in a phenotype.

The enhanced stability of circRNAs might allow them to work as buffers for miRNAs by binding them until sufficient miRNAs are present to outnumber the circRNA binding sites (Zhang et al., 2018). The regulatory function of circRNAs and their alleged association with diseases are the main reasons why identifying sponging activity between circRNAs and miRNAs is of particular interest. The presence of an interaction between miRNAs and circRNAs has been repeatedly proven, and several circRNAs [e.g. CDR1as/CiRS-7, SRY (Hansen et al., 2013) and circNCX1 (Li et al., 2018)] have been recognized as miRNA sponges. Even though individual studies confirmed the existence of circRNA sponges, further studies are needed to elucidate the role of circRNAs in miRNA-mediated gene regulation.

From a computational point of view, the detection of circRNAs is difficult due to their circular shape and the lack of poly(A) tail, which makes it unlikely to observe them in poly(A)-enriched RNA sequencing (RNA-seq) libraries (Szabo and Salzman, 2016). Hence, circRNAs can only be robustly detected in libraries without poly(A) enrichment, such as ribosomal RNA (rRNA) depleted RNA-seq and total RNA-sequencing (total RNA-seq), which do not deplete circRNAs (Szabo and Salzman, 2016). Identification of circRNAs relies on the detection of backsplicing junctions among the unmapped reads, which allows for the estimation of circRNA abundance. By focusing on the backsplicing junction alone, the expression of circRNAs in relation to their linear counterparts is typically underestimated (Yu et al., 2021).

Several approaches for circRNA analysis have been proposed. Chen et al. (2021) reviewed 100 existing circRNA-related tools for circRNA detection, annotation, downstream analysis, as well as network analysis. They list a total of 44 circRNA identification tools including, but not limited to, CIRCexplorer (Zhang et al., 2016, 2014), find_circ (Memczak et al., 2013), CIRI (Meng et al., 2017), KNIFE (Szabo et al., 2015) and circRNA_finder (Westholm et al., 2014). They also present a total of 14 circRNA annotation databases collecting circRNA information from the literature, such as circBase (Glažar et al., 2014) and CIRCpedia (Dong et al., 2018; Zhang et al., 2016). Other circRNA-related tools include databases for feature collection and storing circRNA information related to disease and biomarkers. In addition, circRNA network identification tools model the interactions between circRNAs and miRNAs, lncRNAs or RNA-binding proteins. Other tools for downstream analysis of circRNAs cover alternative splicing detection, circRNA assembly and structure prediction and visualization (Chen et al., 2021). To the best of our knowledge, none of the tools provides a comprehensive and automated circRNA-sponging analysis integrating identification and quantification of both circRNAs and miRNAs, a systematic investigation of potential circRNA–miRNA sponging events and a ceRNA network analysis. We developed ‘circRNA-sponging’, a nextflow pipeline integrating state-of-the-art methods to (i) detect circRNAs via identifying backsplicing junctions from total RNA-seq data, (ii) quantify their expression values relative to linear transcripts, (iii) perform differential expression analysis, (iv) identify and quantify miRNA expression from miRNA-sequencing (miRNA-seq) data, (v) predict miRNA binding sites on circRNAs, (vi) systematically investigate potential circRNA–miRNA sponging events, (vii) create a ceRNA network and (viii) identify potential circRNA biomarkers using the ceRNA network (Fig. 1).

Overview of the individual steps of the pipeline. Total RNA-seq data processing is shown on top, and miRNA-seq processing on the bottom. In the miRNA sponging step, these results are integrated for network analysis and biomarker detection
Figure 1.

Overview of the individual steps of the pipeline. Total RNA-seq data processing is shown on top, and miRNA-seq processing on the bottom. In the miRNA sponging step, these results are integrated for network analysis and biomarker detection

We demonstrate the potential of the circRNA-sponging pipeline on matched rRNA-depleted RNA-seq and miRNA-seq data from mouse brain tissues.

2 Methods

2.1 Data

Using circRNA-sponging, we processed a total of 23 samples of matched single-end rRNA-depleted RNA-seq and miRNA-seq data for four brain regions (cerebellum, cortex, hippocampus, olfactory bulb). Samples include three replicates for wild-type (WT) and 2–3 CDR1 knock-out (KO) mouse replicates (GEO accessions: GSE100265, GSE93129) (Piwecka et al., 2017) (see Supplementary Table S1). We use the mm10 genome version for mapping.

2.2 Pipeline architecture

The circRNA-sponging pipeline is implemented in R (v. 4.2.0) and Python (v. 3.8.12) and wrapped with nextflow version 22.04.0.5697. The pipeline is hosted on dockerhub and will pull the required docker image when executed. The relevant image was built under docker version 22.06. It follows the nf-core guidelines (Ewels et al., 2020) and encompasses several state-of-the-art techniques organized into three modules: (1) the circRNA module, (2) the miRNA module and (3) the sponging module, the latter of which can only be performed if both other modules have been executed (Fig. 2). In the following, we provide a deeper insight into each module and highlight important components of the pipeline.

Workflow of the circRNA-sponging pipeline. The pipeline consists of three modules: (1) the circRNA module, (2) the miRNA module and (3) the sponging module. In (1), we detect circRNAs via identifying backsplicing junctions from total RNA-seq data, quantify their expression values, perform a differential expression analysis, and predict miRNA binding sites on circRNAs using a majority vote between three state-of-the-art methods. In (2), we either detect and quantify miRNAs in raw miRNA-seq or directly process miRNA expression data. In (3), we systematically investigate circRNA–miRNA sponging events, create a ceRNA network and use it to identify potential circRNA biomarkers
Figure 2.

Workflow of the circRNA-sponging pipeline. The pipeline consists of three modules: (1) the circRNA module, (2) the miRNA module and (3) the sponging module. In (1), we detect circRNAs via identifying backsplicing junctions from total RNA-seq data, quantify their expression values, perform a differential expression analysis, and predict miRNA binding sites on circRNAs using a majority vote between three state-of-the-art methods. In (2), we either detect and quantify miRNAs in raw miRNA-seq or directly process miRNA expression data. In (3), we systematically investigate circRNA–miRNA sponging events, create a ceRNA network and use it to identify potential circRNA biomarkers

1) The circRNA module addresses the identification, quantification and miRNA binding site prediction of circRNAs. For read mapping, we employ the STAR (Dobin et al., 2013) aligner, which provides support for the detection of splice-junction and fusion reads. The resulting unmapped split-reads are used by CIRCexplorer2, which uses a combination of methods (i.e. a de novo assembly approach to identify novel circRNA and a reference-based approach, which uses known exon-exon junctions to map backsplicing events to known genes) to increase the accuracy of its predictions (Zhang et al., 2016) to identify backsplicing events. We could confirm the excellent performance of CIRCexplorer2 using data simulated with polyester (Frazee et al., 2015) from the linear mouse reference genome GRCm38 at varying sequencing depth, where we rarely detect false positive backsplicing junctions and zero false positive circRNAs (Supplementary Fig. S2a). Next, raw read counts are normalized with DESeq2 (Love et al., 2014), and circRNAs with low expression levels are excluded to reduce false positives. By default, only circRNAs with a normalized read count >5 in at least 20% of samples are retained. Database annotation is performed using circBase (Glažar et al., 2014), which covers curated circRNAs with experimental evidence of several model organisms.

We use psirc (Yu et al., 2021) to quantify circRNA expression levels, as the detection of backsplicing junctions alone does not reflect circRNA expression levels in relation to the gene’s expression. To mitigate this, psirc employs kallisto (Bray et al., 2016) and considers both linear and circular transcripts in the expectation-maximization step to produce comparable expression values. psirc corrects for various sequencing biases that can affect circRNA detection, such as coverage bias, mapping bias, read length bias and alternative splicing bias. Yu et al. (2021) showed that psirc provides a more accurate identification of circRNA expression levels by validating their method with experimental data. If the data have been sampled from different conditions (e.g. case and control), the quantified linear transcripts and circRNAs can be used to perform a differential expression analysis using DESeq2 (Love et al., 2014). The pipeline generates heatmaps, volcano plots and principal component analysis (PCA) of the circRNAs and linear transcripts between conditions. We analyze alternative splicing between circular and linear transcripts on a gene level using SUPPA2 (Trincado et al., 2018). In order to integrate circular transcripts, we construct a merged gene annotation file consisting of both linear and circular transcripts. Based on this input, we generate percent spliced-in (PSI) values for linear and circular isoforms with the SUPPA2 step psiPerIsoform (Trincado et al., 2018). We, additionally, normalized the linear and circular PSI values for a gene by their sample-wise mean to account for differences in overall linear and circular splicing frequencies. Both nonnormalized and normalized PSI values are automatically visualized. To boost reliability, we predict circRNA–miRNA binding sites using a majority voting between miRanda (Enright et al., 2003), PITA (Kertesz et al., 2007) and TarPmiR (Ding et al., 2016) since each method has a distinct approach for predicting miRNA binding sites. Testing these tools with random miRNA sequences shows that up to 25% of the reported binding sites may be false positives (Supplementary Fig. S2b and d), which aligns with previous findings (Min and Yoon, 2010). Thus, we consider a circRNA–miRNA binding site as relevant if it is predicted by at least two out of the three methods. miRanda considers seed matching, conservation and free energy, and we consider predictions with a score above the 25% quantile. PITA additionally considers site accessibility and target-site abundance. TarPmiR further integrates machine learning to improve results for supported organisms (Ding et al., 2016). We further incorporate experimentally validated target sites from DIANA-LncBase v3 (Karagkouni et al., 2020), miRTarBase (Hsu et al., 2011; Huang et al., 2020) and miRWalk3.0 (Sticht et al., 2018).

2) The miRNA module covers the quantification and processing of miRNA expression. miRDeep2 (Friedländer et al., 2012) is used to obtain miRNA counts. Alternatively, already mapped miRNA expression data can be provided. Raw counts are normalized with DESeq2 (Love et al., 2014) followed by a filtering step, where by default, miRNAs with a normalized read count >5 in at least 20% of samples are retained.

3) The sponging module is used for the identification of crosstalk between circRNAs, miRNAs as well as ceRNA interactions of circRNAs with other transcripts. To identify potential sponging activity, we perform a correlation analysis of circRNA–miRNA pairs, where a negative correlation coefficient indicates a sponging relationship. For all circRNA–miRNA pairs (i.e. a circRNA that harbors at least one binding site for the miRNA), we compute a Pearson correlation coefficient along with the normalized residual sum of squares and the adjusted P-value after the Benjamini-Hochberg correction. Pairs are filtered (e.g. P-adjusted < 0.05, RMSE < 1.5, and optionally by the number of binding sites) and are considered potential sponging candidates. We further construct a ceRNA network using SPONGE (List et al., 2019) on matched gene and miRNA expression data. Finally, we apply spongEffects (Boniolo et al., 2023) to extract ceRNA modules consisting of circRNAs with a high node centrality score in the ceRNA network and their direct neighbors. For each module, spongEffects computes a sample-specific enrichment score (i.e. the spongEffects enrichment scores are calculated using one of three gene set enrichment approaches: Gene Set Enrichment Analysis (ssGSEA) and Gene Set Variation analysis (GSVA) algorithms as implemented in the GSVA package (version 1.34.0) (Hänzelmann et al., 2013), or Overall Expression (OE) (Jerby-Arnon et al., 2018)). These approaches can calculate spongEffects scores even if some genes in the ceRNAs modules are missing. The resulting module-by-sample score matrices can be used for further analysis. No major differences were observed between the three methods, and the choice of the optimal tool depends on the specific task and dataset (Boniolo et al., 2023), such as differential analysis between groups or supervised machine learning. The spongEffects scores are then used for training and testing a random model classifier to distinguish between groups of samples (e.g. healthy and control) (Kuhn, 2008). The prediction power is then measured by a 5-fold cross-validation and a comparison to random modules (i.e. sampling modules of the same size as the ones predicted while preserving the size distribution of the real modules). In our example dataset, we used 16 samples for training (four samples for cerebellum, cortex, hippocampus and olfactory bulb) and eight samples for testing (two samples for cerebellum, cortex, hippocampus and olfactory bulb).

3 Results

circRNAs are highly abundant and conserved in the mammalian brain (Hanan et al., 2017; Rybak-Wolf et al., 2015). To demonstrate the capabilities of the circRNA-sponging pipeline, we analyzed a public RNA-seq dataset from the mouse brain. We focused on the sponging capacity of circRNAs and their potential role as ceRNAs.

3.1 Comparing circRNA and host gene expression reveals changes in circRNA splicing

In total, we detected 46 380 and 27 390 circRNAs before and after filtering, respectively. This number aligns with the known high abundance of circRNAs in brain tissue (Hanan et al., 2017; Rybak-Wolf et al., 2015). We could annotate only 1027 (∼4%) of the circRNAs that passed the filter (Supplementary Fig. S3a), as comparably few circRNAs have thus far been annotated in mice using circBase. psirc-estimated expression levels, which take reads mapping to parts other than the backsplicing junction of the circRNA into account, are up to 6-fold higher compared to counts derived from backsplicing junctions only (Fig. 3c, per tissue type: Supplementary Fig. S3d–g). We observed a generally higher expression of circRNAs in the cerebellum compared to other brain regions, which could indicate a higher importance of the circRNAs in this brain region (Supplementary Fig. S3b).

circRNA results of the mouse brain regions dataset. (a) circRNAs shared between brain regions. (b) Expression of circRNAs across tissues and experimental conditions. (c) Comparison of psirc-quant quantified circRNA counts to CIRCexplorer2 counts. (d) Comparison between a circRNA originating from RIMS2 and expression of the linear RIMS2 gene
Figure 3.

circRNA results of the mouse brain regions dataset. (a) circRNAs shared between brain regions. (b) Expression of circRNAs across tissues and experimental conditions. (c) Comparison of psirc-quant quantified circRNA counts to CIRCexplorer2 counts. (d) Comparison between a circRNA originating from RIMS2 and expression of the linear RIMS2 gene

Concerning the miRNA binding sites, we observed that with a higher number of binding sites, the Pearson correlation increases (Supplementary Fig. S4). Despite the high number of shared circRNAs across brain regions (Fig. 3a), we observed a brain region-specific abundance of overall circRNAs levels (Fig. 3b, Supplementary Fig. S3b). However, expression levels of the circRNAs differ considerably, such that samples cluster by brain region (Fig. 3b, Supplementary Fig. S3c). Rybak-Wolf et al. (2015) reported circRNAs of 12 host genes (TULP4, RIMS2, ELF2, PHF21A, MYST4, CDR1, STAU2, SV2B, CPSF6, DYM, RMST and RTN4). They speculated on the importance of circRNAs originating from these genes for brain cell identity, but we posit that a change in circRNA expression alone does not necessarily imply a functional role as circRNA expression is coupled to the expression of the host gene, as we expect the number of reads mapping to the backsplicing junctions to correlate. We detected nine of the 12 circRNAs (all but TULP4, SV2B and RMST) in our analysis (Supplementary Table S2, Supplementary Fig. S5a and b). The difference between KO and WT samples is negligible, with the exception of the CDR1 region (mmu_hsa_circ_0001878 in circBase annotation) that was targeted successfully (Supplementary Fig. S5c and d). When excluding the circRNA of the CDR1, we observed a clear separation between the cerebellum and other brain regions, while the cortex and hippocampus are more similar (Galea et al., 2011). Our analysis revealed a total of 33 circRNAs that show significantly different expression between brain regions (P-adjusted < 0.01, absolute log2 fold change > 5, Supplementary Table S3). By comparing the expression level of the circRNAs to the linear transcripts, as facilitated by psirc-quant, we can identify cases where the expression of circRNAs increases beyond the level suggested by the overall gene expression. Such cases could offer evidence for the functional importance of a circRNA. For example, mmu_circ_0000595, a circRNA of host gene RIMS2, shows a higher expression in the cortex, whereas the expression of the host gene RIMS2 remains stable in this tissue (Fig. 3d, see also Supplementary Fig. S6a–p). We investigate the output of SUPPA2 to explore the relationship between the quantity of circular and linear transcripts per shared gene more thoroughly. As expected, the PSI values of the linear transcripts are mostly close to 100%, whereas circular RNAs are rare and show overall very low PSI values. However, there are instances where circular transcripts show high PSI values (Supplementary Fig. S7a). Differential splicing analysis of circular transcripts revealed that only very few are significantly differentially spliced between cell types, i.e. pass both filters of a change in PSI ≥ 25% and a P-value of ≤ 0.01 (Supplementary Fig. S7b). We can observe that two circRNAs (chr15:34600014–34625031_– with its host genes HYDIN between cortex-hippocampus and chr8:110298074–110334816_+ with its host gene NIPAL2 between hippocampus-olfactory-bulb) are considered differentially spliced in (Supplementary Fig. S7). These results are similar for normalized PSI values (Supplementary Fig. S8), where we accounted for sample-specific differences. Our results thus suggest that the splicing ratio of linear to circular RNA expression does not change between different brain regions.

3.2 A ceRNA network reveals circRNAs acting as miRNA sponges

If matched total RNA and miRNA sequencing data are provided, circRNA-sponging infers a ceRNA network using the R package SPONGE (List et al., 2019) and visualizes the result (Fig. 4). Important players in this regulatory network are characterized by a large node degree, i.e. they indirectly regulate many of the connected RNAs via sequestering miRNA copies. Since the network inferred by SPONGE does not offer insights into individual samples or conditions, we subsequently computed spongEffects (Boniolo et al., 2023) enrichment scores which capture the interaction of individual circRNAs and their target genes. As these scores are sample-specific, they can offer insights into condition-specific circRNA-sponging activity. spongeEffects scores can also be used as features for machine learning tasks such as classification (Boniolo et al., 2023). Since the number of available samples for training is rather small here, the random forest reported subset accuracy drops considerably on the holdout set in 10-fold cross-validation. While the cortex and hippocampus are difficult to differentiate, the cerebellum can be robustly distinguished from other brain regions (Supplementary Fig. S9). In particular, two circRNAs, chr10:9770449–9800068_− and chr10:79860969–79862010_+ stand out as distinctive features of the cerebellum (Supplementary Figs S10 and S11). While the inferred ceRNA network shows that circRNAs in the mouse brain are regulatory active through miRNA sponging, a larger number of samples is likely needed to fully resolve brain region-specific sponging activity.

circRNA–ceRNA-subnetwork with the top 25 ceRNA modules ranked by the number of significant interactions (node degree in the network). For each ceRNA module consisting of the circRNA and its target genes, we computed spongEffects enrichment scores and used them as input for a random forest model. The bottom-right corner shows the subset accuracy of this model in distinguishing different brain regions on the training and test set. The results of a model trained on random modules of the same size show random performance
Figure 4.

circRNA–ceRNA-subnetwork with the top 25 ceRNA modules ranked by the number of significant interactions (node degree in the network). For each ceRNA module consisting of the circRNA and its target genes, we computed spongEffects enrichment scores and used them as input for a random forest model. The bottom-right corner shows the subset accuracy of this model in distinguishing different brain regions on the training and test set. The results of a model trained on random modules of the same size show random performance

3.3 Comparing the number of circRNA binding sites with respect to their length reveals two distinct clusters

miRNA sponging has long been considered a potential function of circRNAs (Hansen et al., 2013). To fulfill this function, it would be beneficial for circRNAs to carry a large number of miRNA binding sites, and indeed, some known circRNAs, such as CDR1as harboring over 70 miRNA binding sites for miR-7 alone (Jiang et al., 2020), fit the hypothesis well. To investigate if this is a general property of circRNAs, we systematically compared the length of a transcript to the number of binding sites, expecting to observe a larger ratio for circRNAs compared to the 3ʹ untranslated regions of linear transcripts, where miRNA binding sites are predominantly located (Fig. 5). While linear transcripts show a very diverse picture, circRNA length correlates well with the number of binding sites.

Number of miRNA binding sites versus transcript length for linear and circular RNA. For the 3ʹ-UTRs of mRNAs, the number of binding sites was inferred from miRWalk 3.0. circRNA–miRNA binding sites were counted if they were reported by two of the three prediction methods employed, i.e. miRanda, TarPmiR and PITA. circRNAs form two clusters that can be explained by the different target site prediction methods used. Linear regression models were fit to each of the groups to show the trend of the association
Figure 5.

Number of miRNA binding sites versus transcript length for linear and circular RNA. For the 3ʹ-UTRs of mRNAs, the number of binding sites was inferred from miRWalk 3.0. circRNA–miRNA binding sites were counted if they were reported by two of the three prediction methods employed, i.e. miRanda, TarPmiR and PITA. circRNAs form two clusters that can be explained by the different target site prediction methods used. Linear regression models were fit to each of the groups to show the trend of the association

Compared to the prediction in 3ʹ-UTRs (also based on TarPmiR), circRNAs show a comparably high ratio between the number of binding sites and the length. We observed two distinct circRNA clusters despite employing the same three miRNA binding site prediction methods (miRanda, TarPmiR, PITA) for all of them. We only accept a miRNA binding site for a circRNA if it was predicted by at least two prediction methods. It appears that all three miRNA binding site prediction tools were able to identify miRNA binding sites in the circRNAs of the blue cluster, while only miRanda and PITA, but not TarPmiR, could predict miRNA binding sites of the circRNAs in the red cluster. This observation was not expected as TarPmiR, in general, predicts overall more binding sites than miRanda or PITA. An interesting question is hence if TarPmiR is able to differentiate between circRNAs that are active miRNA sponges and those that have other functions. Previous research has defined three types of circular RNAs based on structural features—exonic circular (ecirc) RNAs, circular intronic RNA (ciRNA) and exon-intron circRNA (EIciRNA) (Xiao et al., 2022; Yang et al., 2018; Zhang et al., 2022). It has been suggested that ecircRNAs function predominantly through a miRNA sponging effect in the cytoplasm, whereas other circular RNA forms (e.g. ciRNA and ElciRNA) function in the nucleus to regulate gene transcription (Li et al., 2015, 2017; Zhang et al., 2013). Hence, circRNAs that are functional in the nucleus could have fewer miRNA binding sites. To test alternative explanations, we checked if clusters differed by (i) biotype of the circRNA host gene (i.e. coding or noncoding gene, Supplementary Fig. S12a), (ii) genesis, i.e. the splicing method of the circRNA [ElciRNA, ciRNA, ecircRNA, Supplementary Fig. S12b (Trincado et al., 2018)] or (iii) circRNA expression level (Supplementary Fig. S12c). The observed clusters did not differ in any of these categories, and further work is needed to elucidate if these results are related to other structural features. It should also be noted that TarPmiR was not trained specifically on circRNAs and that a prediction method tailored toward circRNAs should be developed when suitable experimental data become available. In addition, we investigated the relationship between the number of miRNA binding sites and the SPONGE (List et al., 2019) correlation scores associated with each circRNA and found that these scores seem to have no apparent correlation to the number of miRNA binding sites, although a very large number of binding sites seems to be advantageous for generating more elevated scores (i.e. over 0.5) in comparison to circRNAs with lower miRNA binding potential, that are only rarely able to reach comparably high correlation values (Supplementary Fig. S13a and b).

4 Conclusion and outlook

We developed a new circRNA processing and analysis pipeline consisting of three modules harboring multiple current state-of-the-art methods: (i) circRNA detection, (ii) miRNA detection and (iii) detection of sponging events between circRNAs, ceRNAs and miRNAs. To the best of our knowledge, it is the first comprehensive circRNA pipeline to detect, quantify and annotate circRNAs as well as to determine their sponging activity. The latter allows users to bring circRNAs into a functional context with other RNAs, such as mRNAs and lncRNAs, through a joint ceRNA network which is mediated by miRNA sponging. Wen et al. (2021) recently highlighted the need for a further extensive investigation into circRNAs due to their enormous potential to explain human diseases like cardiovascular, autoimmune and cancers. circRNAs are also known to be involved in brain development, brain cell differentiation and neuronal signaling (Hanan et al., 2017; Piwecka et al., 2017). To demonstrate the capabilities of the circRNA-sponging pipeline, we hence re-analyzed a public dataset where we investigated circRNAs across different mouse brain regions. Using our pipeline, we could offer novel insights into circRNA biology across tissues of the brain. We showed that differences in circRNA splicing could be revealed when considering the expression of circRNAs relative to the expression of a host gene, similar to how alternative splicing events are detected by considering exon or intron inclusion. Our pipeline is the first to routinely incorporate differential splicing analysis between linear and circular transcripts of the same genes, allowing to better differentiate between changes in expression and changes in splicing. We further placed our findings into the context of miRNA sponging, demonstrating that circRNA exerts regulatory control over a vast number of transcripts. Finally, we showed that the number of binding sites in circRNAs correlated well with their length and observed that TarPmiR’s machine-learning strategy identifies a subset of circRNAs that could indicate promising candidates for miRNA sponging. Further work is needed to investigate if these two classes represent structurally different circRNAs, such as ecircRNAs, ciRNAs or ElciRNAs, or if this observation can be explained by differences in the miRNA prediction methods with no biological implication at all.

In the future, we plan to extend the circRNA-sponging pipeline with additional features. As various functions other than miRNA sponging have been attributed to circRNAs (Nielsen et al., 2022), we see room for expanding the features toward, e.g. investigating the protein-coding potential of circRNAs (Miao et al., 2021). We further seek to integrate circRNA-sponging into ongoing community efforts such as nf-core (Ewels et al., 2020) to build up long-term support for maintaining and expanding this pipeline. In summary, the circRNA-sponging pipeline is a powerful tool to detect, investigate and analyze circRNAs and their sponging effects and thus, it helps researchers consider circRNAs as a routine aspect in RNA-seq and miRNA-seq data analysis.

Acknowledgements

The authors thank Bahar Tercan, Mauro A. A. Castro, A. Gordon Robertson, Katherine A. Hoadley and Victoria K. Cortessis for the fruitful discussions. They thank Christina Trummer for the logo design. Figures were created with https://www.biorender.com. Parts of the images were designed using resources from Flaticon.com.

Author Contributions

Markus Hoffmann (Conceptualization [lead], Investigation [lead], Methodology [lead], Project administration [lead], Software [lead], Supervision [lead], Validation [lead], Visualization [lead], Writing—original draft [lead], Writing—review & editing [lead]), Leon Schwartz (Data curation [lead], Formal analysis [lead], Methodology [lead], Software [lead], Validation [lead], Writing—original draft [supporting], Writing—review & editing [supporting]), Octavia Ciora (Methodology [equal], Software [lead], Visualization [lead], Writing—original draft [supporting], Writing—review & editing [supporting]), Nico Trummer (Data curation [equal], Formal analysis [supporting], Software [supporting]), Lina-Liv Willruth (Data curation [lead], Resources [equal]), Jakub Jankowski (Methodology [equal], Writing—original draft [equal], Writing—review & editing [equal]), Hye-Kyung Lee (Methodology [supporting], Writing—original draft [equal], Writing—review & editing [equal]), Jan Baumbach (Funding acquisition [lead], Writing—original draft [equal], Writing—review & editing [equal]), Priscilla Furth (Validation [equal], Writing—original draft [lead], Writing—review & editing [lead]), Lothar Hennighausen (Funding acquisition [lead], Writing—original draft [lead], Writing—review & editing [lead]) and Markus List (Conceptualization [lead], Funding acquisition [lead], Methodology [lead], Project administration [lead], Writing—original draft [lead], Writing—review & editing [lead]).

Funding

This work was supported by the Technical University Munich—Institute for Advanced Study, funded by the German Excellence Initiative. This work was supported in part by the Intramural Research Programs (IRPs) of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). J.B. was partially funded by his VILLUM Young Investigator [grant number 13154]. This project was partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) [422216132]. This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the *e: Med* research and funding concept [*grant numbers 01ZX1908A/01ZX2208A* and *grant numbers 01ZX1910D/01ZX2210D*]. This project has received funding from the European Union’s Horizon 2020 research and innovation program [grant agreement number 777111]. This publication reflects only the author’s view, and the European Commission is not responsible for any use that may be made of the information it contains.

Conflict of interest

None declared.

Data availability

Data are available in GEO under GSE100265 (rRNA-depleted RNA-seq) and GSE93129 (miRNA). circRNA-sponging is available under the GPL v.3.0 license at: https://github.com/biomedbigdata/circRNA-sponging. Dockerhub image is available under: https://hub.docker.com/r/bigdatainbiomedicine/circrna-sponging. The results presented in this manuscript are available as RData objects at: https://doi.org/10.6084/m9.figshare.21864948.

References

Boniolo
F.
et al. (
2023
)
spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape
.
Bioinformatics
,
39
,
btad276
.

Bray
N.L.
et al. (
2016
)
Erratum: near-optimal probabilistic RNA-seq quantification
.
Nat. Biotechnol
.,
34
,
888
.

Chen
L.
et al. (
2021
)
The bioinformatics toolbox for circRNA discovery and analysis
.
Brief. Bioinform
.,
22
,
1706
1728
.

Ding
J.
et al. (
2016
)
TarPmiR: a new approach for microRNA target site prediction
.
Bioinformatics
,
32
,
2768
2775
.

Dobin
C.A.
et al. (
2013
)
STAR: ultrafast universal RNA-seq aligner
.
Bioinformatics
,
29
,
15
21
.

Dong
R.
et al. (
2018
)
CIRCpedia v2: an updated database for comprehensive circular RNA annotation and expression comparison
.
Genomics Proteomics Bioinf
.,
16
,
226
233
.

Enright
A.J.
et al. (
2003
)
MicroRNA targets in Drosophila
.
Genome Biol
.,
5
,
R1
.

Ewels
P.A.
et al. (
2020
)
The nf-core framework for community-curated bioinformatics pipelines
.
Nat. Biotechnol
.,
38
,
276
278
.

Frazee
A.C.
et al. (
2015
)
Polyester: simulating RNA-seq datasets with differential transcript expression
.
Bioinformatics
,
31
,
2778
2784
.

Friedländer
M.R.
et al. (
2012
)
miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades
.
Nucleic Acids Res
.,
40
,
37
52
.

Galea
J.M.
et al. (
2011
)
Dissociating the roles of the cerebellum and motor cortex during adaptive learning: the motor cortex retains what the cerebellum learns
.
Cereb. Cortex
,
21
,
1761
1770
.

Glažar
P.
et al. (
2014
)
circBase: a database for circular RNAs
.
RNA
,
20
,
1666
1670
.

Hanan
M.
et al. (
2017
)
CircRNAs in the brain
.
RNA Biol
.,
14
,
1028
1034
.

Hansen
T.B.
et al. (
2013
)
Natural RNA circles function as efficient microRNA sponges
.
Nature
,
495
,
384
388
.

Hänzelmann
S.
et al. (
2013
)
GSVA: gene set variation analysis for microarray and RNA-seq data
.
BMC Bioinformatics
,
14
,
7
.

Hoffmann
M.
et al. (
2021
)
SPONGEdb: a pan-cancer resource for competing endogenous RNA interactions
.
NAR Cancer
,
3
,
zcaa042
.

Hsu
S.-D.
et al. (
2011
)
miRTarBase: a database curates experimentally validated microRNA–target interactions
.
Nucleic Acids Res
.,
39
,
D163
D169
.

Huang
H.-Y.
et al. (
2020
)
miRTarBase 2020: updates to the experimentally validated microRNA–target interaction database
.
Nucleic Acids Res
.,
48
,
D148
D154
.

Jeck
W.R.
et al. (
2013
)
Circular RNAs are abundant, conserved, and associated with ALU repeats
.
RNA
,
19
,
141
157
.

Jeck
W.R.
,
Sharpless
N.E.
(
2014
)
Detecting and characterizing circular RNAs
.
Nat. Biotechnol
.,
32
,
453
461
.

Jerby-Arnon
L.
et al. (
2018
)
A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade
.
Cell
,
175
,
984
997.e24
.

Jiang
C.
et al. (
2020
)
The emerging picture of the roles of CircRNA-CDR1as in cancer
.
Front. Cell Dev. Biol
.,
8
,
590478
.

Karagkouni
D.
et al. (
2020
)
DIANA-LncBase v3: indexing experimentally supported miRNA targets on non-coding transcripts
.
Nucleic Acids Res
.,
48
,
D101
D110
.

Kartha
R.V.
,
Subramanian
S.
(
2014
)
Competing endogenous RNAs (ceRNAs): new entrants to the intricacies of gene regulation
.
Front. Genet
.,
5
,
8
.

Kertesz
M.
et al. (
2007
)
The role of site accessibility in microRNA target recognition
.
Nat. Genet
.,
39
,
1278
1284
.

Kristensen
L.S.
et al. (
2018
)
Circular RNAs in cancer: opportunities and challenges in the field
.
Oncogene
,
37
,
555
565
.

Kuhn
M.
(
2008
)
Building predictive models in R using the caret package
.
J. Stat. Soft
.,
28
,
1
26
.

Lasda
E.
,
Parker
R.
(
2014
)
Circular RNAs: diversity of form and function
.
RNA
,
20
,
1829
1842
.

Li
Z.
et al. (
2015
)
Exon-intron circular RNAs regulate transcription in the nucleus
.
Nat. Struct. Mol. Biol
.,
22
,
256
264
.

Li
Z.
et al. (
2017
)
Corrigendum: exon–intron circular RNAs regulate transcription in the nucleus
.
Nat. Struct. Mol. Biol
.,
24
,
194
.

Li
M.
et al. (
2018
)
A circular transcript of ncx1 gene mediates ischemic myocardial injury by targeting miR-133a-3p
.
Theranostics
,
8
,
5855
5869
.

List
M.
et al. (
2019
)
Large-scale inference of competing endogenous RNA networks with sparse partial correlation
.
Bioinformatics
,
35
,
i596
i604
.

Love
M.I.
et al. (
2014
)
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
.
Genome Biol
.,
15
,
550
.

Memczak
S.
et al. (
2013
)
Circular RNAs are a large class of animal RNAs with regulatory potency
.
Nature
,
495
,
333
338
.

Meng
X.
et al. (
2017
)
CircPro: an integrated tool for the identification of circRNAs with protein-coding potential
.
Bioinformatics
,
33
,
3314
3316
.

Mester-Tonczar
J.
et al. (
2020
)
Circular RNAs in cardiac regeneration: cardiac cell proliferation, differentiation, survival, and reprogramming
.
Front. Physiol
.,
11
,
580465
.

Miao
Q.
et al. (
2021
)
Coding potential of circRNAs: new discoveries and challenges
.
PeerJ
,
9
,
e10718
.

Min
H.
,
Yoon
S.
(
2010
)
Got target? Computational methods for microRNA target prediction and their extension
.
Exp. Mol. Med
.,
42
,
233
244
.

Nielsen
A.F.
et al. (
2022
)
Best practice standards for circular RNA research
.
Nat. Methods
,
19
,
1208
1220
.

Piwecka
M.
et al. (
2017
)
Loss of a mammalian circular RNA locus causes miRNA deregulation and affects brain function
.
Science
,
357
,
eaam8526
.

Qu
S.
et al. (
2015
)
Circular RNA: a new star of noncoding RNAs
.
Cancer Lett
.,
365
,
141
148
.

Rybak-Wolf
A.
et al. (
2015
)
Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed
.
Mol. Cell
,
58
,
870
885
.

Salmena
L.
et al. (
2011
)
A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language?
Cell
,
146
,
353
358
.

Sticht
C.
et al. (
2018
)
miRWalk: an online resource for prediction of microRNA binding sites
.
PLoS One
.,
13
,
e0206239
.

Suzuki
H.
et al. (
2006
)
Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing
.
Nucleic Acids Res
.,
34
,
e63
.

Szabo
L.
,
Salzman
J.
(
2016
)
Detecting circular RNAs: bioinformatic and experimental challenges
.
Nat. Rev. Genet
.,
17
,
679
692
.

Szabo
L.
et al. (
2015
)
Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development
.
Genome Biol
.,
16
,
126
.

Trincado
J.L.
et al. (
2018
)
SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions
.
Genome Biol
.,
19
,
40
.

Weinstein
J.N.
et al. ;
Cancer Genome Atlas Research Network
. (
2013
)
The Cancer Genome Atlas Pan-Cancer analysis project
.
Nat. Genet
.,
45
,
1113
1120
.

Wen
G.
et al. (
2021
)
The potential of using blood circular RNA as liquid biopsy biomarker for human diseases
.
Protein Cell
.,
12
,
911
946
.

Westholm
J.O.
et al. (
2014
)
Genome-wide analysis of drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation
.
Cell Rep
.,
9
,
1966
1980
.

Xiao
J.
et al. (
2022
)
Circular RNAs acting as miRNAs’ sponges and their roles in stem cells
.
J. Clin. Med. Res
,
11
,
2909
.

Yang
L.
et al. (
2018
)
Circular RNAs and their emerging roles in immune regulation
.
Front. Immunol
.,
9
,
2977
.

Yu
C.-Y.
,
Kuo
H.-C.
(
2019
)
The emerging roles and functions of circular RNAs and their generation
.
J. Biomed. Sci
.,
26
,
29
.

Yu
K.H.-O.
et al. (
2021
)
Quantifying full-length circular RNAs in cancer
.
Genome Res
.,
31
,
2340
2353
.

Zhang
X.-O.
et al. (
2014
)
Complementary sequence-mediated exon circularization
.
Cell
,
159
,
134
147
.

Zhang
X.-O.
et al. (
2016
)
Diverse alternative back-splicing and alternative splicing landscape of circular RNAs
.
Genome Res
.,
26
,
1277
1287
.

Zhang
Y.
et al. (
2013
)
Circular intronic long noncoding RNAs
.
Mol. Cell
.,
51
,
792
806
.

Zhang
Y.
et al. (
2022
)
Emerging functions of circular RNA in the regulation of adipocyte metabolism and obesity
.
Cell Death Discov
.,
8
,
268
.

Zhang
Z.
et al. (
2018
)
Circular RNAs: promising biomarkers for human diseases
.
EBioMedicine
,
34
,
267
274
.

Author notes

The authors wish it to be known that, in their opinion, Markus Hoffmann, Leon Schwartz and Octavia-Andreea Ciora should be regarded as Joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Thomas Lengauer
Thomas Lengauer
Associate Editor
Search for other works by this author on:

Supplementary data