GreenCircRNA: a database for plant circRNAs that act as miRNA decoys

Abstract
Circular RNAs (circRNAs) are endogenous non-coding RNAs that form covalently closed continuous loops. They are widely distributed and play important roles in a range of developmental processes. In plants, a growing number of studies have found that circRNAs can regulate plant metabolism and are involved in responses to biotic and abiotic stress. Acting as miRNA decoys is an important way for circRNAs to perform their functions. We therefore developed GreenCircRNA, a database of plant circRNAs that act as miRNA decoys, to provide a plant-focused platform for detailed exploration of plant circRNAs and their potential decoy functions. The database includes over 210 000 circRNAs from 69 plant species; its main data sources are NCBI, EMBL-EBI and Phytozome. To investigate the function of circRNAs as competing endogenous RNAs, we predicted whether circRNAs from 38 plant species can act as miRNA decoys. We also provide basic information for each circRNA, including its location, host gene and relative expression level, as well as its full-length sequence, host gene GO (Gene Ontology) IDs and a visualization. GreenCircRNA is the first database for predicting circRNAs that act as miRNA decoys, and it covers the largest number of plant species among plant circRNA databases. Database URL: http://greencirc.cn


Introduction
Circular RNAs (circRNAs) have been a major focus of non-coding RNA research in recent years. First discovered in the 1970s (1,2), circRNAs are characterized by covalently closed-loop structures with neither a 5′ cap nor a 3′ polyadenylated tail and are generated by back splicing (3).
Because of their closed circular structure, circRNAs are resistant to degradation by RNA exonucleases such as RNase R, which suggests that they may have important functions in vivo (4,5). With the development of high-throughput RNA sequencing, thousands of circRNAs have been identified in humans, other mammals and fungi (6)(7)(8). Many circRNAs have also been identified in plants, including Arabidopsis thaliana (9), maize (10), tomato (11), wheat (12) and rice (13). Recent studies have found that plant circRNAs play regulatory roles in stress responses (14). For instance, Vv-circATS1 responds to cold stress in grape (15), and overexpression of circR5g05160 can improve the resistance of rice to Magnaporthe oryzae (16). MicroRNAs (miRNAs) are a class of non-coding RNAs approximately 20-24 nucleotides in length (17) that interact with mRNAs to regulate gene expression (18). Studies have shown that circRNAs can sequester and inactivate miRNAs by acting as miRNA decoys (also known as miRNA sponges) (19)(20)(21). To do so, circRNAs carry short sequence stretches that are homologous to the miRNA binding sites in endogenous mRNA targets. For example, the circRNA ciRS-7 has 73 conserved miR-7 binding sites and strongly suppresses miR-7 activity by acting as a miR-7 decoy, thereby affecting the expression levels of miR-7 targets (22)(23)(24). Many studies now also focus on the function of circRNAs as miRNA decoys in plants. For instance, 102 circRNAs were found to act as decoys for 24 corresponding miRNAs in tomato (11), 6 dehydration-responsive circRNAs were found to act as decoys for 26 corresponding miRNAs in wheat (12) and 346 circRNAs were found to act as decoys in Zea mays (25). Thus, predicting whether plant circRNAs act as miRNA decoys is an effective approach for inferring their functions.
With the rapid progress of animal circRNA research, several animal circRNA databases have been established, such as CircNet, Circ2Traits and CircRNADb (26)(27)(28). In contrast, despite advances in plant circRNA research, only two plant circRNA databases have been built, namely PlantCircNet and PlantcircBase (29,30), and no database exists for identifying circRNAs that act as miRNA decoys in plants. In this study, we established GreenCircRNA, a database of plant circRNAs that act as miRNA decoys. The database contains 69 plant species in total, 38 of which have relevant miRNA information. Based on the available information, we predicted circRNAs acting as miRNA decoys and mRNAs acting as miRNA targets in these 38 plants. These predictions will facilitate further analysis of the function of circRNAs as competing endogenous RNAs. GreenCircRNA is a comprehensive plant circRNA database containing 213 494 circRNAs from 69 plant species, identified from 4116 transcriptome sequencing data sets, and provides related information, full-length sequences and regulatory networks for these circRNAs. We believe that GreenCircRNA is a valuable resource and an important platform for further research on plant circRNAs. The data sets used by our database can be downloaded freely, and we will continue to add data to GreenCircRNA (http://greencirc.cn).

Aims of the database
CircRNAs play essential roles in regulating plant development and metabolism. The formation mechanisms, functions and conservation of plant circRNAs are a focus of recent circRNA research (6,8,31). To gain a deeper understanding of the mechanisms and functions of circRNAs as miRNA decoys in plants, we constructed a comprehensive database of plant circRNAs that act as miRNA decoys. The database was created with three main goals: (i) to use publicly available high-throughput transcriptome sequencing data to identify circRNAs in various plants and archive related information for these circRNAs, (ii) to analyze the potential functions of circRNAs by identifying circRNAs that act as miRNA decoys and (iii) to provide a user-friendly website with web-based tools for investigating plant circRNAs.
CircRNAs as miRNA decoys
Functional research is the main challenge in the study of plant circRNAs. To infer the potential functions of circRNAs, we established a circRNA-miRNA-mRNA network in which circRNAs act as miRNA decoys and mRNAs act as miRNA targets. First, we downloaded miRNA sequences from miRBase and PMRD and mRNA sequences from Phytozome12, and extracted circRNA sequences using an in-house Perl script. Next, to establish the relationships between miRNAs and other RNAs, we used RNAplex with default parameters to predict RNA-RNA hybridization sites (39,40). We then identified miRNA decoys (circRNAs) and targets (mRNAs) following the method described in our previous reports (25,41-43). A circRNA was defined as a miRNA decoy if the pairing met the following criteria: perfect matching at positions 2-8 from the miRNA 5′ end; no more than six mismatched or inserted bases between positions 9 and 20 from the miRNA 5′ end; and no more than four mismatches or indels in the other regions. An mRNA was defined as a miRNA target if the pairing met the following criteria: at most one mismatch or indel between positions 9 and 12 from the miRNA 5′ end; no more than 4 nt of bulges or mismatches in total in the other regions; and no consecutive mismatches. Finally, the circRNA-miRNA-mRNA network for each species was visualized using Cytoscape (v3.7.2) (44). Based on this network, the influence of circRNAs on mRNAs via miRNAs can be assessed and the potential functions of the circRNAs can be inferred.
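The decoy and target criteria above can be expressed as simple position-wise rules. The sketch below is not the authors' in-house pipeline; it assumes a hypothetical per-position encoding of an already-computed alignment (for example, one converted from RNAplex output), in which each miRNA position counted from the 5′ end is marked as a match (`|`), a mismatch (`x`) or an indel (`-`). The names `is_decoy` and `is_target` are illustrative, and reading "no consecutive mismatches" as "no two adjacent `x` positions" is our interpretation.

```python
"""Position-wise miRNA decoy/target rules (illustrative sketch).

Assumed alignment encoding, one symbol per miRNA position from the 5' end:
  '|' match, 'x' mismatch, '-' indel (bulge attributed to that position).
"""

MATCH, MISMATCH, INDEL = "|", "x", "-"


def _errors(aln: str, start: int, end: int) -> int:
    """Count non-matching symbols at 1-based positions start..end (inclusive)."""
    return sum(1 for c in aln[start - 1:end] if c != MATCH)


def _check(aln: str) -> None:
    """Reject alignments that are too short or use unknown symbols."""
    if len(aln) < 12 or set(aln) - {MATCH, MISMATCH, INDEL}:
        raise ValueError("expected >= 12 symbols drawn from '|', 'x', '-'")


def is_decoy(aln: str) -> bool:
    """Decoy: perfect 2-8, <= 6 errors at 9-20, <= 4 errors elsewhere."""
    _check(aln)
    if _errors(aln, 2, 8) > 0:
        return False
    if _errors(aln, 9, 20) > 6:
        return False
    elsewhere = _errors(aln, 1, 1) + _errors(aln, 21, len(aln))
    return elsewhere <= 4


def is_target(aln: str) -> bool:
    """Target: <= 1 error at 9-12, <= 4 errors elsewhere, no adjacent mismatches."""
    _check(aln)
    if _errors(aln, 9, 12) > 1:
        return False
    elsewhere = _errors(aln, 1, 8) + _errors(aln, 13, len(aln))
    if elsewhere > 4:
        return False
    # Interpretation: "no consecutive mismatches" = no two neighbouring 'x'.
    return MISMATCH * 2 not in aln


if __name__ == "__main__":
    central_bulge = "|" * 8 + "----" + "|" * 9   # 21-nt miRNA, indels at 9-12
    print(is_decoy(central_bulge), is_target(central_bulge))  # True False
```

In this encoding, a four-base indel run at positions 9-12 passes the decoy rule but fails the target rule, so the two predictions separate cleanly.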

Full-length sequences of circRNAs
The full-length sequences of circRNAs are important for subsequent analysis of their internal structural features and functions, and can help in evaluating the translation potential of circRNAs. However, most circRNA databases do not provide full-length circRNA sequences. For our database, we used CIRI-full, a downstream program of CIRI, to assemble the full-length sequences of circRNAs (45). After removing redundant results with an in-house Perl script, we obtained the final set of full-length circRNA sequences.
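Because a circular sequence has no natural start position, the same circRNA can be assembled starting at different points. One plausible way to remove such redundancy, assumed here rather than taken from the authors' Perl script, is to reduce each sequence to its lexicographically smallest rotation and keep one record per canonical form. `canonical_rotation` and `deduplicate` are hypothetical names.

```python
from typing import Dict


def canonical_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of a circular sequence.

    Quadratic-time for clarity; Booth's algorithm does this in linear time.
    """
    s = seq.upper()
    n = len(s)
    doubled = s + s
    return min((doubled[i:i + n] for i in range(n)), default=s)


def deduplicate(seqs: Dict[str, str]) -> Dict[str, str]:
    """Keep the first ID seen for each distinct circular sequence."""
    seen = set()
    kept: Dict[str, str] = {}
    for circ_id, seq in seqs.items():
        key = canonical_rotation(seq)
        if key not in seen:
            seen.add(key)
            kept[circ_id] = seq
    return kept


if __name__ == "__main__":
    assembled = {"circ_a": "ACGTTG", "circ_b": "TTGACG", "circ_c": "AAAAC"}
    print(sorted(deduplicate(assembled)))  # ['circ_a', 'circ_c']
```

Strand is ignored in this sketch; if assemblies from unstranded libraries were pooled, the canonical rotation of each sequence's reverse complement would also need to be compared.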
CircRNA visualization
CircRNAs form covalently closed-loop structures with neither 5′-3′ polarity nor polyadenylated tails. Because genomic coordinates alone do not give users an intuitive picture of a circRNA's structure, the database provides a schematic of each circRNA. The schematic consists of two parts: a line representing the host gene, with its exons and introns labeled, and a circle representing the circular structure of the circRNA, colored to match the corresponding positions on the gene. Thus, the position and structure of each circRNA can be seen at a glance.
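The two-part schematic can be driven by a small amount of interval arithmetic: split the host gene into exon and intron features, intersect the circRNA's genomic span with each feature, and give each overlapping piece an arc on the circle proportional to its length, so that the line and the circle can share colours. This is a sketch under stated assumptions, not the site's rendering code: coordinates are taken as 1-based and inclusive, introns inside the span are drawn on the circle as well (the genomic span, not the spliced product), and the gene model in the example is made up.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # 1-based, inclusive
Feature = Tuple[str, int, int]      # (label, start, end)


def gene_features(exons: List[Interval]) -> List[Feature]:
    """Label exons and the introns between consecutive exons."""
    exons = sorted(exons)
    feats: List[Feature] = []
    for i, (s, e) in enumerate(exons, start=1):
        feats.append((f"exon{i}", s, e))
        # exons[i] is the next exon because i is 1-based here.
        if i < len(exons) and exons[i][0] > e + 1:
            feats.append((f"intron{i}", e + 1, exons[i][0] - 1))
    return feats


def circ_segments(circ: Interval, exons: List[Interval]) -> List[Feature]:
    """Pieces of the circRNA span that fall in each gene feature."""
    cs, ce = circ
    segs: List[Feature] = []
    for label, s, e in gene_features(exons):
        lo, hi = max(s, cs), min(e, ce)
        if lo <= hi:
            segs.append((label, lo, hi))
    return segs


def arc_angles(segs: List[Feature]) -> List[Tuple[str, float]]:
    """Arc (degrees) on the circle for each segment, proportional to length."""
    total = sum(hi - lo + 1 for _, lo, hi in segs)
    return [(label, 360.0 * (hi - lo + 1) / total) for label, lo, hi in segs]


if __name__ == "__main__":
    exons = [(100, 200), (301, 400), (501, 700)]     # hypothetical gene model
    segs = circ_segments((550, 650), exons)          # inside the third exon
    print(segs, arc_angles(segs))  # [('exon3', 550, 650)] [('exon3', 360.0)]
```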

Usage and access
GreenCircRNA mainly includes the following modules: Home, Species, Search, Download and FAQ (Figure 2). Users can browse, search and download circRNA information through the web interface of GreenCircRNA.
Species
This module includes basic information on the circRNAs of 69 plant species. Each plant has its own page with detailed circRNA-associated information, including a list of all circRNAs in the plant, a histogram of the relative expression levels of all circRNAs, a length distribution histogram of the full-length circRNA sequences and a circRNA-miRNA-mRNA network illustrating the potential relationships among these RNAs. In addition, every circRNA has a separate page that displays its location in the genome, relative expression level, full-length sequence, host gene and annotation information. This page gathers all information for a given circRNA and also presents, in tabular and graphical form, a network showing the circRNA as a miRNA decoy.
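Each species network joins two edge lists on the shared miRNA: circRNA-miRNA decoy pairs and miRNA-mRNA target pairs. The sketch below shows one way to assemble the circRNA-miRNA-mRNA triples and export them in Cytoscape's simple interaction format (SIF, one `source<TAB>interaction<TAB>target` line per edge). The interaction labels `decoy` and `target` and the mRNA IDs in the example are hypothetical; the decoy pair reuses the maize circRNA and one of its predicted miRNAs from the case study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def build_network(decoys: Iterable[Pair],
                  targets: Iterable[Pair]) -> List[Tuple[str, str, str]]:
    """Join (circRNA, miRNA) decoy pairs with (miRNA, mRNA) target pairs."""
    by_mirna: Dict[str, List[str]] = defaultdict(list)
    for mirna, mrna in targets:
        by_mirna[mirna].append(mrna)
    return [(circ, mirna, mrna)
            for circ, mirna in decoys
            for mrna in by_mirna.get(mirna, [])]


def to_sif(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize triples as tab-separated SIF edges, skipping duplicate edges."""
    seen = set()
    lines = []
    for circ, mirna, mrna in triples:
        for edge in ((circ, "decoy", mirna), (mirna, "target", mrna)):
            if edge not in seen:
                seen.add(edge)
                lines.append("\t".join(edge))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    decoys = [("Zmays_3:225352281|225387878", "zma-miR395a-5p")]
    targets = [("zma-miR395a-5p", "mRNA_1"), ("zma-miR395a-5p", "mRNA_2"),
               ("zma-miR171h-3p", "mRNA_3")]
    print(to_sif(build_network(decoys, targets)), end="")
```

miRNAs without a predicted decoy (here `zma-miR171h-3p`) drop out of the join, so the exported network contains only complete circRNA-miRNA-mRNA paths.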
Search
This module enables users to search for circRNAs by host gene, miRNA ID, circRNA ID and SRA ID. A subset search is also available, allowing users to search circRNAs by a combination of criteria such as plant species name, chromosome and relative expression level.
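A subset search amounts to combining optional filters with AND. The sketch below uses a hypothetical `CircRecord` type and `subset_search` function; the field names, and the convention that an omitted criterion matches everything, are assumptions for illustration rather than the site's actual schema.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class CircRecord:
    circ_id: str
    species: str
    chromosome: str
    rel_expr: float
    host_gene: Optional[str] = None
    mirnas: List[str] = field(default_factory=list)  # miRNAs it may decoy


def subset_search(records: Iterable[CircRecord],
                  species: Optional[str] = None,
                  chromosome: Optional[str] = None,
                  min_expr: Optional[float] = None,
                  max_expr: Optional[float] = None,
                  mirna: Optional[str] = None) -> List[CircRecord]:
    """Return records satisfying every criterion that is not None."""
    hits = []
    for r in records:
        if species is not None and r.species != species:
            continue
        if chromosome is not None and r.chromosome != chromosome:
            continue
        if min_expr is not None and r.rel_expr < min_expr:
            continue
        if max_expr is not None and r.rel_expr > max_expr:
            continue
        if mirna is not None and mirna not in r.mirnas:
            continue
        hits.append(r)
    return hits
```

For example, `subset_search(records, species="Zea mays", chromosome="3", min_expr=1.0)` would return the maize circRNAs on chromosome 3 whose relative expression is at least 1.0.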
Download
Related information for circRNAs, including basic information, circ-genome-seq, full-length sequences and networks, can be downloaded free of charge in CSV or FASTA format from the Download module.
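The circ-genome-seq download is, in essence, the genomic interval between a circRNA's back-splice coordinates written out as FASTA. The sketch below parses IDs of the form used in this paper (for example `Zmays_3:225352281|225387878`), interprets them, as an assumption, as 1-based inclusive coordinates, and takes an in-memory genome dictionary and an explicit strand; `parse_circ_id`, `circ_genome_seq` and `write_fasta` are hypothetical helpers, not the database's code.

```python
import io
from typing import Dict, Iterable, TextIO, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_circ_id(circ_id: str) -> Tuple[str, int, int]:
    """Split 'seqname:start|end' into its parts (seqname may contain ':')."""
    seqname, span = circ_id.rsplit(":", 1)
    start, end = span.split("|")
    return seqname, int(start), int(end)


def circ_genome_seq(genome: Dict[str, str], circ_id: str, strand: str = "+") -> str:
    """Genomic sequence spanned by the circRNA (assumed 1-based, inclusive)."""
    seqname, start, end = parse_circ_id(circ_id)
    seq = genome[seqname][start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    return seq


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write (name, sequence) pairs as FASTA, wrapping sequence lines."""
    for name, seq in records:
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    genome = {"chr1": "AAACCCGGGTTT"}               # toy genome
    out = io.StringIO()
    write_fasta([("chr1:2|6", circ_genome_seq(genome, "chr1:2|6", "-"))], out, width=4)
    print(out.getvalue(), end="")
```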

Data summary
The GreenCircRNA database covers 69 plant species, the largest number among plant circRNA databases to date. We downloaded a total of 4116 transcriptome sequencing data sets from SRA and EMBL-EBI for circRNA identification and obtained 213 494 circRNAs. These circRNAs were classified into three categories: exon circRNAs, intron circRNAs and intergenic region circRNAs. Among all identified circRNAs, 95 010 (44.50%) belong to the exon category, 65 175 (30.53%) to the intergenic region category and 53 309 (24.97%) to the intron category. We then extracted the sequences of all circRNAs from the genome sequences (circ-genome-seq). Furthermore, we assembled 68 237 full-length circRNA sequences using CIRI-full and predicted which circRNAs may act as miRNA decoys in 38 plant species (Table 1).
Figure 5A (caption): circRNA-miRNA-mRNA table for circRNA Zmays_3:225352281|225387878. The first column is the circRNA acting as a miRNA decoy, the second column lists the miRNAs and the third column lists the mRNAs that act as miRNA targets.
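The exon/intron/intergenic classification described in the data summary can be sketched with interval overlap tests. The rule used here is a simplifying assumption, not necessarily the one applied by CIRI or the authors: a circRNA that overlaps no gene is intergenic, one that overlaps at least one exon of an overlapping gene is an exon circRNA, and one that lies within a gene but touches no exon is an intron circRNA. `classify` and the toy gene model are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive, same chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify(circ: Interval, genes: Dict[str, List[Interval]]) -> str:
    """genes maps a gene ID to its (non-empty) list of exon intervals."""
    inside_gene = False
    for exons in genes.values():
        span = (min(s for s, _ in exons), max(e for _, e in exons))
        if not overlaps(circ, span):
            continue
        inside_gene = True
        if any(overlaps(circ, ex) for ex in exons):
            return "exon"
    return "intron" if inside_gene else "intergenic"


if __name__ == "__main__":
    genes = {"gene_1": [(100, 200), (301, 400)]}    # hypothetical gene model
    circs = [(150, 180), (220, 280), (500, 600)]
    print(Counter(classify(c, genes) for c in circs))
```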

A case study
Taking maize as an example, we downloaded 181 transcriptome sequencing data sets from the SRA and EMBL-EBI databases for circRNA identification and obtained 12 035 circRNAs in total, including 3111 (25.85%) exon circRNAs, 7806 (64.86%) intron circRNAs and 1118 (9.29%) intergenic region circRNAs (Figure 3A). Between the two identification tools, CIRI predicted more exon circRNAs, whereas CIRCexplorer predicted more intron circRNAs. In addition, 2630 full-length circRNA sequences in maize were assembled by CIRI-full; the relative expression levels and length distribution of these full-length sequences are displayed as histograms (Figure 4A and B). Each circRNA also has an individual page showing detailed information, including its genomic location, relative expression level and visualization. For instance, the exon circRNA Zmays_10:10199025|10199243 is located in the middle of the third exon of the gene 'GRMZM2G046284.v6a' (Figure 3B). We analyzed circRNAs that may act as miRNA decoys in maize and displayed their relationships on each circRNA page as a circRNA-miRNA-mRNA network and as a table listing the RNAs involved. For example, circRNA Zmays_3:225352281|225387878 may act as a decoy for three miRNAs: zma-miR171k-3p/zma-miR171h-3p, zma-miR395a-5p and zma-miR395e-5p/zma-miR395h-5p/zma-miR395j-5p/zma-miR395p-5p (Figure 5).
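The category percentages reported for maize, and for the full data set in the data summary, are each count divided by the category total and rounded to two decimals. A short consistency check using only the counts given in the text (`category_percentages` is a hypothetical helper):

```python
from typing import Dict


def category_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of circRNAs in each category, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


maize = {"exon": 3111, "intron": 7806, "intergenic": 1118}
all_species = {"exon": 95010, "intergenic": 65175, "intron": 53309}

if __name__ == "__main__":
    print(sum(maize.values()), category_percentages(maize))
    # 12035 {'exon': 25.85, 'intron': 64.86, 'intergenic': 9.29}
    print(sum(all_species.values()), category_percentages(all_species))
    # 213494 {'exon': 44.5, 'intergenic': 30.53, 'intron': 24.97}
```

Both sets of counts sum to the reported totals (12 035 and 213 494), and the recomputed percentages match the values in the text.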

Discussion and future prospects
Increasing evidence shows that circRNAs play important roles in various biological processes, but few studies have examined circRNAs in plants. Furthermore, circRNA data for analyzing the sequence and structure of circRNAs are not available for most plant species, and the functions and mechanisms of most circRNAs remain unclear.
Although two plant circRNA databases have been built, they cover relatively few plant species. In this study, we developed GreenCircRNA, a comprehensive database that includes circRNAs from 69 plants, and analyzed the potential decoy function of these circRNAs in 38 plants. Users can freely search and download circRNA-related information. We hope that GreenCircRNA will help researchers study the basic properties and characteristics of plant circRNAs and will support further research on the internal structure, translation function and mechanisms of circRNAs. We will continue to identify and collect more circRNAs and to update GreenCircRNA so that it provides accurate information on plant circRNAs and on circRNAs acting as miRNA decoys.