PmiRExAt: plant miRNA expression atlas database and web applications

High-throughput small RNA (sRNA) sequencing technology enables an entirely new perspective for plant microRNA (miRNA) research and has immense potential to unravel regulatory networks. Novel insights gained through data mining in the rich, publicly available resource of sRNA data will help in designing biotechnology-based approaches for crop improvement to enhance plant yield and nutritional value. Bioinformatics resources enabling meta-analysis of miRNA expression across multiple plant species are still evolving. Here, we report PmiRExAt, a new online database resource that provides a plant miRNA expression atlas. The web-based repository comprises miRNA expression profiles and a query tool for 1859 wheat, 2330 rice and 283 maize miRNAs. The database interface offers open and easy access to miRNA expression profiles and helps in identifying tissue-preferential, differential and constitutively expressed miRNAs. A feature enabling expression study of conserved miRNAs across multiple species is also implemented. A custom expression analysis feature enables expression analysis of novel miRNAs in a total of 117 datasets. New sRNA datasets can also be uploaded for analysing miRNA expression profiles for 73 plant species. The PmiRExAt application program interface, a simple object access protocol web service, allows other programmers to remotely invoke the methods written for programmatic search operations on the PmiRExAt database. Database URL: http://pmirexat.nabi.res.in.


Introduction
Discovery of functional endogenous microRNAs (miRNAs), which negatively regulate gene expression at the posttranscriptional level in eukaryotes, has dramatically increased in the recent past. Software, tools and web servers enabling large-scale RNA-seq and small RNA (sRNA)-seq expression meta-analysis, along with comparative and integrative interpretation, have started to proliferate. RNASeqExpressionBrowser (1) and RNA-Seq Atlas (2) offer gene expression analysis and visualization; MIRPIPE (3) supports quantification of miRNAs in niche model organisms lacking genomic sequences; mirEX 2 (4) supports pri-miRNA expression analysis for Arabidopsis thaliana, Hordeum vulgare and Pellia endiviifolia; omiRas (5) is used for differential expression (DE) analysis between two given conditions from uploaded sRNA sequencing data; and PsRobot (6) takes sRNA sequence fasta or plain text files as input. omiRas and PsRobot require genome sequences for analysis and prediction of new miRNAs. miRNAs play an important role in plant development during different growth stages and under stress conditions. Understanding gene regulatory networks involving plant miRNAs is critical to designing biotechnology-based approaches for crop improvement (7). Here, we report PmiRExAt, a new online database resource that provides the most comprehensive comparative view yet of plant miRNA (miR) expression in multiple tissues and developmental stages of wheat (W), rice (R) and maize (M).

© The Author(s) 2016. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Data sources, data mining and analysis
Non-redundant miRNA collection
In this study, the miRNA sequences of three major cultivated food crops, namely wheat, rice and maize, were retrieved from miRBase (release 20) (8), the plant miRNA database (PMRD) (9) and recent publications. miRNA sequence redundancy was removed using a Perl script. In total, 1859 non-redundant (NR) miRNAs out of 2045 redundant miRNAs of wheat, 2330 NR out of 3509 redundant miRNAs of rice and 283 NR out of 630 redundant miRNAs of maize were analysed further (Figure 1, Table 1 and Supplementary Table S1).
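The redundancy-removal step can be illustrated with a minimal sketch. The original work used a Perl script that is not shown here; the Python version below is only an assumed equivalent that collapses records with identical mature sequences. The function names, the case-insensitive comparison and the treatment of T as U are assumptions made for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def collapse_redundant(records):
    """Group record IDs by identical sequence.

    Assumption: sequences are compared case-insensitively and
    T is treated as U so DNA and RNA spellings collapse together.
    """
    unique = {}
    for header, seq in records:
        key = seq.upper().replace("T", "U")
        unique.setdefault(key, []).append(header)
    return unique


# Hypothetical input: seq_a and seq_b are the same sequence in RNA and DNA spelling.
example = (
    ">seq_a\nUGACAGAAGAGAGUGAGCAC\n"
    ">seq_b\nTGACAGAAGAGAGTGAGCAC\n"
    ">seq_c\nUUGGACUGAAGGGAGCUCCC\n"
)
nr = collapse_redundant(parse_fasta(example))
```

Keeping the list of collapsed IDs per sequence preserves the link from each NR entry back to its source records.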

Development of miRNA expression matrix
SRA datasets were converted into BLAST databases through multiple file-processing steps using the SRA toolkit and the NCBI BLAST 2.2.28+ package (41). The collected NR mature miRNAs were used as queries against the respective wheat, rice and maize sRNA databases. The BLASTn program and in-house shell scripts were used for computing miRNA abundance, following stringent criteria of 100% identity, 0 mismatches and 100% miRNA sequence coverage by sRNA database reads.
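The stringent hit-counting criteria can be sketched as a filter over BLAST tabular output. The in-house shell scripts are not shown in the text, so the following is an assumed Python equivalent. It presumes the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, ...) and that every read is a separate subject sequence in the database. The function name and the extra zero-gap check are illustrative assumptions.

```python
from collections import Counter


def count_perfect_hits(blast_lines, query_lengths):
    """Count reads matching each miRNA over its full length without mismatches.

    blast_lines: iterable of tab-separated lines in BLAST tabular format.
    query_lengths: dict mapping miRNA ID to its length in nucleotides.
    """
    counts = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        aln_len = int(fields[3])
        mismatch = int(fields[4])
        gapopen = int(fields[5])
        # 100% coverage: the alignment spans the whole miRNA query.
        full_length = aln_len == query_lengths[qseqid]
        if pident == 100.0 and mismatch == 0 and gapopen == 0 and full_length:
            counts[qseqid] += 1
    return counts


# Hypothetical hits: only read1 and read4 satisfy all criteria.
hits = [
    "miR_x\tread1\t100.000\t21\t0\t0\t1\t21\t1\t21\t1e-05\t42.1",
    "miR_x\tread2\t100.000\t20\t0\t0\t1\t20\t1\t20\t1e-04\t40.1",
    "miR_x\tread3\t95.238\t21\t1\t0\t1\t21\t1\t21\t0.01\t34.2",
    "miR_y\tread4\t100.000\t22\t0\t0\t1\t22\t3\t24\t1e-06\t44.1",
]
abundance = count_perfect_hits(hits, {"miR_x": 21, "miR_y": 22})
```

A read longer than the miRNA still counts here, because the criterion concerns coverage of the miRNA sequence, not of the read.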

Data normalization and visualization
Normalization was done by converting hit counts into transcripts per million (TPM = number of miR counts in dataset × 1 000 000 / total reads in dataset). Heatmaps were developed after log2 transformation of TPM values. Normalized expression data (TPM) were sorted on an ordinal basis and distributed into 10 categories according to the respective species miRNA numbers for each library/dataset. The heatmaps were developed for each species showing categories 1-10 (Supplementary Figure S2 and Table S2).
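The normalization formula can be written as a short sketch. The TPM definition follows the text; the pseudocount used before the log2 transformation is an assumption, since the text does not state how zero TPM values were handled.

```python
import math


def tpm(count, total_reads):
    """Transcripts per million: count * 1,000,000 / total reads in the dataset."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


def log2_tpm(value, pseudocount=1.0):
    """log2 transform of a TPM value.

    Assumption: a pseudocount is added so that zero TPM stays finite.
    """
    return math.log2(value + pseudocount)


# Hypothetical dataset: 250 hits for one miRNA in a library of 5 million reads.
normalized = tpm(250, 5_000_000)
```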

Differential expression analysis
The EdgeR package (43) was used to calculate DE of miRNAs. Library-wise DE analysis was performed using normalized TPM values of each library.

Tissue preferential analysis
miRNAs showing tissue-preferential expression were screened by requiring a cumulative TPM 80-fold greater than the mean TPM from other tissues (44), along with Shannon entropy calculations using the ROKU package (45). We used the default parameters of ROKU, viz. upperlimit was kept at its default value of 0.25 (specifying the maximum proportion of tissues treated as outliers for each miRNA). ROKU picks tissue-specific patterns from expression data of different tissues; it ranks genes by their overall tissue specificity using Shannon entropy and uses an outlier detection method to detect the tissues specific to each gene. Shannon entropy was introduced by Claude Shannon for use in communications technology and is a measure of information content. Using the combined approach, we found 14 miRNAs preferentially expressed in leaves and 2 in spikes of wheat; in rice, 2 miRNAs in root, 10 in leaf, 5 in anther and 8 in endosperm; and in maize, 1 miRNA in shoot, 2 in leaf, 2 in anther, 10 in ear, 1 in pollen and 1 in silk (Figure 2, Table 2). The EdgeR package was also used for computing tissue-wise DE for a tissue of interest versus the mean TPM from other tissues. Logarithmic fold change (logFC), logarithmic counts per million (logCPM), P-value and false discovery rate (FDR) values for such cases are available in Supplementary Table S5.
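The two screening criteria can be illustrated with a simplified sketch. This is not the ROKU implementation: ROKU's preprocessing of the expression vector and its outlier detection step are not reproduced. Only a plain Shannon entropy and one reading of the 80-fold rule are shown, and the function names and the input layout (one cumulative TPM value per tissue) are assumptions.

```python
import math


def shannon_entropy(values):
    """Shannon entropy (in bits) of an expression vector scaled to sum to 1.

    Low entropy indicates expression concentrated in few tissues.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("expression vector must have a positive sum")
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def preferential_tissues(tissue_tpm, fold=80.0):
    """Return tissues whose TPM is at least `fold` times the mean TPM of all other tissues.

    tissue_tpm: dict mapping tissue name to cumulative TPM (assumed layout).
    """
    hits = []
    for tissue, value in tissue_tpm.items():
        others = [v for t, v in tissue_tpm.items() if t != tissue]
        if not others:
            continue
        mean_other = sum(others) / len(others)
        if value > 0 and value >= fold * mean_other:
            hits.append(tissue)
    return hits


# Hypothetical miRNA: 900 TPM in leaf against a mean of 7.5 TPM elsewhere.
profile = {"leaf": 900.0, "root": 5.0, "spike": 10.0}
selected = preferential_tissues(profile)
```

With four equally expressed tissues the entropy reaches its maximum of 2 bits, whereas a miRNA detected in a single tissue has an entropy of 0.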

PmiRExAt database architecture and web interface
PmiRExAt was created to make searching miRNA expression data easier and more user friendly. The web portal follows a responsive web design approach to provide an optimal viewing experience. PmiRExAt has been developed using open source Web 2.0 technologies to enhance the user experience at the web portal. It is developed using the Java EE 6 standard with the model-view-controller (MVC) software pattern, and a MySQL database is used at the backend. PmiRExAt uses Ajax to call the server asynchronously and to display results on the same page without a page refresh.
We have made use of Hibernate object-relational mapping, which consistently offers superior runtime performance over straight Java Database Connectivity (JDBC) code and is designed to work in an application server cluster and deliver a highly scalable architecture. PmiRExAt uses the Highcharts application program interface (API) (http://www.highcharts.com) to generate heatmaps (Figure 3), and it also uses the Morpheus API (http://www.broadinstitute.org/cancer/software/morpheus/) for clustering. PmiRExAt uses a cocktail of web technologies and other third-party libraries to provide a pleasant user experience. It uses the Bootstrap front-end framework to support various screen sizes and Ajax to update web pages asynchronously by exchanging small amounts of data with the server behind the scenes. This means that only parts of a web page are updated, which eliminates unnecessary page reloads.
PmiRExAt uses the jQuery Tag-it plugin to handle multitag fields as well as tag suggestions/auto-complete, which relieves the user of remembering all miRNA and dataset names; the auto-completer suggests all names present in the PmiRExAt database as soon as the user starts typing. For generating the heatmaps, it uses the Highcharts charting library, which generates interactive and dynamic maps. On hovering the cursor over the [...]. PmiRExAt also has an API that publishes the functionality provided by the web interface to other software components that want to reuse it. The API offers operations such as getting all sequences or species, or performing a search on the PmiRExAt database using multiple search criteria.
There are multiple tabs on the web interface, which provide the desired browsing, download and custom search options (Figure 4). PmiRExAt users can carry out data mining in this rich processed resource. Apart from the intuitive web server interface, PmiRExAt also provides a simple object access protocol (SOAP) web service, which allows other programmers to remotely invoke the methods written for search operations on the database. The quick start guide (Supplementary File S1) will help users in using the web interface and API.

Comparative analysis of web resources for miRNA expression analysis
sRNA-seq data can be analysed in many ways to address different research questions, and many tools have been developed to analyse such data for miRNA expression. Here, we compared the features of the available web resources against PmiRExAt to highlight the advantages offered by PmiRExAt. See the feature comparison in Table 3.

Usage and utility
Search miRNA expression by miRNA IDs or sequences
An NR database of 1859 wheat, 2330 rice and 283 maize miRNAs was developed from miRBase (release 20, 21), PMRD, the plant non-coding RNA database (PNRD) and a few miRNAs from publications (Table 1). On the basis of > 1000 TPM in individual datasets, 45 miRNAs in wheat, 55 in rice and 27 in maize were considered highly expressed miRs. Many miRNAs showed zero cumulative abundance (576/1859 wheat, 320/2330 rice and 23/283 maize).
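The two summary categories above can be derived from an expression matrix as in the following sketch. It assumes that "highly expressed" means more than 1000 TPM in at least one dataset, which is one possible reading of the criterion. The function name and the dictionary layout are illustrative.

```python
def classify_expression(matrix, high_threshold=1000.0):
    """Split miRNAs into highly expressed and undetected groups.

    matrix: dict mapping miRNA ID to a list of TPM values, one per dataset.
    Assumption: exceeding the threshold in any single dataset is sufficient.
    """
    highly_expressed, not_detected = [], []
    for mirna, values in matrix.items():
        if any(v > high_threshold for v in values):
            highly_expressed.append(mirna)
        if sum(values) == 0:
            not_detected.append(mirna)  # zero cumulative abundance
    return highly_expressed, not_detected


# Hypothetical three-dataset matrix.
demo = {
    "miR_a": [1500.0, 20.0, 0.0],
    "miR_b": [0.0, 0.0, 0.0],
    "miR_c": [300.0, 999.0, 1000.0],
}
high, absent = classify_expression(demo)
```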

Search miRNA expression in particular tissue of species
Expression of miRNAs in a particular tissue can be searched by choosing tissues from the analysed datasets of the selected species. The user can select one or more tissues at a time. After clicking the search button, the expression count matrix is displayed on the interface. The user can save the expression matrix with the 'Export table data' button and can also click the hyperlink of a miR, which leads to its source database, miRBase (release 21) or PMRD, for more information about the miRNA precursor, stem-loop structure, function and target. The user can also generate an expression heatmap by clicking the 'Generate heatmap' button and can generate a clustered heatmap. The heatmap can be downloaded in different formats, such as jpeg and pdf, from the 'chart context menu' at the upper right-hand corner of the heatmap.

Custom search for newly detected miRNA sequences in 117 datasets
A maximum of five novel sequences can be uploaded at a time for computing their expression matrix against the 117 WRM datasets. To do this, the user selects 'Start Search' from the drop-down menu of 'Custom search' and then inputs the miRNA sequences in fasta format. The user has to enter a functional email ID for receiving the result. The user can also customize the BLASTn parameters, viz. percent identity, mismatch and query coverage, or choose the default values. After the job is submitted, a random key unique to each job is generated on the interface; it can be used to track the running job or to download the results.
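Before submitting a custom search, the input can be checked locally. The sketch below is not part of PmiRExAt; it is a hypothetical pre-submission check that enforces the five-sequence limit stated above and assumes that only the unambiguous nucleotides A, C, G, U and T are acceptable.

```python
MAX_SEQUENCES = 5        # limit stated in the text
ALLOWED = set("ACGUT")   # assumption: no ambiguity codes are accepted


def validate_submission(fasta_text):
    """Return a list of (id, sequence) pairs, or raise ValueError on bad input."""
    records, header, chunks = [], None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))

    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences can be submitted at a time")
    for name, seq in records:
        if not seq or set(seq) - ALLOWED:
            raise ValueError(f"record '{name}' is empty or has non-nucleotide characters")
    return records


# Hypothetical two-sequence submission.
checked = validate_submission(">novel_1\nugacagaagagagugagcac\n>novel_2\nTTGGACTGAAGGGAGCTCCC\n")
```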

Custom search of miRNAs expression in new library of sRNA sequences
Newly generated sRNA libraries can be analysed for the PmiRExAt miRs and for the miRNA sequences of all other plant species in miRBase (release 21). The user needs to register to use this facility. After registering, the user logs in, uploads the SRA file in zip format and chooses the desired plant species to develop the expression matrix. As with the custom search, the running job can be tracked and the results downloaded by entering the key generated at the time of job submission.

SOAP API and client
There is a link to access the SOAP web service and the WSDL for the API. SOAP messages can be formulated and parsed in any language chosen by application developers. This functionality will help other programmers and software components connect to the PmiRExAt API.
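As an illustration of formulating and parsing a SOAP message, the following sketch builds a SOAP 1.1 envelope with the Python standard library. The operation name, parameter name and service namespace are hypothetical placeholders; the actual values must be taken from the PmiRExAt WSDL. No request is sent in this sketch.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/pmirexat"  # hypothetical; use the namespace from the WSDL


def build_soap_request(operation, params):
    """Serialize a SOAP 1.1 envelope that calls `operation` with simple parameters."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_body(response_xml):
    """Return the first child element of the SOAP Body."""
    root = ET.fromstring(response_xml)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body is missing or empty")
    return body[0]


# "searchBySpecies" and "species" are made-up names used only for this example.
request_xml = build_soap_request("searchBySpecies", {"species": "wheat"})
payload = parse_soap_body(request_xml)
```

In practice, a client generated from the WSDL would usually be more convenient than hand-built envelopes; the manual form is shown only to make the message structure explicit.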

Download files
All the processed data contained in the database that are used to generate the expression tables and heatmaps can be downloaded.

(vii) High-resolution heatmaps are generated on the web interface, which helps in visualization and interpretation. This web resource and service will help the plant science community in studying expression patterns of miRNAs. The website and web service are free and open to all users. Meta-analysis of the publicly available sRNA-seq datasets showed significant expression patterns of several miRs. Data mining in this resource has already led to the identification of tissue-preferentially expressed and conserved miRNAs. PmiRExAt will help in exploring public sRNA-seq expression data to find supporting evidence for users' findings and hypotheses. These expression profiles can be used as a proxy for relative expression levels of miRNA sequences. The resource will aid in studying plant miRNA gene function by showing where, when and in response to what these miRNAs are expressed.

As we expect this project to grow in the near future, PmiRExAt has been developed with the scalability of the datasets in mind, viz. species, miRs, etc. We will keep adding novel miRNA sequences and new sRNA libraries of wheat, rice and maize for better comparative analysis. We will also add other agri-food plant species to the PmiRExAt database. We will further classify datasets on the basis of developmental stage for more specificity in comparative analysis of miRNAs. miRNA expression matrices will be useful for studying miRNA regulatory networks in plants.

Supplementary data
Supplementary data are available at Database Online.