NcPath: a novel platform for visualization and enrichment analysis of human non-coding RNA and KEGG signaling pathways

Abstract Summary Non-coding RNAs play important roles in transcriptional processes and participate in the regulation of various biological functions, in particular miRNAs and lncRNAs. Despite their importance for several biological functions, the existing signaling pathway databases do not include information on miRNA and lncRNA. Here, we redesigned a novel pathway database named NcPath by integrating and visualizing a total of 178 308 human experimentally validated miRNA–target interactions (MTIs), 32 282 experimentally verified lncRNA–target interactions (LTIs) and 4837 experimentally validated human ceRNA networks across 222 KEGG pathways (including 27 sub-categories). To expand the application potential of the redesigned NcPath database, we identified 556 798 reliable lncRNA–protein-coding genes (PCG) interaction pairs by integrating co-expression relations, ceRNA relations, co-TF-binding interactions, co-histone-modification interactions, cis-regulation relations and lncPro Tool predictions between lncRNAs and PCG. In addition, to determine the pathways in which miRNA/lncRNA targets are involved, we performed a KEGG enrichment analysis using a hypergeometric test. The NcPath database also provides information on MTIs/LTIs/ceRNA networks, PubMed IDs, gene annotations and the experimental verification method used. In summary, the NcPath database will serve as an important and continually updated platform that provides annotation and visualization of the pathways on which non-coding RNAs (miRNA and lncRNA) are involved, and provide support to multimodal non-coding RNAs enrichment analysis. The NcPath database is freely accessible at http://ncpath.pianlab.cn/. Availability and implementation NcPath database is freely available at http://ncpath.pianlab.cn/. The code and manual to use NcPath can be found at https://github.com/Marscolono/NcPath/. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
Understanding the mechanisms of gene regulation is a major challenge in molecular biology and bioinformatics. The rapid development of high-throughput sequencing technologies (Pai et al., 2021) and the emergence of new multi-omics technologies (Nam et al., 2021;Sun et al., 2016) significantly expanded research on gene regulation at the transcriptional, post-transcriptional, translational and post-translational levels. Previous studies showed that RNAprotein interactions regulate gene expression by controlling various post-transcriptional processes, which in turn directly or indirectly affect disease development (Liu et al., 2020). The dysregulation of non-coding RNAs, in particular microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), is closely associated with a variety of biological processes and disease development (Siomi et al., 2009), whereby it is essential to obtain additional references and evidence to elucidate the molecular mechanisms involved . miRNAs are small non-coding RNAs with around 18-26 nucleotides that are transcribed from DNA sequences into primary miRNAs, and subsequently processed into precursor and mature miRNAs in animal and plant species. Since the first miRNA gene  was discovered in 1993 (Lee et al., 1993), more than 35 000 miRNA sequences have been identified across more than 270 organisms (Kozomara et al., 2019). miRNA can induce mRNA decapping and alkenylation via base binding with the complementary sequence of the 3 0 -untranslated region, which negatively regulates gene expression by accelerating mRNA degradation or suppressing mRNA translation, affecting major pathways in a post-transcriptional fashion (Bartel, 2018;Jonas et al., 2015;Shivdasani, 2006;Stroynowska-Czerwinska et al., 2014). A high number of studies reported that miRNAs play a role in crucial various cell activities, such as cell cycle (Carleton et al., 2007), cell proliferation (Cheng et al., 2005), differentiation (Turner et al., 2011), apoptosis (Cheng et al., 2005), metabolism (Chen et al., 2006), cellular signaling (Shivdasani et al., 2006), embryonic development (Alvarez-Garcia et al., 2005), virus defense (Moy et al., 2014) and hematopoietic processes (Lin et al., 2011). In addition, miRNAs have been associated with several diseases, especially in different types of cancer (Akao et al., 2006;Hirota et al., 2012;Tavazoie et al., 2008), whereby these non-coding RNA molecules represent good candidate biomarkers for potential diagnosis and prognosis of cancer and other diseases (Jiao et al., 2019).
Combining the above database of miRNAs and lncRNAs can greatly promote our understanding and propel research efforts on non-coding RNAs. However, not only the data and methods presented in most databases need to be updated, but also most of these tools fail to simultaneously integrate mRNA, miRNA and lncRNA for comprehensive analysis. In addition, only few studies explored the integration of both miRNA and lncRNA with pathway databases. The development of a more comprehensive relationship of non-coding RNAs and gene regulation in human biological pathways thus represents an urgent research goal at present. Importantly, enrichment analyses based on reference genes and noncoding RNA (miRNA and lncRNA) sets will be useful for analyzing non-coding RNA lists of interest submitted by different users.
In order to facilitate the study of pathway-related miRNA, we reported the first version of the NcPath database named miRþPathway (Pian et al., 2020), which allows users to search relationship networks of 115 pathway-related miRNAs and provides an miRNA-based gene enrichment analysis tool. Since it was first released in 2019, several more MTIs and lncRNA-target interactions (LTIs) were identified, in particular through experimentally validated approaches. Hence, this work stems from the need to update the old version with more resources and functions to improve the tool. Accordingly, we developed a novel human pathway visualization database named NcPath ( Fig. 1 and Table 1), which focuses on accommodating human pathway information with various available resources for non-coding RNA-target interactions, and performs visualization and enrichment analysis of non-coding RNA lists submitted by users. NcPath supports 178 308 human experimentally validated MTIs and 32 282 experimentally verified LTIs across 222 KEGG pathways (including 5 main categories and 27 sub-categories). We also recorded 4837 experimentally validated human ceRNAs relationships between lncRNAs and miRNAs. Furthermore, by integrating co-expression relations, co-transcription factor (TF)-binding interactions, co-histone-modification (HM) interactions, ceRNA relations, cis-regulation relations and lncPro Tool predictions between LncRNAs and protein-coding genes (PCG) in dozens of databases, we were able to identify 556 798 reliable lncRNA-PCG interaction pairs involved in different gene pathways. Moreover, gene enrichment analysis associated with gene regulation, miRNAs and lncRNAs can be performed simultaneously.
NcPath bridges the gap between biological pathways and human lncRNAs/miRNAs to enhance our understanding of lncRNAs/ miRNAs function, particularly the investigation of the roles played by these types of RNAs in physiological and pathological processes. In summary, NcPath constitutes a more complete platform that provides two types of non-coding RNA sets in pathways for users, and performs visualization and enrichment analysis of non-coding RNA sets submitted by users.

Data selection and processing
The new version of NcPath supports non-coding RNAs annotation records and targets for Homo sapiens, which were obtained from the latest release of org. Hs.eg.db (Release 3.11), miRBase (v.22.1) (Kozomara et al., 2019) and GENCODE (Frankish et al., 2021). In the case of miRNAs, the validated MTIs were acquired from miRTarBase v.8 (Chou et al., 2018), which has accumulated >2 200 449 verified MTIs. The ceRNA relationships between lncRNA, miRNA and mRNA were obtained from LncACTdb3.0 (Wang et al., 2022). For lncRNAs, the experimentally verified LTIs were collected from RNAinter (Lin et al., 2020) and NPInter v4.0 (Teng et al., 2020) databases. We obtained all the relationship pairs about human lncRNA and PCG from RNAinter, which can be divided into 'strong' and 'weak' relationships according to the experimental means. In order to enrich the number of 'strong' relationships, we collected the human lncRNA-PCG interactions from the NPInter database, which identified by Clip-seq. Meanwhile, the lncRNA genes and PCG were present in the GENCODE annotation file (Frankish et al., 2021) of V33. Finally, 36 363 and 75 672 lncRNA-PCG relationship pairs were collected in NPInter v4.0 and RNAinter, respectively. In total, 110 486 experimentally verified and non-repetitive LTIs were obtained. Among them, 32 282 LTIs associated with the PCGs in 222 pathways were used in NcPath.
In addition, other six types of lncRNA-mRNA interactions, for which the specific details are as follows: i. Co-expression relation between LncRNAs and PCGs. We used the annotation file of the whole transcriptome downloaded from GENCODE V33 (Frankish et al., 2021) (including 22 742 PCG and 17 907 lncRNAs) and expression data downloaded from GTEx (we selected 27 normal tissues with each sample size more than 30, including 7806 samples) to compute the Pearson correlation coefficient (PCC) and adjusted P-value (fdr) between lncRNA and PCG, respectively. We then used the mean PCC of 27 tissues as the final co-expression score of lncRNA and PCG. The higher the absolute value of the average correlation coefficient, the stronger the interaction of lncRNA and PCG. ii. ceRNA relation between lncRNAs and mRNAs. The interaction relationships of miRNAs (2656 miRNAs in the miRBase database) and their target genes were downloaded from seven distinct databases, specifically: StarBase ( , 2020). We collected 99 010 miRNA-lncRNA and 1 869 961 miRNA-mRNA interaction pairs in total. A hypergeometric test was performed to identify the ceRNA relation of lncRNAs and mRNAs. The corrected BH P-value and the shared miRNAs of each lncRNA-mRNA pair were then calculated. The smaller the corrected BH P-value, the stronger the interaction of lncRNA and mRNA. iii. Co-TF-binding interaction of LncRNA-PCG pairs. A total of 11 348 human TF ChIP-seq and 11 079 human HM ChIP-seq datasets involving 1117 TFs and 77 HM patterns, respectively, were collected from the Cistrome database (Zheng et al., 2019). The datasets with <30 targets were removed, with 9489 TF ChIP-seq and 9384 HM datasets left for analysis.
Considering that a higher number of common TFs (HMs) combined with lncRNA and mRNA led to a stronger co-existence, we measured the possibility of co-existence using the Jaccord coefficient: iv. TF (i) and TF (j) represent the TF set interacting with the mRNA i and the lncRNA j, respectively. v. Co-HM interaction of LncRNA-PCG pairs. Similarly, the co-HM score was defined to measure the possibility of coexistence in terms of HM. This definition assumes that the pairs of identified genes were subjected to co-selection pressures (more fortuitous than expected) during evolution and are therefore considered to be functionally related. The higher the co-TF score and the co-HM score, the stronger the interaction of lncRNA and mRNA. vi. lncRNA cis-regulation relation. Genes located 20KB upstream or downstream of an lncRNA were considered cis-regulated by the lncRNA. LncRNA and mRNA annotation files were downloaded from GENCODE (Frankish et al., 2021), with a total of 21 128 cis-regulated lncRNA-mRNA relationship pairs being identified. vii. lncRNA-protein interaction pairs identified by the prediction tool lncPro. Firstly, the lncRNA and protein sequences were downloaded from GENCODE V33. In order to improve efficiency, we modified the source code of lncPro (Mann et al., 2017) to calculate the properties of all proteins in batches, and then calculated the interaction scores between all transcripts and protein sequences. The final lncRNA-mRNA interaction score was defined as the mean of all possible interaction scores between multiple transcripts and multiple corresponding protein sequences. After this, we used lncPro to generate the scores (0-100) measuring the binding possibility between protein and lncRNA using sequence information. The higher the score, the stronger the interaction of lncRNA and mRNA.
For the specific process and results of screening the reliable relationship between lncRNA and mRNA, please refer to the section 'Screening of the reliable relationship between lncRNA and mRNA' in the Supplementary Material. By integrating the very comprehensive MTIs and LTIs mentioned above, we defined the relationship between gene and non-coding RNAs as 'strong' and 'weak' according to their collected means. Specifically, we defined the relationship as strong after verification by multiple powerful verification methods, such as Luciferase reporter assay or Clip-seq collected in RNAinter and NPInter. The relationship verified by other six computational methods, were deemed as insufficient evidence and called weak.

Pathway platforms and non-coding RNAs set enrichment analysis
In this update, a total of 222 human pathways were manually redrawn by integrating information from non-coding RNAs and original KEGG pathways, which cover five main categories in humans, specifically: genetic information processing, environmental information processing, cellular processes, organismal systems and human diseases. To determine whether specific non-coding RNAs are associated with specific signaling pathways, we used enrichment analysis functions. The enrichment significance P-value for that reference set is calculated as: For each pair of non-coding RNA and signaling pathway, we applied a hypergeometric test to evaluate whether the pathway contains significantly more target genes than expected by chance. The P-values were BH-adjusted and a significance level of 0.05 was selected.

Web server implementation
The current version of NcPath was organized using MySQL 5.7.17 (http://www.mysql.com) and operates on a Linux-based Aliyun Web server. Adopting the idea of separating front and back ends, the web server uses Golang Gin v1.7.7 and Python V3.6 mixed development. NcPath was built using microservice architecture, and gRPC was used for communication between microservices. The frontend is built using common HTML, CSS and Javascript libraries, including the Vuetify framework for styling, Apache Echarts for the visualization library and the Vue.js framework for the side. Vue.js is a popular front-end technology and the mainstream progressive framework for building user interfaces. We chose this software due to better performance advantages compared to other front-end mainstream frameworks.

Overview of the NcPath database
The main elements of NcPath, including the redrawn KEGG pathways, the collection of MTIs/LTIs/ceRNA networks and the user interface are shown in Figure 1. NcPath provides a user-friendly interface to browse, search and download detailed information on all reference non-coding RNAs sets in pathways. In particular, NcPath provides enrichment analysis of non-coding RNA sets.

Browsing interface for conveniently retrieving noncoding RNA sets in pathways
To provide a user-friendly interface, we redesigned the displayed pathway map based on KEGG datasets. NcPath enables users to intuitively determine the non-coding RNAs that target a gene in a pathway, the genes that are regulated by non-coding RNAs in a pathway and the non-coding RNAs that participate in a pathway ( Fig. 2A). The user can efficiently browse the interactions of interest with the pathway-centric, miRNA-centric and the lncRNA-centric, in one-to-many or many-to-many relationships. In the choose box in the upper left corner of the 'Browse' interface, users can select or search for a specific pathway by inputting the pathway's name and click on the 'GO' button on the right to enter the pathway screen. When users browse a pathway map, the red number shown in the upper right corner of the gene represents the sum of the number of non-coding RNAs (miRNAs and lncRNAs) known to regulate this gene ( Fig. 2A). The number of relationships between the gene and the associated non-coding RNAs will be displayed in the floating window when the cursor of the mouse touches the region where the gene is located in the map ( Fig. 2A). The right column shows the list of non-coding RNAs related to target gene of choice. There are two main categories, including miRNAs and lncRNAs, each with a list of subcategories with strong and weak relationships. If users wish to obtain the gene set related to the non-coding RNA of interest in reverse, they can place the mouse cursor on the RNA entry in the list on the right column, and all genes related to the target non-coding RNA will be displayed with an orange border. In addition, there is a search box in the middle of the screen that allows users to directly locate the input target gene or the related genes of input non-coding RNA. The specific details on the gene and related non-coding RNA in the new pathway map can be viewed upon clicking. For example, the user can click on the RNA located in the right column to enter its basic annotation information and obtain the list of all pathways associated with this RNA (Fig. 2E). If the user clicks the gene icon of the pathway map, the page will redirect to the interactive network and annotation interface ( Fig. 2B and C). The interactive network diagram (Fig. 2B) allows users to visualize the interaction relationships between non-coding RNAs and PCG. NcPath provides five types of interactions based on experimental validations and predictions of datasets, including miRNA-strong, lncRNA-strong, miRNA-weak, lncRNA-weak and ceRNA, which are represented by five different colors in the network (Fig. 2B). The information being displayed can be controlled by the user by clicking on the relationship icons. In addition, the MTIs/LTIs references related to the target gene will be displayed in the middle of the page, including the PMID hyperlink, the support type and the method used for experimental verification of the interactions (Fig. 2C). To facilitate further studies of the function and mechanism of non-coding RNAs, ceRNA networks are also displayed in the list at the bottom of the 'Results' page.

Effective online tool for non-coding RNA set enrichment analysis
After generating the non-coding RNA and target gene network with NcPath, users can proceed to the interpretation of the results, an essential part of the analysis process. NcPath provides non-coding RNA (lncRNA and miRNA) set enrichment analysis for users that includes a total of two input modes (Fig. 2F). In the single-class mode, the user can decide from five upload options, i.e. whether to select a single lncRNA, a single miRNA, a list of lncRNAs, miRNAs or genes. In addition, a multi-class mode query (lncRNAs, miRNAs and mRNA are included) can be initiated to find function pathways that have a significant impact on these merged inputs. The details on the enrichment results are shown on the return page; users can click on the title bar of the table to sort, or on the 'p-adjust' to use the adjusted P-value and preferentially display the pathway information that is significantly enriched for the target non-coding RNAs set (Fig. 2G). The bottom part of the page shows the visualization results of the enrichment analysis, including bubble charts and histograms (Fig. 2H). Users can click the pathway name in the enrichment result list to enter the pathway for browsing, or click the number in the 'genes' column to obtain the gene set enriched in the pathway. All significant reference pathways and visualization results for the enrichment analysis are provided for review and download. NcPath thus provides rapid access to commonly used gene/miRNA/ lncRNA set enrichment tools that help researchers focusing on the relevant hits.

Newly designed and more user-friendly interface
The 'Search' page is organized as an interactive table that allows users to quickly search for non-coding RNA sets and pathwayrelated information. Users can search for non-coding RNAs of interest using multiple features, including gene symbol, entrez ID or accession (Fig. 2D). The results page of the query returns the basic information and related pathways of non-coding RNAs or genes (Fig. 2E). For example, if users enter an lncRNA search, the program displays a reference table for each lncRNA that includes gene symbol, Entrez ID, organism, map location, description, other aliases and summary. This information enables a rapid understanding of the related functions of the specific lncRNA. Users can browse more information from the drop-down menu on the right. In particular, the results page returns all pathways related to the query non-coding RNAs, which are available to users upon clicking on each category. It is worth mentioning that, in the process of browsing information, users can enter the specific annotation webpage when encountering nouns of interest in the pathway map or annotation information through the hyperlinks. This will allow users to obtain more relevant details, e.g. by selecting the hyperlink in the references of the non-coding RNA to browse in more detail. In the lower part of the 'Search' interface, users can browse all non-coding RNA-related interaction tables related to the target pathway by searching the pathway name, which makes it easier to directly obtain the desired information (Fig. 2D).
3.5 Use case 1--non-coding RNAs related to critical illness in COVID-19 As the first use case, we studied the target genes and pathway networks of CDKN2B-AS1, a gene that has been previously described in humans and associated with various functions (Shen et al., 2020;Thakur et al., 2021). CDKN2B-AS1 is involved in a large number of Secondly, based on the research of coronavirus disease-COVID-19, we are able to search for this pathway in the NcPath database to obtain the total interaction set of all non-coding RNAs involved in the target pathway (Fig. 3B). We also can browse the gene network in the pathway map (Fig. 3C) and the regulation of each gene by non-coding RNAs (which ones and how many) (Fig. 3D). The previous research showed that low expression levels of IFNAR2 are associated with life-threatening COVID-19 disease (Pairo-Castineira et al., 2021), and that the interferon-inducible oligoadenylate synthetase (OAS) genes are implicated in susceptibility to SARS-CoV, based on candidate gene association studies performed in Vietnam and China (Hamano et al., 2005;He et al., 2006). In other words, IFNAR2 and OAS are both key genes in the coronavirus disease-COVID-19. In NcPath, we can find that the OAS gene cluster is not only affected by other genes, but also regulated by 25 miRNAs and 360 lncRNAs, such as there is an interaction relationship between RMRP and OAS1 (Fig. 3D). In addition, we found that IFNAR2 is regulated by 163 miRNAs and 549 lncRNAs (Fig. 3F). Through the NcPath database, researchers can visually and directly find the ncRNA regulation information associated with these important genes in the pathway.
The above operations are only to browse and search the information obtained by searching the NcPath, providing users with interactive information recorded in the database. But, how to obtain the CDKN2B-AS1 target gene enrichment pathway? We input CDKN2B-AS1 into the lncRNA enrichment frame for enrichment analysis, and the list of results is shown in Figure 3G and H, most of which can be found to be related to CDKN2B-AS1 by searching the literature Pan et al., 2021;Tian et al., 2017). In addition, we focus on those pathways that have been enriched but not verified experimentally, which can be considered as one of the directions for researchers to further explore the mechanism related to CDKN2B-AS1. Our results show that NcPath's interactive regulatory network can aid the researchers for the rapid query of experimentally verified regulatory relationships, and help guiding the discovery of potential regulatory mechanisms.
3.6 Use case 2--integrated lncRNA/miRNA/gene analysis in squamous cell carcinoma The regulation of gene expression in physiological and pathophysiological processes is a central question in biomedical and life science research. To explore the role played by differential miRNAs, lncRNAs and mRNA expression patterns simultaneously, it is necessary to implement enrichment analysis of these genes and noncoding RNAs. We used NcPath to perform functional analyses on the genes differentially expressed in tongue squamous cell carcinoma (Zhou et al., 2019). The dysregulated mRNA, miRNA and lncRNA expression profiles of squamous cell carcinoma and normal tissues were extracted from The Cancer Genome Atlas. The 10 most significant pathways uncovered are provided as a results table (Fig. 3I), with the bubble chart and histogram shown in Figure 3J. The top hit with an adjusted P-value of 3.82 Â 10 211 was the 'Cell cycle', and a previous study showed that silencing SHMT2 inhibits the progression of tongue squamous cell carcinoma through cell cycle regulation (Liao et al., 2021). The second most significant hit was 'mTOR signaling pathway', with an adjusted P-value of 1.04 Â 10 28 , followed by 'Adherens junction', and 'Cellular senenscence'. Importantly, numerous experiments demonstrated these pathways affect the occurrence and development of tongue squamous cell carcinoma (Coccè et al., 2017;Li et al., 2018Li et al., , 2019Lorch et al., 2007;Lu et al., 2021;Yan et al., 2020;Yang et al., 2019). It is worth mentioning that the addition of differential ncRNAs information can be significantly enriched to more pathways than the results of mRNA list enrichment analysis alone (see Supplementary Tables S2-S4 for details). In particular, in the joint list enrichment results, these topranked additional enriched pathways have been reported to be definitively associated with tongue squamous cell carcinoma compared to mRNA list analysis alone. (Coccè et al., 2017;Li et al., 2018Li et al., , 2019Liao et al., 2021;Lorch et al., 2007;Lu et al., 2021;Yan et al., 2020;Yang et al., 2019). This use case again demonstrates the practical application of NcPath, which allows for comprehensive analysis using pairings of three RNA classes, and helps researchers focusing on potentially more relevant biological findings.

Discussion
We present a significant update of our web server NcPath that allows for the integrative analysis of miRNA, lncRNA and target genes with pathway interaction networks. While the original version was focused on miRNA, we now offer support for lncRNA and a higher number of pathways. NcPath is the first database providing an enrichment analysis on miRNAs, lncRNAs and mRNAs together as input. The development of new technologies and the accumulation of experimental data resulted in the generation of an increasing number of miRNA-, lncRNA-and ceRNA-related information. We will also include additional experimental sets to extend our data sources and support more powerful enrichment analysis tools. In the future, NcPath will continue supplementing more non-coding RNA classes and new features to further enhance functional interpretations, following the developments in biology. In addition, we will focus on expanding the number of species and collections, and providing users with more efficient enrichment analysis methods in the future.