LnCeVar: a comprehensive database of genomic variations that disturb ceRNA network regulation

Abstract
LnCeVar (http://www.bio-bigdata.net/LnCeVar/) is a comprehensive database that aims to provide genomic variations that disturb lncRNA-associated competing endogenous RNA (ceRNA) network regulation, curated from the published literature and high-throughput data sets. LnCeVar curated 119 501 variation–ceRNA events from thousands of samples and cell lines, including: (i) more than 2000 experimentally supported circulating, drug-resistant and prognosis-related lncRNA biomarkers; (ii) 11 418 somatic mutation–ceRNA events from TCGA and COSMIC; (iii) 112 674 CNV–ceRNA events from TCGA; and (iv) 67 066 SNP–ceRNA events from the 1000 Genomes Project. LnCeVar provides a user-friendly searching and browsing interface. In addition, as an important supplement to the database, several flexible tools have been developed to aid retrieval and analysis of the data. The LnCeVar–BLAST interface is a convenient way for users to search ceRNAs by sequences of interest. LnCeVar–Function is a tool for performing functional enrichment analysis. LnCeVar–Hallmark identifies dysregulated cancer hallmarks of variation–ceRNA events. LnCeVar–Survival performs Cox regression analyses and produces survival curves for variation–ceRNA events. LnCeVar–Network identifies and visualizes dysregulated variation–ceRNA networks. Collectively, LnCeVar will serve as an important resource for investigating the functions and mechanisms of personalized genomic variations that disturb ceRNA network regulation in human diseases.

binding sites. The miRNA-lncRNA interactions were predicted using three miRNA target prediction methods: miRanda (v2010) (3), TargetScan (v6.0) (4) and RNAhybrid (v2.1.2) (5), with strict thresholds (miRanda: score > 160 and energy < -20; TargetScan: context score < -0.4; RNAhybrid: minimum free energy (mfe) < -25 and P < 0.01). A variation was identified as functional if its different genotypes could change the miRNA-lncRNA interaction (gain, loss or altered score). The miRNA-mRNA regulatory interactions that were validated by strong experimental methods, such as luciferase reporter assay, PCR and western blot, were derived from TarBase (v8) (6) and miRTarBase (v2018) (7). If a lncRNA and an mRNA interacted with the same miRNA, the lncRNA-miRNA-mRNA competing triplet was termed a candidate ceRNA interaction. To further identify functional lncRNA-variation-ceRNA events, we used a multivariate multiple regression model to investigate whether a given variation regulated the expression of the host lncRNA and the downstream competing mRNA (see below).
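The thresholding and triplet-building steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout (dictionary keys such as `score` or `context_score`), the function names and the placeholder identifiers are assumptions, not part of the LnCeVar pipeline. The text does not state how the three predictors were combined, so the sketch checks one tool at a time.

```python
from collections import defaultdict


def passes_threshold(tool, rec):
    """Apply the per-tool cut-offs quoted in the text.

    `rec` is a hypothetical dict holding one predicted site;
    the key names are assumptions made for this sketch.
    """
    if tool == "miRanda":
        return rec["score"] > 160 and rec["energy"] < -20
    if tool == "TargetScan":
        return rec["context_score"] < -0.4
    if tool == "RNAhybrid":
        return rec["mfe"] < -25 and rec["p_value"] < 0.01
    raise ValueError(f"unknown prediction tool: {tool}")


def candidate_triplets(mirna_lncrna_pairs, mirna_mrna_pairs):
    """Return sorted (lncRNA, miRNA, mRNA) triplets that share a miRNA.

    Both inputs are iterables of (miRNA, target) tuples.
    """
    mrnas_by_mirna = defaultdict(set)
    for mirna, mrna in mirna_mrna_pairs:
        mrnas_by_mirna[mirna].add(mrna)

    triplets = set()
    for mirna, lncrna in mirna_lncrna_pairs:
        # .get avoids creating empty entries for miRNAs without mRNA targets
        for mrna in mrnas_by_mirna.get(mirna, ()):
            triplets.add((lncrna, mirna, mrna))
    return sorted(triplets)


# Placeholder identifiers, not real database entries.
print(candidate_triplets(
    [("miR-x", "lncA")],
    [("miR-x", "geneB"), ("miR-y", "geneC")],
))
```

Indexing the validated miRNA-mRNA pairs by miRNA first keeps the join linear in the number of predicted miRNA-lncRNA pairs rather than comparing every pair against every other.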

Identification of lncRNA-SNP-ceRNA events
For each lncRNA-SNP-ceRNA unit, we applied a multivariate multiple regression model to explore whether a given SNP could create or alter a ceRNA relationship (Figure S2A). For each potential competing lncRNA ($E_{lnc}$) and mRNA ($E_{m}$) pair, we included predictors that might affect their expression levels: the genotypes of the variant across samples ($G$), the residual of the miRNA expression value ($R_{mi}$) calculated by PEER software (8), the PEER factors of the lncRNA expression level ($P_{lnc}$), the PEER factors of the mRNA expression level ($P_{m}$) and the first three principal components ($PC_1$, $PC_2$, $PC_3$) derived from the individuals' genotypes. The variants used for principal component analysis were filtered according to a previous study (9), and the components were computed using EIGENSTRAT (10). The detailed model was designed as:

$$
\begin{pmatrix} E_{lnc} \\ E_{m} \end{pmatrix} = \beta_0 + \begin{pmatrix} \beta_{lnc} \\ \beta_{m} \end{pmatrix} G + \gamma R_{mi} + \delta_{lnc} P_{lnc} + \delta_{m} P_{m} + \sum_{k=1}^{3} \theta_k PC_k + \varepsilon
$$

where $\beta_0$, $\gamma$, $\delta_{lnc}$, $\delta_{m}$ and $\theta_k$ are the corresponding coefficients for the two responses, and $\varepsilon$ represents the error vector $(\varepsilon_1, \varepsilon_2)'$, which is assumed to follow a Gaussian distribution.
The PEER factors can correct for known technical and biological covariates in the expression profile (9,11); to avoid overfitting, we used 10 PEER factors in each separate population and 30 PEER factors in the combined population (Figure S2B-C). Thus, for each analyzed lncRNA-SNP-ceRNA unit, we can estimate the effect of $G$ on $E_{lnc}$ and $E_{m}$ (referred to as $\beta_{lnc}$ and $\beta_{m}$).
We tested the significance of the model using the Pillai's trace test statistic. Further, we expect the effects of a given variant on the lncRNA and the mRNA to be opposite in direction, so we retained only lncRNA-SNP-ceRNA units with $\beta_{lnc} \times \beta_{m} < 0$ and false discovery rate (FDR) < 0.05.
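A minimal NumPy sketch of this test is given here for orientation. It is not the authors' code. It assumes that the genotype is coded as a numeric dosage, that all correction terms (miRNA residual, PEER factors, principal components) are stacked into one covariate matrix, and that FDR is controlled with the Benjamini-Hochberg procedure, which the text does not name. With two responses and a single tested predictor, the F approximation for Pillai's trace $V$ is exact and its P value reduces to $(1 - V)^{(df_{err} - 1)/2}$, which is what the sketch uses. All function names are hypothetical.

```python
import numpy as np


def pillai_test_for_genotype(Y, g, covariates):
    """Test the joint effect of genotype g on two responses.

    Y          : (n, 2) array, columns = lncRNA and mRNA expression
    g          : (n,) genotype dosage (assumed numeric coding)
    covariates : (n, k) array of correction terms
    Returns (beta_lnc, beta_m, pillai_trace, p_value).
    """
    Y = np.asarray(Y, dtype=float)
    g = np.asarray(g, dtype=float).reshape(-1, 1)
    n = Y.shape[0]
    X_reduced = np.hstack([np.ones((n, 1)), np.asarray(covariates, dtype=float)])
    X_full = np.hstack([X_reduced, g])

    B_full, *_ = np.linalg.lstsq(X_full, Y, rcond=None)
    B_red, *_ = np.linalg.lstsq(X_reduced, Y, rcond=None)

    resid_full = Y - X_full @ B_full
    resid_red = Y - X_reduced @ B_red
    E = resid_full.T @ resid_full          # error SSCP matrix
    H = resid_red.T @ resid_red - E        # hypothesis SSCP matrix for g
    V = float(np.trace(H @ np.linalg.inv(H + E)))

    df_err = n - X_full.shape[1]
    # Exact P value for 2 responses and 1 tested predictor
    p_value = max(0.0, 1.0 - V) ** ((df_err - 1) / 2.0)

    beta_lnc, beta_m = B_full[-1]          # last row = genotype coefficients
    return float(beta_lnc), float(beta_m), V, p_value


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (an assumed FDR method)."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / np.arange(1, len(p) + 1)
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty_like(p)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def keep_unit(beta_lnc, beta_m, fdr, alpha=0.05):
    """Retain units with opposite-signed effects and FDR below alpha."""
    return beta_lnc * beta_m < 0 and fdr < alpha
```

In use, `pillai_test_for_genotype` would be run once per lncRNA-SNP-ceRNA unit, the resulting P values passed together to `benjamini_hochberg`, and `keep_unit` applied to each unit with its adjusted value.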

Identification of lncRNA-CNV-ceRNA events
Similar to the lncRNA-SNP-ceRNA identification process, we identified CNV-mediated ceRNA units using a multivariate multiple regression model (Figure S3A). We propose that a CNV in the lncRNA region can affect lncRNA expression and alter the competing status of the lncRNA and mRNA. Thus, for any competing lncRNA and mRNA pair, we investigated the effect of the CNV level (C) of the lncRNA while correcting for the effect of miRNA expression and for the PEER factors of the lncRNA, mRNA and CNV.

Identification of lncRNA-mutation-ceRNA events
For somatic mutations detected in TCGA and COSMIC samples, we identified mutations located in lncRNA regions that can change the miRNA binding affinity of the mutant allele relative to the normal reference allele. We then mapped them onto lncRNA-miRNA-mRNA competing triplets to form mutation-ceRNA events (Figure S3). In this step, the lncRNA-miRNA-mRNA competing triplets were downloaded from the LncACTdb 2.0 database (2).
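As a rough illustration of this mapping step, the sketch below labels each mutation by comparing the binding prediction for the reference and mutant alleles, then attaches the mutation to matching triplets. Several points are assumptions made only for this sketch: a missing score (`None`) means no site passed the thresholds for that allele, triplets are matched on the (lncRNA, miRNA) pair, and all names and identifiers are hypothetical.

```python
from collections import defaultdict


def classify_binding_change(ref_score, alt_score):
    """Label the effect of a mutation on one miRNA-lncRNA interaction.

    A score of None means no predicted site for that allele.
    """
    if ref_score is None and alt_score is None:
        return "none"
    if ref_score is None:
        return "gain"
    if alt_score is None:
        return "loss"
    return "altered" if alt_score != ref_score else "unchanged"


def mutation_cerna_events(mutation_effects, triplets):
    """Attach functional mutations to competing triplets.

    mutation_effects : iterable of
        (mutation_id, lncRNA, miRNA, ref_score, alt_score)
    triplets         : iterable of (lncRNA, miRNA, mRNA)
    Returns a list of (mutation_id, lncRNA, miRNA, mRNA, effect).
    """
    mrnas_by_pair = defaultdict(list)
    for lncrna, mirna, mrna in triplets:
        mrnas_by_pair[(lncrna, mirna)].append(mrna)

    events = []
    for mut_id, lncrna, mirna, ref_score, alt_score in mutation_effects:
        effect = classify_binding_change(ref_score, alt_score)
        if effect not in {"gain", "loss", "altered"}:
            continue  # the mutation does not change the interaction
        for mrna in mrnas_by_pair.get((lncrna, mirna), ()):
            events.append((mut_id, lncrna, mirna, mrna, effect))
    return events
```

Matching on the (lncRNA, miRNA) pair means a gained site is reported only when the triplet collection already lists that miRNA for the lncRNA; matching on the lncRNA alone would be a looser alternative.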

Performance of functional analysis based on ceRNA theory
LnCeVar provides the LnCeVar-Function and LnCeVar-Hallmark tools to perform functional analysis of lncRNAs based on a "guilt-by-association" strategy. From ontology annotations, a total of 5,917 gene sets representing functional terms were collected.
We manually curated gene sets of the ten cancer hallmark processes, which have been determined to promote tumor growth and metastasis (12). Gene sets from corresponding GO terms were mapped to each of the cancer hallmarks (13).
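One common way to implement a guilt-by-association analysis is to test whether the mRNAs competing with a lncRNA are over-represented in each gene set. The sketch below uses a one-sided hypergeometric test for this purpose. The choice of test is an assumption, since the text does not name the statistic used by LnCeVar-Function or LnCeVar-Hallmark, and the function names and inputs are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K are in the set."""
    upper = min(K, n)
    total = comb(N, n)
    # comb returns 0 when n - i exceeds N - K, so impossible terms vanish
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrich(competing_mrnas, gene_sets, background):
    """Rank gene sets by over-representation among the competing mRNAs.

    competing_mrnas : mRNAs in ceRNA triplets with the lncRNA of interest
    gene_sets       : dict mapping term name -> iterable of genes
    background      : iterable of all genes considered
    Returns a list of (term, overlap, p_value) sorted by p_value.
    """
    bg = set(background)
    query = set(competing_mrnas) & bg
    results = []
    for term, genes in gene_sets.items():
        members = set(genes) & bg
        overlap = len(query & members)
        p_value = hypergeom_sf(overlap, len(bg), len(members), len(query))
        results.append((term, overlap, p_value))
    return sorted(results, key=lambda r: r[2])
```

The same function would serve a hallmark analysis by passing the hallmark-mapped gene sets as `gene_sets`; in either case the resulting P values would normally be corrected for multiple testing across terms.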

Construction of ceRNA networks disturbed by genomic variations
The

Implementation of the LnCeVar-BLAST interface
The LnCeVar-BLAST interface is a convenient way for users to query the dataset by inputting