Abstract

Motivation

Long non-coding RNAs (lncRNAs) are essential in many molecular pathways and are frequently associated with disease, but the mechanisms of most lncRNAs have not yet been characterized. Genetic variations, including single nucleotide polymorphisms (SNPs) and structural variations, are widely distributed in the genome, including lncRNA gene regions. As the number of studies on lncRNAs grows rapidly, it is necessary to evaluate the effects of genetic variations on lncRNAs.

Results

Here, we present LncVar, a database of genetic variation associated with long non-coding genes in six species. We collected lncRNAs from the NONCODE database, and evaluated their conservation. We systematically integrated transcription factor binding sites and m6A modification sites of lncRNAs and provided comprehensive effects of SNPs on transcription and modification of lncRNAs. We collected putatively translated open reading frames (ORFs) in lncRNAs, and identified both synonymous and non-synonymous SNPs in ORFs. We also collected expression quantitative trait loci of lncRNAs from the literature. Furthermore, we identified lncRNAs in CNV regions as prognostic biomarker candidates of cancers and predicted lncRNA gene fusion events from RNA-seq data from cell lines. The LncVar database can be used as a resource to evaluate the effects of the variations on the biological function of lncRNAs.

Availability and Implementation

LncVar is available at http://bioinfo.ibp.ac.cn/LncVar.

Supplementary information

Supplementary materials are available at Bioinformatics online.

1 Introduction

With the advent of high-throughput sequencing technology, a large number of long non-coding RNAs (lncRNAs) have been identified in various species. Increasing evidence has revealed that lncRNAs play multiple regulatory roles in biological processes, such as chromatin remodeling and gene transcription (Ulitsky and Bartel, 2013). Genetic variations, including single nucleotide polymorphisms (SNPs) and structural variations, are widely distributed in the human genome. Variations in long non-coding gene loci may affect the sequence, structures, expression levels and biological functions of lncRNA transcripts originating from these loci (Hu et al., 2014; Pandey et al., 2014).

The transcription of some lncRNAs is regulated by transcription factors binding to their promoter loci (Guttman et al., 2009). Variations in the promoter loci might thus affect the transcription of the lncRNA genes. For example, the SNP rs944289 was shown to be significantly associated with susceptibility to papillary thyroid carcinoma (Jendrzejewski et al., 2012). Rs944289 is located in a binding site for the transcription factors CEBPA and CEBPB in the promoter of the long non-coding gene PTCSC3 (Papillary Thyroid Carcinoma Susceptibility Candidate 3), and could affect PTCSC3 transcription.

In recent years, chromosome conformation capture technologies (such as 3C, 4C, 5C, Hi-C, DNase Hi-C) have been developed (Dekker et al., 2013). Thousands of long-range interactions between gene promoters and distal functional elements have been identified in GM12878, K562 and HeLa-S3 cells using 5C (Sanyal et al., 2012). The distal functional elements included enhancers, promoters and transcription factor binding sites. Variations in these distal regulatory elements might affect the transcription of target genes, including lncRNA genes. Chromosome conformation capture technologies have brought insights into transcription regulation of lncRNA genes through chromatin looping (Dekker et al., 2013).

More than 100 types of RNA modified nucleotides have been discovered in a wide variety of living organisms, which may affect the activity, cellular location and stability of the modified RNA molecules (Li and Mason, 2014), and which may possibly contribute to the course of various diseases. N6-methyladenosine (m6A), discovered in the 1970s, is the most abundant internal modified nucleotide in mRNAs and lncRNAs (Dominissini et al., 2012). Previous studies have revealed that the m6A modification regions contained a consensus sequence RRACH (Harper et al., 1990; Wei and Moss, 1977). Recent studies have also confirmed the finding using a novel approach, m6A-seq (Dominissini et al., 2012). The biological roles of m6A modification remain to be determined. A recent study found that the carboxy-terminal domain of YTHDF2 selectively bound to m6A-containing mRNA, whereas the amino-terminal domain was responsible for the localization of the complex to cellular RNA decay sites (Wang et al., 2014). This finding indicated that the dynamic m6A modification is recognized by selectively binding proteins to affect the translation status and lifetime of RNA.

Ribosome profiling is a newly developed technique that uses specialized messenger RNA sequencing to determine which mRNAs are being actively translated (Ingolia et al., 2009). Recent ribosome profiling studies reported that many lncRNAs are bound by ribosomes (Ingolia et al., 2011). Several micropeptides encoded by putative lncRNAs have been reported to play important roles in cellular regulation. Myoregulin (MLN), a conserved micropeptide encoded by a putative lncRNA, can impede Ca2+ uptake into the sarcoplasmic reticulum by interacting directly with SERCA (Anderson et al., 2015). It is therefore of interest to identify synonymous and non-synonymous variations in lncRNAs that could putatively encode micropeptides.

Copy number variation (CNV) is a form of structural variation. Recently, a genome-wide survey on CNVs of lncRNAs was conducted in tumor specimens from 12 cancer types, and more than 3000 lncRNA genes were found to be located in regions with focal CNVs (Hu et al., 2014). This study also reported that the copy number and expression of one lncRNA gene, FAL1, were correlated with outcome in ovarian cancer. Other structural variations, such as chromosome translocations, interstitial deletions and inversions, might result in the fusion of two previously separated genes. Recent studies have shown several lncRNA genes to be fused to other protein-coding or non-coding genes. The lncRNA gene GAS5 was found fused to BCL6 in a patient with B-cell lymphoma and fused to CENPL in patients with breast cancer (Nakamura et al., 2008; Norton et al., 2013), and TTTY15 and SNHG8 were found fused to USP9Y and PHF17, respectively, in patients with prostate tumors (Ren et al., 2012). However, so far, no database provides curated information on lncRNA gene fusion events.

Currently, there are several databases about lncRNAs and SNPs, including LincSNP (Ning et al., 2014) and lncRNASNP (Gong et al., 2014). The LincSNP database contains disease-associated SNPs in lincRNAs, whereas the lncRNASNP database focuses on the effects of SNPs on lncRNA secondary structure and lncRNA-miRNA interaction. Neither of the two databases includes information on the effects of SNPs on transcription and modification of lncRNAs, or the effects of structural variations on lncRNAs. Here, we have therefore developed LncVar, a database of genetic variations associated with long non-coding genes. We obtained lncRNA genes of nine species (H. sapiens, M. musculus, D. rerio, C. elegans, D. melanogaster, A. thaliana, R. norvegicus, B. taurus and G. gallus) from the NONCODE database, and evaluated their conservation across these nine species. We collected genetic variations for only six species (H. sapiens, M. musculus, D. rerio, C. elegans, D. melanogaster and A. thaliana), because sufficient genetic variation data were not available for the other three species. We characterized all SNPs in the lncRNA genes in the six species, and provided comprehensive information on the effects of SNPs on transcription and modification of lncRNAs. We collected putatively translated open reading frames (ORFs) in lncRNAs, and identified both synonymous and non-synonymous SNPs in ORFs. We also collected expression quantitative trait loci (eQTLs) of lncRNAs from the literature. Furthermore, we identified lncRNAs in CNV regions as prognostic biomarker candidates of cancers and predicted lncRNA gene fusion events from RNA-seq data from cell lines. The aim of developing LncVar is to offer a user-friendly web interface through which users can freely access and conveniently mine genetic variations associated with lncRNA genes.

2 Data collection and processing

LncRNAs of nine species were obtained from the NONCODE database (Xie et al., 2014). In total, 163 774 lncRNA entries are recorded in LncVar. We applied two methods, liftOver (Rosenbloom et al., 2015) and PhastCons (Siepel et al., 2005), to evaluate the conservation of lncRNAs across the nine species (Table 1, Supplementary Materials). The majority of lncRNAs were independent transcriptional units, although the lncRNA boundaries were poorly annotated. Enrichment of H3K4me3 at 10-kb intervals surrounding transcription start sites of expressed lncRNAs suggested that lncRNAs possessed actively regulated promoters (Iyer et al., 2015). The 5 kb regions upstream of the lncRNA genes’ transcription start sites were considered as potential promoters. SNPs were obtained from NCBI dbSNP Build 138.

Table 1

Data statistics of LncVar database

SpeciesHumanMouseZebrafishFruitflyWormArabidopsisRatChickenCow
LncRNA54 07246 47563039612892247725 555699318 046
Conserveda29 43631 11037620 54139813 118
SNP1 966 2092 314 00915351 7222452140
lncTFBSb700 9961 141 91037 5512183
Predictedc1 101 311344 73718 70731169
SplncTFBSd268 2051 189 6072434
m6Ae66 67511 0128
Peptidef142444041533663
eQTL148
CNVg321
Fusionh908
a Number of conserved lncRNAs evaluated using liftOver and PhastCons.

b,c,d,e,f Number of SNPs in lncTFBS, predicted lncTFBS, SplncTFBS, m6A modification region and ORF, respectively.

g Number of CNV regions of 30 cancers from TCGA Copy Number Portal at Broad Institute.

h Number of lncRNA gene fusion events predicted using RNA-seq data.

The processing of genetic variations associated with long non-coding genes is depicted in Figure 1. Data statistics are shown in Table 1 and Supplementary Table S1. The detailed procedure is explained in the following sections.

Fig. 1

The procedure of data processing. (A) Data processing of SNPs associated with lncRNAs. (B) Data processing of identifying lncRNAs as prognostic biomarker candidates. (C) Data processing of identifying lncRNA gene fusion events

2.1 SNPs in lncRNA transcription regulatory regions

To assess the impact of SNPs on lncRNA gene transcription (taking data processing in the human genome as an example), we downloaded 508 ChIP-seq datasets from the ENCODE data portal at UCSC (http://genome.ucsc.edu/ENCODE/dataMatrix/encodeChipMatrixHuman.html), covering 84 cell lines and 137 transcription factors. These data originated from nine laboratories (Broad, Harvard, HudsonAlpha, Stanford, UChicago, USC, UT-A, UW and Yale). We obtained the peak regions (flanking 50 bp of the summit site) from the narrowPeak files (downloaded from ENCODE data at UCSC), removed the duplicated peak regions for the same transcription factor in different cell lines and thus obtained confident transcription factor binding sites (TFBS). We then used BEDTools v2.9.0 (Quinlan and Hall, 2010) to find TFBS that were located entirely within the 5 kb regions serving as promoters of lncRNA genes (lncTFBS). By comparing the genomic coordinates of the lncTFBS with those of SNPs, we found 700 996 SNPs located within the lncTFBS (Table 1, Supplementary Materials). These SNPs might affect the transcription of lncRNA genes by interfering with transcription factor binding to the promoter regions. Although we collected lncRNAs from nine species, we obtained ChIP-seq data for only five species (H. sapiens, M. musculus, D. rerio, C. elegans and D. melanogaster). The same analysis pipeline was applied to the other four of these five species (Supplementary Materials).
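The coordinate logic of this step can be sketched in a few lines of Python. This is only an illustration, not the BEDTools-based pipeline used for LncVar: the function names are hypothetical, coordinates are assumed to be 0-based half-open (BED style), and the summit window is assumed to extend 50 bp on each side of the narrowPeak summit offset.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def promoter(chrom, tss, strand, size=5000):
    """Region of `size` bp immediately upstream of a 0-based TSS."""
    if strand == "+":
        return Interval(chrom, max(0, tss - size), tss)
    return Interval(chrom, tss + 1, tss + 1 + size)


def summit_window(chrom, peak_start, summit_offset, flank=50):
    """Window of `flank` bp on each side of a narrowPeak summit (assumed width)."""
    summit = peak_start + summit_offset
    return Interval(chrom, max(0, summit - flank), summit + flank + 1)


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start and inner.end <= outer.end)


def snp_in(region, chrom, pos):
    """True if a 0-based SNP position falls inside `region`."""
    return region.chrom == chrom and region.start <= pos < region.end


# Toy data: one promoter, three peaks for the same factor (two identical).
prom = promoter("chr1", 10_000, "+")
peaks = [summit_window("chr1", 7_000, 120),
         summit_window("chr1", 7_000, 120),   # duplicate from another cell line
         summit_window("chr1", 9_980, 10)]    # extends past the TSS
unique_peaks = set(peaks)                     # remove duplicated peak regions
lnc_tfbs = sorted(p for p in unique_peaks if contains(prom, p))
snps = [("chr1", 7_100), ("chr1", 9_990)]
hits = [s for s in snps if any(snp_in(site, *s) for site in lnc_tfbs)]
```

Requiring full containment, rather than any overlap, mirrors the statement that TFBS had to be located entirely within the promoter region.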

We also obtained the position weight matrices (a commonly used representation of regulatory motifs) of 127 transcription factors from the JASPAR database (Mathelier et al., 2016), and predicted TFBS in the 5 kb promoter regions of the lncRNA genes (Supplementary Materials). By comparing the genomic coordinates of the predicted lncTFBS and SNPs, we identified 1 101 311 SNPs in the predicted lncTFBS (Table 1). We extracted the sequences of the lncTFBS from the reference genome as Ref-TFBS, and changed the reference allele in the Ref-TFBS to the alternative allele as Alt-TFBS. We next calculated the log-odds (LOD, $L$) score for each lncTFBS (Claverie and Audic, 1996). The LOD score change of the lncTFBS ($\Delta L$) was calculated as $\Delta L = L_{alt} - L_{ref}$, where $L_{ref}$ and $L_{alt}$ are the LOD scores of the Ref-TFBS and the Alt-TFBS, respectively. The larger the absolute value of $\Delta L$, the greater the influence of the SNP on TF binding properties. The same analysis pipeline was also applied to M. musculus, C. elegans, D. melanogaster and A. thaliana (Supplementary Materials).
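As a worked illustration of the score change, the sketch below computes a log-odds score as the sum over motif positions of log2(p/q), where p is the matrix probability of the observed base and q its background frequency, and then takes the difference between the alternative and reference sites. The uniform background, the toy three-position matrix and the function names are assumptions made for this example; the exact scoring details of Claverie and Audic (1996) may differ.

```python
import math

# Assumed uniform background frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def lod_score(site, pwm, background=BACKGROUND):
    """Sum of per-position log2(probability / background) for one site."""
    if len(site) != len(pwm):
        raise ValueError("site and matrix must have the same length")
    return sum(math.log2(column[base] / background[base])
               for base, column in zip(site, pwm))


def delta_lod(ref_site, snp_offset, alt_allele, pwm):
    """LOD(Alt-TFBS) - LOD(Ref-TFBS) for a SNP at `snp_offset` in the site."""
    alt_site = ref_site[:snp_offset] + alt_allele + ref_site[snp_offset + 1:]
    return lod_score(alt_site, pwm) - lod_score(ref_site, pwm)


# Toy matrix (probabilities per position, no zero entries).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# A C>T change at the second position of the site "ACG".
change = delta_lod("ACG", 1, "T", toy_pwm)
```

Here the SNP replaces the preferred base with a disfavored one, so the change is negative (log2(0.1/0.7)), indicating weaker predicted binding for the alternative allele.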

The spatial organization of genomes plays an essential role in the regulation of gene expression. Using the newly developed chromosome conformation capture technologies, the spatial organization of genomes is being explored at unprecedented resolution (Dekker et al., 2013). We downloaded genomic spatial contact data in five cell lines (H1, K562, HeLa, IMR90 and GM12878) from the literature (Jin et al., 2013; Ma et al., 2015; Sanyal et al., 2012). We extracted the spatial interaction partners of the 5 kb promoter regions of the lncRNA genes, and identified TFBS (splncTFBS) located in the regions in contact (overlapping by at least one base). By comparing the genomic coordinates of splncTFBS and those of SNPs, we found 268 205 SNPs in the human splncTFBS (Table 1). These SNPs might affect the transcription of lncRNA genes through long-range looping interactions. The same analysis pipeline was also applied to M. musculus and D. melanogaster (Supplementary Materials).

2.2 eQTLs of lncRNA genes

eQTLs have brought insights into the regulation of lncRNAs. Two studies have identified eQTLs associated with several hundred lncRNAs in two human populations by combining genotype data and lncRNA expression levels (Kumar et al., 2013; Montgomery et al., 2010). In total, 513 entries were recorded in LncVar (Table 1, Supplementary Materials).

2.3 Impacts of SNPs on lncRNA m6A modification

With the advent of high-throughput sequencing technology, a new immunocapturing approach, m6A-seq, has been developed for transcriptome-wide localization of m6A at high resolution. We obtained 32 human m6A-seq datasets, 22 mouse m6A-seq datasets and 1 Arabidopsis m6A-seq dataset (Batista et al., 2014; Dominissini et al., 2012; Fustin et al., 2013; Hess et al., 2013; Luo et al., 2014; Meyer et al., 2012; Wang et al., 2014; Zhao et al., 2014). We mapped the reads to the respective reference genomes using TopHat v2.0.9 (Trapnell et al., 2009), called peaks using MACS v2.1.0 (Zhang et al., 2008) and selected peaks with an enrichment score of more than 2. We next identified all possible m6A modification regions with a consensus sequence RRACH (Dominissini et al., 2012) in the peak regions within the lncRNA transcripts. We removed duplicated peak regions originating from the same cell line in different datasets, and replaced their enrichment scores with the median values. By comparing the genomic coordinates of the m6A regions with those of SNPs, we found 66 675 SNPs in the consensus sequence of m6A modification regions of human lncRNAs (Supplementary Materials). The statistics of SNPs in m6A regions in the M. musculus and A. thaliana genomes are shown in Table 1. These SNPs might affect the m6A modification of lncRNAs.
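The motif scan within a peak can be illustrated with a short regular-expression sketch. In IUPAC notation R stands for A or G and H for A, C or U, so RRACH corresponds to the pattern [AG][AG]AC[ACT] on a DNA-coded transcript. The function names are hypothetical, and the sequence is assumed to be given in transcript orientation with SNP positions expressed as 0-based offsets into it.

```python
import re

# Lookahead so that overlapping RRACH occurrences are all reported.
RRACH = re.compile(r"(?=([AG][AG]AC[ACT]))")


def rrach_sites(seq):
    """Return (start, end) offsets of every RRACH match in `seq`."""
    dna = seq.upper().replace("U", "T")
    return [(m.start(), m.start() + 5) for m in RRACH.finditer(dna)]


def snps_in_motifs(seq, snp_offsets):
    """Keep the SNP offsets that fall inside at least one RRACH match."""
    sites = rrach_sites(seq)
    return [pos for pos in snp_offsets
            if any(start <= pos < end for start, end in sites)]


peak_seq = "TTGGACTAAACATT"          # toy peak sequence
sites = rrach_sites(peak_seq)        # GGACT at 2-7, AAACA at 7-12
hits = snps_in_motifs(peak_seq, [0, 4, 11, 12])
```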

2.4 SNPs in micropeptides encoded by putative lncRNAs

We manually collected micropeptides from the literature that are reported to be encoded by putative lncRNAs and detected by several technologies (Banfai et al., 2012; Bazzini et al., 2014; Iyer et al., 2015; Ruiz-Orera et al., 2014; Slavoff et al., 2013; Wilhelm et al., 2014). We mapped the sequences of the micropeptides to NONCODE, and identified the lncRNA transcripts that could encode them. By comparing the genomic coordinates of SNPs and those of the potential micropeptide-encoding lncRNAs, we found 415 synonymous and 1009 non-synonymous SNPs (Supplementary Materials). Similar statistics for SNPs related to micropeptide-encoding transcripts in M. musculus, D. melanogaster and A. thaliana are shown in Table 1. We have integrated the positions of the SNPs, the position of the altered amino acid and the sequence of the altered micropeptide (if SNPs were non-synonymous) into LncVar.
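The distinction between synonymous and non-synonymous SNPs can be made concrete with a minimal sketch. It assumes the standard genetic code, an ORF given on the coding strand starting at its first codon, and a SNP given as a 0-based offset into that ORF; the function names are hypothetical and are not taken from the LncVar pipeline.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(orf):
    """Translate an ORF codon by codon, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def classify_snp(orf, offset, alt):
    """Return (kind, residue number, reference aa, alternative aa)."""
    frame = offset % 3
    codon_start = offset - frame
    ref_codon = orf[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, codon_start // 3 + 1, ref_aa, alt_aa


orf = "ATGGCTTAA"                       # toy ORF encoding Met-Ala-stop
silent = classify_snp(orf, 5, "C")      # GCT -> GCC, both alanine
missense = classify_snp(orf, 3, "A")    # GCT -> ACT, alanine -> threonine
altered_peptide = translate(orf[:3] + "A" + orf[4:])
```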

2.5 LncRNAs in CNVs as prognostic biomarker candidates of cancers

Genomic instability may cause CNVs, and some CNVs contribute to tumorigenesis. The expression levels of lncRNA genes located in CNV loci might be affected. To systematically identify lncRNA genes in CNVs as prognostic biomarkers of cancers, we obtained CNV regions of 30 cancers from the TCGA Copy Number Portal at Broad Institute, and RNA sequencing (RNA-seq) profiles across 30 cancer types from the TCGA. By comparing the genomic coordinates, we identified lncRNA genes located in these CNV regions and reannotated these lncRNA genes using the NONCODE IDs. We extracted the expression levels of these lncRNAs from the RNA-seq profiles, downloaded clinical data on the patients from the TCGA and performed survival analysis. Patient clinical data have been obtained in a manner conforming with IRB and/or granting agency ethical guidelines. We identified 732 lncRNAs as prognostic biomarker candidates (P-value < 0.05, log-rank test) and plotted Kaplan–Meier curves for these candidates (Supplementary Materials). These lncRNAs were located in 321 CNV regions, which might affect their expression (Table 1).
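For readers unfamiliar with the log-rank test, the following sketch implements the standard two-group version using only the Python standard library. How patients were divided into groups is not stated in the text, so the example simply assumes two pre-defined groups (for instance high and low expression of a lncRNA); the function name and the toy survival times are illustrative only.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value).

    `times_*` are follow-up times, `events_*` are 1 for death and 0 for
    censoring. The P-value uses a chi-square distribution with 1 d.f.
    """
    samples = ([(t, e, 0) for t, e in zip(times_a, events_a)]
               + [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in samples if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)  # at risk
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy cohort: every patient in group A dies before any patient in group B.
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                             [6, 7, 8, 9, 10], [1] * 5)
is_candidate = p_value < 0.05
```

With the threshold used above (P-value < 0.05), the toy lncRNA would be retained as a candidate; two groups with identical survival times give a statistic of zero.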

2.6 LncRNA genes involved in fusion events

To identify more lncRNA gene fusion events, we downloaded RNA-seq data of seven cell lines from the ENCODE project, and predicted lncRNA genes involved in fusion events using deFuse and FusionMap with the default parameters (Ge et al., 2011; McPherson et al., 2011). As this resulted in many false positives owing to the sequence similarity between genes and their pseudogenes, the results were filtered so that two genes with more than 95% sequence similarity were discarded from the analysis (Supplementary Materials). This yielded 908 putative lncRNA gene fusion events predicted by at least one method in all of the seven cell lines (Table 1). The involved lncRNA genes were reannotated using the NONCODE IDs. We also included supporting read counts for each predicted fusion event to assist users in obtaining confident results.
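The filtering and merging logic can be sketched as follows. The similarity measure used for LncVar is not described here, so the example substitutes the ratio from Python's difflib.SequenceMatcher purely as a stand-in; the gene names, sequences and read counts are invented for illustration, and keeping the larger read count when both callers report the same pair is likewise an assumption.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Stand-in similarity in [0, 1]; not the measure used by the authors."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def filter_fusions(calls, sequences, max_similarity=0.95):
    """Merge fusion calls from several callers and drop look-alike gene pairs.

    `calls` holds (gene_a, gene_b, supporting_reads) tuples; a pair is
    kept if at least one caller reports it and the two genes are not
    more than `max_similarity` similar.
    """
    kept = {}
    for gene_a, gene_b, reads in calls:
        if similarity(sequences[gene_a], sequences[gene_b]) > max_similarity:
            continue  # likely gene/pseudogene artifact
        pair = tuple(sorted((gene_a, gene_b)))
        kept[pair] = max(kept.get(pair, 0), reads)
    return kept


sequences = {"lncA": "ACGT" * 10, "pseudoA": "ACGT" * 10, "geneB": "TTGCA" * 8}
calls = [("lncA", "pseudoA", 12),   # identical sequences, discarded
         ("lncA", "geneB", 7),      # reported by one caller
         ("geneB", "lncA", 9)]      # same pair reported by another caller
fusions = filter_fusions(calls, sequences)
```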

3 Web interface

Data entries of m6A modification regions, micropeptides, CNVs and gene fusion events were designated systematically in LncVar. The entries of the same data type from one organism were numbered sequentially, starting with a symbol representing the data type and the organism. For example, ‘M6AMMU059693’ denotes an m6A modification region in a mouse lncRNA gene (the beginning ‘M6A’ stands for ‘m6A modification’ and the following ‘MMU’ stands for M. musculus). LncRNA gene IDs from NONCODE, SNP IDs from dbSNP and IDs designated by LncVar were used as primary keys to organize all the data tables in MySQL. The LncVar website was implemented on a LAMP (Linux + Apache + MySQL + PHP) server. The open-source JavaScript libraries jQuery.js and datatable.js were employed to display almost all the main data tables. LncVar works in all major internet browsers.
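The naming scheme can be illustrated with a small parser. The pattern below (a three-character data-type code, a three-letter organism code and a six-digit serial number) is inferred only from the two identifiers quoted in this article, ‘M6AMMU059693’ and ‘FUSHSA000001’, and should be treated as an assumption; the function names are hypothetical.

```python
import re

# Assumed layout: 3-character type code + 3-letter organism code + 6 digits.
ENTRY_ID = re.compile(r"^([A-Z0-9]{3})([A-Z]{3})(\d{6})$")


def parse_entry_id(entry_id):
    """Split an entry ID into (data type, organism, serial number)."""
    match = ENTRY_ID.match(entry_id)
    if match is None:
        raise ValueError(f"unrecognized entry ID: {entry_id!r}")
    data_type, organism, serial = match.groups()
    return data_type, organism, int(serial)


def make_entry_id(data_type, organism, serial):
    """Build an entry ID with a zero-padded six-digit serial number."""
    return f"{data_type}{organism}{serial:06d}"


parsed = parse_entry_id("M6AMMU059693")
rebuilt = make_entry_id("FUS", "HSA", 1)
```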

The LncVar website provides a user-friendly interface that allows users to search, browse and download data conveniently (Fig. 2). A quick-search box is provided on the homepage, and an advanced search page offers searching by IDs or genomic positions as well as batch searching by a list of IDs. Furthermore, a fuzzy search box at the upper right of each data table helps users find items of interest quickly.

Fig. 2

Overview of LncVar web interface. (A) Web page of SNPs associated with lncRNAs. (B) Web page of lncRNAs. (C) Web page of CNV regions. (D) Web page of lncRNA gene fusion events. (E) Detailed information of SNP in SplncTFBS. (F) Detailed information of SNP in lncTFBS obtained from ChIP-seq. (G) Detailed information of SNP in predicted lncTFBS. (H) Detailed information of SNP in m6A modification region. (I) Detailed information of eQTL. (J) Detailed information of SNP in micropeptide encoded by putative lncRNA. (K) Kaplan–Meier curve of lncRNA as prognostic biomarker candidate. (L) The flanking sequence of FUSHSA000001 fusion site

By clicking buttons on the navigation bar, users can browse the information of lncRNAs, SNPs, CNVs and gene fusion events. The lncRNA section contains information of lncRNAs from nine species. The basic information on each lncRNA includes lncRNA description, isoform and conservation in other species. Information on lncRNAs from H. sapiens, M. musculus, D. rerio, C. elegans, D. melanogaster and A. thaliana also includes associated variations. The SNP section provides the basic information on SNPs and their possible effects on lncRNA expression, modification and function. Users can get detailed information by clicking the ‘Details’ button (Fig. 2E–J). In the CNV section, we listed the lncRNAs that were identified as prognostic biomarker candidates for cancers, and plotted the Kaplan–Meier curve for each candidate (Fig. 2K). In the fusion section, we provided detailed information of each lncRNA gene fusion event as well as the flanking sequence of the fusion site (Fig. 2L).

We have provided a genome browser to display the information collected in the LncVar database (Fig. 3). The genome browser is also embedded at the bottom of each page to display the detailed information of each data entry. Users can view genomic positions of interest by entering text in the ‘chr:start.end’ format, such as ‘1:100.900’, in the search box at the top of the browser. Users can also drag the scale bar to adjust the scale of the genome view. Clicking an element in the genome browser opens a popup box with detailed information.

Fig. 3

Overview of the genome browser, including eight data tracks: Genome, LncRNA, SNP, m6A, Predicted TFBS, Peptide, CNV and ChIP-seq

4 Discussion

LncRNAs are pervasively transcribed in the genomes of various species. An increasing number of studies have revealed the significance of lncRNAs in many biological processes. Variations in the lncRNA gene loci or associated genomic sequences may affect their biological functions. Disease-associated variations could facilitate research into the biological functions of the lncRNAs. It is necessary to evaluate all the possible effects of variations on lncRNAs. In this study, we have systematically integrated transcription factor binding sites, m6A modification regions and ORFs of the lncRNAs, and identified all presently known SNPs in these regions. Furthermore, we have identified lncRNAs in CNVs as prognostic biomarker candidates of cancers and predicted lncRNA gene fusion events from RNA-seq data.

Genome-wide association studies (GWASs) have been widely used to discover common genetic variations contributing to normal and pathological traits and clinical drug responses. However, explaining the biological mechanisms behind these associations remains a major challenge. We obtained a list of published common genetic variations associated with traits from the GWAS Catalog (Welter et al., 2014). By comparing this list with the genetic variations in LncVar, we found that several hundred SNPs in LncVar have also been reported to be associated with various traits. For example, rs12615966 has been reported to be associated with pancreatic cancer (Low et al., 2010). This SNP is located in an intergenic region of chromosome 2, and the nearest neighboring gene is the long non-coding gene NONHSAG028805. Rs12615966 is located upstream of NONHSAG028805. We predicted the possible transcription factor binding sites upstream of NONHSAG028805, and found that rs12615966 was located in a predicted binding site of HIF1A. HIF1A has been reported to be associated with metastasis and poor survival in a variety of tumor types (Rankin and Giaccia, 2016). Another example is rs7148498, which has been found to be associated with amyotrophic lateral sclerosis (Xie et al., 2014). Rs7148498 is located in a predicted binding site of SOX10 upstream of NONHSAG015802. SOX10 has been reported to be associated with many neurological diseases (Bondurand and Sham, 2013). The predictions in LncVar thus provide insights into the targets of associations identified by GWASs. The LncVar database links lncRNAs with genetic variations and can be used as a resource to evaluate the effects of these variations on the biological functions of the lncRNAs.

In the future, lncRNAs are expected to attract increasing attention from researchers working in the fields of biology and medicine. The LncVar database will assist researchers in mining information on lncRNAs from the perspective of genetic variations. The number of reported lncRNAs is likely to grow rapidly and to cover an increasing number of species. LncVar will be kept updated and maintained as a helpful repository for lncRNA research. The current release of LncVar contains lncRNAs from nine species, and genetic variations in six of these species. In the future, more genetic variations in more species will be collected and integrated into LncVar. Experimentally validated effects of genomic variations associated with lncRNAs will also be collected.

Acknowledgements

We thank Geir Skogerbø for careful reading and valuable suggestions on the manuscript. We are grateful to the anonymous reviewers whose constructive critiques helped us improve our manuscript.

Funding

National Natural Science Foundation of China (31520103905) and National High Technology Research and Development Program (‘863’ Program) of China (2014AA021502).

Conflict of Interest: none declared.

References

Anderson
 
D.M.
 et al. (
2015
)
A micropeptide encoded by a putative long noncoding RNA regulates muscle performance
.
Cell
,
160
,
595
606
.

Banfai
 
B.
 et al. (
2012
)
Long noncoding RNAs are rarely translated in two human cell lines
.
Genome Res
.,
22
,
1646
1657
.

Batista
 
P.J.
 et al. (
2014
)
m(6)A RNA modification controls cell fate transition in mammalian embryonic stem cells
.
Cell Stem Cell
,
15
,
707
719
.

Bazzini
 
A.A.
 et al. (
2014
)
Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation
.
EMBO J
.,
33
,
981
993
.

Bondurand
 
N.
,
Sham
M.H.
(
2013
)
The role of SOX10 during enteric nervous system development
.
Dev. Biol
.,
382
,
330
343
.

Claverie
 
J.M.
,
Audic
S.
(
1996
)
The statistical significance of nucleotide position-weight matrix matches
.
Comput. Appl. Biosci
.,
12
,
431
439
.

Dekker
 
J.
 et al. (
2013
)
Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data
.
Nat. Rev. Genet
.,
14
,
390
403
.

Dominissini
 
D.
 et al. (
2012
)
Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq
.
Nature
,
485
,
201
206
.

Fustin
 
J.M.
 et al. (
2013
)
RNA-methylation-dependent RNA processing controls the speed of the circadian clock
.
Cell
,
155
,
793
806
.

Ge
 
H.
 et al. (
2011
)
FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution
.
Bioinformatics
,
27
,
1922
1928
.

Gong
 
J.
 et al. (
2014
)
lncRNASNP: a database of SNPs in lncRNAs and their potential functions in human and mouse
.
Nucleic Acids Res
.,
43
,
D181
D186
.

Guttman
 
M.
 et al. (
2009
)
Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals
.
Nature
,
458
,
223
227
.

Harper
 
J.E.
 et al. (
1990
)
Sequence specificity of the human mRNA N6-adenosine methylase in vitro
.
Nucleic Acids Res
.,
18
,
5735
5741
.

Hess
 
M.E.
 et al. (
2013
)
The fat mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain circuitry
.
Nat. Neurosci
.,
16
,
1042
1048
.

Hu
 
X.
 et al. (
2014
)
A functional genomic approach identifies FAL1 as an oncogenic long noncoding RNA that associates with BMI1 and represses p21 expression in cancer
.
Cancer Cell
,
26
,
344
357
.

Ingolia
 
N.T.
 et al. (
2009
)
Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling
.
Science
,
324
,
218
223
.

Ingolia
 
N.T.
 et al. (
2011
)
Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes
.
Cell
,
147
,
789
802
.

Iyer
 
M.K.
 et al. (
2015
)
The landscape of long noncoding RNAs in the human transcriptome
.
Nat. Genet
.,
47
,
199
208
.

Jendrzejewski
 
J.
 et al. (
2012
)
The polymorphism rs944289 predisposes to papillary thyroid carcinoma through a large intergenic noncoding RNA gene of tumor suppressor type
.
Proc. Natl. Acad. Sci. USA
,
109
,
8646
8651
.

Jin
 
F.
 et al. (
2013
)
A high-resolution map of the three-dimensional chromatin interactome in human cells
.
Nature
,
503
,
290
294
.

Kumar
 
V.
 et al. (
2013
)
Human disease-associated genetic variation impacts large intergenic non-coding RNA expression
.
PLoS Genet
.,
9
,
e1003201.

Li S., Mason C.E. (2014) The pivotal regulatory landscape of RNA modifications. Annu. Rev. Genomics Hum. Genet., 15, 127–150.

Luo G.Z. et al. (2014) Unique features of the m6A methylome in Arabidopsis thaliana. Nat. Commun., 5, 5630.

Low S.K. et al. (2010) Genome-wide association study of pancreatic cancer in Japanese population. PLoS One, 5, e11824.

Ma W. et al. (2015) Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat. Methods, 12, 71–78.

Mathelier A. et al. (2016) JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res., 44, D110–D115.

McPherson A. et al. (2011) deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Comput. Biol., 7, e1001138.

Meyer K.D. et al. (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell, 149, 1635–1646.

Montgomery S.B. et al. (2010) Transcriptome genetics using second generation sequencing in a Caucasian population. Nature, 464, 773–777.

Nakamura Y. et al. (2008) The GAS5 (growth arrest-specific transcript 5) gene fuses to BCL6 as a result of t(1;3)(q25;q27) in a patient with B-cell lymphoma. Cancer Genet. Cytogenet., 182, 144–149.

Ning S. et al. (2014) LincSNP: a database of linking disease-associated SNPs to human large intergenic non-coding RNAs. BMC Bioinformatics, 15, 152.

Norton N. et al. (2013) Gene expression, single nucleotide variant and fusion transcript discovery in archival material from breast tumors. PLoS One, 8, e81925.

Pandey G.K. et al. (2014) The risk-associated long noncoding RNA NBAT-1 controls neuroblastoma progression by regulating cell proliferation and neuronal differentiation. Cancer Cell, 26, 722–737.

Quinlan A.R., Hall I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.

Rankin E.B., Giaccia A.J. (2016) Hypoxic control of metastasis. Science, 352, 175–180.

Ren S. et al. (2012) RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant alternative splicings. Cell Res., 22, 806–821.

Rosenbloom K.R. et al. (2015) The UCSC Genome Browser database: 2015 update. Nucleic Acids Res., 43, D670–D681.

Ruiz-Orera J. et al. (2014) Long non-coding RNAs as a source of new peptides. Elife, 3, e03523.

Sanyal A. et al. (2012) The long-range interaction landscape of gene promoters. Nature, 489, 109–113.

Siepel A. et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res., 15, 1034–1050.

Slavoff S.A. et al. (2013) Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol., 9, 59–64.

Trapnell C. et al. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 25, 1105–1111.

Ulitsky I., Bartel D.P. (2013) lincRNAs: genomics, evolution, and mechanisms. Cell, 154, 26–46.

Wang X. et al. (2014) N6-methyladenosine-dependent regulation of messenger RNA stability. Nature, 505, 117–120.

Wang Y. et al. (2014) N6-methyladenosine modification destabilizes developmental regulators in embryonic stem cells. Nat. Cell Biol., 16, 191–198.

Wei C.M., Moss B. (1977) Nucleotide sequences at the N6-methyladenosine sites of HeLa cell messenger ribonucleic acid. Biochemistry, 16, 1672–1676.

Welter D. et al. (2014) The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res., 42, D1001–D1006.

Wilhelm M. et al. (2014) Mass-spectrometry-based draft of the human proteome. Nature, 509, 582–587.

Xie C. et al. (2014) NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res., 42, D98–D103.

Xie T. et al. (2014) Genome-wide association study combining pathway analysis for typical sporadic amyotrophic lateral sclerosis in Chinese Han populations. Neurobiol. Aging, 35, 1778 e1779–1778 e1723.

Zhang Y. et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol., 9, R137.

Zhao X. et al. (2014) FTO-dependent demethylation of N6-methyladenosine regulates mRNA splicing and is required for adipogenesis. Cell Res., 24, 1403–1419.

Author notes

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
Associate Editor: Janet Kelso
