VARAdb: a comprehensive variation annotation database for human

Abstract With the study of human diseases and biological processes increasing, a large number of non-coding variants have been identified and facilitated. The rapid accumulation of genetic and epigenomic information has resulted in an urgent need to collect and process data to explore the regulation of non-coding variants. Here, we developed a comprehensive variation annotation database for human (VARAdb, http://www.licpathway.net/VARAdb/), which specifically considers non-coding variants. VARAdb provides annotation information for 577,283,813 variations and novel variants, prioritizes variations based on scores using nine annotation categories, and supports pathway downstream analysis. Importantly, VARAdb integrates a large amount of genetic and epigenomic data into five annotation sections, which include ‘Variation information’, ‘Regulatory information’, ‘Related genes’, ‘Chromatin accessibility’ and ‘Chromatin interaction’. The detailed annotation information consists of motif changes, risk SNPs, LD SNPs, eQTLs, clinical variant-drug-gene pairs, sequence conservation, somatic mutations, enhancers, super enhancers, promoters, transcription factors, chromatin states, histone modifications, chromatin accessibility regions and chromatin interactions. This database is a user-friendly interface to query, browse and visualize variations and related annotation information. VARAdb is a useful resource for selecting potential functional variations and interpreting their effects on human diseases and biological processes.


INTRODUCTION
Recent advances in high-throughput technologies, such as DNA sequencing and genome-wide association studies (GWAS), have generated a flood of data associating variants with complex human diseases and phenotypes (1). These variants often lie far from known genes, presumably in transcriptional regulatory regions, such as distal enhancers, transcription factor (TF) binding sites, and accessible chromatin regions (2). Several studies have shown that disease-associated variants were significantly enriched in regulatory elements that are dispersed widely across the genome. Many variants such as rs1198588, rs4442975 and rs117480515 have been found to affect TF binding sites within regulatory regions and regulate gene expression (3)(4)(5). Furthermore, variants nearby or located in enhancers were found to disrupt enhancer-promoter loop resulting in disease and gene expression dysregulation (6)(7)(8)(9). For instance, rs4442975, a strong candidate for causality, flanks a transcriptional enhancer that physically interacts with the promoter of IGFBP5. Researchers have demonstrated that rs4442975 is associated with FOXA1 binding and the low expression of IGFBP5 increased breast cancer susceptibility. In colorectal cancer, risk SNP rs6983267 was found to increase TCF7L2 binding and enhancer activity to regulate c-MYC expression (10,11). Therefore, annotation and analysis of non-coding variants can help to explain GWAS results and understand the genetic structure of diseases. Importantly, integrating variations with genetic and epigenomic information, and long-range interactions will help to select causal regulatory variants and understand underlying regulatory mechanisms in biological processes.
In recent years, several databases and web tools have been developed to annotate non-coding variants based on genetic and epigenomic information (12)(13)(14)(15). HaploReg (12) and rSNPBase (15) are both useful databases for annotating cataloged variants. Different from HaploReg and rSNPBase, RegulomeDB can interrogate the regulatory information of novel variants by using a scoring system (13). These non-coding variant databases and web tools have provided effective platforms and available data for variant exploration. However, a flood of genomic datasets, such as enhancers, super enhancers, TFs, accessible chromatin re-gions and chromatin interactions, is accumulating rapidly, which promotes an urgent need to integrate and process the data for variants comprehensively and effectively. In particular, TFs from ChIP-seq data and motif changes of variants should be further provided. In addition, providing a strategy and regulatory network for variants based on genomic data will help researcher to prioritize potential functional variants in complex diseases. Fortunately, EN-CODE (16), the Roadmap Epigenomics project (17), and UCSC (18), as well as other comprehensive data sources, have uncovered the landscape of massive regulatory elements in the genome. In addition, chromosome conformation capture (3C)-based technologies, Hi-C (19)(20)(21), ChIA-PET (22,23) and 5C (24), have provided increased datasets of chromatin interactions and revealed patterns on how regulatory elements regulate the expression of their target genes (20,25,26). Together, to integrate, prioritize and analyze variation-associated information, building a human variation annotation database is necessary. VARAdb will help to explore the regulatory mechanisms of variations and discover casual variations, which may accelerate the development of variation research.
To this end, we developed a comprehensive VARiation Annotation database for human (VARAdb, http://www. licpathway.net/VARAdb/), which is focused on providing a large number of variations and annotating their potential roles with various regulatory information, in particular, non-coding variants are considered. The current version of VARAdb cataloged a total of 577 283 813 variations and provided five annotation sections: 'Variation information', 'Regulatory information', 'Related genes', 'Chromatin accessibility' and 'Chromatin interaction'. The genetic and epigenomic data involved motif changes, risk SNPs, linkage disequilibrium (LD) SNPs, eQTLs, clinical variant-druggene pairs, sequence conservation, somatic mutations, enhancers, super enhancers, promoters, TFs, chromatin states, histone modifications, ATAC accessible regions and chromatin interactions. Moreover, the database collected two types of variation-related genes: (i) variation related genes based on whether the variation is located in an enhancer or not (if it is, the variation may associate with enhancer target genes predicted by Lasso method) and (ii) variation-related genes based on distance including overlapping genes, proximal genes and the closest gene. In addition, to perform the in-depth analysis of variations, VARAdb was designed to prioritize variations based on score, annotate novel variants and provide pathway downstream analysis. VARAdb is a user-friendly database to query, browse and visualize information associated with variations. We believe that VARAdb could become a useful and effective platform for exploring potential functions and regulation of variations in human diseases and biological processes.

Annotation of variation
We collected and integrated massive data of genetic and epigenomic information as well as chromatin interactions from various data sources, and divided annotation information into five sections including 'Variation information', 'Regulatory information', 'Related genes', 'Chromatin accessibility' and 'Chromatin interaction' (Figure 1). We describe these sections in detail below.
UCSC browser and used the bigwigAverageOverBed tool to measure the conservation of each variation (38).

Motif changes.
Predicting of the effects of variations on the binding of TFs is important to transcriptional regulation. The OncoBase (39) database uses motifbreakR (40) to measure the effects of somatic mutations on TF binding motifs that consists of 2 817 position weight matrices (PWMs) of TFs from four resources including ENCODE (16), FactorBook (41), HOCOMOCO (42) and Homer (43). To score and predict the effects of variation in our database, we added the motifbreak results from OncoBase to VARAdb.

Variant-disease associations.
DisGeNET contains the largest publicly available collections of genes and variants associated to human disease, which integrates data from expert curated repositories, GWAS catalogues, animal models and the scientific Iiterature. We obtained 210 498 variant-disease associations between 117 337 variants and 10 358 diseases, traits and phenotypes from DisGeNET v6.0 (47).

Regulatory information.
We used BEDTools (v2.25.0) (48) to annotate the variations with regulatory information. This annotation information involved four categories: super enhancers; enhancers; promoters, TFs identified by ChIP-seq, chromatin states and histone modifications. The details are described below.

Super enhancers.
Super enhancers are large clusters of enhancers with a higher degree of enrichment for TFs, higher levels of transcription and stronger cell-type specificity (49). We obtained super enhancers involving 542 H3K27ac ChIP-seq samples from a SEdb database that was developed by our group (50). First, we collected H3K27ac ChIP-seq and their corresponding input control sequencing data from ENCODE (16), Roadmap (17), NCBI GEO/SRA (51), and Genomics of Gene Regulation Project (GGR) (16). Second, we used H3K27ac ChIP-seq raw fastq files as input to identify super enhancers controlling normalization and consistency across various data resources. Importantly, we learned and used the streamlined pipeline of Bowtie-MACS-ROSE, which was developed by Loven et al. (52). In detail, we ran Bowtie (53) to align all raw sequencing reads for each H3K27ac ChIP-seq file, and MACS (54) to call all peaks for the consistency of calling peaks. Then, we used the ROSE programs to identify super enhancers by stitching peaks (52).

TFs identified by ChIP-seq.
To validate the effects of variations on TF binding, we focused on TFs that were identified by ChIP-seq, which has been reported to be effective for validating the relationships between TFs and regulatory regions. From the TRlnc database that was also developed by our group, we obtained TFs from 7 734 ChIP-seq samples across 467 sample types from 44 tissues (61). We collected a total of 952 TFs from ENCODE (16), Remap (62), Cistrome (63), ChIP-Atlas (64) and GTRD (65). In VARAdb, we also considered the classifications of TFs including superclass, class, family and subfamily from TF-Class database (66).

Promoters.
We considered two types of promoters. One type was defined as 2 kb upstream and the other type was 1 kb downstream of TSSs. We determined the first type of promoters based on the basic gene annotation file of release 33 from GENCODE (68). The other was determined based on the promoter relevant states from ChromHMM core 15 states including 1 TssA (Active TSS) and 2 TssAFlnk (Flanking Active TSS).
Related genes. To enable the identification of related genes, eQTL genes from available resources and risk SNPassociated genes from GWAS were first considered because of their importance to human complex diseases and common phenotypes. We also considered another two strategies to obtain variation-related genes. One was the ROSE (52) method for predicting related genes based on distance, including overlapping genes, proximal genes, and the closest gene. The other was the Lasso algorithm for predicting enhancer target genes. We downloaded the relevant results based on the Lasso algorithm from a study by Cao et al. (69). When variations were located in the enhancers, the target genes of the enhancers were considered to be related to those variations.
Chromatin accessibility. Accessible chromatin regions can be identified using ATAC-seq and DNase-seq and have enriched multiple regulatory elements and variations. Regulatory elements located in the accessible chromatin regions were reported to associate with the regulation of the distal gene resulting in heterogeneity. We cataloged the ATAC-seq accessible regions of 99 samples from Cistrome (63) and 23 cancer types from TCGA (44), as well as DNase-seq accessible regions of 243 samples from ENCODE (16).
Chromatin interaction. Regulatory elements, such as enhancers and super enhancers, are anchored to the promoter regions of genes via chromatin looping to affect gene transcription. In VARAdb, we collected chromatin interaction data from three databases. We downloaded human chromatin interaction data of five types of experiments (Hi-C, ChIA-PET, IM-PET, 3C, 4C and 5C) from 4DGenome (70), ChIA-PET narrow peaks of five cell types (MCF-7, HeLa-S3, K562, NB4 and HCT116) from ENCODE (16), and chromatin interactions identified by the EpiTensor algorithm from OncoBase (39).

Scoring strategy
In VARAdb, each variation was scored based on its annotated records on nine annotation categories: risk SNP, eQTL, motif change, conservation, enhancer/super enhancer, promoter, TF binding, ATAC accessible region and Hi-C. For a variation, we first organized its annotated records on different categories. Then we calculated the score of the variation, which is defined as follows: where Score(i) ranges from 0 to 9, which is the number of categories the variation i is associated with. If variation i is associated with annotation category k, then the category k (i) is 1. Otherwise, the category k (i) is 0. The scoring strategy can help users to prioritize and filter the variations that are studied in their work.

A search interface for retrieving variation
VARAdb enables users to search, browse, analyze, visualize, and download variations of interest (Figure 2A). On the 'search' page, VARAdb provides five query methods for searching variation information. These include 'Search by rsID or location' (input rsIDs or a location of interest), 'Search by enhancer' (input enhancer location and select an enhancer type), 'Search by TF' (input a TF), 'Search by disease/trait/phenotype' (input disease/trait/phenotype name) and 'Search by gene' (input a gene) ( Figure 2B). On the 'Search by rsID or location' results page, users can obtain a summary  Figure 2D) and click on 'rsID' the of summary table to view the details about each variation ( Figure 2E). On the details page, VARAdb displays the variation overview, statistics charts, variation-gene-enhancer network and five annotation sections, these include: (i) regulatory information; (ii) variation information; (iii) related genes; (iv) chromatin accessibility and (v) chromatin interaction. In VARAdb, the different annotation sections have various annotating modules in VARAdb. For example, in the 'Related genes' section, users can select two modules to see genes associated with the variation ( Figure 2E). Importantly, users can download the data table of interest on the details page by clicking on the 'download' button.

User-friendly browsing
To allow users to browse all variations more conveniently and quickly, we have designed two modules 'Browse all variations' and 'Browse risk SNPs'. The 'Browse' page is an interactive and alphanumerically sort  Figure 2F). For each risk SNP, the score ranges from 6 to 9 and was calculated based on annotation categories as previously described above. The 'Show entries' drop-down menu can change the number of records per page. In addition, users can view the details of risk SNPs by clicking on 'rsID'.

Data visualization
VARAdb exhibits different charts about data distribution and annotation statistics on the web interface. The relationships between variations and proximal genes, as well as between enhancers and proximal genes, are shown using a D3 network visualization plugin ( Figure 2E). In addition, users can view regulatory information of variations in the genome by using the 'Genome-Browser' page. Through selecting useful tracks, users can further obtain detailed regulatory information, including common SNPs, conservation score, enhancers, nearby genes, conserved TF binding sites, TFBS by ChIP-seq, genome segments, ATAC accessible chromatin regions and DNase I hypersensitive sites (Figure 2G).

Statistics and download
On the 'Statistics' page, users can see can see digital and graphical statistical displays in the VARAdb ( Figure  2H). The 'Download' page exhibits 'Variation information', 'Regulatory information', 'Related genes', 'Chromatin accessibility' and 'Chromatin interaction' data for users to download. Moreover, the detailed descriptions of files are also displayed.

Online analysis tools
Novel variant annotation. Recent advances in low-cost, high-throughput DNA sequencing have expanded genetic content and resulted in dramatically increased numbers of variants in the human genome. Most of the newly identified variants are non-coding variants, which are not collected and annotated in time. Moreover, biologists need to obtain various genetic and epigenomic information of the novel variants they want to explore. To strengthen the practical application of novel variants, we have designed 'Novel variant annotation' to provide five annotation sections, where the information includes accessible chromatin, chromHMM state, histone modification, TF, chromatin interaction, super-enhancer, promoter, gene, eQTL, GWAS, somatic mutation and SNV ( Figure 2I; Figure 3A, B). User can not only manually input many novel variants (one variant with chrN position ref allele alt allele per line) but also upload a set of variants in a text file or a VCF file. For example, we have provided annotation information of a TERT promoter mutation (chr5:1295228:G>A) which is currently the best established example of recurrent cis-regulatory mutation found across various cancers (71). Users can first click 'New variant annotation' in 'Analysis', then input the genomic location (chr5:1295228) and the allele of the mutation (G>A), and select one annotation type (the default: Super enhancer) to start. On the results page, VARAdb not only displays the basic information about the mutation enriched in many super enhancers. In addition, users can directly select to see other annotation results by clicking on the nodes of the annotation image. VARAdb can help users investigate the potential functions of the novel variant according to annotation results with a friendly web interface.
Pathway downstream analysis. Regulatory elements are often enriched in many non-coding variants and play key roles in cancers. Especially, enhancers/super enhancers were reported to concentrate in multiple signaling pathways at key genes (72). The alterations of signaling pathways cover terminal DNA-bound TFs (73). In VARAdb, users can input genes of interest and select at least one pathway database (ten pathway databases including KEGG, Reactome, Net-Path, WikiPathways, PANTHER, PID, HumanCyc, CTD, SMPDB, and INOH) to perform pathway downstream analysis based on the complex regulatory networks (74)(75)(76)(77)(78)(79)(80)(81)(82)(83)(84). Users can not only manually input genes (one gene symbol per line) but also upload a gene set in a text file. In addtion, VARAdb provided an apparent introduction to the function at the bottom of the database. When analyzing, VARAdb will provide different indications explaining why there are no results. After submitting, VARAdb will identify significantly enriched pathways, downstream TFs, common SNPs located in TF binding regions, and common SNPassociated genes ( Figure 2J, Figure 3C). The enriched pathways are calculated using a hypergeometric test with the following formula: where the P value is the enrichment significance of the pathway, n genes are in the entire genome, and s genes of interest are provided, of which m genes are involved in the pathway containing k genes. In addition, the FDR is also calculated to correct for multiple testing. VARAdb will show the output table containing summary information of enriched pathways including Pathway ID, Pathway name, Pathway source, Annotated gene, Annotated gene number, Total gene number, TF and TF number, P value, and FDR. Furthermore, the detailed description of the regulatory networks can be obtained by clicking the 'Detail' button (Figure 3D, E). Users may understand the roles of SNPs in the biological process based on affected TF motifs.

Case study
To illustrate the usage of VARAdb, we searched the database with a well-studied variation rs2279590 (chr8:27456253:T>C, score = 6) by inputting the 'rsID' or its genomic location (Supplementary Figure S1A). The variation was associated with age-related disorders such as Alzheimer's disease (AD) and type 2 diabetes (85). We will validate the key predictions of the variation using VARAdb by collecting experimental data from studies. Notably, after clicking 'rsID' in the output table that first displayed the summary annotation information (Supplementary Figure S1B), the details page exhibited an overview of rs2279590, variation-gene-enhancer network, and five annotation sections (variation information, regulatory information, related genes, chromatin accessibility and chromatin interaction) (Supplementary Figure S1C). From 'Variation information' and 'Related genes', as a Risk SNP, rs2279590 was annotated as a genetic risk factor of AD and reported to regulate gene expression of CLU, which was consistent with studies of GWAS information (86,87). We found that CLU was the related gene of rs2279590 because CLU was not only the closest gene of rs2279590 but also an eQTL gene (Supplementary Figure S1C, left panel), which was validated by the experimental results of Padhy et al. (85). From 'Regulatory information', we found that rs2279590 was located in an enhancer of AD. Moreover, rs2279590 was enriched in super enhancers where CLU The results page provides general details of the enriched pathway, pathway-associated network, and information about common SNPs affecting TF binding and its proximal genes. The pathway-associated network is formed by pathway, TFs, common SNPs and common SNP-associated genes.
is the closest active gene. In addition, rs2279590 was associated with many histone marks such as H3K4me1 and H3K27ac as well as accessible chromatin regions (Supplementary Figure S1C, middle panel), which are in line with some studies (85). More importantly, the relationship between CLU and rs2279590-associated super enhancers was further validated by chromatin loops according to Hi-C and ChIA-PET data in the 'Chromatin intersection' (Supplementary Figure S1C, right panel). The relationship between variation and TF is crucial to transcriptional regulation and worth detecting. From motif change data of 'Variation information', rs2279590 was predicted to strongly affect the binding of three TFs including NFATC1, RFX1 and SREBF1. These TFs were regulated by other functional features such as receptors associated with AD (88-90) (Supplementary Figure S1C, left panel). For instance, N-methyl-D-aspartate receptors open channel blocker memantine is used to treat AD, which correlates with a reduction in T-cell receptor (TCR)-induced Ca(2+) mobilization and nuclear localization of NFATC1 (89). From the TF ChIP-seq data of 'Regulatory information', VARAdb displayed rs2279590-associated TF binding information based on ChIP-seq data, which is an effective high-throughput experiment strategy for validating TF binding results (Supplementary Figure S1C, middle panel). We found that 14 out of 26 rs2279590-related TFs were reported to be associated with AD, these were CEBPA, E2F6, EP300, ERG, ETS1, FOS, GATA2, IKZF1, NR4A1, RUNX1, SPI1, SPIB, TCF7L1 and WT1 (91)(92)(93)(94)(95)(96)(97)(98)(99)(100)(101)(102)(103)(104). For example, lower expression of TF SPI1 reduces AD risk by regulating myeloid gene expression and cell function (96). We also found that TF MAX was associated with AD but not reported by in existing studies. MAX was enriched in AD-related Gene Ontology terms such as DNA binding and protein binding based on relevant results from EN-SEMBL (105). In addition, these 14 TFs can be used to identify rs2279590-related pathways. After performing the 'Pathway downstream analysis', the output table showed that the 'ERK cascade', 'Glucocorticoid receptor regulatory network', and 'Regulation of Androgen receptor activity' were significantly enriched. Above all, these results showed the usefulness and value of VARAdb in exploring the potential function and regulation of variation.
Risk SNP rs6983267 (chr8:128413305:G>T, score = 6) was used as another example to access VARAdb. The variation is associated with colorectal cancer pathogenesis. After searching rs6983267, users can obtain various annotation results in the details page (Supplementary Figure S2A, B). We found that rs6983267 was predicted to affect the binding of TF JUND, which was further validated in HCT116 (sample type: cell line, tissue: colon) by TF ChIP-seq data from 'Regulatory information' (Supplementary Figure S2C, left panel). In colorectal cancer, rs6983267 was associated with MYC. Indeed, VARAdb identified MYC as a variationrelated gene and enhancer-related target gene, which was consistent with previous studies (10,11). Notably, Wright et al. has presented evidence that the risk region harboring rs6983267 physically interacts with the MYC oncogene in colorectal cancer and rs6983267 binds to TCF7L2 (10). In addition, an in vivo study by Sur et al. has reported that deletion of the enhancer region, including rs6983267, re-sults in resistance to intestinal tumors (106). Interestingly, we found that rs6983267 was not only located in some super enhancers and typical enhancers identified by H3K27ac ChIP-seq data in colon relevant tissue and cell type, but also in many chromatin regions with the H3K4me1 mark (Supplementary Figure S2C, middle panel), as well as accessible regions in cholangiocarcinoma from ATAC-seq data of 'Chromatin accessibility' (Supplementary Figure S2C, right panel). A study by Wright et al. validated the predictions involving rs6983267, where this variation had enhancerrelated histone marks such as H3K4me1 and was located in enhancers that form a 335-kb chromatin loop to interact with the c-MYC promoter (10). In addition, we also found that rs6983267 was associated with PCAT1 which was not reported in existing studies. In VARAdb, PCAT1 is an overlapping gene of rs6983267, which was reported to correlate with MYC (107). Moreover, GWAS data and other studies (10,11) showed that MYC was a rs6983267-associated gene. Therefore, these results suggest that VARAdb could explore the potential function and regulation of genes that are related to the variation.
The research community can access information freely in the VARAdb database without registration or logging in. The URL for VARAdb is http://www.licpathway.net/ VARAdb/.

DISCUSSION
Expanding the functional datasets collected and integrating them across more effective resources will improve the functional predictions of non-coding variants in human complex diseases and phenotypes (108). The genetic and epigenomic information, as well as chromatin interactions, are important for interpreting the roles of non-coding variants enriched in various regulatory elements (109). Compared with the existing databases (12,13,15,109), VARAdb focuses on the comprehensive annotation of a large number of variations and massive regulatory annotation information. VARAdb not only provides annotation for the cataloged variations but also the novel ones. The current version of VARAdb cataloged a total of 577 283 813 variations which are ∼three times more than similar databases and provided five annotation sections including 'Variation information', 'Regulatory information', 'Related genes', 'Chromatin accessibility', and 'Chromatin interaction', with significantly more information than similar databases (Table  1). f FANTOM5 enhancers proximal genes were predicted by the ROSE method and divided into three types: the closest, proximal and overlapping. g Variation proximal genes were also predicted by the ROSE method and divided into three types: the closest gene, proximal genes and overlapping genes. h Accessible regions predicted by ATAC-seq were obtained from Cistrome and TCGA.
VARAdb has rich annotations, scoring strategy, regulatory element information and useful analysis tools. Furthermore, VARAdb provides a user-friendly interface to browse, search, analyze, and visualize information about variations. Table 1, which compares VARAdb with other databases for information and functions, shows the advantages of VARAdb. We provide: (i) the annotation of 577 283 813 variations; (ii) the scores for 577 283 813 variations (range from 0 to 9) based on the annotated records of nine annotation categories: risk SNP, eQTL, motif change, conservation, enhancer/super enhancer, promoter, TF binding, ATAC accessible region and Hi-C; (iii) five annotation sections of variation, each of which contains detailed annotation information; (iv) online analysis tools including 'Novel variant annotation' and 'Pathway downstream analysis'; (v) user-friendly visualization of genomic information of vari-ants linking to the UCSC genome browser by adding multiple tracks; (vi) five search paths to access variation; (vii) user-friendly browsing; (viii) detail pages containing different charts for relationships between variations, risk SNPs, enhancers, genes and TFs.
VARAdb supports annotating the cataloged and novel variations in many categories. In the process of developing the database, we considered this matter. Because there are many data sources, we designed multiple sections and data modules for effectively managing updates in a modular way. This will effectively ensure the controllability of data and improve the efficiency of updating the database. Specifically, we designed five annotation sections: 'Variation information', 'Regulatory information', 'Related genes', 'Chromatin accessibility', and 'Chromatin interaction'. For each section, we divided many sources of the same or similar characters into a group, which is an independent module. In other words, each module contains several data sources shown by a drop-down menu in the database. Because the stored information is based on the module of five sections, we can control the quality of data and ensure that VARAdb is user-friendly and effective. We developed a workflow for updating the database (Supplementary Figure S3). When updating each data source, the quality of data will be controlled according to the modular process. If a series of requirements are met, the data source will be updated and the score of variation may be also calculated again (Supplementary Figure S3). In the future, the sources of the annotation data may accumulate rapidly. Therefore, we will continue to collect available resources, process these data to enrich the regulatory information and provide a new scoring strategy based on TF binding sites, eQTLs, enhancers and super enhancers, ATAC accessible regions and Hi-C considering tissue/cell-specific features. Moreover, we will add new methods of analysis in VARAdb such as upstream analysis. VARAdb will help users to predict the function of variation and provide perspectives of variation regulation.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.