RTPDB: a database providing associations between genetic variation or expression and cancer prognosis with radiotherapy-based treatment

Abstract In recent years, lots of studies have reported the relationship between genetic variation or expression and cancer prognosis with radiotherapy-based treatment. However, due to limitation in available journals or literature database, inconsistent nomenclature system of genetic variation and cancer and time-consuming investigation on literature searching and reading, considerable researches could hardly get found and cited. In this study, we constructed the Radiotherapy Prognosis Database (RTPDB), which contains a comprehensive resource about genes and related cancer prognosis. It included 775 studies, which consist of 275 Single Nucleotide Polymorphism (SNP) studies with 59 765 patients, 261 genes, 708 SNPs, 16 tumors and 16 treatment types, and 500 expression studies with 55 751 patients, 264 genes, 27 tumors and 15 treatment types. The names of genes and their variants were converted and displayed in the form of the official symbol. The detailed information of the tumor, treatment and prognosis were classified. We hope RTPDB will be a useful resource with great potential for researches on genes, variants and cancer prognosis.


Introduction
Radiotherapy is a common treatment for cancer today, alone or in combination with other treatments. According to the American Society of Radiation Oncology, >60% of cancer patients will receive radiotherapy-radiotherapy using high-energy radiation to shrink tumors and kill cancer cells-contributing to 40% of curative treatment for cancer (1). When the DNA of a cancer cell is damaged by radiation, it will stop dividing or it will die, and then it will be eliminated by the immune system. As surgery will inevitably remove normal tissues, chemotherapy has its drawbacks in unavoidably killing normal cells. Radiotherapy not only kills cancer cells but also affects normal cells around the cancer cells, and further leads to side effects. Based on the time of occurrence, it can be divided into acute and late side effects. Various cancer patients have different sensitivity to radiotherapy. In general, high sensitivity to radiotherapy and mild side effects are important for long-term survival (2).
Analogous to pharmacogenomics, the term radiogenomics is used to explain the differences in radiotherapy response between individuals. In 2009, a Radiogenomics Consortium was established to facilitate and promote multi-center collaboration of researchers linking genetic variation with response to radiotherapy (3). It might lead to improved decision making, and as a result, improved patient outcomes (4). During the past years, thousands of researches have studied the relationship between genetic variation or expression and patient prognosis who received radiotherapy-based treatment. Most of them focus on genes which take part in cell growth, differentiation, proliferation, and apoptosis, such as XRCC1 (5) and TP53 (6). Unlike pharmacogenomics, radiogenomics studies now lack a database similar to PharmGKB, which is responsible for the aggregation, curation, integration and dissemination of knowledge regarding the impact of human genetic variation on drug response (7). Due to the relatively small number of radiogenomic studies compared to the pharmacogenomics and the dispersion of the literature, the advances in radiogenomics have been hindered. Most of the researches could hardly get found and cited. Therefore, a high-quality resource platform with unlimited use, standard nomenclature and convenient searching process is believed to be of great value in the understanding of gene variants or expression and cancer prognosis under radiotherapy.
In this paper, we describe the Radiotherapy Prognosis Database (RTPDB), a comprehensive online database established to collect the associations between genetic variants or expression and cancer prognosis of patients who received radiotherapy-based treatment and were documented in biomedical literature. It is the first database for genes and related cancer radiotherapy prognosis. The database offers exciting opportunities for scientists and clinicians to better explore the overview of the relationship between genes and related cancer radiotherapy prognosis. In addition, it will be helpful for researchers to understand the mechanism of cancer prognosis with radiation treatment. The RTPDB can be publicly accessed from http://www.rtpdb.com/.

Software design and implementation
In RTPDB database, all data sets were organized in our web server using the client-server model based on Python, Django, JavaScript and PostgreSQL. The database is available at http://www.rtpdb.com/. RTPDB contains pages for searching, browsing and downloading.

Literature searching and inclusion
In addition to the common database such as China Knowledge Infrastructure, PubMed, Web of Science, EMBASE and Google Scholar, we also conducted a detailed manual review for the references of the studies included in the database by two different authors (C.D.Z. and Y.Y.). Articles published before August 2018 were searched with a combination of mesh term and keywords as follows: Single Nucleotide Polymorphisms, Gene Expression Neoplasms, Radiotherapy and Prognosis. No restriction on publication language, year or geographic region was imposed. All eligible studies were retrieved with the PDF file. The literature was included according to the following criteria: (1) studying the relationship between gene variants or expression and cancer prognosis, (2) patients received radiotherapy with or without other treatments and (3) using one of the statistics that Odds Ratio (OR), Hazard Radio (HR) and Risk ratio (RR) and their 95% Confidence Interval (CI) to evaluate the relationship.

Alias conversion for genes and variants
We converted all the gene symbols into the official symbols with NCBI Gene database (8) and used Gene ID to get the official full name, gene type, alias, summary and gene pathways provided by MyGene.info (9). For variants, the HGVS (Humane Genome Variation Society) name or else were converted into dbSNP RS ID with Google search, SNPedia (10) and NCBI dbSNP database (8). The allele frequency of each SNP in African, Ad Mixed American, East Asian, European and South Asian were collected from 1000 Genomes (11).

Classification for treatment, tumor and prognosis
For treatment, we first confirm the type of treatment the patients received, for example, radiotherapy, chemotherapy, surgery and hormone therapy. Then, we need to confirm whether all patients received the above treatments. Finally, the treatment could be classified into 'Radiotherapy +/± Chemotherapy +/± Surgery +/± Hormone therapy', which means all patients received radiotherapy and all (±)/partial (±) patients received other treatments.

Data extraction
Aside from the above information, we also included the following data: patients' clinical information including total patient number, age (median, mean and range), sex ratio, ethnicity, tumor stage and patient number in each stage, study's basic information including publication data (year, type and journal), language, abstract and the relationship between genetic variants or expression and prognosis. In expression studies, we also collected the detection method and cut-off value of gene expression. All data were manually curated and collected. Some studies do not provide the complete information, so you may get 'Not Provided' in some fields.

Calculating OR
Partial studies use Fisher's exact test to evaluate the association between gene variants or expression and cancer prognosis. Compared with Fisher's exact test, OR could quantify the association with 95% CI. We use data in 4-fold

Gene Ontology and Pathway Enrichment Analysis
After collecting the studies matching the criteria of our study, we extracted all the significant genes (SNPs were annotated by their genes) for further analysis. Gene ontology and pathway enrichment analysis were performed on DAVID 6.8 (https://david.ncifcrf.gov/). The enrichment P values of both GO (Gene Ontology) and pathway enrichment analyses were set as significant when P < 0.05. GO analysis and pathway analysis result table can be accessed in the Download page of the website.

Results and discussion
The RTPDB web interface The data in RTPDB can be easily accessed from http://www. rtpdb.com/. First, users need to choose 'Search SNP' or 'Search Expression' at homepage (Figure 1). Second, enter their interested gene, variant and tumor and click 'Search'. Then click the literature title and the corresponding result will be shown on the result page. In the result page, users can acquire study information, clinical information of patients, treatment types, allele frequency of SNP and the relationship between genetic variants or expression and prognosis (OR/HR/RR with 95% CI). Besides, when you search 'Expression', the RTPDB provides the detection method and cut-off value of gene expression. The database also provides hyperlinks to original references for each included study. The website is compatible with Google Chrome, and we highly recommend using Google Chrome for RTPDB.

Data included in the database
The literature search yielded >18 000 publications. To meet the need of RTPDB construction, we selected literatures that provide the association between gene variants or gene expression and cancer prognosis of patients who received radiotherapy-based treatment. More importantly, the associations were evaluated by OR, HR and RR with 95% CI. After filtering, the studies unable to meet the inclusion criteria were excluded based on title, abstract or method and result.
In  Table 1). Database included 115 516 patients which consist of 59 765 patients in variant studies and 55 751 patients in expression studies. The number of patients in most studies is <200 (Figure 4). A study with 150 patients will only have a power of <30% to detect the association (12). It is unlikely to get a more comprehensive result with a small sample size, so further researches should include more patients.   Top 10 tumors in SNP and Expression are shown in Figure 5A and B, respectively. Breast cancer, lung cancer, esophageal cancer and nasopharyngeal cancer are the most studied cancers, which is basically consistent with the morbidity and mortality in many reports of cancer statistics. A total of 275 variant studies included 261 genes and 708 SNPs, and 500 expression studies included 264 genes. Interestingly, only one of the 700 SNPs is located in the intergenic region. Most studies have paid attention to genes that take part in the important pathways such as cell cycle. However, the intergenic region of a genome may have influences on the function of genes. Top 10 SNPs and genes which have been studied are shown in Figure 5C and D, respectively.
Less than one-fifth of studies use radiotherapy alone. Most of them choose radiotherapy combination with chemotherapy, surgery and hormone therapy ( Figure 6, Supplementary Tables 2 and 3). Whether before or after surgery, radiotherapy will shrink the tumor so that the risk of recurrence will be reduced. Each relationship contains four major items, which are tumor, gene with variant or expression, prognosis with endpoint and OR/HR/RR with 95% CI. RTPDB consists of 2608 relationships between SNPs and prognosis and 1874 relationships between gene expression and prognosis.
The significant genes were combined together and input to DAVID web server. Top 15 biological process (such as DNA repair, response to X-ray), cellular component (such as nucleoplasm, replication fork) and molecular function (such as damaged DNA binding, double-stranded DNA binding) were plotted in Supplementary Figure 1. For pathway analysis, we excluded pathways associated with specific tumors and left those pathways that are more instructive for biological processes, such as VEGF signaling pathway, T cell receptor signaling pathway, etc. The detailed figure was shown in Supplementary Figure 2.

Conclusion
Thousands of studies have reported the relationship between SNPs or gene expression and cancer prognosis of patients who received radiotherapy-based treatment. In this  article, we developed the RTPDB database to collect and curate the associations. The number of entries in RTPDB is not very large. That is because of the follow up of cancer patients after treatment is time-consuming. However, many researchers began to realize the importance of gene variants and expression in cancer prognosis. The prognosisrelated gene and variant may become the biomarker for personalized treatment. With the development of studies, more cancer prognosis-related genes and variants are expected to be published and included into RTPDB. The purpose of RTPDB is to provide comprehensive resource about the association between gene variants or expression and cancer prognosis.
Next step, we plan to update RTPDB every 2 months with new published studies, and other types of variants will be included if the number of studies is enough. Meanwhile, the open access data related to radiotherapy in The Cancer Genome Atlas, Gene Expression Omnibus and Sequence Read Archive will be integrated into the RTPDB database. We believe that RTPDB would be useful for the studies focusing on the associations among gene variants or expression and cancer prognosis and will offer more help when more data were included in the future.

Supplementary data
Supplementary data are available at Database Online.