Xavier Farré, Nino Spataro, Frederic Haziza, Jordi Rambla, Arcadi Navarro, Genome-phenome explorer (GePhEx): a tool for the visualization and interpretation of phenotypic relationships supported by genetic evidence, Bioinformatics, Volume 36, Issue 3, February 2020, Pages 890–896, https://doi.org/10.1093/bioinformatics/btz622
Abstract
Association studies based on SNP arrays and Next Generation Sequencing technologies have enabled the discovery of thousands of genetic loci related to human diseases. Nevertheless, their biological interpretation is still elusive, and their medical applications limited. Recently, various tools have been developed to help bridge the gap between genomes and phenomes. To our knowledge, however, none of these tools allows users to retrieve the phenotype-wide list of genetic variants that may be linked to a given disease or to visually explore the joint genetic architecture of different pathologies.
We present the Genome-Phenome Explorer (GePhEx), a web tool that eases the visual exploration of phenotypic relationships supported by genetic evidence. GePhEx is primarily based on the thorough analysis of linkage disequilibrium between disease-associated variants and also considers relationships based on genes, pathways or drug targets, leveraging publicly available variant-disease associations to detect potential relationships between diseases. We demonstrate that GePhEx retrieves well-known relationships as well as novel ones and that it might therefore help shed light on the patho-physiological mechanisms underlying complex diseases. To this end, we investigate the potential relationship between schizophrenia and lung cancer, first detected using GePhEx, and provide further evidence supporting a functional link between them.
GePhEx is available at: https://gephex.ega-archive.org/.
Supplementary data are available at Bioinformatics online.
1 Introduction
Genome-wide association studies (GWAS) based on SNP arrays, and more recently on exome and full-genome sequencing, have been the strategies of choice for genome-phenome studies exploring links between genomic variants and complex diseases (Lowe and Reddy, 2015). Over the last decade, huge amounts of information have been gathered, contributing to a better understanding of disease etiology and pathophysiology and paving the way to improved diagnosis, prognosis and treatment (Boycott et al., 2013; Visscher et al., 2017). Despite many advances, most disease-associated loci remain functionally elusive (Boycott et al., 2013; Visscher et al., 2017), limiting the medical exploitation of the knowledge stored in publicly available databases (Lappalainen et al., 2015; Lek et al., 2016; MacArthur et al., 2017). Moreover, once a link between a genetic variant and a given phenotype has been established, there is no simple procedure to understand the detailed physio-pathological mechanisms in which this variant is involved, and, thus, its actionable potential remains difficult to ascertain (Hoskinson et al., 2017; MacArthur et al., 2014).
Recently, a plethora of tools have been developed that aim to go beyond GWAS findings, trying to bridge the gaps between genomes and phenomes. To mention just a few notable examples, the web server GWAB was designed to identify genes related to a specific disease through a network-based boosting algorithm based on the P-values of GWAS hits (Shim et al., 2017), while the web platform Gene ORGANizer aims to link genes to the organs they affect (Gokhman et al., 2017). Graph-GPA (Chung et al., 2017) is a powerful tool able to identify risk variants for complex traits by leveraging pleiotropy through the joint analysis of multiple datasets. However, its usage is limited to users with a strong bioinformatics background and it requires that full summary statistics be available for the traits under analysis, so it cannot be used to analyze all available phenotypes simultaneously. Finally, the recently published PleioNet is a tool to explore networks of phenotypic relationships using genes as the basic functional unit (Gao and Huang, 2019).
In our view, there are several avenues for improving extant tools for the detection of links between genomes and phenomes. First, the common procedure of testing each SNP independently does not consider the fact that these SNPs (or others in close linkage disequilibrium (LD)) may be associated with many other traits. Moreover, most bioinformatics tools available to exploit genome-phenome data are not user-friendly and have steep learning curves, which makes them inaccessible to most clinical researchers and practitioners.
To our knowledge, no tool exists yet that takes simultaneous advantage of LD between disease-associated variants, the genes to which these variants map, the pathways in which these genes take part and the drugs that target these genes to obtain a heuristic about the whole set of genetic variants potentially related to a given disease. As recently demonstrated, available information on LD and known genetic associations can be leveraged to discover and report relationships among diseases (Rodríguez et al., 2017). The Genome-Phenome Explorer (GePhEx) presented here is a web-based application designed to identify and represent potential phenotypic relationships. GePhEx’s core functionality is to find phenotypes with shared genetic architecture (including population-specific, LD-based links between diseases) with the goal of unveiling unexpected and unknown phenotypic relationships supported by genetic data.
GePhEx can: (i) accept various types of biological entities (SNPs, genomic regions, genes, phenotypes, pathways and drugs) as input; (ii) for these entities, retrieve the genetic variants potentially associated with human diseases through an LD-based strategy and (iii) generate an easy-to-interpret, interactive graphical object representing phenotypic relationships supported by genetic evidence.
We demonstrate that GePhEx is effective and reliable in the sense that it can identify known disease relationships that are strongly supported by epidemiological studies and available genetic knowledge. Crucially, GePhEx is also able to detect novel relationships that could be used as a starting point for further investigations or add to ongoing research. We present a representative use case involving schizophrenia (SCZ) and lung cancer for which LD data, analysis of pathways and study of drug targets support the existence of a functional link between the two diseases. Thus, GePhEx can contribute to the identification of phenotypic relationships with relevant implications for the identification of comorbidities.
2 Materials and methods
2.1 Possible queries
GePhEx is a web-based application written in Python 3.6.4 (Van Rossum and Drake, 2011) using the Django (version 2.0.3) framework (Available at: https://djangoproject.com). Its core functionality is to detect shared genetic architectures between phenotypes. To this end, GePhEx leverages the associations reported in the GWAS Catalog (automatically downloaded every 6 months, latest download on March 20, 2018, version 1.0.1) (MacArthur et al., 2017) and infers potential phenotypic relationships through the use of LD information, thus including both functional and correlational relationships.
SNPs are the main query entity to find relationships between phenotypes. Although GePhEx also allows searching by genes, regions, phenotypes, pathways and drugs, all the different types of searches converge to a list of SNPs.
If SNPs are searched, GePhEx retrieves GWAS Catalog entries in which the input SNPs appear in the column ‘SNPS’. In the case of a region query, GePhEx returns all the SNPs corresponding to the entries in which the chromosome and position columns fall within the limits of the searched region. When searching for genes, GePhEx extracts all entries in which the genes appear in either the ‘Reported’ or ‘Mapped’ genes columns. Similarly, for the phenotype query GePhEx retrieves all the SNPs that have been reported to be associated with a given trait. Phenotype queries can be done either by free text search or through the use of EFO terms (Malone et al., 2010), with GePhEx returning all entries related to the phenotype and its children in the EFO tree. Pathway and drug queries make use of APIs implemented in Reactome (Croft et al., 2014; Fabregat et al., 2018) and ChEMBL (Bento et al., 2014), respectively, to infer the SNPs related to the biological entities of interest. In both cases, GePhEx first retrieves the genes related to the given query and then the list of SNPs is obtained as explained above. In the case of a pathway query, GePhEx retrieves the genes contained in the pathway of interest and all its children in the Reactome pathway tree. For queried drugs, GePhEx retrieves only high-confidence target genes, keeping only drug activities with a ‘standard_type’ equal to: IC50, EC50, XC50, AC50, Ki, Kd or Potency. Amongst the obtained activities, only those that have the maximum confidence score (CS = 9), a ‘standard_value’ ≤1000 and ‘standard_units’ equal to ‘nM’ are kept. Then, target genes are obtained considering these high-quality activities related to a given drug of interest.
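The drug-activity filter described above can be sketched as follows. This is a minimal illustration, not GePhEx's actual code: the records are plain dictionaries, and the key names `target_gene` and `confidence_score` are hypothetical placeholders rather than a documented ChEMBL response layout. Only the filter values themselves (activity types, CS = 9, ≤1000, nM) come from the text.

```python
from typing import Iterable

# Activity types retained for high-confidence targets, as listed in the text.
ALLOWED_TYPES = {"IC50", "EC50", "XC50", "AC50", "Ki", "Kd", "Potency"}


def high_confidence_targets(activities: Iterable[dict]) -> set:
    """Return genes supported by at least one activity that passes every filter.

    Each record is a hypothetical dict; 'target_gene' and 'confidence_score'
    are illustrative key names chosen for this sketch.
    """
    genes = set()
    for act in activities:
        value = act.get("standard_value")
        if value is None:
            continue  # no measured value, cannot apply the <= 1000 nM cutoff
        if (
            act.get("standard_type") in ALLOWED_TYPES
            and act.get("confidence_score") == 9
            and float(value) <= 1000
            and act.get("standard_units") == "nM"
        ):
            genes.add(act["target_gene"])
    return genes
```

The resulting gene set would then be handled like any gene query, that is, mapped to GWAS Catalog entries through the ‘Reported’ or ‘Mapped’ genes columns.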
2.2 LD information
For each variant in the GWAS Catalog, variants in LD are obtained for all the available populations of Phase 3 of the 1000 Genomes Project (1 kGP, Auton et al., 2015). For a given SNP, a region of 500 kb centered on the position of the variant of interest is downloaded and variation in LD is obtained through Plink (version 1.9, Chang et al., 2015). Variants in LD are pre-computed and loaded into a database to speed up the computation carried out by GePhEx. Each query is based on a three-step procedure. In brief, for all the possible queries, GePhEx: (i) generates a list of SNPs directly related to the query and complements this list by adding variants in LD; (ii) retrieves from the GWAS Catalog any phenotype related to the SNPs obtained in (i) and (iii) reports the phenotypic relationships supported by genetic data. Two phenotypes are considered related if there exists at least one pair of SNPs with an r2 value equal to or higher than the user-defined threshold and for which each of the variants in the pair is associated, respectively, with one of the phenotypes. All potential phenotypic relationships obtained from a given query are then graphically displayed.
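The relatedness rule above can be illustrated with a short sketch. It assumes two hypothetical in-memory inputs, a SNP-to-traits mapping and a list of pre-computed LD triples; the function name and data layout are illustrative and do not reflect GePhEx's database schema.

```python
from collections import defaultdict


def related_phenotypes(snp_traits, ld_pairs, r2_threshold=0.8):
    """Count the distinct SNP pairs supporting each pair of phenotypes.

    snp_traits: dict mapping a SNP id to the set of traits it is associated with.
    ld_pairs:   iterable of (snp_a, snp_b, r2) tuples from pre-computed LD.
    Returns a dict mapping a sorted (trait, trait) tuple to its SNP-pair count.
    """
    support = defaultdict(set)
    for snp_a, snp_b, r2 in ld_pairs:
        if r2 < r2_threshold:
            continue  # pair does not reach the user-defined threshold
        for trait_a in snp_traits.get(snp_a, ()):
            for trait_b in snp_traits.get(snp_b, ()):
                if trait_a != trait_b:
                    key = tuple(sorted((trait_a, trait_b)))
                    support[key].add(tuple(sorted((snp_a, snp_b))))
    return {pair: len(snps) for pair, snps in support.items()}
```

Any phenotype pair present in the returned dictionary satisfies the "at least one SNP pair" criterion; the count is the quantity that the text later uses as the strength of a relationship.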
2.3 Input/output GePhEx interface
Users can directly query the different biological entities supported by GePhEx by entering the query of interest into the text box and selecting the type of query from the drop-down menu. Additionally, users can broaden their search to include LD information by selecting ‘LD association’, setting an r2 threshold (from r2 = 0.5 to r2 = 1) and selecting one of the available populations from the 1 kGP. A P-value filter is also available, allowing users to restrict queries to SNPs with given levels of statistical support. In the case of a phenotype query, by selecting ‘Include child terms’, users can also include entries that are related to all the child phenotypes of a given term in the EFO hierarchy.
The relationships obtained by GePhEx from a given query are then visualized through an interactive hierarchical edge bundling graph programmed in JavaScript. The thickness of the phenotypic connections is proportional to the number of different SNP pairs supporting a given relationship, without correcting for the possible LD among the variants of the various SNP pairs. Phenotypes that directly share associated SNPs have links in blue (direct links), while those relationships that are obtained through LD are linked in green (indirect links). Relationships that are supported by both direct and indirect links are indicated only as direct.
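The direct/indirect distinction can be sketched as below, assuming a hypothetical SNP-to-traits mapping and a dictionary of LD-based support counts keyed by sorted trait pairs. The precedence rule (a relationship with both kinds of support is reported as direct) follows the text; everything else is an illustrative assumption.

```python
from itertools import combinations


def classify_links(snp_traits, ld_support):
    """Label each phenotype pair as 'direct' or 'indirect'.

    snp_traits: dict mapping a SNP id to the set of traits it is associated with.
    ld_support: dict mapping a sorted (trait, trait) tuple to the number of
                LD-based SNP pairs supporting it.
    """
    # A pair is direct when at least one SNP is associated with both traits.
    direct = set()
    for traits in snp_traits.values():
        direct.update(combinations(sorted(traits), 2))

    links = {}
    for pair in direct | set(ld_support):
        # Direct evidence takes precedence over LD-only evidence.
        links[pair] = "direct" if pair in direct else "indirect"
    return links
```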
Phenotypes are grouped together into ad hoc phenotypically relevant categories. To this end, the absolute paths for all the phenotypes in the GWAS Catalog were previously obtained from the EFO hierarchical tree. A total of 131 possible groups were then obtained through manual inspection of the absolute paths. These groups represent intermediate, biologically relevant nodes in the EFO tree that can group together a substantial number of the terms available in the GWAS Catalog. For phenotypes having more than one absolute path in the EFO tree, GePhEx assigns categories considering the path having the minimum average tree distance to the other traits obtained in the same query.
Users can interact with the obtained graph in the ‘Summary’ page, selecting and deselecting relationships of interest, which in turn dynamically filters the content of the output tables. Data supporting the obtained graph can also be visualized in the ‘Summary’ page as a table listing trait relationships and their supporting SNPs. Aside from the interactive graph, GePhEx produces two additional tables containing GWAS Catalog entries concerning (i) the SNPs directly related to the query (‘SNP association’ page) and (ii) the SNPs in LD with the SNPs in (i) (‘SNP in LD association’ page). Finally, given that some phenotypes have been subject to deeper study than others, GePhEx presents the results of a hypergeometric test as a rough indicator of how likely it would be to obtain the number of SNPs associated with each trait if they were randomly drawn from the GWAS Catalog.
Variant annotation and links to external resources are also provided in the output tables. Variant annotation is obtained using the Variant Effect Predictor (VEP) API (McLaren et al., 2016), which returns the most severe consequence of the alternate alleles of a set of variants of interest. External resources are obtained from the European Genome-phenome Archive (EGA) (Lappalainen et al., 2015). When available, GePhEx retrieves the links to the GWAS studies stored at the EGA and allows users to be redirected to the web page hosting the original data that produced an association of interest. In addition, GePhEx uses the EGA Beacon service (Available at: https://ega-archive.org/beacon, version 0.3) to check whether any given genomic variant produced by a query is present among the publicly available datasets stored at EGA.
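As a rough illustration of the hypergeometric indicator, the sketch below computes an upper-tail probability with the standard library only. The text does not state how the test is parameterized, so the mapping used here (catalog size, catalog SNPs for the trait, SNPs returned by the query, query SNPs for the trait) is an assumption.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Assumed mapping (not stated in the text):
      population = SNPs in the GWAS Catalog
      successes  = catalog SNPs associated with the trait
      draws      = SNPs returned by the query
      k          = query SNPs associated with the trait
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total
```

A small probability would suggest that a trait appears among the query results more often than expected from how heavily it is represented in the catalog.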
3 Results
To evaluate the usefulness of GePhEx in unveiling novel relationships, we performed a systematic investigation of the phenotypic relationships involving all the disease phenotypes described in the GWAS Catalog using a simple set of parameters. We started by retrieving all variants associated with the whole list of available diseases in the EFO tree (Malone et al., 2010). Of the original 9164 disease EFO terms, only 433 had at least one associated variant in the GWAS Catalog (accessed on February 20, 2018). We retrieved all the variants associated with these terms and all their child terms in the EFO hierarchical tree. Then, variants in LD were retrieved through the ENSEMBL API (Yates et al., 2015) considering 1 kGP Phase 3 data and a population of European ancestry (CEU) (Auton et al., 2015). Finally, relationships between diseases were inferred with an LD threshold of r2 ≥ 0.8. A total of 3502 disease relationships were identified, of which ∼48%, ∼21% and ∼2.6% were supported by a single SNP pair, ≥20 SNP pairs and ≥50 SNP pairs, respectively (Fig. 1). Interestingly, ∼50% of the traits participate in at least 10 relationships (Supplementary Fig. S1A), indicating that the whole set of identified phenotypic relationships does not involve a limited set of phenotypes. Moreover, 53.4% of traits are involved in one or more relationships supported by at least 5 different SNP pairs (Supplementary Fig. S1B), suggesting that strong relationships are widespread over the whole set of available phenotypes.

Fig. 1. Number of different phenotypic relationships supported by a certain number of different SNP pairs. Each column represents the base 10 logarithm of the number of different phenotypic relationships detected through the LD procedure implemented in GePhEx. The x-axis shows the number of different SNP pairs that support these relationships. Phenotype relationships supported by more than 50 different pairs of SNPs were collapsed together
To assess the reliability of the implemented LD procedure, we checked whether the strongest relationships identified by GePhEx involve diseases of the same type. To do so, we used the number of intra-group disease relationships as a proxy to establish the reliability of the detected phenotypic links. Relationships involving diseases of the immune system tend to be supported by a very high number of different SNP pairs. Similarly, cardiovascular diseases tend to be related to each other and analogous findings were observed for cancers and mental/behavioral disorders. Immune system diseases also tend to be strongly linked (>10 SNP pairs supporting relationships) to some diseases related to the digestive and respiratory systems and to metabolism. Similarly, some metabolic diseases were linked to cardiovascular diseases (Fig. 2, Supplementary Fig. S2). A closer look at these inter-group relationships highlights well-known disease comorbidities, indicating that the known potential relationships identified by GePhEx are backed by solid epidemiological data. Indeed, the digestive system pathologies that have been connected to those of the immune system are sclerosing cholangitis, celiac disease, primary biliary cirrhosis, cirrhosis of liver and liver disease (Aron and Bowlus, 2009; Ciccocioppo et al., 2005; Sipeki et al., 2014). Similarly, it is well known that the immune system plays a role in the etiologies of type 2 diabetes (metabolic disease), metabolic syndrome (metabolic disease) and asthma (respiratory system disease) (Finn and Bigby, 2009; Itariu and Stulnig, 2014; Paragh et al., 2014). Finally, robust literature and epidemiological data also support the relationships of the metabolic diseases hypertriglyceridemia, metabolic syndrome, obesity and type II diabetes mellitus with various cardiovascular diseases (Han et al., 2016; Leon and Maddox, 2015; Mottillo et al., 2010; Poirier et al., 2006).
Overall, our results demonstrate that our approach can detect well-known phenotypic relationships and, thus, that it can be used to suggest novel ones. As an example, one of the strongest relationships we detected in our exploration is supported by 48 different SNP pairs (Supplementary Table S1) and involves SCZ and lung carcinoma (LC), two diseases that at first glance would seem unrelated. However, the incidence of cancer in patients suffering from SCZ has been debated during the past century and various epidemiological studies report both inverse and direct comorbidities (Hodgson et al., 2010). This inconsistency could be due to differences between the populations under investigation, each of them characterized by its particular genetic background, evolutionary history and specific environmental factors. For instance, a meta-analysis of cancer incidence in more than 500 000 participants showed an increased risk for breast cancer and decreased risk for melanoma and lung cancer (Catalá-López et al., 2014). In contrast, a large UK cohort study did not find any significant difference in the incidence of colorectal cancer, breast cancer and lung cancer between SCZ cases and controls (Osborn et al., 2013). Given that SCZ patients are more exposed to risk factors for tumors (e.g. poor diet, alcohol drinking, tobacco smoking, physical inactivity) than the general population (Connolly, 2005), the very fact that SCZ patients do not suffer a dramatic increase in cancer incidence raises the question of a possible protective role of SCZ. A potential explanation for reduced cancer risk could be the use of anti-psychotic molecules to treat SCZ. Interestingly, various anti-psychotic molecules appear to have anti-tumor properties in vitro (Carrillo and Benítez, 1999; Motohashi et al., 2000; Shen et al., 2017).
In contrast, Catts and co-workers identified a lower than expected overall occurrence of cancer in first-degree relatives of patients with SCZ (Catts et al., 2008), supporting the idea that a protective genetic component could explain the reduced risk of cancer. Interestingly, various tumor suppressor genes have been implicated in SCZ susceptibility (Cui et al., 2005; Lim et al., 2005; Ozbey et al., 2011). A recent transcriptomics meta-analysis showed a significant enrichment of genes dysregulated in opposite directions in SCZ and lung cancer, suggesting an inverse comorbidity between the two diseases (Ibáñez et al., 2014).

Fig. 2. Strength of disease relationships detected through the LD procedure implemented in GePhEx. Each single cell represents the number of SNP pairs supporting the relationship between two specific diseases. Only the 88 diseases involved in relationships supported by at least 20 SNPs were considered and represented in the figure. A similar figure considering the whole set of identified relationships can be found in Supplementary Fig. S2. Each disease considered was classified into one of the disease groups shown on the x- and y-axes. The darker the color, the higher the number of SNP pairs supporting the relationship between two given diseases. Gray dashed lines highlight the relationships involving diseases belonging to the same group
To test the inverse comorbidity hypothesis, we explored haplotypes harboring all the detected SNP pairs supporting the relationship between LC and SCZ. For each variant associated with SCZ (EFO: 0000692), we obtained a list of all the variants with LD r2 ≥ 0.8 associated with LC (EFO: 0001071) (Fig. 3). Of the original 48 different SNP pairs supporting the relationship between SCZ and LC, for only 22 pairs were all the published GWAS fully concordant in identifying the same risk alleles for both SNPs in each pair. For 16 pairs of SNPs, the information regarding the two risk alleles was not fully concordant across all GWAS, with at least one study indicating a different risk allele than the rest. For another 10 pairs, haplotype information was not available in LDlink (Machiela and Chanock, 2015), so we only considered the 22 concordant pairs. Interestingly, for 21 of these 22 SNP pairs, we observed that the most frequent haplotype was the one involving the risk allele for one disease and the protective allele for the other. For only one SNP pair (rs11778040-rs7839435) was the most frequent haplotype one involving either both risk alleles or both protective alleles. In addition, 4 different variants (rs67682613, rs2596500, rs7383287, rs13212562) have been related to both SCZ and LC in different association studies and for all of them the risk allele for SCZ was reported as protective for LC (Supplementary Table S1). The vast majority of SNP pairs (20 out of 22) mapped to chromosome 6 in proximity to the MHC locus, suggesting a role of the immune system in the hypothesized inverse comorbidity (see results below). To check whether the links found on chromosome 6 are the result of independent associations, we considered data from 1 kGP Phase 3. SNPs were sorted by coordinates and the leftmost SNP was taken as the seed for the first ‘signal segment’. SNPs were recursively added if they were in LD (r2 ≥ 0.8) with any other variant already part of the same signal segment. When no further variants could be added to a segment, the leftmost SNP not yet assigned to any signal was taken as the seed for a new signal segment. In total, eight signals were detected by our procedure, indicating that the inverse comorbidity between LC and SCZ is supported by independent signals.
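The signal-segment procedure described above can be sketched as a greedy grouping over SNPs sorted by coordinate. The input layout (a position dictionary and a dictionary of pairwise r2 values keyed by SNP tuples) is a hypothetical choice for illustration; pairs missing from the LD dictionary are treated as not in LD.

```python
def signal_segments(positions, ld, r2_threshold=0.8):
    """Group SNPs into signal segments by transitive LD.

    positions: dict mapping a SNP id to its chromosomal coordinate.
    ld:        dict mapping an (snp_a, snp_b) tuple to its r2 value.
    Returns a list of segments, each a list of SNP ids, seeded left to right.
    """
    # Make lookups order-independent.
    r2 = {frozenset(pair): value for pair, value in ld.items()}

    def in_ld(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= r2_threshold

    unassigned = sorted(positions, key=positions.get)
    segments = []
    while unassigned:
        # The leftmost SNP not yet assigned seeds a new segment.
        segment = [unassigned.pop(0)]
        grew = True
        while grew:  # repeat until a full pass adds nothing
            grew = False
            for snp in list(unassigned):
                if any(in_ld(snp, member) for member in segment):
                    segment.append(snp)
                    unassigned.remove(snp)
                    grew = True
        segments.append(segment)
    return segments
```

Under this reading, the number of returned segments corresponds to the number of independent signals.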

Fig. 3. GePhEx-detected relationship involving SCZ and lung cancer traits. The graph was obtained by querying GePhEx with the 25 SNPs related to SCZ that take part in the 48 SNP pairs considered for the haplotype analysis. The query was performed considering SNPs in LD (r2 ≥ 0.8) in the CEU population. The links connecting SCZ to the various forms of lung cancer are highlighted in yellow. Line thickness is proportional to the number of SNP pairs supporting a given relationship. Direct and indirect links are in green and blue, respectively (details in Section 2)
Subsequently, we explored the potential functional interconnections between the two diseases by comparing the metabolic pathways that could have a relevant role in both diseases. To this end, we extracted from the GWAS Catalog the list of unique genes reported to have a role in either SCZ or LC, separately (1047 and 386 genes, respectively, see Supplementary Table S2). We then performed a pathway enrichment analysis using Reactome (Croft et al., 2014; Fabregat et al., 2018). Of the 1094 and 771 potential pathways, 91 and 70 showed a nominal P-value lower than 0.05, while 72 and 25 were enriched beyond random expectations for SCZ and LC, respectively. Interestingly, among the pathways surviving strict Bonferroni correction (based on the total number of pathways), 22 were shared between the two diseases (30% of the SCZ and 88% of the LC pathways are shared), indicating the commonality of a prominent number of metabolic routes (chi-squared P-value = 2.22 × 10⁻⁶⁷) and providing insight into the patho-physiological mechanism that could link SCZ and LC (Supplementary Table S3). Similar results were obtained when considering the genes appearing in the ‘Mapped genes’ column of the GWAS Catalog (data not shown).
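The Bonferroni step and the overlap between the two pathway sets can be illustrated as follows. This is a sketch under stated assumptions: the P-value dictionaries are hypothetical inputs, the significance level of 0.05 is assumed rather than taken from the text, and the number of tests is passed explicitly because the correction is based on the total number of pathways.

```python
def bonferroni_significant(pvalues, n_tests, alpha=0.05):
    """Return the pathways whose P-value survives Bonferroni correction.

    pvalues: dict mapping a pathway id to its nominal enrichment P-value.
    n_tests: total number of pathways tested (the correction factor).
    alpha:   family-wise significance level (0.05 is an assumption here).
    """
    cutoff = alpha / n_tests
    return {pathway for pathway, p in pvalues.items() if p <= cutoff}


def shared_fraction(set_a, set_b):
    """Return (shared count, fraction of set_a shared, fraction of set_b shared)."""
    shared = set_a & set_b
    return len(shared), len(shared) / len(set_a), len(shared) / len(set_b)
```

With the counts reported in the text, 22 shared pathways out of 25 for LC gives 88%, and 22 out of 72 for SCZ gives roughly 30%.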
In addition, we observed that 98 genes were found among the reported genes for both LC and SCZ in the GWAS Catalog. Once more, we performed a pathway enrichment analysis focusing on these genes. Of the 332 possible pathways, 71 were significantly enriched after Bonferroni correction. For each enriched pathway, we reconstructed its path in the Reactome tree to pinpoint high-level metabolic routes that could explain the functional relationships between SCZ and LC (Supplementary Table S3). Interestingly, 22 of the enriched pathways fall within the ‘Immune system’ category (Reactome ID: R-HSA-168256), and specifically 16 of them are located within the ‘Adaptive immune system’ (Reactome ID: R-HSA-1280218) category. Within the ‘Adaptive immune system’ category, both ‘Class I MHC mediated antigen processing and presentation’ (Reactome ID: R-HSA-983169) and ‘TCR signaling’ (Reactome ID: R-HSA-202403) seem to play a particularly relevant role. For the top-level category ‘Neuronal System’ (Reactome ID: R-HSA-112316), all seven enriched pathways were found to be related to ‘Acetylcholine binding and downstream events’ (Reactome ID: R-HSA-181431) and specifically to ‘Presynaptic nicotinic acetylcholine receptors’ and ‘Post-synaptic nicotinic acetylcholine receptors’ (Reactome ID: R-HSA-622323 and Reactome ID: R-HSA-622327, respectively), indicating that these receptors could play an important role in explaining the potential relationship between the two diseases. Our results strongly agree with a recent meta-analysis, in which the nicotinic acetylcholine receptors CHRNA3, CHRNA5 and CHRNB4 were highlighted as pleiotropic loci supporting the shared genetic architecture of LC and SCZ (Zuber et al., 2018). Additionally, 16 and 10 of the enriched pathways were located within the categories ‘Gene expression’ (Reactome ID: R-HSA-74160) and ‘Cell cycle’ (Reactome ID: R-HSA-1640170), respectively.
Overall, our findings highlight some of the pathways well known to be related to SCZ aetiology (i.e. Nicotinic Acetylcholine receptors), while some others are known to be related to tumorigenesis (Gene expression and Cell cycle). Besides these obvious metabolic routes, our analysis also highlights the potential role of the adaptive immune system, shedding light on the molecular mechanism behind the hypothesized inverse comorbidity.
Finally, we also explored whether the approved drugs used for both diseases could provide information on the putative functional connections between LC and SCZ. For each approved drug in ChEMBL (Bento et al., 2014) (max_phase_for_ind = 4), we retrieved the list of target genes as explained previously, considering for this particular use case assays in which the target was molecular (confidence_score > 2). For 1024 of the 1689 approved drugs in ChEMBL it was possible to retrieve the target genes (Supplementary Table S4). Among these drugs, 16 and 11 were reported to be prescribed to treat SCZ and LC, respectively. Unfortunately, for 5 drugs used for SCZ (Brexpiprazole, Lurasidone, Lurasidone Hydrochloride, Aripiprazole Lauroxil, Paliperidone Palmitate) the target genes could not be retrieved; similarly, for 7 drugs used for LC (Alectinib Hydrochloride, Osimertinib Mesylate, Ramucirumab, Nivolumab, Afatinib Dimaleate, Bevacizumab, Necitumumab) it was not possible to obtain the list of target genes. Thus, we retrieved a total of 337 and 1000 target genes for SCZ and LC, respectively. Interestingly, 24 target genes were shared, indicating that the drugs used to treat both conditions target a similar set of genes (chi-squared P-value = 5.14 × 10⁻³⁰), further supporting the possible relationship between LC and SCZ (Supplementary Table S5).
4 Discussion
Difficulties in the functional interpretation of most of the loci and variants that have been associated with disease and their limited medical exploitation (Boycott et al., 2013; Visscher et al., 2017) generate a need for novel approaches that help fill the gap between genomes and phenomes. To this end, we created GePhEx, a web server to ease the visualization of phenotypic relationships based on genetic evidence. We demonstrate that GePhEx can identify known phenotypic relationships and comorbidities, indicating that genetic information in public databases agrees with evidence supported by epidemiological data. Additionally, we study a use case in detail and show that GePhEx is useful for exploring unexpected phenotypic relationships, which in turn could shed light on the molecular mechanisms behind the proposed phenotypic links. Our results reinforce the hypothesis of inverse comorbidity between LC and SCZ, and suggest potential metabolic routes and genes that could functionally explain it.
Aside from the case reported here, our very simple run of GePhEx detected some other unexpected and interesting phenotypic links. For instance, a high number of SNP pairs support the relationship between chronic obstructive pulmonary disease and some eye conditions such as Age-Related Macular Degeneration and Fuchs’ endothelial dystrophy, for which 68 and 20 different pairs were observed, respectively. Furthermore, 10 different SNP pairs support the link between allergy and squamous cell carcinoma, a relationship that is still under debate. A recent meta-analysis indicated hypertension as a risk factor for breast carcinoma (Han et al., 2017), and our procedure identified 13 potential genomic loci underlying the suggested phenotypic link. Overall, our analysis shows that GePhEx is useful for identifying novel potential comorbidities, paving the way to further analysis aimed at ascertaining them and, eventually, understanding their causes.
Only the integration of data from different sources will allow a comprehensive understanding of the molecular mechanisms leading to disease. GePhEx contributes to this goal by cross-referencing various types of biological databases to visualize phenotypic relationships based on genetic evidence. GePhEx’s simple and self-explanatory interactive graphical visualization makes it accessible to most clinical researchers and practitioners, while the implementation of APIs in future GePhEx versions will enable systematic, whole-genome-scale analysis.
Acknowledgements
The authors thank Sabela de la Torre and Mauricio Moldes for their useful contribution for the implementation of customized APIs to cross-reference GePhEx results with data deposited at EGA. They also thank Angel Carreño and Alfred Gil for their contribution in providing the infrastructural resources needed to deploy GePhEx.
Funding
This work was supported by Ministerio de Economía y Competitividad (MINECO): BFU2015-68649-P (MINECO/FEDER, UE), and by the Agencia Estatal de investigación: AEI-PGC2018-101927-B-I00 (FEDER/UE), by Direcció General de Recerca, Generalitat de Catalunya (2017SGR880) and by the Spanish National Institute of Bioinformatics (PT17/0009/0020), the REEM (RD16/00150017) of the Instituto de Salud Carlos III. This research has also received funding from the European Union's Horizon 2020 research and innovation programme 2014–2020 under Grant Agreement N°. 634143 (MedBioinformatics).
Conflict of Interest: none declared.
Author notes
Xavier Farré and Nino Spataro wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.