Nikolas Pontikos, Jing Yu, Ismail Moghul, Lucy Withington, Fiona Blanco-Kelly, Tom Vulliamy, Tsz Lun Ernest Wong, Cian Murphy, Valentina Cipriani, Alessia Fiorentino, Gavin Arno, Daniel Greene, Julius OB Jacobsen, Tristan Clark, David S Gregory, Andrea M Nemeth, Stephanie Halford, Chris F Inglehearn, Susan Downes, Graeme C Black, Andrew R Webster, Alison J Hardcastle, UKIRDC, Vincent Plagnol, Phenopolis: an open platform for harmonization and analysis of genetic and phenotypic data, Bioinformatics, Volume 33, Issue 15, August 2017, Pages 2421–2423, https://doi.org/10.1093/bioinformatics/btx147
Abstract
Phenopolis is an open-source web server providing an intuitive interface to genetic and phenotypic databases. It integrates analysis tools such as variant filtering and gene prioritization based on phenotype. The Phenopolis platform will accelerate clinical diagnosis and gene discovery, and encourage wider adoption of the Human Phenotype Ontology in the study of rare genetic diseases.
A demo of the website is available at https://phenopolis.github.io. If you wish to install a local copy, source code and installation instructions are available at https://github.com/phenopolis. The software is implemented using Python, MongoDB, HTML/JavaScript and various bash shell scripts.
Supplementary data are available at Bioinformatics online.
1 Introduction
The molecular diagnosis of rare genetic diseases requires detailed clinical phenotypes and processing of large amounts of genetic data. This motivates large-scale collaborations between clinicians, geneticists and bioinformaticians across multiple sites, where patient data are pooled to increase the chances of finding the likely pathogenic mutations in rare cases and of validating novel genes. For example, the UK Inherited Retinal Dystrophy Consortium (UKIRDC) has set up a collaboration between London, Manchester, Oxford and Leeds to solve retinal dystrophies. A difficulty with multi-site collaborations is that discrepancies in phenotype definitions and in the interpretation of genetic variants can complicate the genetic diagnosis (Yen et al., 2016). One way to reduce the variability introduced by different sequencing analysis pipelines is to analyze the sequence data centrally and store the annotated variants in a normalized database. On the clinical side, phenotype harmonization can be improved by using nomenclatures such as the Human Phenotype Ontology (HPO) (Köhler et al., 2014) to translate specific clinical features into a standardized, computer-interpretable format. We have integrated these two approaches into Phenopolis, an interactive website that combines genetic and phenotypic databases. With the help of HPO-encoded phenotypes, Phenopolis is able to prioritize causative genes using different sources of evidence, such as published disease-gene associations from the Online Mendelian Inheritance in Man (OMIM) (Supplementary Material Section 1) (Hamosh et al., 2005), abstract relevance from PubMed publications (Supplementary Material Section 2), as well as model organism phenotype ontology analysis using Exomiser (Supplementary Material Section 3) (Robinson et al., 2013).
Additionally, Phenopolis uncovers gene-phenotype relationships within the stored patient data through variant filtering and statistical enrichment of HPO terms using Phenogenon (Supplementary Material Section 4) and SimReg (Supplementary Material Section 5) (Greene et al., 2016). The online version, available at https://phenopolis.github.io, includes four example patients with inherited retinal dystrophies and access to per-gene analysis, to illustrate our methods.
2 Phenopolis implementation
2.1 Clinical data collection
The collection of clinical phenotype data was done retrospectively from patient records and entered using the Phenotips platform (Girdea et al., 2013), which provides an interface for translating detailed clinical phenotypes into HPO terms. Several patient diagnoses were translated to their closest match using HPO terminology. This included mode of inheritance and modifiers such as age of onset and laterality when available. The distribution of high-level HPO terms in Phenopolis is given in Supplementary Figure S8.
2.2 Genetic data collection
Our internal exome database, UCLex, currently comprises 4,449 patients, collected from various research groups since 2012. Four patients solved with genetic mutations in DRAM2 (El-Asrag et al., 2015) and TTLL5 (Sergouniotis et al., 2014) are made available on the demo account.
2.3 Analysis of genetic data
The short-read sequence data were aligned using novoalign (version 3.02.08), and variants and indels were called according to GATK best practices (joint variant calling followed by variant quality score recalibration) (McKenna et al., 2010). The variants were then annotated using the Variant Effect Predictor (McLaren et al., 2016), output to JSON format, post-processed by a Python script and loaded into a MongoDB database.
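The post-processing step can be pictured with a minimal sketch that flattens one line of Variant Effect Predictor JSON output into a document suitable for a document store. This is not the Phenopolis script itself: the function name `flatten_vep_record`, the `variant_id` scheme and the example record are assumptions made for illustration, and only commonly emitted VEP JSON keys are read.

```python
import json


def flatten_vep_record(line):
    """Flatten one line of VEP JSON output into a document-store-friendly dict."""
    record = json.loads(line)
    transcripts = record.get("transcript_consequences", [])
    # Collect the distinct gene symbols hit by any transcript consequence.
    genes = sorted({t["gene_symbol"] for t in transcripts if "gene_symbol" in t})
    return {
        # Hypothetical identifier scheme: chromosome-position-alleles.
        "variant_id": "{}-{}-{}".format(
            record["seq_region_name"], record["start"], record["allele_string"]
        ),
        "chrom": record["seq_region_name"],
        "pos": record["start"],
        "most_severe_consequence": record.get("most_severe_consequence"),
        "genes": genes,
        "transcripts": [
            {
                "transcript_id": t.get("transcript_id"),
                "consequence_terms": t.get("consequence_terms", []),
                "biotype": t.get("biotype"),
            }
            for t in transcripts
        ],
    }


# Invented record for illustration only; coordinates and identifiers are not real data.
example_line = json.dumps({
    "seq_region_name": "1",
    "start": 12345,
    "end": 12345,
    "allele_string": "C/T",
    "most_severe_consequence": "missense_variant",
    "transcript_consequences": [
        {"gene_symbol": "GENE_A", "transcript_id": "TX_1",
         "consequence_terms": ["missense_variant"], "biotype": "protein_coding"},
        {"gene_symbol": "GENE_A", "transcript_id": "TX_2",
         "consequence_terms": ["intron_variant"], "biotype": "protein_coding"},
    ],
})
document = flatten_vep_record(example_line)
```

A list of such documents could then be loaded in bulk, for instance with `insert_many` from the pymongo driver, although the collection layout used by Phenopolis is not described here.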
2.4 Website implementation
The Phenopolis website was implemented using the Python Flask web framework by extending the ExAC code base [1] running on top of a MongoDB database (Fig. 1A). JavaScript was used for visualizations and interactive features. Once a user has logged in, the website provides five main entry points:

Fig. 1. (A) Overview of the pipeline. HPO-encoded phenotypes are entered using Phenotips (Girdea et al., 2013). The Variant Call Format files are annotated by the Variant Effect Predictor (McLaren et al., 2016) and translated to JSON for import into MongoDB. OMIM, PubMed and ExAC data are also imported into the MongoDB database, on which we run PubmedScore, Exomiser, SimReg and Phenogenon to score the genes. A Python Flask server is used as the front-end to display the main entry points to the website. (B) Venn diagram visualization of HPO-gene overlap highlighting ARL2BP. (C) Phenogenon visualization of gene ARL2BP (recessive mode). The size of the circles is inversely proportional to the P-value. Clicking on the nodes brings up information about the individuals and variants. ‘Rod-cone dystrophy’ and ‘Nyctalopia’ are significantly enriched for ARL2BP with respective P-values of 0.00172 and 0.00051.
The home page: summary statistics of genetic and phenotypic data, as well as an auto-completing search bar to search by phenotype, gene name or patient ID.
The all patients page: summary data of all patients and their candidate genes for which the user has access permission.
The patient page: the patient phenotypes and a table of filtered variants per patient prioritized based on gene. The causal variants are expected to be in this list, ranked at the top of the table.
The gene page: the variants and the patients in which they occur, as well as the gene-HPO analysis.
The phenotype page: a prioritized list of genes per phenotype, based on known association and gene enrichment analysis.
3 Applications
3.1 Clinical application: gene prioritization by patient
Given a list of genetic variants and the phenotype of a patient, the first task towards a molecular diagnosis is to prioritize potentially causative genes. For each case, variants are first filtered based on user-defined thresholds:
Allele count less than five in our internal database and in ExAC (Lek et al., 2016).
Kaviar frequency less than 0.05.
Exclude non-exonic variants or variants on non-coding transcripts. Splicing variants are kept.
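The three filters above can be sketched as a single predicate. This is an illustrative reading rather than the Phenopolis code: the `Variant` fields and the function `passes_filters` are hypothetical names, and the location rule is interpreted as "keep splicing variants, otherwise require an exonic variant on a coding transcript".

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names are hypothetical; they mirror the thresholds listed above.
    variant_id: str
    internal_ac: int            # allele count in the internal database
    exac_ac: int                # allele count in ExAC
    kaviar_af: float            # allele frequency reported by Kaviar
    is_exonic: bool
    is_coding_transcript: bool
    is_splicing: bool


def passes_filters(v, max_ac=5, max_kaviar_af=0.05):
    """Return True if the variant survives the user-defined thresholds."""
    rare = (
        v.internal_ac < max_ac
        and v.exac_ac < max_ac
        and v.kaviar_af < max_kaviar_af
    )
    # Splicing variants are kept even when they are not exonic.
    relevant_location = v.is_splicing or (v.is_exonic and v.is_coding_transcript)
    return rare and relevant_location


# Invented variants for illustration only.
variants = [
    Variant("v1", 1, 0, 0.0001, True, True, False),    # rare, exonic, coding
    Variant("v2", 12, 3, 0.0010, True, True, False),   # too common internally
    Variant("v3", 2, 1, 0.0005, False, True, True),    # intronic but splicing
    Variant("v4", 0, 0, 0.0000, False, True, False),   # non-exonic, not splicing
]
kept = [v.variant_id for v in variants if passes_filters(v)]
```

Exposing the thresholds as keyword arguments reflects that they are user-defined; the defaults are simply the values quoted in the list.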
3.2 Research application: HPO signature per gene
Given a sufficiently large and phenotypically diverse collection of cases, gene-to-phenotype patterns start to emerge. In order to assign phenotype associations per gene based on our patient database, we have developed a gene-based HPO enrichment and visualization tool, Phenogenon (Fig. 1C). We have also integrated the existing SimReg tool, which suggests a characteristic phenotype per gene (Greene et al., 2016). Both methods work on a filtered list of variants and are explained in detail in the Supplementary Material Sections 4 and 5.
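To make the idea of gene-based HPO enrichment concrete, the sketch below uses a generic one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) as a stand-in. The statistic actually used by Phenogenon is described in Supplementary Material Section 4 and may differ; the function names and the toy patient sets are assumptions for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n patients from N, of whom K carry the HPO term."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def hpo_enrichment_p_value(carriers, patients_with_term, all_patients):
    """Enrichment of an HPO term among patients carrying filtered variants in a gene."""
    carriers = set(carriers) & set(all_patients)
    with_term = set(patients_with_term) & set(all_patients)
    return hypergeom_upper_tail(
        k=len(carriers & with_term),
        n=len(carriers),
        K=len(with_term),
        N=len(all_patients),
    )


# Toy cohort of ten patients; identifiers are invented.
all_patients = {"P{}".format(i) for i in range(10)}
patients_with_term = {"P0", "P1", "P2"}   # patients annotated with the HPO term
carriers = {"P0", "P1"}                   # patients with qualifying variants in the gene
p_value = hpo_enrichment_p_value(carriers, patients_with_term, all_patients)
```

In this toy cohort both carriers have the term, so the P-value is the probability of drawing two term-positive patients out of ten when three are term-positive, i.e. 3/45.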
3.3 Research application: genes ranked per HPO term
Individuals with the specified HPO term and confirmed pathogenic mutations are listed on this page. We retrieve the list of known disease genes from the gene-HPO mapping from OMIM and we score these genes with Phenogenon to assess their support in our dataset. Furthermore, we rank all genes according to their Phenogenon score for this HPO term to enable gene discovery in our dataset.
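A minimal sketch of this ranking step follows, assuming that each gene's score for the HPO term is a P-value for which smaller means stronger support, and that known disease genes are supplied as a set taken from the OMIM gene-HPO mapping. The function `rank_genes`, the gene names and the scores are hypothetical.

```python
def rank_genes(p_values, known_genes):
    """Rank genes by ascending P-value and flag those already known for the term."""
    # Ties are broken alphabetically so the ordering is deterministic.
    ordered = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
    return [
        {"rank": rank, "gene": gene, "p_value": p, "known": gene in known_genes}
        for rank, (gene, p) in enumerate(ordered, start=1)
    ]


# Invented scores for illustration only.
p_values = {"GENE_A": 0.0005, "GENE_B": 0.2, "GENE_C": 0.0017}
known_genes = {"GENE_C"}
ranking = rank_genes(p_values, known_genes)
novel_candidates = [row["gene"] for row in ranking if not row["known"]]
```

Keeping known and unknown genes in one ranked list serves both uses described above: the flagged rows show how well established genes are supported in the dataset, while highly ranked unflagged rows are candidates for gene discovery.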
4 Discussion
There are currently several closed-source commercial online alternatives that provide variant filtering and prioritization, e.g. Saphetor, Congenica and Omicia. However, their costs limit broad usage and they are not readily extensible. There are also open-source alternatives such as Gemini (Paila et al., 2013) and seqr (https://seqr.broadinstitute.org/), but currently neither provides full integration with HPO. As it stands, Phenopolis is an ideal platform for studying pleiotropic genes (Supplementary Fig. S4) and how variation in different parts of the same gene could lead to different, seemingly unrelated phenotypes. In the next iteration of our software, we plan to integrate tissue expression databases, allowing genes and transcripts to be prioritized by cell type when the disease affects a specific tissue type. We also plan to interface with the Genomics England PanelApp to retrieve relevant genes and contribute novel disease genes. Collection of phenotypes and prioritization of genes can help elucidate which features are informative for a particular gene and warrant close inspection in clinic. Currently, a limitation to obtaining detailed phenotypes for our retrospective cases is the manual input of HPO terms, and we are investigating data mining of health records to pull data efficiently. Given the utility of this software within the UKIRDC, we hope it will be of use to other groups collaborating on the genetics of rare diseases.
Acknowledgements
We wish to acknowledge all patients that contributed their DNA and phenotypes to this project. We also acknowledge the Computer Science High Performance Cluster for providing us with the computing platform on which to analyze our data and host our webserver. We wish to thank Michael Morgan for his advice on Phenogenon. Members of the UKIRDC are: Graeme Black, Georgina Hall, Stuart Ingram, Rachel Gillespie, Forbes Manson, Panagiotis Sergouniotis, Chris Inglehearn, Carmel Toomes, Manir Ali, Martin McKibbin, James Poulter, Kamron Khan, Emma Lord, Andrea Nemeth, Susan Downes, Stephanie Halford, Jing Yu, Stefano Lise, Gavin Arno, Alessia Fiorentino, Nikolas Pontikos, Vincent Plagnol, Michel Michaelides, Alison J. Hardcastle, Michael E. Cheetham, Andrew R. Webster and Veronica van Heyningen.
Funding
This work was supported by the UKIRDC, funded by R.P. Fighting Blindness and Fight for Sight.
Conflict of Interest: none declared.
References
Author notes
The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint Authors.
Members of the UK Inherited Retinal Dystrophy Consortium are listed in the acknowledgements.