Khalid A Fakhro, Amal Robay, Juan L Rodrigues-Flores, Jason G Mezey, Alya A Al-Shakaki, Omar Chidiac, Dora Stadler, Joel A Malek, Abu Bakr Imam, Arwa Sheikh, Asmaa Azzam, Ibrahim Janahi, Izzat Khanjar, Kamal Osman, Maen Abu Ziki, Mohamed Adnan Mahmah, Mohamed Selim, Nuha Numeiri, Rehab Ali, Shenela Lakhani, Fizza Butt, Tawfeg Ben Omran, Ronald G Crystal, Point of Care Exome Sequencing Reveals Allelic and Phenotypic Heterogeneity Underlying Mendelian disease in Qatar, Human Molecular Genetics, Volume 28, Issue 23, 1 December 2019, Pages 3970–3981, https://doi.org/10.1093/hmg/ddz134
Abstract
The effectiveness of next generation sequencing at solving genetic disease has motivated the rapid adoption of this technology into clinical practice around the world. In this study, we use whole exome sequencing (WES) to assess 48 patients with Mendelian disease from 30 serial families as part of the “Qatar Mendelian Disease pilot program” – a coordinated multi-center effort to build capacity and clinical expertise in genetic medicine in Qatar. By enrolling whole families (parents plus available siblings), we demonstrate significantly improved discriminatory power for candidate variant identification over trios for both de novo and recessive inheritance patterns. For the same index cases, we further demonstrate that even in the absence of families, variant prioritization is improved up to 8-fold when a modest set of population-matched controls is used vs large public databases, stressing the poor representation of Middle Eastern alleles in presently available databases. Our in-house pipeline identified candidate disease variants in 27 of 30 families (90%), 23 of which (85%) harbor novel pathogenic variants in known disease genes, pointing to significant allelic heterogeneity and founder mutations underlying Mendelian disease in the Middle East. For 6 of these families, the clinical presentation was only partially explained by the candidate gene, suggesting phenotypic expansion of known syndromes. Our pilot study demonstrates the utility of WES for Middle Eastern populations, the dramatic improvement in variant prioritization conferred by enrolling population-matched controls and/or enrolling additional unaffected siblings at the point-of-care, and 25 novel disease-causing alleles, relevant to newborn and premarital screening panels in regional populations.
Introduction
As next generation sequencing (NGS) is rapidly adopted into clinical settings, the medical genetics bottleneck is shifting from data generation to interpretation and accessibility. In recent years, NGS has found remarkable success in Mendelian disease, where rare variants with large effect in single genes result in readily detectable phenotypes (1–4). Due to their putative effect on protein function and their predicted pathogenicity, candidate deleterious alleles largely share characteristic hallmarks (low frequency, protein-altering, affecting evolutionarily conserved sites) that make them poised for discovery by NGS (5). However, recent studies have reported widely variable ‘solve’ rates, ranging from 25% in outbred populations to as much as 80% in consanguineous settings (4,6,7). On average, over the past 3–5 years, large-scale Mendelian disease efforts have identified approximately 250 to 300 new disease genes per year (1,2). These genes are being cataloged in community efforts, such as the Online Mendelian Inheritance in Man (OMIM) database, which currently contains 5255 Mendelian disease genes, and the Human Gene Mutation Database, which contains 6611 genes. It is estimated that more than 3000 further Mendelian phenotypes remain to be solved genetically (2).
Progress in discovering new Mendelian genes will require coordination and recruitment on a global scale to identify patients with novel diseases of suspected monogenic origin. A particular focus could be on areas of the world where consanguinity levels are high and family sizes are large, increasing the odds of identifying multiple individuals in whom the same founder mutations segregate (8–10). However, the use of NGS to identify disease-causing variants in poorly characterized populations presents several challenges. First, teasing out rare pathogenic variants from rare population-specific polymorphisms is difficult without sufficient control chromosomes. For example, we recently discovered that up to 15% of ‘variants’ detected in more than 1000 Arabs when aligned to reference genome GRCh37/hg19 had a minor allele frequency (MAF) > 50% in the same cohort and therefore should be considered reference alleles for this population (11). Second, it is challenging to estimate pathogenicity of novel variants from genomic data alone. Despite advances in computational scoring approaches, the sheer number of variants produced per NGS experiment and the number of genes whose functions are unknown leave large numbers of variants of unknown significance per exome or genome. These challenges have slowed the clinical implementation and widespread adoption of NGS, because actionable results are difficult to parse from background variation.
Despite these challenges, the need to identify causative genes for genetic disorders is urgent, given that Mendelian diseases in aggregate affect ~8% of live births and are the leading cause of morbidity and mortality in children worldwide (12). This places a serious financial burden on healthcare systems: where healthcare intervention is available, the total cost of care over an individual’s lifetime may exceed $5 000 000 (13). Genetic diagnosis is therefore important, as identifying deleterious alleles may allow more effective prevention strategies, e.g. pre-implantation diagnosis for future pregnancies, and targeted therapeutic interventions where possible. Importantly, a genetic diagnosis obtained through clinical exome sequencing has the potential to shave years off the ‘diagnostic odyssey’ encountered with traditional approaches, thus imparting significant social and psychological benefits to families of affected children (14).
In Gulf Arab countries, where consanguinity rates exceed 50% (15) and families are large, there is a high prevalence of recessive conditions, many of them novel or undiagnosed, affecting multiple members in extended pedigrees (15–17). This region presents an ideal opportunity for the implementation of NGS technologies as a first-line clinical test as part of standard care. To assess the utility of such point-of-care whole exome sequencing (WES) across a wide spectrum of genetic diseases in Qatar, we launched the ‘Qatari Mendelian Disease Pilot Program’, a multi-institutional effort to enroll and sequence all affected subjects, their parents and unaffected siblings to rapidly diagnose idiopathic disorders (Fig. 1). To aid in this effort, we combined the data with 1376 healthy ethnically matched controls (whom we sequenced as part of a population characterization study) to discriminate deleterious variants from rare population-specific polymorphisms (11). Here, we present results from the first 30 families, identifying candidate causative variants (predicted deleterious alleles in known disease-causing genes) in 21 families (~70%) and probable candidate variants (predicted deleterious variants in genes with partial phenotypic overlap with clinical features) in another 6 (~20%). We also compare the value of population-matched controls versus additional family members for variant identification, which can guide the design of similar Mendelian programs in understudied populations around the world.
Figure 1. Workflow for the Qatar Mendelian Disease Pilot Program. Families were enrolled from multiple clinics based on suspicion of a monogenic disorder. For each family, WES was carried out on all available family members. Identified variants were prioritized based on segregation with disease and putative effect on protein function as predicted by multiple conservation and impact scores. Of the 30 families studied, 21 had deleterious variants in known genes explaining the phenotype (4 known, 17 novel), 6 had partial candidates that could not fully explain the phenotype, and 3 remain unsolved.
Results
Subject enrollment and depth of sequencing consideration
We enrolled 30 families with Mendelian disorders, comprising 48 affected individuals and all available family members (n = 82). All 130 subjects were assessed by WES (Figs 1 and 2). In order to evaluate the utility of WES at the point of care, we designed this pilot study to evaluate two key parameters: (1) To what depth should a patient’s exome be routinely sequenced? (2) Should additional family members beyond the index case be recruited to identify candidate causative variants?
Figure 2. Thirty families with Mendelian disorders enrolled in the study. All affected individuals from all families (n = 48, shaded boxes/circles) were sequenced, along with available family members (n = 82, thick dark boxes/circles). Family members unavailable for sequencing (gray boxes/circles) are identified. Consanguinity is represented by two lines connecting parents; multiple siblings of the same gender are represented as a number inside a box/circle.
To answer the first question, we sequenced to a wide range of depths, from 45× to 199× (median, 99.9×). We observed a subtle positive correlation between sequencing depth and both the total number of variants called (plateau at ~75×, Supplementary Material, Fig. S1A) and the overall variant quality [plateau at ~115×, where >99% of variants had a Phred-scaled genotyping quality (GQ) of at least 30]. Notably, at ~78×, 95% of variants had a GQ > 50, a threshold of less than 1 error per exome (Supplementary Material, Fig. S1A). Further, at exon-level coverage, we observed that above 60×, all subjects have > 90% of all exons covered by 20 or more unique reads, rising to > 95% above 80× (Supplementary Material, Fig. S1B).
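The link between a Phred-scaled GQ threshold and an expected error count per exome can be illustrated with a short calculation. The sketch below is a minimal illustration rather than part of the study pipeline: it assumes the standard Phred definition (error probability = 10^(-Q/10)) and uses the median per-exome variant counts reported in these Results (42 835 SNPs and 3963 indels). The function names are hypothetical.

```python
def phred_to_error_prob(gq: float) -> float:
    """Convert a Phred-scaled quality to the probability that the call is wrong."""
    return 10 ** (-gq / 10)


def expected_errors(n_variants: int, gq: float) -> float:
    """Expected number of wrong genotype calls if every call sat exactly at `gq`.

    Calls above the threshold have lower error probabilities, so this is an
    upper bound for a call set in which every variant meets the threshold.
    """
    return n_variants * phred_to_error_prob(gq)


# Median variants per exome reported in the text: 42 835 SNPs + 3963 indels.
N_VARIANTS = 42_835 + 3_963

for gq in (30, 50):
    print(
        f"GQ {gq}: p(error) = {phred_to_error_prob(gq):.0e}, "
        f"expected errors <= {expected_errors(N_VARIANTS, gq):.2f}"
    )
```

Under these assumptions, GQ 30 corresponds to one expected error per 1000 calls, whereas GQ 50 corresponds to one per 100 000 calls, which is below one expected error for an exome of roughly 47 000 variants.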
Despite observing an overall similar total number of variants per exome (median: 42 835 SNPs and 3963 indels), the distribution of variant zygosity differed significantly among individuals based on consanguinity. The average consanguineous index case had ~13.5% more homozygous variants exome-wide than non-consanguineous subjects (Yates P = 0.0001; Supplementary Material, Fig. S2). This was also supported by a 13-fold higher mean inbreeding coefficient (F) per consanguineous subject vs non-consanguineous individuals (0.094 vs 0.007), representing a large proportion of individuals being first cousins or double-first cousins.
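For orientation, the pedigree-based expectation of F for the relationships mentioned above can be worked out with Wright's path-counting formula. The sketch below is illustrative only and assumes non-inbred common ancestors; the study's F values were presumably estimated from genotype data rather than from pedigrees, and the function name is hypothetical.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Wright's path-counting formula, assuming non-inbred common ancestors.

    `paths` holds one (n_father, n_mother) pair per common ancestor, where each
    n is the number of generations from that parent of the subject back to the
    shared ancestor. Each ancestor contributes (1/2) ** (n_father + n_mother + 1).
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# Parents who are first cousins share two grandparents (2 generations back each).
first_cousins = inbreeding_coefficient([(2, 2)] * 2)

# Parents who are double first cousins share all four grandparents.
double_first_cousins = inbreeding_coefficient([(2, 2)] * 4)

print(first_cousins, float(first_cousins))                # 1/16 0.0625
print(double_first_cousins, float(double_first_cousins))  # 1/8 0.125
```

The reported mean of 0.094 for consanguineous subjects falls between these two pedigree expectations.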
Utility of ancestrally matched controls in prioritizing deleterious variants
Given the severity of phenotypes enrolled in this study, we implemented a variant prioritization pipeline to identify rare, protein-altering variants as disease candidates (see Materials and Methods). The variants from each index case were divided into 3 categories: (1) all rare variants (‘Rare’), (2) rare variants predicted damaging (‘Rare_D’) and (3) rare variants predicted damaging and affecting evolutionarily conserved residues (‘Rare_DC’). When rarity was investigated against all publicly available global databases (including ExAC, gnomAD, 1000 Genomes and ESP; n > 130 000 individuals), the average affected subject in the study had 708 Rare, 519 Rare_D and 161 Rare_DC variants in heterozygous state (Fig. 3A) and 78 Rare, 60 Rare_D and 14 Rare_DC in homozygous state (Fig. 3B).
Figure 3. The use of population-matched controls for allele frequency comparisons significantly reduces the number of candidate variants across three different functional categories: rare, protein-altering (blue); rare, protein-altering and predicted damaging (red); and rare, protein-altering, predicted damaging and affecting highly conserved residues (green). (A) Heterozygous. (B) Homozygous. For each category, rarity is defined as MAF < 1% in all public databases (solid line) and as MAF < 1% in the Qatari population database of 1376 exomes (dashed line). The effect of comparison to both groups is observed across all three categories, supporting the use of ethnically matched controls in Mendelian analysis pipelines.
In contrast, when rarity was further evaluated against allele frequencies from an internal database of only ~1400 Qataris, the number of candidate variants was reduced by 28 to 64% across the different groups, with the average individual having 478 Rare, 331 Rare_D and 116 Rare_DC heterozygous variants and 28 Rare, 19 Rare_D and 7 Rare_DC homozygous variants (Fig. 3). Thus, although the number of matched controls in our internal database is more than 75-fold smaller than all publicly available databases combined (~3000 vs ~250 000 alleles), the advantages of population matching were significant.
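The two-stage rarity filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Variant` record, the function names and the four toy variants are hypothetical, and only the MAF < 1% cut-off is taken from the text.

```python
from dataclasses import dataclass

MAF_THRESHOLD = 0.01  # rarity cut-off used in the text (MAF < 1%)


@dataclass(frozen=True)
class Variant:
    name: str
    public_maf: float  # highest frequency across public databases (0.0 if absent)
    local_maf: float   # frequency in the population-matched control set


def is_rare(maf: float, threshold: float = MAF_THRESHOLD) -> bool:
    return maf < threshold


def filter_rare(variants, use_local_controls: bool):
    """Keep variants that are rare publicly and, optionally, also rare locally."""
    kept = []
    for v in variants:
        if not is_rare(v.public_maf):
            continue
        if use_local_controls and not is_rare(v.local_maf):
            continue
        kept.append(v)
    return kept


# Hypothetical toy data, for illustration only.
variants = [
    Variant("var_A", 0.0, 0.0),          # absent everywhere: remains a candidate
    Variant("var_B", 0.0, 0.12),         # unseen publicly but common locally
    Variant("var_C", 0.00002, 0.0004),   # rare in both resources
    Variant("var_D", 0.05, 0.06),        # common everywhere
]

public_only = filter_rare(variants, use_local_controls=False)
with_local = filter_rare(variants, use_local_controls=True)

print([v.name for v in public_only])  # ['var_A', 'var_B', 'var_C']
print([v.name for v in with_local])   # ['var_A', 'var_C']
```

In this toy set, `var_B` plays the role of a population-specific polymorphism: it looks novel against public databases and is removed only once matched controls are consulted.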
Improved variant prioritization by assessing additional family members
While population-matched controls significantly reduced candidates for downstream evaluation, there still remained tens to hundreds of candidate variants for disease causality in each family. We had originally designed this pilot study to assess the value of enrolling as many family members as possible at the point of care. We could assess this formally for 16 of 30 families where both parents were available. Notably, because we enrolled multiple siblings per family, we were able to leverage the large family sizes in the Middle East to test 37 unique parents–child ‘trios’ from these 16 families.
For de novo variants, we observed a wide range of high quality protein-altering candidates (n = 0 to 10) among all 37 ‘index’ subjects (Fig. 4A). Importantly, the number of de novo variants did not appear to be correlated to differences in depth of coverage between each index and either parent (Fig. 4A). For families where siblings were available, we leveraged the sibling(s) exomes to determine the fraction of likely true de novo mutations—those specific to the index and not shared by a sibling. By applying this sibling-sharing filter, we observed a significant reduction in the number of candidate de novo events. Comparison against the first or second sibling when available reduced the number from an average of 4.02 events per exome, to 1.71 (comparison to first sibling) and then 0.56 (comparison to second sibling) de novo protein-altering mutations per trio (Fig. 4B and C), i.e. sequencing of even a small set of siblings helped reclassify ~50% of high quality, high-coverage candidate de novo variants as possibly inherited.
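The sibling-sharing filter can be expressed as two set operations. The sketch below is an assumption-laden illustration rather than the study's implementation: genotypes are encoded as alternate-allele counts (0, 1 or 2) keyed by a site identifier, a site missing from a sample is treated as homozygous reference (real pipelines would also need to check coverage), and all names and toy genotypes are hypothetical.

```python
def candidate_de_novo(index, mother, father):
    """Sites where the index carries an alternate allele seen in neither parent."""
    return {
        site
        for site, gt in index.items()
        if gt > 0 and mother.get(site, 0) == 0 and father.get(site, 0) == 0
    }


def sibling_filtered(candidates, siblings):
    """Drop candidates also carried by any sequenced sibling.

    A shared allele suggests inheritance (for example a missed parental call),
    so only index-specific sites are kept as likely true de novo events.
    """
    return {
        site
        for site in candidates
        if all(sib.get(site, 0) == 0 for sib in siblings)
    }


# Hypothetical toy genotypes: site -> alternate-allele count.
index = {"s1": 1, "s2": 1, "s3": 1, "s4": 1}
mother = {"s4": 1}
father = {}
sibling_1 = {"s2": 1}
sibling_2 = {"s3": 1}

trio_only = candidate_de_novo(index, mother, father)
one_sibling = sibling_filtered(trio_only, [sibling_1])
two_siblings = sibling_filtered(trio_only, [sibling_1, sibling_2])

print(sorted(trio_only))     # ['s1', 's2', 's3']
print(sorted(one_sibling))   # ['s1', 's3']
print(sorted(two_siblings))  # ['s1']
```

Each additional sibling can only shrink the candidate set, which mirrors the stepwise reduction reported for comparison against the first and second sibling.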
Figure 4. Enrollment of additional family members significantly reduces the number of candidate de novo (A–C) or recessive (D–F) coding variants under consideration for disease causality. Although only 16 of 30 families had both parents available for study, the availability of additional siblings allows the assessment of 37 unique ‘trios’. (A) Number of de novo variants present in ‘index’ subjects is shown (black dotted line) along with the ratio of depth of coverage of index subject to each parent (index:mother in red, index:father in blue). Comparison to (B) one or (C) two siblings significantly reduces the number of candidate de novo variants unique to the index. Similar trends are observed for recessive variants. (D) The total number of coding recessive variants in each of 37 individuals (black dotted line) along with the ratio of depth of coverage of index subject to each parent (index:mother in blue, index:father in red). (E) Less than half of these are shared by one sibling, and (F) only a fraction of these are shared by two siblings.
Similarly, for recessive alleles, the average ‘index’ subject had 808 (range 576 to 1072) high quality rare protein-altering homozygous sites at which both parents were heterozygous carriers. As with de novo events, the number of variants did not appear to be affected by the depth of sequencing of each ‘trio’ (Fig. 4D). Since an advantage of recruiting multiplex families is the sharing of homozygous rare variants by multiple affected siblings, we investigated how many of an individual’s variants were shared by one or two additional siblings. On average, 211, 31 and 1 homozygous variants were shared by two, three or four siblings, respectively, where available (Fig. 4E and F and data not shown). This remarkable reduction in variants supports the utility and potential speed of diagnosis when using exome sequencing to identify causative variants in multiplex families when all members are sequenced.
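The recessive sharing analysis amounts to intersecting per-child candidate sets. The following sketch illustrates that logic under stated assumptions: genotypes are alternate-allele counts (0, 1 or 2) keyed by site, the optional exclusion of sites that are homozygous in an unaffected sibling is an added assumption not described in the text, and all names and toy data are hypothetical.

```python
def recessive_candidates(child, mother, father):
    """Sites homozygous in the child where both parents are heterozygous carriers."""
    return {
        site
        for site, gt in child.items()
        if gt == 2 and mother.get(site) == 1 and father.get(site) == 1
    }


def shared_recessive(affected, mother, father, unaffected=()):
    """Recessive candidates shared by every affected child.

    Sites that are also homozygous in an unaffected sibling are removed, since
    such a genotype would not segregate with disease (an assumed extra filter).
    """
    if not affected:
        return set()
    per_child = [recessive_candidates(c, mother, father) for c in affected]
    shared = set.intersection(*per_child)
    return {
        site
        for site in shared
        if all(sib.get(site, 0) != 2 for sib in unaffected)
    }


# Hypothetical toy genotypes: site -> alternate-allele count.
mother = {"a": 1, "b": 1, "c": 1, "d": 1}
father = {"a": 1, "b": 1, "c": 1, "d": 0}
child_1 = {"a": 2, "b": 2, "c": 2, "d": 1}
child_2 = {"a": 2, "b": 2, "c": 1}
child_3 = {"a": 2, "b": 1, "c": 2}
healthy_sibling = {"a": 1, "b": 2}

print(sorted(shared_recessive([child_1], mother, father)))                    # ['a', 'b', 'c']
print(sorted(shared_recessive([child_1, child_2], mother, father)))           # ['a', 'b']
print(sorted(shared_recessive([child_1, child_2, child_3], mother, father)))  # ['a']
print(sorted(shared_recessive([child_1, child_2], mother, father,
                              unaffected=[healthy_sibling])))                 # ['a']
```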
Together, the data suggest that both de novo and recessive candidate gene identification studies would benefit from enrolling as many siblings as possible directly at the point of care, which significantly reduces the number of candidates that must be manually evaluated and interpreted as disease-causing in Mendelian families.
Identification of disease-causing variants
For variants surviving the population-based and family-based filtration described above, we prioritized rare protein-altering variants affecting highly conserved residues and segregating with disease in each family (see Materials and Methods). For complete families for whom we had recruited additional siblings, this process significantly reduced the number of candidate disease variants, whereas for families with missing members, the number of candidates remained elevated, requiring substantial downstream interpretation. For each family, we evaluated de novo, recessive (including homozygosity and compound heterozygosity), X-linked and dominant (where applicable) modes of inheritance, and candidate genes harboring putatively deleterious variants were compiled and further annotated using human and model organism literature (see Materials and Methods). All final evidence was curated by clinical and molecular geneticists. Following thorough analysis, we could broadly divide the 30 families into three categories: (i) solved (n = 21), (ii) partially solved (n = 6) and (iii) unsolved (n = 3).
Allelic heterogeneity of Mendelian disease in Qatar
For the 21 solved families, we identified 23 predicted pathogenic variants in 19 disease-related genes. Six of these were predicted loss of function variants (four indels, one nonsense and one splice site), while the remaining 17 were missense. As expected due to the rarity of Mendelian diseases, these 21 families represented a wide variety of clinical phenotypes affecting different organ systems, enrolled from 11 hospital clinics. Six families had predominantly neurological phenotypes, ranging from known syndromes such as Lafora syndrome (NHLRC1) and Leigh’s disease (NDUFS4) to more complex phenotypes including developmental delay and mental retardation, caused by variants in TTBK2, EIF2B2, TBCD and CDK19. We also observed five families with collagenopathies comprising a wide constellation of complex clinical abnormalities, including a novel founder mutation (p.Gly721fs) in COL4A3 (MIM: 203780) causing Alport syndrome, shared by two families initially enrolled separately in the study but later discovered to share a common grandparent. Other rare syndromes for which we discovered candidate pathogenic alleles include hemolytic anemia (PKLR), Usher syndrome (MYO7A), terminal osseous dysplasia (FLNA), Papillon Lefevre syndrome (CTSC), progressive familial intrahepatic cholestasis (ABCB4), Vici syndrome (EPG5) and Bartter syndrome (SLC12A1). Importantly, only 4 of 21 families had variants that were previously reported as pathogenic, versus 17 families (81%) with novel deleterious alleles identified in this study (Table 1). The novelty of these variants versus global databases suggests they are likely specific to the Middle East. Notably, 18 of 21 families had recessive causes while the remaining 3 were due to de novo mutations, consistent with the expected genetic architecture in this highly consanguineous population (Table 2).
Table 1 [a]

Family ID | Diagnosis [b] | System [c] | #Seq (#aff) [d] | Cnsg [e] | Gene (MIM) [f] | MoI [g] | HGVS Impact [h] | PolyPHEN [i] | GERP [i] | CADD [i] | Max gAF [j] | Qatar AF [j] |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GD001 | Lafora disease | Neurological | 5 (3) | Yes | NHLRC1 (254780) | Rec | c.1168T>G p.Tyr390Asp | 0.992 | 5.76 | 25.9 | Novel | 3.99 × 10^-3 |
GD002 | Severe developmental delay | Neurological | 5 (1) | Yes | TBCD (617193) | Rec | c.1423G>A p.Ala475Thr | 1 | 5.53 | 35 | 1.65 × 10^-5 | Novel |
GD003 | Familial aortic stenosis | Cardiac | 4 (2) | No | COL5A1 (130000) | Dom | c.1889G>A p.Arg630Gln | 0.993 | 3.59 | 15.82 | 1.4 × 10^-4 | 3.63 × 10^-4 |
GD004 | Arthrogryposis multiplex and skeletal deformities | Multiple | 3 (1) | Yes | COL2A1 (108300) | Rec | c.985C>T p.Pro329Ser | 0.989 | 4.56 | 19.78 | Novel | Novel |
GD005 | Terminal osseous dysplasia | Skeletal | 4 (1) | Yes | FLNA (300017) | Rec | c.1451G>A p.Arg484Gln | 0.996 | 4.53 | 25.8 | 5.80 × 10^-5 | Novel |
GD007 | Leigh’s disease | Neurological | 3 (1) | Yes | NDUFS4 (256000) | Rec | c.464_468dupCCAAG p.Ser157fs [l] | Unknown | 5.6 | 22.6 | Novel | Novel |
GD008 | Surfactant deficiency | Pulmonary | 5 (1) | Yes | ABCA3 (610921) | Rec | c.1831T>C p.Cys611Arg | 1 | 6.17 | 24.3 | 8.24 × 10^-6 | Novel |
GD009 | Progressive familial intrahepatic cholestasis | GI | 5 (3) | Yes | ABCB4 (602347) | Rec | c.526C>T p.Arg176Trp | 1 | 1.86 | 19.72 | 8.24 × 10^-6 | 2.91 × 10^-3 |
GD010 | Vici syndrome | Multiple | 3 (1) | Yes | EPG5 (242840) | Rec [k] | c.7736G>A p.Arg2579Gln | 1 | 5.18 | 30.6 | 2.44 × 10^-5 | Novel |
GD010 | Vici syndrome | Multiple | 3 (1) | Yes | EPG5 (242840) | Rec [k] | c.4475C>G p.Ala1492Gly | 0.891 | 4.71 | 23.4 | Novel | Novel |
GD011 | Developmental delay, hypotonia and seizures | Neurological | 5 (1) | No | CDK19 (614720) | DeN | c.517G>A p.Asp173Asn | 1 | 5.29 | 32 | Novel | Novel |
GD012 | Global developmental delay, mental retardation | Neurological | 4 (1) | Yes | TTBK2 (604432) | Rec [k] | c.3526C>T p.His1176Tyr | 0.996 | 5.24 | 26.7 | 3.30 × 10^-5 | Novel |
GD012 | Global developmental delay, mental retardation | Neurological | 4 (1) | Yes | TTBK2 (604432) | Rec [k] | c.2030C>G p.Thr677Arg | 0.998 | 4.89 | 24.6 | 1.32 × 10^-4 | 3.63 × 10^-4 |
GD014 | Leukoencephalopathy | Neurological | 6 (1) | Yes | EIF2B2 (603896) | Rec | c.283A>G p.Arg95Gly | 0.006 | 5.6 | 25.7 | Novel | 1.45 × 10^-3 |
GD017 | Alport syndrome | Renal | 4 (1) | Yes | COL4A3 (203780) | Rec | c.2162delG p.Gly721fs | Unknown | 5.94 | Unknown | Novel | Novel |
GD018 | Alport syndrome | Renal | 4 (2) | Yes | COL4A3 (203780) | Rec | c.2162delG p.Gly721fs | Unknown | 5.94 | Unknown | Novel | Novel |
GD019 | Papillon Lefevre syndrome | Dermatology | 4 (2) | Yes | CTSC (245000) | Rec | c.872_874delGTA p.Ser292del | Unknown | 5.95 | Unknown | Novel | Novel |
GD020 | Bartter syndrome | Renal | 6 (2) | Yes | SLC12A1 (501678) | Rec | c.1316G>A p.Arg439Gln | 1 | 5.52 | 35 | 8.24 × 10^-6 | Novel |
GD021 | Ichthyosis | Dermatology | 4 (3) | No | CYP4F22 (604777) | Rec | c.728G>A p.Arg243His [l] | 1 | 5.39 | 18.32 | 4.12 × 10^-5 | 3.63 × 10^-4 |
GD022 | Arterial tortuosity syndrome | Cardiovascular | 4 (3) | Yes | SLC2A10 (208050) | Rec | c.243C>G p.Ser81Arg [l] | 0.999 | 4.53 | 17.94 | Novel | 7.63 × 10^-3 |
GD027 | Hemolytic anemia | Hematology | 4 (1) | No | PKLR (266200) | DeN | c.823G>A p.Gly275Arg | 1 | 4.48 | 23.9 | 8.24 × 10^-6 | Novel |
GD028 | Usher syndrome, type 1B | Audiovisual | 4 (2) | Yes | MYO7A (276900) | Rec | c.5392C>T p.Gln1798* [l] | Unknown | 4.68 | 40 | 1.66 × 10^-5 | Novel |
GD030 | Stickler syndrome, type 1 | Multiple | 5 (1) | Yes | COL2A1 (108300) | Rec | c.2094+1G>A | Unknown | 5.65 | 27.7 | Novel | Novel |
[a] High quality, rare, pathogenic variants detected in genes known to cause disease in 21 of 30 families tested. The family identifiers correspond to those in Figure 1.
[b] Diagnosis: final diagnosis made based on clinical and genetic findings.
[c] System: major organ system(s) affected by disease.
[d] Seq: number of family members sequenced; Aff: number of sequenced family members affected.
[e] Cnsg: consanguinity of parents. Yes: parents are first or second degree relatives.
[f] Gene: candidate gene harboring a diagnostic variant. MIM: unique gene identification number from OMIM.
[g] MoI: mode of inheritance based on variant zygosity in diagnostic gene. Rec: recessive (homozygous or compound heterozygous); Dom: dominant, inherited from an affected parent; DeN: de novo.
[h] For each gene, the causative variant in the family is displayed in HGVS format, including effect on cDNA and predicted effect on amino acid sequence.
[i] For each variant, pathogenicity scores computed by PolyPHEN, GERP and CADD.
[j] Maximum allele frequency in public databases (dbSNP, 1000 Genomes Project and ExAC), followed by allele frequency in the Qatari database of 1376 individuals.
[k] Families where affected members were compound heterozygous; thus all fields are identical except the biallelic variants, which appear on separate rows.
[l] For four families, the causative variants discovered in this study were previously reported (known disease-causing variants).
Family ID . | Diagnosisb . | Systemc . | #Seq (#aff)d . | Cnsge . | Gene (MIM)f . | MoIg . | HGVS Impacth . | PolyPHEN|GERP|CADDi . | Max gAFj . | Qatar AFj . |
---|---|---|---|---|---|---|---|---|---|---|
GD001 | Lafora disease | Neurological | 5 (3) | Yes | NHLRC1 (254780) | Rec | c.1168T>G p.Tyr390Asp | 0.992|5.76|25.9 | Novel | 3.99 × 10–3 |
GD002 | Severe developmental delay | Neurological | 5 (1) | Yes | TBCD (617193) | Rec | c.1423G>A p.Ala475Thr | 1|5.53|35 | 1.65 × 10−5 | Novel |
GD003 | Familial aortic stenosis | Cardiac | 4 (2) | No | COL5A1 (130000) | Dom | c.1889G>A p.Arg630Gln | 0.993|3.59|15.82 | 1.4 × 10−4 | 3.63 × 10−4 |
GD004 | Arthrogryposis multiplex and skeletal deformities | Multiple | 3 (1) | Yes | COL2A1 (108300) | Rec | c.985C>T p.Pro329Ser | 0.989|4.56|19.78 | Novel | Novel |
GD005 | Terminal osseous dysplasia | Skeletal | 4 (1) | Yes | FLNA (300017) | Rec | c.1451G>A p.Arg484Gln | 0.996|4.53|25.8 | 5.80 × 10−5 | Novel |
GD007 | Leigh’s disease | Neurological | 3 (1) | Yes | NDUFS4 (256000) | Rec | c.464_468dupCCAAG p.Ser157fsl | Unknown|5.6|22.6 | Novel | Novel |
GD008 | Surfactant deficiency | Pulmonary | 5 (1) | Yes | ABCA3 (610921) | Rec | c.1831T>C p.Cys611Arg | 1|6.17|24.3 | 8.24 × 10−6 | Novel |
GD009 | Progressive familial intrahepatic cholestasis | GI | 5 (3) | Yes | ABCB4 (602347) | Rec | c.526C>T p.Arg176Trp | 1|1.86|19.72 | 8.24 × 10−6 | 2.91 × 10–3 |
GD010 | Vici syndrome | Multiple | 3 (1) | Yes | EPG5 (242840) | Reck | c.7736G>A p.Arg2579Gln | 1|5.18|30.6 | 2.44 × 10−5 | Novel |
Reck | c.4475C>G p.Ala1492Gly | 0.891|4.71|23.4 | Novel | Novel | ||||||
GD011 | Developmental delay, hypotonia and seizures | Neurological | 5 (1) | No | CDK19 (614720) | DeN | c.517G>A; p.Asp173Asn | 1|5.29|32 | Novel | Novel |
GD012 | Global developmental delay, mental retardation | Neurological | 4 (1) | Yes | TTBK2 (604432) | Reck | c.3526C>T p.His1176Tyr | 0.996|5.24|26.7 | 3.30 × 10−5 | Novel |
Reck | c.2030C>G p.Thr677Arg | 0.998|4.89|24.6 | 1.32 × 10−4 | 3.63 × 10−4 | ||||||
GD014 | Leukoencephalopathy | Neurological | 6 (1) | Yes | EIF2B2 (603896) | Rec | c.283A>G p.Arg95Gly | 0.006|5.6|25.7 | Novel | 1.45 × 10−3 |
GD017 | Alport syndrome | Renal | 4 (1) | Yes | COL4A3 (203780) | Rec | c.2162delG p.Gly721fs | unknown|5.94|unknown | Novel | Novel |
GD018 | Alport syndrome | Renal | 4 (2) | Yes | COL4A3 (203780) | Rec | c.2162delG p.Gly721fs | Unknown|5.94|unknown | Novel | Novel |
GD019 | Papillon Lefevre syndrome | Dermatology | 4 (2) | Yes | CTSC (245000) | Rec | c.872_874delGTAp.Ser292del | Unknown|5.95|unknown | Novel | Novel |
GD020 | Bartter syndrome | Renal | 6 (2) | Yes | SLC12A1 (501678) | Rec | c.1316G>A p.Arg439Gln | 1|5.52|35 | 8.24 × 10−6 | Novel |
GD021 | Ichthyosis | Dermatology | 4 (3) | No | CYP4F22 (604777) | Rec | c.728G>A p.Arg243Hisl | 1|5.39|18.32 | 4.12 × 10−5 | 3.63 × 10−4 |
GD022 | Arterial tortuosity syndrome | Cardiovascular | 4 (3) | Yes | SLC2A10 (208050) | Rec | c.243C>G p.Ser81Argl | 0.999|4.53|17.94 | Novel | 7.63 × 10−3 |
GD027 | Hemolytic anemia | Hematology | 4 (1) | No | PKLR (266200) | DeN | c.823G>A p.Gly275Arg | 1|4.48|23.9 | 8.24E-06 | Novel |
GD028 | Usher syndrome, type 1B | Audiovisual | 4 (2) | Yes | MYO7A (276900) | Rec | c.5392C>T p.Gln1798*l | Unknown|4.68|40 | 1.66 × 10−5 | Novel |
GD030 | Stickler syndrome, type 1 | Multiple | 5 (1) | Yes | COL2A1 (108300) | Rec | c.2094+1G>A | Unknown|5.65|27.7 | Novel | Novel |
aHigh quality, rare, pathogenic variants detected in genes known to cause disease in 21 of 30 families tested. The family identifiers correspond to those in Figure 1.
bDiagnosis—final diagnosis made based on clinical and genetic findings.
cSystem—major organ system(s) affected by disease.
dSeq—number of family members sequenced; Aff—number of sequenced family members affected.
eCnsg—consanguinity of parents, Yes: parents are first or second degree relatives.
fGene—candidate gene harboring a diagnostic variant, MIM: Unique gene identification number from OMIM.
gMoI: mode of inheritance based on variant zygosity in the diagnostic gene; Rec: recessive (homozygous or compound heterozygous); Dom: dominant, inherited from an affected parent; DeN: de novo.
hFor each gene, the causative variant in the family is displayed in HGVS format, including effect on cDNA and predicted effect on amino acid sequence.
iFor each variant, pathogenicity scores computed by PolyPHEN, GERP and CADD.
jMax gAF: maximum allele frequency across public databases (dbSNP, 1000 Genomes Project and ExAC); Qatar AF: allele frequency in the Qatari database of 1376 individuals.
kFamilies where affected members were compound heterozygous; thus all fields are identical except the biallelic variants, which appear on separate rows.
lFor four families, the causative variants discovered in this study were previously reported (known disease-causing variants).
Six of thirty families with candidate pathogenic variants in known disease-related genes that partially explain phenotypic presentationa
ID . | Clinical featuresb . | #Seq (#Aff)c . | Cnsgd . | Recessivee . | Dominante . | Candidate disease genes?f . | HGVS Impactg . | Segregationh . |
---|---|---|---|---|---|---|---|---|
GD006 | Neurological: developmental delay, hypotonia; pulmonary: respiratory distress; renal: hydronephrosis | 3 (1) | No | ANO8, GSAP | PURA | PURA (616158) Autosomal Dominant Mental Retardation | c.267delC; p.Ala89fs | De novo |
GD013 | GI: jejunal atresia, apple peel appearance and hyperglycemia | 3 (1) | No | NUMA1, RGS11 | ACTB | ACTB (243310) Baraitser–Winter syndrome | c.1043C>T; p.Ser348Leu | De novo |
GD015 | Neurological: hypotonia, developmental delay and epilepsy | 6 (1) | Yes | HSPA6, MFAP3L | LARGE, HAS2 | LARGE (603590) Associated with recessive congenital muscular dystrophy and mental retardation; however, no mutation found on other allele | c.1449G>C; p.Asp483Glu | De novo |
GD016 | Visual: myopia (AD); facial: mildly dysmorphic ears | 6 (4) | No | 0 | PCSK9, MAK16, TLL2 | TLL2 (606743) regulates BMP2 and BMP4, both implicated in myopia | c.1381G>A; p.Glu461Lys | Dominant from affected parent |
GD023 | GI: protein-losing enteropathy, high stool alpha-1 antitrypsin; facial: puffy eyelids; general: sacral and lower limb edema; immunology: low IgG levels requiring monthly immunoglobulin | 4 (1) | Yes | EGFR, FAM92A1, MED25, RBL2, SORBS1, THSD1, ZNF423 | 0 | MED25 (610197): Charcot–Marie–Tooth disease and Basel–Vanagaite–Smirin-Yosef syndrome; ZNF423 (604557): Joubert syndrome | MED25: c.1101+4G>A; ZNF423: c.2738C>T p.Pro913Leu | Rec, Rec |
GD024 | Neurological: infantile epileptic encephalopathy; facial: dysmorphic features; growth retardation | 3 (1) | Yes | DOCK8, FIBIN, LRP10, PLA2G4F, REC8, TBL3, TNRC18, ZBTB33 | SRGAP2, THBS4 | SRGAP2 (606524): Associated with neuronal migration, deleted in a patient with epileptic encephalopathy | c.380G>A; p.Arg127His | De novo |
aFor six families where a single gene did not fully explain the clinical presentation, all genes harboring rare, predicted pathogenic variants are listed according to their observed segregation (recessive or dominant). Of these, any gene(s) previously linked to any of the clinical features present in the patient (Column 2) are then listed under the ‘Candidate genes’ (Column 7) with their corresponding MIM number and a summary of associated phenotype(s) reported for this gene. All family IDs correspond to those in Figure 1.
bClinical features: summary of all major clinical features in the affected individuals in this family, some of which remain unexplained by the candidate disease gene(s).
cSeq—number of family members sequenced; Aff—number of sequenced family members affected.
dCnsg—consanguinity of parents. Yes: parents are first or second degree relatives.
eAll genes harboring putatively damaging variants (rare, predicted damaging by multiple scores) consistent with recessive or dominant (de novo) inheritance.
fAny diseases known to be caused by variants in the candidate genes that may partially overlap clinical features. For each candidate gene, its MIM number appears in parentheses.
gFor each candidate gene, the candidate damaging variant is displayed in HGVS format, including effect on cDNA and predicted effect on amino acid sequence.
hSegregation—assumed segregation of disease based on whether the candidate gene variant was homozygous or heterozygous in the affected members.
Genetic findings and clinical features of three patients in unsolved families
ID . | Clinical featuresb . | #Seq (#Aff)c . | Cnsgd . | Genes with recessive variantse . | Genes with de novo variantse . |
---|---|---|---|---|---|
GD025 | Pulmonary: meconium aspiration syndrome, chronic lung changes with unknown cause, suspected interstitial lung disease, suspected primary ciliary dyskinesia, sepsis, congenital surfactant deficiency | 5 (1) | No | - | PAPS22 |
GD029 | Gastrointestinal: hepatomegaly under investigation and elevated liver function tests. Glycogen storage disease. Ultrasound of abdomen and pelvis: mild hepatomegaly. Other systems: intact | 4 (1) | Yes | KHNYN, RBM41, MBTPS2 | INCENP, MMP12 |
GD026 | GI: hyperbilirubinemia, suspected citrin deficiency or glycogen storage disease (the gene reported to be mutated in citrin deficiency is SLC25A13); vision: retinitis pigmentosa; facial: dysmorphic features | 5 (3) | Yes | RXFP2, FAM47A, SLC24A5, NKX1-2 | - |
aFor three families, no candidate genes with putatively deleterious variants were able to explain any of the clinical findings. All Family IDs correspond to those in Figure 1.
bClinical features—summary of all major clinical features in the affected individuals in this family, which are unexplained by the candidate disease gene(s).
cSeq—number of family members sequenced; Aff—number of sequenced family members affected.
dCnsg—consanguinity of parents. Yes: parents are first or second degree relatives.
eAll genes harboring putatively damaging variants (rare, predicted damaging by multiple scores) segregating with disease in the family, based on recessive or dominant (de novo) inheritance.
Phenotypic heterogeneity and novel genes
For six families, making a molecular diagnosis with WES proved more challenging. Five of these families also had novel, predicted deleterious mutations in known disease-causing genes. However, in addition to the isolated disorders with which those genes had been previously reported, these five patients had multiple organ system abnormalities outside of the phenotype(s) classically associated with each gene (Table 2). These may represent phenotypic expansions of known diseases or patients in whom more than one disease may be manifest.
For example, the index case in GD006 presented with a combination of developmental delay, hypotonia, respiratory distress and renal hydronephrosis. WES uncovered a de novo frameshift variant (p.Ala89fs) in PURA (MIM: 616158), a gene associated with autosomal dominant mental retardation (18). While patients with variants in the PURA gene have been reported to present with isolated respiratory difficulties, this is the first case with renal involvement and hypotonia. Another example is the gene SRGAP2 (MIM: 606524), which is important for neuronal migration and associated with epileptic encephalopathy (19). While previous reports have found this gene deleted in human patients (20), we discovered a de novo variant (p.Arg127His) in this gene that is predicted to be highly damaging (CADD: 28, GERP: 5.4) in family GD024, where the index case suffered growth retardation and had dysmorphic features in addition to infantile epileptic encephalopathy. These features may represent phenotypic expansion or may be caused by one of the nine other genes harboring putatively deleterious variants identified in this family, none of which are linked to phenotypes in the literature. Similarly, our analysis identified a possible novel disease gene causing severe myopia in family GD016, where four affected individuals in three generations were found to share a dominant mutation in tolloid-like-2 (TLL2, MIM: 606743), a bone morphogenetic protein (BMP)–like peptidase without previous disease association. Notably, BMP-like peptidases activate certain TGF-β complexes, including BMP2 (21), in which variants have been previously reported to cause myopia (22), indicating a possible mechanistic link between the pathway and the disease phenotype in this family. Observing this putatively deleterious variant among only three that segregate with all affected individuals in the family supports a role in disease, but it does not explain the additional dysmorphic facial features also observed in this family.
Thus, in these six families, additional clinical features may represent simple phenotype expansions or may be a result of variants in other candidate genes with putatively damaging variants segregating in each family. We provide a list of such genes for each family, and believe this level of data sharing could become standard when reporting NGS variation in incompletely diagnosed families (Table 2).
Unsolved families
For three families, we could not identify gene candidates explaining the phenotype (Table 3). Two of these families represented complex phenotypes with consanguinity, yet none of the recessive variants occurred in a gene of known function in the literature. For example, in family GD026, three individuals were affected by a syndrome causing vision defects, dysmorphic features and gastrointestinal complications. The affected individuals shared putatively deleterious recessive variants in four genes, only one of which was previously reported to be involved in a disease (NKX1-2; Entrez: 390010, associated with transposition of the great arteries) (23). However, no cardiovascular features were observed in any of the affected siblings. Similarly, family GD025 had a single index case with no history of consanguinity whose single de novo variant occurred in a gene (PAPS22) that could not be directly linked to the complex pulmonary phenotype. These families could have pathogenic non-coding or structural variants that may be solved by whole genome sequencing.
Discussion
This study represents the pilot phase of the Qatari Mendelian Disease Program, a multi-stakeholder effort to recruit, sequence and identify causative genes for a wide range of monogenic and Mendelian disorders in Qatar. Altogether, we enrolled 48 patients and their families (total of 130 individuals in 30 unique families) and used WES to identify putative genetic causes. We were able to identify candidate diagnostic mutations or partial causes in 25 unique genes in 27 families (90%), 23 (85%) of which harbored novel alleles predicted to be deleterious in genes known to cause Mendelian disease. This proportion of new pathogenic variants reinforces the observation that there is significant allelic heterogeneity underlying Mendelian diseases in Middle Eastern populations, and points to founder mutations relevant to newborn screening programs within this population.
In addition to developing custom Mendelian disease sequencing and variant interpretation capacity for Qatar, we demonstrate several other features that are important for consideration in the design of disease-centric sequencing programs in the Middle East, which could be generalized to other populations that are poorly represented in global databases.
First, while clinical sequencing protocols aim for very high mean exome depth (>120×) (4,24), we observed that both the number of high quality variants and overall exon-level coverage plateau as early as ~75×. Importantly, even at depths as low as 45×, we still observed >96% of variants with a genotype quality above 30. Notably, even in cases where a child and parent were sequenced to highly different depths (in some cases, up to three-fold deeper in the child compared to either parent), we did not observe a significant increase in de novo variants or decrease in Mendelian-consistent recessive variants, both of which would be expected if sequencing depth in parents was insufficient to detect the alternate alleles with high enough confidence. Together, these observations could have implications for multiplexing and for lowering NGS costs.
Second, in addition to depth of sequencing considerations, we also demonstrated the importance of enrolling as many siblings as available. We observed, for example, that index cases may appear to have as many as 10 de novo protein-altering variants when compared only against their parents, requiring significant time and resource investment in validation experiments. However, introduction of a single sibling can reduce that number by more than half, while introduction of two siblings reduced the median number of high quality protein-altering de novo variants to 0.5 per individual, consistent with previous reports (25,26).
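The effect of adding siblings to a trio can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the genotype encoding (alternate-allele counts), the sample labels and the toy calls are all assumptions made for illustration. The sketch treats a call as an apparent de novo variant when the index carries the allele and neither parent does, and then discards any such call that is also seen in a sequenced sibling, on the reasoning that a shared allele was more likely inherited (or is a recurrent artifact) than a true de novo event.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Call:
    """One variant site; genotypes are alternate-allele counts (0, 1 or 2)."""
    name: str
    genotypes: Dict[str, int]  # hypothetical sample label -> allele count


def apparent_de_novo(call: Call) -> bool:
    """True if the index carries the allele and neither parent does."""
    g = call.genotypes
    return g["index"] > 0 and g["father"] == 0 and g["mother"] == 0


def filter_de_novo(calls: Iterable[Call], siblings: Iterable[str] = ()) -> List[Call]:
    """Keep apparent de novo calls that none of the listed siblings carry."""
    sibling_ids = list(siblings)
    return [
        c for c in calls
        if apparent_de_novo(c) and all(c.genotypes[s] == 0 for s in sibling_ids)
    ]


# Toy data (invented for illustration, not taken from the study).
calls = [
    Call("var1", {"index": 1, "father": 0, "mother": 0, "sib1": 0, "sib2": 0}),
    Call("var2", {"index": 1, "father": 0, "mother": 0, "sib1": 1, "sib2": 0}),
    Call("var3", {"index": 1, "father": 0, "mother": 0, "sib1": 0, "sib2": 1}),
    Call("var4", {"index": 1, "father": 1, "mother": 0, "sib1": 0, "sib2": 0}),
]

trio_only = filter_de_novo(calls)
one_sibling = filter_de_novo(calls, siblings=["sib1"])
two_siblings = filter_de_novo(calls, siblings=["sib1", "sib2"])

print(len(trio_only), len(one_sibling), len(two_siblings))
```

In this toy example each added sibling removes one apparent de novo call, mirroring the direction of the effect described above; the magnitude in real data depends on coverage, calling quality and family structure.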
Third, the effect of additional siblings was also significant for prioritization of recessive variants. Paradoxically, though increased consanguinity increases the likelihood of a disease being caused by recessive variants, it also increases the fraction of rare homozygous variants per individual, making the analysis of singleton patients with consanguineous parents more challenging than is generally assumed. In these cases, we demonstrated the benefit of working with Middle Eastern families whose large sizes facilitate the assessment of multiple siblings. The effect of adding siblings to the analysis of recessive variants was even more drastic than for de novo variation. Between 12% and 42% of recessive variants discovered in an index case were shared by a single sibling, and only 1.3% to 11% were shared by two siblings. For families where there were three affected siblings, for example, GD001, the only variant remaining after filtration was the disease-causing variant. Conversely, in settings where siblings do not share the phenotype, the additional siblings can help sort benign family-specific polymorphisms from bona fide disease variants. Thus, by increasing the number of sampled individuals in a family, the ability to pinpoint candidate variants efficiently is significantly enhanced. As the cost of sequencing declines and NGS adoption in clinical practice becomes mainstream, our data support enrollment of siblings to speed up and improve variant prioritization upstream in the analytical pipeline, before the time-consuming manual evaluation of remaining candidate deleterious mutations.
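The segregation logic for recessive variants can be sketched in the same way. The Python example below is a simplified illustration under stated assumptions, not the study's method: it handles only homozygous variants (compound heterozygotes are ignored), assumes both parents are unaffected carriers, and uses invented sample labels and toy genotypes. A candidate must be homozygous in the index, heterozygous in both parents, homozygous in every affected sibling, and not homozygous in any unaffected sibling.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Call:
    """One variant site; genotypes are alternate-allele counts (0, 1 or 2)."""
    name: str
    genotypes: Dict[str, int]  # hypothetical sample label -> allele count


def recessive_candidates(
    calls: Iterable[Call],
    affected_sibs: Iterable[str] = (),
    unaffected_sibs: Iterable[str] = (),
) -> List[Call]:
    """Homozygous-recessive segregation filter (simplified sketch)."""
    affected = list(affected_sibs)
    unaffected = list(unaffected_sibs)
    kept = []
    for call in calls:
        g = call.genotypes
        if g["index"] != 2:
            continue  # index must be homozygous for the alternate allele
        if g["father"] != 1 or g["mother"] != 1:
            continue  # assumes unaffected parents are heterozygous carriers
        if any(g[s] != 2 for s in affected):
            continue  # every affected sibling must share the homozygous genotype
        if any(g[s] == 2 for s in unaffected):
            continue  # an unaffected homozygous sibling argues against causality
        kept.append(call)
    return kept


# Toy data (invented for illustration, not taken from the study).
calls = [
    Call("varA", {"index": 2, "father": 1, "mother": 1, "aff_sib": 2, "unaff_sib": 1}),
    Call("varB", {"index": 2, "father": 1, "mother": 1, "aff_sib": 1, "unaff_sib": 0}),
    Call("varC", {"index": 2, "father": 1, "mother": 1, "aff_sib": 2, "unaff_sib": 2}),
    Call("varD", {"index": 2, "father": 2, "mother": 1, "aff_sib": 2, "unaff_sib": 1}),
]

index_only = recessive_candidates(calls)
with_affected = recessive_candidates(calls, affected_sibs=["aff_sib"])
with_both = recessive_candidates(
    calls, affected_sibs=["aff_sib"], unaffected_sibs=["unaff_sib"]
)

print([c.name for c in index_only], [c.name for c in with_both])
```

Here the affected sibling removes a variant that is not shared, and the unaffected sibling removes a family-specific homozygous polymorphism, which are the two complementary uses of siblings discussed above.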
An important factor in our ability to solve a substantial fraction of cases was the size of families in the Middle East, a feature that may not translate well to different parts of the world where nuclear families are smaller. Nonetheless, in the absence of siblings, we found that using closely matching population controls can also yield substantial benefits for variant prioritization. In this study, we used fewer than 1500 sequenced Qatari controls to eliminate >60% of variants absent or present at very low frequency in all known publicly available databases (>250 000 alleles combined). This effect was observed for variants of all classes, including those that were predicted damaging or affected evolutionarily conserved residues. We previously reported how variant calling against the publicly available reference genome (GRCh37) resulted in erroneous classification of thousands of ‘variants’ that were in fact ‘reference’ in our population due to their high frequency (>50%) (11). This study demonstrates the utility of population-specific databases on the other side of the allelic spectrum, where variants appearing to be rare in public databases can be eliminated from further consideration due to their elevated frequencies within a highly homogeneous subpopulation. Such advantages can be generalized to other population isolates that are poorly represented in global variation databases.
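A two-stage allele frequency filter of this kind might look like the following Python sketch. The 1% rarity threshold, the field names and the frequencies are hypothetical values chosen for illustration and are not the cutoffs or data used in this study. A frequency of `None` stands for a variant absent from the corresponding database ("Novel").

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Candidate:
    name: str
    public_af: Optional[float]      # max frequency in public databases; None = absent
    population_af: Optional[float]  # frequency in matched controls; None = absent


def is_rare(af: Optional[float], threshold: float) -> bool:
    """A variant is rare if it is absent or below the frequency threshold."""
    return af is None or af < threshold


def prioritize(
    candidates: Iterable[Candidate],
    threshold: float = 0.01,  # hypothetical cutoff, an assumption of this sketch
    use_population_controls: bool = True,
) -> List[Candidate]:
    """Keep variants rare in public data and, optionally, in matched controls."""
    kept = []
    for c in candidates:
        if not is_rare(c.public_af, threshold):
            continue
        if use_population_controls and not is_rare(c.population_af, threshold):
            continue
        kept.append(c)
    return kept


# Toy data (invented for illustration, not taken from the study).
candidates = [
    Candidate("v1", None, None),     # absent everywhere
    Candidate("v2", 1.65e-5, None),  # very rare publicly, absent in controls
    Candidate("v3", None, 0.08),     # absent publicly but common in controls
    Candidate("v4", 2.0e-4, 0.12),   # rare publicly but common in controls
    Candidate("v5", 0.05, 0.04),     # common in both
]

public_only = prioritize(candidates, use_population_controls=False)
with_controls = prioritize(candidates)

print(len(public_only), len(with_controls))
```

Variants such as `v3` and `v4` pass the public-database filter but are removed once matched controls are consulted, which is the situation described above for alleles that are enriched in a homogeneous subpopulation.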
Utility of WES for Middle Eastern consanguineous populations
Despite enrolling a wide range of phenotypes, WES was successful at finding candidate genetic variants for 90% of all cases. While such diagnostic rates may appear high relative to international efforts sequencing Mendelian disease (1–4), they are consistent with the range reported for previous Middle Eastern cohorts using NGS (6,7). There are several possible explanations for this. First, by recruiting families with multiple affected individuals, we select for genetic disease vs conditions that may appear genetic but could be environmental, or early post-zygotic (27,28). Second, large pedigrees facilitate enrollment of siblings and parents, reducing the variant search space significantly to home in on causative candidates. Finally, while international consortia include a very large number of ethnicities in their studies, we have a relatively homogeneous population, and our analysis was aided by similarly homogeneous population-specific controls, providing a notable advantage in variant interpretation.
One of the main findings of this study is the significant allelic heterogeneity underlying Mendelian disease in the Middle East (23 families with newly identified deleterious variants in known disease genes). Supporting the pathogenicity of these variants are their rarity in both public and population-matched databases, their predicted alteration of highly conserved residues, their predicted pathogenicity scores, their predicted impact on protein function, and their segregation patterns within each family. Given the long-term genetic isolation of the Qatari population from the rest of the world (29), it is very likely that many similarly pathogenic novel alleles exist in the Arab gene pool but are absent from public databases. In a recent study, we showed that up to 14% of more than 20 million variants called against GRCh37 have become major alleles (allele frequency >50%) in 1376 Qataris (11), and another ~6% of all variants are completely novel to Qataris. Hundreds of these are predicted pathogenic, leaving the population at elevated risk of recessive disease if these alleles meet in consanguineous unions. A more systematic effort of identifying and sequencing patients in the larger healthcare system will be required to truly assess the impact of this elevated rare disease burden on the population. Moreover, new discoveries can be implemented within future newborn and premarital screening programs, allowing public health efforts to prioritize specific conditions for pre-emptive treatment and targeted care.
In addition to identifying candidate genetic causes of monogenic conditions in 90% of families studied in Qatar, this study presents several important considerations for the utility of WES in understudied global populations where consanguinity may be prevalent. First, we show that while increased sequencing depth can return more high-quality variants and ensure that more exons are deeply covered, that effect appears to plateau at ~75×, which could cut costs for future sequencing studies. Second, we demonstrate the significant discriminatory power gained by adding siblings and parents to NGS analysis of both de novo and recessive variants. This may not be possible in parts of the world where nuclear families are small, but where available, sibling enrollment should be encouraged given the increase in efficiency in prioritizing causative variants. Third, even in the absence of first-degree relatives, we show that even a modest number of population-specific control alleles can significantly reduce the number of candidate variants that remain after filtering against databases containing a number of alleles several orders of magnitude greater. Finally, we believe that this approach can be readily adopted by other nationwide efforts attempting to uncover population-specific burden for Mendelian disease and could yield significant headway in discovering novel genes and alleles relevant to global health.
Materials and Methods
Subject recruitment
Thirty families were assessed (Figs 1 and 2), comprising 130 individuals of whom 48 were affected. Patients with Mendelian phenotypes fitting the inclusion criteria of suspected monogenic origin were identified by referral from 11 clinical departments and invited to participate in this study. The range of phenotypes collected varied from isolated disorders to complex syndromes. Available members of each family were enrolled for sequencing with clinical protocols and informed consent approved by the institutional review board at Weill-Cornell Medical College in Qatar and Hamad Medical Corporation.
Whole exome sequencing
Individuals were sequenced to a mean depth of 90.1× using the Illumina HiSeq 2000 system (100 bp PE) with Agilent SureSelect V5 Exome kits. The individual with the least coverage had >98% and >83% of coding bases covered by at least 4 and 20 unique reads, respectively (see sequencing metrics in Supplementary Material, Table S1). Variant files were prepared for each individual separately using the BWA/GATK best practices workflow (30) and then for each family together, and annotated using SeattleSeq v138 (31). Variation data from an additional 1376 Qatari controls (11) and from gnomAD (32), together with additional pathogenicity scores, were added separately to the variant files using custom scripts. For all families, possible inheritance patterns considered included recessive, dominant, de novo, compound heterozygous and X-linked where applicable. All details of variant discovery pipelines, including variant quality control and Sanger validation, are as described in Fakhro et al. (33).
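To illustrate how family genotypes constrain the candidate list under the inheritance patterns listed above, the following is a minimal sketch of a segregation check for two of them (autosomal recessive and de novo). It is not the pipeline used in this study, which is described in Fakhro et al. (33). The function names, the family member labels and the genotype encoding (alternate-allele count of 0, 1 or 2) are assumptions made for illustration, and the rules are simplified to fully penetrant autosomal variants.

```python
# Illustrative sketch only; names and encoding are hypothetical.
# Genotypes are alternate-allele counts: 0 = hom. reference, 1 = het., 2 = hom. alternate.

def fits_recessive(genotypes, affected, parents):
    """Return True if a variant segregates as a simple autosomal recessive allele.

    Assumed rules: every affected member is homozygous alternate, both parents
    are heterozygous carriers, and no other unaffected member is homozygous alternate.
    """
    for member, gt in genotypes.items():
        if member in affected:
            if gt != 2:
                return False
        elif member in parents:
            if gt != 1:
                return False
        elif gt == 2:  # unaffected sibling carrying two copies excludes the variant
            return False
    return True


def fits_de_novo(genotypes, proband):
    """Return True if the variant is heterozygous in the proband and absent elsewhere."""
    if genotypes[proband] != 1:
        return False
    return all(gt == 0 for member, gt in genotypes.items() if member != proband)


if __name__ == "__main__":
    trio = {"proband": 2, "father": 1, "mother": 1}
    family = dict(trio, sibling=2)  # same variant, plus an unaffected sibling
    print(fits_recessive(trio, {"proband"}, {"father", "mother"}))
    print(fits_recessive(family, {"proband"}, {"father", "mother"}))
```

In this toy example the variant passes in the trio but is excluded once an unaffected homozygous sibling is added, which is the kind of extra discrimination that enrolling siblings can provide.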
Variant annotation/prioritization
Each sequenced proband had an average of 47 000 high quality exome variants. Conservation score filters were employed to identify variants predicted to be damaging and to affect highly conserved residues using GERP, PhastCons, SIFT, PolyPhen and CADD as described previously (33). Intergenic and intronic variants (except those predicted to affect splicing) were also removed. Together, these filters eliminated over three quarters of the variants. MAF filters were employed from publicly available databases (dbSNP, ExAC, gnomAD, 1000 Genomes Project and the NHLBI ESP) to keep only rare variants (<1% frequency for recessive alleles and <0.1% frequency for dominant alleles). In order to further eliminate common population-specific polymorphisms, we assessed allele frequency of all variants in more than 2752 chromosomes from our internal database of Qatari controls (n = 1376 individuals) comprising adults (age 30+) without rare disease, whose four grandparents were born in Qatar and whose exomes were sequenced on the same platform (11). Candidate variants surviving filtration were annotated using available literature from human and mouse studies, and concordance between phenotypes of known disease genes and clinical presentation in patients was evaluated by certified medical geneticists.
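The frequency filtering step can be sketched as follows. This is an illustrative example, not the custom scripts used in this study. The thresholds for public databases (<1% recessive, <0.1% dominant) are those stated above; applying the same thresholds to the population-matched controls is an assumption, as are the `Variant` fields and function names.

```python
# Illustrative sketch only; the Variant fields and helper names are hypothetical.
from dataclasses import dataclass

# Thresholds stated in the text for public databases.
MAF_THRESHOLDS = {"recessive": 0.01, "dominant": 0.001}


@dataclass
class Variant:
    variant_id: str           # hypothetical identifier
    public_maf: float         # highest frequency across public databases (assumed field)
    control_alt_count: int    # alternate alleles seen in population-matched controls
    control_chromosomes: int  # chromosomes genotyped in controls (2 per individual)


def control_maf(v):
    """Allele frequency in controls; 0.0 if no chromosomes were genotyped."""
    if v.control_chromosomes == 0:
        return 0.0
    return v.control_alt_count / v.control_chromosomes


def filter_rare(variants, model):
    """Keep variants that are rare in public databases AND in matched controls.

    Assumption: the same threshold is applied to both frequency sources.
    """
    threshold = MAF_THRESHOLDS[model]
    return [v for v in variants
            if v.public_maf < threshold and control_maf(v) < threshold]


if __name__ == "__main__":
    candidates = [
        Variant("v1", 0.0, 0, 2752),       # absent everywhere
        Variant("v2", 0.0005, 138, 2752),  # rare publicly, ~5% in controls
        Variant("v3", 0.005, 2, 2752),     # rare enough for recessive only
    ]
    print([v.variant_id for v in filter_rare(candidates, "recessive")])
    print([v.variant_id for v in filter_rare(candidates, "dominant")])
```

The hypothetical variant `v2` shows why the control set matters: it would pass a public-database filter alone but is removed once its frequency in matched controls is taken into account.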
Acknowledgements
We thank the WCM-Q Genomics Core. We also thank N. Mohamed for help in preparing this manuscript.
Conflict of Interest statement. None declared.
Funding
The Qatar Foundation and the Weill Cornell Medical College in Qatar; and the Qatar National Research Foundation (NPRP 09-741-3-193, NPRP 5-436-3-116, NPRP 7-1425-3-370 and NPRP P8-1913-3-396).