Abstract

Fanconi anemia (FA) is a human recessive genetic disease resulting from inactivating mutations in any of 16 FANC (Fanconi) genes. Individuals with FA are at high risk of developmental abnormalities, early bone marrow failure and leukemia. These are followed in the second and subsequent decades by a very high risk of carcinomas of the head and neck and anogenital region, and a small continuing risk of leukemia. In order to characterize base pair-level disease-associated (DA) and population genetic variation in FANC genes and the segregation of this variation in the human population, we identified 2948 unique FANC gene variants including 493 FA DA variants across 57 240 potential base pair variation sites in the 16 FANC genes. We then analyzed the segregation of this variation in the 7578 subjects included in the Exome Sequencing Project (ESP) and the 1000 Genomes Project (1KGP). There was a remarkably high frequency of FA DA variants in ESP/1KGP subjects: at least 1 FA DA variant was identified in 78.5% (5950 of 7578) of the individuals included in these two studies. Six widely used functional prediction algorithms correctly identified only a third of the known, DA FANC missense variants. We also identified FA DA variants that may be good candidates for different types of mutation-specific therapies. Our results demonstrate the power of direct DNA sequencing to detect, estimate the frequency of and follow the segregation of deleterious genetic variation in human populations.

INTRODUCTION

Fanconi anemia (FA) is an uncommon recessive genetic disease that results from loss of function of any of the 16 genes belonging to the FANC gene family. FA is characterized by an elevated risk of developmental abnormalities, progressive anemia leading to bone marrow failure and an elevated risk of leukemia in the first decade of life (1–4). Bone marrow transplantation can provide an effective cure for marrow failure in FA patients, but does not address associated developmental defects or attenuate the risk of developing carcinomas of the head and neck and anogenital region or the small continuing risk of leukemia (3–9). The proteins encoded by the 16 FANC genes function together in a common DNA damage recognition and repair pathway with other proteins (10–12) (Fig. 1). Of the 16 FANC proteins, 6 were previously identified as part of different DNA damage repair pathways that may interact with or complement the FA pathway: FANCD1/BRCA2, FANCJ/BRIP1, FANCN/PALB2, FANCO/RAD51C, FANCP/SLX4 and FANCQ/XPF/ERCC4 (10–12).

Figure 1.

Model of the Fanconi functional pathway and protein components. The 16 proteins encoded by FANC genes form a DNA damage response network in which DNA damage activates signaling kinases that phosphorylate multiple FA ‘core complex’ proteins and the D2/I proteins. Activated FANCL is the E3 ligase that monoubiquitylates FANCD2/I to recruit and target downstream components including nucleases to DNA damage sites and stalled replication forks. FAPs, additional FANC-associated proteins; P, phosphate; Ub, monoubiquitin. Note: the post-translational modifications (PTMs) shown are illustrative, not an exhaustive list. See (10–12) for additional detail.

We took advantage of the molecular characterization of FANC variation in FA patients, together with genomic sequencing of the FANC genes as part of the 1000 Genomes Project (1KGP) (13) and the Exome Sequencing Project (ESP) (14,15) to systematically compare all base pair-level, disease-associated (DA) FANC gene variation with variation identified in the 7578 individuals included in these sequencing projects. Our goals were to determine how the spectrum (types and sites) of FANC gene variation differed between individuals affected with FA versus population-derived variation data; to identify new, potentially deleterious FANC variants in population sequencing data; to quantify how well functional prediction algorithms correctly identified known, DA FANC missense variants and to provide an estimate of the carrier frequency of deleterious FANC variants by direct variant counting in population data. A subsidiary goal was to identify subsets of FA DA variants that may be good candidates for different types of mutation-specific therapies.

RESULTS

Molecular spectrum of FANC variation

We identified 2948 unique small base pair-level variants among the 16 FANC genes. These included 493 DA or disease-causative (DC) variants listed in the Fanconi Anemia Mutation Database (FAMutdB), and 2508 genomic variants from ESP and 1KGP data (Table 1). We compiled and maintained two separate databases, one containing only unique variants, and a second that included data on variant frequency. For the following analyses we used only unique variants. We identified 1753 additional unique, base pair-level variants in FANCD1/BRCA2 using data in the Breast Cancer Information Core (BIC) (16). These variants showed extensive overlap in other mutation databases, and a majority had no defined association with FA. Thus, we excluded BIC-derived variants from subsequent analyses apart from one comparison mentioned below. BIC data were accessed on 30 May 2013.

Table 1.

FANC gene variants by type in FAMutdB and ESP/1KGP sequencing data

Gene^a FAMutdB^b (n = 493) ESP/1KGP^b (n = 2508)
1 bp subs ±1 bp indel >1 bp simple >1 bp complex 1 bp subs ±1 bp indel >1 bp simple >1 bp complex 
FANCL 51 
FANCD2 19 169 
FANCE 16 64 
FANCG 18 11 61 
FANCC 27 73 
FANCF 56 
BRCA2 17 373 10 
FANCM 249 
FANCI 14 175 
FANCP 359 
ERCC4 134 
PALB2 153 
FANCA 154 41 59 292 
RAD51C 43 
BRIP1 127 
FANCB 64 
Total (%) 295 (59.8%) 86 (17.4%) 106 (21.5%) 6 (1.2%) 2443 (97.4%) 32 (1.3%) 33 (1.3%) 0 (0) 

^a FANC genes are ordered according to their locations in the human genome beginning with chromosome 1. Several have well-known common names, e.g. BRCA2=FANCD1; ERCC4=XPF=FANCQ; PALB2=FANCN; RAD51C=FANCO and BRIP1=BACH1=FANCJ.

^b DA genetic variation was downloaded from the FAMutdB (http://www.rockefeller.edu/fanconi/). Human genomic variation data were obtained from the 1KGP (http://www.1000genomes.org) and ESP data archives. Single base substitutions in ESP were identified in Fu et al. (9), whereas insertion/deletion and >1 bp variant data were downloaded from the Exome Variant Server (http://evs.gs.washington.edu/EVS/). The full data set detailing the above variants by FANC gene and source is available upon request.

The molecular type distribution of FANC variants was heavily biased in favor of base substitutions: 2689 (91.2%) single base substitutions were observed, together with 114 (3.9%) single base insertions/deletions, 139 (4.7%) simple >1 bp variants and 6 (0.2%) complex >1 bp variants (Table 1). In ESP/1KGP data, 97.4% of all FANC variants were single base substitutions. In contrast, only 59.8% of variants in the DA FAMutdB were base substitutions. This skewing likely reflects the strong enrichment in the FAMutdB data for variants that lead to loss of function. The molecular type distribution of variation in the X-linked FANCB gene did not differ from that of the 15 autosomal FANC genes (Fisher's exact test, P = 0.20 and 0.82 for data from FAMutdB and ESP/1KGP, respectively). Figure 2 provides a visual representation of the broad distribution of base pair-level variation by type across the open reading frames of the three FANC genes mutated in a majority of FA patients: FANCA, FANCC and FANCG.
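The percentages quoted above can be rechecked directly from the Total row of Table 1. The short sketch below does only that arithmetic; the variable and function names are illustrative and not part of the original analysis.

```python
# Totals from the last row of Table 1, in column order:
# 1 bp substitution, ±1 bp indel, >1 bp simple, >1 bp complex.
FAMUTDB = [295, 86, 106, 6]
ESP_1KGP = [2443, 32, 33, 0]
UNIQUE_TOTAL = 2948  # unique variants across both sources (from the text)


def percentages(counts):
    """Return each count as a percentage of the column total, rounded to 0.1."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


fa_pct = percentages(FAMUTDB)
pop_pct = percentages(ESP_1KGP)

# Variants present in both sources are counted once in the unique total.
shared = sum(FAMUTDB) + sum(ESP_1KGP) - UNIQUE_TOTAL

print("FAMutdB:", sum(FAMUTDB), fa_pct)
print("ESP/1KGP:", sum(ESP_1KGP), pop_pct)
print("shared between sources:", shared)
```

The difference between the summed source totals and the 2948 unique variants corresponds to variants found in both FAMutdB and ESP/1KGP data.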

Figure 2.

Landscape of base pair-level genetic variation in FANCA/C/G. (A) Biallelic deleterious variants in FANCA, FANCC or FANCG have been identified in 67% of the FA patients included in FAMutdB data (see Table 1). The open reading frame of each gene is represented by the central rectangle, where vertical lines indicate exon boundaries. Amino acid residue and nucleotide counts for open reading frames are indicated, respectively, above and below the horizontal line. The locations and molecular variant type of DA (filled symbols) and ESP/1KGP sequencing project genomic variants (open symbols) are indicated by circles (base substitutions), upright or inverted triangles (indels) or diamonds (for more complex mutations; see text for additional detail). (B) Variant landscape key for symbols indicating different molecular types of FA DA variation (filled symbols) and ESP/1KGP genomic variants by molecular type (open symbols).

We compared variants in FANCD1/BRCA2 using data from the FAMutdB and from the BIC in order to determine whether there were distinguishing molecular characteristics of variation that was FA-associated as opposed to breast- or ovarian cancer-associated. Inherited deleterious variants in both copies of the FANCD1/BRCA2 gene give rise to a strongly penetrant early-onset FA phenotype dominated by features of FA in conjunction with a high risk of leukemia, brain tumors (most often medulloblastoma) and Wilms’ tumor. This is in contrast to single inherited deleterious FANCD1/BRCA2 variants that confer a strongly elevated lifetime risk of breast and ovarian cancer (17–22). FA-associated FANCD1/BRCA2 gene variants were distributed similarly to those found in the BIC and in the same proportions, with a small difference in the fraction falling in exon 11 in FA patients (40%, n = 14) versus ESP/1K individuals (50%, n = 196). Of note, only two variants found in the FAMutdB were not already present in BIC data (results not shown). Thus, FA-associated FANCD1/BRCA2 variants did not differ significantly in location, type or predicted severity from those identified in breast/ovarian cancer patients. Rather, it is the inheritance of two deleterious FANCD1/BRCA2 alleles that confers a strongly penetrant, early FA phenotype.

Mutability and mutation hotspot analysis

FANCA, FANCC and FANCG are the most commonly mutated genes in FA patients, accounting for 67.7% of unique variants contained in the FAMutdB. These three genes constitute 14.6% of the total open reading frame and splice junction sequences that define the 16 FANC genes. We thus asked whether these genes were disproportionately mutable or harbored mutation hotspots. In order to do this we used a recently developed probabilistic model (23) to confirm an unusually high proportion of FA DA variants concentrated in these three genes (χ2-test, P = 1.05 × 10^−160 for FANCA, P = 2.43 × 10^−11 for FANCC and P = 3.73 × 10^−17 for FANCG) (Fig. 3A) despite their containing only 17.5% of genomic variation among the FANC genes in a combination of ESP and 1KGP data. Thus despite their importance in FA, FANCA, FANCC and FANCG do not appear to be intrinsically more mutation-prone than the other 13 FANC genes (Fig. 3B).

Figure 3.

FANC gene mutation hotspot, gene and type-specific analyses. (A) Mutation hotspots containing significantly more mutations than predicted were identified in all 16 FANC genes using ESP/1KGP variant data and a mutation hotspot window of 5 bp (see text for additional detail). We identified 858 potential mutation hotspots using genomic data for each gene, listed left to right along a horizontal axis in their chromosomal location order and color-coded. The footnote in Table 1 cross-correlates FANC gene designations with alternative gene names. The location of each hotspot (identified by a χ2-test P < 0.05 threshold depicted by the dotted line) is indicated by gene-specific, color-coded dots with hotspot significance indicated by the –log10 (P) value. FA DA variants in each hotspot window were then plotted below the horizontal line with the number of different variants/hotspot indicated. No significant difference was observed for FA DA variants located in or outside of mutation hotspots (Mann–Whitney rank sum test P = 0.44). (B) The comparison of the observed variants in ESP/1KGP and FAMutdB with the number expected for each gene, respectively, by the χ2-test. Significant results (P < 0.001) are marked by an asterisk (‘*’), with significantly overrepresented results in red and underrepresented in blue. FANC genes are again ordered according to their locations in the human genome. (C) Comparison of the number of observed and expected variants in ESP/1KGP (above) and FAMutdB variants (below the horizontal axis) as a function of predicted functional consequence (i.e. synonymous, missense, nonsense, splice and frameshift). The χ2-test was used to compare the observed number with the expected number. Significant results (P < 0.001) are marked by an asterisk (‘*’), with significantly overrepresented results in red and underrepresented in blue.

We also searched for mutation hotspots in the FANC genes, beginning with genomic data and using a probabilistic model. We defined mutation hotspots using a 5 bp window to identify gene regions that were significantly enriched for variation compared with expectation (χ2-test, P < 0.05). The rationale for using a 5 bp window was that windows <5 bp might exclude several types of known hotspots consisting of mononucleotide runs or repeats, whereas much larger windows of, e.g. 20 or 50 bp, might reduce the sensitivity of hotspot detection by averaging signal over a larger number of nucleotides. A total of 858 putative mutation hotspots were identified by these criteria in ESP/1KGP data (Fig. 3A). Nearly half (46.2%) of population variants were located in these mutation hotspots despite hotspots representing only 7.4% of the total mutable sequence, a hotspot enrichment for variation of 6.24-fold.
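The window-based hotspot test can be illustrated with a simplified sketch. It assumes non-overlapping windows and a uniform per-base expectation, which is a stand-in for the probabilistic model cited above and not a reimplementation of it; the function names and the toy variant positions are hypothetical.

```python
import math


def chi2_pvalue_1df(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def find_hotspots(variant_positions, seq_length, window=5, alpha=0.05):
    """Flag non-overlapping windows that hold more variants than expected.

    Assumption: variants are expected to fall uniformly along the sequence.
    Returns a list of (window_start, observed_count, p_value) tuples.
    """
    n_variants = len(variant_positions)
    expected_in = n_variants * window / seq_length
    expected_out = n_variants - expected_in
    hotspots = []
    for start in range(0, seq_length - window + 1, window):
        observed_in = sum(start <= p < start + window for p in variant_positions)
        if observed_in <= expected_in:
            continue  # only enrichment qualifies as a hotspot
        observed_out = n_variants - observed_in
        # Goodness-of-fit over two cells: inside versus outside the window.
        stat = ((observed_in - expected_in) ** 2 / expected_in
                + (observed_out - expected_out) ** 2 / expected_out)
        p_value = chi2_pvalue_1df(stat)
        if p_value < alpha:
            hotspots.append((start, observed_in, p_value))
    return hotspots


# Toy example: four variants clustered at positions 10-13 of a 100 bp sequence.
hits = find_hotspots([10, 11, 12, 13, 40, 77], seq_length=100)
print(hits)

# Fold enrichment reported in the text: share of variants / share of sequence.
fold_enrichment = 46.2 / 7.4
print(round(fold_enrichment, 2))
```

With expected counts this small the chi-square approximation is rough, so the sketch is meant to show the shape of the calculation and not to reproduce the reported hotspot count.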

In order to gain additional insight into the mechanistic origins of FANC variation, we determined whether sequence features of hotspot regions might be promoting variation or the generation of specific molecular types of variant. This analysis was performed by looking for the overrepresentation of known mutable sequences including tandem repeats (identified by Tandem Repeat Finder) and methylated cytosine-guanine (CpG) dinucleotides (identified across 82 cell types from ENCODE) within or bridging hotspot boundaries. CpG sites were slightly enriched in mutation hotspots using a hotspot window of 5 bp, compared with non-hotspots [Fisher's exact test, P = 0.09, odds ratio (OR) 1.562 (0.952–2.562)]. In contrast, methylated CpG sites were significantly enriched in mutation hotspots [Fisher's exact test, P = 0.0009, OR = 2.945 (1.639–5.292)]. When we defined mutation hotspots by using a window of size 10 or 20 bp, methylated CpG sites were hotspot-enriched, though not significantly [Fisher's exact test, P > 0.05, OR = 1.649 (0.709–3.835) for 10 bp hotspots, and OR = 1.321 (0.472–3.694) for 20 bp hotspots]. We observed no significant association of hotspots with tandem repeats using hotspot windows of 5, 10 or 20 bp (Fisher's exact test, P > 0.05). FA DA variants identified in the FAMutdB were not enriched in mutation hotspots (5 bp window) when compared with non-hotspots (Mann–Whitney test, P = 0.44). This lack of significant enrichment was also observed when using hotspot windows of 10 or 20 bp (Mann–Whitney test, P > 0.05).
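Each enrichment test in this paragraph is a 2 × 2 comparison (hotspot versus non-hotspot, feature present versus absent). A minimal sketch of a two-sided Fisher's exact test and the sample odds ratio follows. The counts are hypothetical because the underlying contingency tables are not given here, the function name is illustrative, and confidence intervals for the odds ratio are not computed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P value). The P value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: rows = hotspot / non-hotspot windows,
# columns = window contains a methylated CpG / does not.
odds_ratio, p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(odds_ratio, round(p_value, 4))
```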

We also visually inspected the DNA sequence of the 16 ‘hottest’ FANC mutation hotspots, as defined by their –log10(P) value (Fig. 3A), to search for additional clues to mutation mechanism. These hotspots were located in seven different FANC genes and contained 56 different variants. All hotspot-associated variants were in an 11 bp window that contained the initially identified 5 bp hotspot sequence and flanking DNA defined by the 5′-most and 3′-most mutated base pairs across all 16 hotspots. Hotspot variants were predominantly (93%) base substitutions, with a strong bias in favor of transitions (39 transitions versus 13 transversions). We observed 2 single base deletions at these 16 hotspots, together with 1 and 10 bp insertions. The only conspicuous DNA sequence motif, found in 4 of the hotspot-associated 11 bp windows, was a mononucleotide run of 4, 5 or 7 G:C base pairs that itself contained from 1 to 4 variants/hotspot. Several of the hotspot base substitutions and one of the single base deletions may thus have arisen by DNA strand slippage with or without base insertion during DNA replication.

Previous studies have observed an increase in base miscalls and other errors in GC-rich regions sequenced using next-generation sequencing (NGS) technologies (24). We therefore determined whether 91 reported motifs that induce NGS sequencing errors (25) were present or enriched in our hotspot-associated 11 bp windows. None of these 91 motifs was identified in any of our 16 hottest mutation hotspot sequences, although there was a modest enrichment of error-inducing sequence motifs across all mutation hotspots [Fisher's exact test, P = 0.03, OR = 1.87 (1.08–3.23)]. Thus variants in ESP/1KGP data that fell in our 16 hottest mutation hotspots are unlikely to be sequencing miscalls. Of note, the subset of FA-associated variants identified in these hotspots was originally identified and confirmed by capillary, as opposed to NGS, sequencing.

Predicted variant functional consequences

The distribution of variants by predicted consequence in ESP/1KGP data was 66.0% missense and 29.1% synonymous amino acid substitutions, together with 4.9% for three other functional categories (nonsense, splice-altering changes and insertion/deletions causing a frameshift with premature chain termination). The corresponding distribution for FA DA variation was 33.3% missense, 2.6% synonymous, 21.9% nonsense, 2.0% splice-altering and 40.2% frameshifts. Variants that truncated a FANC protein (e.g. variants leading to frameshifts and chain termination) were significantly enriched in the FAMutdB database (Fig. 3C). We found 53 (10.7%) FA-associated variants in ESP/1KGP population sequencing data: 49 in ESP, and 28 in 1KGP data. There were six nucleotide positions at which we observed different polymorphisms in both FAMutdB and in ESP/1KGP data that led to the same predicted protein changes. There were also 14 different nucleotide positions at which we observed both disease-associated FAMutdB and ESP/1KGP variants. These positions in FAMutdB data contained nine variants predicted to lead to one missense substitution, four nonsense codons leading to chain termination, and nine frameshifts leading to chain termination. In ESP/1KGP data, variation at these positions was predicted to generate 11 different missense and 3 synonymous amino acid substitutions. A closer look at insertion–deletion (indel) variants revealed that 99% caused disruption of the open reading frame either by a direct reading frame shift or by splice disruption; <1% resulted in the addition or removal of a codon while preserving the open reading frame. Thus virtually all of these FA-associated variants are predicted to disrupt FANC protein structure and/or function. Finally, the predicted consequence of variants identified in the X-linked FANCB gene did not differ from variants observed in the other 15 autosomal FANC genes (Fisher's exact test, P = 0.52 and 0.17 for data from FAMutdB and ESP/1KGP, respectively).

In order to determine what fraction of all possible FANC gene single base variants were likely to be deleterious, we first generated a list of all possible single base variants across the 16 FANC genes and then imputed the subset giving rise to 123 073 potential missense substitutions. We then determined whether these potential missense substitutions were likely to be deleterious based on a majority rule using six different, widely used algorithms: SIFT (26,27), PolyPhen2 (28), a likelihood ratio test (LRT) (29), Mutation Taster (30), GERP++ (31) and PhyloP (32). Missense substitutions were considered deleterious if called by four or more of the six prediction algorithms. Using this approach, 26.9% (33 148 of 123 073) of potential missense substitutions were annotated as deleterious. We used the same approach to annotate all of our FA DA or genomic variants leading to missense substitutions. As expected, FA-associated variants leading to missense substitutions were more frequently predicted to be deleterious (43.3%, 71/164) than ESP/1KGP missense variants (23.6%, 391/1655) (Fig. 4).
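The majority rule described above can be sketched as a simple vote count. The predictor labels and the dictionary layout are illustrative choices, and treating a missing call as "not deleterious" is an assumption, since the handling of missing predictions is not stated here.

```python
# Labels for the six predictors named in the text (spelling is illustrative).
PREDICTORS = ("SIFT", "PolyPhen2", "LRT", "MutationTaster", "GERP++", "PhyloP")


def is_predicted_deleterious(calls, min_votes=4):
    """Apply the majority rule: deleterious if >= min_votes predictors agree.

    `calls` maps predictor name -> True when that predictor calls the variant
    deleterious. Assumption: a missing predictor counts as a non-deleterious vote.
    """
    votes = sum(1 for name in PREDICTORS if calls.get(name, False))
    return votes >= min_votes


# Hypothetical variants with four and three deleterious calls respectively.
four_votes = {"SIFT": True, "PolyPhen2": True, "LRT": True, "GERP++": True}
three_votes = {"SIFT": True, "PolyPhen2": True, "PhyloP": True}
print(is_predicted_deleterious(four_votes), is_predicted_deleterious(three_votes))

# Fractions reported in the text, recomputed from the stated counts.
pct_all_possible = round(100 * 33148 / 123073, 1)
pct_fa_missense = round(100 * 71 / 164, 1)
pct_population_missense = round(100 * 391 / 1655, 1)
print(pct_all_possible, pct_fa_missense, pct_population_missense)
```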

Figure 4.

Predicted deleterious calls for FA-associated, genomic and all possible missense FANC gene variants. ‘Predicted deleterious’ calls for all FA DA missense variants, all missense variants identified in ESP/1KGP data and all possible FANC missense variants were made using six different prediction algorithms and a majority rule (see text for detail). Missense variants were ‘predicted deleterious’ when called by four or more of the six metrics. Variants grouped by predicted deleteriousness that were significantly overrepresented are indicated by a plus sign (‘+’), and those significantly underrepresented by an asterisk (*) using a cutoff (P < 0.05). FA DA missense variants were significantly right-shifted, whereas ESP/1KGP and all possible missense variants were left-shifted in terms of predicted deleteriousness.

Potential for mutation-specific functional rescue

A subset of FA DA variants might be candidates for therapeutic exon skipping, or stop codon read-through to restore the open reading frame (33–39). In order to assess mutation-specific therapeutic potential, we determined the proportion of ‘skippable’ exons in the 16 FANC genes and the number of chain-terminating nonsense codons that were well positioned to facilitate premature termination codon (PTC) suppression or ‘read-through’. The 16 FANC genes collectively consist of 307 exons, 104 of which (34%) could be skipped without disrupting the downstream open reading frame. These individually ‘skippable’ FANC exons collectively contained 32% of all reported FA DA variants. We identified 2702 base substitutions in FAMutdB and ESP/1KGP data, of which 144 (5.3%) created a new nonsense codon within a FANC gene open reading frame. Among FA DA variants, 108 base substitutions (21.9%) created a new nonsense codon. Ten of these (9.3% of DA nonsense variants) were located within 50 nucleotides of the terminal intron–exon junction and thus might be good candidates for PTC suppression or read-through (36–39).
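The two screening criteria in this paragraph can be sketched as follows. The sketch assumes that an exon is 'skippable' when its length is a multiple of three, and that 'within 50 nucleotides' means absolute distance in coding coordinates; both are simplifying assumptions, and the function names, exon lengths and positions are hypothetical.

```python
def is_skippable(exon_length):
    """An exon can be removed without shifting the downstream reading frame
    when its length is a multiple of three (simplifying assumption: no stop
    codon is created at the new exon-exon junction)."""
    return exon_length % 3 == 0


def near_terminal_junction(ptc_position, junction_position, max_distance=50):
    """True when a premature termination codon lies within max_distance
    nucleotides of the terminal intron-exon junction (coding coordinates)."""
    return abs(junction_position - ptc_position) <= max_distance


# Hypothetical exon lengths for one gene.
exon_lengths = [120, 100, 93, 145]
skippable = [length for length in exon_lengths if is_skippable(length)]
print(skippable)

# Hypothetical PTC positions relative to a terminal junction at position 1500.
print(near_terminal_junction(1480, 1500), near_terminal_junction(1300, 1500))

# Proportions reported in the text, recomputed from the stated counts.
pct_skippable_exons = round(100 * 104 / 307)
pct_ptc_candidates = round(100 * 10 / 108, 1)
print(pct_skippable_exons, pct_ptc_candidates)
```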

Population-based variant and pathway analysis

In order to determine the prevalence of potential disease-causing FANC variants in data from the ESP/1KGP projects, we used two sets of FA-associated variants: all FA DA variants in the FAMutdB except for a single, high frequency polymorphic variant; and a more stringently defined variant subset with a very high likelihood of being deleterious by virtue of generating null alleles (disease-causing or DC variants). Among 493 variants identified in the FAMutdB, 387 were predicted to be disease-causing (78.5%). Of note, among the clinically ascertained DA variants also identified in ESP/1KGP data, only 39.6% (21/53) were predicted to be DC. In order to examine the segregation of these DA and DC FANC variants in ESP and 1KGP data, we determined the number of individuals who carried zero, one or more than one DA or DC variant in one or more FANC genes. We also searched for potential FA patients, defined by their carrying two deleterious variants in a single FANC gene. The proportion of ESP/1KGP individuals carrying at least one DA allele was remarkably high: 78.5% (5950/7578; Table 2A). This frequency was 18.4-fold higher than the frequency of individuals carrying at least one variant from the more restrictive subset of FA DC variants.

Table 2.

Distribution of FA disease variants across the 16 FANC genes in ESP/1KGP populations

Variants/genes^a ESP_AA ESP_EA 1KGP_AFR 1KGP_EUR 1KGP_ASI 1KGP_AME
Obs. Exp. P Obs. Exp. P Obs. Exp. P Obs. Exp. P Obs. Exp. P Obs. Exp. P 
(A) Individuals who carry at least one derived allele DA or DC variant 
 DA variants^a 
  0,0 209 211.3 0.87 1185 1192.3 0.80 13 14.5 0.67 112 102.3 0.26 32 42.0 0.09 77 79.0 0.76 
  S,S 631 617.9 0.53 2141 2073.4 0.03 56 57.4 0.84 172 172.7 0.94 109 100.3 0.28 77 69.3 0.24 
  M,S 222 220.1 0.89 216 304.7 10−7 25 22.5 0.58 27 22.7 0.36 25 30.7 0.28 7.7 0.80 
  M,M 1155 1167.7 0.59 756 727.6 0.25 137 136.6 0.95 68 81.3 0.10 113 106.0 0.39 13 18.0 0.21 
 DC variants 
  0,0 2019 2024.1 0.70 4208 4207.3 0.94 213 213.5 0.90 364 364.3 0.94 279 279.0 NA 171 170.9 0.96 
  S,S 198 192.7 0.69 90 89.8 0.99 18 17.5 0.90 15 14.5 0.90 0.0 NA 3.1 0.96 
  M,S 0.0 NA 0.0 NA 0.0 NA 0.0 0.85 0.0 NA 0.0 NA 
  M,M 0.2 0.64 0.9 0.35 0.0 NA 0.2 0.70 0.0 NA 0.0 NA 
(B) Individuals who carry homozygous derived allele DA or DC variants 
 DA variants^a 
  0,0 1821 1800.6 0.27 3782 3776.2 0.79 168 165.5 0.72 338 336.3 0.78 245 244.5 0.93 164 164.2 0.96 
  S,S 357 392.0 0.05 512 516.2 0.84 56 60.3 0.52 38 41.7 0.54 33 33.1 0.98 10 9.8 0.96 
  M,S 18 7.3 8 × 10−5 2.6 0.72 1.4 1 × 10−4 0.1 2 × 10−26 0.6 0.43 0.0 NA 
  M,M 21 17.1 0.34 3.0 0.56 3.8 0.15 0.9 0.34 0.7 0.72 0.0 NA 
 DC variants 
  0,0 2214 2214.8 0.60 4298 4298.0 NA 231 231.0 NA 379 379.0 NA 279 279.0 NA 174 174.0 NA 
  S,S 2.2 0.60 0.0 NA 0.0 NA 0.0 NA 0.0 NA 0.0 NA 
  M,S 0.0 NA 0.0 NA 0.0 NA 0.0 NA 0.0 NA 0.0 NA 
  M,M 0.0 NA 0.0 NA 0.0 NA 0.0 NA 0.0 NA 0.0 NA 

The observed (Obs.) and expected (Exp.) numbers of individuals who carry none, one or more than one DA or DC variant were categorized into four groups: (0,0), (S,S), (M,S) and (M,M) individuals carried, respectively, no variants in any FANC gene; 1 variant in one gene; >1 variant in one gene; or >1 variant in multiple genes. The observed distribution was compared with the expected distribution from 10 000 simulations, with significant differences (P < 0.05) marked in bold.

^a This analysis was conducted using 52 DA variants excluding one major derived allele, or a more restrictive set of 21 DC variants (see text for additional detail).

In ESP data, 8.9% of African Americans carried one DC allele. We also identified three potential FA patients who carried the same FANCC variant allele in homozygous form at position chr9: 97 934 359 (Table 2B). This variant had a derived allele frequency (DAF) of 0.045 in African American ESP data and was observed in heterozygous form in European American ESP data (two occurrences, DAF = 0.0002), in African 1KGP data (18 occurrences, DAF = 0.036) and once in American 1KGP data (DAF = 0.003). Among European Americans in ESP data, 2.1% carried one DC allele, although no homozygous individual was identified (Table 2). Among 1KGP data, seven DC variants were identified in 36 individuals: 18 African-ancestry individuals, 15 European-ancestry individuals and 3 American-ancestry individuals; no homozygous potential FA patients were identified. In aggregate, 4.3% (324 of 7578) of all individuals represented in the ESP/1KGP populations carried at least one DC allele, and we identified three potential FA patients in the combined ESP/1KGP population, a frequency of 4 × 10−4 (3 of 7578). These estimates of the frequency of individuals carrying ≥1 FA DA or DC variant are substantially higher than previous estimates of carrier frequency that have ranged from 0.55 to 1.08%, depending upon the reference population and method used for estimation (2,40).

We also asked how prevalent partial loss of FA pathway function might be by searching for DA or DC alleles in two or more FANC genes in single individuals. Among ESP/1KGP individuals, 29.6% (2242/7578) carried a DA variant in >1 FANC gene, though none carried ≥2 DC variants in different FANC genes (Table 2). These predicted frequencies did not differ from expectation (Table 2). Thus partial loss of FA pathway function (Fig. 1), driven by deleterious variation in ≥2 FANC genes, does not appear to be subject to negative selection.
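
The aggregate frequencies quoted above follow directly from the reported counts. The short Python check below reproduces that arithmetic; only the counts come from the text, while the variable and function names are chosen here for illustration.

```python
# Cohort size: 4298 + 2217 ESP subjects plus 1063 unrelated 1KGP subjects.
TOTAL = 4298 + 2217 + 1063

def fraction(count, total=TOTAL):
    """Proportion of the combined ESP/1KGP cohort represented by count."""
    return count / total

dc_carriers = fraction(324)        # individuals with at least one DC allele
multi_gene_da = fraction(2242)     # individuals with DA variants in >1 FANC gene
potential_patients = fraction(3)   # homozygotes for a DC variant

print(f"{dc_carriers:.1%} {multi_gene_da:.1%} {potential_patients:.0e}")
```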

DISCUSSION

This analysis focused on characterizing base pair-level genetic variation in the 16 genes of the FA gene family (FANC). We used the well-documented, clinically ascertained collection of 493 unique FANC variants from the FAMutdB (41), together with genetic variation data in the 16 FANC genes identified in 7578 individuals in the ESP and 1KGP genomic sequencing projects (13,15,42). These sources together included 493 variants ascertained in FA patients or family members, and 2508 variants identified in individuals included in the ESP and 1KGP genomic sequencing projects. Insertions and deletions leading to frameshifts predominated in FA patients, whereas single base substitutions represented the majority of variants (97.4%) in ESP/1KGP data (Table 1). In several of the FANC genes, a substantial proportion of the variants identified in FA patients are larger variants, including deletions and duplications (43,44). It should be possible to systematically compare FA DA large-scale variation with population variation data once larger numbers of these large-scale DA and population variants are defined at the molecular level (45).

We found 10.7% of clinically ascertained FA DA variants in population sequencing data (49 variants in ESP, and 28 variants in 1KGP data), together with a surprisingly high proportion of ESP/1KGP individuals who carried a known FA DA variant (78.5%, or 5950/7578; Table 2A). This is a substantially higher frequency of deleterious genetic variation in the FANC gene family than suggested by previous estimates of carrier frequency developed from FA disease prevalence data (2,40). This seeming discrepancy is likely accounted for by our use of a sequence-based approach, which enumerated variation across all 16 FANC genes and included all documented base pair-level DA variants. Neither of these variables is typically fully accounted for in carrier frequency estimates based on disease prevalence. Thus our results are in accord with and extend recent analyses showing that individuals may carry hundreds of potentially deleterious variants (46,47). No readily identifiable heterozygote phenotype has been identified for DA FANC variants (2–4,7) (however, see (48) for one intriguing clue). The high proportion of individuals in ESP/1KGP data who carry DA variants in ≥2 FANC genes raises the possibility of pathway functional haploinsufficiency. This might be clinically penetrant in specific genetic backgrounds or in response to unusual environmental exposures, e.g. to aldehydes or DNA cross-linking drugs or agents.

Despite the high carrier frequency for DA FANC variants, we identified only three potential FA patients in ESP/1KGP populations, a frequency of 4 × 10−4 (3 of 7578). This is likely an underestimate of the frequency of potential FA patients, as we used only the most stringently defined set of FA DC variants and our genomic data were not haplotype resolved. The FA clinical phenotype shows a wide range of penetrance, with an increasing number of individuals being recognized to have FA only in adulthood on the basis of hematologic complications or neoplasia (2–4). These ‘atypical’ FA patients may be of high interest to examine further to identify additional genetic or environmental modifiers of FA disease penetrance.

When we determined how well functional prediction algorithms correctly identified known, FA DA missense variants, even a consensus approach using six widely employed functional prediction algorithms correctly identified fewer than half of FA DA variants. Fully one-third of FANC DA variants had low or no ‘predicted deleterious’ scores (scores of 0–2; Fig. 4). These results emphasize the continuing need to couple computational and experimental approaches to determine the functional consequences of potential DA variants and variants of unknown functional significance (VUSs).

Our analysis identified a large subset of FA-associated variants that are good candidates for mutation-specific therapies. Fully a third (104, or 34%) of all FANC exons are potentially ‘skippable’, and these exons contain 32% of all FA DA variants. Exon skipping as a therapeutic strategy has been most extensively explored in Duchenne muscular dystrophy (33–35). Among DA variants, 108 (21.9%) create new nonsense codons. Ten of these (9.3%) are within 50 bp of the terminal intron–exon junction, and thus good candidates for PTC suppression or read-through therapies (36–39): efficacy depends on stop codon identity, location and sequence context (49). Two prevalent FANC DA variants may be especially good candidates for PTC therapy: both create new stop codons in the terminal exon of FANCC and one, generated by a FANCC c.1897 C>T base substitution, is present in 17% of the FANCC complementation group (or ∼2% of all) FA patients (41). FA DA missense variants might also be good candidates for functional rescue or complementation by drugs or small molecules. This strategy, which depends on the biochemical consequences of specific variants, has been successful in another recessive disease, cystic fibrosis (50–57).

In summary, we have defined base pair-level genetic variation including DA variation in FA patients and in 7578 individuals included in the ESP and 1KGP genomic sequencing projects at 57 240 potential mutation sites across all 16 FANC genes. We identified a remarkably high frequency of FA DA variants in population data despite the rarity of FA, have provided clues to the mechanistic origins of FANC genetic variation and have identified the subset of FA DA variants that may be amenable to mutation-specific therapies. Our analytical approach is general and thus can be used to gain insight into the nature and functional consequences of genetic variation in other recessive DNA damage response/repair syndromes as well as in many other single gene or multi-gene disorders.

MATERIALS AND METHODS

Variation data sources

The principal source for FANC gene DA variation was the FAMutdB (http://www.rockefeller.edu/fanconi/) maintained by Dr Arleen Auerbach and colleagues at the Rockefeller University (41). The FAMutdB catalogs FA patient-associated genetic variation, and the number of times each unique variant has been reported in the context of FA. Human genomic variation data were obtained from 1KGP (13) and ESP (14,15) data archives. 1KGP Phase I data include 1092 individuals sampled from 14 populations drawn from Europe, East Asia, sub-Saharan Africa and America. This data set aimed to obtain genetic variation data that would be broadly representative for the vast majority of individuals within a continent, though with no associated, publicly available phenotype or clinical information (13). ESP data were generated as part of the NHLBI-funded GO (grand opportunity) ESP. ESP samples were drawn from several well-phenotyped populations (i.e. 4298 European Americans and 2217 African Americans) and sequenced with the goal of discovering new genes and mechanisms contributing to heart, lung and blood disorders (15). All study participants provided written informed consent for the use of their DNA in studies aimed at identifying genetic risk variants for disease and for broad data sharing.

Additional data on the FANCD1/BRCA2 gene were downloaded from the NHGRI Breast and Ovarian Cancer Mutation Database (BIC) (16). We also included additional variation data identified by PubMed and Google Scholar searches that were not captured in the FAMutdB. The final data capture using database sources and publication searches was completed on 1 November 2013.

Molecular spectrum of FANC variation

Variants were categorized into four molecular types: single base substitutions; single base insertions or deletions; and variants that involved >1 bp and were either ‘simple’ (e.g. insertion or deletion of multiple base pairs with no other changes) or ‘complex’ (e.g. combinations of base pair changes, insertions, deletions or other rearrangements). The reference DNA sequence, open reading frame and inferred protein sequence for each FANC gene were taken from the most recent RefSeq gene, mRNA and protein accessions for each gene. The positions of amino acid residues were annotated according to the largest isoform for each gene, and genome sequences from human genome assembly GRCh37.p13 were accessed via the Locus Reference Genomic (LRG) initiative website (http://www.lrg-sequence.org/) to provide stable, curated reference sequences. Sequence variation was cataloged using Human Genome Variation Society nomenclature guidelines (58,59). All unique FA DA variants and their locations within the 16 FANC genes were compiled, together with the number of individual reports of each DA variant to provide a measure of variant prevalence in FAMutdB data. An analysis of global genomic variation in the 16 FANC genes was performed by identifying and categorizing all small genetic variants identified in 1KGP and ESP data using the same reference sequences and mutation nomenclature guidelines. Our analysis focused on coding region variants, together with variants occurring within canonical splice sites, as we reasoned that a majority of DA variation was within coding regions. We excluded large structural rearrangements such as duplications, deletions or rearrangements involving ≥1 exon from subsequent analyses, and note that when applied to FANCA, the most frequently mutated gene in FA patients, this led to the exclusion of ≤10% of the DA variants cataloged in the FAMutdB. We did not systematically analyze FANC gene copy number variants. Our final data set for analysis included 2948 unique FANC gene variants, across 57 240 potential base pair sites, in the 16 human FANC genes.
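
The four molecular types described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each variant is supplied as a pair of reference and alternate allele strings (as in VCF-style records, where indels carry shared anchor bases); the function name and this input convention are not taken from the study.

```python
def molecular_type(ref, alt):
    """Classify a variant from its reference and alternate allele strings."""
    # Trim bases shared at the start, then at the end, so that only the
    # changed sequence remains (removes VCF-style anchor bases).
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    if not ref and not alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "single base substitution"
    if not ref or not alt:
        # Pure insertion (ref empty) or pure deletion (alt empty).
        size = len(ref) + len(alt)
        if size == 1:
            return "single base insertion/deletion"
        return "simple multi-bp variant"
    # Bases are both removed and added: treated here as 'complex'.
    return "complex multi-bp variant"

examples = {
    ("C", "T"): molecular_type("C", "T"),
    ("CA", "C"): molecular_type("CA", "C"),
    ("C", "CAGT"): molecular_type("C", "CAGT"),
    ("CAG", "TT"): molecular_type("CAG", "TT"),
}
```

Note that this sketch places multi-base substitutions in the ‘complex’ class; whether the original curation did the same is not stated in the text.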

Mutation hotspot analysis

We used a previously developed probabilistic model (23) to determine whether mutation ‘hotspots’ existed in any of the FANC genes. This model incorporates an overall rate of mutation and relative locus-specific rates, based on fixed differences between humans and chimpanzees, for each gene coding and splice junction sequence. It also takes into account codon structure and transition/transversion ratios in order to predict the relative probability of observing mutations in a specific gene region. This model was applied by taking a total of NT variants observed in FANC genes, corresponding to NT mutation events assuming an infinite alleles model (60), and randomly distributing these mutation events according to their relative probabilities to obtain the expected variant distribution in the absence of hotspots. We then compared the observed number of variants identified in ESP/1KGP data with the expected number by use of a χ2-test. We looked at hotspots defined by a small sliding window (e.g. 5 bp) within each gene, as well as for each gene as a whole, prior to examining the nature of variants at these hotspots in FAMutdB and ESP/1KGP data.
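
The sliding-window comparison of observed and expected variant counts can be sketched in Python as follows. This is a simplified stand-in, not the published model (23): the per-site relative mutation probabilities are taken as given (uniform in the toy example), the expectation is computed analytically rather than by random redistribution, and each window is tested with a 1-degree-of-freedom χ2 statistic (inside versus outside the window). All names and the example data are hypothetical.

```python
from math import erfc, sqrt

def window_chi2(observed_sites, site_probs, window=5):
    """Scan a gene with a sliding window.

    observed_sites: 0-based positions, one entry per observed variant.
    site_probs: relative mutation probability of each site (any scale).
    Returns (start, observed, expected, chi2, p_value) per window.
    """
    total = sum(site_probs)
    probs = [p / total for p in site_probs]
    counts = [0] * len(probs)
    for pos in observed_sites:
        counts[pos] += 1
    n_total = len(observed_sites)
    results = []
    for start in range(len(probs) - window + 1):
        obs_in = sum(counts[start:start + window])
        exp_in = n_total * sum(probs[start:start + window])
        chi2 = 0.0
        for obs, exp in ((obs_in, exp_in), (n_total - obs_in, n_total - exp_in)):
            if exp > 0:
                chi2 += (obs - exp) ** 2 / exp
        # Survival function of a chi-square variable with 1 degree of freedom.
        p_value = erfc(sqrt(chi2 / 2))
        results.append((start, obs_in, exp_in, chi2, p_value))
    return results

# Toy gene of 20 sites with equal mutation probability and six variants.
site_probs = [1.0] * 20
observed_sites = [2, 3, 3, 4, 10, 15]
scan = window_chi2(observed_sites, site_probs)
best = max(scan, key=lambda row: row[3])
```

With so few variants the χ2 approximation is rough; a real analysis would also need to correct for the many overlapping windows tested.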

In order to determine whether there might be an underlying sequence-dependent basis for mutations or mutation hotspots, we downloaded the coordinates of simple tandem repeats identified by Tandem Repeats Finder from the UCSC Genome Browser (61,62). We determined the coordinates of CpG sites and their methylation status using ENCODE project data, where reduced representation bisulfite sequencing (RRBS) was used to quantitatively profile DNA methylation at an average of 1.2 million CpGs in each of 82 cell lines and tissues (63). We defined a CpG site as ‘methylated’ when the site was methylated in >50% of molecules sequenced on average. Fisher's exact test was then used to compare the distribution of simple tandem repeats or methylated CpG sites across mutation hotspots and non-hotspots. We also visually examined the sequences of the 16 most significant hotspots and their associated variants together with the flanking 20 bp up- and downstream of each hotspot for mechanistic clues to mutagenesis (see ‘Results’ for additional description).
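
A minimal Python sketch of the two steps just described, the >50% methylation call and Fisher's exact test on a 2 × 2 table, is given below. The function names and example values are hypothetical, and the two-sided P-value uses the common convention of summing all tables no more probable than the observed one; the text does not state which convention was used.

```python
from math import comb

def is_methylated(fractions):
    """Call a CpG site methylated when its mean methylated fraction
    across samples exceeds 0.5."""
    return sum(fractions) / len(fractions) > 0.5

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]],
    e.g. rows = hotspot / non-hotspot, columns = feature present / absent."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: methylated CpG present/absent in hotspots vs non-hotspots.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```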

Variation functional consequences

In order to assess variant functional consequences, we first predicted the consequences of all possible single base changes by determining the number and proportion of all single base changes at each base pair position that did not modify the amino acid sequence (synonymous substitutions), or that led to an amino acid substitution (missense substitutions), a stop/nonsense codon, or altered the reading frame by inserting or deleting base pairs or by leading to a splice alteration. In order to identify missense substitutions that were potentially deleterious, we used six different functional prediction methods in concert together with a majority rule (four or more of the six ‘predicting deleterious’) as in previous studies (15). The functional prediction algorithms used were SIFT (26,27), PolyPhen2 (28), the likelihood ratio test (LRT) (29), Mutation Taster (30), GERP++ (31) and PhyloP (32). Functional predictions were made for all FA DA variants; for all variants identified in ESP/1KGP sequencing data; and for all possible single base substitutions (n = 123 073) predicted to lead to missense substitutions across all 16 FANC genes.
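
Two of the steps above lend themselves to a brief Python illustration: classifying a single base change within a codon using the standard genetic code, and applying the four-of-six majority rule. The sketch assumes each tool's output has already been reduced to a yes/no ‘deleterious’ call; the per-tool score cut-offs are not given in the text, and all function names and example calls are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def substitution_consequence(codon, offset, new_base):
    """Consequence of replacing the base at offset (0-2) of a codon."""
    mutant = codon[:offset] + new_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"

def consensus_deleterious(calls, threshold=4):
    """Majority rule: deleterious when at least `threshold` tools agree."""
    return sum(bool(call) for call in calls.values()) >= threshold

# Hypothetical per-tool calls for one missense variant.
calls = {"SIFT": True, "PolyPhen2": True, "LRT": True,
         "MutationTaster": False, "GERP++": True, "PhyloP": False}
is_deleterious = consensus_deleterious(calls)
```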

Potential of disease-associated variants for functional recovery

In order to determine what fraction of DA FANC variants might be candidates for functional recovery by exon skipping to restore the open reading frame, we assessed all 16 FANC genes using the FAMutdB Reading Frame Checker utility (41) to determine whether or not an exon skip would result in a reading frame shift. We then determined what fraction of FA DA variants were candidates for the skipping of a single exon to restore the protein open reading frame. For simplicity and with an eye to potential translation, we investigated only instances in which the skipping of single exons removed a DA variant while restoring the open reading frame. The potential of PTC suppression or ‘read-through’ was assessed by identifying FA DA PTCs that were <50 nucleotides upstream of the last exon–intron junction, as these PTCs are the most likely to be rescued by PTC read-through as they often escape nonsense-mediated decay (36–39,49).
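
The two screening criteria described above can be approximated in a few lines of Python. This sketch is a simplified stand-in and does not reproduce the FAMutdB Reading Frame Checker: it assumes that an internal exon is ‘skippable’ when its length is a multiple of three, and that positions are given in mRNA coordinates with the last exon–exon junction placed at the final nucleotide of the penultimate exon. All names and example numbers are hypothetical.

```python
def skippable_exons(exon_lengths):
    """1-based numbers of internal exons whose removal would leave the
    downstream reading frame unchanged (length divisible by three).
    The first and last exons are not considered."""
    last = len(exon_lengths)
    return [number for number, length in enumerate(exon_lengths, start=1)
            if 1 < number < last and length % 3 == 0]

def near_last_junction(ptc_pos, last_junction_pos, limit=50):
    """True when a premature termination codon lies fewer than `limit`
    nucleotides upstream of the last exon-exon junction."""
    distance = last_junction_pos - ptc_pos
    return 0 <= distance < limit

# Hypothetical gene with five exons (lengths in nucleotides).
in_frame = skippable_exons([120, 99, 100, 150, 210])
candidate = near_last_junction(ptc_pos=980, last_junction_pos=1000)
```

PTCs located in the terminal exon itself fall outside this upstream criterion and would need to be flagged separately.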

A population-based, pathway-focused analysis of FANC variation

In order to determine the distribution of FA DA variants in sequencing data, we first determined the frequency of DA FA variants in the 4298 European Americans and 2217 African Americans included in ESP data and in the 1063 unrelated individuals (after excluding individuals related up to the second order) with ancestry from four geographical regions (Europe, East Asia, sub-Saharan Africa and America) included in 1KGP data (13,15,42). FA DA variants (DA variants) were defined by their presence in the FAMutdB and their low frequency. We excluded one derived allele in FANCA (c.2346 C>T) with a very high allele frequency (0.6) that was likely fortuitously captured in FAMutdB data simply by virtue of prevalence rather than disease association. In order to avoid the uncertainty of phasing, especially for rare variants, we used only genotyping data.

We next used DA FANC variants to identify potential FA patients in the 1KGP/ESP populations; to determine the frequency of DA alleles for comparison with previous estimates of carrier frequency from demographic data (40); and to determine whether individuals with multiple variants in the same or in different FANC genes were observed at the expected frequency in the 1KGP/ESP cohorts. The last of these analyses was undertaken to determine whether DA variants in more than one FANC gene might be subject to negative selection. Two complementary analyses were performed: one in which all DA variants defined by their presence in the FAMutdB were included; and a second, more stringent analysis that included only the subset of DA variants (here termed DC variants) that were highly deleterious: nonsense, splice-altering and frameshift-promoting variants, together with the subset of missense variants predicted to be highly deleterious using the approach described above.

Permutation testing was used to evaluate whether individuals carrying multiple variants of either class were more or less frequent than expected. Because only genotype information was considered here, we assumed the independence of loci during the permutation process. Briefly, for each variant we randomly resampled genotypes across individuals and tallied the number of FA DA or DC variants found in each simulated individual. After 1000 replicates, we compared the observed distribution of DA/DC variants per individual with the simulated distribution from permutation.
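
The permutation scheme can be illustrated with a minimal Python sketch. It assumes a boolean carrier matrix (individuals × variants) and shuffles each variant's genotypes independently across individuals, which preserves every variant's frequency while breaking any association between loci. The function names, the toy matrix and the choice to tabulate by variant count alone (rather than by the four carrier groups) are simplifications made here, not details taken from the study.

```python
import random
from collections import Counter

def variants_per_individual(genotypes):
    """genotypes[i][j] is True when individual i carries variant j."""
    return [sum(row) for row in genotypes]

def permuted_count_distribution(genotypes, replicates=1000, seed=0):
    """Expected number of individuals carrying k variants (k = 0, 1, 2, ...)
    when each variant is shuffled independently across individuals."""
    rng = random.Random(seed)
    n_ind = len(genotypes)
    n_var = len(genotypes[0])
    columns = [[genotypes[i][j] for i in range(n_ind)] for j in range(n_var)]
    totals = Counter()
    for _ in range(replicates):
        # rng.sample with k = n_ind returns a shuffled copy of the column.
        shuffled = [rng.sample(column, n_ind) for column in columns]
        for i in range(n_ind):
            totals[sum(column[i] for column in shuffled)] += 1
    return {k: count / replicates for k, count in totals.items()}

# Toy carrier matrix: four individuals, two variants.
genotypes = [[True, False], [True, True], [False, False], [False, False]]
observed = Counter(variants_per_individual(genotypes))
expected = permuted_count_distribution(genotypes, replicates=200, seed=1)
```

Comparing `observed` with `expected` for each count class (for example, with an empirical P-value taken over replicates) then indicates whether multi-variant carriers are depleted or enriched.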

AUTHORS' CONTRIBUTIONS

This study was designed by all four authors. Data compilation and analysis were done largely by K.J.R. and W.F., with input from J.M.A. and R.J.M.Jr. The manuscript was written by K.J.R., W.F. and R.J.M.Jr. and reviewed by all four authors.

ACKNOWLEDGEMENTS

This work was supported by a UW-HHMI Integrative Research Internship in Summer 2012, and both a UW Mary Gates Undergraduate Research Fellowship and a Herschel and Caryl Roman Undergraduate Science Scholarship award through the Department of Genome Sciences to K.J.R., and by NIH grant R01 HL120948 to R.J.M.Jr. We thank Drs Arleen Auerbach, Frank Lach and Johan den Dunnen for help with FAMutdB resources, Dr Evan Eichler for a discussion of copy number variant resources and Alden F.M. Hackmann for graphics help and proofreading.

Conflict of Interest statement. None declared.

REFERENCES

1. Fanconi G. Familial constitutional panmyelocytopathy, Fanconi's anemia (F.A.). Semin. Hematol., 1967, vol. 4 (pg. 233-241)
2. Schroeder T.M., Tilgen D., Kruger J., Vogel F. Formal genetics of Fanconi anemia. Hum. Genet., 1976, vol. 32 (pg. 257-288)
3. Auerbach A.D. Fanconi anemia and its diagnosis. Mutat. Res., 2009, vol. 668 (pg. 4-10)
4. Neveling K., Endt D., Hoehn H., Schindler D. Genotype–phenotype correlations in Fanconi anemia. Mutat. Res., 2009, vol. 668 (pg. 73-91)
5. Alter B.P. Cancer in Fanconi anemia, 1927–2001. Cancer, 2003, vol. 97 (pg. 425-440)
6. Rosenberg P.S., Greene M.H., Alter B.P. Cancer incidence in persons with Fanconi anemia. Blood, 2003, vol. 101 (pg. 822-826)
7. Kutler D.I., Singh B., Satagopan J., Batish S.D., Berwick M., Giampietro P.F., Hanenberg H., Auerbach A.D. A 20-year perspective on the International Fanconi Anemia Registry (IFAR). Blood, 2003, vol. 101 (pg. 1249-1256)
8. Rosenberg P.S., Alter B.P., Ebell W. Cancer risks in Fanconi anemia: findings from the German Fanconi Anemia Registry. Haematologica, 2008, vol. 93 (pg. 511-517)
9. Alter B.P., Giri N., Savage S.A., Peters J.A., Loud J.T., Leathwood L., Carr A.G., Greene M.H., Rosenberg P.S. Malignancies and survival patterns in the National Cancer Institute inherited bone marrow failure syndromes cohort study. Br. J. Haematol., 2010, vol. 150 (pg. 179-188)
10. Kee Y., D'Andrea A.D. Molecular pathogenesis and clinical management of Fanconi anemia. J. Clin. Invest., 2012, vol. 122 (pg. 3799-3806)
11. Garaycoechea J.I., Patel K.J. Why does the bone marrow fail in Fanconi anemia? Blood, 2013, vol. 123 (pg. 26-34)
12. de Winter J.P., Joenje H. The genetic and molecular basis of Fanconi anemia. Mutat. Res., 2009, vol. 668 (pg. 11-19)
13. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature, 2012, vol. 491 (pg. 56-65)
14. NHLBI Exome Sequencing Project, 2014. Exome Variant Server: http://evs.gs.washington.edu/EVS/ (date last accessed, 9 April 2014)
15. Fu W., O'Connor T.D., Jun G., Kang H.M., Abecasis G., Leal S.M., Gabriel S., Altshuler D., Shendure J., Nickerson D.A., et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature, 2013, vol. 493 (pg. 216-220)
16. NHGRI Breast Cancer Information Core, 2014. BIC: The Breast Cancer Information Core: http://research.nhgri.nih.gov/bic/ (date last accessed, 7 August 2014)
17. Howlett N.G., Taniguchi T., Olson S., Cox B., Waisfisz Q., de Die-Smulders C., Persky N., Grompe M., Joenje H., Pals G., et al. Biallelic inactivation of BRCA2 in Fanconi Anemia. Science, 2002, vol. 297 (pg. 606-609)
18. Hirsch B., Shimamura A., Moreau L., Baldinger S., Hag-alshiekh M., Bostrom B., Sencer S., D'Andrea A.D. Association of biallelic BRCA2/FANCD1 mutations with spontaneous chromosomal instability and solid tumors of childhood. Blood, 2004, vol. 103 (pg. 2554-2559)
19. Offit K., Levran O., Mullaney B., Mah K., Nafa K., Batish S.D., Diotti R., Schneider H., Deffenbaugh A., Scholl T., et al. Shared genetic susceptibility to breast cancer, brain tumors, and Fanconi anemia. J. Natl. Cancer Inst., 2003, vol. 95 (pg. 1548-1551)
20. Wagner J.E., Tolar J., Levran O., Scholl T., Deffenbaugh A., Satagopan J., Ben-Porat L., Mah K., Batish S.D., Kutler D.I., et al. Germline mutations in BRCA2: shared genetic susceptibility to breast cancer, early onset leukemia, and Fanconi anemia. Blood, 2004, vol. 103 (pg. 3226-3229)
21. Myers K., Davies S.M., Harris R.E., Spunt S.L., Smolarek T., Zimmerman S., McMasters R., Wagner L., Mueller R., Auerbach A.D., et al. The clinical phenotype of children with Fanconi anemia caused by biallelic FANCD1/BRCA2 mutations. Pediat. Blood Cancer, 2012, vol. 58 (pg. 462-465)
22. King M.-C., Marks J.H., Mandell J.B., New York Breast Cancer Study Group. Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science, 2003, vol. 302 (pg. 643-646)
23. O'Roak B.J., Vives L., Fu W., Egertson J.D., Stanaway I.B., Phelps I.G., Carvill G., Kumar A., Lee C., Ankenman K., et al. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science, 2012, vol. 338 (pg. 1619-1622)
24. Dohm J.C., Lottaz C., Borodina T., Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res., 2008, vol. 36 pg. e105
25. Allhoff M., Schonhuth A., Martin M., Costa I., Rahmann S., Marschall T. Discovering motifs that induce sequencing errors. BMC Bioinform., 2013, vol. 14 pg. S1
26. Ng P.C., Henikoff S. Predicting deleterious amino acid substitutions. Genome Res., 2001, vol. 11 (pg. 863-874)
27. Kumar P., Henikoff S., Ng P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protocols, 2009, vol. 4 (pg. 1073-1081)
28. Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova A., Bork P., Kondrashov A.S., Sunyaev S.R. A method and server for predicting damaging missense mutations. Nat. Methods, 2010, vol. 7 (pg. 248-249)
29. Chun S., Fay J.C. Identification of deleterious mutations within three human genomes. Genome Res., 2009, vol. 19 (pg. 1553-1561)
30. Schwarz J.M., Rodelsperger C., Schuelke M., Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods, 2010, vol. 7 (pg. 575-576)
31. Davydov E.V., Goode D.L., Sirota M., Cooper G.M., Sidow A., Batzoglou S. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol., 2010, vol. 6 pg. e1001025
32. Pollard K.S., Hubisz M.J., Rosenbloom K.R., Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res., 2010, vol. 20 (pg. 110-121)
33. Aartsma-Rus A. Overview on DMD exon skipping. Methods Mol. Biol., 2012, vol. 867 (pg. 97-116)
34. Hoffman E.P., Bronson A., Levin A.A., Takeda S.i., Yokota T., Baudy A.R., Connor E.M. Restoring dystrophin expression in Duchenne muscular dystrophy muscle: progress in exon skipping and stop codon read through. Am. J. Pathol., 2011, vol. 179 (pg. 12-22)
35. Goyenvalle A., Seto J.T., Davies K.E., Chamberlain J. Therapeutic approaches to muscular dystrophy. Hum. Mol. Genet., 2011, vol. 20 (pg. R69-R78)
36. Schweingruber C., Rufener S.C., Zünd D., Yamashita A., Mühlemann O. Nonsense-mediated mRNA decay - mechanisms of substrate mRNA recognition and degradation in mammalian cells. Biochim. Biophys. Acta, 2013, vol. 1829 (pg. 612-623)
37. Huang L., Wilkinson M.F. Regulation of nonsense-mediated mRNA decay. Wiley Interdiscip. Rev. RNA, 2012, vol. 3 (pg. 807-828)
38. Bidou L., Allamand V., Rousset J.-P., Namy O. Sense from nonsense: therapies for premature stop codon diseases. Trends Mol. Med., 2012, vol. 18 (pg. 679-688)
39. Keeling K.M., Wang D., Conard S.E., Bedwell D.M. Suppression of premature termination codons as a therapeutic approach. Crit. Rev. Biochem. Mol. Biol., 2012, vol. 47 (pg. 444-463)
40. Rosenberg P.S., Tamary H., Alter B.P. How high are carrier frequencies of rare recessive syndromes? Contemporary estimates for Fanconi Anemia in the United States and Israel. Am. J. Med. Genet. A, 2011, vol. 155 (pg. 1877-1883)
41. Auerbach A.D., Lach F., 2013. The Fanconi Anemia Mutation Database: http://www.rockefeller.edu/fanconi/mutate/ (date last accessed, 7 August 2014)
42. Tennessen J.A., Bigham A.W., O'Connor T.D., Fu W., Kenny E.E., Gravel S., McGee S., Do R., Liu X., Jun G., et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 2012, vol. 337 (pg. 64-69)
43. Ameziane N., Errami A., Léveillé F., Fontaine C., de Vries Y., van Spaendonk R.M.L., de Winter J.P., Pals G., Joenje H. Genetic subtyping of Fanconi anemia by comprehensive mutation screening. Hum. Mutat., 2008, vol. 29 (pg. 159-166)
44. Chandrasekharappa S.C., Lach F.P., Kimble D.C., Kamat A., Teer J.K., Donovan F.X., Flynn E., Sen S.K., Thongthip S., Sanborn E., et al. Massively parallel sequencing, aCGH, and RNA-Seq technologies provide a comprehensive molecular diagnosis of Fanconi anemia. Blood, 2013, vol. 121 (pg. e138-e148)
45. Mills R.E., Walter K., Stewart C., Handsaker R.E., Chen K., Alkan C., Abyzov A., Yoon S.C., Ye K., Cheetham R.K., et al. Mapping copy number variation by population-scale genome sequencing. Nature, 2011, vol. 470 (pg. 59-65)
46. MacArthur D.G., Balasubramanian S., Frankish A., Huang N., Morris J., Walter K., Jostins L., Habegger L., Pickrell J.K., Montgomery S.B., et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science, 2012, vol. 335 (pg. 823-828)
47. Xue Y., Chen Y., Ayub Q., Huang N., Ball E.V., Mort M., Phillips A.D., Shaw K., Stenson P.D., Cooper D.N., et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am. J. Hum. Genet., 2012, vol. 91 (pg. 1022-1032)
48. Berwick M., Satagopan J.M., Ben-Porat L., Carlson A., Mah K., Henry R., Diotti R., Milton K., Pujara K., Landers T., et al. Genetic heterogeneity among Fanconi anemia heterozygotes and risk of cancer. Cancer Res., 2007, vol. 67 (pg. 9591-9596)
49. McElroy S.P., Nomura T., Torrie L.S., Warbrick E., Gartner U., Wood G., McLean W.H.I. A lack of premature termination codon read-through efficacy of PTC124 (Ataluren) in a diverse array of reporter assays. PLoS Biol., 2013, vol. 11 pg. e1001593
50. Dietz H.C. New therapeutic approaches to Mendelian disorders. N. Engl. J. Med., 2010, vol. 363 (pg. 852-863)
51. Rogan M.P., Stoltz D.A., Hornick D.B. Cystic fibrosis transmembrane conductance regulator intracellular processing, trafficking, and opportunities for mutation-specific treatment. Chest, 2011, vol. 139 (pg. 1480-1490)
52. Bear C.E., Li C., Kartner N., Bridges R.J., Jensen T.J., Ramjeesingh M., Riordan J.R. Purification and functional reconstitution of the cystic fibrosis transmembrane conductance regulator (CFTR). Cell, 1992, vol. 68 (pg. 809-818)
53. Rowe S.M., Miller S., Sorscher E.J. Cystic fibrosis. N. Engl. J. Med., 2005, vol. 352 (pg. 1992-2001)
54. Van Goor F., Hadida S., Grootenhuis P.D.J., Burton B., Cao D., Neuberger T., Turnbull A., Singh A., Joubran J., Hazlewood A., et al. Rescue of CF airway epithelial cell function in vitro by a CFTR potentiator, VX-770. Proc. Natl. Acad. Sci. USA, 2009, vol. 106 (pg. 18825-18830)
55. Accurso F.J., Rowe S.M., Clancy J.P., Boyle M.P., Dunitz J.M., Durie P.R., Sagel S.D., Hornick D.B., Konstan M.W., Donaldson S.H., et al. Effect of VX-770 in persons with cystic fibrosis and the G551D-CFTR mutation. N. Engl. J. Med., 2010, vol. 363 (pg. 1991-2003)
56. Ramsey B.W., Davies J., McElvaney N.G., Tullis E., Bell S.C., Dřevínek P., Griese M., McKone E.F., Wainwright C.E., Konstan M.W., et al. A CFTR potentiator in patients with cystic fibrosis and the G551D mutation. N. Engl. J. Med., 2011, vol. 365 (pg. 1663-1672)
57. Jih K.-Y., Hwang T.-C. Vx-770 potentiates CFTR function by promoting decoupling between the gating cycle and ATP hydrolysis cycle. Proc. Natl. Acad. Sci. USA, 2013, vol. 110 (pg. 4404-4409)
58. Antonarakis S.E., Nomenclature Working Group. Recommendations for a nomenclature system for human gene mutations. Hum. Mutat., 1998, vol. 11 (pg. 1-3)
59. den Dunnen J.T., Antonarakis S.E. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat., 2000, vol. 15 (pg. 2-7)
60. Kimura M., Crow J.F. The number of alleles that can be maintained in a finite population. Genetics, 1964, vol. 49 (pg. 725-738)
61. Kuhn R.M., Haussler D., Kent W.J. The UCSC genome browser and associated tools. Brief. Bioinform., 2013, vol. 14 (pg. 144-161)
62. Meyer L.R., Zweig A.S., Hinrichs A.S., Karolchik D., Kuhn R.M., Wong M., Sloan C.A., Rosenbloom K.R., Roe G., Rhead B., et al. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res., 2013, vol. 41 (pg. D64-D69)
63. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 2012, vol. 489 (pg. 57-74)

Author notes

The first two authors should be regarded as joint first authors.