FATHMM-XF: accurate prediction of pathogenic point mutations via extended features

Abstract
Summary: We present FATHMM-XF, a method for predicting pathogenic point mutations in the human genome. Drawing on an extensive feature set, FATHMM-XF outperforms competitors on benchmark tests, particularly in non-coding regions where the majority of pathogenic mutations are likely to be found.
Availability and implementation: The FATHMM-XF web server is available at http://fathmm.biocompute.org.uk/fathmm-xf/, and as tracks on the Genome Tolerance Browser: http://gtb.biocompute.org.uk. Predictions are provided for human genome version GRCh37/hg19. The data used for this project can be downloaded from: http://fathmm.biocompute.org.uk/fathmm-xf/
Supplementary information: Supplementary data are available at Bioinformatics online.


Introduction
Many classifiers have been proposed for predicting the impact of single-nucleotide variants (SNVs) in the human genome (see Liu et al., 2017). Initially these focused on non-synonymous mutations in coding regions of the genome, but most documented pathogenic SNVs come from non-coding regions, so more recent methods make predictions genome-wide (Kircher et al., 2014; Shihab et al., 2015). CADD (Kircher et al., 2014) has emerged as a standard for predicting pathogenic SNVs, although its performance has been challenged (Liu et al., 2017). The recent GAVIN method adjusts CADD scores in a gene-specific manner, achieving greater accuracy than CADD, whilst assigning distinct Pathogenic and Benign labels that simplify interpretation (van der Velde et al., 2017).
Here we present FATHMM with an eXtended Feature set (FATHMM-XF) which yields highly accurate predictions for SNVs across the entire human genome. FATHMM-XF assigns a confidence score (a p-score) to every prediction, to simplify interpretation, and focus analysis on a subset of high-confidence predictions (cautious classification). In all tests, FATHMM-XF matches or outperforms competing methods, with its best performance in non-coding regions, where the majority of pathogenic SNVs are likely to be found. With cautious classification, FATHMM-XF consistently exceeds 94% accuracy on subsets of 80% of the highest-confidence predictions from benchmark test sets.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

(Shihab et al., 2015, 2017b). We construct four additional feature groups from conservation scores, the Variant Effect Predictor (McLaren et al., 2016), annotated gene models, and the DNA sequence itself (Supplementary Section S3). We convert feature groups into kernels to evaluate different combinations and kernel-based models. k-fold cross-validation is commonly used to evaluate models, but can introduce bias if, for example, the same gene is represented in both training and test sets. Instead, we use leave-one-chromosome-out cross-validation (LOCO-CV): for each fold we set aside one chromosome for testing and use the remaining chromosomes for training. We use Platt scaling (Platt, 1999) to assign a p-score to each prediction (the probability that a particular SNV is pathogenic). For cautious classification, we then establish confidence thresholds to analyse sub-populations of high-confidence predictions.
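The LOCO-CV splitting described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `loco_cv_splits` and the toy chromosome labels are hypothetical, and the sketch assumes only that each variant carries a chromosome label.

```python
from collections import defaultdict
from typing import Iterator, Sequence


def loco_cv_splits(
    chromosomes: Sequence[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_chromosome, train_indices, test_indices) per fold.

    Each fold holds out every variant on one chromosome for testing and
    trains on the variants from all remaining chromosomes.
    """
    by_chrom: dict[str, list[int]] = defaultdict(list)
    for idx, chrom in enumerate(chromosomes):
        by_chrom[chrom].append(idx)

    for held_out in sorted(by_chrom):
        test_idx = by_chrom[held_out]
        train_idx = sorted(
            i for chrom, idxs in by_chrom.items() if chrom != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: the chromosome of each of six variants.
variant_chroms = ["1", "1", "2", "X", "2", "X"]
folds = list(loco_cv_splits(variant_chroms))
```

Because whole chromosomes are held out, no gene can contribute variants to both the training and the test portion of a fold, which is the source of bias the text attributes to ordinary k-fold splitting.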

Results
For non-coding regions, the best model incorporates five feature groups, achieving 92.3% accuracy in LOCO-CV (Supplementary Table S6). Briefly, these feature groups encapsulate sequence conservation, proximity to genomic features (e.g. splice sites or transcription start sites) and chromatin accessibility. Cautious classification reaches 99% peak accuracy at a p-score threshold of s = 0.96 (Supplementary Fig. S2). This high-confidence subset of examples (p ≥ 0.96 or p ≤ 0.04) comprises nearly 40% of test examples, demonstrating that the threshold is not prohibitively restrictive. Relaxing the threshold enlarges this subset dramatically whilst retaining high accuracy: at s = 0.80, we cover 90% of examples with accuracy over 95% (Supplementary Section S4).
For coding regions, the best model uses six feature groups, reaching 88.0% accuracy (Supplementary Table S8). Again, conservation features are most informative, along with proximity to genomic features and nucleotide sequence features (Supplementary Section S3). Cautious classification achieves peak accuracy of 98% at s = 0.97 (Supplementary Fig. S2). This highest-confidence subset again comprises nearly 40% of examples; at s = 0.80, it includes 80% of examples with accuracy above 94.0%. We use these peak accuracy thresholds (0.96 for non-coding, 0.97 for coding) in subsequent analyses.
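To make the trade-off between threshold, coverage and accuracy concrete, the following is a minimal sketch of cautious classification applied to p-scores. It assumes the symmetric rule suggested by the text (keep a prediction only if p is at or above the threshold, or at or below one minus the threshold) and uses Platt's sigmoid with parameters `a` and `b` that would normally be fitted on held-out data. The function names, parameter values and toy data are hypothetical and are not taken from the FATHMM-XF implementation.

```python
import math


def platt_p_score(decision_value: float, a: float, b: float) -> float:
    """Platt's sigmoid: map a raw classifier output to a probability.

    a and b are assumed to have been fitted beforehand; here they are
    simply supplied by the caller.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def cautious_classification(p_scores, labels, threshold):
    """Return (coverage, accuracy) over the high-confidence subset.

    A prediction is kept if p >= threshold (called pathogenic, 1) or
    p <= 1 - threshold (called benign, 0); all others are left uncalled.
    """
    kept = correct = 0
    for p, label in zip(p_scores, labels):
        if p >= threshold:
            call = 1
        elif p <= 1.0 - threshold:
            call = 0
        else:
            continue  # not confident enough to call
        kept += 1
        correct += int(call == label)
    coverage = kept / len(labels) if labels else 0.0
    accuracy = correct / kept if kept else float("nan")
    return coverage, accuracy


# Hypothetical toy data: raw decision values and true labels (1 = pathogenic).
decision_values = [3.0, 2.0, 0.2, -0.1, -2.5, -4.0]
labels = [1, 1, 0, 1, 0, 0]
p_scores = [platt_p_score(f, a=-2.0, b=0.0) for f in decision_values]

coverage, accuracy = cautious_classification(p_scores, labels, threshold=0.96)
```

On this toy data the strict threshold leaves the two ambiguous variants uncalled, so accuracy on the retained subset is higher than at a threshold of 0.5, where every variant receives a call. The same pattern underlies the coverage and accuracy figures reported above.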
To evaluate how well FATHMM-XF will generalise, we tested all methods on test sets we assembled from ClinVar data (Landrum et al., 2014)

Discussion
At default thresholds, FATHMM-XF matches or outperforms competing methods using an eclectic mixture of data sources. Even when all methods are optimised, FATHMM-XF yields substantially higher accuracy in all of our tests (Supplementary Figs S7-S10). Under cautious classification, accuracy exceeds 95%, whilst producing predictions for up to 80% of positions genome-wide. While the proposed classifiers achieve high accuracy, further improvement seems possible. Notably, all methods exhibit low PPV on non-coding data except for FATHMM-XF's cautious classification. Analysis of these variants (Supplementary Fig. S1) reveals differences in the proportions of intron and UTR variants represented in the training and test sets. Hence region-specific models may improve performance in non-coding regions, just as GAVIN's gene-specific thresholding improves accuracy for CADD scores, by up to 26 percentage points in our tests. We will explore these approaches in future work. The FATHMM-XF web server for GRCh37/hg19 is available at fathmm.biocompute.org.uk/fathmm-xf, and as tracks on the Genome Tolerance Browser (gtb.biocompute.org.uk; Shihab et al., 2017a).