-
PDF
- Split View
-
Views
-
Cite
Cite
Hunter J Melton, Zichen Zhang, Chong Wu, SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations, Human Molecular Genetics, Volume 33, Issue 7, 1 April 2024, Pages 624–635, https://doi.org/10.1093/hmg/ddad205
- Share Icon Share
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying “silver standard” genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.
Introduction
Genome-wide association studies (GWAS) have identified extensive lists of disease-associated variants [1], most of which reside in non-coding regions of the genome [2]. In response, large-scale expression quantitative trait loci (eQTL) analyses [3–5] and transcriptome-wide association studies (TWAS) [6–12] have emerged to investigate the molecular mechanisms underlying complex diseases. TWAS involves a two-step procedure that integrates gene expression reference panels and trait-specific GWAS. First, gene expression is regressed on cis-eQTL genotypes to create expression prediction models for each gene. Second, the relationship between predicted gene expressions and GWAS traits is determined through an association test, elucidating putatively causal effects of genes on traits of interest.
The power of TWAS depends on the accuracy of the expression prediction model in Step 1 and the sample size of GWAS in Step 2 [6, 13]. Although the sample size of GWAS is increasing rapidly thanks to extensive efforts of recent consortia, the prediction accuracy of the expression prediction model remains a limiting factor. To address this challenge, several methods have been proposed to improve the prediction model accuracy by leveraging auxiliary information from other tissues [10, 12], functional annotations or atlases of regulatory elements [14, 15], and summary-level expression reference panels (SUMMIT) [6]. For instance, by using the summary-level expression reference panel with a much larger sample size, SUMMIT outperforms many benchmark methods in terms of expression prediction model accuracy and subsequent TWAS power [6]. However, SUMMIT only relies on the summary-level expression reference panel and does not account for comprehensive functional annotations that may improve expression prediction model accuracy. Functional annotations [16–21] can provide a useful prior probability that a genetic variant causally affects complex phenotypes and expression levels and have been successfully incorporated to improve power for rare variant association analysis in GWAS [17, 18, 22, 23]. We hypothesize that leveraging functional annotations could further improve the accuracy of SUMMIT models, leading to higher power of TWAS.
To test our hypothesis, we develop SUMMIT-FA (Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations), an extension of the SUMMIT [6], that improves the accuracy of gene expression prediction models by leveraging annotation resources from the Multi-dimensional Annotation-Class Integrative Estimation (MACIE) database [18] to prioritize functional variants. MACIE synthesizes multiple categories of annotations, such as evolutionary conservation annotations and epigenetic annotations, and consistently provides the state-of-the-art performance in discriminating between functional and non-functional variants [18]. Briefly, we craft gene expression prediction models for whole blood tissue using a summary-level eQTL database provided by the eQTLGen consortium [5], the largest-to-date publicly available meta-analysis featuring samples from 37 cohorts and 31 684 blood samples, and functional annotations provided by MACIE [18]. These models are then combined with the previous set of SUMMIT [6] models to significantly improve its performance. The elevated utility of SUMMIT-FA is thoroughly demonstrated via simulation studies and application to GWAS summary statistics for 24 complex traits. Notably, SUMMIT-FA increases expression prediction accuracy in whole-blood tissue, including for genes with low expression heritability. Moreover, it enhances power to detect associations between genes and phenotypes beyond preceding benchmark methods and achieves higher predictive power for identifying “silver standard” genes, where “silver standard” genes are curated from information independent of GWAS results (see Methods section). To demonstrate the use of the method, we conduct a case study in Hypertension. A database of the SUMMIT-FA models and results is available as a resource to the community.
Results
SUMMIT-FA overview
SUMMIT-FA extends SUMMIT [6], a recently developed TWAS framework that leverages large-scale eQTL summary-level data to predict gene expression levels, to further improve accuracy in expression prediction and subsequent power for association testing. SUMMIT-FA follows much the same scaffolding as SUMMIT: for each prospective gene, nine expression prediction models are trained on eQTL summary-level data from 31 684 whole-blood samples provided by eQTLGen [5]. Subsequently, for each model with satisfactory prediction accuracy (R2 ≥ 0.005), associations between predicted gene expression levels and phenotypes are tested. The Cauchy combination test [24, 25] is then employed to aggregate results. The main improvement of SUMMIT-FA over its predecessor is the incorporation of four new gene expression prediction models that leverage functional annotations from the MACIE database [18]. In short, a variant with a high MACIE_anyclass score, indicating a higher probability of functionality, will incur a reduced penalty in the regression process, consequently leading to the prioritization of variants that are deemed more likely to be causally linked a priori. Additionally, each of the four models uses only SNPs with MACIE_anyclass greater than a given cutoff. More details on the methodology can be found in the Methods section, and the model framework is shown in Fig. 1.

SUMMIT-FA workflow. SUMMIT-FA proceeds in two steps. First, build gene expression prediction models. Second, test associations between traits of interest and predicted expression, aggregating the results from nine distinct prediction models. Alt-text: SUMMIT-FA uses summary-level cis-eQTLs from the eQTLGen consortium, functional annotations from MACIE, and LD reference panels from the 1000 genomes project to construct nine sets of weights for gene expression prediction, or imputation. The weights are tuned using individual-level cis-eQTLs from GTEx. The weights are then used to test associations between genes and traits in TWAS.
SUMMIT-FA constructs more analyzable expression models
To demonstrate the elevated utility of SUMMIT-FA, we compared the model accuracy for gene expression prediction in whole blood tissue for SUMMIT-FA, SUMMIT (its predecessor) [6], and five benchmark methods: PrediXcan [8], TWAS-FUSION [7], UTMOST [10], MR-JTI [12], and Lassosum [26]. To ensure a fair comparison, we focused only on genes with estimated R2 ≥ 0.01 for SUMMIT-FA and SUMMIT (to match the other methods) that exist in the eQTLGen data. While we often refer to R2 as prediction accuracy, technically it is the proportion of variation in gene expression levels that can be explained by the prediction model. PrediXcan, TWAS-FUSION, UTMOST, and MR-JTI were trained using GTEx data, and the prediction accuracies (R2) were based on a cross-validation procedure and provided by the authors, respectively. SUMMIT-FA, SUMMIT, and Lassosum were trained using summary-level eQTL data from the eQTLGen, and prediction accuracies were determined in an independent test dataset drawn from GTEx version 8 that was not included in the eQTLGen meta-analysis. SUMMIT-FA constructed satisfactory (in this case, R2 ≥ 0.01) expression prediction models for more genes than the preceding methods: 12 132 for SUMMIT-FA vs. 9749 for SUMMIT, 7512 for PrediXcan, 5411 for TWAS-FUSION, 7236 for UTMOST, 9576 for MR-JTI, and 8249 for Lassosum. Crucially, SUMMIT-FA successfully developed models for the majority of genes that SUMMIT and the benchmark methods did, 10 018 (81.9%) out of the 12 230 unique genes across all other methods. Furthermore, SUMMIT-FA built satisfactory prediction models (with R2 ≥ 0.01) for an additional 950 genes beyond what was accomplished by SUMMIT or any of the five benchmark methods. Compared with SUMMIT alone, SUMMIT-FA built satisfactory prediction models (with R2 ≥ 0.01) for additional 2383 genes (24.4% improvement). These improvements demonstrate the marked utility of MACIE functional annotations in TWAS methods. The direct impact of their inclusion is seen as SUMMIT-FA achieved significantly higher prediction accuracy across the eQTLGen gene set than SUMMIT (P < 2.2 × 10−16 per the paired Wilcoxon rank-sum test) and other competing methods (PrediXcan, TWAS-FUSION, UTMOST, MR-JTI, and Lassosum; all P < 2.2 × 10−16 per the paired Wilcoxon rank-sum test).
SUMMIT-FA pinpoints significantly more associations
We explored the downstream performance of SUMMIT-FA in identifying significant associations by applying it to GWAS summary statistics of 24 complex phenotypes (see Supplementary Table 1 for a list of phenotypes, Supplementary Data 1 for full SUMMIT-FA association scan results). We then compared results from SUMMIT-FA to SUMMIT and the five aforementioned benchmark methods: PrediXcan, TWAS-FUSION, UTMOST, MR-JTI, and Lassosum. While both SUMMIT-FA and SUMMIT successfully analyze all genes with genetic heritability (i.e. prediction R2) ≥ 0.005, we first focused on all genes with heritability ≥ 0.01 for a fair comparison with other benchmark methods (Fig. 2b). Across all traits, SUMMIT-FA identified 4049 significant gene-phenotype associations, which is a 21.3% increase over SUMMIT (P = 7.2 × 10−5 via the one-sided Wilcoxon signed-rank test), a 161.4% increase over PrediXcan (P = 4.8 × 10−5), a 183.7% increase over TWAS-FUSION (P = 4.7 × 10−5), a 124.2% increase over UTMOST (P = 5.6 × 10−5), a 132.3% increase over MR-JTI (P = 9.5 × 10−7), and a 102.6% increase over Lassosum (P = 4.7 × 10−5).

Method comparison (a) UpSet plot on common gene set for which R2 ≥ 0.01. (b) number of significant gene-trait associations for all methods and genes across 24 GWAS. (c) Number of significant gene-trait associations for all methods on a common set of genes across 24 GWAS. (d) ROC plot for identifying silver standard genes. Alt-text for (a): The UpSet plot shows counts of the number of common genes (which were analyzed by all methods) with expression prediction R^2 greater than or equal to 0.01 for all seven methods (SUMMIT-FA, SUMMIT, Lassosum, MR-JTI, PrediXcan, TWAS-FUSION, and UTMOST) and various combinations. Approximately 1300 genes have expression prediction R^2 greater than or equal to 0.01 for SUMMIT-FA only, larger than any of the other methods. Alt-text for (b): The bar plot displays counts of the number of statistically significant associations between genes and traits for all seven methods across all 24 GWAS. SUMMIT-FA identifies 4691 such associations, 4049 from genes with expression prediction R^2 greater than or equal to 0.01 and 642 from genes with expression prediction R^2 between 0.005 and 0.01. The next best performing method, SUMMIT, identifies 3998 significant associations. Alt-text for (c): The bar plot shown here also displays counts of the number of statistically significant associations between genes and traits for all seven methods across all 24 GWAS, but now only considering a common set of genes for which each method attains expression prediction R^2 greater than or equal to 0.01. SUMMIT-FA identifies 1473 such associations. The next best performing method, SUMMIT, identifies 1445. Alt-text for (d): The ROC plot shows the sensitivity and specificity for each method in identifying “silver standard” genes. SUMMIT-FA achieves the highest AUC at 0.809, which is followed by SUMMIT (0.777), Lassosum (0.751), PrediXcan (0.731), MR-JTI (0.702), UTMOST (0.700), and TWAS-FUSION (0.685).
Next, we compared each method on a common set of 3948 genes that could be analyzed by each method (Fig. 2c). For the common set of genes, both SUMMIT and SUMMIT-FA built expression prediction models with high prediction accuracy. The average R2 values for SUMMIT-FA and SUMMIT were 0.167 and 0.161, respectively. As a result, SUMMIT-FA and SUMMIT performed similarly and identified 1473 and 1445 significant associations. However, the improvement of SUMMIT-FA over the other benchmark methods is still quite large, with a 32% increase in significant associations over the next highest-performing method, Lassosum. This result is aligned with our simulation study: including MACIE functional annotations is quite impactful for genes with lower expression heritability but less so for genes with high expression heritability.
Lastly, recalling that SUMMIT provided a step forward in the field by allowing the analysis of genes with lower heritability (0.005 ≤ R2 < 0.01), we examined these genes in more detail. For SUMMIT-FA, 14 309 genes possess R2 ≥ 0.005; of these, 2177 have 0.005 ≤ R2 < 0.01. The 2177 lower heritability genes produce 642 significant gene-trait associations, a relatively similar ratio to that of genes with R2 ≥ 0.01: 12132 genes and 4049 significant gene-trait associations. This gives further credence to the idea that lower heritability genes have notably larger causal effect sizes on complex phenotypes [6, 27].
SUMMIT-FA improves predictive power for identifying “silver standard” genes
We evaluated the performance of SUMMIT-FA and the six benchmark methods to a set of “silver standard” genes, which are curated from information independent of GWAS results and considered highly likely to be causal in mediating associations between phenotypes and GWAS loci. As per Barbeira et al.[28], the silver standard gene set consists of 1258 gene-phenotype pairs from the Online Mendelian Inheritance in Man (OMIM) database [29] and an additional 29 pairs crafted from rare variant results in exome-wide association studies. All silver standard gene-phenotype pairs are in Supplementary Data 2.
We used the silver standard gene set to compare sensitivity and specificity among the methods (Fig. 2d) with a receiver operator curve (ROC). SUMMIT-FA produced the highest area under the curve (AUC) (0.809), though the benchmark methods (AUCs ∈ [0.685,0.777]) still performed reasonably well, reinforcing the notion that TWAS methods can successfully identify putatively causal genes. We conducted a one-sided DeLong test for the difference in AUC for the receiver-operator curves, resulting in a significant difference between SUMMIT-FA and TWAS-FUSION (P = 0.015), MR-JTI (P = 0.012), and UTMOST (P = 0.002). Potentially significant differences were found between SUMMIT-FA and SUMMIT (P = 0.056), PrediXcan (P = 0.080), and Lassosum (P = 0.058). Notably, the TWAS framework, including SUMMIT-FA, can be considered a type of Mendelian randomization or instrumental variables regression [30, 31]. The optimal instrument in this context is the prediction model with the highest accuracy [32]. As SUMMIT-FA increased the accuracy of its gene expression prediction models, it outperformed its contemporaries in identifying “silver standard” genes.
Simulation results
We conducted a simulation study to evaluate the accuracy and subsequent downstream power to detect associations of models produced by SUMMIT-FA using the randomly selected gene CHURC1. We compared our method with its predecessor SUMMIT and three benchmark methods: Lassosum, TWAS-FUSION, and PrediXcan. We evaluated the accuracy, or R2, of gene expression prediction models built from the five methods (Fig. 3a). Results showed that SUMMIT-FA drastically outperforms the two methods that rely on individual-level expression reference panels (and thus have a significantly smaller sample size), TWAS-FUSION and PrediXcan. This is expected, as increasing the sample size of the reference has been shown to increase accuracy in gene expression imputation [6]. The two other methods that rely on larger, summary-level reference panels, SUMMIT and Lassosum, are significantly more accurate, though SUMMIT-FA still outperforms both. As an example, when phenotype heritability |${h}_p^2$| = 0.1, gene expression heritability |${h}_e^2$| = 0.01, and sparsity level |${p}_{\mathrm{causal}}$| = 0.05, the median prediction R2 for 1000 replicates for SUMMIT-FA is 0.81%, higher than the 0.75% for SUMMIT and 0.61% for Lassosum. Similar results for other values of the phenotypic heritability |${h}_p^2$| are shown in Supplementary Figures 1 and 2.

Simulation performance comparison based on gene CHURC1 (a) plots of imputation R2 in test samples by SUMMIT-FA, SUMMIT, Lassosum, TWAS-FUSION, and PrediXcan for varied proportion of causal SNPs |${p}_{\mathrm{causal}}$| and expression heritability |${h}_e^2$|, with phenotypic heritability |${h}_p^2$| = 0.1. (b) Subsequent power measurements for each method with phenotypic heritability |${h}_p^2$| = 0.1. Empirical power was determined by the proportion of P-values < 2.5 × 10−6. Alt-text for (a): Bar plots displaying gene expression prediction R^2 in simulations for SUMMIT-FA, SUMMIT, Lassosum, TWAS-FUSION, and PrediXcan for varied proportion of causal SNPs p_causal (0.01, 0.05, 0.1, 0.2) and expression heritability h_e^2 (0.005, 0.01, 0.1), with phenotypic heritability h_p^2 = 0.1. In every case, SUMMIT-FA outperforms SUMMIT (as an example, for h_e^2 = 0.01, and p_causal = 0.05, the median prediction R^2 for 1000 replicates for SUMMIT-FA is 0.81%, 0.75% for SUMMIT 0.61% for Lassosum, and much lower for TWAS-FUSION and PrediXcan), and this is more evident under lower heritability settings. Alt-text for (b): Line plots displaying the empirical power to determine an association between the gene and trait in simulations for SUMMIT-FA, SUMMIT, Lassosum, TWAS-FUSION, and PrediXcan for varied proportion of causal SNPs p_causal (0.01, 0.05, 0.1, 0.2) and expression heritability h_e^2 (0.005, 0.01, 0.1), with phenotypic heritability h_p^2 = 0.1. In each case, SUMMIT-FA performs best, though only slightly in higher expression heritability settings.
We also demonstrated the elevated power of the downstream association tests for SUMMIT-FA (Fig. 3b). Across different values for |${p}_{\mathrm{causal}}$| and |${h}_e^2$|, only SUMMIT approaches SUMMIT-FA’s power. For example, when |${h}_p^2$| = 0.1, |${h}_e^2$| = 0.005, and |${p}_{\mathrm{causal}}$| = 0.05, the power for SUMMIT-FA is 0.798, 0.723 for SUMMIT, 0.282 for Lassosum, 0.002 for TWAS-FUSION, and 0.0 for PrediXcan.
It has been shown that genes with lower expression heritability (i.e. prediction R2 ∈ [0.005,0.01)) generally have larger effect sizes on complex phenotypes [27] and thus are important to be tested in TWAS [6]. As such, it is paramount that SUMMIT-FA retains SUMMIT’s ability to achieve satisfactory performance for genes with low expression heritability. We simulated data with |${h}_e^2$| = 0.005 to examine this, which is shown in Fig. 3a. SUMMIT-FA consistently achieves satisfactory results; as an example, when |${h}_e^2$| = 0.005, and |${p}_{\mathrm{causal}}$| = 0.05, SUMMIT-FA’s median prediction R2 is 0.36%, 14.6% higher than SUMMIT’s 0.31%.
To ensure that genetic architecture is not the driving force of the simulation results (Supplementary Figs 4–5), we demonstrated consistent results for an additional randomly selected gene, KRIT1. Furthermore, we ran 20000 simulation replicates under the null hypothesis to evaluate Type 1 error rates. Every method controlled Type 1 error rates well (Supplementary Fig. 3). We further explored the Type 1 error rate for SUMMIT-FA by running 1000 simulation replicates for 100 randomly selected genes under the null hypothesis (Supplementary Fig. 6); SUMMIT-FA again controlled Type I error well. To summarize, the simulation results demonstrate the viability of SUMMIT-FA models for use in predicting gene expression and subsequent downstream association tests. Importantly, SUMMIT-FA’s achieved elevated performance beyond its predecessor SUMMIT, particularly in the low gene expression heritability setting. This underscored the advantages of including MACIE functional annotations in the model construction.
SUMMIT-FA prioritizes SNPs in FANTOM5 CAGE-defined promoter and enhancer regions
To elucidate why incorporating the MACIE functional annotations [18] into SUMMIT-FA is helpful in genetically predicting gene expression, we examined the number of SNPs with nonzero weights within FANTOM5 CAGE-defined promoter and enhancer regions [33, 34]. We compared the proportion of SNPs in both regions in the best performing (via gene expression prediction R2) weight for SUMMIT-FA and SUMMIT, as the different penalized regression models tend to select different proportions of SNPs with nonzero weights. On average, across all genes, 2.5% of the SNPs with nonzero weights selected by the best SUMMIT-FA weight fall within an enhancer region and 1.1% within a promoter region (Supplementary Fig. 9). To contrast, 1.4% of the SNPs with nonzero weights selected by the best SUMMIT weight fall within an enhancer region and 0.6% within a promoter region. We applied the one-sided Wilcoxon signed-rank test to the proportions of SNPs in enhancer and promoter regions, and in both cases found a highly statistically significant difference in favor of SUMMIT-FA (P < 2.2 × 10−16).
While these percentages are small, we note that the FANTOM5 CAGE-defined enhancer and promoter datasets respectively include 520 987 (0.65%) and 110 895 (0.14%) variants out of the approximately 80 million non-coding variants in the 1000 Genomes Project [18]. Therefore, both SUMMIT-FA and SUMMIT include non-coding regulatory SNPs at a higher rate than they appear in general, though SUMMIT-FA does so to a greater extent due to incorporating the MACIE functional annotations. Since SNPs in non-coding regulatory regions are likely to impact gene expression, it is important that they are included in prediction models. Thus, methods like SUMMIT-FA that better prioritize such SNPs will better predict gene expression.
SUMMIT-FA identifies 99 putatively causal genes for hypertension
To demonstrate the usage of models created by SUMMIT-FA, we conducted a TWAS for hypertension using GWAS summary statistics from 337 119 participants with European ancestry from the UK Biobank. We determined 369 significantly associated genes with the Bonferroni-corrected significance threshold, which are shown in Figure 4 |$3.51\times{10}^{-6}$|. For these 369 associated genes, we conducted sensitivity analysis with 2ScML [35], a new robust method for identifying causal effects in TWAS. This indicated that 99 of the genes were putatively causal (Supplementary Table 2). Of these 99 genes, PrediXcan, TWAS-FUSION, UTMOST, MR-JTI, Lassosum, and SUMMIT identified 48, 37, 44, 45, 67, and 83 respectively as associated with hypertension (though we eschewed the robust sensitivity analysis with these methods). In all, 11 of the 99 putatively causal genes were not found by any of the other methods (Table 1). Of the 11 novel genes, 8 have demonstrated links to hypertension in previous literature.

Hypertension Manhattan plot. Genes above the dotted line are associated with hypertension at a Bonferroni-corrected P-value threshold 3.51 × 10−6. Alt-text: A Manhattan plot showing genes significantly associated with hypertension by SUMMIT-FA. The Bonferroni-corrected p-value threshold is 3.51 × 10^−6. SUMMIT-FA identifies 369 significantly associated genes, many of which come from genes with lower estimated heritability (expression prediction R^2 between 0.005 and 0.01). There are a large number of associated genes on chromosomes 1, 6, 11, 12, 16, and 17.
Novel putatively causal genes for hypertension. Summary of output from 2ScML, a new robust pipeline for identifying causal effects in TWAS for the 11 genes not identified as associated with hypertension by other methods. The “Direction” column indicates whether the expressed gene is associated with increased (+) or decreased (−) risk of hypertension.
Gene . | Chr . | P0 . | P1 . | Direction . | PSUMMIT-FA . | P2ScML . |
---|---|---|---|---|---|---|
RP11-704M14.1 | 4 | 69 182 100 | 69 214 475 | + | 1.87*10−9 | 8.99*10−5 |
HIST1H2AE | 6 | 26 033 176 | 26 033 568 | − | 2.87*10−7 | 1.09*10−33 |
ABT1 | 6 | 26 596 953 | 26 600 739 | − | 1.27*10−6 | 1.85*10−7 |
NCR3 | 6 | 31 588 895 | 31 593 006 | + | 1.79*10−9 | 6.26*10−7 |
VWA7 | 6 | 31 765 590 | 31 777 328 | − | 1.48*10−21 | 6.34*10−7 |
VARS1 | 6 | 31 777 518 | 31 795 752 | + | 2.10*10−9 | 6.43*10−18 |
BANF1 | 11 | 66 002 228 | 66 004 146 | − | 2.43*10−7 | 1.51*10−5 |
ACACB | 12 | 109 116 587 | 109 268 226 | − | 4.90*10−7 | 4.90*10−5 |
ZNF268 | 12 | 133 181 495 | 133 214 832 | − | 1.96*10−12 | 5.78*10−10 |
RALGAPA1 | 14 | 35 538 352 | 35 809 304 | − | 5.45*10−10 | 2.57*10−6 |
LOXL1 | 15 | 73 926 462 | 73 952 136 | − | 1.04*10−10 | 3.56*10−5 |
Gene . | Chr . | P0 . | P1 . | Direction . | PSUMMIT-FA . | P2ScML . |
---|---|---|---|---|---|---|
RP11-704M14.1 | 4 | 69 182 100 | 69 214 475 | + | 1.87*10−9 | 8.99*10−5 |
HIST1H2AE | 6 | 26 033 176 | 26 033 568 | − | 2.87*10−7 | 1.09*10−33 |
ABT1 | 6 | 26 596 953 | 26 600 739 | − | 1.27*10−6 | 1.85*10−7 |
NCR3 | 6 | 31 588 895 | 31 593 006 | + | 1.79*10−9 | 6.26*10−7 |
VWA7 | 6 | 31 765 590 | 31 777 328 | − | 1.48*10−21 | 6.34*10−7 |
VARS1 | 6 | 31 777 518 | 31 795 752 | + | 2.10*10−9 | 6.43*10−18 |
BANF1 | 11 | 66 002 228 | 66 004 146 | − | 2.43*10−7 | 1.51*10−5 |
ACACB | 12 | 109 116 587 | 109 268 226 | − | 4.90*10−7 | 4.90*10−5 |
ZNF268 | 12 | 133 181 495 | 133 214 832 | − | 1.96*10−12 | 5.78*10−10 |
RALGAPA1 | 14 | 35 538 352 | 35 809 304 | − | 5.45*10−10 | 2.57*10−6 |
LOXL1 | 15 | 73 926 462 | 73 952 136 | − | 1.04*10−10 | 3.56*10−5 |
Novel putatively causal genes for hypertension. Summary of output from 2ScML, a new robust pipeline for identifying causal effects in TWAS for the 11 genes not identified as associated with hypertension by other methods. The “Direction” column indicates whether the expressed gene is associated with increased (+) or decreased (−) risk of hypertension.
Gene . | Chr . | P0 . | P1 . | Direction . | PSUMMIT-FA . | P2ScML . |
---|---|---|---|---|---|---|
RP11-704M14.1 | 4 | 69 182 100 | 69 214 475 | + | 1.87*10−9 | 8.99*10−5 |
HIST1H2AE | 6 | 26 033 176 | 26 033 568 | − | 2.87*10−7 | 1.09*10−33 |
ABT1 | 6 | 26 596 953 | 26 600 739 | − | 1.27*10−6 | 1.85*10−7 |
NCR3 | 6 | 31 588 895 | 31 593 006 | + | 1.79*10−9 | 6.26*10−7 |
VWA7 | 6 | 31 765 590 | 31 777 328 | − | 1.48*10−21 | 6.34*10−7 |
VARS1 | 6 | 31 777 518 | 31 795 752 | + | 2.10*10−9 | 6.43*10−18 |
BANF1 | 11 | 66 002 228 | 66 004 146 | − | 2.43*10−7 | 1.51*10−5 |
ACACB | 12 | 109 116 587 | 109 268 226 | − | 4.90*10−7 | 4.90*10−5 |
ZNF268 | 12 | 133 181 495 | 133 214 832 | − | 1.96*10−12 | 5.78*10−10 |
RALGAPA1 | 14 | 35 538 352 | 35 809 304 | − | 5.45*10−10 | 2.57*10−6 |
LOXL1 | 15 | 73 926 462 | 73 952 136 | − | 1.04*10−10 | 3.56*10−5 |
Gene . | Chr . | P0 . | P1 . | Direction . | PSUMMIT-FA . | P2ScML . |
---|---|---|---|---|---|---|
RP11-704M14.1 | 4 | 69 182 100 | 69 214 475 | + | 1.87*10−9 | 8.99*10−5 |
HIST1H2AE | 6 | 26 033 176 | 26 033 568 | − | 2.87*10−7 | 1.09*10−33 |
ABT1 | 6 | 26 596 953 | 26 600 739 | − | 1.27*10−6 | 1.85*10−7 |
NCR3 | 6 | 31 588 895 | 31 593 006 | + | 1.79*10−9 | 6.26*10−7 |
VWA7 | 6 | 31 765 590 | 31 777 328 | − | 1.48*10−21 | 6.34*10−7 |
VARS1 | 6 | 31 777 518 | 31 795 752 | + | 2.10*10−9 | 6.43*10−18 |
BANF1 | 11 | 66 002 228 | 66 004 146 | − | 2.43*10−7 | 1.51*10−5 |
ACACB | 12 | 109 116 587 | 109 268 226 | − | 4.90*10−7 | 4.90*10−5 |
ZNF268 | 12 | 133 181 495 | 133 214 832 | − | 1.96*10−12 | 5.78*10−10 |
RALGAPA1 | 14 | 35 538 352 | 35 809 304 | − | 5.45*10−10 | 2.57*10−6 |
LOXL1 | 15 | 73 926 462 | 73 952 136 | − | 1.04*10−10 | 3.56*10−5 |
ACACB encodes the protein Acetyl-CoA carboxylase beta, which plays a key role in the oxidation of fatty acids [36]. SNPs rs2268388, rs2268388, and rs2239607 within the gene have been previously associated with obesity and other phenotypes for which obesity is a major comorbidity, such as type 2 diabetes and hypertension [36]. LOXL1, or Lysyl oxidase like-1, codes for a member of the lysyl oxidase family, and it is known to be vital for elastin fiber maintenance [37]. This is necessary for elastic tissues and vessels to maintain strength and stability. Genetic variation within the LOXL1 locus has been associated with ocular hypertension [37]. While no results have been published linking typical hypertension to LOXL1, increased expression implies more elastin in blood vessels, and elastin presence is inversely correlated with hypertension [38]. Thus, it is expected that higher LOXL1 expression would be associated with lower hypertension, as found in this study. HIST1H2AE has been noted to be differentially expressed between hypertension patients and normal healthy controls [39].
RALGAPA1, for which higher expression is associated with lower risk of hypertension, encodes the RalGAPα1 protein, which regulates calcium homeostasis in the blood through the calcium pump Sarcoplasmic endoplasmic reticulum calcium ATPase (SERCA2) [40]. Notably, RalGAPα1 is protective of cardiac function under pressure overload conditions and decreases blood pressure by activating the calcium removal process, which is consistent with the results presented in Table 1. The BANF1 p.Ala12Thr mutation is known to lead to progeroid syndrome, a Mendelian disorder that causes severe problems in connective tissue and leads to the appearance of premature aging [41]. Progeroid syndrome often leads to hypertension; however, the p.Ala12Thr mutation is rare and this likely is not the axis of action by which BANF1 expression is protective against hypertension in this study. The SNP rs3130491, an intron in VARS1, has been associated with diastolic blood pressure [42]. It has been demonstrated in a Russian population that VWA7 is associated with SNPs (by eQTL analysis) that are associated with hypertension [43]. NCR3, Natural Cytotoxicity Triggering Receptor 3, encodes a receptor that aids in NK-lysis of tumor cells. It has been associated with diastolic blood pressure [44, 45], and the receptor’s BAG6 ligand gene has been associated with both hypertension [46] and diastolic blood pressure [46, 47]. The remaining genes, ZNF268 and ABT1, along with the long non-coding RNA expression profile RP11-704M14.1, represent novel associations with hypertension. The exciting further exploration of these novel associations is left to future work.
Discussion
By incorporating functional annotations provided by MACIE [18], models built by the new method SUMMIT-FA achieve increased gene expression prediction accuracy beyond those of its predecessor SUMMIT [6], which subsequently increased the downstream power to detect risk genes for complex phenotypes. Given its performance in the analyses of 24 distinct GWAS and a simulation study, SUMMIT-FA marks a substantial step forward from preceding methods. Specifically, SUMMIT-FA improved gene expression prediction accuracy, pinpointed more gene-trait associations, was better powered to detect silver standard genes, and, importantly, maintained and improved upon its predecessor’s ability to analyze genes with low heritability of expression (i.e. prediction R2 ∈ [0.005, 0.01)). These genes generally have larger effect sizes on complex phenotypes [6, 27], and as such, new TWAS methods ought to ensure they remain analyzable.
As a TWAS method, SUMMIT-FA may be viewed as one type of Mendelian randomization (MR) [30, 31]. Thus, it can allow for valid causal interpretations, but only when every genetic variant included in the gene expression prediction models is a valid instrumental variable (IV) [31, 48, 49]. Given that horizontal pleiotropy is widespread [50], it is likely that IV assumptions are violated, and we in turn recommend that those who wish to make use of this resource do so along with other complementary methods, such as fine-mapping [51, 52], colocalization [53], or robust sensitivity methods [35], all of which can prioritize putatively causal genes in different aspects.
We wish to highlight that TWAS-like methods are underpinned by the assumption that changes in gene expression levels lead to changes in disease risk. Consequently, most gene expression prediction models are built upon healthy subjects to avoid reverse causation. Consider cancer as a demonstrative example. Gene expression patterns may change during cancer development; thus, information collected from tumor samples would not reflect the genetic regulatory components of gene expression levels in normal healthy tissue, but rather represent downstream effects from tumor development. In this scenario, reverse causation could drive spurious results.
Naturally, SUMMIT-FA is largely motivated by our previous method SUMMIT [6] and the recent development of the MACIE functional annotations [18], which, unlike preceding annotation sets, cover the entirety of the human genome and provide multi-dimensional assessment of variant function. While some TWAS methods integrate additional datasets beyond GWAS and gene expression panels [10, 12, 54], so far as we are aware, none incorporate comprehensive functional annotation databases like MACIE (which has been shown to outperform other such annotations [18]) in the same manner as SUMMIT-FA. As demonstrated in this work, making use of MACIE can improve overall TWAS performance.
We conclude with a discussion of limitations for the present study. Primarily, we only focus on improving the gene expression prediction model accuracy and its subsequent power in blood tissues using functional annotations and summary-level expression data from eQTLGen, which is generated from participants of European ancestry. In principle, SUMMIT-FA can be extended to summary-level eQTL data from other tissues and ancestries. To pursue such ideas, we attempted to apply it to the MetaBrain [55] dataset and found that incorporating the MACIE functional annotations provided minimal gains (not shown). That said, there were ameliorating factors; we lacked an independent set of matching individual-level eQTL for tuning the SUMMIT-FA models and necessarily adapted SUMMIT-FA with BLISS [56], a new method for building biomarker prediction models using summary-level QTL data. As such, the comparison of SUMMIT-FA to SUMMIT on the MetaBrain [55] data may not be identical to the comparison on the eQTLGen [5] data. Exploring the impact of BLISS on SUMMIT-FA goes beyond the scope of this work, and we recommend caution and rigorous evaluation if SUMMIT-FA is applied to non-blood tissue eQTLs. That said, as the gene expression of many genes are shared across many tissues [57], we expect that the prediction accuracy improvement in blood may also benefit other tissues through transfer learning. Secondly, we also urge caution in applying SUMMIT-FA to other -omics data. We attempted to apply the MACIE functional annotations [18] to improve the prediction accuracy of DNA methylation models, but this resulted in only a modest improvement (not shown). This is likely because these functional annotations are not specifically tailored for DNA methylation.
Thirdly, the “M-values”, which are discussed further in the Methods section, are generated from the MACIE_anyclass score. This is an estimate of the prior probability that a variant is in any way functional, and the scores are built primarily to target gene expression regulation and may not be appropriate for use with other -omics data. Furthermore, the “M-values” may be less discriminatory in a scenario too specific for MACIE_anyclass to provide much useful additional information, such as tissue-specific risk genes expressed in a tissue-specific pattern. Fortunately, the five models featured in SUMMIT-FA that do not rely on MACIE can still drive reasonable results in such a case (see Supplementary Fig. 7 for simulated results in which MACIE is in no way associated with gene expression).
Fourthly, we focused on cis-eQTLs in the gene expression prediction models and did not consider trans-eQTLs. Furthermore, as demonstrated in one recent study [15], 3D genomic and epigenomic data provide additional information and can be used to further improve the accuracy of expression prediction models. We expect that incorporating trans-acting elements or leveraging auxiliary information from other tissues, ancestry groups, and 3D genomic and epigenomic datasets can further improve performance, though substantial additional work would be necessary. Lastly, as discussed above, causal arguments based on SUMMIT-FA (as with any TWAS method) bear scrutiny only when IV assumptions are not violated. Thus, caution and complementary methodology are suggested with regard to arguing causality. We leave these exciting topics for future studies.
Materials and methods
Predicting gene expression with penalized regression models
We create nine gene expression prediction models through penalized regression. Begin with the following model:
where Y is the N-dimensional vector of gene expression levels for a particular gene corrected for age, sex, and genetic principal components, |$X={\left({X}_1,\dots, {X}_p\right)}^{\prime }$| is the standardized genotype matrix of cis-SNPs around the gene, |$w={\left({w}_1,\dots, {w}_p\right)}^{\prime }$| is the eQTL effect size (i.e. the weights), and ϵ is mean-zero random noise. We then estimate w with a penalized regression framework.
SUMMIT-FA inherits many of the advantages of SUMMIT, including five gene expression prediction models with distinct penalties, stability from using a shrinkage estimator of the LD matrix, and the ability to combine results across models by using the Cauchy combination test [25]. Briefly, the five inherited models estimate eQTL effect sizes w by optimizing the objective function:
Through simplification and recognizing that |$\frac{X^{\prime }Y}{N}=r$|, the standardized marginal effect sizes for the cis-SNPs (the correlation between gene expression levels and these SNPs), and |$\frac{X^{\prime }X}{N}=R$|, the LD matrix, we arrive at the following objective function:
Following the SUMMIT [6], we then estimate the LD matrix R with |$\tilde{R}$|, a shrinkage estimator of a reference panel which comes from the 1000 Genomes project [58], and we estimate the standardized marginal effects r with |$\tilde{r}$|, which comes from the z-scores provided by the summary-level statistics in the eQTLGen database. We note that |$\frac{Y^{\prime }Y}{N}$| is not a function of w and can thus be safely ignored in the optimization procedure. Thus, the final objective function is:
where Jλ(·) is the main penalty term. Five penalties are employed here, LASSO [59], elastic net [60], the minimax concave penalty (MCP) [61], the smoothly clipped absolute deviation (SCAD) [62], and MNet [63]. |$\theta{w}^{\prime }w,\theta \ge 0$| is an additional L2 penalty designed to ensure a unique solution for |$\hat{w}$|. The additional penalty is included following [26], |${X}^{\prime }X$| is estimated by |$\tilde{R}$| and |${X}^{\prime }Y$| by |$\tilde{\mathrm{r}}$|, so the expression is no longer a penalized least squares problem. Therefore, the solution may be non-unique and unstable. The additional penalty regularizes the expression. We then optimize |$\tilde{f}$| via the coordinate descent algorithm [64], which is further discussed in the following section.
Incorporating MACIE functional annotations into penalized regression framework
Briefly, MACIE (Multi-dimensional Annotation-Class Integrative Estimation) is a set of multidimensional functional annotations modeling variant functionality in a composite manner [18]. MACIE scores were computed separately for (1) non-synonymous coding variants and (2) noncoding and synonymous coding variants, as the distinct types of variants are expected to have disparate functional profiles. For the non-synonymous coding variants, two scores were developed, one to assess damaging protein coding function and the other evolutionarily conserved function. To expand, non-synonymous coding variants act by changing the amino acid sequence in proteins through mutation, so the first score represents the probability of a variant damaging protein coding. The second score represents the probability of a variant being evolutionarily conserved, as functional variants are more likely to be conserved. MACIE_anyclass is the sum of these two probabilities, so, for non-synonymous coding variants, MACIE_anyclass is the probability of a variant damaging protein coding or being evolutionarily conserved.
For noncoding and synonymous coding variants, which do not directly impact the amino acid sequence of proteins, two scores were developed, one to assess epigenetic regulatory function and the other evolutionarily conserved function. The evolutionarily conserved score for these variants functions identically to the same score for the non-synonymous coding variants. The regulatory scores represent the probability that a given variant plays some role in regulating gene expression. Thus, for noncoding and synonymous coding variants, MACIE_anyclass is the probability of a variant being involved in gene regulation or being evolutionarily conserved. While the regulatory score, MACIE_regulatory, directly measures probability of being involved in regulating gene expression, variant functionality is a combination of multiple axes of action. Additionally, given that the individual annotation scores were different for the two groups of variants, we chose to use the composite score MACIE_anyclass as the overall prior probability of variant functionality in SUMMIT-FA.
That said, we did consider using MACIE_regulatory in place of MACIE_anyclass in the SUMMIT-FA models. However, across all genes, using MACIE_anyclass vs. MACIE_regulatory increased gene expression prediction R2 for 5571 genes, decreased gene expression prediction R2 for 5088 genes, and did not impact the prediction R2 for the remaining 6204, where ignoring functional annotations provided the best R2 (Supplementary Fig. 8). Thus, we felt confident that, from both biological and analytical points of view, the MACIE_anyclass set of annotations provided the best prior likelihood of variant functionality.
Proceeding with the MACIE_anyclass set of annotations, we incorporate them in two distinct ways to the model. First, we restrict the eQTL set to include variants with a likelihood greater than some threshold c of functional behavior, so the variant set becomes |$\left\{{X}_i:\mathrm{MACIE}\_{\mathrm{anyclass}}_i\ge c\right\}$|, with |$c\in \left\{0.0,0.1,0.5,0.9\right\}$|. The four thresholds lead to the four new models in SUMMIT-FA, each of which uses an elastic net penalized regression.
Second, we employ MACIE_anyclass in the elastic net penalty. From equation (1), we define the penalty for SNP i: |${J}_{\lambda}\left({w}_i\right)=\lambda \left(|{w}_i|+{w}_i^2\right){M}_i$|, which is the typical elastic net penalty with |$\alpha =0.5$| multiplied by |${M}_i$|, where |${M}_i$| is one minus the MACIE_anyclass value for SNP i. Thus, we reduce the penalty for SNPs that have, a priori, a higher likelihood of functionality. These SNPs are considered more likely to be causal and are therefore included in the gene expression models with greater frequency under this method.
The solution to the optimization, |$\hat{w}$|, is then constructed via the coordinate descent algorithm [62]. Denote by |$\left({\hat{w}}_1^{(t)},\dots, {\hat{w}}_p^{(t)}\right)$| the t-th realization of the coordinate descent process, and let |${z}_i^{(t)}={\hat{r}}_i-\sum_{j\ne i}{\hat{R}}_{ij}{\hat{w}}_i^{(t)}$|. Then the (t + 1)th update of |${\hat{w}}_i$| is
for |$i=1,\dots, p$| and |$t=0,1,2,\dots$|, where |$S\left(V,\omega \right)$| is the soft-thresholding operator
The coordinate descent algorithm then converges to a local minimum for |$\hat{w}$| [64]. Tuning parameters θ, which is restricted to the set |$\left\{0.1,0.2,\dots, 0.9\right\}$|, and |$\lambda$|, which has a solution path generated by a warm start, are chosen to maximize R2 in the tuning data.
Training and evaluating model
Gene expression predictions models were trained on eQTL summary data from eQTLGen [5], which includes effect sizes of more than 11 million SNPs derived from 31 684 blood samples. We employed only cis-SNPs, those within 1 Mbp of gene transcription start and end sites. We further filtered out any SNPs that were nonbiallelic, ambiguous, not included in the HapMap3 SNP set [8], or had minor allele frequency (MAF) < 0.01.
To choose tuning parameters, we used genotype and gene expression references from the GTEx project (version V7, dbGaP Accession number phs000424.v7.p2, https://www.gtexportal.org/home/datasets) [57]. The whole blood gene expression levels (N = 369) were processed by standardizing and normalizing RPKMs in each sample then adjusting for sex, genotyping platform, 35 PEER factors and three genotype-based principal components. Residuals were then used as the processed gene expression levels, which may be downloaded from the GTEx website. We then selected optimal tuning parameters based on the gene expression prediction R2. We note that subjects in GTEx v6 (N = 336; 1.1%) were included in eQTLGen’s meta-analysis [5], which may result in suboptimal tuning decisions.
We conducted an external validation with a fully independent set of subjects (N = 309) in GTEx v8 that were not included in GTEx v7 nor in the eQTLGen meta-analysis. Following SUMMIT [6], genes with estimated expression heritability (prediction R2) ≥ 0.005 in the validation dataset were included in downstream association analysis, as genes with low heritability may have larger causal effect sizes on complex traits [27].
Conducting association analyses
With individual-level GWAS data, we employ a generalized linear regression model
where |${X}_{\mathrm{new}}$| is the genotype data, |${Y}_{\mathrm{new}}$| is the phenotype data, and |${C}_{\mathrm{new}}$| is the covariate matrix, g is a link function, and |${X}_{\mathrm{new}}\hat{w}$| is the predicted gene expression levels. We then test |${H}_0:\beta =0$| for gene-trait associations.
With summary-level GWAS data, we apply a burden-style test
where Z is a vector of z-scores for all cis-SNPs and V is the LD matrix for all SNPs included in the analysis. V is typically estimated with an LD reference panel (such as the 1000 Genomes Project [58]).
Note that the above descriptions for conducting association analyses assumes only a single gene expression prediction model, but SUMMIT-FA includes up to nine distinct models. In the case of multiple methods building satisfactory models for a gene (i.e. gene expression prediction R2 ≥ 0.005 for that gene), each follows the aforementioned procedure to conduct an association test. Then, results are amalgamated on the gene-level via the Cauchy combination test [50], which has been widely used in the field [6, 54]. Briefly, assume K satisfactory models and consider the test statistic:
Where |${\tilde{R_i}}^2={R}_i^2/\sum_{j=1}^K{R}_j^2$| and |${p}_i$| is the p-value for the ith model. T approximately follows the standard Cauchy distribution, and the overall P-value is calculated by |$p=0.5-\arctan (T)/\pi$|. We note that the test statistic is a sum weighted by |${\tilde{R_i}}^2$|, which follows logically from the notion that larger values of |${\tilde{R_i}}^2$| come from better fitting gene expression prediction models.
Simulation setup
We demonstrated the elevated performance of SUMMIT-FA through a set of comprehensive simulation studies. Both gene expression prediction accuracy and downstream TWAS power were evaluated, and SUMMIT-FA was shown to successfully build gene expression prediction models using summary-level eQTL data. We used genotype data from 31 684 (to match the sample size of the eQTLGen data) white British subjects from the UK Biobank as training data, genotype data from 369 (to match the sample size of the GTEx v7 data) independent white British subjects from the UK Biobank as tuning data, and genotype data from 10 000 more independent white British subjects from the UK Biobank as testing data. Imputed data from 877 cis-SNPs (with MAF > 1%, Hardy-Weinberg P-value > 10−6, and imputation “info” score > 0.4) of the randomly selected gene CHURC1 were included in primary simulations. Another randomly selected gene, KRIT1, is considered in Supplementary Figs 4 and 5.
We compare SUMMIT-FA to SUMMIT [6], PrediXcan [8], TWAS-FUSION [7], and Lassosum [26] in terms of gene expression prediction accuracy and downstream power for TWAS. We considered scenarios that varied the phenotypic heritability |${h}_p^2$| (0.1, 0.2, 0.5, 0.8), expression heritability |${h}_e^2$| (0.005, 0.01, 0.1), and proportion of causal SNPs |${p}_{\mathrm{causal}}$| (0.01, 0.05, 0.1, 0.2). For each scenario, we repeated the simulations 1000 times. The statistical power was determined by the proportion of 1000 repeated simulations with association p-value less than the genome-wide significance threshold 0.05/20 000 = 2.5 × 10−6.
Gene expression levels are simulated according to |${E}_g= Xw+{\epsilon}_e$|, and phenotypes by |$=\beta{E}_g+{\epsilon}_p$|, where X is the standardized genotype matrix, w the effect size, Eg the gene expression levels, β the association coefficient, |${\epsilon}_e\sim N\left(0,1-{h}_e^2\right)$|, and |${\epsilon}_p\sim N\left(0,1-{h}_p^2\right)$|. |${h}_e^2$| is the expression heritability, the proportion of variance in gene expression explained by the SNPs, and |${h}_p^2$| is the phenotype heritability, the proportion of variance in the phenotype explained by gene expression. Given M SNPs potentially included in the model, l are selected to have nonzero effect size in order to achieve the desired |${p}_{\mathrm{causal}}$|. The l SNPs are sampled from the M total SNPs with probabilities proportional to their MACIE_anyclass scores. The effect sizes w and β were then rescaled to achieve the correct values for |${h}_e^2$| and |${h}_p^2$|. In Supplementary Fig. 7, we consider a distinct simulation scenario in which the probability a SNP receives a nonzero effect size is completely independent of its MACIE_anyclass score. To accomplish this, we follow the preceding simulations, but randomly sample the l SNPs with nonzero effect. As seen in the figure, SUMMIT-FA still achieves comparable performance to SUMMIT when MACIE provides no additional information.
We conducted type 1 error simulations based both on a single gene and 100 randomly selected genes (Supplementary Figs 3 and 6). In both cases, SUMMIT-FA controls the type 1 error rate well.
Regarding the other methods involved in the simulation, PrediXcan and TWAS-FUSION were trained on individual-level genotype data from 670 white British samples (matching the sample size of blood tissue in the GTEx v8 data). Lassosum and SUMMIT, which can employ summary-level data, were trained on the same summary-level data as that used for SUMMIT-FA. All models were compared on single-tissue eQTL information, as cross-tissue information was not our focus here. As such, we chose not to include UTMOST [10] and MR-JTI [12], as it would be disingenuous to compare them without making use of further tissues, the main contribution of these two methods. This topic is left to further research.
Comparison with existing methods
To further demonstrate the added utility provided by SUMMIT-FA, we compared it to six previous TWAS methods in whole blood tissue: SUMMIT [6], PrediXcan [8], TWAS-FUSION [7], UTMOST [10], MR-JTI [12], and Lassosum [26]. To briefly describe each, SUMMIT is the predecessor to SUMMIT-FA; it features five gene expression prediction models that are combined with the Cauchy combination test, though it does not incorporate functional annotations. PrediXcan uses the Elastic Net to create expression prediction models. TWAS-FUSION applies BLUP, BSLMM, Elastic Net, LASSO, and TOP1 in building models for expression prediction. MR-JTI and UTMOST leverage cross-tissue eQTLs when building gene expression prediction models. Lassosum is a polygenic risk score method, though it can be used to build gene expression prediction models from summary-level eQTL reference panels, which are then applied in a TWAS. SUMMIT-FA, SUMMIT, and Lassosum utilize summary-level reference datasets, while the other four TWAS methods require individual-level reference panels.
We begin by examining the gene expression prediction accuracy (R2) produced by the distinct methods. We note that while R2 is estimated in a test dataset for SUMMIT-FA, SUMMIT, and Lassosum, it is determined by cross-validation for the others, which may marginally favor PrediXcan, TWAS-FUSION, UTMOST, and MR-JTI. To demonstrate the utility of functional annotations provided by the MACIE, we tested the difference between estimated expression prediction accuracy for SUMMIT-FA and SUMMIT with a paired Wilcoxon rank-sum test, which is a nonparametric test that compares two matched samples to test whether population mean ranks differ.
We then applied the methods to analyses of 24 sets of GWAS summary statistics for complex phenotypes (detailed in Supplementary Data 1). We applied the Bonferroni correction to each method with distinct thresholds, as each method analyzed a different number of genes. In the interest of fairness, we also used a common set of genes which all models could analyze and identical Bonferroni-generated significance cutoffs. We then applied a one-sided Wilcoxon signed-rank test to the numbers of significant genes identified by the different methods.
Lastly, we compared the methods’ abilities to pinpoint causal genes that mediate associations between phenotypes and GWAS loci. We created a set of likely causal gene-trait pairs by following Barbeira et al. [28] that was independent of the GWAS results. We obtained 1287 gene-trait pairs using OMIM [29] and rare variant results from exome-wide association studies [65–67]. We employed LDetect to partition the genome into presumably independent LD blocks [68], and only included gene-trait pairs living in LD blocks with genome-wide significant variants. This resulted in 148 putatively causal pairs across 24 traits. We finally compared the methods by measuring the area under the ROC and tested whether AUC differences were significant using a one-sided DeLong test. The parametric bootstrap is also used for assessing the AUC differences and the results were similar (not shown).
Author contributions
C.W. conceived and designed the study. H.M. and Z.Z. developed the computational algorithms and wrote the SUMMIT-FA program. H.M. performed the real data analysis and simulations. H.M. tested the program and drew the workflow diagram of SUMMIT-FA. All authors wrote and proofread the manuscript.
Conflict of interest statement: The authors declare no competing interests.
Funding
National Institutes of Health (R03 AG070669) supported Z.Z. and C.W This study was conducted using the UK Biobank recourse under Application Number 48240 (https://www.ukbiobank.ac.uk/researchers/). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The authors would like to thank all the individuals for their participation in the GWAS and UK Biobank and all the researchers, clinicians, technicians and administrative staff for their contribution to the studies and for making their GWAS summary results publicly available.
Data availability
The GWAS summary data used in this study are summarized in Supplementary Data 1. The eQTL summary data are available at https://www.eqtlgen.org/cis-eqtls. html The UK Biobank is an open-access resource requiring registration, available at https://www.ukbiobank.ac.uk/researchers/. The genotype and RNA sequencing data for the GTEx project may be found at the database of Genotypes and Phenotypes (accession number phs000424.v8.p2, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v8.p2). The processed gene expression data from the GTEx project is available from the GTEx portal (https://gtexportal.org). The MR-JTI, PrediXcan, and UTMOST models may be downloaded from https://doi.org/10.5281/zenodo.3842289. The TWAS-FUSION models may be downloaded from http://gusevlab.org/projects/fusion/. The 1000 Genomes Project data may be downloaded from https://www.internationalgenome.org/data. The genetic distance data for 1000 Genomes Project may be downloaded from https://github.com/joepickrell/1000genomes-genetic-maps. The MACIE functional annotations may be downloaded from https://zenodo.org/record/5755656 (chromosomes 1–3), https://zenodo.org/record/5756449 (chromosomes 4–7), https://zenodo.org/record/5756479 (chromosomes 8–13), and https://zenodo.org/record/5756563 (chromosomes 14–22). The SUMMIT-FA models generated for this study and instructions for their use are available at OSF.IO at https://osf.io/rvpnh/. The raw data and code to replicate figures and tables in the manuscript are also available at OSF.IO at https://osf.io/rvpnh/.
Code availability
Code to use the SUMMIT-FA models and the constructed models are located at OSF.IO https://osf.io/rvpnh/. The codes and corresponding data for reproducing the results described in this study are also available on OSF.IO.
References
Author notes
Hunter J. Melton and Zichen Zhang contributed equally.