Genomic prediction of coronary heart disease

Aims: Genetics plays an important role in coronary heart disease (CHD), but the clinical utility of genomic risk scores (GRSs) relative to clinical risk scores, such as the Framingham Risk Score (FRS), is unclear. Our aim was to construct and externally validate a CHD GRS, in terms of lifetime CHD risk and relative to traditional clinical risk scores. Methods and results: We generated a GRS of 49 310 SNPs based on a CARDIoGRAMplusC4D Consortium meta-analysis of CHD, then independently tested it using five prospective population cohorts (three FINRISK cohorts, combined n = 12 676, 757 incident CHD events; two Framingham Heart Study (FHS) cohorts, combined n = 3406, 587 incident CHD events). The GRS was associated with incident CHD (FINRISK HR = 1.74, 95% confidence interval (CI) 1.61–1.86 per S.D. of GRS; Framingham HR = 1.28, 95% CI 1.18–1.38), and the association was largely unchanged by adjustment for known risk factors, including family history. Integration of the GRS with the FRS or ACC/AHA13 scores improved the 10-year risk prediction (meta-analysis C-index: +1.5–1.6%, P < 0.001), particularly for individuals ≥60 years old (meta-analysis C-index: +4.6–5.1%, P < 0.001). Importantly, the GRS captured substantially different trajectories of absolute risk, with men in the top 20% of the GRS attaining 10% cumulative CHD risk 12–18 years earlier than those in the bottom 20%. High genomic risk was partially compensated for by low systolic blood pressure, low cholesterol level, and non-smoking. Conclusions: A GRS based on a large number of SNPs improves CHD risk prediction and encodes different trajectories of lifetime risk not captured by traditional clinical risk scores.


Introduction
Early and accurate identification of individuals with increased risk of coronary heart disease (CHD) is critical for effective implementation of preventative lifestyle modifications and medical interventions, such as statin treatment. 1,2 To this end, risk scores such as the Framingham Risk Score (FRS) 3 and the American College of Cardiology/American Heart Association 2013 risk score (ACC/AHA13), 1 based on clinical factors and lipid measurements, have been developed and are widely used. Although these scores can identify individuals at very high risk, a large proportion of individuals developing CHD during the next 10 years remain unidentified. In particular, the scores do not provide sufficient discrimination at a younger age, when implementation of preventative measures is likely to provide the greatest long-term benefit.
Genetic factors have long been recognized to make a substantial contribution to CHD risk. 4 Although a positive family history is an independent risk factor for CHD, it may not completely and solely capture genetic risk. Recently, genome-wide association studies (GWAS) have identified 56 genetic loci associated with CHD at genome-wide significance. 5–9 Studies of the predictive power of the top single nucleotide polymorphisms (SNPs) at some of these loci, either individually or in combination, have typically shown small improvements in CHD risk prediction, 10–17 probably because together these variants explain less than 20% of CHD heritability. 8 As demonstrated recently for other traits such as height and BMI, 18,19 the majority of unexplained heritability is likely hidden amongst the thousands of SNPs that did not reach genome-wide significance. Indeed, recent advances have shown that genomic prediction models that consider all available genetic variants can more efficiently stratify those at increased risk of complex disease. 20–24 To leverage the maximum amount of information, we examined whether a genomic risk score (GRS) comprising a large number of SNPs, including those with less than genome-wide significance, could produce clinically relevant predictive power for CHD risk.

Methods
A summary of the key methods for the study is given here. The study design is given in Figure 1. Additional details are provided in the Supplementary Appendix (see Supplementary material online).

Prospective study cohorts
We utilized two sets of prospective cohorts: (i) FINRISK, consisting of three prospective cohorts from Finland with 10-20 years of follow-up, from collections in 1992, 1997, and 2002 (FR92, FR97, and FR02, respectively) 25 and (ii) the Framingham Heart Study (FHS), 26–28 with individuals of Western and Southern European ancestry taken from the Original and Offspring cohorts with 40-48 years of follow-up. In total, FINRISK consisted of n = 12 676 individuals and the FHS of n = 3406 individuals, all of whom had the requisite data and were independent of the CARDIoGRAMplusC4D stage-2 meta-analysis utilized to generate the GRS (Table 1). The cohorts were genome-wide SNP genotyped and further imputed to the 1000 Genomes reference panel (see Supplementary material online, Supplementary Methods). After genotype imputation and quality control, 69 044 autosomal SNPs of the 79 128 CARDIoGRAMplusC4D SNPs were available for subsequent analyses in FINRISK, and 78 058 autosomal SNPs were available in FHS.
The outcome of interest in FINRISK was primary incident CHD event, defined as myocardial infarction (MI), a coronary revascularization procedure, or death from CHD, before age 75 years (see Supplementary material online, Supplementary Methods). Individuals with prevalent cardiovascular disease (CVD) at baseline were excluded from the analysis. We censored events for individuals with an attained age of >75 years, as not all FINRISK cohorts had sufficient numbers of CHD events beyond that age. In FHS, we used the FHS definition of CHD, which included recognized/unrecognized MI or death from CHD as well as angina pectoris or coronary insufficiency (see Supplementary material online, Supplementary Methods). FHS individuals with prevalent CHD or <30 years of age at baseline were excluded, and for consistency with the FINRISK analysis, a censoring age of 75 years was also applied to the FHS analyses.
Secondary external validation of the GRS was also performed in the ARGOS study, a Dutch case/control dataset where all individuals had familial hypercholesterolemia (248 young cases with early CHD, 216 elderly controls without CHD), imputed to 1000 Genomes reference panel (74 135 SNPs of the 79 128 CARDIoGRAMplusC4D SNPs were available; see Supplementary material online, Supplementary Methods).

Statistical analysis
GRSs were generated by thinning the CARDIoGRAMplusC4D SNPs at a series of linkage disequilibrium (LD) thresholds and were evaluated using logistic regression and area under the receiver-operating characteristic curve (AUC) for each threshold (see Supplementary material online, Figure S1). To avoid overfitting, we only used weights (log odds) from the CARDIoGRAMplusC4D stage-2 meta-analysis, which were not based on the WTCCC-CAD or MIGen studies (see Supplementary material online, Supplementary Methods). We combined the estimates for WTCCC and MIGen-Harps using fixed-effects inverse-variance weighted meta-analysis.
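As an illustration of the scoring step described above, the following minimal sketch computes a GRS as a weighted sum of effect-allele dosages, with the per-allele log odds as weights, and then standardizes it so that effects can be read per S.D. of GRS. It is not the pipeline used in the study: the function names, the SNP identifiers, the toy dosages, and the rule of skipping SNPs absent from an individual's data are all assumptions made for this example, and LD thinning is not shown.

```python
from statistics import mean, pstdev


def genomic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2, possibly
             fractional after imputation)
    weights: dict mapping SNP id -> per-allele log odds ratio
    SNPs absent from `dosages` are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(scores):
    """Rescale scores to mean 0 and S.D. 1 across the cohort."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical toy data: three SNPs and three individuals.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.02}
cohort = [
    {"rs1": 2, "rs2": 0, "rs3": 1},
    {"rs1": 0, "rs2": 2, "rs3": 1},
    {"rs1": 1, "rs2": 1, "rs3": 0.4},
]
raw = [genomic_risk_score(d, weights) for d in cohort]
z = standardize(raw)
```

In practice the same weights would be applied to tens of thousands of LD-thinned SNPs per individual; the arithmetic is unchanged.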
Subsequent performance of the GRS was evaluated in external, independent validation data. For analysis of FINRISK, we used Cox proportional hazard models to evaluate the association of the GRS with time to incident CHD events, stratifying by sex and adjusting for geographic location and cohort, using age as the time scale. Secondary analyses adjusted for one of the clinical risk scores (FRS or ACC/AHA13), or individual baseline variables and known risk factors (cohort, geographical location, prevalent type-2 diabetes, log total cholesterol, log HDL, log systolic BP, smoking status, lipid treatment, and family history). Family history in FINRISK was self-reported and was defined as having a 1st-degree relative who had experienced MI before age 60. For FHS, we evaluated the association of the GRS with incident CHD using Cox proportional hazard models, stratifying by sex and adjusting for cohort (Original or Offspring), using age as the time scale. Family history was not available for both FHS cohorts and thus not considered in FHS analyses. Survival analyses allowing for competing risks were performed using the Aalen-Johansen estimator of survival and cause-specific Cox models (see Supplementary material online, Supplementary Methods). Model discrimination of incident CHD event was evaluated in three groups of individuals: (i) all individuals (n = 12 676 in FINRISK, n = 3406 in FHS), (ii) individuals aged <60 years at baseline (n = 10 606 in FINRISK, n = 3218 in FHS), and (iii) individuals aged ≥60 years at baseline (n = 2070 in FINRISK, n = 188 in FHS).
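For reference, a sex-stratified Cox model with age as the time scale, as described above, can be written in the following general form. The notation is introduced here for illustration and is not taken from the study: a is attained age, s_i is the sex stratum of individual i, G_i is the standardized GRS, and x_i is the vector of adjustment covariates.

```latex
h_i(a) = h_{0,s_i}(a)\,\exp\!\left(\beta\, G_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i\right),
\qquad
\mathrm{HR}_{\text{per S.D. of GRS}} = e^{\beta}
```

Each stratum has its own unspecified baseline hazard, while the coefficient for the GRS is shared across strata, so a single hazard ratio per S.D. of GRS is reported.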
Discrimination of incident CHD events within 10 years was assessed using Harrell's C-index, and the difference in C-index between two models was assessed using the correlated jackknife test. Competing risk analyses were performed using the Aalen-Johansen empirical estimator of cumulative incidence and cause-specific Cox proportional hazard models. Risk reclassification was evaluated using continuous Net Reclassification Improvement (NRI), categorical NRI, and Integrated Discrimination Improvement. Meta-analysis of the discrimination statistics was
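The discrimination measure named above can be illustrated with a minimal sketch of Harrell's C-index: among pairs in which the earlier follow-up time is an observed event, it is the fraction in which the individual with the earlier event also has the higher predicted risk. The function name and toy data are assumptions for this example; restriction to a 10-year horizon and the jackknife comparison of two models are not shown.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C-index for right-censored data.

    times:  follow-up time for each individual
    events: 1 if the event was observed, 0 if censored
    risk:   predicted risk (higher means an earlier event is expected)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: four individuals, one of them censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]
c = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a gain of 1.5 percentage points is a change of 0.015 on this scale.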

Results
To construct an optimized GRS using the WTCCC and MIGen-Harps datasets, we first generated a series of GRSs, starting with the 79 128 CARDIoGRAMplusC4D SNPs then progressively lowering the r² threshold for LD to reduce the redundancy of predictive information and the corresponding number of SNPs in the score (Methods and Figure 1). An r² threshold of 0.7 provided optimal discrimination of CHD cases and controls (WTCCC and MIGen-Harps meta-analysis odds ratio (OR) = 1.70 per S.D. of GRS, 95% confidence interval (CI) 1.61-1.80; meta-analysis AUC = 0.64, 95% CI 0.63-0.66), corresponding to 49 310 SNPs in WTCCC (see Supplementary material online, Figure S1). Figure S2.
Using survival analyses of time to incident CHD, within FINRISK the GRS had a stronger association with CHD (HR = 1.74, 95% CI 1.61-1.86, per S.D.) than the 28 SNP score studied by Tikkanen et al. 11 Figures S3 and S4).
The correlation between GRS and either FRS or ACC/AHA13 scores was close to zero, with almost none of the variation in GRS explained by either clinical risk score (in both FINRISK and FHS, r² < 0.004 between GRS and either FRS or ACC/AHA13; see Supplementary material online, Figure S5). To further test that the CHD risk conferred by the GRS was largely independent of the effects of cholesterol, we validated the GRS in the ARGOS familial hypercholesterolemia study, with comparable results to those obtained in WTCCC/MIGen (OR = 1.49, 95% CI 1.21-1.84 per S.D. of the GRS, adjusted for sex and five principal components) (see Supplementary material online, Supplementary Methods).
To assess the predictive power of the GRS, we compared its performance in discrimination of time to CHD event (C-index) with that of family history and the FRS and ACC/AHA13 clinical risk scores. We also assessed the incremental value of the GRS on top of the clinical risk scores. In both FINRISK and FHS, addition of the GRS to either the FRS or ACC/AHA13 score provided statistically significant improvements in C-index: in FINRISK, +1.7% (P < 10⁻⁶) and +1.6% (P < 10⁻⁶) for FRS and ACC/AHA13, respectively; in FHS, +1.1% (P < 0.0443) and +1.1% (P < 0.0344) for FRS and ACC/AHA13, respectively (Figure 2). Overall, fixed-effects meta-analysis of the two studies showed that the GRS improved the C-index by +1.6% (95% CI 0.01-0.02, P < 10⁻⁶) for FRS and GRS combined (FRS + GRS) over FRS alone and, similarly, by +1.5% (95% CI 0.009-0.02, P < 10⁻⁶) for ACC/AHA13 + GRS over ACC/AHA13 alone (Figure 2). Larger increases in C-index were observed among older individuals, with the C-index of FRS + GRS compared with FRS alone increasing by 5.1% in individuals aged ≥60 years at baseline, while individuals aged <60 years at baseline showed C-index gains of 1.4% (see Supplementary material online, Figure S6). Within FINRISK, the GRS had a higher C-index than family history (+1.9%, P < 1.3 × 10⁻⁶).
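The fixed-effects pooling used for these estimates can be sketched as follows, under the standard inverse-variance formulation: each study is weighted by the reciprocal of its squared standard error, and Cochran's Q and I² summarize heterogeneity. The function name and the input values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's estimates.

```python
from math import sqrt


def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    Returns the pooled estimate, its standard error, a 95% confidence
    interval, Cochran's Q, and the I^2 heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "se": pooled_se,
        "ci95": (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se),
        "Q": q,
        "I2": i2,
    }


# Hypothetical C-index gains from two studies, with hypothetical standard errors.
result = fixed_effects_meta([0.02, 0.01], [0.005, 0.01])
```

Because the weights depend only on precision, the larger and more precise study dominates the pooled estimate.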
We assessed whether the GRS improved individual 10-year risk reclassification when added to clinical risk scores. Analyses within FINRISK and FHS are given in Table 3 for FRS and in Table 4 for ACC/AHA13. Overall, meta-analysis of the two datasets showed that the categorical Net Reclassification Improvement was 0.1 for both FRS + GRS and ACC/AHA13 + GRS (P < 0.0001; see Supplementary material online, Figure S7). Meta-analysis of continuous NRI was 0.344 (P < 0.001) and 0.334 (P < 0.001) for FRS + GRS and ACC/AHA13 + GRS, respectively (see Supplementary material online, Figure S8). Meta-analysis of IDI scores showed gains of 0.01 (P < 0.001) and 0.009 (P < 0.001) for FRS + GRS and ACC/AHA13 + GRS, respectively; however, IDI scores showed high heterogeneity across FINRISK and FHS (I² > 97%, Cochran's Q P < 0.0001; see Supplementary material online, Figure S9).
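To make the continuous NRI concrete, the sketch below uses its usual definition: the net proportion of events whose predicted risk moves up under the new model, plus the net proportion of non-events whose predicted risk moves down. This simplified version treats outcomes as binary and ignores censoring, which a survival analysis would need to handle; the function name and toy risks are assumptions for this example.

```python
def continuous_nri(old_risk, new_risk, events):
    """Continuous (category-free) Net Reclassification Improvement.

    old_risk, new_risk: predicted risks under the baseline and extended models
    events: 1 if the individual had the event, 0 otherwise
    """
    n_events = sum(1 for e in events if e)
    n_nonevents = len(events) - n_events
    if n_events == 0 or n_nonevents == 0:
        raise ValueError("need at least one event and one non-event")
    up_e = down_e = up_n = down_n = 0
    for old, new, e in zip(old_risk, new_risk, events):
        if new > old:
            if e:
                up_e += 1
            else:
                up_n += 1
        elif new < old:
            if e:
                down_e += 1
            else:
                down_n += 1
    # Upward moves are correct for events; downward moves are correct for non-events.
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Hypothetical toy data: three events followed by three non-events.
old = [0.10, 0.20, 0.15, 0.05, 0.30, 0.10]
new = [0.15, 0.25, 0.10, 0.03, 0.20, 0.12]
had_event = [1, 1, 1, 0, 0, 0]
nri = continuous_nri(old, new, had_event)
```

The statistic ranges from -2 to 2, so the values reported above indicate a net movement in the correct direction for roughly a third of the combined event and non-event proportions.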
We next examined how variation in genomic risk translated into differences in cumulative lifetime risk of CHD, using Kaplan-Meier estimates stratified by GRS quintiles for men and women separately (Figure 3). As expected, cumulative risk increased with age for both sexes, with men displaying higher absolute risk than women. In both sexes there were substantial differences in cumulative risk between GRS groups, with 1.7-fold (in FHS) to 3.2-fold (in FINRISK) higher cumulative risk by age 75 in those in the top quintile of GRS vs. the bottom quintile. When considering clinically relevant levels of risk, FINRISK men in the top quintile of genomic risk achieved 10% cumulative risk 18 years earlier than those in the bottom quintile (ages 52 and 70, respectively), with a comparable difference of 12 years in FHS (ages 51 and 64). Women in the top quintile of genomic risk achieved 10% cumulative risk by age 69 (FINRISK) and 64 (FHS), whereas women in the bottom quintile did not achieve 10% risk by age 75 in FINRISK, or by age 73 in FHS. Estimated lifetime CHD risk in FINRISK showed no evidence of being affected by competing risks (incident CHD vs. non-CHD death) (see Supplementary material online, Supplementary Figure S10). Similarly, a cause-specific competing-risk Cox analysis of the GRS in FINRISK, adjusting for geographical location and cohort, yielded a hazard ratio similar to that of the standard Cox analysis (HR = 1.70, 95% CI 1.61-1.86).
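The cumulative-risk comparison above can be illustrated with a minimal Kaplan-Meier sketch that returns cumulative risk (one minus survival) at each event age and the first age at which a given risk threshold is reached. The function names and toy data are assumptions for this example, and delayed entry (individuals joining follow-up at different ages) is ignored for simplicity.

```python
def kaplan_meier_cumulative_risk(ages, events):
    """Kaplan-Meier cumulative risk at each distinct event age.

    ages:   age at event or at censoring for each individual
    events: 1 if the event was observed at that age, 0 if censored
    Returns a list of (age, cumulative_risk) pairs in increasing age order.
    """
    data = sorted(zip(ages, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all individuals whose follow-up ends at this age.
        while i < len(data) and data[i][0] == age:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((age, 1.0 - survival))
        at_risk -= n_leaving
    return curve


def age_reaching(curve, threshold):
    """First age at which cumulative risk reaches the threshold, or None."""
    for age, risk in curve:
        if risk >= threshold:
            return age
    return None


# Hypothetical toy data for one GRS quintile.
ages = [50, 55, 55, 60, 65, 70]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier_cumulative_risk(ages, events)
```

Running the same two functions on each GRS quintile and comparing the ages returned by `age_reaching(curve, 0.10)` gives the kind of "years earlier" contrast reported in the text.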
Notably, in both FINRISK and FHS, women in the bottom quintile of genomic risk showed smaller differences in cumulative CHD risk when stratified by smoking. For tertiles of systolic blood pressure or total cholesterol, low genomic risk women in FINRISK showed similarly small differences in risk, but the effects in FHS for this subgroup were not consistent. Cox models allowing for interactions between the GRS and systolic blood pressure or total cholesterol did not show statistically significant interactions in either FINRISK or FHS (P > 0.2 for all).

Discussion
A distinctive feature of our analysis compared with several previous prospective studies 11,29,30 examining the predictive utility of GRS for incident CHD is that the best predictive model was achieved here with SNPs that did not necessarily reach genome-wide or even statistical significance in previous GWA studies. The GRS outperformed smaller SNP models, and it separated CHD risk between top and bottom GRS quintiles more strongly than a recently published study testing a genetic risk score of 50 SNPs in Scandinavians 30 (GRS50 HR = 1.92 vs. GRS49K HR = 4.51). Genome-wide SNP models have been applied successfully to other heritable human traits which seem to follow an "infinitesimal" genetic architecture, such as height. 18 These results highlight the differing goals of GWAS and of genomic prediction: the stringent detection of causal genetic variants involved in the disease process vs. the construction of a model that robustly and maximally predicts future disease. While stringent procedures for minimizing the false positive rate of associated loci in GWAS are appropriate, these concerns are less relevant in the construction of GRSs, especially when there are a large number of weakly correlated SNPs 20 and when rigorous internal and external validation is performed.
While population stratification is a potential confounder of genomic prediction studies, our use of a large worldwide multi-ethnic meta-analysis to develop the GRS together with two fully independent prospective validation datasets and three independent case/control datasets minimizes this potential. Our GRS was constructed from the CARDIoGRAMplusC4D stage-2 meta-analysis and the FINRISK and FHS individuals are both independent of that study and of broadly European ancestry; thus it is unlikely that the GRS is substantially confounded by fine-scale population structure within these cohorts. Further, the LD-thinning threshold to maximize prediction was determined in the WTCCC and MIGen datasets prior to applying the GRS to ARGOS, FINRISK, or FHS. Nevertheless, for some measures, GRS gains were less pronounced in FHS than in FINRISK. This may partly be due to the different definitions of CHD in these studies, to differences in environmental exposures, or to differences in genetic effects. 31 In addition, the FRS was developed in the FHS, leading to potential over-estimation of its association with CHD in the current analysis. Hence, there may be benefit from future development of population-specific GRSs, which may yield greater predictive power within each population.
The association of the GRS with incident CHD was not substantially attenuated by traditional risk factors or clinical risk scores derived from these risk factors. Furthermore, the GRS was strongly associated with CHD in a study consisting purely of individuals with familial hypercholesterolemia. These results suggest that genomic risk exerts its effect on CHD risk through molecular pathways that are largely independent of the effects of cholesterol, systolic blood pressure, and smoking. A hitherto unresolved question has been the extent to which a family history would capture any information that may be provided through genetic analysis. Here, we clearly demonstrate the superior performance of direct genetic information over self-reported family history of CHD, which is often incomplete and imprecise in practice and is influenced by family size and competing causes of death.
While we observed improvements in discrimination (C-index) resulting from adding the GRS to the clinical risk scores when considering adults of all ages, the improvements were substantially higher in older individuals (≥60 years old). Rather than being driven by age-related differences in the effect of the GRS, these results are likely driven by differences in the clinical risk scores between the younger and older adults. Unlike the GRSs, the clinical risk scores showed substantial differences across ages, driven by temporal changes in the underlying risk factors as well as age itself. Beyond the aim of identifying older adults with high CHD risk, the invariance of genomic risk makes it particularly useful for CHD risk prediction earlier in life, in young adulthood or before, when traditional risk factors are typically not measured and are less likely to be informative of risk later in life.
Our analyses focused on two clinical scores, the FRS and ACC/AHA13. While other scores exist, for example the SCORE system, 32 we elected to use the FRS and ACC/AHA13 due to their widespread use and the fact that the FINRISK cohorts were a major contributor to the SCORE analysis, potentially biasing the analysis in FINRISK in the same way that the FRS seems to be biased towards the FHS, inflating the predictive power of the clinical risk scores there relative to the reference model.
Stratifying individuals by baseline smoking, systolic blood pressure, and total cholesterol levels within genomic risk groups revealed substantial differences in cumulative risk patterns. Importantly, this demonstrates that improved lifestyle may compensate for the innate increased CHD risk captured by the GRS. For men with high genomic risk, modifiable risk factors showed large effects on cumulative CHD risk. For women, the observed impacts of smoking, systolic blood pressure, and total cholesterol were low or not detectable in the low genomic risk group, particularly in FINRISK. However, we could not determine whether this was due to inadequate statistical power or other biological effects, and further studies in larger cohorts of women are necessary to determine any clinical implications.
Our results, if validated in further studies and across different populations, suggest a potential paradigmatic shift in the current CHD screening strategy, which has existed for over 40 years: namely, determination of genomic risk at an early stage, with screening later in life through traditional clinical risk scores to complement background genomic risk. Based on early genomic risk stratification, individuals at higher risk may benefit from earlier engagement with nutritionists, exercise regimes, and smoking cessation programs, or from early initiation of medical interventions such as statin therapy or blood-pressure-lowering medications to minimize future CHD risk. In this context it is notable that Mega et al. 29 recently demonstrated that a GRS of 27 CHD-associated SNPs better predicted which individuals would benefit most, both in relative and absolute terms, from statin treatment. In a study of type 2 diabetes, Florez et al. 32 have shown that the effects of increased genetic susceptibility to disease can be ameliorated by lifestyle (diet and exercise) and therapeutic (metformin) interventions. Similar possibilities exist for CHD, whereby early targeted prevention strategies based on genomic CHD risk may be implemented well in advance of clinical risk scores attaining predictive capacity at later ages. 33 Such early risk stratification will offer increased efficiency in allocating both therapeutic resources and lifestyle modifications, with the potential for subsequent delay of onset of traditional risk factors and incident CHD risk.
While our study demonstrates both the independent and incremental predictive power provided by our GRS, it is important to note that even when the GRS is combined with clinical risk scores, the overall positive predictive value still remains modest for an acceptable negative predictive value (see Supplementary material online, Figure S14a). Furthermore, despite overall improved reclassification of 10-year risk, some individuals who went on to develop an incident event were reclassified at a lower risk by the addition of the GRS compared with their initial classification using a clinical score (Tables 3 and 4), emphasizing the limitations of the current GRS. The magnitude of the GRS effect was weaker in FHS than in the other datasets examined (FINRISK, WTCCC-CAD, MIGen-Harps, and ARGOS; Table 2). In addition to the potential technical and clinical FHS differences discussed above, these results suggest that the benefit and clinical utility of the GRS may vary between populations; further evaluation in large prospective studies of varying ancestry will be required in order to assess these differences and how best to account for them in risk prediction. In this context, it should be noted that our GRS, based on a starting list of 79 128 common SNPs tested by the CARDIoGRAMplusC4D consortium, could be further improved. Future studies that construct GRSs using increased sample sizes and capturing the full spectrum of common and rare variants 9,34 will likely provide additional gains in prediction and risk stratification.
In summary, this study has demonstrated the potential clinical utility of a genome-scale GRS for CHD, both for early identification of individuals at increased CHD risk and for complementing existing clinical risk scores. Given recent advances in, and the reduced cost of, genotyping microarrays and sequencing-based technologies, determination of genome-wide SNP variants (including the 49 310 SNPs used here) is no longer beyond the realm of clinical application. In terms of technical feasibility, genome-wide genotyping of hundreds of thousands of SNPs is now both reliable and cost effective (<US$70 in bulk), and clinically certified genotyping services are now becoming available. Statistical SNP imputation will further expand the number of SNPs to the order of several million. Additionally, germline genotyping is a one-time cost for each individual. Further validation and cost-benefit analyses will be required in order to establish how this technology is deployed in clinical settings.

Supplementary material
Supplementary material is available at European Heart Journal online.

Cardiovascular Research, NordForsk e-Science NIASC (grant no 62721) and Biocentrum Helsinki (to SR). The MI Genetics (MIGen) Consortium Study was funded by the National Heart, Lung, and Blood Institute of the United States National Institutes of Health (R01 HL087676). Genotyping was partially funded by The Broad Institute Center for Genotyping and Analysis, which was supported by grant U54 RR02027 from the National Center for Research Resources. This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113 and 085475. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union-BioSHaRE-EU). We are grateful to the CARDIoGRAMplusC4D consortium for making their large-scale genetic data available. A list of members of the consortium and the contributing studies is available at www.cardiogramplusc4d.org.
The FHS dataset was obtained from dbGaP (phs000007), approved by the University of Melbourne Health Sciences Human Ethics Sub-Committee (HREC 1442186).