Samuel A Lambert, Gad Abraham, Michael Inouye, Towards clinical utility of polygenic risk scores, Human Molecular Genetics, Volume 28, Issue R2, 15 October 2019, Pages R133–R142, https://doi.org/10.1093/hmg/ddz187
Abstract
Prediction of disease risk is an essential part of preventative medicine, often guiding clinical management. Risk prediction typically includes risk factors such as age, sex, family history of disease and lifestyle (e.g. smoking status); however, in recent years, there has been increasing interest in including genomic information in risk models. Polygenic risk scores (PRS) aggregate the effects of many genetic variants across the human genome into a single score and have recently been shown to have predictive value for multiple common diseases. In this review, we summarize the potential use cases for seven common diseases (breast cancer, prostate cancer, coronary artery disease, obesity, type 1 diabetes, type 2 diabetes and Alzheimer’s disease) where PRS have or could have clinical utility. PRS analyses for these diseases frequently revolved around (i) risk prediction performance of a PRS alone and in combination with other non-genetic risk factors, (ii) estimation of lifetime risk trajectories, (iii) the independent information of PRS and family history of disease or monogenic mutations and (iv) estimation of the value of adding a PRS to specific clinical risk prediction scenarios. We summarize open questions regarding PRS usability, ancestry bias and transferability, emphasizing the need for the next wave of studies to focus on the implementation and health-economic value of PRS testing. In conclusion, it is becoming clear that PRS have value in disease risk prediction and there are multiple areas where this may have clinical utility.
Introduction
A multitude of human traits and diseases are heritable to varying degrees. Further, the genetic basis for many such traits has been established as polygenic—explained by the contributions of many genes, each with moderate or weak contribution to the trait, in contrast to Mendelian traits which are caused by variation in one gene or a small set of genes with large effect. The combination of large-scale genome variation projects, such as the HapMap (1) and 1000 Genomes projects (2), together with low-cost robust genotyping platforms, has enabled genome-wide association studies (GWAS) on large cohorts. GWAS have focused on identifying disease- or trait-associated genetic variants (typically single nucleotide polymorphisms [SNPs]) which are common in a given population (e.g. minor allele frequency [MAF] > 1%). To date, GWAS have identified thousands of loci that are associated with a range of complex human traits and diseases, including cardiovascular diseases, cancers, obesity and Alzheimer’s disease (3). These data have provided numerous insights into the genes and pathways that cause disease, but more recently the use of these data for disease risk prediction has gained interest (4–6).
There are numerous considerations related to the data and methods used to develop and validate a PRS (see (8,9) for details). Here, we briefly summarize approaches that use GWAS summary statistics (alleles and effect sizes and/or P-value) (10) rather than individual-level genotypes, although the principles are broadly similar. Initially, PRS tended to be constructed from genome-wide significant SNPs (typically, P < 5 × 10⁻⁸), which for many diseases led to weakly predictive PRS as the number of genome-wide significant SNPs was small (11,12). In contrast with GWAS, which was designed for detecting SNPs associated with the disease while maintaining a low false positive rate, the task of prediction allows for methods with a more lenient signal-to-noise trade-off. Thus, more powerful PRS can typically be constructed by incorporating larger numbers of SNPs; however, there is a trade-off between using a small number of SNPs with precise effect estimates and a large number of SNPs with increasingly noisy effect size estimates. There is no universal set of parameters for this trade-off, as they depend on the genetic architecture of the disease, genotyping density and sample size. In practice, a training set comprising individual-level genotypes and phenotypes is often used to optimize the PRS. Using an independent validation dataset, or cross-validation, allows unbiased estimation of the predictive performance, avoiding optimism due to overfitting. Generally, once predictive performance plateaus or declines in the validation set, the optimal trade-off of signal and noise has been reached.
Another consideration is linkage disequilibrium (LD), the correlation between nearby SNPs, which can lead to over-representation of high LD regions in the model, thus potentially reducing its predictive performance. Common methods for constructing PRS include LD pruning (randomly removing one SNP from a pair in high LD), P-value thresholding and clumping (pruning by LD while preferentially retaining more significantly associated SNPs), as well as more complex methods that explicitly account for LD, such as LDpred (13) and lassosum (14). The result of PRS development is the set of SNPs and effect sizes that can be applied to an independent sample.
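The clumping-and-thresholding idea described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular tool: the SNP identifiers, effect sizes, P-values, LD values and the `r2_max` cut-off are all hypothetical, and the LD lookup is a toy dictionary rather than a reference panel.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str       # hypothetical identifier
    beta: float     # per-allele effect size (e.g. log odds ratio) from GWAS summary statistics
    p_value: float  # GWAS association P-value


def clump_and_threshold(snps, ld_r2, p_max=5e-8, r2_max=0.1):
    """Greedy clumping: visit SNPs from most to least significant and keep a SNP
    only if it passes the P-value threshold and is not in high LD with a SNP
    that has already been kept."""
    kept = []
    for snp in sorted(snps, key=lambda s: s.p_value):
        if snp.p_value > p_max:
            break  # all remaining SNPs are less significant
        in_ld = any(
            ld_r2.get(frozenset((snp.rsid, k.rsid)), 0.0) > r2_max for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


def polygenic_score(dosages, snps):
    """Weighted sum of effect-allele dosages (0, 1 or 2) over the selected SNPs."""
    return sum(s.beta * dosages[s.rsid] for s in snps)


# Toy summary statistics (invented for illustration)
snps = [
    Snp("rsA", 0.20, 1e-12),
    Snp("rsB", 0.18, 1e-9),   # assumed to be in high LD with rsA
    Snp("rsC", 0.05, 1e-4),
    Snp("rsD", -0.10, 3e-6),
]
ld_r2 = {frozenset(("rsA", "rsB")): 0.9}       # toy pairwise LD (r^2)
person = {"rsA": 2, "rsB": 2, "rsC": 1, "rsD": 0}  # toy genotype dosages

strict = clump_and_threshold(snps, ld_r2, p_max=5e-8)   # genome-wide significant only
lenient = clump_and_threshold(snps, ld_r2, p_max=1e-3)  # more SNPs, noisier effects

print([s.rsid for s in strict], polygenic_score(person, strict))
print([s.rsid for s in lenient], polygenic_score(person, lenient))
```

In practice the P-value and LD thresholds would be tuned on a training set and the resulting score evaluated in independent data, as described above.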
After the PRS has been constructed, it is essential to assess its predictive performance with the disease of interest in an external cohort, one not used for the underlying GWAS or for tuning the PRS. The accuracy of a PRS is bounded by the disease’s heritability (total amount of disease variance that can be explained by genetics) and current PRS agree with estimates from theory (see Box 1). For polygenic scores of quantitative traits, the effect size per standard deviation (SD) change is usually reported, as well as the proportion of variance explained (R²) by the score. However, as most diseases are binary outcomes, the effect sizes are expressed as odds ratios (OR) or hazard ratios, depending on the study design (case/control vs. prospective) and the availability of age at event. The model’s performance can be measured using variance explained (Nagelkerke’s or pseudo-R²) or classification accuracy using area under the receiver-operating characteristic curve (AUC), the area under the precision-recall curve or Harrell’s C-index (15). However, caution must be exercised when interpreting prediction metrics such as AUC or C-index without sufficient context; even small increases in these metrics can lead to several percent of the population being reclassified into different risk categories, changing their clinical management. Further, these metrics do not take into account the costs and benefits of various clinical decisions (e.g. use of statins), which can only be done within a public health and health-economic framework. In addition, when comparing metrics across studies, it is important to note that the ancestry as well as study design (e.g. covariates included in the risk model) can affect these measures (as well as the SD of the PRS).
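As a concrete reading of the AUC mentioned above, the following sketch computes it directly from its probabilistic definition: the chance that a randomly chosen case has a higher score than a randomly chosen control. The scores are invented for illustration, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def auc(case_scores, control_scores):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, with ties counted as one half (the Mann-Whitney statistic
    rescaled to the 0-1 range)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy standardized PRS values (invented for illustration)
cases = [0.9, 1.4, 0.3, 2.1]
controls = [-0.5, 0.1, 0.9, -1.2, 0.4, 0.0]

print(round(auc(cases, controls), 3))
```

An AUC of 0.5 corresponds to a score that does not separate cases from controls at all, and 1.0 to perfect separation; as noted above, the clinical meaning of a given value still depends on context.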
Under an additive genetic liability threshold model, by assuming several key quantities, including population prevalence K and heritability h² on the liability scale (and/or the sibling recurrence risk λs), we can derive the expected predictive power of a PRS, measured in sensitivity, specificity, AUC and other quantities (82,83).
The adjacent figure shows simulation results for two scenarios relevant to CAD (assuming a population prevalence K = 0.05 and h² = 0.5): (a) a PRS explaining 10% of the phenotypic variance, similar to the results achieved by the latest CAD PRS (27) and (b) the results for a PRS explaining all the known heritability of CAD (50% of the phenotypic variance). Clearly, as the PRS explains more of the heritability, there is greater separation between the average scores of cases and non-cases (quantified by the AUC) and corresponding effect sizes (ORstdev). For a disease such as CAD, the expected AUC from a PRS explaining all of the known heritability is 0.9. For scenario (a), the top 5% of the population will have an average absolute (lifetime) CAD risk of 15%, but for scenario (b) this goes up to a risk of 40%, and the top 15% of the population have a risk of >10%.

Note that the genetic liability threshold model has no direct bearing on how to increase the heritability explained by PRS, only on the consequences of such an increase. To increase the explained heritability, we will likely need larger GWAS sample sizes (84,85), together with wider genotyping of rarer genetic variants, such as via whole-genome sequencing (86,87). In the absence of larger sample sizes, multi-trait prediction models can also be used to make small but consistent gains in predictive power (88,89).
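The liability threshold calculation described in this box can be approximated numerically. The sketch below is not the simulation used for the figure. It assumes that liability is standard normal, that the PRS explains a fraction `r2` of liability variance (so the "variance explained" figures are read on the liability scale), and that disease occurs when liability exceeds the threshold implied by prevalence K. The function names, grid size and integration range are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def score_grid(K, r2, n_grid=4001, z_max=8.0):
    """Discretise a standardised PRS z ~ N(0, 1) and return, for each grid point,
    its probability weight and the disease risk conditional on that score."""
    threshold = STD_NORMAL.inv_cdf(1.0 - K)  # liability threshold for prevalence K
    step = 2.0 * z_max / (n_grid - 1)
    zs = [-z_max + k * step for k in range(n_grid)]
    weights = [STD_NORMAL.pdf(z) * step for z in zs]
    # Liability = sqrt(r2) * z + residual, with residual ~ N(0, 1 - r2)
    risks = [
        1.0 - STD_NORMAL.cdf((threshold - sqrt(r2) * z) / sqrt(1.0 - r2))
        for z in zs
    ]
    return zs, weights, risks


def expected_auc(K, r2):
    """P(score of a random case > score of a random non-case), by numerical integration."""
    _, weights, risks = score_grid(K, r2)
    case_w = [w * r for w, r in zip(weights, risks)]
    ctrl_w = [w * (1.0 - r) for w, r in zip(weights, risks)]
    total, ctrl_below = 0.0, 0.0
    for cw, kw in zip(case_w, ctrl_w):       # grid is in ascending score order
        total += cw * (ctrl_below + 0.5 * kw)  # controls at the same point count one half
        ctrl_below += kw
    return total / (sum(case_w) * sum(ctrl_w))


def mean_risk_in_top(K, r2, top_fraction):
    """Average absolute risk among people in the top `top_fraction` of the PRS."""
    zs, weights, risks = score_grid(K, r2)
    cutoff = STD_NORMAL.inv_cdf(1.0 - top_fraction)
    num = sum(w * r for z, w, r in zip(zs, weights, risks) if z >= cutoff)
    den = sum(w for z, w in zip(zs, weights) if z >= cutoff)
    return num / den


for label, r2 in [("(a) 10% of variance", 0.1), ("(b) 50% of variance", 0.5)]:
    print(label,
          "AUC:", round(expected_auc(0.05, r2), 2),
          "mean risk in top 5%:", round(mean_risk_in_top(0.05, r2, 0.05), 2))
```

The printed values can be compared against the figures quoted in the box for the two CAD scenarios; small differences are to be expected because this is a deterministic approximation under the stated assumptions rather than the original simulation.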
Potential for PRS utility
We reviewed the literature for seven well-studied diseases where PRS could potentially have clinical value. These diseases include coronary artery disease (CAD), diabetes (types 1 and 2), obesity (and body mass index [BMI]), breast cancer, prostate cancer and Alzheimer’s disease. Table 1 summarizes information about each disease, their conventional risk factors, potential uses of a PRS and recent references evaluating the clinical use of PRS in each case. A common theme is the expectation that the utility of PRS will be to predict future disease risk or identify those most at risk and use this information to target treatments or alter screening paradigms. In this section, we elaborate on two examples where the clinical benefit of a PRS has been suggested: (i) providing better CAD risk estimates to guide treatments and (ii) the potential to target screening to populations at high risk of prostate cancer.
Disease group | Disease | Mendelian risk factors | Other risk factors | Potential clinical utility for PRS |
---|---|---|---|---|
Cardiometabolic traits/diseases | Obesity & BMI | MC4R mutations | Age, sex and family history; Lifestyle: diet and physical activity | • Targeting lifestyle interventions and potential treatments (e.g. bariatric surgery) to those at most risk of developing obesity • BMI PRS is enriched in those who have undergone bariatric surgery in UK Biobank (32) • Predicting weight gain trajectories (32,90,91) • Useful as a risk predictor of other diseases where obesity is a causal risk factor (79) |
Cardiometabolic traits/diseases | CAD | FH mutations: LDLR, APOB, PCSK9 | Age, sex and family history; Systolic blood pressure, LDL or non-HDL cholesterol and BMI; Lifestyle: smoking, diet and physical activity | • Adds accuracy to clinical risk predictors (e.g. Framingham risk score, ACC/AHA13 (16)) • Useful for defining most benefit from statin prescription (17,18) • Useful for estimating lifetime risk trajectories (27,56) |
Cardiometabolic traits/diseases | Diabetes (Type 1) | Maturity onset diabetes of the young (MODY) related genes; HLA susceptibility alleles | Age, sex and family history; DPT-1 metabolic risk score: BMI, glucose and C-peptide | • Predicting at-risk children who are most likely to progress to disease (55,75,92,93) • Discriminating between Type 1 and 2 diabetes (93) |
Cardiometabolic traits/diseases | Diabetes (Type 2) | Undetermined | Age, sex and family history; BMI, waist circumference, waist-hip ratio, history of hypertension and history of high blood glucose; Lifestyle: smoking, diet, physical activity level | • Adding additional stratification to already accurate risk models (e.g. age, sex and BMI) (94) • Estimating lifetime risk trajectories (94) |
Cancers | Breast cancer | Pathogenic BRCA1/2 mutations; Lower risk pathogenic variants: PALB2, ATM and CHEK2 | Age, sex and family history; Age at menarche, age at menopause, nulliparity and age at first childbirth, BMI and hormone replacement therapy | • Currently implemented within the BOADICEA risk model (33) • Added to other models including: Gail, Tyrer-Cusick, BCSC, BI-RADS, Rosner-Colditz and NCI (29,49,53) • Has value when included in risk models that can be applied to study risk-based vs. age-based screening programs (26) • Disease subtyping: can be used to estimate genetic risk for ER-positive or negative breast cancer separately (28) |
Cancers | Prostate cancer | Pathogenic BRCA1/2 mutations | Age, sex and family history | • Improve predictions for risk-based screening and target PSA test to those with higher genetic risk (22) • PPV of the PSA test increases with genetic risk (23) |
Other | Alzheimer’s disease | APOE ε4 and ε2 alleles | Age, sex and family history | • Current polygenic scores can explain the majority of heritability for common variants (95) • Polygenic hazard scores (PHS) to estimate age-of-onset (30) |
For cardiovascular disease, traditional risk factors such as systolic blood pressure, cholesterol levels and smoking habits (Table 1) are routinely used to predict risk and guide initiation of treatment (e.g. statins) to lower low-density lipoprotein (LDL) cholesterol and reduce the disease risk. Recent studies have shown that adding PRS for CAD to the Framingham risk score and the ACC/AHA pooled risk equations resulted in increased predictive power (16). Additionally, in two re-analyses of clinical trials evaluating the effect of statin use on cardiovascular disease prevention, it was shown that the treatment benefit (absolute CAD risk reduction) was highest in those with the highest CAD polygenic risk (17,18). The MI-GENES study found that disclosing CAD genetic risk to patients when deciding whether to initiate statin therapy resulted in improved LDL reduction, and the effect was again higher in those with the most genetic risk (19). Preliminary health economic analysis has also shown the potential cost benefits of using PRS in targeted testing for CAD prevention within the Finnish health system (20). Together, these results show that a PRS for CAD can inform a more accurate risk estimate and define individuals most likely to benefit from statin therapy; however, the exact net benefit will likely vary across health systems and thus will require evaluation within each one.
Another potential use case for PRS may be to increase the utility of lower sensitivity diagnostics. The serum prostate-specific antigen (PSA) test was used to screen for prostate cancer, but large trials showed that it results in a significant amount of overdiagnosis (false-positives leading to overtreatment) (21); while still used in diagnosis, it has been abandoned for broad screening. Multiple prostate cancer PRS have been developed that can accurately stratify individuals’ risk; a key finding from these studies has been that the probability of overdiagnosis by screening decreases as an individual’s prostate cancer polygenic risk increases (22–24). This finding suggests that the PSA test could be targeted to a higher-risk population, as measured by a PRS, where the PSA test has a higher positive predictive value (PPV). In other disease areas there is similar interest in adjusting screening test frequency and/or age of initiation, and in breast cancer the WISDOM clinical trial (25) is currently evaluating the use of risk (including PRS) instead of age-based guidelines (26) to guide these decisions.
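The link between pre-test risk and PPV follows from Bayes' rule, and a short sketch makes the direction of the effect explicit. The sensitivity, specificity and prevalence values below are hypothetical placeholders and are not estimates for the PSA test or for any PRS stratum.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics and stratum prevalences, for illustration only
SENSITIVITY, SPECIFICITY = 0.80, 0.80
strata = [("low PRS", 0.01), ("average PRS", 0.03), ("high PRS", 0.10)]

for label, prevalence in strata:
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

With the test characteristics held fixed, a higher prevalence in the tested group yields a higher PPV, which is the rationale for restricting a screening test to those at higher polygenic risk.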
Lessons learned from PRS prediction studies
PRS define a lifetime risk trajectory
The majority of common complex diseases are late onset, with risk accumulating over time. Age is typically the strongest predictor of risk for many common diseases (Table 1), since it encapsulates the time dimension over which environmental exposures (risk factors) occur, as well as the aging process (which can accelerate disease processes) (Fig. 1). Thus the goal of risk prediction for such diseases is to evaluate whether the risk, either lifetime risk or shorter time horizons (e.g. 10-year risk), is higher than a threshold given by clinical guidelines or by age-adjusted average risk. The predicted risk can then be used to plan appropriate clinical action, whether it be treatment or increased screening. The shape of this risk trajectory can be different from birth, and is modified by an individual’s genetics as well as environment and behaviors, such as smoking, diet, exercise and medication usage.

Figure 1. PRS define lifetime risk trajectories. (a) Example density plot of a population according to polygenic risk. The distribution is filled and labeled according to the lowest (0–20%; blue), population average (40–60%; grey) and highest (80–100%; red) quintiles of genetic risk. (b) Example of a risk trajectory (Kaplan–Meier cumulative risk curve) for the population average (grey) and the highest and lowest quintiles of genetic risk (colored as in a). A representative risk threshold is shown as an example.
Analyses across a range of complex diseases have utilized methods from survival analysis to examine how PRS affect the trajectory of cumulative risk over a lifetime, including CAD (16,27), breast cancer (28,29), prostate cancer (23), Alzheimer’s disease (30,31) and weight gain trajectories (32). These trajectories stratified by genetic risk can be estimated from an early age, prior to any clinical risk factors manifesting. For example, an average male in the UK Biobank would reach 10% cumulative risk of CAD by the age of 68 (27). On the other hand, individuals with the highest and lowest 20% of CAD PRS would attain 10% cumulative risk by 61 and 75 years, respectively. Similarly, the risk trajectory of breast cancer was modified in a cohort of Estonian women (29), whereby at age 70, the average risk of breast cancer was 5%, but was 12% for those >95th percentile of genetic risk and 2.4% in those of the bottom quintile. Taken together, and with evidence from other diseases, it is clear that genetic risk can substantially stratify individual disease risk trajectories above what can be predicted by age alone.
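The cumulative risk trajectories discussed above are typically estimated with survival methods such as the Kaplan–Meier estimator. The following sketch implements that estimator on invented follow-up records and reports the age at which each group's cumulative risk first reaches a threshold. The records, group labels and resulting ages are toy values and do not reproduce any cohort cited here.

```python
def kaplan_meier_cumulative_risk(records):
    """records: list of (age, event) pairs, where event is True for disease onset
    at that age and False for censoring (end of follow-up without disease).
    Returns [(age, cumulative_risk)] at each age where an event occurred."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for age in sorted({a for a, _ in records}):
        events = sum(1 for a, e in records if a == age and e)
        leaving = sum(1 for a, _ in records if a == age)
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((age, 1.0 - survival))
        at_risk -= leaving  # events and censored individuals leave the risk set
    return curve


def age_at_risk(curve, threshold):
    """First age at which cumulative risk reaches the threshold, or None."""
    for age, risk in curve:
        if risk >= threshold:
            return age
    return None


# Toy follow-up data for two PRS groups (invented for illustration)
high_prs = [(60, True), (63, True), (66, False)] + [(80, False)] * 12
low_prs = [(71, True), (74, True)] + [(80, False)] * 13

for label, group in [("highest quintile", high_prs), ("lowest quintile", low_prs)]:
    curve = kaplan_meier_cumulative_risk(group)
    print(label, "reaches 10% cumulative risk at age", age_at_risk(curve, 0.10))
```

Comparing the threshold-crossing ages between groups is one simple way to summarize how far a PRS shifts the risk trajectory relative to the population average.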
PRS capture risk not quantified by family history and rare monogenic mutations
Two other major predictors that have been used for disease risk prediction are (i) family history and (ii) monogenic mutations.
A family history of disease is a composite of genetic risk (both common and rare) and a shared environment. For instance, many breast cancer risk prediction methods implemented in clinical practice (e.g. BOADICEA (33)) use family history, often represented in a pedigree, to estimate risk alongside other predictors. Family history, however, suffers from several drawbacks: (i) family history depends on actual disease events occurring (a cancer diagnosis) and thus cannot detect individuals who are at high risk but have not experienced an event; (ii) complex trait theory predicts that the majority of cases of complex disease arise in individuals without any family history of disease (sporadic cases) (34) and (iii) family history information is often incomplete or imprecise in practice, leading to further reduction in its predictive power.
PRS can be thought of as a method to explicitly capture the common polygenic component of family history. Indeed, even early PRS could predict lower prevalence diseases better than family history (35). Recently, more predictive PRS for higher prevalence diseases, such as CAD, have been shown to be associated with disease independently of family history (16,36). Since family history includes an environmental as well as genetic component, we expect that as PRS get more powerful, they will better capture the common genetic component of family history, without affecting the shared environment or the monogenic (rare) component. Thus, it is likely that for prediction purposes, models combining both family history and PRS will be stronger than either factor alone, and that family history will not be made redundant by PRS.
Another form of genetic risk is monogenic in origin, namely, Mendelian germline mutations with high penetrance. Examples include BRCA1/2 mutations for breast, ovarian and prostate cancers and familial hypercholesterolemia (FH, caused by LDLR/APOB/PCSK9 mutations) for CAD. While these mutations are often highly penetrant, their relative rarity in the population means that they only explain a small fraction of disease cases. Furthermore, these rare genetic variants are generally not well-genotyped by standard genome-wide genotyping arrays (nor well-imputed from reference panels) (37), such that PRS derived from standard GWAS summary statistics with typical MAF thresholds do not capture rarer variation with high accuracy.
Comparing the relative contributions of polygenic and monogenic risk is not straightforward since PRS represent continuous risk while monogenic risk is typically represented as the presence/absence of known mutations. One approach used by Khera et al. (38) was to find what proportion of the population had a PRS level high enough to be considered as equivalent to carrying monogenic mutations. For example, a CAD PRS in the top 8% of the population confers an odds ratio of 3, which is similar to that of FH (38,39), but is far more prevalent (1 in 13 and 1 in 200 for the PRS top 8% and FH, respectively), thus representing a much higher disease burden at the population level.
Since monogenic and polygenic risks are largely independent, individuals can inherit any combination of these two factors, and some small proportion of the population may receive both high polygenic risk as well as monogenic mutations for the same disease, putting them at extreme risk; conversely, some monogenic carriers may be at lower risk than their average peers. This has been shown for LDL cholesterol levels in carriers of both FH mutations and high CAD risk (39) and by the ability of CAD PRS to predict CAD in cohorts of high-risk FH cases (16,40). Outside of CAD, PRS for diseases have been combined with well-studied mutations to show that PRS provides additional stratification in carriers of BRCA1/2 mutations in prostate cancer (41) and breast cancer (42) and APOE ε4 carriers in Alzheimer’s disease (30,43–45). There is some evidence from Alzheimer’s disease (44) and breast cancer (42) that polygenic risk may interact non-additively with monogenic risk, but more research is needed to understand the impact on risk prediction. Ultimately, combined monogenic/polygenic scores will likely provide the most information for individual risk prediction.
PRS are largely independent of traditional risk factors and can improve current clinical risk prediction models
When considering adding PRS to risk models based on traditional risk factors, there are three main questions: (i) Is the PRS associated with disease risk independently of traditional risk factors? (ii) Does the PRS combine additively or non-additively with traditional risk factors in affecting risk? and (iii) Does the PRS increase predictive power over traditional risk factors?
The PRS for several diseases have been shown to be associated with disease risk largely independently of traditional risk factors. For example, in CAD, the association of PRS with disease is only partially attenuated by adjusting for a range of traditional risk factors such as systolic blood pressure, LDL cholesterol, BMI and others (16,27,46). In addition, the PRSs are often only weakly associated with these risk factors. This is likely due to several reasons: (i) PRS are based on a large number of SNPs representing a multitude of biological pathways, some of which are not represented by traditional risk factors, (ii) many risk factors are themselves driven both by genetics and environment, and PRS can only capture the genetic component, (iii) current PRS are incomplete in that they typically only explain a small proportion of heritability and (iv) some risk factors, such as blood pressure, can exhibit substantial temporal variation and noise in measurements, whereas the PRS is capturing a life-long effect.
A subsequent question is whether PRS and traditional risk factors combine additively in affecting disease or whether one modifies the other in a non-additive way (statistical interaction). Results so far in CAD (16,27) indicate that PRS and traditional risk factors combine largely additively. There is some evidence that PRS for breast cancer may interact with a minority of its risk factors, including alcohol consumption, height and hormone therapy (47); however, it is unknown whether the magnitude of these interactions has substantial implications for improved risk prediction.
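One common way to formalize the additive versus non-additive question is a regression model with an interaction term. The expression below is a sketch under stated assumptions, not the model used in the cited studies: it assumes a logistic model for disease status $D$, a standardized PRS $G$ and a single traditional risk factor $X$; all symbols are introduced here for illustration.

```latex
\operatorname{logit} \Pr(D = 1 \mid G, X)
  = \beta_0 + \beta_G\, G + \beta_X\, X + \beta_{GX}\, G X
```

In this formulation, $\beta_{GX} = 0$ corresponds to the PRS and the risk factor combining additively on the log-odds scale (multiplicatively on the odds scale), while $\beta_{GX} \neq 0$ indicates a statistical interaction. Question (i) above corresponds to whether $\beta_G$ remains non-zero after adjusting for $X$.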
The final issue is whether PRS add substantial new information on top of traditional risk factors so as to increase predictive power. In breast cancer, this has been tested with multiple PRS (varying in GWAS summary statistics, training datasets and number of SNPs in the score) and multiple established risk predictors (varying in the genetic and non-genetic risk factors included; models listed in Table 1 and (48,49)). In a systematic review and meta-analysis of these studies, Fung et al. found that the AUC of any risk predictor improved by 0.004 with the inclusion of a PRS, and the net reclassification improvement (a measure of change in classification accuracy based on established risk thresholds) improved in all studies but one (49); however, care should be taken when interpreting these results as all of the scores included fewer than 100 SNPs. In a recent study of PRS utility for risk prediction in 101 breast cancer families without BRCA1/2 mutations, the inclusion of a 161-SNP PRS into the BOADICEA model changed screening recommendations for 11.5–19.8% of women, depending on the risk guidelines used (50). Using another recent PRS (PRS-77; (51)) resulted in a similar fraction of risk categories changing when included in BOADICEA and a number of other risk prediction methods (BRCAPRO, BCRAT and IBIS) in a small Australian cohort (52). A smaller 67-SNP PRS was independently predictive of risk when included in the Gail risk model along with mammographic density and endogenous hormones (53); similar findings were observed using PRS-77 (54). The use of larger cancer PRS (28,29,38) will likely improve risk stratification further.
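The two metrics discussed above, AUC and net reclassification improvement (NRI), can be made concrete with a minimal sketch. The predicted risks below are invented toy numbers, not data from any cited study, and the 10% threshold is an arbitrary illustrative choice; the function names `auc` and `categorical_nri` are likewise introduced here. The NRI shown is the simple single-threshold form.

```python
from itertools import product


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c, k in product(cases, controls)
    )
    return wins / (len(cases) * len(controls))


def categorical_nri(old_risk, new_risk, labels, threshold):
    """Net reclassification improvement across a single risk threshold."""
    def net_up(event):
        idx = [i for i, y in enumerate(labels) if y == event]
        up = sum(old_risk[i] < threshold <= new_risk[i] for i in idx)
        down = sum(new_risk[i] < threshold <= old_risk[i] for i in idx)
        return (up - down) / len(idx)

    # Cases should move up and non-cases should move down.
    return net_up(1) - net_up(0)


# Toy data (hypothetical): 1 = developed disease, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
old_risk = [0.12, 0.08, 0.05, 0.11, 0.07, 0.04, 0.03, 0.02]  # traditional factors only
new_risk = [0.15, 0.11, 0.04, 0.09, 0.08, 0.03, 0.03, 0.01]  # traditional factors + PRS

print("AUC without PRS:", auc(old_risk, labels))
print("AUC with PRS:   ", auc(new_risk, labels))
print("NRI at 10% risk:", categorical_nri(old_risk, new_risk, labels, 0.10))
```

The sketch also shows why the two metrics can disagree in magnitude: AUC depends only on the ranking of individuals, whereas the NRI counts only those who cross a clinically chosen threshold, so a small change in AUC can coexist with a meaningful change in who is recommended for screening or treatment.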
PRS are most informative for prevention
While there is benefit to adding PRS to existing clinical risk scores, the unique characteristics of PRS open up possibilities for earlier prevention. Indeed, a study to predict the development of T1D in high-risk children (family history of T1D) found that a PRS was only predictive of progression to T1D before any metabolic abnormalities were present (high DPT-1 score), indicating the value of a T1D PRS for predicting those likely to progress to disease (55). For cardiovascular disease, traditional risk factors are typically not measured early in life and can have substantial temporal variation. In contrast, individuals can be genotyped early in life and have their PRS calculated for a wide range of complex diseases. For those at substantially increased lifetime risk of disease, but without elevated traditional risk factors, targeted lifestyle interventions could be used to reduce their risk, for example, by more frequent follow-ups or more stringent targets for traditional risk factors (e.g. cholesterol) (56).
Open questions and challenges for the PRS field
We have outlined the value and potential of PRS for disease risk prediction but there remain a number of technical, practical and ethical concerns that should be resolved before widespread clinical adoption.
Improving the replicability and comparability of PRS predictions
Currently, PRS exist in the research domain, where scoring methods and standards are constantly developing. The PRS for a single disease area can vary widely in their risk predictions because they include different numbers (10–10⁶) and non-overlapping sets of SNPs, with different effect sizes in different scores, depending on the GWAS summary statistics used to create the score (e.g. number of samples and their ancestry, phenotype definition and imputation panel for SNPs), along with the computational method and samples used to train the score. Apparent performance can also vary due to the covariates adjusted for in the risk prediction, such as age and sex. We believe this lack of consistency to be a prime concern for the PRS field, and additional resources, such as a centralized public database of published polygenic scores, are necessary to increase PRS comparability and evaluation and thus improve their potential for translation. However, further major challenges remain, including those discussed below: increasing the diversity of genotyped cohorts to reduce the bias of PRS performance toward European ancestries, investigating sex-based differences in PRS performance and delineating clinical utility in disease-specific scenarios, rather than relying on generic prediction metrics such as AUC.
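To see why two scores for the same disease can give different predictions for the same person, it helps to write a PRS out as a weighted sum of risk-allele dosages. The sketch below is illustrative only: the SNP identifiers, effect sizes and the function name `polygenic_score` are hypothetical, and silently skipping SNPs that are missing from the genotype data is one possible convention, assumed here for simplicity.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages over SNPs present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical individual: risk-allele dosage (0, 1 or 2) per genotyped SNP.
dosages = {"rsA": 2, "rsB": 1, "rsC": 0, "rsD": 1}

# Two hypothetical scores for the same disease, with different SNP sets
# and different per-allele effect sizes (e.g. log odds ratios).
score_small = {"rsA": 0.20, "rsB": 0.10}
score_large = {"rsA": 0.05, "rsC": 0.03, "rsD": 0.04, "rsE": 0.02}

print("small score:", polygenic_score(dosages, score_small))
print("large score:", polygenic_score(dosages, score_large))
```

The raw values are on different scales and are not directly comparable, which is why scores are usually expressed relative to a reference distribution (for example as a z-score or percentile) before any two are compared.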
Sources of bias in PRS predictions: stratification by ancestry and sex?
Currently, the majority of PRS are developed and evaluated using individuals of European ancestries, since the majority of GWAS and genetic reference panels (used for imputation) are currently biased toward European ancestries (57,58). Because of this, it has been observed that PRS developed using data from European ancestries are less predictive in non-European ancestries (59–63). There are various possible reasons for this lack of transferability including (i) population stratification in the original summary statistics (confounding the association results), (ii) differences in LD patterns between ancestries and (iii) differences in the true genetic architectures of disease, including gene-environment interactions (58,64).
Population stratification can generally be adjusted for during the GWAS or in the evaluation of the score on new datasets, using principal component analysis or linear mixed models (65). Care must be taken even within a single ancestry group, as there can be regional variations of PRS driven by subtle population stratification (66,67). As for performance differences due to diverging LD patterns, these arise since many of the GWAS SNPs are not necessarily the causal SNP but are in LD with the causal SNP (tagging); however, due to differences in LD between populations, the causal SNPs may no longer be well tagged, leading to reduced performance (58,64).
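As a minimal illustration of adjusting a score for population structure at evaluation time, the sketch below removes the linear effect of a single ancestry principal component from a set of PRS values. It is a simplification under stated assumptions: the data are invented, the function name `residualize` is introduced here, and only one component is used, whereas in practice several components are typically included jointly as covariates in the regression model.

```python
def residualize(values, covariate):
    """Remove the linear effect of one covariate via simple least squares."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = sxy / sxx
    # Residual = observed value minus the value predicted from the covariate.
    return [v - mean_v - slope * (c - mean_c) for v, c in zip(values, covariate)]


# Toy data (hypothetical): the raw PRS drifts upward along the first ancestry PC.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
raw_prs = [0.1, 0.4, 0.5, 0.9, 1.1]

adjusted_prs = residualize(raw_prs, pc1)
print([round(x, 3) for x in adjusted_prs])
```

After adjustment the score is uncorrelated with the component, so any remaining association with disease cannot be explained by that axis of population structure alone.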
Whether genetic architecture differs between ancestry groups is difficult to assess without large GWAS in non-European ancestries. So far, the evidence from diseases such as T2D is that the genetic architecture is largely concordant between European and non-European ancestries (68–71), and directionally concordant effect sizes between different ancestries have been observed in multiple other comparisons of GWAS across ancestral groups (72–74). Assuming that this holds across the majority of complex diseases, LD differences are likely the main challenge to overcome. Some proposed solutions include a single pan-ancestry PRS or creating different ancestry-specific PRSs (62,75–77). A related challenge will then be how to accurately align an individual to a PRS based on their ancestry.
Another important yet relatively unexplored aspect of PRS predictive differences is how they differ by sex. Many traits, including disease risk, differ by sex and some of that may be partly genetic (78). However, most GWAS are not sex-specific and often exclude sex chromosomes (particularly X) from the analysis. This is an area of interest for future PRS research, with recent results showing stronger predictive power for obesity (79) and Alzheimer’s disease (80) using sex-specific PRS.
What is the value of PRS and how do we achieve it?
In this review, we have outlined how PRS can improve risk prediction and highlighted cases of potential clinical utility. However, the evaluation of a PRS in public health and health economic terms, as well as in feasibility of implementation, is necessary to motivate adoption; these aspects have not been extensively explored. Public health, economic and implementation assessment will be highly dependent on the PRS use case and the costs of the clinical action (e.g. medication or altered screening guidelines). A previous review outlined the potential value of PRS in optimally allocating therapies to reduce the number needed to treat (81); however, cost-benefit analysis represents another large step to be taken. To our knowledge, the cost-benefit of PRS testing has only been explored in CAD and breast cancer. In a simulation framework of the Finnish health system, it was found that an optimal allocation of a CAD PRS alongside traditional risk factors would be cost-beneficial if deployed in a targeted, rather than population-wide, approach (20). A UK-based analysis found that allocating breast cancer screening using risk-based estimates (from a combined predictor including a PRS) rather than age-based estimates would improve cost-effectiveness and the benefit-to-harm ratio over current guidelines (26). While these cases suggest the value of genetic testing in their specific use cases, they may underestimate the potential benefit, since multiple PRS can be estimated from a single genotype array. It is possible that there would be a significant health and economic benefit from genotyping once and receiving concurrent risk predictions for multiple diseases, optimizing treatment or screening for each.
Conclusions
Sixteen years since the human genome sequence was finished and nearly 13 years since the GWAS era began, PRS have emerged as a powerful tool to predict genetic predisposition to disease. For the seven diseases we evaluate here, the addition of PRS generally increased the accuracy of existing risk models of established risk factors, with the resulting improved risk prediction models affecting clinical management (diagnostic screening and/or treatment) in sizeable fractions of patients (~10% in the case of breast cancer). While these studies demonstrate the potential clinical impact and benefits of using PRS, there are still open questions regarding their eventual utility.
The utility of PRS for informing disease risk is further evidenced by its practical implementation as a one-time, minimally invasive DNA extraction (e.g. saliva or blood draw) at any point in a lifetime, coupled with low-cost array genotyping and, in the future, genome sequencing. A single individual’s genotype data allows for the parallel calculation of PRS for many diseases. From this single test, preexisting risk prediction models for multiple diseases appear to be improved, and lifetime risk trajectories can be estimated. In the future, these risk estimates may be used to guide screening frequencies, therapeutic interventions and targeted recommendations for lifestyle. Regardless, the predictive accuracy of PRS will continue to improve with larger and more diverse cohorts as well as improved methods to derive and apply PRS, all of which are likely to increase the potential clinical utility of PRS and accelerate translation.