Prediction of Breast Cancer Risk Based on Profiling With Common Genetic Variants

Background: Data for multiple common susceptibility alleles for breast cancer may be combined to identify women at different levels of breast cancer risk. Such stratification could guide preventive and screening strategies. However, empirical evidence for genetic risk stratification is lacking. Methods: We investigated the value of using 77 breast cancer-associated single nucleotide polymorphisms (SNPs) for risk stratification, in a study of 33 673 breast cancer cases and 33 381 control women of European origin. We tested all possible pair-wise multiplicative interactions and constructed a 77-SNP polygenic risk score (PRS) for breast cancer overall and by estrogen receptor (ER) status. Absolute risks of breast cancer by PRS were derived from relative risk estimates and UK incidence and mortality rates. Results: There was no strong evidence for departure from a multiplicative model for any SNP pair. Women in the highest 1% of the PRS had a three-fold increased risk of developing breast cancer compared with women in the middle quintile (odds ratio [OR] = 3.36, 95% confidence interval [CI] = 2.95 to 3.83). The ORs for ER-positive and ER-negative disease were 3.73 (95% CI = 3.24 to 4.30) and 2.80 (95% CI = 2.26 to 3.46), respectively. Lifetime risk of breast cancer for women in the lowest and highest quintiles of the PRS were 5.2% and 16.6% for a woman without family history, and 8.6% and 24.4% for a woman with a first-degree family history of breast cancer. Conclusions: The PRS stratifies breast cancer risk in women both with and without a family history of breast cancer. The observed level of risk discrimination could inform targeted screening and prevention strategies. Further discrimination may be achievable through combining the PRS with lifestyle/environmental factors, although these were not considered in this report.


Breast cancer is the most common cancer among Western women, with approximately 1.67 million cases diagnosed annually worldwide (1). Strategies such as endocrine risk-reducing medication and early detection by breast cancer screening can reduce the burden of disease but have disadvantages including side effects, overdiagnosis, and increased cost (2)(3)(4). Stratification of women according to the risk of developing breast cancer could improve risk reduction and screening strategies by targeting those most likely to benefit (5)(6)(7)(8).
Both genetic and lifestyle factors are implicated in the aetiology of breast cancer. Women with a history of breast cancer in a first-degree relative are at approximately two-fold higher risk than women without a family history (9). Rare high-risk mutations particularly in the BRCA1 and BRCA2 genes explain less than 20% of the two-fold familial relative risk (FRR) (10) and account for a small proportion of breast cancer cases in the general population. Low frequency variants conferring intermediate risk, such as those in CHEK2, ATM, and PALB2, explain 2% to 5% of the FRR. Genome-wide association studies (GWAS) have led to the discovery of multiple common, low-risk variants (single nucleotide polymorphisms [SNPs]) associated with breast cancer risk (11), many of which are differentially associated by estrogen receptor (ER) status (12,13). Recently, new risk-associated variants have been identified in a large-scale replication study conducted by the Breast Cancer Association Consortium (BCAC) as part of the Collaborative Oncological Gene-Environment Study (COGS). SNPs were genotyped in over 40 000 breast cancer cases and 40 000 control women, using a custom array (iCOGS). This experiment increased the number of SNPs robustly associated with breast cancer from 27 to more than 70 and identified additional variants specific to ER-negative breast cancer (14)(15)(16)(17).
Risks conferred by individual SNPs are not sufficiently large to be useful in risk prediction. However, the combined effect of multiple SNPs could achieve a degree of risk discrimination that is useful for population-based programmes of breast cancer prevention and early detection (8,18). In this report, we investigated the value of using all 77 breast cancer susceptibility loci identified to date for risk stratification. Previous studies of polygenic risk have assumed a log-additive model for combining SNPs; however, this assumption needs to be evaluated empirically. We first assessed whether interaction between SNP pairs could influence the joint contribution of genetic factors on disease risk by testing for all possible pair-wise interactions between SNPs. We then constructed polygenic risk scores (PRSs) to capture the combined effects of the 77 SNPs on overall breast cancer risk, as well as on the risk of ER-positive and ER-negative disease separately. We estimated absolute risks of developing breast cancer for different levels of the PRS, accounting for the competing risk of mortality from other causes. Effect sizes were confirmed in one large study (pKARMA) that was not part of any SNP discovery set. We discuss the degree of breast cancer risk stratification obtained in women with and without a family history of breast cancer.

Study Subjects and Genotyping
Study participants for the primary analyses (set 1) were 89 049 women of European origin participating in 41 studies in BCAC. All studies were approved by the relevant institutional review boards, and all individuals gave written informed consent. Samples were genotyped using a custom Illumina iSelect array (iCOGS) comprising 211 155 SNPs (15). For some analyses, a further 72 014 women in BCAC genotyped for the relevant SNPs in earlier experiments were included (set 2). For PRS analyses (67 054 women), studies that oversampled breast cancer cases with a family history (21 995 women) were excluded. Supplementary Tables 1-3 (available online) show study designs and numbers of breast cancer cases and control women included.
Analyses were based primarily on variants reported to be associated (at $P < 5 \times 10^{-8}$) by COGS or previous publications, with either breast cancer overall or ER-negative disease. SNPs and regions included are summarized in Supplementary Table 4 (available online).

Statistical Methods
Tests for pair-wise SNP*SNP interactions (departures from a multiplicative model) were carried out using logistic regression, with breast cancer as the outcome. The two SNPs were each coded as a categorical variable (ie, fitting a separate parameter for heterozygous and risk-allele homozygous genotypes), while the interaction term (SNP1*SNP2) was included as a continuous covariate. All analyses were adjusted for study and seven principal components (PCs) to account for population substructure (15). Additional interaction tests used are described in the Supplementary Methods (available online).
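As a sketch only, the pair-wise model described above can be written as follows. The symbols $\alpha$, $\beta$, $\gamma$, and $g_1$, $g_2$ (risk-allele counts for the two SNPs) are our own notation, and coding the continuous interaction term as the product of allele counts is an assumption rather than something the text states explicitly.

```latex
\operatorname{logit} P(\text{case}) = \alpha
  + \beta_{1}^{\mathrm{het}}\, I(g_1 = 1) + \beta_{1}^{\mathrm{hom}}\, I(g_1 = 2)
  + \beta_{2}^{\mathrm{het}}\, I(g_2 = 1) + \beta_{2}^{\mathrm{hom}}\, I(g_2 = 2)
  + \gamma\, g_1 g_2
  + \text{(study and principal-component covariates)}
```

Under this notation, the test for departure from a multiplicative model is a test of $\gamma = 0$.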
To investigate the association between breast cancer risk and the combined effects of 77 SNPs, a PRS was derived for each individual using the formula:

$$\mathrm{PRS} = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \dots + \beta_n x_n$$

where $\beta_k$ is the per-allele log odds ratio (OR) for breast cancer associated with the minor allele for SNP $k$, $x_k$ is the number of alleles for the same SNP (0, 1, or 2), and $n = 77$ is the total number of SNPs. Thus, the PRS summarizes the combined effect of the SNPs, ignoring departures from a multiplicative model (18). SNPs and corresponding odds ratios used in derivation of PRSs are summarized in Supplementary Table 4 (available online). Logistic regression models were used to estimate the odds ratios for breast cancer by percentile of the PRS, with the middle quintile category (40th to 60th percentile) as the reference. Observed odds ratios for breast cancer by percentile of the PRS were compared with predicted odds ratios under a multiplicative polygenic model of inheritance. Modification of the PRS by age or by family history of breast cancer in a first-degree relative was evaluated by fitting additional interaction terms in the model. All tests of statistical significance were two-sided. The thresholds for statistical significance are indicated below.
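The score defined above is a weighted sum: each SNP's allele count (0, 1, or 2) is multiplied by its per-allele log odds ratio and the products are added. A minimal Python sketch follows; the SNP identifiers and odds ratios are hypothetical placeholders, not values from Supplementary Table 4.

```python
import math
from typing import Mapping


def polygenic_risk_score(
    log_odds_ratios: Mapping[str, float],
    allele_counts: Mapping[str, int],
) -> float:
    """Sum of per-allele log odds ratios weighted by allele count (0, 1 or 2)."""
    score = 0.0
    for snp, beta in log_odds_ratios.items():
        count = allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2")
        score += beta * count
    return score


# Hypothetical SNP names and per-allele odds ratios, for illustration only.
weights = {
    "snp_a": math.log(1.10),
    "snp_b": math.log(0.95),
    "snp_c": math.log(1.26),
}
genotype = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

prs = polygenic_risk_score(weights, genotype)
relative_odds = math.exp(prs)  # odds relative to a carrier of no minor alleles
```

Because the weights are log odds ratios, exponentiating the score gives the product of the per-allele odds ratios, which is what a multiplicative model implies.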
The absolute risk of overall breast cancer, ER-positive and ER-negative breast cancer for individuals in each risk category, was calculated taking into account the competing risk of dying from other causes apart from breast cancer. Approximate confidence limits for the absolute risk were derived from the variance-covariance matrix of the log (relative risk) parameters in the logistic regression analysis. Detailed methods are provided in Supplementary Methods (available online).
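To illustrate the idea of a cumulative risk that accounts for competing mortality, the sketch below accumulates, year by year, the probability of developing breast cancer among women who are still alive and cancer-free. It is not the procedure from the Supplementary Methods: the rates are hypothetical, and applying the relative risk directly to the population incidence (without rescaling so that risks average to the population rate) is a simplifying assumption.

```python
import math
from typing import Sequence


def cumulative_risk(
    incidence: Sequence[float],
    other_mortality: Sequence[float],
    relative_risk: float,
) -> float:
    """Cumulative breast cancer risk over consecutive one-year age bands.

    incidence and other_mortality are annual rates per person; the
    relative risk scales the incidence in every age band.
    """
    if len(incidence) != len(other_mortality):
        raise ValueError("rate sequences must have the same length")
    surviving = 1.0  # probability of being alive and cancer-free so far
    risk = 0.0
    for rate, death_rate in zip(incidence, other_mortality):
        hazard = rate * relative_risk
        total = hazard + death_rate
        if total > 0.0:
            # Share of this year's exits from the risk set that are cancers.
            risk += surviving * (hazard / total) * (1.0 - math.exp(-total))
        surviving *= math.exp(-total)
    return risk


# Hypothetical flat rates over 50 years, for illustration only.
example_incidence = [0.002] * 50
example_mortality = [0.005] * 50

risk_low = cumulative_risk(example_incidence, example_mortality, 0.5)
risk_high = cumulative_risk(example_incidence, example_mortality, 2.0)
```

Setting the competing mortality to zero recovers the familiar expression 1 − exp(−cumulative hazard), which is a useful sanity check on the bookkeeping.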

Pairwise Multiplicative SNP*SNP Interaction Analyses
Data on 46 450 breast cancer cases and 42 599 controls from 41 studies were included in the interaction analyses (Supplementary Table 3, available online). There was no strong evidence for interaction between any particular SNP pair after Bonferroni correction (Supplementary Tables 5-6, available online). Plots of expected vs observed log10 P values for SNP*SNP interaction tests showed slight departure from the null hypothesis of multiplicative effects (Supplementary Figure 1, A and B, available online), and the number of interactions with P_interaction values of less than .01 was larger than expected by chance (Table 1). To investigate whether there was an excess of synergistic or antagonistic interactions, the direction of the interaction term relative to the main effects was examined for SNP pairs with P_interaction values of less than .01. For case-control analyses, 47% of interactions were synergistic and 53% antagonistic, and for case-only analyses 53% were synergistic and 46% antagonistic. These proportions were not statistically significantly different from the null expectation (P > .05). Meta-analysis of SNP*SNP interaction test results from the iCOGS dataset with those from 72 014 additional women in BCAC yielded similar results (Supplementary Table 7, available online). Given that no SNP pair showed strong evidence for departure from the multiplicative model, subsequent analyses were based on a PRS that included the main effects of SNPs but no SNP*SNP interaction terms.
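The scale of the multiple-testing problem can be sketched with a short calculation. It assumes that exactly 77 SNPs were tested in all pairs and that the family-wise error rate was 0.05; neither the number of tests actually performed nor the threshold used is stated in this section, so both are assumptions.

```python
import math

n_snps = 77  # assumed number of SNPs entering the pair-wise tests
n_pairs = math.comb(n_snps, 2)  # number of distinct SNP pairs

family_wise_alpha = 0.05  # assumed family-wise error rate
bonferroni_threshold = family_wise_alpha / n_pairs

# Number of pairs expected to reach P < .01 by chance if all tests
# were independent and every null hypothesis were true.
expected_below_01 = 0.01 * n_pairs

print(f"{n_pairs} pairs, Bonferroni threshold {bonferroni_threshold:.2e}")
print(f"about {expected_below_01:.0f} pairs expected at P < .01 by chance")
```

The chance expectation is the baseline against which the observed excess of interactions at P < .01 is judged.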

Association Between PRS and Breast Cancer Risk
As predicted by the polygenic, multiplicative model, the number of breast cancer risk alleles and the 77-SNP PRS approximated a normal distribution for both breast cancer cases and control women (Figure 1). The odds ratios for developing breast cancer by percentiles of the PRS, compared with women in the middle quintile (40th to 60th percentile), are shown in Figure 2A. The observed odds ratios were similar to the odds ratios predicted under a polygenic multiplicative model; the 95% confidence interval (CI) included the predicted odds ratio at all points except the 80th to 90th percentile (Figure 2A; Supplementary Table 3). A validation analysis including only one large study (pKARMA) that was not part of any SNP discovery analyses found similar odds ratio estimates to those in the remaining studies, except for the 60% to 80% and 90% to 95% categories, for which estimates were higher in pKARMA (Table 4; Supplementary Table 9, available online). The log OR per unit SD was also similar for pKARMA alone (log OR per unit SD = 0.4). The associations between PRS and breast cancer in different age groups are summarized in Table 3 and Supplementary Figure 2 (available online). There was a statistically significant interaction between PRS and age, with the association between PRS and breast cancer risk decreasing with age (Table 3).
A family history of breast cancer in one or more affected first-degree relatives was reported by 18.5% of breast cancer cases and 11.1% of control women. The odds ratio for family history was attenuated from 1.81 to 1.68 (12.6% attenuation) after adjusting for the PRS (Table 2). At younger ages (<40 years), there was less attenuation (from 2.90 to 2.76, 4.6% attenuation) (Table 2). The joint effects of the PRS and family history were largely consistent with a multiplicative model (P_interaction = .34 for the interaction between the PRS and family history; data not shown); however, we observed a stronger effect of family history for women at the lowest 1% of the PRS (Supplementary Table 10, available online).
The discriminative accuracy of the PRS, as measured by the C-statistic, was 0.622 (95% CI = 0.619 to 0.627); discrimination was [...]

Table footnotes: * Age of breast cancer cases (age at diagnosis) and control women (age at interview). CI = confidence interval; PRS = polygenic risk score; log OR = log odds ratio. † log OR for association between the PRS coded as a continuous variable and breast cancer risk (per unit SD of the PRS). ‡ OR per 10 years for interaction between PRS and age.
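One way to obtain predicted odds ratios by percentile under a multiplicative polygenic model is sketched below. It assumes the PRS is normally distributed in control women and that the log odds of disease are linear in the PRS; under those assumptions the PRS in cases is also normal with the same SD and a mean shifted by the log OR per SD. Defining percentiles in controls, and using the rounded value 0.4 quoted above for pKARMA, are illustrative choices, not the authors' stated procedure.

```python
from statistics import NormalDist


def _case_fraction_below(control_percentile: float, log_or_per_sd: float) -> float:
    """Fraction of cases whose PRS lies below a given control percentile."""
    if control_percentile <= 0.0:
        return 0.0
    if control_percentile >= 100.0:
        return 1.0
    cutoff = NormalDist().inv_cdf(control_percentile / 100.0)
    # PRS in cases (in control SD units): mean shifted by the log OR per SD.
    return NormalDist(mu=log_or_per_sd, sigma=1.0).cdf(cutoff)


def predicted_or(
    lower: float,
    upper: float,
    log_or_per_sd: float,
    reference: tuple = (40.0, 60.0),
) -> float:
    """Predicted OR for a percentile band relative to the reference band."""

    def case_control_ratio(lo: float, hi: float) -> float:
        case_fraction = (
            _case_fraction_below(hi, log_or_per_sd)
            - _case_fraction_below(lo, log_or_per_sd)
        )
        control_fraction = (hi - lo) / 100.0
        return case_fraction / control_fraction

    return case_control_ratio(lower, upper) / case_control_ratio(*reference)


# Illustrative call using the rounded log OR per SD quoted in the text.
top_one_percent = predicted_or(99.0, 100.0, 0.4)
bottom_one_percent = predicted_or(0.0, 1.0, 0.4)
```

Comparing such predictions with the odds ratios estimated directly from the data in each band is the kind of check the paragraph above describes.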
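The attenuation percentages quoted above are consistent with measuring the change on the log odds ratio scale rather than on the odds ratio scale. The short check below makes that reading explicit; the function name is ours.

```python
import math


def log_scale_attenuation(or_before: float, or_after: float) -> float:
    """Proportional reduction in the log odds ratio after adjustment."""
    return 1.0 - math.log(or_after) / math.log(or_before)


overall = log_scale_attenuation(1.81, 1.68)        # all ages
under_40 = log_scale_attenuation(2.90, 2.76)       # women younger than 40 years

print(f"all ages: {overall:.1%}, under 40: {under_40:.1%}")
```

On the odds ratio scale the same change would look smaller (1.81 to 1.68 is roughly a 7% drop), which is why the scale matters when comparing attenuation figures across reports.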

Absolute Risks of Developing Breast Cancer by Levels of PRS
The estimated risk of developing breast cancer by age 80 years for women in the lowest and highest 1% of the PRS was 3.5% (95% CI = 2.6% to 4.4%) and 29.0% (95% CI = 24.9% to 33.5%), respectively (Figure 3A). For the lowest and highest quintiles of the PRS, the risk was 5.3% (95% CI = 5.1% to 5.7%) and 17.2% (95% CI = 16.1% to 18.1%), respectively (data not shown). The corresponding risks of developing ER-positive disease were 4.1% and 15.7% for women in the lowest and highest quintiles, respectively, of the ER-positive PRS (averaged over all ER-negative PRS categories), whereas the highest lifetime risk for ER-negative disease was 2.4% (women in the highest quintile of ER-negative PRS and average ER-positive risk) (Figure 3). Lifetime risks of breast cancer for women in the lowest and highest quintiles of the PRS were 5.2% and 16.6% for a woman without family history and 8.6% and 24.4% for a woman with a first-degree family history of breast cancer (Figure 4). We estimated the 10-year absolute risk of breast cancer at different ages and evaluated the age at which women at different levels of the PRS reach a threshold of 2.4%, which corresponds to the average 10-year risk of breast cancer for women age 47 years. This threshold was reached at 32 years for women whose PRS is above the 99th percentile of the PRS, and 57 years for women in the 20th to 40th percentiles of the PRS, and was never reached for women in lower percentiles (Figure 3D). As expected, lifetime risks were higher, and the ages at which the 2.4% threshold was reached were lower for women with a family history of breast cancer (Figure 4).
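The threshold-age comparison can be sketched as a search for the first age at which the 10-year absolute risk reaches a chosen value. The age-specific rates below are hypothetical and chosen only so the example runs; they are not UK rates, and the relative risks are illustrative rather than estimates from this study.

```python
import math
from typing import Iterable, Mapping, Optional


def ten_year_risk(
    start_age: int,
    incidence: Mapping[int, float],
    other_mortality: Mapping[int, float],
    relative_risk: float,
) -> float:
    """10-year breast cancer risk from start_age, allowing for competing deaths."""
    surviving = 1.0
    risk = 0.0
    for age in range(start_age, start_age + 10):
        hazard = incidence[age] * relative_risk
        total = hazard + other_mortality[age]
        if total > 0.0:
            risk += surviving * (hazard / total) * (1.0 - math.exp(-total))
        surviving *= math.exp(-total)
    return risk


def age_threshold_reached(
    threshold: float,
    incidence: Mapping[int, float],
    other_mortality: Mapping[int, float],
    relative_risk: float,
    ages: Iterable[int],
) -> Optional[int]:
    """First age at which the 10-year risk meets the threshold, else None."""
    for age in ages:
        if ten_year_risk(age, incidence, other_mortality, relative_risk) >= threshold:
            return age
    return None


# Hypothetical annual rates by age (20 to 89), for illustration only.
toy_incidence = {age: 0.0001 * (age - 20) for age in range(20, 90)}
toy_mortality = {age: 0.001 for age in range(20, 90)}

age_high_risk = age_threshold_reached(0.024, toy_incidence, toy_mortality, 3.0, range(20, 80))
age_average = age_threshold_reached(0.024, toy_incidence, toy_mortality, 1.0, range(20, 80))
age_very_low = age_threshold_reached(0.024, toy_incidence, toy_mortality, 0.01, range(20, 80))
```

Returning `None` when the threshold is never met mirrors the observation above that women in the lowest percentiles never reach the 2.4% level.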

Discussion
In this report, we evaluated the degree of breast cancer risk stratification that can be attained in women of European ancestry using data for 77 common genetic variants, summarized as a PRS. Our results show that the PRS stratifies breast cancer risk in women without family history and refines genetic risk in women with a family history of breast cancer.
The PRS we used (sum of the minor alleles weighted by the per-allele log OR) is the most efficient, assuming that SNP odds ratios combine multiplicatively (ie, no interactions on a log-additive scale) (18). Evaluation of pairwise SNP interactions showed that this was a reasonable assumption. Although no individual interactions could be established, we observed an excess of multiplicative interactions at P less than .01. This could be the result of underlying population stratification not accounted for by principal components adjustment or reflect the presence of multiple interactions too weak to be established individually. A recent study also found no evidence for interactions among SNPs with weaker evidence for main effects (19). Although we did not test for higher order interactions among SNPs, consistency between empirical and predicted odds ratios assuming multiplicative effects suggests that across all possible multiway interactions the overall effect is close to multiplicative.
The 77-SNP PRS was associated with a larger effect than previously reported for a 10-SNP PRS (20). For example, our odds ratio for breast cancer for women in the highest compared with the middle quintile was 1.82 (95% CI = 1.73 to 1.90) vs 1.44 (95% CI = 1.35 to 1.53) for the 10-SNP PRS (20). A potential concern is that the PRS was constructed using iCOGS data that were, in part, the basis for discovery of many of the loci. This could lead to some upward bias in the odds ratio estimates (winner's curse); however, analyses based on a large study (pKARMA) that was not part of any discovery set obtained similar estimates indicating that any winner's curse effect is likely to be small.
There has been little evidence of differences by age in the per-allele odds ratio for individual SNPs. However, we observed a small but statistically significant decrease in odds ratio for PRS with increasing age. As expected, the odds ratio for family history was reduced after adjustment for the PRS. This attenuation (~12.6%) was consistent with the estimated fraction of the twofold FRR explained by the 77 SNPs under a polygenic risk model (15). The joint effects of PRS and family history were consistent with a multiplicative model. A stronger FRR was observed for women at the lowest percentile of the PRS, but this was based on small numbers and requires confirmation. The degree of attenuation of the family history odds ratio was lower below age 40 years, as a result of the higher FRR at young ages, suggesting that rarer genetic variants may be more important at young ages.
We calculated the absolute risk of developing breast cancer for women at different levels of genetic risk according to the PRS. The lifetime risk for women below the first and above the 99th percentile of the PRS was 3.5% (95% CI = 2.6% to 4.4%) and 29.0% (95% CI = 24.9% to 33.5%), respectively. UK NICE guidelines recommend enhanced surveillance for women with a family history with lifetime risk of developing breast cancer over 17% (21). Figure 3 indicates that the PRS alone could identify approximately 8% of all women in the UK population at this level of risk, regardless of family history or other risk factors; approximately 17% of all breast cancer cases in the population would be expected to occur among these women. By contrast, the low absolute risk of breast cancer among women at the lowest end of the risk distribution raises the possibility that such women might be recommended more limited surveillance. Women at different levels of the PRS reach the same 10-year risk threshold at different ages, supporting the notion that using SNP profiles rather than age alone as a criterion to offer routine mammographic screening could lead to more effective screening programs (6). The utility of such an approach would, however, depend on the acceptability of risk-based surveillance, together with health economic considerations. Prediction of subtype-specific breast cancer should also be informative for prevention (4). Recently updated NICE guidelines include recommendations to use endocrine treatments (tamoxifen and raloxifene) for primary prevention of breast cancer for women at moderate to high risk (21). These guidelines are based on risk of overall breast cancer for women with a family history of breast cancer. However, because these drugs prevent only ER-positive tumours, risk estimates incorporating the ER-positive PRS could better define the subset of women most likely to benefit.
Our sample was derived from studies in Europe, North America, and Australia and restricted to women of European origin. While the results should be widely applicable in these populations, additional studies will be required to develop and validate genetic profiles for other populations, in particular Asian and African populations, where SNP associations, background incidence rates and distribution of tumour characteristics are substantially different.
Our analysis summarized family history in terms of a single binary variable, but familial risk of breast cancer also depends on the number of affected and unaffected relatives and their ages. Risk prediction algorithms that combine full family history data with a polygenic component perform better than simpler models (22). It is possible to incorporate the current PRS into family history-based models for breast cancer, such as BOADICEA, to improve genetic risk prediction (23).
The COGS project includes the largest set of breast cancer studies with both phenotype and genotype information, and our analysis utilized by far the largest number of SNPs with confirmed associations with breast cancer, including all SNPs discovered to date. Further refinement of the risk stratification should be possible through incorporating additional SNPs exhibiting evidence for association, but not at formal genome-wide statistical significance, together with variants in genes conferring intermediate or high risk (15). The risk discrimination provided by the genetic profile, summarised in the PRS and family history, should be further improved by combining it with lifestyle risk factors, benign breast disease, and mammographic density (24,25,28). Although we did not consider lifestyle factors explicitly in this dataset, other large studies have found no good evidence for interactions between common susceptibility SNPs and lifestyle factors for breast cancer, suggesting that SNPs and lifestyle factors generally combine multiplicatively (26,27). Darabi et al. (25) estimated a C-statistic of 0.60 for lifestyle risk factors including mammographic density. By comparison, we estimated the C-statistic for the PRS to be 0.62. Assuming that the multiplicative model is correct, the C-statistic would increase to 0.66 with the addition of the lifestyle risk factors. If modifiable risk factors and the PRS act multiplicatively, targeting public health interventions to women at higher genetic risk should result in a larger absolute risk reduction. For example, the decision to prescribe hormone replacement therapy might be guided by the PRS (28). Similar considerations would apply to risk-reducing interventions such as preventive medication and oophorectomy.
Some limitations of this study should be noted. Although the study was extremely large, the numbers of breast cancer cases and control women were still too limited to provide precise estimates of relative risks in the extremes of the PRS (for example, the highest 1%). Numbers were also limited to explore the effects at very young ages, and estimates were less precise for ER-negative disease. There was heterogeneity among the studies, both in population and design, but we saw no evidence of heterogeneity in SNP odds ratios among studies, suggesting that the estimates should be broadly applicable. Oversampling for family history could have led to a bias in the odds ratios by PRS, and for this reason we excluded studies that were sampled on the basis of family history. Finally, we were not able to consider lifestyle/environmental risk factors in our model, as data on all of these risk factors were not consistently available across all studies. Interactions between the PRS and environmental factors will need to be explicitly tested for in future studies.
In previous reports, improvement in risk discrimination by genomic profiling over that conferred by known risk factors was not substantial (24,29), although better discrimination was obtained for certain subgroups of women (30,31). Previous analyses, however, were based on a much smaller set of SNPs than included in this report. This study provides precise empirical estimates of the combined effects of multiple SNPs and the level of risk stratification possible. These estimates may inform the debate on public health utility and implementation of the PRS in clinical practice. Our work suggests that the PRS, particularly when used in combination with other risk factors, could help identify subsets of women at different levels of risk, for whom management would differ. The PRS may facilitate early detection of cancers in younger women and, importantly, identify individuals at risk of specific subtypes of breast cancer. Finally, there is potential for a stronger impact in modifying environmental factors in women at higher risk of breast cancer. Prospective analyses of the 77-SNP PRS, in combination with other risk factors, will be required to validate the overall accuracy of risk prediction. Such a comprehensive risk prediction [...]

Figure legend (partial): The red line shows the 2.4% risk threshold corresponding to the risk for women age 47 years who were eligible for screening. Absolute risks were calculated using PRS relative risks estimated as described in Methods, and breast cancer incidence rates and mortality from other causes obtained from the UK National Office for Statistics.

Funding
The Spanish National Cancer Centre Breast Cancer Study (CNIO-BCS) was supported by the Genome Spain Foundation, the Red Temática de Investigación Cooperativa en Cáncer, and by grants from the Asociación Española Contra el Cáncer and the Fondo de Investigación Sanitario (PI11/00923, PI081120).
The California Teachers Study (CTS) was initially supported by the California Breast Cancer Act of 1993 and the California Breast Cancer Research Fund (contract 97-10500) and is currently funded through the National Institutes of Health (R01 CA77398). The CTS study was also funded by the Lon V. Smith Foundation (LVS39420) to HAC. Collection of cancer incidence data was supported by the California Department of Public Health as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885. The Esther Breast Cancer Study (ESTHER) was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. Additional cases were recruited in the context of the VERDI study, which was supported by a grant from the German Cancer Aid (Deutsche Krebshilfe).
The Familial Breast Cancer Study (FBCS) was supported by funds from Cancer Research UK (C8620/A8372, C8620/A8857), a US Military Acquisition (ACQ) Activity, an Era of Hope Award (W81XWH-05-1-0204) [...] Biological sample preparation for several studies was conducted at the Epidemiology Biospecimen Core Lab, supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485).

Notes
Author contributions: NM, DFE, and MG analyzed data relating to this manuscript and drafted the initial manuscript. DFE coordinated the BCAC and led the iCOGS genotyping. PH led the COGS collaboration. KM performed statistical analyses of the iCOGS data, with assistance from JT. JD provided bioinformatics support. MKB and QW coordinated the BCAC database. The remaining authors led individual studies and contributed to the design of the study, data collection, and revising the manuscript. The sponsors had no role in the study design, in the collection, analysis, or interpretation of data, in the writing of the report, or in the decision to submit the paper for publication.
We extend our thanks to the many women who generously took part in these studies. We also thank all the researchers, nurses, clinicians, technicians, and administrative staff who have enabled this work to be carried out. In particular, we thank: Andrew Lee, Ed Dicks, and the staff of the Centre for Genetic Epidemiology Laboratory, the staff of the CNIO genotyping unit, Sylvie LaBoissière and Frederic Robidoux and the staff of the McGill University and Génome Québec Innovation Centre, the staff of the Copenhagen DNA laboratory, and Julie M. Cunningham, Sharon A. Windebank, Christopher A. Hilker, Jeffrey Meyer and the staff of Mayo Clinic Genotyping Core Facility. The authors would also like to thank the West Midlands Cancer Intelligence Unit (WMCIU) for providing data on breast cancer incidence by ER status for 2010.