Assessment of interactions between 205 breast cancer susceptibility loci and 13 established risk factors in relation to breast cancer risk, in the Breast Cancer Association Consortium

Abstract
Background: Previous gene-environment interaction studies of breast cancer risk have provided sparse evidence of interactions. Using the largest dataset available to date, we performed a comprehensive assessment of potential effect modification of 205 common susceptibility variants by 13 established breast cancer risk factors, including replication of previously reported interactions.
Methods: Analyses were performed using 28 176 cases and 32 209 controls genotyped with the iCOGS array and 44 109 cases and 48 145 controls genotyped with the OncoArray from the Breast Cancer Association Consortium (BCAC). Gene-environment interactions were assessed using unconditional logistic regression and likelihood ratio tests for breast cancer risk overall and by estrogen-receptor (ER) status. Bayesian false discovery probability was used to assess the noteworthiness of the meta-analysed array-specific interactions.
Results: Noteworthy evidence of interaction at ≤1% prior probability was observed for three single nucleotide polymorphism (SNP)-risk factor pairs. SNP rs4442975 was associated with a greater reduction in risk of ER-positive breast cancer [odds ratio (OR)int = 0.85 (0.78-0.93), Pint = 2.8 × 10⁻⁴] and overall breast cancer [ORint = 0.85 (0.78-0.92), Pint = 7.4 × 10⁻⁵] in current users of estrogen-progesterone therapy compared with non-users. This finding was supported by replication, using OncoArray data, of the previously reported interaction between rs13387042 (r² = 0.93 with rs4442975) and current estrogen-progesterone therapy for overall disease (Pint = 0.004). The two other interactions suggested stronger inverse associations between SNP rs6596100 and ER-negative breast cancer with increasing parity and younger age at first birth.
Conclusions: Overall, our study does not suggest strong effect modification of common breast cancer susceptibility variants by established risk factors.


Introduction
Breast cancer is a complex disease to which both environmental and genetic factors contribute. Well-established modifiable and non-modifiable environmental risk factors include age at menarche, parity, age at first birth, breastfeeding, body mass index (BMI), use of menopausal hormonal therapy (MHT) and alcohol consumption. [1][2][3][4][5][6] In addition, risk is increased by mutations in high- and moderate-risk genes such as BRCA1, BRCA2, TP53, ATM and CHEK2, 7-14 and by multiple common, low-risk single nucleotide polymorphisms (SNPs) discovered through genome-wide association studies (GWAS). Approximately 170 genome-wide significant breast cancer susceptibility loci have been identified, including 65 novel loci associated with overall breast cancer and 10 loci associated with estrogen receptor (ER)-negative breast cancer that were recently identified through the OncoArray project. 15,16 Estimating the combined effects of genetic and environmental factors, including gene-environment (G x E) interactions, may improve breast cancer risk prediction and hence the identification of women at high risk for targeted prevention. However, developing such risk models requires knowledge of the joint effects of genetic and environmental risk factors, in particular of departures from a multiplicative model (that is, G x E interaction on the relative risk scale). 17 More importantly, G x E studies of individual susceptibility loci may also provide insight into the biological mechanisms that could mediate the causal effects of a risk factor on breast cancer.
Previous G x E interaction studies of breast cancer have reported nearly 30 potential G x E interactions, with little overall evidence of departures from the multiplicative model. 18,19 Most of these reported interactions have not been replicated in independent datasets. Two G x E interactions were replicated using data from the Breast Cancer Association Consortium (BCAC), 20 but not in a smaller study by the Breast and Prostate Cancer Cohort Consortium. 21 In this study, we assess interactions between 205 known common breast cancer susceptibility loci and 13 established environmental risk factors in relation to risk of overall and ER-specific breast cancer in women of European ancestry, using the largest dataset available to date, from BCAC. We also attempted to replicate previously reported potential G x E interactions. 18

Study population
We analysed data from 46 studies (16 prospective cohorts, 14 population-based case-control studies and 16 non-population-based studies) participating in BCAC (Supplementary Table 1, available as Supplementary data at IJE online). Participants were excluded if they were male, were of non-European descent, had breast tumours of unknown invasiveness, or had in situ or prevalent disease at the time of assessment. Women with unknown age at reference date (defined as date of diagnosis for cases and date of interview for controls) were also excluded. For each risk factor, only studies with risk factor information for at least 150 cases and 150 controls were included. All participating studies were approved by the relevant ethics committees, and informed consent was obtained from study participants.

Data harmonization and variable definition
Risk factor data from the different studies were harmonized according to a common data dictionary and were centrally quality controlled. For both case-control and cohort studies, epidemiological risk factor data were defined relative to the reference date (described above). We used reference age as a surrogate to classify women as probably premenopausal (<54 years) or postmenopausal (≥54 years). The environmental variables available for analysis were: age at menarche (per 2 years); ever parous (yes or no); for parous women, number of full-term pregnancies (1, 2, 3 and ≥4), age at first full-term pregnancy (per 5 years), ever breastfed (yes or no) and duration of breastfeeding (per 12 months); for all women, ever use of oral contraceptives (yes or no), adult BMI separately for pre- and postmenopausal women (per 5 kg/m²), adult height (per 5 cm), lifetime alcohol consumption (per 10 g/day) and current smoking (yes or no); and, for postmenopausal women, current use of combined estrogen-progesterone MHT (yes or no) and current use of estrogen-only MHT (yes or no).
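The per-unit codings above set the scale of each reported OR (for example, an OR for age at menarche refers to a 2-year difference). The sketch below shows one way a harmonized record might be rescaled. It assumes hypothetical field names (age_ref, age_menarche, height_cm, alcohol_g_day, bmi, n_fullterm, age_first_birth) that are not the actual BCAC data dictionary, and it covers only a subset of the variables.

```python
# Hypothetical sketch: rescale a harmonized record to the per-unit codings above.
# Field names are illustrative assumptions, not the BCAC data dictionary.

MENOPAUSE_AGE_PROXY = 54  # reference age used as a surrogate for menopausal status


def menopausal_status(age_ref: float) -> str:
    """Probably premenopausal (<54 years) or postmenopausal (>=54 years)."""
    return "post" if age_ref >= MENOPAUSE_AGE_PROXY else "pre"


def encode_covariates(rec: dict) -> dict:
    """Divide each continuous factor by its increment so one unit = one reported step."""
    out = {
        "menarche_per2y": rec["age_menarche"] / 2.0,
        "height_per5cm": rec["height_cm"] / 5.0,
        "alcohol_per10g": rec["alcohol_g_day"] / 10.0,
        "parous": int(rec["n_fullterm"] > 0),
    }
    status = menopausal_status(rec["age_ref"])
    bmi_units = rec["bmi"] / 5.0
    # BMI is analysed separately in pre- and postmenopausal women
    out["bmi_per5_pre"] = bmi_units if status == "pre" else None
    out["bmi_per5_post"] = bmi_units if status == "post" else None
    if out["parous"]:
        # Parity categories 1, 2, 3 and >=4, defined for parous women only
        out["n_fullterm_cat"] = min(rec["n_fullterm"], 4)
        out["afp_per5y"] = rec["age_first_birth"] / 5.0
    return out


example = encode_covariates({
    "age_ref": 60, "age_menarche": 13, "height_cm": 165, "alcohol_g_day": 5,
    "n_fullterm": 5, "bmi": 27.5, "age_first_birth": 25,
})
print(example)
```

Modelling the rescaled values as continuous terms means the fitted OR refers to the stated increment, which is how the per-unit ORs in the Results should be read.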

Statistical analysis
Unconditional logistic regression was used to assess associations of SNPs and risk factors with breast cancer risk. Each SNP was included as a continuous variable, namely the imputation-based estimated number of minor alleles. SNP-risk factor interactions were assessed using likelihood ratio tests comparing unconditional logistic regression models with and without an interaction term between the SNP and the risk factor of interest. All analyses were adjusted for study, reference age and 10 ancestry-informative principal components. To allow the main effects of risk factors to differ by study design, we included an interaction term between the risk factor of interest and an indicator variable for study design (population-based vs non-population-based), along with the main effect of study design.
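The likelihood ratio test described above compares nested logistic models that differ only by the SNP × risk factor product term. The sketch below illustrates this on simulated data. It is a simplified, self-contained illustration: it uses a pure-Python Newton-Raphson fit (not the SAS or R routines used in the study), omits the adjustment for study, reference age, principal components and the design × risk factor term, and the simulated effect sizes are arbitrary.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; returns (beta, loglik)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information matrix
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - log1pexp(eta)
    return beta, loglik


def lrt_interaction(g, e, y):
    """1-df likelihood ratio test for a G x E product term; returns (ORint, LR, P)."""
    X0 = [[1.0, gi, ei] for gi, ei in zip(g, e)]
    X1 = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    _, ll0 = fit_logistic(X0, y)
    beta1, ll1 = fit_logistic(X1, y)
    lr = max(0.0, 2.0 * (ll1 - ll0))
    p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square with 1 df
    return math.exp(beta1[3]), lr, p_value


# Simulated illustration: allele dosage g (0/1/2), binary exposure e, arbitrary effects
rng = random.Random(1)
n = 3000
g = [float((rng.random() < 0.3) + (rng.random() < 0.3)) for _ in range(n)]
e = [float(rng.random() < 0.4) for _ in range(n)]
y = []
for gi, ei in zip(g, e):
    eta = -0.5 + math.log(0.9) * gi + math.log(1.3) * ei + math.log(0.8) * gi * ei
    y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

or_int, lr_stat, p_int = lrt_interaction(g, e, y)
print(f"ORint = {or_int:.2f}, LR = {lr_stat:.2f}, Pint = {p_int:.3g}")
```

Because the two models are nested and differ by one parameter, twice the difference in log-likelihoods is referred to a chi-square distribution with 1 degree of freedom.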
Analyses were conducted separately for overall breast cancer risk and for ER-subtype-specific breast cancer risk. They were also performed separately for women genotyped with iCOGS and with OncoArray, and the array-specific results were meta-analysed using a fixed-effects inverse-variance weighted model. Between-study heterogeneity in the G x E interaction estimates was assessed using Cochran's Q test and the I² index.
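A fixed-effects inverse-variance meta-analysis weights each array-specific log interaction OR by the inverse of its variance. Cochran's Q is the weighted sum of squared deviations from the pooled estimate, and I² = (Q − df)/Q, truncated at 0. The sketch below implements these standard formulas; the two input estimates in the usage line are hypothetical, not study results.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling of log-ORs with Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical iCOGS and OncoArray interaction estimates (log ORint, SE)
pooled, se, q, i2 = fixed_effects_meta([math.log(0.88), math.log(0.83)], [0.06, 0.05])
lo, hi = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"ORint = {math.exp(pooled):.2f} ({lo:.2f}-{hi:.2f}), Q = {q:.2f}, I2 = {i2:.0%}")
```

With only two arrays, Q has a single degree of freedom, so I² is imprecise and mainly descriptive.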
MHT was classified into estrogen-progesterone therapy (EPT) and estrogen-only therapy (ET). Models assessing the association with current MHT use by type were adjusted for former use of MHT and use of any MHT preparation other than the one of interest. All analyses of MHT use were restricted to postmenopausal women. Models evaluating the association with current smoking were adjusted for former smoking.
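With indicators for former MHT use and for current use of another preparation in the model, the coefficient for current use of the preparation of interest contrasts current users of that preparation with never users of MHT. The sketch below encodes this. It assumes hypothetical fields (menopausal_status, mht_status recorded as never/former/current, and mht_type) that are illustrative, not the study's variable names.

```python
def mht_covariates(rec, prep="EPT"):
    """Indicators for a model of current use of one MHT preparation ('EPT' or 'ET').

    Field names are hypothetical. Returns None for premenopausal women,
    because all MHT analyses are restricted to postmenopausal women.
    """
    if rec["menopausal_status"] != "post":
        return None
    current = rec["mht_status"] == "current"
    return {
        f"current_{prep}": int(current and rec["mht_type"] == prep),
        # Adjustment terms: the remaining reference category is never use of MHT
        "former_mht": int(rec["mht_status"] == "former"),
        "current_other_mht": int(current and rec["mht_type"] != prep),
    }


print(mht_covariates({"menopausal_status": "post", "mht_status": "current", "mht_type": "ET"}))
```

The same logic applies to current smoking, where adjustment for former smoking leaves never smokers as the reference group.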
To assess the noteworthiness of the observed G x E interactions, we calculated the Bayesian false discovery probability (BFDP) at five prior probabilities of a true association (20%, 10%, 1%, 0.1% and 0.01%). G x E interactions with BFDP <80% were considered noteworthy. This threshold assumed that a false non-discovery costs four times as much as a false discovery (giving a threshold of 4/(1 + 4) = 0.8), with a prior under which the true interaction odds ratio (OR) lies within 0.66-1.50 with 95% probability, as proposed by Wakefield et al. 53 We also computed a complementary measure, the approximate Bayes factor (ABF). The ABF approximates the ratio of the probability of the data under the null hypothesis of no interaction to the probability of the data under the alternative hypothesis, so a lower ABF favours the alternative hypothesis over the null hypothesis. For noteworthy G x E interactions, we performed stratified analyses by categories of the environmental risk factor using logistic regression. Analyses were carried out using SAS 9.4 or R version 3.4.2. Meta-analyses and tests of between-study heterogeneity were conducted using the R package 'meta' (version 4.9-2).
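Wakefield's ABF compares the density of the estimated log interaction OR under the null, N(0, V), with its density under the alternative, N(0, V + W). Here V is the variance of the estimate and W is the prior variance implied by the 0.66-1.50 range. The BFDP then combines the ABF with the prior odds of no interaction. The sketch below follows these formulas. The helper from_summary is a hypothetical convenience that takes V from the reported 95% CI and |z| from the reported two-sided P value; this is our reconstruction, not necessarily how the study computed its inputs.

```python
import math
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)
# Prior variance W: 95% prior probability that the interaction OR lies in ~0.66-1.50
PRIOR_W = (math.log(1.5) / Z975) ** 2
# Noteworthiness threshold from a 4:1 cost ratio: BFDP < R / (1 + R) = 0.8
COST_RATIO = 4.0
THRESHOLD = COST_RATIO / (1.0 + COST_RATIO)


def approx_bayes_factor(z, v, w=PRIOR_W):
    """Wakefield ABF = p(data | H0) / p(data | H1); lower values favour H1."""
    r = w / (v + w)
    return math.exp(-0.5 * z * z * r) / math.sqrt(1.0 - r)


def bfdp(abf, prior):
    """Bayesian false discovery probability for a prior probability of a true effect."""
    prior_odds_null = (1.0 - prior) / prior
    return abf * prior_odds_null / (abf * prior_odds_null + 1.0)


def from_summary(or_lo, or_hi, p_value):
    """Hypothetical helper: variance from a 95% CI and |z| from a two-sided P."""
    se = (math.log(or_hi) - math.log(or_lo)) / (2.0 * Z975)
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return z, se * se


# rs4442975 x current EPT, overall breast cancer (summary values from the Results)
z_all, v_all = from_summary(0.78, 0.92, 7.4e-5)
abf_all = approx_bayes_factor(z_all, v_all)
for prior in (0.2, 0.1, 0.01, 0.001, 0.0001):
    b = bfdp(abf_all, prior)
    print(f"prior {prior:g}: BFDP = {b:.2f}, noteworthy = {b < THRESHOLD}")
```

Under these assumptions, the sketch gives a BFDP of about 0.73 at 0.1% prior probability for overall breast cancer, and about 0.46 at 1% for the ER-positive estimate (CI 0.78-0.93, P = 2.8 × 10⁻⁴). These values are close to those reported in the Results.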

Results
The studies included in this analysis are listed in Supplementary Table 1, available as Supplementary data at IJE online. The number of cases and controls with data for each risk factor varied, ranging from 23 755 cases and 30 153 controls with data for parity to 5078 cases and 6867 controls with data for cumulative lifetime intake of alcohol in the iCOGS dataset, and from 37 863 cases and 44 533 controls with data for parity to 12 213 cases and 13 232 controls with data for lifetime alcohol intake in the OncoArray dataset (Supplementary Tables 4 and 5, available as Supplementary data at IJE online).
The SNP associations with risk of overall and ER-subtype-specific breast cancer were consistent with those reported in the literature 15,16 (Supplementary Tables 2 and 3, available as Supplementary data at IJE online). The associations of the environmental risk factors with breast cancer risk in the population-based studies were as expected: in brief, age at menarche, being parous, number of full-term pregnancies, ever breastfeeding, cumulative duration of breastfeeding and premenopausal BMI were inversely associated with breast cancer risk, whereas age at first full-term pregnancy, ever use of oral contraceptives, postmenopausal BMI, current use of EPT, adult height, current smoking and cumulative alcohol consumption were all positively associated with breast cancer risk.
We identified three SNP-risk factor interactions as noteworthy (BFDP <0.8) at 1% prior probability (Table 2). The strongest G x E interaction was between SNP rs4442975 and current use of EPT for overall breast cancer [ORmeta-int = 0.85, 95% confidence interval (CI) = 0.78-0.92, Pmeta-int = 7.4 × 10⁻⁵, BFDP = 0.73 at 0.1% prior probability]. The minor allele of rs4442975 was associated with a greater reduction in breast cancer risk in current users of EPT (ORmeta = 0.74, 95% CI = 0.69-0.80) than in never users of MHT (ORmeta = 0.87, 95% CI = 0.84-0.90) (Figure 1A). This interaction was also noteworthy at 1% prior probability for ER-positive breast cancer (ORmeta-int = 0.85, 95% CI = 0.78-0.93, Pmeta-int = 2.8 × 10⁻⁴, BFDP = 0.46). The association of rs4442975 with reduced risk of ER-positive breast cancer was stronger in current users of EPT (ORmeta = 0.73, 95% CI = 0.68-0.79) than in never users of MHT (ORmeta = 0.86, 95% CI = 0.83-0.89) (Figure 1B).
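In a logistic model with a SNP × current EPT product term, the per-allele OR among current EPT users equals the per-allele OR in the reference group (never MHT users) multiplied by the interaction OR. As a rough consistency check, the reported estimates can be combined as follows. This is only an approximation, because the stratum-specific ORs come from separately fitted stratified models rather than from the interaction model itself.

```latex
\mathrm{OR}_{G \mid \mathrm{EPT}} = \exp(\beta_G + \beta_{G \times E}) = \mathrm{OR}_{G \mid \mathrm{never}} \times \mathrm{OR}_{\mathrm{int}}
\text{overall: } 0.87 \times 0.85 \approx 0.74, \qquad \text{ER-positive: } 0.86 \times 0.85 \approx 0.73
```

Both products agree with the per-allele ORs estimated directly among current EPT users.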
The two other noteworthy SNP-risk factor interactions were found for ER-negative breast cancer risk. The interaction between rs6596100 and number of full-term pregnancies was noteworthy at 1% prior probability. The minor allele of rs6596100 was associated with reduced risk of overall breast cancer (ORmeta = 0.96, 95% CI = 0.94-0.98) and of ER-positive breast cancer (ORmeta = 0.94, 95% CI = 0.92-0.96), but not of ER-negative breast cancer (ORmeta = 1.01, 95% CI = 0.97-1.05). Among parous women, the rs6596100-associated risk of ER-negative breast cancer appeared to decrease with increasing number of full-term pregnancies. The estimated per-allele ORmeta was 1.06 (95% CI = 0.95-1.17) for women who had had one full-term pregnancy and 0.92 (95% CI = 0.82-1.04) for women who had had four or more (Figure 1C).
For parous women, we also observed noteworthy evidence that the ER-negative breast cancer risk associated with rs6596100 was modified by age at first full-term pregnancy (ORmeta-int = 1.12, 95% CI = 1.05-1.19, Pmeta-int = 3.3 × 10⁻⁴, BFDP = 0.56). The per-allele OR for ER-negative breast cancer was below 1 for women whose first full-term pregnancy occurred before age 20 years (ORmeta = 0.90, 95% CI = 0.79-1.03). It was above 1 for women with a first full-term pregnancy at age ≥30 years (ORmeta = 1.10, 95% CI = 0.97-1.24) (Figure 1D), although both stratum-specific confidence intervals include 1. However, we observed between-study heterogeneity for this interaction (Supplementary Figure 4, available as Supplementary data at IJE online). Several other interactions were noteworthy (BFDP <0.8) at 5% prior probability (Supplementary Table 6, available as Supplementary data at IJE online). Meta-analysed results of all G x E interactions for overall and ER-subtype-specific risk are shown in Supplementary Tables 7-9, available as Supplementary data at IJE online.
In replication analyses in the independent subset of OncoArray data, we found evidence for two previously reported interactions (Supplementary Table 10, available as Supplementary data at IJE online). For current EPT use and rs13387042, we estimated an interaction OR for overall breast cancer of 0.80 (95% CI = 0.69-0.93, Pint = 0.004). We had previously reported an interaction OR of 0.83 (95% CI = 0.74-0.94, Pint = 2.43 × 10⁻³) for this pair. 20 Because rs13387042 is in strong linkage disequilibrium with rs4442975, this result is consistent with the interaction observed for rs4442975 in the full dataset.
In addition, we observed evidence for a G x E interaction between rs941764 and cumulative lifetime alcohol intake (<20 vs ≥20 g/day) for ER-negative breast cancer risk (ORint = 0.64, 95% CI = 0.45-0.92, Pint = 0.01), compared with ORint = 0.53 (95% CI = 0.36-0.76, Pint = 6.8 × 10⁻⁴) in Rudolph et al. 54 The corresponding meta-analysed interaction OR (per 10 g/day of cumulative lifetime alcohol intake) based on the OncoArray and iCOGS datasets was 0.90 (95% CI = 0.81-0.99, Pint = 0.03). The interaction between SNP rs3817198 and number of children among parous women had shown the strongest evidence for overall breast cancer risk in previous analyses (ORint = 1.06, 95% CI = 1.04-1.08, Pint = 2.4 × 10⁻⁶). 20 For this interaction, the replication analyses showed weak evidence of interaction in the opposite direction (ORint = 0.94, 95% CI = 0.94-1.00, Pint = 0.03).

Discussion
In this study, we evaluated all known common susceptibility loci for interactions with breast cancer risk factors and found little evidence of departures from a multiplicative model. We describe G x E interactions as modification, by epidemiological risk factors, of the association between SNPs and breast cancer risk. Because an interaction term is symmetric, the same findings can equally be interpreted as SNPs modifying the association of risk factors with breast cancer risk. We identified three noteworthy (BFDP <0.8) G x E interactions at 1% prior probability. The strongest evidence was for effect modification between rs4442975 and current use of EPT for overall and ER-positive breast cancer risk. In addition, we found evidence of interactions of rs6596100 with number of full-term pregnancies and with age at first full-term pregnancy for ER-negative breast cancer risk.
The SNP rs4442975 is located in an intergenic region on the long arm of chromosome 2 (2q35). Another SNP in the same genomic region, rs13387042, was previously reported to interact with current use of EPT. 20 We replicated this interaction between rs13387042 and current use of EPT using the OncoArray dataset. The two SNPs are highly correlated (r² = 0.93), and conditional analysis yielded a significant association only for rs4442975, indicating that these results reflect the same interaction. Fine-mapping and functional analyses have identified rs4442975 as the most likely causal variant in this region. 43 Thus, despite the small difference in risk estimates between never users and current users of EPT, replication of this G x E interaction reinforces our previous finding and points to a possible role of the IGFBP5 gene and the estrogen pathway in breast cancer.
Functional analyses indicate that rs4442975 lies near a transcriptional enhancer that physically interacts with the IGFBP5 promoter. This suggests that the T allele of rs4442975 decreases susceptibility to breast cancer via increased expression of insulin-like growth factor binding protein 5 (IGFBP5). 43 IGFBP5 is a key member of the insulin-like growth factor (IGF) axis, which plays an important role in cellular differentiation, proliferation and apoptosis in breast cancer. 55 Activation of IGF receptors by IGF causes phosphorylation of insulin receptor substrates (IRS-1 and IRS-2). This in turn activates multiple downstream signalling pathways that play a role in breast carcinogenesis, such as the Ras/mitogen-activated protein kinase (MAPK) and phosphoinositide 3-kinase (PI3K)/Akt serine-threonine kinase pathways. 56,57 Estrogen can stimulate the IGF pathway by increasing expression of both the insulin-like growth factor 1 receptor and IRS-1. Some studies have also reported a positive correlation between IGFBP5 overexpression and the presence of ER in breast cancer cell lines. Progesterone has been shown to increase levels of IRS-2, sensitizing breast cancer cells to downstream signalling pathways such as MAPK and Akt. [58][59][60] It is plausible that exogenous exposure to estrogen and progesterone through EPT may affect regulation of the IGF pathway and thereby modulate germline IGFBP5 variant-related susceptibility to breast cancer. Note, however, that two other independent breast cancer risk variants in this region (tagged by rs16857609 13 and a 1.3 kb insertion/deletion 49) are also believed to target IGFBP5, but we found no evidence of interactions between these variants and current EPT use.
Women who are young at first pregnancy are known to have higher circulating levels of sex hormone-binding globulin and prolactin but lower total estrogen levels. 61,62 Likewise, women who have had multiple full-term pregnancies have lower overall lifetime exposure to estrogen. 61,63,64 Among parous women, the association of rs6596100 with ER-negative breast cancer risk was modified by number of full-term pregnancies and by age at first full-term pregnancy. Based on INQUISIT, 15 the predicted target genes of rs6596100 and highly correlated SNPs are heat shock protein family A member 4 (HSPA4) and AF4/FMR2 family member 4 (AFF4). INQUISIT predicts HSPA4 as the most likely target, because multiple correlated SNPs overlap the HSPA4 promoter region, distal regulatory elements and coding sequence. The HSPA4 gene encodes a heat shock protein (HSP) belonging to the HSP70 family. The mechanisms underlying the relationship between rs6596100 and these pregnancy-related risk factors are currently unknown. A lower estrogenic milieu due to reproductive factors could plausibly affect the formation of multiprotein complexes between steroid receptors such as ER and HSPs. This could in turn affect signalling pathways known to be involved in breast carcinogenesis, such as the Wnt, ErbB, and serine/threonine and tyrosine protein kinase pathways. Although there is some biological plausibility for the observed interactions with rs6596100, the findings could nevertheless be due to chance and thus require independent replication.
The SNP rs941764 is located on chromosome 14 in an intron of the CCDC88C gene. 15,22 Effect modification of rs941764-associated ER-negative breast cancer risk by lifetime alcohol intake was first reported by Rudolph et al. 54 We replicated this G x E interaction in an independent dataset in our study. Mutations in this gene region have been associated with dysregulation of Wnt signalling in neural disorders such as congenital hydrocephalus. 65 The gene encodes a Hook-related protein (HkRP2) that binds Dishevelled, an important scaffold protein in the Wnt signalling pathway, thereby affecting all downstream activity. 65 The role of alcohol in the initiation and progression of breast cancer is well recognized and presumably involves multiple cellular and molecular mechanisms, including the EGFR/ErbB2 pathways. Downstream of the EGFR/ErbB2 pathways lie multiple pathways, such as the MAPK and Wnt/GSK3β/β-catenin pathways. 66 Alcohol consumption could therefore affect the risk of ER-negative breast cancer through dysregulation of Wnt signalling.
Our study provides the most comprehensive evaluation to date of potential effect modification of all known common genetic susceptibility variants by environmental risk factors for breast cancer, and our findings are based on the largest available breast cancer dataset. Despite its large sample size, the study may remain statistically underpowered, given the modest effect sizes of most common variants associated with breast cancer risk. This applies particularly to risk factors with fewer available data (Supplementary Table 11, available as Supplementary data at IJE online). 18 Statistical power was further reduced in subtype-specific analyses because of smaller sample sizes, especially for ER-negative breast cancer (10 896 ER-negative cases in the combined iCOGS and OncoArray dataset). 18 The lack of strong effect modification could also reflect the generally weak to moderate associations of environmental risk factors with breast cancer risk (MHT use being an exception), together with the modest associations of common genetic variants. A further limitation of our study is that the findings may not be generalizable to other racial/ethnic groups, because the analyses were restricted to women of European ancestry.
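Even large samples give limited power for modest interaction ORs. The sketch below illustrates this with the approximate power of a two-sided Wald test for a log interaction OR given its standard error. It is a generic normal-approximation calculation, not the power analysis behind Supplementary Table 11. The example standard error of about 0.042 is simply what the 95% CI of 0.78-0.92 for rs4442975 × current EPT implies. The significance levels and the assumption that the standard error scales roughly as 1/√n are illustrative choices.

```python
import math
from statistics import NormalDist


def wald_power(or_int, se, alpha=0.05):
    """Approximate two-sided power of a Wald test for log(or_int) with standard error se."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(math.log(or_int)) / se
    # Probability that the test statistic falls beyond either critical value
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


se_full = 0.042  # implied by the 95% CI 0.78-0.92 reported for rs4442975 x current EPT
se_quarter = se_full * 2.0  # assumption: SE ~ 1/sqrt(n), so about a quarter of the sample
for alpha in (0.05, 1e-4):
    for or_int in (0.95, 0.90, 0.85):
        print(alpha, or_int,
              round(wald_power(or_int, se_full, alpha), 2),
              round(wald_power(or_int, se_quarter, alpha), 2))
```

Power falls quickly as the interaction OR approaches 1 or as the standard error grows. This is consistent with the reduced power noted above for subtype-specific analyses and for risk factors with fewer available data.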
In conclusion, our analyses suggest that the joint effects of most breast cancer susceptibility loci and environmental risk factors are consistent with a multiplicative model. The strongest evidence for an interaction was between the candidate causal variant rs4442975 at 2q35 and current use of EPT. This interaction is supported by a plausible underlying biological mechanism, but further epidemiological and functional validation will be required to determine whether it is genuine. The newly reported results for ER-negative breast cancer risk generate plausible biological hypotheses and may inform future functional studies. Overall, our results do not suggest strong modification, by established epidemiological risk factors, of the association between breast cancer susceptibility loci and breast cancer risk.