Fine-mapping identifies two additional breast cancer susceptibility loci at 9q31.2

We recently identified a novel susceptibility variant, rs865686, for estrogen receptor-positive breast cancer at 9q31.2. Here, we report a fine-mapping analysis of the 9q31.2 susceptibility locus using 43 160 cases and 42 600 controls of European ancestry ascertained from 52 studies and a further 5795 cases and 6624 controls of Asian ancestry from nine studies. Single nucleotide polymorphism (SNP) rs676256 was most strongly associated with risk in Europeans (odds ratio [OR] = 0.90 [0.88–0.92]; P-value = 1.58 × 10−25). This SNP is one of a cluster of highly correlated variants, including rs865686, that spans ∼14.5 kb. We identified two additional independent association signals demarcated by SNPs rs10816625 (OR = 1.12 [1.08–1.17]; P-value = 7.89 × 10−09) and rs13294895 (OR = 1.09 [1.06–1.12]; P-value = 2.97 × 10−11). SNP rs10816625, but not rs13294895, was also associated with risk of breast cancer in Asian individuals (OR = 1.12 [1.06–1.18]; P-value = 2.77 × 10−05). Functional genomic annotation using data derived from breast cancer cell-line models indicates that these SNPs localise to putative enhancer elements that bind known drivers of hormone-dependent breast cancer, including ER-α, FOXA1 and GATA-3. In vitro analyses indicate that rs10816625 and rs13294895 have allele-specific effects on enhancer activity and suggest chromatin interactions with the KLF4 gene locus. These results demonstrate the power of dense genotyping in large studies to identify independent susceptibility variants. Analysis of associations using subjects with different ancestry, combined with bioinformatic and genomic characterisation, can provide strong evidence for the likely causative alleles and their functional basis.


Introduction
Breast cancer is the most common female cancer worldwide, in both developed and less developed regions, including Asia and Africa. An estimated 1.38 million new breast cancer cases were diagnosed worldwide in 2008, and this burden is likely to increase in the coming decades as a result of population ageing and adoption of western lifestyles (1).
Susceptibility to breast cancer involves contributions from genetic, environmental, lifestyle and hormonal factors. Pathogenic mutations in the DNA-repair genes BRCA1 and BRCA2 confer high lifetime risks of the disease and are responsible for the majority of cases that occur in families with many affected members, but they account for only 20% of the excess familial relative risk (FRR) of the disease (2). Rare germline variants in genes including CHEK2, PALB2 and ATM each confer moderately increased relative risks (RR) of breast cancer but make only small contributions to the excess FRR (3–5). Genome-wide association studies (GWAS) have identified 79 single nucleotide polymorphisms (SNPs) that influence breast cancer susceptibility and explain a further 15% of the FRR (6–19). Statistical modelling suggests that several thousand additional breast cancer susceptibility SNPs remain undetected (9). Genetic variants can be incorporated into risk prediction models that stratify women by level of risk, and the power of such models will improve as more variants are identified (20). One productive approach to identifying additional susceptibility variants is fine-mapping of regions known to harbour susceptibility alleles.
The 9q31.2 breast cancer susceptibility locus, delineated by rs865686, was identified by a GWAS that utilised genetically enriched cases from the UK with either bilateral breast cancer or with a family history of the disease (7). A replication study using samples from the Breast Cancer Association Consortium (BCAC) indicated that the association with rs865686 was restricted to estrogen-receptor (ER) positive breast cancer (21). SNP rs865686 localises to a gene desert and consequently the mechanism of association is assumed to be through long-range regulation of target gene expression. The nearest neighbouring genes to rs865686 include Kruppel-like factor 4 (KLF4), RAD23 homologue B (RAD23B; both >600 kb proximal), actin-like 7B (ACTL7B) and inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase complex-associated protein (IKBKAP; both >700 kb distal).
We performed a fine-mapping study, using over 85 000 European and 12 000 Asian ancestry samples from BCAC, in order to localise the causal variant underlying the association between rs865686 and susceptibility to breast cancer. In addition we assessed whether other independent breast cancer susceptibility SNPs could be detected at the 9q31.2 locus.

Results
We successfully genotyped a total of 424 SNPs spanning 110 740 582–111 100 826 bp (NCBI HG37) on chromosome 9. These SNPs captured ∼94% and 86% of common 1000 Genomes Project (1KGP) variants at r² ≥ 0.8 in European and Asian populations, respectively. Association analyses were performed using 85 760 subjects of European ancestry, 12 419 subjects of Asian ancestry and 1978 subjects of African ancestry (Supplementary Material, Table S1). We report only the results from the European and Asian studies, as there were too few samples for meaningful analyses of women of African ancestry; however, the full results from the European, Asian and African studies are presented in Supplementary Material, Table S2A–C. We used statistical imputation of unobserved genotypes to increase the density of our fine-mapping analysis: a total of 2035 SNPs and insertion/deletion (indel) polymorphisms were inferred using 1KGP reference data, of which 1529 variants were imputed with high certainty (IMPUTE2 (22) information measure ≥0.5) and included in subsequent association analyses. Because no imputed variant was more significantly associated with breast cancer risk than the highest-ranked directly genotyped SNPs, imputed variants were not considered in the following analyses unless explicitly stated.
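The tagging coverage quoted above and the IMPUTE2 information filter both reduce to simple threshold operations. The following is a minimal Python sketch, assuming a precomputed matrix of pairwise r² values between reference (1KGP) variants and genotyped SNPs; the function names and toy values are hypothetical and do not represent the pipeline used in this study.

```python
def tagging_coverage(r2_matrix, threshold=0.8):
    """Fraction of reference variants (rows) tagged by at least one
    genotyped SNP (columns) at r^2 >= threshold."""
    if not r2_matrix:
        return 0.0
    tagged = sum(1 for row in r2_matrix if row and max(row) >= threshold)
    return tagged / len(r2_matrix)


def well_imputed(variants, min_info=0.5):
    """Keep imputed variants whose IMPUTE2 info measure is >= min_info.
    variants: list of (name, info) pairs."""
    return [name for name, info in variants if info >= min_info]


# Hypothetical toy data: 4 reference variants x 3 genotyped SNPs
r2 = [
    [0.95, 0.10, 0.02],
    [0.40, 0.85, 0.00],
    [0.30, 0.20, 0.79],
    [0.05, 0.01, 0.60],
]
print(tagging_coverage(r2))  # 0.5
print(well_imputed([("v1", 0.92), ("v2", 0.41), ("v3", 0.50)]))  # ['v1', 'v3']
```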
The most significantly associated SNP was rs676256 (odds ratio [OR] = 0.90 [0.88–0.92]; P = 1.58 × 10−25; Fig. 1A and Table 1; Supplementary Material, Table S2A). SNP rs676256 was one of a 14.4 kb cluster of 38 genotyped or imputed correlated SNPs (r² > 0.8 in controls of European ancestry) that also included SNP rs865686. Of the 38 SNPs correlated with rs676256 at r² ≥ 0.8, 27 had likelihood ratios >1:100 relative to rs676256 (Supplementary Material, Table S3); hence it is likely that at least one of the 28 SNPs (rs676256 plus these 27) in this independent set of correlated highly associated variants (iCHAV) is causal (23).
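The >1:100 likelihood-ratio criterion used to define the iCHAV can be sketched as follows. This rests on an assumption of ours, not stated in the source: that each SNP is summarised by its 1-df likelihood-ratio chi-square against the same covariate-only null model, so that the likelihood of a SNP relative to the lead SNP is exp((χ²_SNP − χ²_lead)/2). SNP names and statistics are hypothetical.

```python
import math


def relative_likelihood(chi2_snp, chi2_lead):
    """Likelihood of a single-SNP model relative to the lead-SNP model.

    Both arguments are 1-df likelihood-ratio chi-squares, 2 * (logL_snp - logL_null),
    against the same covariate-only null model, so half their difference is the
    log-likelihood difference between the two single-SNP models.
    """
    return math.exp((chi2_snp - chi2_lead) / 2.0)


def define_ichav(snp_stats, lead, r2_min=0.8, lr_min=1 / 100):
    """Lead SNP plus SNPs with r^2 >= r2_min and relative likelihood > lr_min.

    snp_stats maps SNP name -> (chi2, r2_with_lead).
    """
    chi2_lead = snp_stats[lead][0]
    members = [lead]
    for name, (chi2, r2) in snp_stats.items():
        if name != lead and r2 >= r2_min and relative_likelihood(chi2, chi2_lead) > lr_min:
            members.append(name)
    return members


# Hypothetical statistics; a 1:100 cut-off equals a chi-square deficit of 2*ln(100), about 9.21
snp_stats = {
    "snp_lead": (110.0, 1.00),
    "snp_A": (105.0, 0.95),  # LR = exp(-2.5), about 0.082: kept
    "snp_B": (100.0, 0.97),  # LR = exp(-5.0), about 0.0067: dropped
    "snp_C": (108.0, 0.50),  # below the r^2 threshold: dropped
}
print(define_ichav(snp_stats, "snp_lead"))  # ['snp_lead', 'snp_A']
```

Working on the chi-square scale keeps the criterion independent of sample size bookkeeping: only the difference between two SNPs' association statistics matters.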
To determine whether additional SNPs at 9q31.2 confer risks of breast cancer independently of rs676256, we fitted a series of stepwise logistic regression models (Fig. 1B–D), stopping when no additional SNP reached genome-wide significance (Fig. 1D). We identified SNPs rs10816625 (stepwise OR = 1.12 [1.07–1.16]; P = 3.49 × 10−08; Fig. 1B) and rs13294895 (stepwise OR = 1.08 [1.06–1.11]; P = 4.56 × 10−10; Fig. 1C). The P-values and effect estimates for all three susceptibility SNPs, adjusted for study and ancestry-informative principal components but not for the other SNPs, are shown in Table 1. There was little evidence of between-study effect heterogeneity for any of the three SNPs (rs10816625: Cochran's Q P-value = 0.48, I² = 0; rs13294895: Cochran's Q P-value = 0.86, I² = 0; rs676256: Cochran's Q P-value = 0.27, I² = 0.11). rs676256 is essentially uncorrelated with either rs10816625 or rs13294895 (rs676256|rs10816625: r² = 2.5 × 10−04, D′ = 0.08; rs676256|rs13294895: r² = 0.013, D′ = 0.31). rs10816625 and rs13294895, which are within 103 bp of each other, lie in the same LD block (D′ = 1). The risk alleles rarely occur together: analysis of computationally phased genotype data estimated only 160 haplotypes carrying the risk alleles of both rs10816625 and rs13294895 from a total of over 183 000, corresponding to an estimated population frequency of 0.09% (compared with 1.2% expected under equilibrium). However, given the relative rarity of the risk alleles, there is little correlation between the two SNPs (r² = 0.014). SNPs rs10816625 and rs13294895 were uncorrelated with any other variant at r² ≥ 0.8.
Figure 1. … analysis using data from the Caucasian studies, in which the most strongly associated SNP from a given model is included as a covariate in the subsequent model. Chromosome position is indicated on the x-axis, and −log10 P-value on the y-axis. The models represented are adjusted for study and seven ancestry-informative principal components. Each directly genotyped SNP is represented as a single red diamond, and the most significant SNP that attained genome-wide significance at each step of the stepwise regression is indicated by a yellow diamond. (E) Regional association plot for the 9q31.2 fine-mapping SNPs in subjects with Asian ancestry, tested using a model adjusted for study and two ancestry-informative principal components.
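The contrast between D′ = 1 and r² = 0.014 for rs10816625 and rs13294895 follows from the standard two-locus LD definitions applied to haplotype and allele frequencies. The sketch below uses the reported ∼160 risk/risk haplotypes out of ∼183 000, taking 183 000 as the denominator for illustration. It assumes that the rs10816625 risk allele frequency equals the European MAF of 6%. The rs13294895 risk allele frequency of 0.20 is hypothetical, chosen only so that the expected joint frequency is 1.2%.

```python
def ld_from_haplotypes(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a at locus 1 and allele b at locus 2
    p_a, p_b: population frequencies of alleles a and b
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


observed = 160 / 183000  # risk/risk haplotypes among phased haplotypes (as reported)
p_a = 0.06               # rs10816625 risk allele, assumed equal to the European MAF
p_b = 0.20               # hypothetical risk-allele frequency for rs13294895
expected = p_a * p_b     # joint frequency expected under linkage equilibrium
d, d_prime, r2 = ld_from_haplotypes(observed, p_a, p_b)
print(f"observed {observed:.2%}, expected under equilibrium {expected:.1%}")
# observed 0.09%, expected under equilibrium 1.2%
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
# D = -0.0111, D' = 0.93, r^2 = 0.014
```

The output shows how a rare haplotype combination can give high |D′| yet low r²: r² is scaled by the product of the allele-frequency variances, which is small when one allele is rare. The D′ value here depends on the assumed frequency for rs13294895 and is not a re-estimate of the reported D′ = 1.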
In Asians, rs10816625 was notable for being the only SNP that showed evidence of association with breast cancer risk, albeit not at genome-wide levels of significance (OR = 1.12 [1.06–1.18]; P = 2.77 × 10−05; Fig. 1E and Table 1; Supplementary Material, Table S2B). SNP rs10816625 has a relatively low minor-allele frequency (MAF; 6%) in European populations but is common in Asian populations (MAF averaged across controls from nine Asian studies = 38%). There was no evidence of inter-study heterogeneity for rs10816625 in the contributing Asian studies (Cochran's Q P-value = 0.51, I² = 0). We further explored the association of rs676256 with ER-negative/PR-positive breast cancer using case-only analysis for PR, adjusted for ER (P = 0.06). SNP rs10816625 was significantly associated only with ER-positive/PR-positive breast cancer; rs13294895 was significantly associated with ER-positive/PR-positive breast cancer and nominally associated with ER-positive/PR-negative disease (Table 3). There was little evidence of heterogeneity in the effects conferred by SNPs rs10816625, rs13294895 and rs676256 according to human epidermal growth factor receptor 2 (HER2) expression (Table 2). We also observed no evidence of heterogeneity in the effects conferred by rs10816625 according to either tumour ER or PR status in subjects with Asian ancestry (Table 2).
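Cochran's Q and I² summarise how much per-study estimates disagree beyond what sampling error would predict. A minimal sketch, assuming fixed-effect inverse-variance weighting of per-study log ORs (the weighting scheme is our assumption; the source reports only the resulting statistics). The inputs are hypothetical.

```python
def cochran_q_i2(betas, ses):
    """Inverse-variance (fixed-effect) pooling with Cochran's Q and I^2.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns (pooled log OR, Q, degrees of freedom, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # truncated at zero
    return pooled, q, df, i2


# Hypothetical per-study estimates
_, q, df, i2 = cochran_q_i2([0.0, 0.4], [0.1, 0.1])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.3f}")  # Q = 8.00 on 1 df, I^2 = 0.875
_, q, df, i2 = cochran_q_i2([0.1, 0.2, 0.3], [0.1, 0.1, 0.1])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.3f}")  # Q = 2.00 on 2 df, I^2 = 0.000
```

The heterogeneity P-value is obtained by referring Q to a χ² distribution with k − 1 degrees of freedom for k studies; I² = 0 whenever Q does not exceed its degrees of freedom, as for the second example.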
Because all three SNPs reported in our fine-mapping analysis of Europeans were primarily associated with ER-positive, but not ER-negative, tumours, we restricted further stratified analyses of additional breast cancer risk factors to cases with ER-positive disease. However, the results from analyses of all breast cancers combined and of ER-negative breast cancers are presented in Supplementary Material, Tables S4–S7. In Europeans, but not Asians, the effect of rs10816625 was stronger in cases with node-negative disease (OR = 1.19 [1.12–1.25], P = 4.55 × 10−09; Table 4) than in those with node-positive disease (OR = 1.07 [0.99–1.14], P = 0.07; Phet = 5.98 × 10−03; Table 4). There was no significant evidence of interaction according to tumour morphology (Table 5). We observed evidence of a linearly increasing trend in the OR by grade for rs10816625 in Asians only (Ptrend = 4.91 × 10−04; Table 6). We previously reported a trend in per-allele OR for rs865686 with increasing age at diagnosis in ER-positive breast cancer, with a stronger association at younger ages (21). Here we report that the same was true for rs676256 in women of European ancestry (Ptrend = 0.02; Table 7); we saw no compelling evidence of a similar age interaction for rs10816625 or rs13294895 (Table 7). Because the 9q31.2 breast cancer locus was initially discovered in a study enriched for bilateral and familial cases, we estimated ORs for each SNP in sporadic, familial and bilateral cases (Supplementary Material, Table S8). There were no statistically significant differences in ORs between sporadic cases and either bilateral or familial cases.
In an effort to identify putative causal variants underlying each of the three associations, we performed a bioinformatic analysis. We used data from the ENCODE project (24) and elsewhere (25) to explore the co-localisation of the association signals with features indicative of functional genomic elements in breast cancer models, including evidence of transcription factor binding, DNase hypersensitivity and relevant histone modification marks. Both SNPs rs10816625 and rs13294895 localise to a region of putative regulatory significance in MCF7 cells, demarcated by histone H3 lysine 27 acetylation (H3K27ac) and histone H3 lysine 4 mono-methylation (H3K4me1), both of which are characteristic features of active enhancers (Fig. 2A) (26,27). There was less evidence for either histone modification mark in human mammary epithelial cells (HMEC; not shown). Both SNPs are located directly under the binding sites for a number of breast cancer-relevant transcription factors, including forkhead box M1 (FOXM1) and GATA binding protein 3 (GATA3; Fig. 2A) (28,29). To reduce the number of candidate functional polymorphisms for the rs676256 iCHAV, we applied a heuristic scoring system to prioritise variants that localise to regions with cistromic and epigenetic activity (30). We identified three variants in this iCHAV that co-localise with potentially relevant genomic features (Fig. 2A). … (H3K27me3) and has evidence of FOXM1 and GATA3 binding in MCF7 cells (Fig. 2A).
Estrogen receptor-α (ER-α) and forkhead box A1 (FOXA1) are key drivers of ER-positive breast cancer. Because there are currently limited ENCODE data on either of these factors, we explored their binding at the 9q31.2 susceptibility locus in MCF7 cells using data from Hurtado et al. (31). We found that the three lead SNPs localise to binding sites for both transcription factors (Fig. 2B and C). SNPs rs10816625 and rs13294895 map directly under ER-α and FOXA1 binding peaks that co-localise with the putative active enhancer described above. rs5899787, from the rs676256 iCHAV, also maps directly under an ER-α and FOXA1 binding peak; none of the other SNPs in the rs676256 iCHAV map to this, or to any other, ER-α and FOXA1 peak.
A recent integrative analysis of data from The Cancer Genome Atlas suggested that the original 9q31.2 risk locus influences transcript levels of KLF4 (32). Using chromosome conformation capture (3C) in HindIII-digested MCF7 (Fig. 3A) and SUM44 (Fig. 3B) 3C libraries, we investigated whether the locus containing SNPs rs10816625 and rs13294895 also interacts with KLF4 through long-range chromatin interaction. We detected elevated interaction frequencies between HindIII fragments containing SNPs rs10816625 and rs13294895 and those containing KLF4; interactions with HindIII fragments on either side of KLF4 were comparatively lower. Moreover, no interaction was detected between the fragment containing SNPs rs10816625 and rs13294895 and RAD23B.
To determine whether either locus had enhancer activity, we performed a series of dual luciferase assays using a minimal promoter vector, pGL4minP. To explore the rs10816625/rs13294895 locus, we inserted a 1 kb fragment containing the common alleles of both variants, plus flanking DNA, into pGL4minP (pGL4minP-AB). We observed increased activity of the minimal promoter in the pGL4minP-AB construct relative to pGL4minP in both MCF7 (8.2-fold increase; P = 6.12 × 10−05; Fig. 3C) and T47D cells (3.1-fold increase; P = 6.66 × 10−04; Fig. 3D). To determine whether the risk alleles of rs10816625 and rs13294895 disrupted this enhancer activity, we generated three additional constructs, carrying a single risk allele of either rs10816625 (pGL4minP-aB) or rs13294895 (pGL4minP-Ab), or carrying the risk alleles of both SNPs (pGL4minP-ab). We observed significant evidence for a difference in the mean dual luciferase ratios of these constructs in MCF7 and T47D cells (P < 7 × 10−04; Fig. 3C and D). In T47D cells, we found a statistically significant difference between pGL4minP-AB and each of pGL4minP-aB (P = 5.45 × 10−03), pGL4minP-Ab (P = 0.04) and pGL4minP-ab (P = 4.97 × 10−04; Fig. 3D). In MCF7 cells, there was a statistically significant difference between pGL4minP-AB and pGL4minP-aB (P = 6.62 × 10−05), but not between pGL4minP-AB and pGL4minP-Ab (Fig. 3C). There was no significant difference between the construct containing both risk alleles and the constructs containing one risk allele in T47D cells (Fig. 3D). We performed a similar series of analyses to explore the putative poised enhancer centred on SNP rs5899787. Relative to pGL4minP, we observed a reduction in reporter gene expression but saw no evidence to support an allele-specific effect (data not shown).
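The fold changes quoted above compare normalised reporter activity between constructs. A minimal sketch, assuming the usual dual luciferase normalisation: firefly signal divided by the co-transfected Renilla control signal in each well, with fold change taken as the ratio of mean normalised ratios. The source does not spell out these details. The readings are hypothetical and chosen to give an 8.2-fold change.

```python
from statistics import mean


def dual_luciferase_ratios(firefly, renilla):
    """Per-well firefly/Renilla ratios; Renilla corrects for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(test_ratios, reference_ratios):
    """Mean normalised ratio of a test construct relative to the reference construct."""
    return mean(test_ratios) / mean(reference_ratios)


# Hypothetical raw readings for three wells per construct
min_p = dual_luciferase_ratios([1000, 1100, 900], [500, 550, 450])  # empty pGL4minP
ab = dual_luciferase_ratios([8200, 9020, 7380], [500, 550, 450])    # pGL4minP-AB
print(f"{fold_change(ab, min_p):.1f}-fold")  # 8.2-fold
```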

Discussion
In a combined analysis of data from 50 case-control studies comprising more than 100 000 women, we have refined the localisation of the breast cancer association signal on chromosome 9q31.2 to a set of 28 highly correlated variants in a 14.5 kb region in which SNP rs676256 was the most strongly associated variant. Furthermore, we have demonstrated the presence of two novel independent susceptibility alleles at 9q31.2, SNPs rs10816625 and rs13294895, both of which are strong candidates to be causal variants. Breast cancer is a heterogeneous disease comprising multiple subtypes that can be classified according to histological, immunophenotypic and molecular characteristics. Although the majority of known breast cancer susceptibility loci are preferentially associated with ER-positive tumours (33), a number of recent subtype-specific studies have detected genetic associations unique to ER-negative tumours, suggesting distinct underlying aetiologies for each subtype (17,34,35). The index 9q31.2 breast cancer susceptibility association, demarcated by SNP rs865686 (7), was largely restricted to ER-positive breast cancer (21), and this was confirmed for rs676256 in the European samples analysed in this study. SNPs rs10816625 and rs13294895 were also associated with ER-positive, but not ER-negative, breast cancer in Europeans, albeit with more modest statistical evidence of heterogeneity than for rs676256.
The majority of susceptibility loci for breast and other cancers have been detected using studies of predominantly European ancestry. However, confirmation of associations in populations of different ethnicity from those used for discovery can add weight to their validity (36). Approximately 10% of the samples genotyped in our fine-mapping study were from subjects of Asian ancestry. In Asians, rs10816625 had a higher MAF than in Europeans and was the only SNP that was significantly associated with breast cancer risk; the OR was similar to that in Europeans. Neither rs676256 nor rs13294895 was significantly associated with risk in Asians, but their MAFs were much smaller than in Europeans and the ORs did not differ by ethnicity. SNP rs10816625 lies in a strong recombination hotspot in Europeans and exhibits low pairwise correlation with all but two other SNPs, each of which has a P-value for association with breast cancer several orders of magnitude larger than that of rs10816625. These observations provide evidence that rs10816625 is causally associated with breast cancer.
The third breast cancer susceptibility SNP that we detected, rs13294895, localises to within ∼100 bp of rs10816625. Analysis of computationally phased haplotypes indicates that their risk alleles rarely occur together, consistent with their having arisen independently on the same ancestral haplotype with little subsequent recombination.
Figure 2. … were mapped on to the breast cancer associated regions identified by fine-mapping. For SNPs rs10816625 and rs13294895, the iCHAVs were defined as SNPs having r² ≥ 0.8 with either SNP; for rs676256, the iCHAV was defined as all SNPs with r² ≥ 0.8 and likelihood ratios >1:100 relative to rs676256. There were no other SNPs in the iCHAVs for rs10816625 and rs13294895. The rs676256 iCHAV comprised 28 SNPs. SNPs whose identifiers are shown in red type were of putative functional significance (see Materials and Methods). Where the lead SNP was not deemed to be of putative functional significance, it is indicated in green, as is the index 9q31.2 SNP, rs865686. (B) Regional binding profiles for ER-α in MCF7 cells plotted across the fine-mapping region using data from (31). The locations of the lead SNPs are indicated with yellow diamonds. (C) Regional binding profiles for FOXA1 in MCF7 cells plotted across the fine-mapping region using data from (31). The locations of the lead SNPs are indicated with yellow diamonds.
We used bioinformatic annotation of the regions demarcated by SNPs rs10816625, rs13294895 and rs676256 to identify a set of variants that had putative regulatory potential and, as such, were candidates to be the causal alleles underlying the observed associations. SNPs rs10816625 and rs13294895 localise to a region with a histone modification signature that suggests it is an active enhancer in MCF7 cells. We also saw evidence that supports binding of ER-α, FOXA1 and GATA3 at this locus, directly over the sites of rs10816625 and rs13294895. ER-α is an established driver of luminal breast cancer, and FOXA1 is a pioneer factor that physically interacts with compacted chromatin, facilitating binding of ER-α, and is necessary for ER-α-mediated transcription (31,37). GATA3 is thought to play a key role in making enhancer elements accessible to ER-α, and its expression is highly correlated with both ER-α and FOXA1 in breast tumours (38,39). Of note, Cowper-Sal·lari et al. have recently demonstrated that breast cancer susceptibility loci are enriched for ER-α and FOXA1 binding events (40). Our in vitro data support the hypothesis that this locus possesses enhancer activity and suggest that the risk alleles of rs10816625 and rs13294895 can diminish its activity, consistent with these being independent susceptibility variants acting through the same mechanism.
Figure 3. … Results from three replicate libraries are plotted; each quantitative PCR reaction was performed in triplicate. Error bars represent standard errors of the mean. (B) Chromatin interaction data from HindIII 3C libraries generated using SUM44 cells. (C) Dual luciferase assays for reporter constructs containing the common alleles of both rs10816625 and rs13294895 (pGL4minP-AB), the risk allele of rs10816625 (pGL4minP-aB), the risk allele of rs13294895 (pGL4minP-Ab) and the risk alleles of both SNPs (pGL4minP-ab) transiently transfected into MCF7 cells. Ratios were normalised to a minimal promoter construct (pGL4minP). Each transfection was repeated five times, and constructs were generated in both forward and reverse orientations. (D) Dual luciferase assays for the same four reporter constructs (pGL4minP-AB, pGL4minP-aB, pGL4minP-Ab and pGL4minP-ab) transiently transfected into T47D cells.
Li et al. have recently suggested that the original 9q31.2 breast cancer susceptibility locus acts via regulation of the transcription factor KLF4 (32). These authors identified KLF4 as the target of the 9q31.2 locus on the basis of a trans-eQTL analysis in which they first identified the set of eQTL genes associated with rs471467 (a perfect proxy for rs865686) and then looked for enrichment of transcription factor binding sites within ENCODE-defined enhancer elements of these genes. We have demonstrated an excess of long-range chromatin interactions between the rs10816625/rs13294895 region and the KLF4 gene locus. Our results and those of Li et al. therefore suggest that KLF4 is the target of multiple 9q31.2 breast cancer susceptibility SNPs. In contrast to a recent eQTL analysis by Li and colleagues implicating RAD23B as the target of the prostate cancer susceptibility SNP rs817826 (41), we found no evidence that these breast cancer SNPs interact with RAD23B. KLF4 has both oncogenic and tumour-suppressive roles, depending on the tissue in which it is expressed (42). It is thought to be expressed at low levels in normal breast epithelium but is overexpressed in a large proportion of both ductal carcinoma in situ and invasive breast cancers (43). In our reporter assays, the risk alleles of rs10816625 and rs13294895 reduced enhancer activity; if this enhancer regulates KLF4, as our 3C data suggest, our results imply that lower levels of KLF4 expression are associated with increased breast cancer risk.
In contrast to the rs10816625/rs13294895 locus, refinement of the association signal at the rs676256 locus was complicated by the large number of variants in high LD with the lead SNP. Of the 28 highly correlated variants in this iCHAV, analysis of ENCODE data identified three that fall into two distinct functional regions. SNPs rs662694 and rs471467 localise to a predicted insulator region, defined by CTCF binding and H3K27me3 marks (44). SNP rs5899787 was located in a region that shared functionally significant features with the rs10816625/rs13294895 locus: it localises directly to a second site of strong ER-α and FOXA1 co-localisation and had strong evidence of GATA3 binding in the ENCODE data. Our data suggested that a construct containing the common allele of rs5899787 suppressed the activity of the minimal promoter in our reporter gene system, but we saw no evidence for an allele-specific effect. Further work will be required to determine the identity and mode of action of the causative variant (or variants) at this locus.
Including the variants identified in our study, 81 common germline polymorphisms conferring susceptibility to breast cancer have now been identified. Our study and those of others demonstrate the power of fine-mapping in large studies, both for detecting novel independent susceptibility SNPs and for determining a minimal set of likely causal variants (15,16).

Sample selection
Samples (n = 103 991) were selected from 52 studies participating in BCAC and genotyped as part of the COGS project (9). Most contributing studies were either population- or hospital-based case-control studies, while some were nested in cohorts or selected for family history, age or tumour characteristics. Full details of contributing studies can be found in Supplementary Material, Table S1. Four studies, Demokritos (DEMOKRITOS), Ohio State University (OSU), Städtisches Klinikum Karlsruhe Deutsches Krebsforschungszentrum Study (SKKDKFZS) and the Roswell Park Cancer Institute Study (RPCI), were genotyped as part of the Triple Negative Breast Cancer Case-control Consortium but are analysed here as their component studies. Analyses were restricted to cases with invasive breast cancer. All analyses reported were stratified according to the ancestry of the study participants, categorised as predominantly European (n = 43 160 cases; 42 600 controls), Asian (n = 5795 cases; 6624 controls) or African (n = 1046 cases; 932 controls), as determined by a principal components analysis of 37 000 uncorrelated ancestry-informative SNP markers, described elsewhere (9). All BCAC studies had local ethical approval.
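Ancestry assignment from a principal components analysis of genotypes can be illustrated on simulated data. This sketch uses a common allele-frequency standardisation of each SNP before a singular value decomposition. That choice is our assumption: the exact procedure used for the ∼37 000 markers is described in (9), not here. The two simulated "populations" and their allele frequencies are hypothetical.

```python
import numpy as np


def genotype_pcs(G, n_components=2):
    """Principal component scores for an (individuals x SNPs) matrix of allele counts (0/1/2)."""
    freqs = G.mean(axis=0) / 2.0
    keep = (freqs > 0.0) & (freqs < 1.0)  # drop monomorphic SNPs
    G, freqs = G[:, keep], freqs[keep]
    Z = (G - 2.0 * freqs) / np.sqrt(2.0 * freqs * (1.0 - freqs))  # centre and scale each SNP
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Simulate two groups of 100 individuals with differentiated allele frequencies
rng = np.random.default_rng(7)
n_snps = 500
f_a = rng.uniform(0.1, 0.9, n_snps)
f_b = np.clip(f_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
G = np.vstack([rng.binomial(2, f_a, size=(100, n_snps)),
               rng.binomial(2, f_b, size=(100, n_snps))]).astype(float)

pc1 = genotype_pcs(G)[:, 0]
truth = np.r_[np.zeros(100, dtype=bool), np.ones(100, dtype=bool)]
# The sign of a principal component is arbitrary, so score both orientations
accuracy = max(np.mean((pc1 > 0) == truth), np.mean((pc1 > 0) != truth))
print(f"PC1 separates the two simulated groups with accuracy {accuracy:.2f}")
```

In practice, individuals would be assigned to ancestry groups by their position relative to reference populations in the space of the leading components, rather than by a simple sign split as in this two-group toy example.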

Genotyping and quality control
A total of 447 fine-mapping SNPs were selected to interrogate the 9q31.2 locus. The fine-mapping region was defined as the region that included all SNPs correlated with the index SNP, rs865686, at r² > 0.1. For genotyping, we first selected all SNPs with an Illumina Design Score >0.8 and r² with rs865686 >0.1. We then selected an additional set of SNPs designed to tag all remaining SNPs in the interval at r² > 0.9. Genotyping was performed using a custom-designed International Collaborative Oncology Gene-environment Study (iCOGS) genotyping array (Illumina, San Diego, CA). The iCOGS array comprised assays for 211 155 SNPs, primarily selected for replication analysis of loci putatively associated with breast, ovarian or prostate cancer and for fine-mapping of the known susceptibility loci for these cancers. Full details of the iCOGS array design, sample handling and post-genotyping QC processes are described elsewhere (9). Briefly, samples were excluded from the analytic dataset for any of the following reasons: gender discordance according to array data, call rate <95%, excess heterozygosity (P < 1 × 10−06), non-concordance with previous genotyping, discordant duplicate pairs, within-study duplicates with discordant phenotype data, inter-study duplicates, first-degree relatives, phenotypic exclusions and concordant replicates. Multi-dimensional scaling was used to infer ethnicity; individuals with greater than 15% mixed ancestry were excluded from analyses. Clustering of significantly associated, directly genotyped SNPs was verified by manual inspection of genotype cluster plots (Supplementary Material, Fig. S1). Of the 447 target SNPs selected for fine-mapping, 424 passed post-genotyping quality control measures; we excluded six SNPs that were monomorphic in Europeans and a further six that showed strongly significant deviation of genotype frequencies from Hardy-Weinberg proportions in controls (P < 1 × 10−04).
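The two SNP-level exclusions described above (monomorphic SNPs, and departure from Hardy-Weinberg proportions in controls at P < 1 × 10−04) can be sketched as follows. The source does not state whether an exact or a chi-square test was used; this sketch assumes the 1-df chi-square approximation. Genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a 1-df chi-square


def snp_passes_qc(n_aa, n_ab, n_bb, hwe_threshold=1e-4):
    """Exclude monomorphic SNPs and SNPs deviating from HWE in controls at P < hwe_threshold."""
    monomorphic = n_ab == 0 and (n_aa == 0 or n_bb == 0)
    if monomorphic:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_threshold


print(snp_passes_qc(250, 500, 250))  # True: exact HWE proportions
print(snp_passes_qc(400, 200, 400))  # False: heterozygote deficit, chi-square = 360
print(snp_passes_qc(1000, 0, 0))     # False: monomorphic
```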

Bioinformatics
We used publicly available DNase hypersensitivity, transcription factor binding and histone modification ChIP-seq data from the ENCODE project (24) and elsewhere (27,31) to overlay functional annotations on the fine-mapping region and investigate enrichment of functional elements at associated loci. For the rs676256 locus, we first identified a subset of polymorphisms that had r² ≥ 0.8 with the lead SNP and then assessed the putative functional significance of these variants using the heuristic score from RegulomeDB (http://regulome.stanford.edu/) to prioritise candidate functional variants for further investigation.
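The prioritisation step (restrict to r² ≥ 0.8 with the lead SNP, then rank by RegulomeDB score) can be sketched as a filter and sort. RegulomeDB reports categories such as 1a–1f, 2a–2c, 3a–3b and 4–7, where lower categories indicate more supporting regulatory evidence. The cut-off `max_category=3`, the SNP names and the scores below are hypothetical; RegulomeDB scores would in practice be retrieved from the web service.

```python
def regulome_key(score):
    """Sort key for RegulomeDB categories such as '1f', '2b' or '5' (lower = stronger evidence)."""
    return (int(score[0]), score[1:])


def prioritise(variants, r2_min=0.8, max_category=3):
    """variants: list of (name, r2_with_lead, regulomedb_score).

    Keep variants in high LD with the lead SNP whose RegulomeDB category is
    <= max_category, ordered from strongest to weakest regulatory evidence.
    """
    kept = [v for v in variants if v[1] >= r2_min and int(v[2][0]) <= max_category]
    return [name for name, _, _ in sorted(kept, key=lambda v: regulome_key(v[2]))]


# Hypothetical annotations
variants = [
    ("snp_A", 0.95, "5"),
    ("snp_B", 0.88, "2b"),
    ("snp_C", 0.60, "1f"),  # strong annotation but outside the r^2 threshold
    ("snp_D", 0.91, "1f"),
    ("snp_E", 0.85, "3a"),
]
print(prioritise(variants))  # ['snp_D', 'snp_B', 'snp_E']
```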

Quantitative 3C
MCF7 and SUM44 3C libraries were generated using 2 × 10⁷ cells fixed with 2% paraformaldehyde for 5 min. 3C was carried out using the digestion and ligation steps of a Hi-C protocol (45), replacing the biotin dNTP fill-in with the addition of 56.7 µl of water. A control 3C library was generated as previously described (46) using minimally overlapping BAC clones (Children's Hospital Oakland Research Institute, Oakland, CA; Life Technologies, Carlsbad, CA, USA) that covered the HindIII fragments between rs10816625 and the target region, combined in equimolar amounts. To optimise the TaqMan PCR reactions and normalise the data, we generated a standard curve using the control templates. TaqMan PCR was carried out using TaqMan Universal PCR Master Mix, no UNG (Life Technologies, Carlsbad, CA) with 250 ng of 3C library. Three separate 3C libraries were prepared for each cell line, and from each library three quantitative PCR reactions were performed for each restriction fragment. Interactions between rs10816625/rs13294895 and target loci were expressed as relative interaction frequencies compared with the control BAC library standard curve. BAC libraries and primer sequences are available on request.
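Expressing 3C signals relative to the BAC control standard curve amounts to fitting a line of Ct against log10(template amount) for the control dilutions and reading sample Cts off that line. The following is a minimal sketch, assuming a linear standard curve fitted by ordinary least squares. The dilution series and Ct values are hypothetical, and any further normalisation (for example to a loading-control fragment) is not described in the source and is not shown.

```python
def fit_standard_curve(log10_amounts, ct_values):
    """Least-squares line Ct = slope * log10(amount) + intercept from control BAC dilutions."""
    n = len(log10_amounts)
    mx = sum(log10_amounts) / n
    my = sum(ct_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, ct_values))
    slope = sxy / sxx
    return slope, my - slope * mx


def relative_interaction(ct, slope, intercept):
    """Convert a 3C-library Ct into a template amount on the control-library scale."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical tenfold dilution series of the control library (slope -3.32, about 100% efficiency)
slope, intercept = fit_standard_curve([0, 1, 2, 3], [30.0, 26.68, 23.36, 20.04])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")  # slope = -3.32, intercept = 30.00
print(f"{relative_interaction(25.02, slope, intercept):.2f}")  # 31.62
```

Averaging the three PCR replicates within each library before combining libraries would mirror the replicate structure described above.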

Dual luciferase assays
DNA fragments containing either rs10816625 and rs13294895 or rs5899787 were cloned into the multiple cloning site of pGL4.23 [luc2/minP] (Promega, Madison, WI). Site-directed mutagenesis with the QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies, Berkshire, UK) was used to create constructs containing all combinations of rs10816625/rs13294895 common and risk alleles (rs10816625 common/rs13294895 common, pGL4minP-AB; rs10816625 risk/rs13294895 common, pGL4minP-aB; rs10816625 common/rs13294895 risk, pGL4minP-Ab; rs10816625 risk/rs13294895 risk, pGL4minP-ab). In addition, we created reverse-orientation constructs for each insert to verify orientation independence. The allelic status of each construct was confirmed by Sanger sequencing. PCR primers for cloning and site-directed mutagenesis are available on request. We used gBlocks Gene Fragments (Integrated DNA Technologies, Leuven, Belgium) to create constructs (pGL4minP-A and pGL4minP-a) for the common and risk alleles of rs5899787.

Statistics
Analysis of the association between each SNP and risk of breast cancer was performed using unconditional logistic regression assuming a log-additive genetic model, adjusted for study and ancestry-informative principal components (n = 7 for European studies; n = 2 for Asian and African studies). P-values were calculated using a one-degree-of-freedom likelihood-ratio test. We also estimated the effects of each heterozygote and minor-allele homozygote genotype relative to the common homozygote in a two-degrees-of-freedom model (Supplementary Material, Table S2). Forward stepwise logistic regression was used to explore whether additional loci in the fine-mapping region were independently associated with breast cancer risk. I² statistics were used to assess heterogeneity of the RR estimates between studies at significantly associated loci. We analysed SNP associations by tumour receptor status, morphology, lymph node involvement, grade and age for the European and Asian ancestry studies using polytomous logistic regression. Tumour information in BCAC was collected as previously described (47). There were too few samples with African ancestry to conduct stratified analyses. We also considered a polytomous logistic regression model comprising all four possible combinations of ER and PR status. Case-only analyses of tumour receptor status, morphology and lymph node involvement were used to assess heterogeneity between disease subtypes. Case-only allelic logistic regression, using the number of copies of each minor allele as the response variable, was used to test for linear trends in OR by grade and age at diagnosis.
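Forward stepwise selection with 1-df likelihood-ratio tests under a log-additive model can be sketched with NumPy as follows. At each step, the SNP with the smallest conditional P-value is added, given the covariates and the SNPs already selected. Selection stops when no remaining SNP reaches P < 5 × 10−8; this conventional genome-wide threshold is our assumption. This is an illustrative Newton-Raphson implementation on simulated data, not the GLU code used in the study. The simulated SNPs, effect sizes and covariates are hypothetical.

```python
import math

import numpy as np


def logistic_loglik(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson and return the maximised log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def forward_stepwise(genotypes, y, covariates, alpha=5e-8):
    """Forward selection of SNPs (0/1/2 allele counts) by 1-df likelihood-ratio tests.

    genotypes: dict SNP name -> allele-count vector; covariates: (n x k) array,
    e.g. study indicators and principal components.
    """
    base = np.column_stack([np.ones(len(y)), covariates])
    selected = []
    while True:
        X0 = np.column_stack([base] + [genotypes[s] for s in selected])
        ll0 = logistic_loglik(X0, y)
        best_snp, best_p = None, 1.0
        for snp, g in genotypes.items():
            if snp in selected:
                continue
            chi2 = 2.0 * (logistic_loglik(np.column_stack([X0, g]), y) - ll0)
            p = math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))  # 1-df chi-square upper tail
            if p < best_p:
                best_snp, best_p = snp, p
        if best_snp is None or best_p >= alpha:
            return selected
        selected.append(best_snp)


# Simulated data: snp1 and snp2 are causal; snp3 is a non-causal proxy of snp1
rng = np.random.default_rng(2014)
n = 20000
g1 = rng.binomial(2, 0.3, n).astype(float)
g2 = rng.binomial(2, 0.1, n).astype(float)
g3 = np.where(rng.random(n) < 0.9, g1, rng.binomial(2, 0.3, n)).astype(float)
pcs = rng.normal(size=(n, 2))
risk = 1.0 / (1.0 + np.exp(-(-0.5 + 0.4 * g1 + 0.5 * g2)))
y = (rng.random(n) < risk).astype(float)
selected = forward_stepwise({"snp1": g1, "snp2": g2, "snp3": g3}, y, pcs)
print(selected)
```

The proxy snp3 is strongly associated on its own but adds nothing once snp1 is in the model, which is the behaviour the stepwise procedure relies on to separate independent signals from correlated passengers.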
We used a t-test to assess the difference in mean dual luciferase ratios for reporter gene constructs. One-way analysis of variance was used to assess equality of means of log-transformed dual luciferase ratios. Homogeneity of variances was assessed using Bartlett's test and QQ-plots of standardised residuals were visually inspected for evidence of departure from those expected under a normal distribution.
Post-hoc comparison of group means was carried out using Tukey's HSD test. All statistical analyses were conducted using R (www.R-project.org/) and the Genotype Libraries and Utilities package (GLU; code.google.com/p/glu-genetics).
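The reporter-assay statistics described above can be sketched with SciPy: a t-test on the luciferase ratios, one-way ANOVA on log-transformed ratios, Bartlett's test for equal variances, and Tukey's HSD for post-hoc comparisons. The analyses in the study were run in R; this Python version is only an illustrative equivalent. `scipy.stats.tukey_hsd` requires SciPy 1.8 or later, and the ratios below are hypothetical values, not study data.

```python
import math

from scipy import stats

# Hypothetical dual luciferase ratios (five transfections per construct); not study data
ratios = {
    "pGL4minP-AB": [8.1, 8.4, 7.9, 8.3, 8.0],
    "pGL4minP-aB": [5.2, 5.5, 4.9, 5.1, 5.3],
    "pGL4minP-Ab": [7.6, 7.9, 7.4, 8.0, 7.7],
    "pGL4minP-ab": [5.0, 4.8, 5.3, 5.1, 4.9],
}
names = list(ratios)
logged = [[math.log(v) for v in ratios[name]] for name in names]

ttest = stats.ttest_ind(ratios["pGL4minP-AB"], ratios["pGL4minP-aB"])  # pairwise t-test
anova = stats.f_oneway(*logged)      # one-way ANOVA on log ratios: are all means equal?
bartlett = stats.bartlett(*logged)   # homogeneity of variances across constructs
tukey = stats.tukey_hsd(*logged)     # Tukey HSD post-hoc comparisons (SciPy >= 1.8)

print(f"t-test AB vs aB: P = {ttest.pvalue:.1e}")
print(f"ANOVA P = {anova.pvalue:.1e}; Bartlett P = {bartlett.pvalue:.2f}")
for j in range(1, len(names)):
    print(f"{names[0]} vs {names[j]}: Tukey P = {tukey.pvalue[0, j]:.1e}")
```

A normal QQ-plot of the standardised ANOVA residuals (for example via `scipy.stats.probplot`) would complete the model checks described above.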

Supplementary Material
Supplementary Material is available at HMG online.
the views or policies of the National Cancer Institute or any of the collaborating centres in the Breast Cancer Family Registry (BCFR), nor does mention of trade names, commercial products or organizations imply endorsement by the USA Government or the BCFR. The ABCFS was also supported by the National Health