Common non-synonymous SNPs associated with breast cancer susceptibility: findings from the Breast Cancer Association Consortium

Candidate variant association studies have been largely unsuccessful in identifying common breast cancer susceptibility variants, although most studies have been underpowered to detect associations of a realistic magnitude. We assessed 41 common non-synonymous single-nucleotide polymorphisms (nsSNPs) for which evidence of association with breast cancer risk had been previously reported. Case-control data were combined from 38 studies of white European women (46 450 cases and 42 600 controls) and analyzed using unconditional logistic regression. Strong evidence of association was observed for three nsSNPs: ATXN7-K264R at 3p21 [rs1053338, per allele OR = 1.07, 95% confidence interval (CI) = 1.04–1.10, P = 2.9 × 10−6], AKAP9-M463I at 7q21 (rs6964587, OR = 1.05, 95% CI = 1.03–1.07, P = 1.7 × 10−6) and NEK10-L513S at 3p24 (rs10510592, OR = 1.10, 95% CI = 1.07–1.12, P = 5.1 × 10−17). The first two associations reached genome-wide statistical significance in a combined analysis of available data, including independent data from nine genome-wide association studies (GWASs): for ATXN7-K264R, OR = 1.07 (95% CI = 1.05–1.10, P = 1.0 × 10−8); for AKAP9-M463I, OR = 1.05 (95% CI = 1.04–1.07, P = 2.0 × 10−10). Further analysis of other common variants in these two regions suggested that intronic SNPs nearby are more strongly associated with disease risk. We have thus identified a novel susceptibility locus at 3p21, and confirmed previous suggestive evidence that rs6964587 at 7q21 is associated with risk. The third locus, rs10510592, is located in an established breast cancer susceptibility region; the association was substantially attenuated after adjustment for the known GWAS hit. Thus, each of the associated nsSNPs is likely to be a marker for another, non-coding, variant causally related to breast cancer risk. Further fine-mapping and functional studies are required to identify the underlying risk-modifying variants and the genes through which they act.


INTRODUCTION
Few common non-synonymous genetic variants have been implicated in breast cancer susceptibility. Earlier candidate-gene association studies focused heavily on such variants but generally failed to produce robust findings (1). Agnostic approaches using genome-wide panels of single-nucleotide polymorphisms (SNPs) have been much more successful, having identified >70 common breast cancer susceptibility loci to date (2-21). No missense variants have been clearly shown to explain these observed associations with marker SNPs. The fact that the effect sizes detected by these large-scale studies were relatively small [for the vast majority, the associated odds ratio (OR) was <1.20] suggests that most, if not all, of the earlier candidate-gene studies were underpowered to detect associations of a realistic magnitude.
The Wellcome Trust Case-Control Consortium (WTCCC) previously conducted an association study of 14 436 non-synonymous SNPs (nsSNPs) across the genome, using a custom array genotyped in 1053 breast cancer cases and 1500 controls (22). No clear associations were identified. However, no replication stage was carried out and the study had <15% power to detect a per-allele OR of 1.20 for even the most common variants at a Bonferroni-corrected nominal significance threshold of 3.5 × 10−6. One of the SNPs on the array had previously been studied by the Breast Cancer Association Consortium (BCAC); we found evidence that AKAP9-M463I (rs6964587) was associated with breast cancer risk, with a recessive model appearing to be the best fit, although the evidence of association (P = 0.001) did not reach genome-wide statistical significance (23).
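The threshold and power figures quoted for the WTCCC study can be roughly reproduced with a short calculation. The sketch below is illustrative only: it assumes a two-sided per-allele Wald test, a minor allele frequency of 0.5 for "the most common variants", and a textbook normal approximation for the variance of the log-OR. The function names are hypothetical and this is not the power method used by the WTCCC or by the present authors.

```python
import math
from statistics import NormalDist


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def approx_power(or_per_allele: float, maf: float,
                 n_cases: int, n_controls: int, alpha: float) -> float:
    """Approximate power of a two-sided per-allele test (normal approximation).

    Assumption: Var(log OR) ~ 1 / (N * phi * (1 - phi) * 2 * maf * (1 - maf)),
    where N is the total sample size and phi is the proportion of cases.
    """
    n_total = n_cases + n_controls
    phi = n_cases / n_total
    var_log_or = 1.0 / (n_total * phi * (1.0 - phi) * 2.0 * maf * (1.0 - maf))
    z_effect = abs(math.log(or_per_allele)) / math.sqrt(var_log_or)
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection region
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


# Values taken from the text: 14 436 nsSNPs, 1053 cases, 1500 controls, OR = 1.20
threshold = bonferroni_threshold(0.05, 14436)
power = approx_power(1.20, 0.5, 1053, 1500, threshold)
print(f"threshold = {threshold:.2e}, approximate power = {power:.3f}")
```

Under these assumptions the approximate power falls below the 15% bound stated in the text, which illustrates why the absence of clear associations in the WTCCC study was uninformative for effects of this size.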
We aimed to assess the most promising association signals from the WTCCC study in a much larger BCAC case-control study that formed part of the Collaborative Oncological Gene-Environment Study (COGS). COGS is a multi-consortium project that seeks to identify common variants contributing to susceptibility to breast, ovarian and prostate cancer (http://www.nature.com/icogs/primer/cogs-project-and-design-of-theicogs-array/). It is based on genotyping case-control samples using a custom iSelect SNP genotyping array (iCOGS). The principal criterion for inclusion of SNPs on this array by BCAC was statistical evidence of association from a combined analysis of nine genome-wide association studies (GWASs); the analysis of these SNPs selected from GWAS, identifying >40 novel breast cancer susceptibility loci (2-4), has been completed. We also included on the iCOGS array, and successfully genotyped, 41 nsSNPs from the WTCCC study, including rs6964587, for which the strongest evidence of association had been observed. In the present analysis, we attempted to replicate these associations using the BCAC component of COGS, comprising 53 835 female breast cancer cases and 50 156 controls (Table 1).

RESULTS
After quality control (QC), all genotyped SNPs in the present analysis had overall call rates >95% and duplicate and HapMap sample concordance >98%. No evidence of departure from Hardy-Weinberg equilibrium was observed in controls overall (P ≥ 0.11 for Europeans), and no strong evidence was [...] figure). SNP rs10510592 (L513S) in NEK10 is located 83 kb from a known breast cancer susceptibility GWAS hit, rs4973768 (9), which was also genotyped on iCOGS; the two SNPs are in modest linkage disequilibrium (LD; r2 = 0.36). The evidence of association using the same dataset was stronger for rs4973768 (P = 3.0 × 10−22). A multivariate analysis including both SNPs resulted in substantial attenuation in the OR for rs10510592 (per-allele OR = 1.05, 95% CI = 1.02-1.07, P = 0.0010), while the evidence of association for rs4973768 remained strong (P = 1.0 × 10−8). The variant rs10510592 was included on iCOGS, both as part of the present study and as part of a fine-mapping study of 899 SNPs in an 881 kb region of 3p24. More detailed multivariate analyses of these fine-mapping SNPs, complemented by functional analysis, will be required to pinpoint the underlying causal variant(s).
The nsSNP in AKAP9, rs6964587, had been previously studied by the BCAC (23,24). The dataset used in the previous analysis overlapped partially with the present study (14 423 cases and 12 785 controls were in both datasets). Table 3 presents results from both analyses after removing overlapping samples from the latter. In the present study, we observed strong independent evidence of replication of the reported association (P = 9.2 × 10−7). After combining published and new data from European women (55 445 cases and 62 668 controls), the per-T-allele OR estimate was 1.05 (95% CI = 1.04-1.07, P = 2.5 × 10−9) and the OR relative to the GG genotype was 1.04 (95% CI = 1.01-1.07, P = 0.0034) for GT and 1.12 (95% CI = 1.08-1.16, P = 1.1 × 10−9) for TT. The per-allele OR estimates and 95% CI to two decimal places were unchanged when analyses were repeated excluding 3734 cases with carcinoma in situ or unknown invasiveness (P = 3.7 × 10−9). The above analyses were adjusted only for study as principal components could not be determined for published data; however, when adjustment was made for principal components for the iCOGS data alone (setting the principal components to zero for other samples), the results were similar (per-T-allele OR = 1.05, 95% CI = 1.03-1.07, P = 1.3 × 10−8). All subsequent analyses for this SNP included published and new data, unless otherwise specified. The genotype-specific ORs were consistent with a log-additive (per-allele) model; a recessive model as previously proposed could be rejected (OR = 1.05, 95% CI = 1.03-1.06, P = 8.4 × 10−8; P = 0.0034 compared with a two-parameter model). No notable between-study heterogeneity was observed (I2 = 32%, Fig. 1).
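The statement that the genotype-specific ORs fit a log-additive model better than a recessive one can be checked informally against the reported estimates. The sketch below is only an arithmetic illustration, not the likelihood-based comparison with a two-parameter model that the authors performed. A log-additive model implies OR(GT) = OR and OR(TT) = OR squared, whereas a recessive model fixes OR(GT) at 1. The helper names are hypothetical.

```python
# Reported estimates for rs6964587 in European women (combined data)
per_allele_or = 1.05
ci_gt = (1.01, 1.07)  # 95% CI for GT relative to GG
ci_tt = (1.08, 1.16)  # 95% CI for TT relative to GG


def inside(value: float, interval: tuple) -> bool:
    """True if value lies within the closed interval."""
    return interval[0] <= value <= interval[1]


# Log-additive (per-allele) model: one copy multiplies the odds by OR,
# two copies multiply them by OR squared.
additive_gt = per_allele_or
additive_tt = per_allele_or ** 2

# Recessive model: heterozygotes carry no excess risk by definition.
recessive_gt = 1.0

print("log-additive GT within CI:", inside(additive_gt, ci_gt))
print("log-additive TT within CI:", inside(additive_tt, ci_tt))
print("recessive GT within CI:", inside(recessive_gt, ci_gt))
```

The values implied by the per-allele model sit inside both reported intervals, while the recessive constraint of no heterozygote effect lies outside the interval for GT, in line with the formal rejection of the recessive model.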
We also had access to the original combined data from nine GWASs used to select the majority of the BCAC SNPs on iCOGS. These included either measured or imputed genotypes for rs6964587 (4). Data for 7938 cases and 11 809 controls had not been included in the analyses conducted to date. The estimated OR based on a meta-analysis of these GWAS data was 1.05 per T-allele (95% CI = 1.01-1.10, P = 0.027). This model was a better fit than a recessive model (OR = 1.07, 95% CI = 1.00-1.14, P = 0.043). When these GWAS data were combined with the iCOGS and previously published data, the estimated per-allele OR for rs6964587 was 1.05 (95% CI = 1.04-1.07, P = 2.0 × 10−10).
The T allele of rs6964587 was less frequent in Asians (0.19) and more frequent in African-American women (0.51) than in Europeans (0.39). While there was no statistically significant evidence of association in either Asian or African-American women, the estimated OR in Asians (after combining available data, OR = 1.05, 95% CI = 0.99-1.11) was similar to that in Europeans, and in both non-European populations the 95% CIs included the OR estimate in Europeans (Table 3). Based on data for European women, there was evidence of association for both ER-positive (OR = 1.06, 95% CI = 1.04-1.08, P = 3.2 × 10−8) and ER-negative breast cancer (OR = 1.04, 95% CI = 1.01-1.07, P = 0.019; P = 0.47 for difference in OR by ER disease). There was no evidence of differences in the OR by age (P = 0.58), family history (P = 0.74) or any of the other tumor characteristics considered (PR status, HER2 status, axillary node status, grade, size or morphology; P ≥ 0.084).
There were no other SNPs genotyped on iCOGS within 500 kb of rs6964587 that gave stronger evidence of association in Europeans, based on the BCAC data. However, there were 133 SNPs that gave stronger evidence based on imputed genotypes (all with imputation r2 > 0.90); an intronic single-base deletion in AKAP9 (chr7:91681597), located 51 kb from rs6964587, was the best imputed hit (P = 4.4 × 10−7, compared with 1.7 × 10−6 for rs6964587 in the same dataset). This variant was also well imputed in Asians and African Americans (imputation r2 = 0.99), but no independent evidence of association was observed in either (P > 0.35). There were three genotyped and 63 imputed (with imputation r2 > 0.8) SNPs with P below an arbitrary cut-off of 0.001 in Asian women, but the evidence of association for these SNPs in European women was weak (P ≥ 0.0029) relative to that for rs6964587.
The minor T allele of rs1053338 has a similar frequency (0.13) in European and Asian women, but was much less frequent in African Americans (0.032). The results for Asian and African-American women were consistent with those for Europeans (P-het = 0.77; Table 4). There was no evidence of a differential association with the risk of disease subtypes [...] We assessed associations with other SNPs within 500 kb either side of rs1053338, both genotyped and imputed, based on BCAC iCOGS data. Slightly stronger evidence of association was observed in Europeans for one other genotyped SNP: rs3821902, an intronic variant in ATXN7 located 26 kb away (OR = 1.08, 95% CI = 1.05-1.11, P = 7.4 × 10−8). For Asians and African Americans the P-value for this SNP was 0.48 and 0.54, respectively. Two other imputed SNPs (rs2241822 and rs6445387, imputation r2 ≥ 0.98), both within 5 kb of rs1053338 and both intronic to ATXN7, had a slightly lower P-value (P = 5.1 × 10−8). All three SNPs were strongly correlated with rs1053338 (r2 ≥ 0.83). No independent evidence was observed for these SNPs in the other ethnic groups (P > 0.31). There was only one imputed SNP with P < 0.01 in Asian women (rs9837159; P = 0.0093); the evidence of association for this SNP in European women was weak (P = 0.078).

DISCUSSION
In this study of 41 non-synonymous coding SNPs, selected based on prior evidence of association with breast cancer, we have identified a novel susceptibility locus at 3p21 based on SNP rs1053338 (K264R) in ATXN7. We have also confirmed, for the first time at genome-wide statistical significance, that AKAP9-rs6964587 (M463I) at 7q21 is a marker of breast cancer susceptibility in European women. In both cases, a nominally statistically significant result was observed in a meta-analysis of independent data from nine GWASs, with very similar OR estimates to those found in the BCAC COGS dataset. Both nsSNPs are associated with relatively small per-allele effects (estimated OR = 1.07 and 1.05, respectively) and appeared to confer susceptibility to ER-positive and ER-negative disease. The potentially differential association of rs1053338 with risk of breast cancer by grade requires confirmation.
That independent confirmation of these associations was not observed for Asian and African-American women may be explained by the limited power to detect these effect sizes. We estimate that at 5% statistical significance our study had <50% power to detect the ORs estimated for European women for these SNPs in Asian women and much lower power (<15%) for African-American women. However, weaker associations in non-European populations have been observed for many breast cancer susceptibility loci and may reflect differences in LD patterns, genetic background and/or the distribution of interacting environmental risk factors.
The nsSNP giving the strongest signal in our study was rs10510592 (L513S) in NEK10, located within an established breast cancer susceptibility region. However, substantially stronger evidence of association with risk was observed for the originally reported SNP at this locus (rs4973768), and further analyses revealed that the association with rs10510592 was substantially attenuated after adjusting for rs4973768. Hence, if there is a single causal variant in this region, it is unlikely to be rs10510592, despite the fact that this SNP is an amino acid substitution with strong evidence of association with disease risk (P = 5.1 × 10−17). Further work, including in vitro analyses to functionally characterize candidate variants, will be required to identify the biological mechanism behind this clear association.
The same phenomenon was observed for the two nsSNPs marking novel breast cancer susceptibility loci that we have identified in the present study. In both cases, the nsSNP could not be definitively ruled out as the causal variant. Nevertheless, in the case of ATXN7-K264R, three intronic SNPs in the same gene, one genotyped and two imputed, gave stronger signals of association. Similarly, while AKAP9-M463I gave the strongest signal among the genotyped SNPs, an imputed intronic SNP had an associated P-value almost an order of magnitude smaller. Future studies that fine-map these two regions through dense genotyping, in even larger sample sizes, will therefore be required to identify the causal variants and targeted genes.
The WTCCC also noted that an observed association with an nsSNP does not necessarily imply that the SNP, or even the gene in which it is located, is causal (22). That is, a candidate variant approach may identify novel susceptibility loci, but the variant in question cannot be assumed to be causal, highlighting the importance of rigorous fine-scale mapping analyses, even when an association with a potentially functional SNP has been identified. These results are also consistent with previous observations that the vast majority of common susceptibility alleles for breast cancer are non-coding; even after deliberately selecting potentially associated nsSNPs, the confirmed associations appear to be markers for other, presumably non-coding, functional SNPs.
For both the AKAP9 and ATXN7 nsSNPs, a consistent association was observed in the BCAC dataset and the combined analysis of nine GWASs. It is interesting to note, however, that neither locus was selected for inclusion on the iCOGS array based on evidence of association in the combined GWAS, despite the fact that the array included >35 273 SNPs selected for replication of the GWAS (4); both loci failed to reach the cut-off of P < 0.008. Indeed, the probability that loci with associated effects of this magnitude would have been selected for inclusion on iCOGS on the basis of their GWAS-based results was <0.40. These results emphasize that, for associations of this magnitude (OR = 1.05-1.07), even a combined GWAS of >10 000 cases and 10 000 controls has limited power. They also highlight that further loci with associated effects of similar magnitude remain to be identified (4).
A key strength of this study is the sample size; the iCOGS study is the largest genotyping study in breast cancer, and by far the largest study to evaluate non-synonymous SNPs. There is potentially some overlap between the samples used in the WTCCC study and the current analysis. The WTCCC study used samples from a UK study of familial breast cancer (FBCS) that was also used in one of the GWAS (UK2). Although it is not possible to check directly, any overlap with the samples used in the COGS would have been incidental: we estimate that <3% of samples in the BCAC COGS analysis could have been used in the WTCCC analysis. Moreover, since both loci reach genome-wide levels of significance, the evidence for these associations being real does not depend strongly on their selection through the WTCCC study.
In summary, in this very large case-control study focused on common candidate non-synonymous variants, we have identified a novel susceptibility locus at 3p21 and confirmed AKAP9-rs6964587 as a marker of breast cancer risk at 7q21. Additional analyses of other common variants in these regions, the majority imputed from the 1000 Genomes Project, suggest that the nsSNPs genotyped are unlikely to be causal and that further fine-mapping studies are required to identify the variants and corresponding genes that modify breast cancer risk.

Participants
Samples for the main study were drawn from 49 case-control studies participating in the BCAC (Table 1): 38 from populations of predominantly European ancestry (46 450 cases and 42 600 controls), nine from populations of Asian ancestry (6269 cases and 6624 controls) and two of African-American women (1116 cases and 932 controls). Studies were either population based or hospital based; some studies sampled cases according to age, or oversampled for cases with a family history or bilateral [...] Table S1). All study participants gave informed consent and all studies were approved by the corresponding local ethics committees.

SNP selection
We considered the 48 SNPs for which the strongest evidence of association (per-allele test P-value < 0.005) with breast cancer was observed in the original analysis by the WTCCC (22). In addition, we considered an nsSNP in AKAP9 based on previous evidence from the BCAC (23,24) and for which consistent results were reported in the WTCCC study, even though the P-value did not meet the 0.005 threshold (22). Pairwise LD was assessed based on the correlation coefficient (r2) in Europeans from HapMap data release 28 (Phases II and III) and visualised using Haploview version 4.2. Two nsSNPs (rs4148077 and rs4986791) were in complete LD (r2 = 1.0) with other variants considered (rs3742801 and rs4986790, respectively) and were therefore excluded. A further three SNPs (rs11465716, rs3790549 and rs7313899) were excluded because they were reported to have an MAF < 5%. Genotyping assays could not be designed for nine SNPs (Illumina design score <0.8), but surrogate SNPs could be genotyped for six of these, five in complete LD with the original SNP and one in high LD (r2 = 0.94); the remaining three SNPs (rs4255378, rs2074491 and rs4730283) could not be assessed. The 41 SNPs considered in this analysis are listed in Table 2 and their selection is summarized in the Supplementary Material, Fig. S1.
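The selection steps described above can be tallied to confirm that they arrive at the 41 SNPs analysed. The following is a simple bookkeeping sketch using only the counts given in the text; the variable names are illustrative.

```python
# Counts taken directly from the SNP selection description
wtccc_top_snps = 48        # per-allele P < 0.005 in the WTCCC analysis
akap9_candidate = 1        # rs6964587, added on prior BCAC evidence
excluded_complete_ld = 2   # rs4148077 and rs4986791
excluded_low_maf = 3       # rs11465716, rs3790549 and rs7313899
assay_design_failed = 9    # Illumina design score < 0.8
surrogates_genotyped = 6   # surrogate SNPs in complete or high LD

# SNPs with a failed design and no usable surrogate drop out entirely
not_assessed = assay_design_failed - surrogates_genotyped

final_count = (wtccc_top_snps + akap9_candidate
               - excluded_complete_ld - excluded_low_maf - not_assessed)
print("SNPs not assessed:", not_assessed)
print("SNPs analysed:", final_count)
```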

Genotyping
Genotyping was conducted using a custom Illumina Infinium array (iCOGS) in four centers, as part of the COGS, as described previously (4). Genotypes were called using Illumina's proprietary GenCall algorithm. QC procedures have been previously described (4). Subjects with an overall call-rate <95% were excluded. Genotype intensity cluster plots were checked manually for SNPs for which evidence of association at P < 0.0001 was found, and all were judged to be acceptable, with the exception of that for rs6964587. However, clearly defined clusters were observed for rs6964587 after excluding 1259 samples from plates with call-rates <90% and all subsequent analyses for this SNP were based on this slightly reduced sample.

Statistical methods
Ethnic outliers were identified by multi-dimensional scaling, combining the iCOGS data with the three HapMap2 populations, based on a subset of 37 000 uncorrelated markers that passed QC (including 1000 selected as ancestry informative markers). Most studies were predominantly of a single ancestry (European or Asian), and individuals with >15% minority ancestry, based on the first two components, were excluded. Exceptions to this were the two studies of African Americans (NBHS and SCCS) and two of the Asian studies, from Singapore (SGBCC) and Malaysia (MYBRCA), which contained a substantial fraction of individuals of mixed ancestry and so no exclusions were made based on genetically determined ethnicity. Principal components analyses were then carried out separately for the European, Asian and African-American subgroups, based on the same subset of SNPs. Results presented are for women of European ancestry, unless otherwise stated.
Departure from Hardy-Weinberg equilibrium (HWE) was tested for in controls using a study-stratified χ2 test (1 d.f.) (25,26). The association of each SNP with breast cancer risk was assessed by estimating genotype-specific and per-allele ORs using logistic regression, adjusted for study. For the analyses of European women, we also included the first six principal components as covariates, together with a seventh component specific to one study (LMBC) for which there was substantial inflation not accounted for by the components derived from the analysis of all studies. The inclusion of additional principal components did not reduce inflation further. We included two race-specific principal components in the analyses of Asian and African-American women.
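The two core computations in this paragraph, the HWE test and the per-allele OR, can be sketched in a simplified form. The code below is illustrative only and makes several assumptions that differ from the analysis described: the HWE test is for a single stratum rather than study-stratified, and the logistic model contains only an intercept and the allele count, with no adjustment for study or principal components. The function names and the genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 d.f.) for departure from HWE in one stratum."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def per_allele_or(controls, cases, n_iter: int = 25):
    """Fit logit P(case) = a + b * g by Newton-Raphson on grouped counts.

    controls, cases: counts for g = 0, 1, 2 copies of the minor allele.
    Returns the per-allele OR and its 95% Wald confidence limits.
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        grad_a = grad_b = info_aa = info_ab = info_bb = 0.0
        for g in (0, 1, 2):
            n = controls[g] + cases[g]
            mu = 1.0 / (1.0 + math.exp(-(a + b * g)))
            resid = cases[g] - n * mu
            w = n * mu * (1.0 - mu)
            grad_a += resid
            grad_b += resid * g
            info_aa += w
            info_ab += w * g
            info_bb += w * g * g
        det = info_aa * info_bb - info_ab ** 2
        # Newton step: parameters += inverse(information) * gradient
        a += (info_bb * grad_a - info_ab * grad_b) / det
        b += (info_aa * grad_b - info_ab * grad_a) / det
    # Variance of b from the information matrix of the final iteration
    se_b = math.sqrt(info_aa / det)
    return math.exp(b), math.exp(b - 1.96 * se_b), math.exp(b + 1.96 * se_b)


# Hypothetical genotype counts (0, 1, 2 minor alleles)
example_controls = [1000, 1000, 1000]
example_cases = [1000, 1200, 1440]
print(hwe_chi2(*example_controls))
print(per_allele_or(example_controls, example_cases))
```

The example counts were constructed so that the odds of being a case rise by a constant factor with each additional minor allele, which is the pattern a log-additive model assumes.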
Between-study heterogeneity in ORs was assessed for each of the three broad racial groups using the metan command in Stata (Release 10) (27) to meta-analyse study-specific per-allele log-OR estimates and generate I2 statistics; values >50% were considered notable (28). Differences in ORs by ethnicity were assessed using a likelihood ratio test (LRT) comparing the model with interaction terms for the per-allele log-OR by study population (European, Asian, African American) to the model with no interaction terms. Differences by age (<40, 40-49, 50-59, 60-69 and ≥70 years) were evaluated using a similar LRT, but modeling a linear trend by fitting the median age for each of these defined categories. Heterogeneity in the OR by first degree family history (no, yes), by subtypes defined by ER, PR and HER2 status (positive, negative) and by axillary node status (none, ≥1 affected), tumor grade (1-3), tumor size (≤10, 11-20, >20 mm) and tumor morphology (ductal, lobular), was assessed by applying polytomous logistic regression to cases only, with the number of rare alleles as the outcome and restricting, for each explanatory variable, the beta coefficient for the comparison of 2-0 minor alleles to be double that for the comparison of 1-0 minor alleles. Linear trends were tested by fitting as continuous variables values 1, 2 and 3 for grade and the median value for each of the defined categories of size. ORs specific to disease subtypes defined by ER status were estimated for Europeans using polytomous logistic regression with control status as the reference outcome. All statistical tests were two sided. The term 'genome-wide statistically significant' is taken to imply P < 5 × 10−8; otherwise 'statistically significant' implies P < 0.05. Power calculations were carried out using Quanto v.1.2.4 (http://biostats.usc.edu/software). All other analyses were conducted using Stata: release 10 (27). The analysis pipeline is summarized in the Supplementary Material, Fig. S1.
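The heterogeneity assessment described above rests on a standard calculation: an inverse-variance weighted pooled log-OR, Cochran's Q, and the I2 statistic derived from Q. The sketch below shows that arithmetic in plain Python under a fixed-effect assumption. It is not the Stata metan command used by the authors, and the study-specific estimates are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, ses):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I2.

    log_ors: study-specific per-allele log-OR estimates
    ses:     their standard errors
    Returns (pooled log-OR, pooled SE, Q, I2 in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2 is the share of variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical study-specific estimates, for illustration only
study_log_ors = [math.log(1.03), math.log(1.08), math.log(1.05), math.log(1.02)]
study_ses = [0.03, 0.04, 0.02, 0.05]

pooled, pooled_se, q, i2 = fixed_effect_meta(study_log_ors, study_ses)
print(f"pooled OR = {math.exp(pooled):.3f}, Q = {q:.2f}, I2 = {i2:.1f}%")
print("notable heterogeneity:", i2 > 50.0)
```

The final comparison applies the threshold stated in the text, under which I2 values above 50% are treated as notable.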
Genotype data for iCOGS SNPs in regions surrounding rs6964587 and rs1053338 were used to estimate genotypes for other common variants across those regions for the BCAC study subjects by imputation, using IMPUTE v2.2 and the [...]
Figure 2. Per-allele OR estimates for ATXN7-K264R (rs1053338) for European women by study, based on data from the Breast Cancer Association Consortium. MAF, minor allele frequency; pHWE, P-value for departure from Hardy-Weinberg equilibrium; CI, confidence interval.
[...] work was supported by grant UM1 CA164920 from the National Cancer Institute (USA). The content of this manuscript does not necessarily reflect the views or policies of the National Cancer Institute or any of the collaborating centers in the Breast Cancer Family Registry (BCFR), nor does mention of trade names, commercial products or organizations imply endorsement by the US Government or the BCFR. The