Fine-mapping of the HNF1B multicancer locus identifies candidate variants that mediate endometrial cancer risk

Common variants in the hepatocyte nuclear factor 1 homeobox B (HNF1B) gene are associated with the risk of Type II diabetes and multiple cancers. Evidence to date indicates that cancer risk may be mediated via genetic or epigenetic effects on HNF1B gene expression. We previously found single-nucleotide polymorphisms (SNPs) at the HNF1B locus to be associated with endometrial cancer, and now report extensive fine-mapping and in silico and laboratory analyses of this locus. Analysis of 1184 genotyped and imputed SNPs in 6608 Caucasian cases and 37 925 controls, and 895 Asian cases and 1968 controls, revealed the best signal of association for SNP rs11263763 (P = 8.4 × 10−14, odds ratio = 0.86, 95% confidence interval = 0.82–0.89), located within HNF1B intron 1. Haplotype analysis and conditional analyses provide no evidence of further independent endometrial cancer risk variants at this locus. SNP rs11263763 genotype was associated with HNF1B mRNA expression but not with HNF1B methylation in endometrial tumor samples from The Cancer Genome Atlas. Genetic analyses prioritized rs11263763 and four other SNPs in high-to-moderate linkage disequilibrium as the most likely causal SNPs. Three of these SNPs map to the extended HNF1B promoter based on chromatin marks extending from the minimal promoter region. Reporter assays demonstrated that this extended region reduces activity in combination with the minimal HNF1B promoter, and that the minor alleles of rs11263763 or rs8064454 are associated with decreased HNF1B promoter activity. Our findings provide evidence for a single signal associated with endometrial cancer risk at the HNF1B locus, and that risk is likely mediated via altered HNF1B gene expression.


Introduction
Endometrial cancer is the most common type of uterine cancer, and the fourth most commonly diagnosed cancer in European and North American women (http://globocan.iarc.fr/). Traditionally, this cancer is divided into two etiological types (1): hormonally driven Type 1, endometrioid histology subtype with 'good' prognosis (∼80% of cases), and Type 2, non-endometrioid largely serous or clear cell subtypes with poor prognosis. Recently, in-depth studies by The Cancer Genome Atlas (TCGA) have identified four distinct tumor categories with different prognostic characteristics, namely 'copy number high', 'copy number low', 'POLE ultramutated' and 'microsatellite instability hypermutated' (2). We have previously identified single-nucleotide polymorphisms (SNPs) associated with endometrial cancer risk at the hepatocyte nuclear factor 1 homeobox B (HNF1B) locus using a genome-wide association study (GWAS) approach (3). The most significantly associated SNP was rs4430796 located in HNF1B intron 2, with the minor 'G' allele protective for endometrial cancer (3).
HNF1B is a member of the homeodomain-containing superfamily of transcription factors (TFs), and SNPs at this locus are already known to be associated with risk of Type II diabetes (4), prostate cancer (4–9) and two different ovarian cancer subtypes (10,11). However, fine-mapping studies have revealed a complex genetic architecture at the HNF1B locus, demonstrated by lead SNPs and the direction of genetic effects being inconsistent between cancer types (Table 1). For example, in prostate cancer the signal is explained by a five-SNP haplotype that includes SNPs from two peaks of association (5) in HNF1B intron 2 (lead SNP rs4430796) and intron 4 (lead SNP rs4794758) (12). For ovarian cancer subtypes, SNP rs757210, in high linkage disequilibrium (LD) with rs4430796, was shown to be associated with decreased risk of clear cell ovarian cancer but increased risk of serous ovarian cancer (10,11). Signals were subsequently refined to rs11651755 in intron 1 for the clear cell ovarian cancer subtype, and rs7405776 in intron 3 for the serous subtype (11).
Various analyses have been undertaken to assess the relationship between HNF1B locus cancer risk SNPs and altered regulation of HNF1B mRNA expression. Expression quantitative trait locus (eQTL) analysis indicates that rs4430796 is associated with altered HNF1B mRNA expression in lymphoblastoid cell lines generated from cord blood or circulating lymphocytes (3), and also in benign prostate tissue (13). However, SNP rs757210, in high LD with rs4430796, was not associated with HNF1B expression in normal ovarian tissue (10). Instead, this SNP was determined to be a methylation quantitative trait locus (mQTL), associated with HNF1B promoter methylation in serous ovarian tumors (10,11). In contrast, no such association is indicated for clear cell ovarian tumors, which mostly present with a CpG island methylator phenotype (CIMP) but are nevertheless unmethylated at the HNF1B promoter (11).
Here, we report the fine-scale mapping of the HNF1B locus incorporating data for 1184 genotyped and imputed SNPs in 6608 endometrial cancer cases and 37 925 controls of European ancestry, and analyses aimed at exploring the function of the most likely causal variants. Our results provide evidence for a single signal associated with endometrial cancer risk at the HNF1B locus, and that risk is likely mediated via altered HNF1B gene expression.

Results
Fine-mapping and association analysis reveals one independent signal for endometrial cancer
Meta-analysis of the 1184 HNF1B region SNPs genotyped or imputed in the four Caucasian datasets [iCOGS fine-mapping, Australian National Endometrial Cancer Study (ANECS), Studies of Epidemiology and Risk factors in Cancer Heredity (SEARCH) and National Study of the Genetics of Endometrial Cancer (NSECG) GWASs] and passing our quality control measures identified 18 SNPs that reached genome-wide significance (P < 5.0 × 10−8) (Table 2; results for individual sample sets provided in Supplementary Material, Table S2). The best overall signal was observed for imputed SNP rs11263763 [P = 8.4 × 10−14, odds ratio (OR) = 0.86], located in HNF1B intron 1 (Fig. 1A; Supplementary Material, Table S3). All 17 additional SNPs reaching genome-wide significance were moderately to highly correlated (r2 = 0.57-0.95) with rs11263763, including the original endometrial cancer GWAS SNP rs4430796 (r2 to rs11263763 = 0.95, P = 9.7 × 10−12, OR = 0.87), and the best SNP genotyped in all four datasets, rs7501939 (r2 to rs11263763 = 0.67, P = 3.7 × 10−9, OR = 0.88; Supplementary Material, Fig. S1). No SNP remained significant at P < 10−4 after analyses conditioning on rs11263763, indicating that there are no additional independent SNPs associated with endometrial cancer risk at this locus. Haplotype analysis in the iCOGS fine-mapping dataset (Table 3) confirmed that there was a single association signal arising from the set of SNPs in strong LD with genotyped SNPs rs11651755, rs8064454 and rs11651052; the three haplotypes containing the minor alleles of these SNPs were all similarly associated with endometrial cancer risk (P for the best haplotype = 8.1 × 10−6, OR = 0.88).
There was no significant heterogeneity in risk between studies for the best genotyped or imputed SNPs (Table 4; Fig. 1B). The OR for rs11263763 in the Asian SECGS dataset was non-significant (P = 5.7 × 10−1, OR = 0.96), although the power was low to detect an effect equivalent to that seen for the Caucasian datasets given the sample size (834 cases and 1936 controls) and lower minor allele frequency (MAF) (0.267) (see Materials and Methods). For both the Caucasian and Asian datasets, high LD extends centromeric from rs11263763 to encompass part of intron 2, with a slightly larger LD block in the Asian dataset (7 versus 5 kb; Supplementary Material, Fig. S2): assuming the risk SNPs are the same in both populations, this indicates that the search for candidate causal SNPs should focus on the 5 kb region identified from analyses of Caucasian datasets. Meta-analysis of the five datasets (iCOGS fine-mapping and ANECS, SEARCH, NSECG and SECGS GWASs) revealed an overall OR of 0.86 (P = 2.7 × 10−13).
The association was similar for endometrioid-subtype cases only, with rs11263763 retaining the strongest association signal in the meta-analysis of the four Caucasian datasets (P = 4.1 × 10−12, OR = 0.86), and a genome-wide significant signal seen for the same 18 SNPs as above (Table 5; Supplementary Material, Table S2). Despite a reduction in power to detect an association in non-endometrioid cases due to the smaller case sample size (see Materials and Methods), these 18 SNPs also retained the best association signal for non-endometrioid cases (iCOGS fine-mapping and NSECG GWAS datasets). The top SNP in this analysis was rs10908278 (r2 to rs11263763 = 0.84; P = 1.3 × 10−3, OR = 0.85; signal for rs11263763 P = 2.4 × 10−3, OR = 0.85) (Supplementary Material, Table S3).
The minor alleles of risk-associated SNPs are associated with reduced HNF1B expression
Of the five prioritized SNPs, the top SNP rs11263763 and rs11651755 (r2 = 0.91 to rs11263763 in TCGA dataset) were included in the Affymetrix 6.0 array used by TCGA to type their tumor samples. The remaining prioritized SNPs were well captured by rs11651755 (r2 = 1.00 for rs11651052, rs8064454; r2 = 0.96 for rs10908278). One additional SNP reaching genome-wide significance for association with endometrial cancer risk (Table 2) was directly genotyped in the TCGA dataset (rs11658063, r2 = 0.71 to rs11263763). There was evidence for association between genotype and HNF1B expression levels in endometrioid tumors for rs11263763 (P = 1.3 × 10−2), rs11658063 (P = 5.0 × 10−3) and marginally so for rs11651755 (P = 8.3 × 10−2). We also tested the allelic effect of rs11263763 and rs11658063 on HNF1B expression in non-endometrioid tumors (total N = 52), and similarly identified eQTLs for both rs11263763 (P = 3.0 × 10−2) and rs11658063 (P = 4.8 × 10−2). However, these associations would not be considered statistically significant after conservatively correcting for the total number of genes analyzed across the region, where P for significance = 5.0 × 10−3 (0.05/10). In all instances, the minor allele was associated with decreased levels of HNF1B mRNA (rs11263763, Fig. 2A; rs11658063, Fig. 2B). SNP rs11658063 was well imputed in the Caucasian datasets (information score >0.94), but statistically is not a likely candidate causal SNP, with a likelihood over 13 000 times smaller than that of rs11263763 in the Caucasian meta-analysis (P = 1.41 × 10−9). There was no evidence for association between genotypes of any of these three SNPs and expression of any of the other nine genes located within 1 Mb of HNF1B (data not shown).
Differential HNF1B isoform usage has been suggested to occur between benign and tumor prostate tissue (15). We also investigated the association between rs11263763 genotype and HNF1B isoform expression and usage in the TCGA endometrial tumor sample. Three HNF1B isoforms were measured by TCGA, the presence of which was confirmed by our own mRNA analysis of endometrial cancer cell lines (Supplementary Material, Fig. S3): isoform A (uc010wdi.1), isoform B (uc002hok.3) and isoform C (uc010cve.1). Overall, there was no evidence for differential isoform usage by genotype (P = 0.45). The relationship between rs11263763 genotype and HNF1B expression level (decreased expression in 'G' allele carriers) was consistent across isoform A (P = 2.2 × 10−2) and isoform B (P = 2.1 × 10−2), but not with isoform C (P = 5.8 × 10−1), although this isoform was expressed at very low levels or absent in those samples assessed.

No association between SNP rs11263763 and HNF1B methylation
There was no association between genotype and HNF1B CpG island methylation for rs11263763 (P = 0.42, Fig. 2C), or for rs11658063 (P = 0.42, Fig. 2D). Most of the TCGA samples (94%) were unmethylated (beta values < 0.2) at the 18 probes located within the HNF1B CpG region. We also assessed methylation of the mutL homolog 1 (MLH1) gene in tumor samples, as this is a marker of the CIMP-like phenotype in numerous cancers, including endometrial, colorectal and ovarian cancers (2,16). In the TCGA dataset of 196 tumors, there was no association between HNF1B expression and MLH1 methylation (P = 0.93) and no association between HNF1B genotype and MLH1 methylation (P = 0.58 for rs11263763, P = 0.22 for rs11658063). There was also no association between HNF1B genotype and MLH1 methylation in the independent sample of 182 ANECS endometrial cancer tumors (P = 0.91; assessed for rs4430796, r2 = 0.95 to rs11263763). That is, endometrial tumors present with unmethylated HNF1B promoter status irrespective of CIMP phenotype, resembling the presentation observed for the clear cell ovarian cancer subtype (11).
The strongest candidate causal SNPs map to the extended HNF1B promoter region
The five SNPs most strongly associated with endometrial cancer cluster within a 5.5 kb region in HNF1B intron 1 (Fig. 3). Using Encyclopedia of DNA Elements (ENCODE) data, we show that three of these SNPs (rs11263763, rs11651052 and rs8064454) fall within the extended HNF1B promoter that is marked by H3K4Me3 and H3K4Me1, indicative of regulatory activity associated with promoters. This region also contains DNaseI hypersensitivity sites indicating open chromatin in multiple cell lines, including the endometrial cancer cell lines ECC1 and Ishikawa (Fig. 3). Furthermore, this region covers a strong CpG island, and has a chromatin state in numerous ENCODE cell lines indicative of enhancer and promoter elements. While none of the 21 TFs tested to date in the ECC1 cell line bind in this region, several additional TFs do bind in other cancer and normal cell lines. All five SNPs are predicted to affect the ability of several TFs to bind DNA (Supplementary Material, Table S5). Notably, several of these TFs are implicated in endometrial cancer. This is the case for rs11263763 and rs11651755, both of which were associated with HNF1B expression in tumors (see above). Both SNPs are predicted to alter binding of p53, a prominent TF that plays a key role in response to DNA damage and other stress signals, and may have prognostic value in endometrial cancer (17). In addition, rs8064454 is predicted to create a binding site for zinc finger E-box-binding protein (ZEB) 1, a well-characterized transcriptional repressor (18) that has previously been reported to be aberrantly expressed in aggressive endometrial cancers (19,20).
Two of the candidate causal SNPs reduce the extended HNF1B promoter activity
We used luciferase reporter assays to examine activity associated with the wild-type promoter region, and whether the risk-associated SNPs in the extended promoter region were associated with altered HNF1B promoter activity. Transfection of Ishikawa and EN-1078D cell lines showed that the minimal HNF1B promoter construct produced a significant increase in reporter gene activity above the empty pGL3 vector control (Fig. 4). However, the extended HNF1B promoter construct significantly reduced this basal promoter activity by 40-50%, suggesting the presence of a silencer element in the extended region. Notably, inclusion of the minor alleles of rs11263763 or rs8064454 in the extended promoter constructs decreased relative wild-type HNF1B promoter activity by a further ∼25% compared with the construct containing the major alleles (Fig. 4).

Discussion
Fine-mapping of the multi-cancer HNF1B locus on chromosome 17q12 has revealed the presence of one multivariant haplotype associated with the risk of endometrial cancer. The most significantly associated SNP rs11263763 is highly correlated with the original endometrial cancer hit at this locus, rs4430796 (3), an SNP also associated with the risk of prostate cancer and in high-to-moderate LD with risk SNPs for serous and clear cell ovarian cancers. Multiple independent HNF1B associations have now been reported for the lead SNPs in prostate cancer (in introns 1 and 4) (12), while associations are limited to a single peak in intron 3 for the serous ovarian cancer subtype, and a single peak in intron 1 for the clear cell ovarian cancer subtype (11). Our analyses refine the endometrial cancer association signal to a distinct peak in intron 1, and show that our top SNPs are associated with HNF1B expression in endometrial tumors, and are located within the extended HNF1B promoter, which contains a negative regulatory element that inhibits gene expression.
[Figure caption, luciferase reporter assays; start and end missing] ... HNF1B (Min prom) or extended HNF1B (Ext prom) promoters were cloned upstream of a luciferase reporter. Ext prom constructs containing either the wild-type haplotype or minor alleles of rs11263763, rs11651052 or rs8064454 were also generated. Cells were transiently transfected with each of these constructs and assayed for luciferase activity after 48 h. Error bars denote standard error of the mean (SEM) from three independent experiments. P-values were determined with a two-tailed ...
HNF1B expression is altered in numerous cancers, with evidence to support a role as a tumor suppressor or oncogene depending on the tissue context. Down-regulation of HNF1B is associated with progression of hepatocellular carcinomas (21), and indicates poor prognosis of renal (22) and prostate (23) carcinomas. HNF1B expression has also been reported to be lower in primary serous ovarian tumors than in normal ovarian tissue (24).
Epigenetic inactivation of HNF1B is seen in serous ovarian tumors, and has been detected in ovarian, colorectal, gastric and pancreatic cancer cell lines, suggesting that HNF1B promoter hypermethylation can be a feature of tumorigenesis (25).
Conversely, the HNF1B promoter is typically unmethylated and gene expression increased in clear cell ovarian tumors and cell lines compared with other ovarian cancer subtypes (11,26). HNF1B hypomethylation has recently been detected in additional clear cell histologies, including endometrial, cervical and renal clear cell cancers, suggesting HNF1B expression and promoter hypomethylation to be a general biomarker of cytoplasmic clearing (27). HNF1B over-expression in immortalized endometriosis epithelial cells (hypothesized cell of origin for clear cell ovarian cancer) led to altered morphology and multinucleation of cells (11), while siRNA knock-down of HNF1B led to the induction of apoptosis in clear cell ovarian cancer cell lines (26) and significantly inhibited the proliferation and anchorage-dependent colony formation in the prostate cancer cell lines LNCaP and RWPE1 (13). Additionally, a genome-wide screen of RNAi data generated for ∼100 cell lines identified HNF1B as a major oncogene required for cancer cell survival (28).
Analyses by us and others indicate that HNF1B is the target gene for genetic risk associations with cancer in this region (3,10,11,13), although the mechanism of regulation mediated by risk SNPs is not necessarily the same between cancer subtypes. SNP rs4430796 is an eQTL (expression quantitative trait locus) associated with decreased HNF1B mRNA expression in benign prostate tissue (the at-risk tissue for prostate cancer) (3), while the serous ovarian cancer subtype lead risk SNP rs7405776 is a methylation quantitative trait locus (mQTL) associated with decreased expression in serous ovarian tumor tissue. To date, neither eQTLs nor mQTLs have been reported for clear cell ovarian tumor tissue. TCGA datasets show the HNF1B promoter is unmethylated in both endometrioid endometrial and prostate tumors. For prostate cancer, no significant difference in HNF1B mRNA expression levels has been reported between malignant and benign prostate tissue (15), or observed from our analysis of tumor and normal prostate tissue from TCGA (data not shown). Further, although a shift in isoform usage was reported between benign tissue [predominantly isoform C, a transcriptional repressor (29)] and malignant tissue [predominantly isoform B, a transcriptional activator (29)] (15), this was not evident from our analysis of the larger prostate dataset from TCGA.
Our analyses of the TCGA and other data indicate that the effects of causal SNPs on endometrial cancer risk at this locus are more similar to those of prostate rather than ovarian cancer subtypes. There was no association between risk genotype and HNF1B promoter methylation as implicated for the serous ovarian cancer subtype. SNP rs11263763 is indicated as an eQTL in endometrial tumor tissue, with the minor (protective) allele associated with decreased HNF1B expression, although this SNP appears to have no effect on isoform usage of the kind previously reported for prostate cancer. Importantly, our functional analysis showed that two of the three candidate causal SNPs located in the extended HNF1B promoter are associated with reduced promoter activity in vitro, suggesting that these SNPs are likely to be associated with reduced HNF1B expression in vivo. Further functional follow-up experiments focusing on the region encompassing this association peak, including additional SNPs belonging to our risk haplotype, will be required to confirm whether any of the other prioritized likely causal SNP(s) exert additional effects on expression via alternative mechanisms (30). Such findings, once linked to genetic and regulatory data from multiple cancers, will provide a greater understanding of the mechanism by which the HNF1B genomic locus and the HNF1B protein mediate risks, particularly of endometrial cancer, but also of different cancer subtypes. We also note the incomplete overlap between prioritized candidate causal SNPs identified as eQTLs in the TCGA dataset, and those shown to demonstrate altered function in vitro from our functional studies to date. It is likely that future eQTL fine-mapping studies that encompass direct genotyping of likely causal SNPs of interest in larger datasets of tumor and normal tissue will inform the role of eQTL data in the design of time-consuming functional analysis studies of candidate causal SNPs.
Building on recent findings reporting multiple shared cancer susceptibility loci (10,31–35), the knowledge that endometrial cancer, in addition to prostate, serous ovarian and clear cell ovarian cancers, is associated with SNPs that influence HNF1B activity gives additional support for the concept of regulatory regions harboring multiple cancer risk SNPs that act in a tissue-specific manner. Further, these findings provide rationale for expansive multi-cancer studies of novel loci identified for any single cancer, including bioinformatically directed investigation of novel loci discovered for endometrial cancer in multiple other cancers. It will be relevant for such future genetic epidemiological studies to consider molecular stratification of all tumor types, since analyses documenting the genomic characteristics of endometrial and other solid tumors have shown that distinct molecular subgroups within endometrial cancer histological subtypes share genomic features with different subtypes of other hormonally related tumors (2). Together, such expansive cross-cancer studies may further our understanding of the different biological pathways that lead to cancer.

Materials and Methods
Fine-mapping dataset
The fine-mapping case dataset comprised 4402 women of European ancestry with a confirmed diagnosis of endometrial cancer (3535 with confirmed endometrioid histology), recruited via 11 separate studies in seven countries collectively called the Endometrial Cancer Association Consortium. The control dataset comprised 28 758 healthy female controls from the same countries, all participating in the Breast Cancer Association Consortium (BCAC) (31) or Ovarian Cancer Association Consortium (OCAC) (10) (see Supplementary Material, Information and Table S1). All cases and controls were genotyped at 211 155 SNPs using a custom Illumina Infinium iSelect array ['iCOGS'; arrays and control genotyping methods are summarized in (10,31–34)], designed by the Collaborative Oncological Gene-environment Study ('COGS'). The iCOGS array includes 286 SNPs located 1 Mb upstream and downstream of the HNF1B (RefSeq NM_000458.2) gene, selected with the intention to carry out fine-mapping studies of this locus (34). See section entitled 'HNF1B fine-mapping SNPs' below for further information.

ANECS and SEARCH
The results presented here are based on a re-analysis of our original GWAS dataset, including additional samples, all called using the Illuminus program (36). Cases comprised 1287 endometrioid subtype endometrial cancer cases from the ANECS (n = 606) and the UK SEARCH (n = 681) genotyped using Illumina 610K arrays (3). ANECS cases were compared with 3083 Australian controls recruited as part of the Brisbane Adolescent Twin Study (37,38) (n = 1846) and the Hunter Community Study (39) (n = 1237), also genotyped using Illumina Infinium 610k arrays. SEARCH cases were compared with 5190 individuals genotyped using Illumina Infinium 1.2M arrays as part of the Wellcome Trust Case Control Consortium (40).

National Study of Endometrial Cancer Genetics Group
In addition to the above samples, we obtained genotype data from 919 endometrial cancer cases (795 with confirmed endometrioid histology) collected by the UK NSECG and genotyped using Illumina 660K arrays. These cases were compared with data generated for 895 controls drawn from the UK1/CORGI colorectal cancer sample set (41) previously genotyped using Illumina Hap550 arrays (Supplementary Material, Information).

Shanghai Endometrial Cancer Genetic Study
To assess LD structure of HNF1B SNPs in other populations, we analyzed data previously generated for a GWAS including 834 Asian endometrial cancer cases recruited to the Shanghai Endometrial Cancer Study (SECS) and 1936 controls who were recruited to the Shanghai Breast Cancer Study (SBCS; collectively termed SECGS here), genotyped using Affymetrix 6.0 arrays (42).

Data quality control
Genotypes for the ANECS and SEARCH GWAS samples (cases and controls) were subjected to quality control as described previously (3). Genotypes for the iCOGS fine-mapping and NSECG GWAS samples were called using Illumina's proprietary GenCall algorithm (31), and subjected to quality control as follows. SNPs were excluded for call rate <95% (<99% for MAF <5%), MAF <0.1% or deviations from Hardy-Weinberg equilibrium significant at 10−7. Samples were excluded for low overall call rate (<95%), heterozygosity >5 standard deviations from the mean, non-female genotype (XO, XY or XXY) or <85% estimated European ancestry based on identity by state (IBS) scores between study individuals and individuals in HapMap (http://hapmap.ncbi.nlm.nih.gov/) and multidimensional scaling. For cases, any 96-well plate containing ≥5 excluded samples was entirely excluded. For duplicate samples or those identified as close relatives by IBS probabilities >0.85, the sample with the lower call rate was excluded, except for case-control relative pairs for which the case was retained. Following quality control, the iCOGS sample retained data for 197 627 SNPs, and the NSECG GWAS sample 504 515 SNPs.
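As an illustration only, the SNP-level exclusion criteria above can be written as a small Python predicate. The function and argument names and the example values are hypothetical; the thresholds are those stated in the text, and strict inequalities are assumed where the text does not specify them.

```python
def snp_passes_qc(call_rate, maf, hwe_p):
    """Return True if a SNP survives the SNP-level filters quoted in the text.

    call_rate and maf are proportions (0-1); hwe_p is the P-value of a
    Hardy-Weinberg equilibrium test. All names are illustrative.
    """
    # Exclude very rare SNPs (MAF < 0.1%).
    if maf < 0.001:
        return False
    # Call rate threshold is stricter (99%) for SNPs with MAF < 5%.
    min_call_rate = 0.99 if maf < 0.05 else 0.95
    if call_rate < min_call_rate:
        return False
    # Exclude deviations from Hardy-Weinberg equilibrium significant at 1e-7.
    if hwe_p < 1e-7:
        return False
    return True


# Hypothetical SNP records: (call rate, MAF, HWE P-value).
common_ok = snp_passes_qc(0.97, 0.30, 0.5)        # common SNP, 97% call rate
rare_low_call = snp_passes_qc(0.97, 0.03, 0.5)    # MAF < 5% needs >= 99%
too_rare = snp_passes_qc(0.999, 0.0005, 0.5)      # MAF below 0.1%
hwe_fail = snp_passes_qc(0.999, 0.20, 1e-9)       # HWE deviation
```

The sample-level filters (call rate, heterozygosity, sex check, ancestry and relatedness) would need genotype-level data and are not sketched here.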

Regional imputation
As the aim of this study was to investigate the association signal around the HNF1B locus, we restricted our analyses to SNPs located within an ∼1 Mb region surrounding HNF1B (Build37, chr17:35599377-36602919). To increase the number of SNPs in the analysis and provide identical coverage across the four Caucasian and one Asian datasets, we used IMPUTE v2 (43) software to impute genotypes for SNPs present in the 1000 Genomes dataset v3 (April 2012 release) that had not been genotyped in our studies. We allowed the IMPUTE software to select the most appropriate haplotypes from among the complete set of 1000 Genomes haplotypes (44). Imputation was conducted on inference panels based on the SNPs typed for each dataset (e.g. SNPs included on the iCOGS array, various Illumina arrays for the ANECS, SEARCH and NSECG GWASs and the Affymetrix 6.0 array for the SECGS GWAS). Imputation was conducted separately for the five datasets, and SNPs with imputation information score <0.7 and/or MAF <0.01 were excluded prior to analysis. Following quality control, 1184 genotyped and imputed SNPs were retained in all four Caucasian datasets. The most significant imputed SNP was individually genotyped in a subset of cases using standard protocols for the Fluidigm BioMark™ HD System (Fluidigm, South San Francisco, CA, USA) (Supplementary Material, Information) to confirm imputation accuracy, resulting in 99% concordance between the genotyped and imputed genotypes.

Association analysis
The four imputed datasets were analyzed separately using unconditional logistic regression with a per-allele (1 degree of freedom) model using SNPTEST v2 (45). For the iCOGS dataset, analyses were performed adjusting for strata (six of the eight strata were defined by country, while the large UK dataset was divided into 'SEARCH' and 'NSECG') and for the first 10 principal components of the genomic kinship matrix, based on 37 000 uncorrelated iCOGS SNPs (r2 < 0.1), including ∼1000 selected as ancestry informative markers, using an in-house C++ program incorporating the Intel MKL libraries for eigenvectors (http://ccge.medschl.cam.ac.uk/software/). One principal component was derived specifically for the Leuven (LES/LMBC) studies, for which there was substantial inflation not accounted for by the other principal components. The Caucasian GWAS datasets were each analyzed as a single stratum, with adjustment for the first two (ANECS and NSECG) or three (SEARCH) principal components.
Results (ORs) of the four studies were combined using standard fixed-effects meta-analyses. The I2 statistic (46) was used to estimate the proportion of the variance due to between-study heterogeneity and the Q statistic to test for such heterogeneity. Analyses for all SNPs were repeated adjusting for the most significant SNP to assess whether multiple independent causal variants were present (i.e. a forward stepwise regression approach). The analyses were also repeated restricting the iCOGS and NSECG studies to those cases with endometrioid or non-endometrioid histology (the ANECS and SEARCH GWAS sample sets contained only endometrioid histology cases), and to iCOGS cases and controls for whom BMI data were available. All statistical analyses used R software unless otherwise stated, and all statistical tests were two-sided. The association plot was produced using LocusZoom (14). LD between SNPs is reported as calculated for the HapMap3 (release 2) population (http://www.broadinstitute.org/mpg/snap/ldsearchpw.php). Haplotype analyses including the top genotyped SNPs in the iCOGS fine-mapping dataset were performed in Haplostats (http://www.mayo.edu/research/labs/statistical-genetics-genetic-epidemiology/software).
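The fixed-effects combination and heterogeneity statistics described above can be sketched in Python as an inverse-variance weighted meta-analysis of per-study log ORs. This is a minimal illustration, not the R code used in the study; the function name and the per-study estimates and standard errors below are hypothetical values chosen only to show the calculation.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fixed_effects_meta(log_ors, ses):
    """Inverse-variance fixed-effects meta-analysis of per-study log ORs.

    Returns the pooled OR, its standard error (log scale), a two-sided
    P-value, Cochran's Q and the I2 statistic (in percent).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_w
    pooled_se = sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I2: share of variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"or": exp(pooled), "se": pooled_se, "p": p, "q": q, "i2": i2}


# Hypothetical per-study ORs and standard errors (NOT values from this study).
study_ors = [0.85, 0.88, 0.84, 0.90]
study_ses = [0.025, 0.060, 0.050, 0.070]
result = fixed_effects_meta([log(o) for o in study_ors], study_ses)
print(result)
```

A P-value for Q would come from a chi-squared distribution with (number of studies − 1) degrees of freedom; that step is omitted to keep the sketch free of external dependencies.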
The power to detect an effect in the smaller Caucasian non-endometrioid tumor and Asian SECGS datasets, equivalent to that seen for the best SNP (rs11263763) in the main analysis including the four Caucasian datasets for all histologies, was calculated using QUANTO 1.1 (47). For the non-endometrioid dataset with an MAF of 0.47 in 887 cases and 37 925 controls, power to detect an equivalent effect was 87% at the 5% significance threshold, and 22% at 10−4. For the SECGS dataset with an MAF of 0.27 in 834 cases and 1936 controls, power was 61% at the 5% significance threshold and 5% at 10−4.

Likelihood tests to select the most likely causal SNPs affecting endometrial cancer risk
To determine the most likely causal SNPs from among the top associated SNPs, the log-likelihood of each tested SNP was compared with that of the top SNP (rs11263763), using P-values from the overall (all-histologies) analysis in Caucasians. SNPs with a likelihood ratio of better than 1:100 relative to the top SNP (i.e. a likelihood no more than 100-fold lower than that of rs11263763) were prioritized as potentially causal variants for follow-up in the bioinformatic and functional analyses (10,30,48).
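One way to approximate such a comparison from P-values alone is sketched below in Python. It assumes, as a simplification not stated in the text, that each SNP's 1-degree-of-freedom test statistic is approximately twice its log-likelihood gain over the null model, so that the likelihood ratio between two SNPs is exp of half the difference in their chi-squared statistics. Function names are hypothetical; the two P-values are those reported in the Results for rs11263763 and rs11658063.

```python
from math import exp
from statistics import NormalDist


def chi2_from_p(p):
    """1-df chi-squared statistic corresponding to a two-sided P-value."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile, negative
    return z * z


def likelihood_ratio_vs_top(p_top, p_snp):
    """Approximate likelihood of the top SNP relative to another SNP.

    Assumes the 1-df statistic is about twice the log-likelihood gain
    over the null model (an approximation, see lead-in).
    """
    return exp((chi2_from_p(p_top) - chi2_from_p(p_snp)) / 2.0)


def is_prioritized(p_top, p_snp, max_fold=100.0):
    """Keep SNPs whose likelihood is at most max_fold lower than the top SNP."""
    return likelihood_ratio_vs_top(p_top, p_snp) <= max_fold


# P-values reported in the Results section.
P_TOP = 8.4e-14        # rs11263763
P_RS11658063 = 1.41e-9

ratio = likelihood_ratio_vs_top(P_TOP, P_RS11658063)
print(f"top SNP is ~{ratio:,.0f} times more likely; "
      f"prioritized: {is_prioritized(P_TOP, P_RS11658063)}")
```

Under this approximation the ratio for rs11658063 is of the order of 10^4, in line with the 'over 13 000 times smaller' likelihood quoted in the Results, so the SNP falls outside the 1:100 threshold.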

Expression and methylation by genotype in endometrial tumors
To investigate in endometrial tumors the SNP effects previously demonstrated in benign prostate tissue (eQTL) and serous ovarian tumors (mQTL), we analyzed data from two different sources.
TCGA: preprocessed SNP (Affymetrix 6.0 arrays), gene expression (RNA-Seq data generated using Illumina GAIIx and Illumina HiSeq platforms) and DNA methylation (Illumina Infinium HumanMethylation 450 Beadchips) data generated by TCGA for endometrial cancer tumor samples (2) were obtained through TCGA and the cBioPortal for Cancer Genomics (49,50) (Supplementary Material, Information). Analyses were restricted to samples of Caucasian ancestry with endometrioid subtype endometrial cancer, adjusting for copy number at the HNF1B locus. Associations between genotype and tumor HNF1B expression, HNF1B promoter methylation and tumor TCGA type were assessed by Kruskal-Wallis and Pearson correlation tests, with two-sided P-values <0.05 indicating a significant association.
ANECS: association between genotype at HNF1B SNP rs4430796 [generated through the original GWAS (3)] and tumor methylation at the MLH1 gene (a marker of the CIMP-like phenotype in endometrial cancer) (51) was assessed for 182 ANECS endometrial cancer cases for whom both data types were available.
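The Kruskal-Wallis comparison of tumor HNF1B expression across genotype groups, as used for the TCGA data, can be illustrated with a short Python sketch. It assumes exactly three genotype groups (so the chi-square reference distribution has 2 degrees of freedom, for which the upper-tail probability is exp(−H/2)), uses the large-sample approximation, and does not include the copy-number adjustment described above. The function names and expression values are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_3groups(group0, group1, group2):
    """Kruskal-Wallis H (tie-corrected) and asymptotic P-value for 3 groups."""
    groups = [group0, group1, group2]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sq = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_sq += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted_sq - 3.0 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    p_value = math.exp(-h / 2.0)  # chi-square upper tail with 2 df
    return h, p_value

# Hypothetical expression values for 0, 1 and 2 copies of the minor allele.
h_stat, p_val = kruskal_wallis_3groups([5.1, 4.8, 5.6], [4.2, 4.5, 4.0], [3.1, 3.6, 3.3])
print(h_stat, p_val)
```

With groups this small the chi-square approximation is rough; the sketch is meant only to show how the statistic is formed.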

Bioinformatic analysis to assess SNP functionality
Bioinformatic analyses to determine the most likely location and identity of putative causal SNPs that may influence the expression of HNF1B were conducted using a number of databases. Data produced by the ENCODE (52) project, indicating the location of open chromatin, DNA methylation, histone modification and TF binding in numerous cell lines including the endometrial cancer lines ECC1 and Ishikawa, were accessed through the UCSC Genome Browser (http://genome.ucsc.edu/ENCODE/). Multiple cell lines in addition to the endometrial cancer cell lines were included in the analysis to allow investigation of the range of potential regulatory mechanisms present across the HNF1B region. The is-rSNP software was used to predict which SNPs altered the ability of a TF to bind DNA (53). The is-rSNP program uses the JASPAR and TRANSFAC databases to first determine whether the two SNP alleles are predicted to lie within a potential TF binding site, based on binding scores computed using position weight matrices (PWMs). For each potential TF, is-rSNP then calculates whether either of the two SNP alleles significantly alters the binding score.
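The idea of comparing PWM binding scores between two SNP alleles can be illustrated with a simple Python sketch. This is not the is-rSNP algorithm, which additionally assesses the statistical significance of the score change; it only shows log-odds PWM scoring of the windows that overlap a SNP, on the forward strand, against a uniform background. The count matrix, sequences and function names are hypothetical.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequencies

def pwm_from_counts(counts, pseudocount=1.0):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in "ACGT") + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / BACKGROUND)
                    for b in "ACGT"})
    return pwm

def window_score(pwm, window):
    """Sum of position-specific weights for one aligned window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_score_over_snp(pwm, sequence, snp_index):
    """Highest score among forward-strand windows that include the SNP."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(sequence) - width)
    scores = [window_score(pwm, sequence[s:s + width])
              for s in range(first, last + 1)]
    if not scores:
        raise ValueError("sequence is shorter than the motif")
    return max(scores)

def allele_score_change(pwm, flank5, ref, alt, flank3):
    """Best alternate-allele score minus best reference-allele score."""
    idx = len(flank5)
    ref_score = best_score_over_snp(pwm, flank5 + ref + flank3, idx)
    alt_score = best_score_over_snp(pwm, flank5 + alt + flank3, idx)
    return alt_score - ref_score

# Hypothetical 4-bp motif that strongly prefers the sequence ACGT.
motif_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
motif = pwm_from_counts(motif_counts)
change = allele_score_change(motif, "TTA", "C", "T", "GTT")
print(change)
```

A negative change indicates that the alternate allele is predicted to weaken the motif match relative to the reference allele.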

Cell lines, plasmid construction and luciferase assays
Endometrial cancer cell lines Ishikawa and EN-1078D (kindly provided by Pamela Pollock, QUT, Brisbane) were grown in DMEM or DMEM:F12 medium, respectively, with 10% fetal calf serum and antibiotics. Cell lines were maintained under standard conditions, routinely tested for Mycoplasma, and short tandem repeat profiled. The HNF1B promoter-driven luciferase reporter constructs were generated by inserting a 908 bp fragment [minimal promoter (Min prom), hg19; chr17:36104874-36105781] or a 4651 bp fragment [extended promoter (Ext prom), hg19; chr17:36101131-36105781], with or without the minor alleles of rs11263763, rs11651052 or rs8064454, into the KpnI and HindIII sites of pGL3-basic. All HNF1B promoter sequences were commercially synthesized by GenScript (Life Research, Australia). Ishikawa and EN-1078D cells were transfected with equimolar amounts of luciferase reporter plasmids and 50 ng of pRLTK using Lipofectamine 2000. The total amount of transfected DNA was kept constant per experiment by adding carrier plasmid (pUC19). Luciferase activity was measured 48 h post-transfection using the Dual-Glo Luciferase Assay System on a Beckman-Coulter DTX-880 plate reader. To correct for any differences in transfection efficiency or cell lysate preparation, Firefly luciferase activity was normalized to Renilla luciferase. The activity of each test construct was calculated relative to an empty pGL3-basic construct, the activity of which was arbitrarily defined as 1.
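The normalization described above (Firefly divided by Renilla, then expressed relative to empty pGL3-basic set to 1) can be written as a short Python sketch. The readings, the number of replicate wells and the function names are hypothetical, and averaging replicate ratios before scaling is an assumption rather than a detail stated in the text.

```python
from statistics import mean

def normalized_ratios(firefly, renilla):
    """Firefly/Renilla ratio for each replicate well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]

def relative_activity(readings, reference="pGL3-basic"):
    """Mean normalized activity of each construct relative to the reference.

    readings maps construct name -> (firefly readings, renilla readings).
    The reference construct is scaled to exactly 1.
    """
    mean_ratio = {name: mean(normalized_ratios(ff, rl))
                  for name, (ff, rl) in readings.items()}
    ref_value = mean_ratio[reference]
    return {name: value / ref_value for name, value in mean_ratio.items()}

# Hypothetical raw luminescence readings (arbitrary units), two wells each.
example_readings = {
    "pGL3-basic": ([100.0, 100.0], [50.0, 50.0]),
    "Min prom": ([800.0, 1200.0], [50.0, 50.0]),
    "Ext prom": ([400.0, 600.0], [50.0, 50.0]),
}
activity = relative_activity(example_readings)
print(activity)
```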

Supplementary Material
Supplementary Material is available at HMG online.