Genome-wide association study identifies nidogen 1 (NID1) as a susceptibility locus to cutaneous nevi and melanoma risk

We conducted a genome-wide association study on the number of melanocytic nevi reported by 9136 individuals of European ancestry, with follow-up replication in 3581 individuals. We identified the nidogen 1 (NID1) gene on 1q42 as associated with nevus count (two linked single nucleotide polymorphisms with r² > 0.9: rs3768080 A allele associated with reduced count, P = 6.5 × 10⁻⁸; and rs10754833 T allele associated with reduced count, P = 1.5 × 10⁻⁷). We further determined that rs10754833 [T] was associated with decreased melanoma risk in 2368 melanoma cases and 7432 controls [for CT genotype: odds ratio (OR) = 0.86, 95% confidence interval (CI) = 0.75–0.99, P = 0.04; for TT genotype: OR = 0.84, 95% CI = 0.71–0.98, P = 0.03]. In the 87 HapMap CEU cell lines, the expression level of the NID1 locus was 2-fold higher in rs10754833 T allele carriers than in individuals with the CC genotype (P = 0.017). NID1 is a biologically plausible locus for nevogenesis and melanoma development, with decreased expression levels of NID1 in benign nevi (P = 3.5 × 10⁻⁶) and in primary melanoma (P = 4.6 × 10⁻⁴) compared with normal skin.


INTRODUCTION
The number of melanocytic nevi (moles) occurring at sun-exposed sites (e.g. on the extensor forearms) is a well-known risk factor for melanoma (1,2). As presumed precursors to melanoma, nevi occur when melanocytes cluster, forming nests at the dermo-epidermal junction. Both benign and dysplastic nevi are characterized by disruption of the epidermal melanin transfer system, through which each melanocyte transfers melanin-containing melanosomes to the suprabasal keratinocytes via its dendrites (3). Among the different types of nevi (most of which are benign), dysplastic melanocytic nevi are believed to confer a higher risk of melanoma (4). Variation in nevus count between individuals is primarily determined by genetic factors, with twin studies suggesting that 60% of the variance is explained by additive genetic effects (5,6).
A recent genome-wide association study (GWAS) of 1524 healthy adult female twins from the TwinsUK registry identified genetic variants at 9p21 and 22q13 associated with the development of melanocytic nevi (7). To identify additional genetic variants conferring susceptibility to melanocytic nevi, we performed a multistage GWAS of the number of melanocytic nevi. First, as the discovery set, we conducted a GWAS of melanocytic nevi in 9136 men and women of European ancestry living in the USA, drawn from the following five case–control studies nested within the Nurses' Health Study (NHS) and the Health Professionals Follow-up Study (HPFS): a type 2 diabetes case–control study nested within the NHS (T2D_NHS, n = 2900); a type 2 diabetes case–control study nested within the HPFS (T2D_HPFS, n = 2068); a coronary heart disease case–control study nested within the NHS (CHD_NHS, n = 1068); a coronary heart disease case–control study nested within the HPFS (CHD_HPFS, n = 978); and a postmenopausal invasive breast cancer case–control study nested within the NHS (BC_NHS, n = 2122). We performed a GWAS of mole count in each study separately and then combined the results by meta-analysis. Second, we conducted a fast-track replication of eight promising single nucleotide polymorphisms (SNPs) in a replication set of 3581 participants, comprising individuals with and without a history of kidney stones within the NHS and HPFS (KS_NHS_HPFS, n = 961) and individuals studied for renal function within the NHS (RF_NHS, n = 2620). Detailed descriptions of the study population for each study in the discovery and replication sets are presented in Supplementary Methods. There was no sample overlap among the five studies of the discovery set, between the two studies of the replication set, or between the discovery and replication sets.
† These authors contributed equally to this work. 2673–2679, doi:10.1093/hmg/ddr154. Advance Access published on April 9, 2011.
The study protocol was approved by the Institutional Review Board of Brigham and Women's Hospital and the Harvard School of Public Health.

RESULTS
In the discovery set, 2 318 094 SNPs were available for meta-analysis. The quantile–quantile (QQ) plots for the five individual GWASs and for the combined meta-analysis of the discovery set are presented in Figure 1. The QQ plots showed no systematic deviation from the expected distribution, consistent with minimal systematic genotyping error or bias due to underlying population substructure. The overall genomic control inflation factor was λGC = 1.006.
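The genomic control inflation factor summarizes how far the observed association statistics deviate from their null expectation. A minimal sketch follows, assuming λGC is computed in the usual way: the median observed 1-df chi-square statistic divided by its expected null median (about 0.455). The function name and the use of two-sided P-values as input are illustrative choices, not details of the study's pipeline.

```python
from statistics import NormalDist, median

# Expected median of a 1-df chi-square statistic under the null (~0.4549):
# chi2 = z**2, and P(|z| < Phi^-1(0.75)) = 0.5.
CHI2_1DF_NULL_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Estimate lambda_GC from two-sided association P-values (0 < p <= 1)."""
    std_normal = NormalDist()
    # Convert each two-sided P-value back to a 1-df chi-square statistic:
    # inv_cdf(p / 2) is the lower-tail z, and squaring removes the sign.
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_NULL_MEDIAN


# Under the null, P-values are uniform, so lambda_GC should be close to 1.
null_p = [(i - 0.5) / 1001 for i in range(1, 1002)]
print(round(genomic_control_lambda(null_p), 3))  # prints 1.0
```

On this scale, the reported λGC = 1.006 means the median test statistic sits essentially at its null expectation.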
We selected the top four regions (on chromosomes 1, 9, 11 and 22) for fast-track replication (Table 1). To ensure the validity of genotyping, in each of these four regions we selected two SNPs in complete linkage disequilibrium (LD) as mutual surrogates (r² = 1.0 in HapMap CEU). Although other SNPs were also in LD, we selected the two SNPs showing the strongest associations with nevus count. We excluded SNPs with a heterogeneity-test P-value < 0.05. All eight SNPs had P < 1.5 × 10⁻⁶ and ranked among the 25 lowest P-values in the discovery set (Supplementary Material, Tables S1 and S2).
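The text does not name the heterogeneity test used to exclude SNPs. A common choice alongside inverse-variance meta-analysis is Cochran's Q, compared with a chi-square distribution on k − 1 degrees of freedom for k studies; the sketch below assumes that test and the exclusion rule P_het < 0.05. All helper names are hypothetical, and the chi-square tail is computed from closed-form expressions so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{i=1}^{m} exp(-y) * y**(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y + (i - 0.5) * math.log(y) - math.lgamma(i + 0.5))
    return total


def cochran_q(betas, standard_errors):
    """Cochran's Q heterogeneity statistic and its P-value on k - 1 df."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, chi2_sf(q, len(betas) - 1)


def keep_snp(betas, standard_errors, alpha=0.05):
    """Hypothetical exclusion rule: drop SNPs whose heterogeneity P-value is below alpha."""
    _, p_het = cochran_q(betas, standard_errors)
    return p_het >= alpha


# Hypothetical betas and SEs for one SNP across five discovery studies.
print(keep_snp([-0.05, -0.04, -0.06, -0.05, -0.03], [0.02, 0.03, 0.03, 0.04, 0.03]))
```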
We genotyped the eight selected SNPs in the replication set, and only the signal from the nidogen 1 (NID1) gene on chromosome 1 showed nominal significance (Table 1). rs3768080 was replicated and had the smallest P-value in the combined analysis of the discovery and replication sets (8). In the replication set, the regression parameter beta for the A allele was −0.05 (P = 0.01); that is, the A allele of rs3768080 was associated with fewer melanocytic nevi. The other SNP selected in this region (rs10754833) showed a similar association pattern (T allele: beta = −0.05, P = 0.03) (r² = 0.92 between the two SNPs in our discovery set). Replication results for the other three regions, on chromosomes 9, 11 and 22, are presented in Table 1. When the discovery and replication sets were combined, the P-value was 6.5 × 10⁻⁸ for rs3768080 and 1.5 × 10⁻⁷ for rs10754833 (Table 2). The regional association plot for the NID1 region in the discovery set is presented in Figure 2. After adjusting for rs10754833 or rs3768080 in the discovery set, none of the remaining 318 SNPs in the NID1 region was significant at P < 0.001 before correction for multiple comparisons. We further evaluated the association of rs3768080 and rs10754833 with melanoma risk among 2389 cases and 7527 controls, including 585 melanoma cases and 6500 controls nested within the NHS and HPFS cohorts (set 1) and a case–control study of 1804 melanoma cases and 1027 controls from the MD Anderson Cancer Center (set 2). Details of the study population are described in Supplementary Methods. The results from the two sets were combined by meta-analysis. As shown in Table 3, the T allele of rs10754833, which was associated with a reduced number of melanocytic nevi in our study, was associated with a decreased risk of melanoma [additive model: odds ratio (OR) = 0.92, 95% confidence interval (CI) = 0.84–0.99, P = 0.03]. This effect was also significant in the co-dominant model (for CT genotype: OR = 0.86, 95% CI = 0.75–0.99, P = 0.04; for TT genotype: OR = 0.84, 95% CI = 0.71–0.98, P = 0.03). The A allele of rs3768080, which had an effect similar to that of the T allele of rs10754833 in our mole count GWAS, was also associated with a decreased risk of melanoma (Table 3).
Human Molecular Genetics, 2011, Vol. 20, No. 13
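The additive and co-dominant models differ only in how genotype enters the logistic regression. The sketch below, with hypothetical helper names, codes rs10754833 genotypes with C as the reference allele and T as the effect allele: the additive (log-additive) model uses a single 0/1/2 allele count, whereas the co-dominant model uses separate indicators for CT and TT relative to CC.

```python
def allele_count(genotype, effect_allele):
    """Copies (0, 1 or 2) of effect_allele in a two-letter genotype such as 'CT'."""
    if len(genotype) != 2:
        raise ValueError("expected a two-allele genotype string")
    return sum(1 for allele in genotype if allele == effect_allele)


def codominant_indicators(genotype, effect_allele):
    """(heterozygote, effect-allele homozygote) indicators; zero copies is the reference."""
    n = allele_count(genotype, effect_allele)
    return (int(n == 1), int(n == 2))


def implied_genotype_or(per_allele_or, copies):
    """Genotype OR versus zero copies implied by a log-additive (per-allele) model."""
    return per_allele_or ** copies


for g in ("CC", "CT", "TT"):
    print(g, allele_count(g, "T"), codominant_indicators(g, "T"))
```

Under the additive coding, a per-allele OR of 0.92 implies an OR of 0.92² ≈ 0.85 for TT versus CC; the co-dominant coding instead estimates the CT and TT ORs separately, which is why it reports two ORs.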
In addition, we evaluated the expression of NID1 in normal skin, benign nevi and primary malignant melanoma, and found that decreased NID1 expression was associated with benign nevi and primary malignant melanoma, with 3.1-fold lower expression in benign nevi (P = 3.5 × 10⁻⁶) and 1.5-fold lower expression in primary melanoma (P = 4.6 × 10⁻⁴) compared with normal skin (Fig. 3; the data set was downloaded from NCBI GEO, accession number GSE3189; 9). Moreover, we evaluated the potential function of the two linked SNPs (rs3768080 and rs10754833) in the NID1 gene on chromosome 1. rs3768080 is located in intron 10 of NID1, and rs10754833 is located in intron 9. Mapping of the enhancer and promoter histone mark H3K4Me1 in the ENCODE project (10) suggested that both SNPs are located in enhancer regions. Furthermore, analysis of potential transcription factor-binding sites around these two SNPs using the Biobase MATCH tool (11) showed that the rs3768080 A allele introduces an additional binding site for the transcription factor AP1 (cTGACCca, with a core match score of 0.957 and a matrix match score of 0.959). AP1 is a well-established, potent transcriptional activator. Hence, the two linked alleles, rs3768080 A and rs10754833 T, may increase NID1 expression through transcriptional activation by AP1. Based on transcript expression profiling data from 87 HapMap CEU cell lines (NCBI GEO database, accession GSE7792) (12), the expression level of the NID1 locus was 2-fold higher among rs10754833 T allele carriers than among individuals with the CC genotype (P = 0.017, Fig. 4) (data not available for rs3768080).
The regression parameter beta (of the rs3768080 A allele and rs10754833 T allele) for each GWAS of the discovery set was calculated by linear regression adjusted for age and the top three principal components of genetic variation. The regression parameter beta (of the rs3768080 A allele and rs10754833 T allele) for KS_NHS_HPFS in the replication set was calculated by linear regression adjusted for age, gender and the top three principal components of genetic variation. The regression parameter beta (of the rs3768080 A allele and rs10754833 T allele) for RF_NHS in the replication set was calculated by linear regression adjusted for age. BC_NHS: postmenopausal invasive breast cancer case–control study nested within the NHS. T2D_NHS: type 2 diabetes case–control study nested within the NHS. T2D_HPFS: type 2 diabetes case–control study nested within the HPFS. CHD_NHS: coronary heart disease case–control study nested within the NHS. CHD_HPFS: coronary heart disease case–control study nested within the HPFS. KS_NHS_HPFS: kidney stone study nested within the NHS and HPFS. RF_NHS: renal function study nested within the NHS. The allele corresponding to the regression parameter beta is listed in parentheses.
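Fold changes such as the 3.1-fold and 2-fold differences above are typically derived from log2-scale microarray intensities. The paper does not specify its computation, so the sketch below assumes log2-transformed expression values and defines the fold change as 2 raised to the difference in group means; the function name and example values are hypothetical, not study data.

```python
from statistics import mean


def fold_change_from_log2(group_a, group_b):
    """Fold change of group A relative to group B from log2-scale expression values.

    A value above 1 means higher expression in A; a value below 1 means lower.
    """
    return 2.0 ** (mean(group_a) - mean(group_b))


# Hypothetical log2 intensities (illustrative only).
normal_skin = [10.2, 10.0, 9.8]
benign_nevi = [8.5, 8.4, 8.6]
fc = fold_change_from_log2(benign_nevi, normal_skin)
print(f"nevi vs. normal: {fc:.2f}-fold ({1 / fc:.1f}-fold lower)")
```

Reporting "N-fold lower" corresponds to the reciprocal of a fold change below 1, which is how a decrease is usually stated in prose.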
In this study, among the four regions selected for fast-track replication, the PLA2G6 gene on 22q13.1 (rs738322
Figure 2. Regional association plot in the 400 kb neighborhood of NID1. The left-hand Y-axis shows the association P-value of individual SNPs in the discovery set, plotted as −log10(P) against chromosomal base-pair position. The right-hand Y-axis shows the recombination rate estimated from the HapMap CEU population. Genotyped SNPs are plotted as diamonds and imputed SNPs as grey circles. Blue highlights rs3768080 and rs10754833; bright red indicates high LD (r² ≥ 0.8) with rs3768080; orange, moderate LD (0.5 ≤ r² < 0.8); yellow, weak LD (0.2 ≤ r² < 0.5); and white, no LD (r² < 0.2). Genomic coordinates are in NCBI35/hg17.
Table 3. Associations of rs3768080 and rs10754833 in the NID1 gene with melanoma risk. The OR was adjusted for age and gender. Set 1: a case–control study nested within the NHS and HPFS. Set 2: a case–control study from the MD Anderson Cancer Center.

DISCUSSION
In this study, we identified a novel susceptibility locus for mole count, NID1. NID1, a 150 kDa sulfated glycoprotein, is a member of the nidogen family of basement membrane (BM) proteins. As a major component of the BM in the skin, nidogen plays an active role in BM assembly: it interacts with many other BM molecules and acts as an integrating element that links these components together during BM assembly (13). Hence, NID1 is a biologically plausible candidate locus for nevogenesis and melanoma development.
Consistent with this role, we found that the rs3768080 A allele and the rs10754833 T allele in the NID1 gene were associated with lower mole count and reduced melanoma risk. These data are consistent with the association of these alleles with increased NID1 expression, given that NID1 expression is decreased in benign nevi and primary melanoma compared with normal skin. Although the association of NID1 with mole count did not quite reach the genome-wide significance threshold, our additional functional data and melanoma results for this locus together support NID1 as a novel susceptibility locus for cutaneous nevi. One limitation of this study is that nevus count was self-reported; however, most studies of nevus counts have shown substantial agreement between nevus self-counts and dermatologist counts (14,15), and the Spearman correlation coefficient between nevus counts by patients and by physicians was 0.91 (16). We used the number of moles on the arms as a proxy for total-body mole count and overall melanoma risk (17).
In this GWAS of individuals of European ancestry, we identified a novel locus, the NID1 gene on 1q42, that was associated with the number of melanocytic nevi on the arms and with melanoma risk. Identification of genetic susceptibility loci for nevi, as melanoma precursors and intermediate phenotypes, could contribute to improvements in risk assessment and stratification, early detection and prevention of melanoma, a highly lethal cancer.

Description of study populations
Nurses' Health Study (NHS). The NHS was established in 1976, when 121 700 female US registered nurses aged 30–55 years and residing in 11 large US states completed and returned an initial self-administered questionnaire on their medical histories and baseline health-related exposures, forming the basis of the NHS cohort. Biennial questionnaires have since collected exposure information on risk factors prospectively. Along with exposures, outcome data are collected every 2 years, with appropriate follow-up of reported disease events. Overall, follow-up has been very high; after more than 20 years, 90% of participants continue to complete questionnaires. From May 1989 through September 1990, we collected blood samples from 32 826 participants in the NHS cohort. Information on the number of melanocytic nevi on the arms was collected in the 1986 questionnaire.
Health Professionals Follow-up Study (HPFS). In 1986, 51 529 men aged 40–75 years from all 50 US states, working in health professions (dentists, pharmacists, optometrists, osteopathic physicians, podiatrists and veterinarians), answered a detailed mailed questionnaire, forming the basis of the study. The average follow-up rate for this cohort over 10 years is >90%. On each biennial questionnaire, we obtained disease- and health-related information. Between 1993 and 1994, 18 159 study participants provided blood samples by overnight courier. Information on mole count was collected in the 1987 questionnaire using wording similar to that used in the NHS.

Laboratory assays
Genotyping in the five GWASs of the discovery set. We performed genotyping in BC_NHS using the Illumina HumanHap550 array as part of the National Cancer Institute's Cancer Genetic Markers of Susceptibility (CGEMS) Project (18). For the other four GWASs of the discovery set, we performed genotyping using the Affymetrix 6.0 array. Based on the genotyped SNPs and haplotype information in the NCBI build 35 phase II HapMap CEU data, we imputed genotypes for >2.5 million SNPs using the program MACH. The quality control procedures for the five GWASs of the discovery phase are described in Supplementary Methods. The numbers of genotyped SNPs that passed quality control procedures and of imputed SNPs with minor allele frequency (MAF) > 2.5% and imputation R² > 0.3 in each study of the discovery set are presented as follows:
Genotyping in the replication set. Eight promising SNPs from the discovery set were selected for replication in the replication set. (i) Genotyping for KS_NHS_HPFS was performed using the Illumina HumanHap610 Quad, and imputation was performed in the same fashion as in the discovery set. The extracted genotype data for those eight SNPs and their imputation quality data are presented in Supplementary Material, Table S1. (ii) Genotyping for RF_NHS was performed using Open Array assays at the Dana Farber/Harvard Cancer Center Polymorphism Detection Core.
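The per-study SNP filter described above (MAF > 2.5% and imputation R² > 0.3) can be sketched as a simple predicate. The record type, field names and SNP identifiers below are hypothetical; in practice the MAF and R² values would come from the imputation software's per-SNP summary output, whose format is not described here.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    snp_id: str
    maf: float            # minor allele frequency in this study
    imputation_r2: float  # imputation quality R^2


def passes_meta_filter(snp, min_maf=0.025, min_r2=0.3):
    """Strict thresholds, matching 'MAF > 2.5% and imputation R^2 > 0.3'."""
    return snp.maf > min_maf and snp.imputation_r2 > min_r2


snps = [
    SnpSummary("snp_1", maf=0.31, imputation_r2=0.97),
    SnpSummary("snp_2", maf=0.02, imputation_r2=0.99),  # fails: too rare
    SnpSummary("snp_3", maf=0.18, imputation_r2=0.25),  # fails: poorly imputed
]
kept = [s.snp_id for s in snps if passes_meta_filter(s)]
print(kept)
```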

Statistical methods
Both the NHS and HPFS prospectively collected self-reported information on the number of melanocytic nevi on the arms (larger than 3 mm in diameter) using similar wording; this variable is a highly significant predictor of melanoma risk in these cohorts (19). In each GWAS of the discovery set and each study of the replication set, we regressed an ordinal coding of the number of melanocytic nevi on the arms (1 = none; 2 = 1–2; 3 = 3–5; 4 = 6–9; 5 = 10–14; and 6 = 15+) on an ordinal coding of genotype (0, 1 or 2 copies of the SNP minor allele), individually for each SNP that passed quality control filters. We adjusted for the three largest principal components of genetic variation in the regression model of each GWAS of the discovery set and of the replication set KS_NHS_HPFS. Among the 10 largest principal components, none was significantly associated with nevus count at a two-sided alpha of 0.01 after adjusting for age in each GWAS of the discovery set. These principal components were calculated for all individuals on the basis of 10 000 unlinked markers using the EIGENSTRAT software (20).
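The per-SNP model above is a linear regression of the 1–6 nevus score on minor-allele dosage plus covariates (age and, where stated, principal components). The sketch below is illustrative only: the questionnaire recorded categories directly, so the hypothetical `nevus_score` helper merely reproduces the coding from a raw count, and `ols` solves the normal equations by Gaussian elimination using only the standard library. It returns point estimates only; the standard errors that the meta-analysis needs are omitted for brevity.

```python
# (lower bound, upper bound, score) for the ordinal nevus coding; 15+ maps to 6.
NEVUS_BINS = [(0, 0, 1), (1, 2, 2), (3, 5, 3), (6, 9, 4), (10, 14, 5)]


def nevus_score(count):
    """Ordinal arm-nevus score: 1 = none, 2 = 1-2, 3 = 3-5, 4 = 6-9, 5 = 10-14, 6 = 15+."""
    if count < 0:
        raise ValueError("nevus count cannot be negative")
    for low, high, score in NEVUS_BINS:
        if low <= count <= high:
            return score
    return 6


def ols(predictor_rows, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0, *row] for row in predictor_rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(x[a] * x[b] for x in X) for b in range(p)]
           + [sum(x[a] * y for x, y in zip(X, outcome))] for a in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (aug[r][p] - sum(aug[r][c] * coef[c] for c in range(r + 1, p))) / aug[r][r]
    return coef


# Hypothetical individuals: (minor-allele dosage, age) and raw arm nevus counts.
predictors = [(0, 52), (1, 47), (2, 60), (0, 55), (1, 63), (2, 49), (1, 58), (0, 45)]
counts = [4, 1, 0, 12, 2, 1, 6, 20]
scores = [nevus_score(n) for n in counts]
intercept, beta_dosage, beta_age = ols(predictors, scores)
print(f"beta per minor allele: {beta_dosage:.3f}")
```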
In each study of the discovery set, SNPs with MAF > 2.5% and imputation R² > 0.3 were included in the meta-analysis. Betas from each study of the discovery set were combined by meta-analysis with weights proportional to the inverse variance of the beta in each study. The same meta-analysis method was used to combine the results from the discovery and replication sets.
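The inverse-variance weighting described above can be written in a few lines. A minimal fixed-effect sketch, assuming each study supplies a beta and its standard error for the same allele; the two-sided P-value uses a normal approximation, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis: returns (pooled beta, pooled SE, two-sided P)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, p_value


# Hypothetical per-study estimates for one SNP (e.g. five discovery studies).
betas = [-0.06, -0.04, -0.05, -0.07, -0.03]
ses = [0.020, 0.025, 0.030, 0.035, 0.022]
beta, se, p = inverse_variance_meta(betas, ses)
print(f"beta = {beta:.3f}, SE = {se:.4f}, P = {p:.2e}")
```

Because the pooled result is itself a beta with a standard error, the same function can be applied again to combine the discovery-set estimate with the replication-set estimate, mirroring the two-step procedure in the text.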