Abstract

Genetic susceptibility accounts for ∼35% of all colorectal cancer (CRC). Ten common low-risk variants contributing to CRC risk have been identified through genome-wide association studies (GWASs). In our GWAS, 610 664 genotyped single-nucleotide polymorphisms (SNPs) passed quality control filtering in 371 German familial CRC patients and 1263 controls, and replication studies were conducted in four additional case–control sets (4915 cases and 5607 controls). Known risk loci at 8q24.21 and 11q23 were confirmed, and a previously unreported association was identified for rs12701937, located between the genes GLI3 (GLI family zinc finger 3) and INHBA (inhibin, beta A) [P = 1.1 × 10⁻³, odds ratio (OR) 1.14, 95% confidence interval (CI) 1.05–1.23, dominant model in the combined cohort]. The association was stronger in familial cases than in unselected cases (P = 2.0 × 10⁻⁴, OR 1.36, 95% CI 1.16–1.60, dominant model). Two other previously unreported SNPs, rs6038071, 40 kb upstream of CSNK2A1 (casein kinase 2, alpha 1 polypeptide), and rs11014993, an intronic marker in MYO3A (myosin IIIA), were associated with CRC only in the familial CRC cases (P = 2.5 × 10⁻³, recessive model, and P = 2.7 × 10⁻⁴, dominant model, respectively). Three software tools consistently pointed to an overrepresentation of genes related to the mitogen-activated protein kinase (MAPK) signalling pathways among the 1340 most strongly associated markers from the GWAS (allelic P value < 10⁻³). The risk of CRC increased significantly with the number of risk alleles carried in seven genes involved in MAPK signalling events (P trend = 2.2 × 10⁻¹⁶, OR per allele = 1.34, 95% CI 1.11–1.61).

Introduction

Colorectal cancer (CRC) is the third most common cancer and the fourth-leading cause of cancer death worldwide, with a lifetime risk of ∼5% in Western European and North American populations. The majority of CRC cases (up to 80%) are sporadic (1), and both genetic and environmental factors contribute to the disease aetiology, with inherited susceptibility being responsible for ∼35% of all CRC (2). However, high-risk germ line mutations in the APC gene, the DNA mismatch repair genes and MUTYH, and more rarely in the SMAD4, BMPR1A and STK11/LKB1 genes, account for only 3.6% of all cases, according to a Finnish study (3).

To date, three genome-wide association studies (GWASs) (4–6), followed by a series of fast-tracked publications (7–9), have been carried out for CRC, leading to the identification of six susceptibility loci (8q23.3, 8q24.21, 10p14, 11q23, 15q13.3 and 18q21). A subsequent meta-analysis (10) of two of the three GWASs yielded four novel risk loci (14q22.2, 16q22.1, 19p13.1 and 20p12.3) and confirmed these associations in additional large sample sets. Several follow-up studies have replicated these associations by genotyping thousands of individuals (11–19). Despite this enormous effort, these 10 common low-penetrance variants exert only subtle effects on cancer risk, with odds ratios (ORs) ranging from ∼1.1 to 1.2 (20), although the joint population attributable fraction for the first six risk loci described (not including those from the meta-analysis) was 52% (21). Thus, most of the estimated inherited susceptibility to CRC remains unexplained.

We performed a GWAS on 384 German familial cases and 1285 ethnically matched healthy controls using the Affymetrix Genome-Wide Human SNP Array 6.0 [906 600 single-nucleotide polymorphisms (SNPs)]. Familial cases were selected because they may be genetically enriched for susceptibility alleles, thus providing enhanced power to detect an association (22, 23). Selected associated SNPs were included in up to four replication studies. There is increasing interest in the idea that a complex interplay of molecules in a common pathway contributes to disease aetiology (24). Accordingly, we performed overrepresentation analyses among the most strongly associated SNPs from our GWAS (allelic P value < 10⁻³) using three software tools, in order to search for enrichment of pathways or Gene Ontology (GO) categories.

Materials and methods

Study populations

A total of 384 unrelated familial CRC cases were included in the GWAS: 315 were identified from the regional cancer registry of Schleswig-Holstein (Germany) (25) and 69 came from the German HNPCC Consortium (Table I). Samples from the HNPCC Consortium had been tested and found to be microsatellite stable (none of the five markers tested showed instability) and were therefore considered negative for germ line mutations in MSH2 and MLH1 (26). All patients had at least one first-degree relative with CRC (71% one relative, 16% two relatives, 7% three or more relatives and 6% an unknown number). The 1285 healthy controls, free of CRC at the time of recruitment, were randomly selected from the population-based control pool of the PopGen project in Schleswig-Holstein (Germany) (27).

Table I.

Details of the individual case–control series used in the genome-wide and replication studies

Study | Group | Origin | Sample size | Average age at recruitment (range) | Male:female ratio | Average age at CRC diagnosis (range)
GWAS | Cases | Germany (Kiel + HNPCC Consortium) | 384 | 68.2 (49–86)ᵃ; no data (HNPCC) | 0.99 | 64.4 (47–83)ᵃ; 46.5 (15–76)ᵇ
GWAS | Controls | Germany (PopGen) | 1285 | 56.0 (19–77) | 1.25 | —
Replication 1 | Cases | Germany (HNPCC Consortium) | 575 | No data | 0.98 | 43.4 (9–82)
Replication 1 | Controls | Germany (Mannheim) | 760 | 45.9 (26–68) | | —
Replication 2 | Cases | Germany (Kiel) | 2173 | 66.9 (17–94) | 1.17 | 63.0 (10–90)
Replication 2 | Controls | Germany (PopGen) | 322 | 73.6 (50–81) | 0.86 | —
Replication 2 | Controls | Germany (SHIP) | 2230 | 60.9 (21–81) | 0.92 | —
Replication 3 | Cases | Germany (DACHS) | 1373 | 68.6 (33–95) | 1.35 | 68.1 (33–94)
Replication 3 | Controls | Germany (DACHS) | 1480 | 68.0 (34–98) | 0.94 | —
Replication 4 | Cases | Czech Republic | 794 | 62.3 (27–90) | 1.33 | 61.1 (27–90)
Replication 4 | Controls | Czech Republic | 815 | 54.4 (28–91) | 1.38 | —

SHIP, Study of Health In Pomerania; DACHS, Darmkrebs: CHancen der Verhuetung durch Screening (CRC: chances of prevention through screening).

ᵃ Only for the 170 cases from Kiel with full data.

ᵇ HNPCC Consortium samples.

Replication study 1 included 575 additional, independent CRC cases from the German HNPCC Consortium (26), also found to be microsatellite stable. The 760 controls were healthy voluntary blood donors recruited in Mannheim (Germany) (28). Replication 2 comprised 2173 or 1908 additional, independent cases identified from the cancer registry of Schleswig-Holstein, as well as 2552 or 1752 controls, depending on the SNP (Table II): 322/293 additional controls from the PopGen project and 2230/1459 randomly selected population-derived controls from the SHIP (Study of Health In Pomerania) study, also from Northern Germany and free of CRC at the time of recruitment (29). Replication 3 included 1373 CRC cases and 1480 controls from the population-based DACHS (DArmkrebs: CHancen der Verhuetung durch Screening; CRC: chances of prevention through screening) study in the Rhine-Neckar region in south-western Germany (30). DACHS cases were recruited during their first hospitalization for cancer treatment or shortly afterwards at their homes, whereas controls were randomly selected from lists of residents supplied by population registries. Finally, replication 4 consisted of 794 CRC cases and 815 hospital-based, colonoscopy-negative controls from the Czech Republic (31). Controls underwent colonoscopy either for various gastrointestinal complaints or as part of a preventive examination. Only subjects whose colonoscopy was negative for malignancy, colorectal adenomas, benign polyps and idiopathic bowel disease were chosen as controls, which implies a very low risk of CRC over the next ≥20 years (30). For a further description of all patient and control groups, see Table I.

Table II.

Association results for the six SNPs associated in replication 1 (575 familial cases and 760 controls) and for the three SNPs selected for further replication

SNP / sample set | Cases | Controls | MAF cases | MAF controls | OR het. (95% CI) | OR hom. (95% CI) | Best model | P value | OR (95% CI)
rs12701937 (7p14.1) [T]
GWAS | 371 | 1263 | 0.47 | 0.39 | 1.09 (0.83–1.42) | 2.22 (1.59–3.11) | | |
Replication 1 | 575 | 760 | 0.48 | 0.40 | 1.58 (1.23–2.03) | 1.76 (1.28–2.41) | | |
Replication 2ᵃ | 2173 | 2552 | 0.46 | 0.44 | 1.12 (0.98–1.28) | 1.06 (0.90–1.26) | | |
Replication 3 | 1373 | 1480 | 0.42 | 0.42 | 1.00 (0.85–1.19) | 1.02 (0.82–1.26) | | |
Replication 4 | 794 | 815 | 0.45 | 0.46 | 1.00 (0.77–1.29) | 0.88 (0.64–1.21) | | |
All replications | 4915 | 5607 | | | 1.12 (1.02–1.22) | 1.10 (0.98–1.23) | Dominant | 1.3 × 10⁻² | 1.11 (1.02–1.21)
GWAS + replications | 5286 | 6870 | | | 1.11 (1.02–1.21) | 1.21 (1.09–1.35) | Dominant | 1.1 × 10⁻³ | 1.14 (1.05–1.23)
rs6038071 (20p13) [T]
GWAS | 371 | 1263 | 0.11 | 0.07 | 1.66 (1.22–2.24) | 4.61 (1.23–17.3) | | |
Replication 1 | 575 | 760 | 0.12 | 0.09 | 1.19 (0.90–1.57) | 2.54 (1.01–6.42) | | |
Replication 2ᵃ | 1908 | 1752 | 0.09 | 0.09 | 0.94 (0.79–1.12) | 0.73 (0.33–1.60) | | |
Replication 3 | 1373 | 1480 | 0.09 | 0.09 | 0.89 (0.72–1.08) | 2.28 (1.03–5.05) | | |
All replications | 3856 | 3992 | | | 0.96 (0.85–1.08) | 1.50 (0.94–2.39) | Recessive | 8.4 × 10⁻² | 1.50 (0.94–2.40)
GWAS + replications | 4227 | 5225 | | | 1.05 (0.94–1.17) | 1.80 (1.15–2.80) | Recessive | 9.4 × 10⁻³ | 1.78 (1.15–2.77)
rs11014993 (10p12.1) [C]
GWAS | 371 | 1263 | 0.24 | 0.21 | 0.86 (0.67–1.10) | 3.05 (1.78–5.22) | | |
Replication 1 | 575 | 760 | 0.25 | 0.20 | 1.49 (1.18–1.88) | 1.18 (0.73–1.91) | | |
Replication 2ᵃ | 1908 | 1752 | 0.21 | 0.22 | 0.98 (0.85–1.14) | 0.78 (0.56–1.08) | | |
Replication 3 | 1373 | 1480 | 0.20 | 0.21 | 0.96 (0.82–1.13) | 0.81 (0.56–1.18) | | |
All replications | 3856 | 3992 | | | 1.05 (0.95–1.16) | 0.85 (0.69–1.06) | Recessive | 1.0 × 10⁻¹ | 0.84 (0.67–1.03)
GWAS + replications | 4227 | 5225 | | | 0.99 (0.91–1.08) | 1.05 (0.86–1.29) | Recessive | 7.2 × 10⁻¹ | 1.04 (0.85–1.27)
rs12296076 (11q23 locus) [G]
GWAS | 371 | 1263 | 0.38 | 0.30 | 1.99 (1.51–2.62) | 1.49 (0.92–2.40) | Dominant | 1.8 × 10⁻⁶ | 1.90 (1.46–2.49)
Replication 1 | 575 | 760 | 0.33 | 0.28 | 1.30 (1.03–1.64) | 1.53 (1.04–2.24) | Dominant | 8.8 × 10⁻³ | 1.34 (1.08–1.67)
rs10956369 (8q24.21 locus) [T]
GWAS | 371 | 1263 | 0.48 | 0.40 | 1.25 (0.96–1.64) | 2.03 (1.46–2.82) | Recessive | 6.4 × 10⁻⁵ | 1.77 (1.33–2.35)
Replication 1 | 575 | 760 | 0.44 | 0.37 | 1.35 (1.06–1.72) | 1.75 (1.27–2.41) | Recessive | 7.1 × 10⁻³ | 1.48 (1.11–1.97)
rs10441105 (7p14.1) [C]
GWAS | 371 | 1263 | 0.45 | 0.53 | 0.55 (0.41–0.72) | 0.55 (0.40–0.78) | Dominant | 2.2 × 10⁻⁵ | 0.55 (0.42–0.72)
Replication 1 | 575 | 760 | 0.42 | 0.48 | 0.83 (0.64–1.07) | 0.61 (0.44–0.83) | Recessive | 6.1 × 10⁻³ | 0.68 (0.52–0.90)

Minor allele for each SNP is indicated in square brackets; MAF, minor allele frequency; OR (95% CI) for heterozygous (OR het. ) and homozygous (OR hom. ) genotypes are shown; P value and OR (95% CI) for the best model. All joint analyses were adjusted for study centre.

ᵃ Due to DNA availability, different numbers of samples were genotyped for the three SNPs in replication 2 (see Materials and Methods).
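Tables II and III report ORs for heterozygous and homozygous genotypes and for the best-fitting genetic model. The sketch below illustrates, under simplifying assumptions, how crude versions of these ORs follow from a 2 × 3 genotype table: it computes unadjusted ORs with Woolf (log-scale) 95% CIs, whereas the joint analyses in the tables were adjusted for study centre, so it will not reproduce those numbers. The function names and the genotype counts in the usage line are hypothetical.

```python
from __future__ import annotations

import math
from typing import NamedTuple


class OddsRatio(NamedTuple):
    estimate: float
    lower: float
    upper: float


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.959964) -> OddsRatio:
    """Crude OR for a 2x2 table with a Woolf (log-scale) confidence interval.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    All four cells must be non-zero (no continuity correction is applied).
    """
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return OddsRatio(est, est * math.exp(-z * se), est * math.exp(z * se))


def genotype_ors(cases: tuple[int, int, int],
                 controls: tuple[int, int, int]) -> dict[str, OddsRatio]:
    """cases/controls: numbers of individuals carrying 0, 1 or 2 minor alleles."""
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    return {
        "het": odds_ratio(r1, r0, s1, s0),                  # 1 copy vs 0 copies
        "hom": odds_ratio(r2, r0, s2, s0),                  # 2 copies vs 0 copies
        "dominant": odds_ratio(r1 + r2, r0, s1 + s2, s0),   # carriers vs non-carriers
        "recessive": odds_ratio(r2, r0 + r1, s2, s0 + s1),  # 2 copies vs 0 or 1 copy
    }


# Hypothetical genotype counts, for illustration only
ors = genotype_ors(cases=(100, 200, 100), controls=(150, 200, 50))
print(round(ors["hom"].estimate, 2))  # 3.0
```

The "Best model" column in the tables would additionally require a test statistic for each model, which this sketch does not compute.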

The study protocols were approved by the ethics committee of all participating clinical centres. Written informed consent was obtained from all study participants.

GWAS

DNA was extracted from peripheral venous blood. All samples initially included in the GWAS (384 cases and 1285 controls) were genotyped for 906 600 SNPs on the Affymetrix 6.0 array by Affymetrix (Santa Clara, CA) according to Affymetrix's protocol, together with 14 Affymetrix controls.

Thirteen CRC cases and 22 controls failed Affymetrix's standard quality control of genotype data (call rate < 86% or preliminary Birdseed call rate < 95%), leaving a total of 371 cases and 1263 controls for the association study.

Primary data analysis was performed with the Affymetrix Genotyping Console software, and genotype calling was done with the Birdseed 2 algorithm in Affymetrix Power Tools (http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx). Before analysis, we excluded markers meeting one or more of the following criteria: genotype call rate < 90% (61 152 SNPs), minor allele frequency < 5% in cases or controls (224 200 SNPs) or Hardy–Weinberg equilibrium exact P value < 10⁻⁵ in cases or controls (78 289 SNPs); because a SNP could fail more than one criterion, these counts overlap. An additional 352 SNPs were rejected after visual inspection of the clustering pattern (32) using Affymetrix Genotyping Console (v3.0.2). Finally, 610 664 SNPs passed all quality control filters, with an average call rate of 97.5%.
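The three automated marker filters above can be sketched in Python, assuming per-group genotype counts are already available. The Hardy–Weinberg P value uses the exact-test recursion described by Wigginton and colleagues, a common choice for "HWE exact P value" filters. The data structure and function names are hypothetical, and computing the call rate over pooled cases and controls is our assumption (the text does not say whether it was computed per group). The manual visual inspection of cluster plots is not represented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one SNP in one group (cases or controls)."""
    hom_major: int
    het: int
    hom_minor: int
    missing: int = 0

    @property
    def called(self) -> int:
        return self.hom_major + self.het + self.hom_minor


def hwe_exact_p(het: int, hom1: int, hom2: int) -> float:
    """Two-sided exact Hardy-Weinberg P value (Wigginton et al. recursion)."""
    n = het + hom1 + hom2
    if n == 0:
        return 1.0
    homr, homc = min(hom1, hom2), max(hom1, hom2)
    rare = 2 * homr + het                      # copies of the rarer allele
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)     # start near the expected het count
    if mid % 2 != rare % 2:
        mid += 1                               # het count has the parity of `rare`
    probs[mid] = 1.0
    # Walk towards fewer heterozygotes (unnormalised probabilities)
    h, r = mid, (rare - mid) // 2
    c = n - h - r
    while h >= 2:
        probs[h - 2] = probs[h] * h * (h - 1) / (4.0 * (r + 1) * (c + 1))
        h, r, c = h - 2, r + 1, c + 1
    # Walk towards more heterozygotes
    h, r = mid, (rare - mid) // 2
    c = n - h - r
    while h <= rare - 2:
        probs[h + 2] = probs[h] * 4.0 * r * c / ((h + 2) * (h + 1))
        h, r, c = h + 2, r - 1, c - 1
    p_obs = probs[het]
    # Sum the probabilities of all configurations no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs) / sum(probs))


def minor_allele_freq(g: GenotypeCounts) -> float:
    if g.called == 0:
        return 0.0
    f = (2 * g.hom_minor + g.het) / (2 * g.called)
    return min(f, 1.0 - f)


def passes_qc(cases: GenotypeCounts, controls: GenotypeCounts,
              min_call_rate: float = 0.90, min_maf: float = 0.05,
              min_hwe_p: float = 1e-5) -> bool:
    """Keep a SNP only if it passes every automated filter described in the text."""
    called = cases.called + controls.called
    total = called + cases.missing + controls.missing
    # Assumption: call rate computed over cases and controls pooled
    if total == 0 or called / total < min_call_rate:
        return False
    for g in (cases, controls):                # MAF and HWE are checked per group
        if minor_allele_freq(g) < min_maf:
            return False
        if hwe_exact_p(g.het, g.hom_major, g.hom_minor) < min_hwe_p:
            return False
    return True


print(passes_qc(GenotypeCounts(25, 50, 25, missing=2), GenotypeCounts(30, 50, 20)))  # True
```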

To evaluate the evidence for association between each genotyped SNP and CRC, we first calculated the association between SNPs and CRC under allelic, genotypic, dominant and recessive models based on logistic regression, using the whole-genome analysis package PLINK with the ‘model’ option (33), because the mode of inheritance (additive, dominant or recessive) of any given variant is not known in advance. A total of 523 SNPs showed at least one of the four calculated P values for association (allelic, genotypic, dominant or recessive) < 10⁻⁴ together with reliable clustering. The Manhattan plot, which plots the −log10 of the allelic P value of each SNP against its chromosome and physical position, was created with the Genetic Analysis Package (gap) for R from CRAN (http://cran.r-project.org/web/packages/gap/index.html). The impact of population stratification was evaluated by calculating the genomic control inflation factor lambda (λ) using the ‘qqunif()’ function in R. The genomic control λ of 1.08 indicated minimal overall inflation due to population stratification; this value is identical to that reported in a recent comparison of Northern and Southern German populations (34). Because the inflation was minimal, all statistical results are reported without genomic control correction.
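The genomic control λ is conventionally defined as the median of the observed 1-df association χ² statistics divided by the median of the χ²₁ distribution (≈0.455). The sketch below implements that common definition from allelic P values using only the Python standard library; it is an illustration and is not claimed to reproduce the internals of qqunif() in the gap package.

```python
from __future__ import annotations

from statistics import NormalDist, median


def chi2_1df_from_p(p: float) -> float:
    """1-df chi-square statistic corresponding to a two-sided P value in (0, 1].

    Uses the lower normal tail (p / 2) so that very small P values do not
    round 1 - p/2 up to exactly 1.0.
    """
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def genomic_control_lambda(p_values: list[float]) -> float:
    """Median observed chi-square divided by the median of the chi-square(1) law."""
    expected_median = NormalDist().inv_cdf(0.25) ** 2  # about 0.4549
    return median(chi2_1df_from_p(p) for p in p_values) / expected_median
```

Uniformly distributed P values give λ ≈ 1; systematic inflation of the test statistics, for example from population stratification, pushes λ above 1.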

Because Bonferroni correction for multiple testing is widely regarded as overly conservative in GWASs (35), we followed the Wellcome Trust Case Control Consortium thresholds of P < 1 × 10⁻⁵ and P < 5 × 10⁻⁷ for moderate and strong evidence for association, respectively (36).
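To make these thresholds concrete, the sketch below classifies allelic P values using the two WTCCC cut-offs and, for comparison only, computes the Bonferroni threshold for the 610 664 SNPs that passed quality control. The constant and function names are illustrative.

```python
N_SNPS = 610_664             # SNPs passing quality control in this GWAS

STRONG = 5e-7                # WTCCC threshold for strong evidence
MODERATE = 1e-5              # WTCCC threshold for moderate evidence
BONFERRONI = 0.05 / N_SNPS   # family-wise alpha of 0.05, shown for comparison only


def evidence_level(p: float) -> str:
    """Classify a P value using the WTCCC thresholds."""
    if p < STRONG:
        return "strong"
    if p < MODERATE:
        return "moderate"
    return "below threshold"


print(f"{BONFERRONI:.2e}")      # 8.19e-08
print(evidence_level(3.4e-8))   # strong (rs2209907, allelic)
print(evidence_level(4.2e-5))   # below threshold (rs10956369, allelic)
```

The Bonferroni cut-off is roughly six times stricter than the WTCCC threshold for strong evidence, which illustrates why it was considered overly conservative here.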

Forward SNP selection procedure

Among the 523 SNPs with at least one of the four calculated P values for association < 10⁻⁴ and reliable clustering, selection for replication studies was based not only on the most extreme P values but also on the following criteria, in an attempt to select the best candidate markers for CRC (37): (i) two or more SNPs located within or close to a gene, or intergenic but no more than 150 kb from each other (so-called ‘SNP clusters’) (37 SNPs); (ii) SNPs located within known risk loci for CRC according to previous GWASs [8q24.21 (5, 6), 18q21 (7), 8q23.3 (9), 10p14 (9), 15q13.3 (8), 11q23 (4), the unconfirmed 9p24 (6) and the four loci from a meta-analysis, 14q22.2, 16q22.1, 19p13.1 and 20p12.3 (10)] (2 SNPs); (iii) SNPs located in 451 candidate genes for CRC, selected on the basis of high-throughput mutation analyses in CRC tumours (38, 39), their possible relevance to CRC biology (6) or a transposon-based genetic screen in mice (40) (15 SNPs); (iv) coding SNPs (5 SNPs) and/or (v) comparison with the results of an unpublished GWAS for CRC (Jochen Hampe, personal communication) (18 SNPs). This resulted in the selection of 53 SNPs for replication, 51 of which were successfully genotyped in replication 1 (see supplementary Table 1, available at Carcinogenesis Online). The 470 SNPs not included in the replication studies are listed in supplementary Table 2, available at Carcinogenesis Online.
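Criterion (i) can be approximated by single-linkage grouping of SNP positions. This sketch applies only the 150 kb distance rule on the same chromosome and ignores the gene-based part of the criterion; the input format (SNP ID mapped to chromosome and base-pair position) and the example positions are hypothetical.

```python
from __future__ import annotations


def snp_clusters(positions: dict[str, tuple[str, int]],
                 max_gap: int = 150_000) -> list[list[str]]:
    """Group SNPs into clusters of two or more SNPs on the same chromosome in
    which consecutive SNPs lie at most max_gap bp apart (single linkage)."""
    by_chrom: dict[str, list[tuple[int, str]]] = {}
    for snp, (chrom, pos) in positions.items():
        by_chrom.setdefault(chrom, []).append((pos, snp))
    clusters: list[list[str]] = []
    for chrom in sorted(by_chrom):
        current: list[str] = []
        last_pos = None
        for pos, snp in sorted(by_chrom[chrom]):
            # A gap larger than max_gap closes the current run of SNPs
            if last_pos is not None and pos - last_pos > max_gap:
                if len(current) >= 2:
                    clusters.append(current)
                current = []
            current.append(snp)
            last_pos = pos
        if len(current) >= 2:
            clusters.append(current)
    return clusters


# Hypothetical positions, for illustration only
example = {"a": ("3", 1_000_000), "b": ("3", 1_100_000), "c": ("3", 1_200_000),
           "d": ("3", 2_000_000), "e": ("7", 500_000), "f": ("7", 640_000)}
print(snp_clusters(example))  # [['a', 'b', 'c'], ['e', 'f']]
```

Single linkage means a cluster can span more than 150 kb overall, as long as each neighbouring pair is within the limit; a stricter reading of the criterion would cap the total span instead.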

Replication studies

All DNA samples used in the replication studies were extracted from peripheral venous blood and subjected to whole-genome amplification using the illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare, Freiburg, Germany) according to the manufacturer’s instructions. Genotyping was performed at the German Cancer Research Center in Heidelberg and at the University of Kiel (Germany) on 5 ng of amplified genomic DNA in 384-well plate format, using KASPar assays on demand (50 SNPs; KBiosciences, Hoddesdon, UK) or TaqMan (one SNP; Applied Biosystems, Darmstadt, Germany) technology and following the manufacturers’ protocols. The average genotyping call rate was 98%. Linkage disequilibrium (LD) between rs12701937 and rs10441105 at 7p14.1 was calculated, and their haplotypes were reconstructed, using Haploview (41). All joint analyses of replication studies were adjusted for study centre.
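Haploview reports pairwise LD as D′ and r². As a sketch, assuming the two-SNP haplotype frequency has already been estimated (Haploview does this with an EM algorithm, which is not reproduced here), both measures follow directly from their standard definitions. The frequencies in the example are hypothetical, chosen to show that D′ can equal 1 while r² stays below 1, as reported for rs12701937 and rs10441105.

```python
from __future__ import annotations


def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of alleles A and B (strictly between 0 and 1)
    """
    for f in (p_a, p_b):
        if not 0.0 < f < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                   # D scaled by its maximum possible value
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: allele B only ever occurs on A-bearing haplotypes
print(ld_measures(p_ab=0.4, p_a=0.5, p_b=0.4))  # D' = 1.0, r^2 about 0.67
```

D′ = 1 only says that one of the four haplotypes is absent; r² additionally depends on how similar the two allele frequencies are, which is why it can be well below 1.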

Overrepresentation analysis and gene–gene interactions

Three different tools were used to search for enrichment of pathways or Gene Ontology (GO) categories among the associated polymorphisms. The recently published approach ALIGATOR (42) takes biological pathways as the unit of analysis and tests for overrepresentation of categories of genes defined by GO terms in a list of significant SNPs, selected according to an arbitrary significance threshold in the GWAS. The 610 664 SNPs analysed and their allelic P values for association were used as input, with default settings for the calculations (5000 replicate gene lists and 1000 replicate studies). Three thresholds for allelic P values in the GWAS were tested to define the significantly associated SNP set for the enrichment analysis: 10⁻⁴ (153 SNPs), 10⁻³ (1340 SNPs) and 5 × 10⁻³ (5130 SNPs). The highest level of significance for overrepresentation of GO categories was obtained using as the input list the 1340 SNPs with allelic P value < 10⁻³, located within 350 different genes (Table IV). For all three cut-offs, the ALIGATOR analysis was restricted to SNPs from the input list located within genes, and only GO categories containing two or more genes were counted.
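The sketch below illustrates, under simplifying assumptions, the core resampling idea behind a SNP-list category test of this kind: random SNPs are drawn from all SNPs lying in genes until as many distinct genes as in the observed list are hit (so genes with more SNPs are more likely to be drawn), and each category's observed gene count is compared with its resampled distribution. It is not the ALIGATOR implementation; in particular it omits ALIGATOR's second level of replicate studies used to assess the number of significant categories. The data structures and function names are hypothetical, and reading "categories containing two or more genes" as two or more genes from the input list is also our assumption.

```python
import random
from collections import Counter


def category_counts(genes, gene_to_cats):
    """Number of genes from `genes` annotated to each category."""
    counts = Counter()
    for gene in genes:
        for cat in gene_to_cats.get(gene, ()):
            counts[cat] += 1
    return counts


def random_gene_set(snps, snp_to_gene, n_genes, rng):
    """Draw random SNPs until n_genes distinct genes are hit, so genes
    containing more SNPs are proportionally more likely to be chosen."""
    genes = set()
    while len(genes) < n_genes:
        genes.add(snp_to_gene[rng.choice(snps)])
    return genes


def category_pvalues(sig_snps, snp_to_gene, gene_to_cats, n_rep=5000, seed=1):
    """Empirical one-sided P value for each category with >= 2 observed genes.

    snp_to_gene maps every analysed SNP lying within a gene to that gene.
    """
    rng = random.Random(seed)
    snps = sorted(snp_to_gene)
    observed_genes = {snp_to_gene[s] for s in sig_snps if s in snp_to_gene}
    observed = category_counts(observed_genes, gene_to_cats)
    tested = {cat: k for cat, k in observed.items() if k >= 2}
    exceed = Counter()
    for _ in range(n_rep):
        replicate = random_gene_set(snps, snp_to_gene, len(observed_genes), rng)
        rep_counts = category_counts(replicate, gene_to_cats)
        for cat, k in tested.items():
            if rep_counts[cat] >= k:
                exceed[cat] += 1
    # Add-one correction keeps empirical P values strictly above zero
    return {cat: (exceed[cat] + 1) / (n_rep + 1) for cat in tested}
```

Sampling SNPs rather than genes is the design choice that matters here: large genes carry many SNPs and would otherwise appear enriched simply by chance.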

ConsensusPathDB (43) integrates more than 12 different interaction databases and 1738 pathways and carries out overrepresentation analysis on an uploaded gene list (pathway-based set enrichment tool). The input list consisted of the 350 genes harbouring SNPs associated in the GWAS with allelic P value < 10⁻³, which were also used by ALIGATOR for overrepresentation analysis. The background consisted of all genes harbouring SNPs among the 610 664 analysed in the GWAS.
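Pathway overrepresentation tools of this type typically score the overlap between an input gene list and each pathway with a one-sided hypergeometric test against the chosen background. We assume, without checking it against the ConsensusPathDB implementation, that a test of this form underlies its reported P values. Only the list size (350) and the overlap (6) in the usage line come from this study; the category and background sizes are hypothetical, so the output is not expected to match the P value reported in Results.

```python
from math import comb


def hypergeom_enrichment_p(overlap: int, list_size: int,
                           category_size: int, background: int) -> float:
    """One-sided P(X >= overlap), where X is the number of category genes among
    list_size genes drawn without replacement from `background` genes, of which
    category_size belong to the category."""
    upper = min(list_size, category_size)
    tail = sum(comb(category_size, k) * comb(background - category_size, list_size - k)
               for k in range(overlap, upper + 1))
    return tail / comb(background, list_size)   # exact integers until this division


# List size and overlap from this study; category and background sizes are hypothetical
print(hypergeom_enrichment_p(overlap=6, list_size=350, category_size=40, background=15000))
```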

Finally, ToppGene Suite (44) was applied to rank a list of genes (test set) by functional similarity (ToppGene tool) or by topological features in a protein–protein interaction network (ToppNet tool) relative to a training gene list (training set). For both tools, the test set consisted of the same 350 genes from associated SNPs of the GWAS that were used for ALIGATOR and ConsensusPathDB. The training set consisted of 459 candidate genes for CRC: the 451 genes used for selecting SNPs for replication analysis (criterion (iii), see Materials and Methods, Forward SNP selection procedure) and the eight genes located in or close to the known CRC risk loci from previous GWASs (SMAD7, EIF3H, SCG5, GREM1, FMN1, BMP4, CDH1 and RHPN2) (20). For each gene in the test set, the ToppGene tool derives a similarity score to the training set for 14 different features and combines all similarity scores into an overall score using statistical meta-analysis.
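One standard way to combine per-feature P values into a single overall score is Fisher's method, sketched below. Whether ToppGene uses exactly this combination is an assumption here, and the function and example values are illustrative. Because the test statistic has 2k degrees of freedom, its χ² survival function has a closed form and no statistics library is needed.

```python
from __future__ import annotations

import math


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p) follows a chi-square distribution with 2k df under the
    null; for even df its survival function is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    Each P value must lie in (0, 1].
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i                         # (x/2)^i / i!
        tail += term
    return math.exp(-half_x) * tail


# Hypothetical per-feature P values for one candidate gene
print(fisher_combined_p([0.01, 0.20, 0.05]))
```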

All three software tools pointed to an overrepresentation of genes related to mitogen-activated protein kinase (MAPK) signalling events, with seven genes directly implicated in the MAPK signalling pathways in CRC (45). We therefore calculated the effect of an increasing number of risk alleles (considering one risk allele for each of the seven genes) on CRC risk from the GWAS data, counting two for a homozygous and one for a heterozygous risk genotype.
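A minimal sketch of the risk-allele counting and a dose-response test follows. The text does not specify which trend test produced P trend, so this uses the Cochran–Armitage trend test with the allele count as score, as one common choice; the per-allele OR would normally come from a logistic regression on the count, which is not shown. Function names and the 0/1/2 genotype coding are assumptions.

```python
from __future__ import annotations

import math
from collections import Counter


def risk_allele_score(genotypes: list[int]) -> int:
    """Total risk-allele copies over the SNPs (each genotype coded 0, 1 or 2)."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded as 0/1/2 risk-allele copies")
    return sum(genotypes)


def cochran_armitage_trend(scores: list[int], is_case: list[bool]) -> tuple[float, float]:
    """Cochran-Armitage trend test with the risk-allele score as weight.

    Returns (z, two-sided P). Requires both cases and controls and at least
    two distinct scores; otherwise the variance is zero.
    """
    cases, totals = Counter(), Counter()
    for s, case in zip(scores, is_case):
        totals[s] += 1
        cases[s] += int(case)
    n = sum(totals.values())
    p_bar = sum(cases.values()) / n                       # overall case fraction
    t = sum(s * (cases[s] - totals[s] * p_bar) for s in totals)
    sum_ns = sum(totals[s] * s for s in totals)
    sum_ns2 = sum(totals[s] * s * s for s in totals)
    var = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n)
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))           # 2 * (1 - Phi(|z|))


# Hypothetical: one individual's 0/1/2 genotypes at seven MAPK-related SNPs
print(risk_allele_score([0, 1, 2, 1, 0, 1, 1]))  # 6
```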

Epistasis between the associated polymorphisms of the seven selected MAPK-related genes was tested using both logistic regression and multifactor-dimensionality reduction (MDR). Logistic regression and MDR are two commonly used methods for estimating interaction between polymorphisms in association studies of complex diseases (46). Pairwise interactions between the seven SNPs from MAPK-related genes were studied by logistic regression using the ‘epistasis’ option in PLINK (33), which assumes an allele-by-allele epistasis model. For each SNP, all possible pairwise combinations were tested. MDR is considered a non-parametric alternative to logistic regression for detecting interactions between polymorphisms: this model-free data reduction method classifies multilocus genotypes into high-risk and low-risk groups (47). MDR version 2.0 and the MDRpt version 1.0 module for permutation testing are open-source, freely available software (http://www.epistasis.org/). The software assesses the strength of the signals using both cross-validation and permutation testing, the latter generating an empirical P value for the result. Additionally, a dendrogram shows the type of interaction between each pair of SNPs (epistasis, independence or redundancy).
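The central MDR step described above, pooling two-locus genotype cells into high-risk and low-risk groups, can be sketched as follows: a cell is labelled high-risk when its case:control ratio exceeds the overall case:control ratio, and the resulting classifier is scored by balanced accuracy. This illustrates the idea only and is not the MDR 2.0 software; cross-validation, permutation testing and the interaction dendrogram are omitted, and the function names are hypothetical.

```python
from collections import Counter


def mdr_high_risk_cells(snp1, snp2, is_case):
    """Return the set of two-locus genotype cells labelled high-risk.

    A cell is high-risk if cases/controls in that cell exceeds the overall
    cases/controls ratio (cells with cases but no controls count as high-risk).
    """
    cases, controls = Counter(), Counter()
    for a, b, case in zip(snp1, snp2, is_case):
        (cases if case else controls)[(a, b)] += 1
    overall = sum(cases.values()) / max(sum(controls.values()), 1)
    cells = set(cases) | set(controls)
    return {cell for cell in cells if cases[cell] > overall * controls[cell]}


def balanced_accuracy(snp1, snp2, is_case, high_risk):
    """Mean of sensitivity and specificity when high-risk cells predict 'case'."""
    tp = fn = tn = fp = 0
    for a, b, case in zip(snp1, snp2, is_case):
        predicted_case = (a, b) in high_risk
        if case:
            tp += int(predicted_case)
            fn += int(not predicted_case)
        else:
            fp += int(predicted_case)
            tn += int(not predicted_case)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Balanced accuracy is used instead of plain accuracy so that unequal numbers of cases and controls do not inflate the apparent performance of a SNP pair.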

Results

Supplementary Figure 1, available at Carcinogenesis Online, describes the individual steps from the GWAS to the replication studies and the overrepresentation analysis.

GWAS

Four SNPs reached genome-wide strong evidence for association at the allelic level (P < 5 × 10⁻⁷) (Figure 1): rs2209907 (P = 3.4 × 10⁻⁸), located 0.8 Mb from TLE4 (transducin-like enhancer of split 4); rs16823149 (P = 5.5 × 10⁻⁸), intronic in the open reading frame C1orf21; rs4140904 (P = 1.4 × 10⁻⁷), located 0.5 Mb from NCAPG (non-SMC condensin I complex, subunit G); and rs4574118 (P = 1.8 × 10⁻⁷), located 250 kb from PLGLA (plasminogen-like A, non-coding RNA). The polymorphism rs2209907 was selected for replication because it fulfilled two of the selection criteria. For the other three markers, none of the SNPs in LD with them showed association, so they were considered association artefacts.

Fig. 1.

Genome-wide association results from the initial GWAS analysis. The −log10 of the allelic P values for the 610 664 polymorphic SNPs, comparing 371 familial CRC cases with 1263 controls of German origin, is plotted against chromosomal position (Manhattan plot). The horizontal line marks the threshold of P = 10⁻⁵ for moderate evidence for association.

No clear peaks of associated SNPs were identified in the Manhattan plot at the genome-wide significance level (P < 5 × 10⁻⁷), whereas a cluster of three SNPs on chromosome 3 in the gene FAM19A1 [family with sequence similarity 19 (chemokine C-C motif-like), member A1] reached moderate evidence for association (P < 1 × 10⁻⁵) (Figure 1).

Sixteen additional SNPs reached moderate evidence for association (allelic P < 1 × 10⁻⁵) (Figure 1). Of the 10 CRC risk loci identified in previous GWASs (20), our study identified rs10956369 (allelic P = 4.2 × 10⁻⁵) at locus 8q24.21 and rs12296076 (allelic P = 2.4 × 10⁻⁴) at locus 11q23 among the 523 SNPs with a P value < 10⁻⁴. At chromosome 8q24.21, three additional polymorphisms reported in the previous GWASs were also associated, but at a lower level (rs10505477, P = 5.9 × 10⁻⁴; rs6983267, P = 7 × 10⁻⁴; and rs7014346, P = 2.5 × 10⁻⁴). At the other eight risk loci from previous GWASs, our study showed no association at P < 10⁻² for any of the reported SNPs or for any marker in LD with them. In the 20p12.3 region, the two previously reported SNPs rs961253 and rs355527 (10) were not present on our array, and the linked polymorphism rs2423154 showed a P value of 2.1 × 10⁻³ for a dominant model.

Using the additional functional and positional criteria described in Materials and Methods, we selected 53 SNPs for replication studies from the SNPs with a P value < 10⁻⁴ (supplementary Tables 1 and 2 are available at Carcinogenesis Online).

Replication 1

Of the 53 SNPs selected for replication, 51 were successfully genotyped in 575 familial CRC cases (collected by the German HNPCC Consortium) and 760 controls (supplementary Table 1 is available at Carcinogenesis Online). Six SNPs showed nominal association at the allelic level (P < 5 × 10⁻²): the two SNPs located in the known CRC risk loci, rs10956369 at 8q24.21 (P = 2.5 × 10⁻⁴) and rs12296076 at 11q23 (P = 5.9 × 10⁻³); rs11014993, an intronic marker in MYO3A (myosin IIIA) (P = 7.2 × 10⁻³); rs6038071 (P = 3.9 × 10⁻²), located 40 kb upstream of CSNK2A1 (casein kinase 2, alpha 1 polypeptide); and two SNPs in high LD (Haploview LD plot: D′ = 1.0, r² = 0.62 in the 760 controls) located between the genes GLI3 (GLI family zinc finger 3) and INHBA (inhibin, beta A), rs12701937 (P = 1.1 × 10⁻⁴) and rs10441105 (P = 2.1 × 10⁻³). For these last two SNPs, the T-G haplotype, which carries the risk alleles of both SNPs, was also associated with CRC (P = 8.0 × 10⁻⁴ in the GWAS and P = 1.0 × 10⁻⁴ in replication 1).

Additional replications

Of the six SNPs associated in replication 1, we selected three for further replication analysis (Table II): rs12701937 between GLI3 and INHBA, which is in high LD with rs10441105; rs6038071, close to CSNK2A1; and rs11014993, the MYO3A intronic polymorphism. The two SNPs located in the known risk loci 8q24.21 (rs10956369) and 11q23 (rs12296076) were not included in the further replications.

For rs12701937, none of replications 2–4 showed association of this SNP with CRC. Joint analysis of all replication studies reached borderline significance (P = 1.3 × 10⁻² for the dominant model, OR 1.11, 95% CI 1.02–1.21) (Table II).

For rs11014993 and the rare rs6038071, neither replications 2 and 3 nor the joint analysis of all replications showed association of either SNP with the risk of CRC (Table II). Only for rs6038071 did a recessive inheritance model show an association (P = 9.4 × 10⁻³, OR 1.78, 95% CI 1.15–2.77, in the joint analysis of the GWAS and the replications; Table II). However, this result should be interpreted with caution because of the low number of samples homozygous for the risk allele (48 cases and 34 controls).

When only familial cases of CRC were considered (all cases from replication 1, 6% of those from replication 2 and 13% of those from replication 3), the associations were stronger than for all patients together, most of whom were sporadic cases (Table III). For a total of 880 familial cases and 4790 controls from replications 1–3, rs12701937 showed P = 2.0 × 10⁻⁴ and OR 1.36 (95% CI 1.16–1.60) under a dominant model. For rs6038071, P was 2.5 × 10⁻³ under a recessive model (OR 2.49, 95% CI 1.35–4.59), and for rs11014993, P was 2.7 × 10⁻⁴ under a dominant model (OR 1.32, 95% CI 1.14–1.54).

Table III.

Association for the three SNPs in familial CRC cases of replications 1–3

SNP / sample set | Familial cases | Controls | OR het. (95% CI) | OR hom. (95% CI) | Best model | P value | OR (95% CI)
rs12701937 (7p14.1) [T]
All replications | 880 | 4790 | 1.35 (1.13–1.60) | 1.39 (1.13–1.71) | Dominant | 2.0 × 10⁻⁴ | 1.36 (1.16–1.60)
GWAS + replications | 1251 | 6053 | 1.27 (1.09–1.46) | 1.51 (1.27–1.80) | Dominant | 3.5 × 10⁻⁵ | 1.33 (1.16–1.52)
rs6038071 (20p13) [T]
All replications | 861 | 3992 | 1.09 (0.90–1.32) | 2.53 (1.37–4.67) | Recessive | 2.5 × 10⁻³ | 2.49 (1.35–4.59)
GWAS + replications | 1232 | 5225 | 1.21 (1.03–1.42) | 2.73 (1.58–4.73) | Recessive | 3.0 × 10⁻⁴ | 2.64 (1.52–4.57)
rs11014993 (10p12.1) [C]
All replications | 861 | 3992 | 1.36 (1.17–1.60) | 1.06 (0.75–1.50) | Dominant | 2.7 × 10⁻⁴ | 1.32 (1.14–1.54)
GWAS + replications | 1232 | 5225 | 1.20 (1.05–1.37) | 1.38 (1.04–1.82) | Dominant | 2.0 × 10⁻³ | 1.22 (1.08–1.39)

Minor allele for each SNP is indicated in square brackets; OR (95% CI) for heterozygous (OR het. ) and homozygous (OR hom. ) genotypes are shown; P value and OR (95% CI) for the best model. All results were adjusted for study centre.

Table IV.

ALIGATOR results

P value threshold for SNPs (GWAS) | Number of top SNPs | Number of genes | Number of GO categories and P value at P < 0.05, P < 0.01 and P < 0.001
10⁻⁴ | 153 | 46 | 16 0.740 0.607 0.445
10⁻³ | 1340 | 350 | 213 73 0.001 0.072
5 × 10⁻³ | 5130 | 1057 | 156 0.155 27 0.342 0.400

Number of GO categories reaching various levels of significance for overrepresentation (P < 0.05, P < 0.01 or P < 0.001) among the genes corresponding to the significant SNPs, together with P values indicating whether this number is significantly greater than expected by chance. The most significant categories were obtained using the 1340 SNPs from the GWAS with allelic P < 10⁻³, located within 350 different genes.

Overrepresentation analysis and gene–gene interactions

All three software tools used to detect enrichment of particular GO or pathway categories among associated SNPs from the GWAS pointed to an overrepresentation of genes related to MAPK signalling events. First, the ALIGATOR software identified a significant enrichment of 73 GO categories among the 1340 most significantly associated SNPs (in 350 genes) from the GWAS with allelic P < 10⁻³ (Table IV). Nine of these enriched categories were related to MAPK activity, and four of them ranked among the eight most significant categories (supplementary Table 3A is available at Carcinogenesis Online). The genes with SNPs in the nine MAPK categories were MAP4K1, MAP4K5, epidermal growth factor receptor (EGFR, also known as ERBB1), AXIN1, GRM4, RAPGEF2 and TNIK. Second, using the same 350 input genes, ConsensusPathDB identified ErbB4 signalling events as the category with the largest overlap with the input gene list (six overlapping genes; P = 8.8 × 10⁻⁴). This overlap included ERBB4, EGFR (ERBB1), FYN, GRIN2B, PTPRZ1 and WWOX (supplementary Table 3B is available at Carcinogenesis Online). Finally, again using the same 350 input genes, ToppGene ranked three genes related to MAPK signalling pathways (EGFR, AXIN1 and PRKCA) among the four best candidate genes from the GWAS (P < 2 × 10⁻⁵), whereas the ToppNet tool placed five genes from the same categories (EGFR, FYN, AXIN1, PRKCA and GRIN2B) among the 10 best candidates (P < 3 × 10⁻⁴) (supplementary Tables 3C and D are available at Carcinogenesis Online). Across the three approaches, a total of 15 genes related to the overrepresented MAPK signalling pathways were detected among the 350 genes harbouring the 1340 most significantly associated SNPs from the GWAS.
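Over-representation statistics of this kind are commonly based on a one-sided hypergeometric test. Given a background set of genes, the test asks how surprising it is that at least k input genes fall into a pathway of a given size. The Python sketch below computes that tail probability exactly using integer arithmetic. It is an assumption that ConsensusPathDB's P value was obtained this way with these inputs, and the background and pathway sizes in the usage line are hypothetical. ToppGene's candidate ranking and ALIGATOR's SNP-resampling procedure use different statistics and are not sketched here.

```python
from math import comb


def overrepresentation_p(n_background, n_in_pathway, n_input, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap).

    n_background: genes in the background (universe)
    n_in_pathway: background genes annotated to the pathway
    n_input:      genes in the input list (e.g. genes harbouring top SNPs)
    n_overlap:    input genes that are annotated to the pathway
    """
    total = comb(n_background, n_input)
    upper = min(n_in_pathway, n_input)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total  # exact integers, single final division to float


# Hypothetical sizes: 20000 background genes, a 40-gene pathway,
# 350 input genes of which 6 fall in the pathway.
print(overrepresentation_p(20000, 40, 350, 6))
```

Because the binomial coefficients are exact Python integers, the sketch avoids the overflow or underflow that can occur when factorials are evaluated in floating point.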

Based on their biological function in the Entrez Gene database and their direct implication in the MAPK signalling pathways in CRC (45), only 7 of the 15 genes were selected as being directly involved in MAPK signalling events: EGFR (ERBB1), ERBB4, MAP4K1, MAP4K5, PRKCA, AXIN1 and RAPGEF2. Combining the genotypes of the most significantly associated SNP in each of these seven genes for the 371 cases and 1263 controls from the GWAS, we calculated ORs corresponding to an increasing number of risk alleles. The risk of CRC increased significantly with each additional risk allele (per-allele OR 1.34, 95% CI 1.11–1.61; P trend = 2.2 × 10⁻¹⁶). Carriers of more than four risk alleles had an ∼3-fold increased risk of disease (OR 2.88, 95% CI 2.26–3.67) compared with carriers of four or fewer risk alleles (Figure 2; supplementary Table 4 is available at Carcinogenesis Online).
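Each individual's score is the number of risk alleles carried across the seven SNPs, which ranges from 0 to 14 for diploid genotypes. The section does not name the test behind P trend. A common choice for case–control counts ordered by such a score is the Cochran–Armitage trend test, sketched below in Python with hypothetical counts. The per-allele OR reported above would instead come from a model such as logistic regression on the allele count, which is not shown. The helper names are illustrative.

```python
import math


def count_risk_alleles(genotype_calls, risk_alleles):
    """Total risk-allele copies across SNPs, e.g. ['AG', 'TT'] with ['G', 'T'] -> 3."""
    return sum(call.count(risk) for call, risk in zip(genotype_calls, risk_alleles))


def cochran_armitage_trend(scores, cases, controls):
    """Two-sided Cochran-Armitage trend test for ordered case-control counts.

    scores:   score per category (here, number of risk alleles carried)
    cases:    number of cases per category
    controls: number of controls per category
    Returns (z, p). Uses the variance without the N/(N-1) finite-sample factor.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    p_hat = sum(cases) / n  # overall proportion of cases
    u = sum(t * (r - m * p_hat) for t, r, m in zip(scores, cases, totals))
    s1 = sum(m * t for m, t in zip(totals, scores))
    s2 = sum(m * t * t for m, t in zip(totals, scores))
    var_u = p_hat * (1 - p_hat) * (s2 - s1 * s1 / n)
    z = u / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return z, p


# Hypothetical counts by number of risk alleles (not data from this study)
z, p = cochran_armitage_trend([2, 3, 4, 5, 6], [20, 60, 90, 110, 91],
                              [150, 330, 390, 260, 133])
print(f"z = {z:.2f}, P = {p:.2g}")
```

For two categories, the squared statistic equals the Pearson chi-square for the corresponding 2 × 2 table, which is a convenient check on the implementation.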

Fig. 2.

Sample distribution according to the number of risk alleles in seven of the most significantly associated SNPs from the MAPK signalling pathway genes. CRC cases ( n = 371, grey columns), controls ( n = 1263, black columns).

Pair-wise interaction analysis using logistic regression yielded no evidence of epistasis between any pair of risk alleles, with only borderline evidence of interaction for rs6593210–rs7251456 (EGFR–MAP4K1, P = 0.095) and rs11157745–rs7251456 (MAP4K5–MAP4K1, P = 0.064). Using MDR, the pair rs6593210–rs7251456 (EGFR–MAP4K1) showed nominal evidence of an interaction, with a cross-validation consistency of 10/10, a testing balanced accuracy of 55.5% and a permutation P value of 0.015 (based on 1000 permutations). However, the corresponding interaction dendrogram showed only redundant interactions between SNP pairs, with no evidence of an epistatic relationship.
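The core MDR step reduces two loci to a single high-risk or low-risk attribute. Each two-locus genotype cell is labelled by comparing its case:control ratio with a threshold, and the resulting classifier is scored by balanced accuracy. The Python sketch below shows that step for one SNP pair, together with a label-permutation P value. It is a simplification: it omits the 10-fold cross-validation that produces the cross-validation consistency and testing balanced accuracy reported above, so its accuracy is a training accuracy and will tend to be optimistic. The function names are hypothetical and do not correspond to the MDR software's interface.

```python
import random
from collections import defaultdict


def mdr_balanced_accuracy(snp1, snp2, status, threshold=None):
    """Core MDR step for one SNP pair, without cross-validation.

    snp1, snp2: genotype codes (0, 1, 2) per individual
    status:     1 for case, 0 for control
    A two-locus genotype cell is labelled high-risk when its case:control
    ratio is at least `threshold` (default: the overall case:control ratio).
    Returns balanced accuracy, the mean of sensitivity and specificity.
    """
    cells = defaultdict(lambda: [0, 0])  # (g1, g2) -> [cases, controls]
    for g1, g2, y in zip(snp1, snp2, status):
        cells[(g1, g2)][0 if y == 1 else 1] += 1
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if threshold is None:
        threshold = n_cases / n_controls
    true_pos = true_neg = 0
    for n_ca, n_co in cells.values():
        if n_co == 0 or n_ca / n_co >= threshold:  # high-risk cell
            true_pos += n_ca
        else:                                      # low-risk cell
            true_neg += n_co
    return (true_pos / n_cases + true_neg / n_controls) / 2


def permutation_p(snp1, snp2, status, n_perm=1000, seed=1):
    """Empirical P value: how often shuffled labels reach the observed accuracy."""
    rng = random.Random(seed)
    observed = mdr_balanced_accuracy(snp1, snp2, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mdr_balanced_accuracy(snp1, snp2, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Adding one to both the numerator and the denominator keeps the empirical P value strictly positive, which is the usual convention for permutation tests.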

Discussion

In our GWAS, 371 familial CRC cases and 1263 healthy controls were successfully genotyped. After stringent quality control, a total of 523 polymorphic SNPs showed P < 10⁻⁴ for association under the allelic, genotypic, dominant or recessive model.

Applying several functional and positional criteria, we selected 51 SNPs for replication in a cohort of 575 familial CRC cases and 760 controls (replication 1). Six of them showed nominal association (P < 5 × 10⁻²) with the same allele as in the GWAS. Three SNPs were selected for further replication (replications 2–4). Polymorphism rs12701937, located between the genes GLI3 and INHBA, remained associated in the combined cohort (P = 1.1 × 10⁻³, OR 1.14, 95% CI 1.05–1.23 for a dominant model). Both rs12701937 and its neighbouring marker rs10441105 (D′ = 1.0, r² = 0.62 in the HapMap CEU population) are located in a 37 kb LD block on chromosome 7 (41.588–41.625 Mb), 50 kb upstream of INHBA and 180 kb downstream of GLI3. The block is not in LD with either gene. The role in CRC of sonic hedgehog (Shh) signalling, in which GLI3 acts as a mediator, remains controversial. Alterations in this pathway have been shown in several cancer types (48), although they do not seem to be common in CRC (49). INHBA has been proposed as a potential marker for gastric (50) and head and neck squamous cell carcinomas (51).
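D′ and r² summarize LD differently, which is why rs12701937 and rs10441105 can show D′ = 1.0 but r² = 0.62. D′ scales the disequilibrium coefficient D by its maximum possible value given the allele frequencies, so D′ = 1 whenever one of the four possible haplotypes is absent. By contrast, r² is the squared allelic correlation and reaches 1 only when just two haplotypes are present, which also requires matching allele frequencies. The Python sketch below computes both measures from haplotype and allele frequencies. It assumes those frequencies are already known. Estimating them from unphased genotypes (for example, with the EM algorithm, as tools such as Haploview (41) do) is not shown, and the frequencies in the usage line are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs from haplotype frequencies.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1 (must lie strictly between 0 and 1)
    p_b:  frequency of allele B at SNP 2 (must lie strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: allele B never occurs without A, so D' = 1,
# but the allele frequencies differ, so r^2 is below 1.
print(ld_stats(p_ab=0.4, p_a=0.5, p_b=0.4))
```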

Previously published GWASs for CRC and a meta-analysis have so far identified 10 risk loci, with ORs ranging between 1.10 and 1.26 (20). For loci in chromosome regions 8q24.21 and 11q23, our GWAS identified SNPs (rs10956369 and rs12296076, respectively) with P < 10⁻⁴, and both were confirmed in replication 1. The CRC susceptibility locus at 8q24.21 (rs6983267) (5, 6), which is also associated with prostate, breast, ovarian and other cancers (52), was represented by several SNPs in our GWAS. In addition to rs10956369, the most significantly associated SNP in the region (allelic P = 4.2 × 10⁻⁵, OR 1.41, 95% CI 1.20–1.66), the three polymorphisms reported in previous GWASs also showed association (rs10505477, P = 5.9 × 10⁻⁴; rs6983267, P = 7 × 10⁻⁴; rs7014346, P = 2.5 × 10⁻⁴). Recently, rs6983267 has been functionally identified as the plausible causal variant in the region (53, 54). Because all these markers belong to the same LD block, our association with rs10956369 and its adjacent SNPs reflects the same signal as rs6983267. At chromosome 11q23, rs12296076 (allelic P = 2.4 × 10⁻⁴, OR 1.42, 95% CI 1.18–1.71) is located 5 kb upstream of rs3802842, the most significantly associated SNP in the original GWAS (4). rs3802842 is not present on the Affymetrix 6.0 array that we used, but it belongs to the same LD block as rs12296076 (D′ = 0.95, r² = 0.79 in the HapMap CEU population). Interestingly, rs12296076 has been identified in silico as a polymorphic binding site for miRNAs (4). No mutations have been identified in any of the transcripts in the region (16), so the exact nature of the association remains unknown.

To increase the proportion of ‘genetically enriched’ cases (55), we required all 371 CRC patients in our GWAS to have at least one first-degree relative with CRC. Selecting only familial cases increases the efficiency and power of a case–control association study (23) and reduces the sample size required to detect common disease susceptibility alleles (22). The association results for the familial cases genotyped for three SNPs in replications 1–3 are consistent with this expectation, because the association signal was stronger for familial cases than for all patients, most of whom were sporadic. These results might indicate that markers selected from the GWAS for replication are more specific to familial than to sporadic CRC. On the other hand, the limited number of patients in the GWAS (371) reduced the power to detect the weak effects expected in a common disease such as CRC. For example, the known risk locus at 8q24.21 would not have passed our SNP selection criteria if it had not been implicated in previous GWASs.
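The power argument can be made concrete with a standard normal approximation for the allelic test. The Python sketch below estimates the probability that a SNP with a given per-allele OR reaches the P < 10⁻⁴ threshold applied in this GWAS, given 371 cases and 1263 controls. The risk-allele frequency and ORs are hypothetical, and the model assumes Hardy–Weinberg equilibrium. It also treats cases as unselected, so it does not capture the gain from familial selection described above. It is an illustration, not a calculation reported by the authors.

```python
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, p_control, odds_ratio, alpha):
    """Approximate power of a two-sided allelic (allele-count) association test.

    Assumes a per-allele odds ratio, Hardy-Weinberg equilibrium and
    independent alleles; the negligible opposite tail is ignored.
    """
    norm = NormalDist()
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)  # risk-allele frequency in cases
    m1, m0 = 2 * n_cases, 2 * n_controls  # numbers of alleles
    p_bar = (m1 * p_case + m0 * p_control) / (m1 + m0)
    se_null = (p_bar * (1 - p_bar) * (1 / m1 + 1 / m0)) ** 0.5
    se_alt = (p_case * (1 - p_case) / m1 + p_control * (1 - p_control) / m0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    delta = abs(p_case - p_control)
    return norm.cdf((delta - z_crit * se_null) / se_alt)


# GWAS sample sizes from the text; allele frequency and ORs are hypothetical.
for or_ in (1.2, 1.5):
    power = allelic_test_power(371, 1263, p_control=0.3, odds_ratio=or_, alpha=1e-4)
    print(f"OR {or_}: power {power:.2f}")
```

Under no association (OR = 1), the function returns approximately alpha/2, which is the expected rejection rate in one tail and provides a quick sanity check.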

Only a small fraction of the estimated heritability of CRC (6% of the full-sibling relative risk) can be explained by the 10 low-risk loci discovered so far (20). The missing heritability might be accounted for by many additional common genetic variants with even smaller effect sizes (56), by rare variants with moderate to high penetrance that are not captured by GWA platforms (21), by copy number variants (57) or by more complex gene–gene (46) and gene–environment interactions (58). There is increasing interest in searching for networks of genes, rather than single genes, that contribute to disease (24). Accordingly, we searched for enrichment of particular pathways or GO categories among the 1340 most significantly associated markers from the GWAS (allelic P < 10⁻³), located within 350 genes. Using this input, three different software tools pointed to an overrepresentation of genes related to the MAPK signalling pathways. We consider this concordance of special interest because enrichment tools are a subject of debate owing to the heterogeneity of their statistical methods and the lack of reproducibility of results across tools (59). The direct involvement of the MAPK signalling pathways in CRC is well known (45), and several recently developed drugs for CRC treatment are antagonists of EGFR, whose activation is the main trigger of the MAPK signalling cascade (60). These results therefore support the validity of the enrichment approaches used and provide a proof-of-principle demonstration (42). Using the genotypes from the GWAS, we observed a significant increase in ORs with the number of risk alleles at seven SNPs selected from the overrepresented MAPK categories (P trend = 2.2 × 10⁻¹⁶, per-allele OR 1.34, 95% CI 1.11–1.61), with an OR of 2.88 (95% CI 2.26–3.67) for carriers of more than four risk alleles. Pair-wise combinations and MDR analysis provided no evidence of interactive effects between any of the seven risk alleles from MAPK-related genes, suggesting that each locus contributes to CRC risk independently.

In conclusion, our GWAS replicated two known CRC risk loci at chromosomes 8q24.21 and 11q23 and identified a possible new risk polymorphism, rs12701937, which should now be tested in larger studies. The overrepresentation analysis identified an enrichment of the MAPK signalling pathways among the most significantly associated markers, and the accumulation of risk alleles in an individual was associated with increasing CRC risk. This and similar approaches are needed to reveal the large proportion of missing heritability in complex diseases such as CRC.

Supplementary material

Supplementary Tables 1–4 and Figure 1 can be found at http://carcin.oxfordjournals.org/

Funding

German National Genome Research Network (NGFN-Plus) (01GS08181); Deutsche Krebshilfe (German Cancer Aid) (107318); European Union (HEALTH-F4-2007-200767); Czech Science Foundation (GACR) (310/07/1430).

Abbreviations

• CI, confidence interval
• CRC, colorectal cancer
• EGFR, epidermal growth factor receptor
• GO, Gene Ontology
• GWAS, genome-wide association study
• LD, linkage disequilibrium
• MAPK, mitogen-activated protein kinase
• MDR, multifactor-dimensionality reduction
• OR, odds ratio
• SNP, single-nucleotide polymorphism

We would like to thank all participants who joined the study and the NGFN-Plus CCN Group.

Conflict of Interest Statement: None declared.

References

1. Cheah PY. Recent advances in colorectal cancer genetics and diagnostics. Crit. Rev. Oncol. Hematol. 2009;69:45–55.
2. Lichtenstein P, et al. Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med. 2000;343:78–85.
3. Aaltonen L, et al. Explaining the familial colorectal cancer risk associated with mismatch repair (MMR)-deficient and MMR-stable tumors. Clin. Cancer Res. 2007;13:356–361.
4. Tenesa A, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 2008;40:631–637.
5. Tomlinson I, et al. A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat. Genet. 2007;39:984–988.
6. Zanke BW, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat. Genet. 2007;39:989–994.
7. Broderick P, et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 2007;39:1315–1317.
8. Jaeger E, et al. Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat. Genet. 2008;40:26–28.
9. Tomlinson IP, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet. 2008;40:623–630.
10. Houlston RS, et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 2008;40:1426–1435.
11. Berndt SI, et al. Pooled analysis of genetic variation at chromosome 8q24 and colorectal neoplasia risk. Hum. Mol. Genet. 2008;17:2665–2672.
12. Curtin K, et al. Meta association of colorectal cancer confirms risk alleles at 8q24 and 18q21. Cancer Epidemiol. Biomarkers Prev. 2009;18:616–621.
13. Li L, et al. A common 8q24 variant and the risk of colon cancer: a population-based case-control study. Cancer Epidemiol. Biomarkers Prev. 2008;17:339–342.
14. Matsuo K, et al. Association between an 8q24 locus and the risk of colorectal cancer in Japanese. BMC Cancer 2009;9:379.
15. Middeldorp A, et al. Enrichment of low penetrance susceptibility loci in a Dutch familial colorectal cancer cohort. Cancer Epidemiol. Biomarkers Prev. 2009;18:3062–3067.
16. Pittman AM, et al. Refinement of the basis and impact of common 11q23.1 variation to the risk of developing colorectal cancer. Hum. Mol. Genet. 2008;17:3720–3727.
17. Poynter JN, et al. Variants on 9p24 and 8q24 are associated with risk of colorectal cancer: results from the Colon Cancer Family Registry. Cancer Res. 2007;67:11128–11132.
18. Schafmayer C, et al. Investigation of the colorectal cancer susceptibility region on chromosome 8q24.21 in a large German case-control sample. Int. J. Cancer 2009;124:75–80.
19. Slattery ML, et al. Increased risk of colon cancer associated with a genetic polymorphism of SMAD7. Cancer Res. 2010;70:1479–1485.
20. Tenesa A, et al. New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat. Rev. Genet. 2009;10:353–358.
21. Hemminki K, et al. Surveying the genomic landscape of colorectal cancer. Am. J. Gastroenterol. 2009;104:789–790.
22. Antoniou AC, et al. Polygenic inheritance of breast cancer: implications for design of association studies. Genet. Epidemiol. 2003;25:190–202.
23. Lange EM, et al. Family-based samples can play an important role in genetic association studies. Cancer Epidemiol. Biomarkers Prev. 2008;17:2208–2214.
24. Hardy J, et al. Genomewide association studies and human disease. N. Engl. J. Med. 2009;360:1759–1768.
25. Schafmayer C, et al. Genetic investigation of DNA-repair pathway genes PMS2, MLH1, MSH2, MSH6, MUTYH, OGG1 and MTH1 in sporadic colon cancer. Int. J. Cancer 2007;121:555–558.
26. Mangold E, et al. Spectrum and frequencies of mutations in MSH2 and MLH1 identified in 1,721 German families suspected of hereditary nonpolyposis colorectal cancer. Int. J. Cancer 2005;116:692–702.
27. Krawczak M, et al. PopGen: population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships. Community Genet. 2006;9:55–61.
28. Frank B, et al. Ten recently identified associations between nsSNPs and colorectal cancer could not be replicated in German families. Cancer Lett. 2008;271:153–157.
29. John U, et al. Study of Health In Pomerania (SHIP): a health examination survey in an east German region: objectives and design. Soz. Praventivmed. 2001;46:186–194.
30. Brenner H, et al. Does a negative screening colonoscopy ever need to be repeated? Gut 2006;55:1145–1150.
31. Pechlivanis S, et al. Genetic variation in adipokine genes and risk of colorectal cancer. Eur. J. Endocrinol. 2009;160:933–940.
32. McCarthy MI, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 2008;9:356–369.
33. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575.
34. Nelis M, et al. Genetic structure of Europeans: a view from the North-East. PLoS One 2009;4:e5472.
35. McKay JD, et al. Lung cancer susceptibility locus at 5p15.33. Nat. Genet. 2008;40:1404–1406.
36. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661–678.
37. Williams SM, et al. Problems with genome-wide association studies. Science 2007;316:1840–1842.
38. Sjoblom T, et al. The consensus coding sequences of human breast and colorectal cancers. Science 2006;314:268–274.
39. Wood LD, et al. The genomic landscapes of human breast and colorectal cancers. Science 2007;318:1108–1113.
40. Starr TK, et al. A transposon-based genetic screen in mice identifies genes altered in colorectal cancer. Science 2009;323:1747–1750.
41. Barrett JC, et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 2005;21:263–265.
42. Holmans P, et al. Gene ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am. J. Hum. Genet. 2009;85:13–24.
43. Kamburov A, et al. ConsensusPathDB—a database for integrating human functional interaction networks. Nucleic Acids Res. 2009;37:D623–D628.
44. Chen J, et al. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37:W305–W311.
45. Fang JY, et al. The MAPK signalling pathways and colorectal cancer. Lancet Oncol. 2005;6:322–327.
46. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat. Rev. Genet. 2009;10:392–404.
47. Ritchie MD, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am. J. Hum. Genet. 2001;69:138–147.
48. Kim JE, et al. Sonic hedgehog signaling proteins and ATP-binding cassette G2 are aberrantly expressed in diffuse large B-cell lymphoma. Mod. Pathol. 2009;22:1312–1320.
49. Chatel G, et al. Hedgehog signaling pathway is inactive in colorectal cancer cell lines. Int. J. Cancer 2007;121:2622–2627.
50. Takeno A, et al. Integrative approach for differentially overexpressed genes in gastric cancer by combining large-scale gene expression profiling and network analysis. Br. J. Cancer 2008;99:1307–1315.
51. Shimizu S, et al. Identification of molecular targets in head and neck squamous cell carcinomas based on genome-wide gene expression profiling. Oncol. Rep. 2007;18:1489–1497.
52. Ghoussaini M, et al. Multiple loci with different cancer specificities within the 8q24 gene desert. J. Natl Cancer Inst. 2008;100:962–966.
53. Pomerantz MM, et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat. Genet. 2009;41:882–884.
54. Tuupanen S, et al. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat. Genet. 2009;41:885–890.
55. Thomas DC, et al. Recent developments in genomewide association scans: a workshop summary and review. Am. J. Hum. Genet. 2005;77:337–345.
56. Frazer KA, et al. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. 2009;10:241–251.
57. Manolio TA, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747–753.
58. Maher B. Personal genomes: the case of the missing heritability. Nature 2008;456:18–21.
59. Huang da W, et al. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37:1–13.
60. Ciardiello F, et al. EGFR antagonists in cancer treatment. N. Engl. J. Med. 2008;358:1160–1174.