Abstract

There are considerable expectations about the ability of genome-wide association (GWA) studies to make exciting discoveries about the role of genes in common diseases. GWA studies may allow researchers to identify causal pathways that have not been unveiled before, thus opening new avenues to disease understanding, prevention and therapy. However, there are still many open challenges. One is how to analyse these studies. The problem of false positives and false negatives provides an interesting methodological stimulus to find optimal solutions. Once main genetic effects have been concretely documented, the next question is how to proceed with the investigation of gene–gene and gene–environment interactions. It is possible that what really counts is not the main effect of genes but complex interactions. Finding and interpreting such interactions is not straightforward. Finally, continuously updated integration of all evidence, from old studies, current GWA investigations and future replication studies, and careful interpretation of the strength of the evidence are crucial to maximize transparency and to guide an informed selection of the next steps of research in this field. The present Commentary is a report of an Environmental Cancer Risk, Nutrition and Individual Susceptibility network Workshop held in Venice in October 2007 and discusses some of the problems outlined above, with examples.

What we expect from genome-wide association studies

Genome-wide association (GWA) studies have contributed considerably to the advancement of our knowledge of the genetic basis of disease. Knowledge of genetic risk factors may help to elucidate mechanisms of disease aetiology and to find targets for drug development. GWA studies also help to quantify the genetic component of disease risk, at the population or individual level.

The design of current GWA studies makes them suitable mainly for the discovery of common variants conferring low/moderate risks, in the context of the common disease–common variant hypothesis (1). Common and some uncommon variants associated with high risks have presumably already been identified by other means (e.g. linkage studies), while rare variants with low risks clearly pose serious methodological challenges.

The HapMap project (2) provided the necessary background for the conduct of GWA studies by making it possible to identify sets of tagging single-nucleotide polymorphisms (SNPs) that cover all, or most, of the variability at the genomic level. The availability of inexpensive technologies for high-throughput genotyping has made it possible to perform GWA scans for several diseases, including various cancers, cardiovascular and neurological conditions and diabetes, as well as for traits of pharmacogenetic interest and non-disease end points such as gene expression and height.

Most GWA studies yield at least one highly statistically significant association, but almost all the associations are weak [small odds ratios (ORs), on the order of 1.1–1.6]. With consistent replication across many teams, the statistical support tends to be very strong. So far, almost all results from cancer GWA studies have pointed to novel loci that had not been identified before. Some of the most robust results for cancer are shown in Table I, with relative risks on the order of 1.06–2.23, but mostly ∼1.2.

Table I

Examples of associations found in recent cancer GWAS (up to October 2007)

Cancer site   Locus (SNP)         OR (a)      P (b)            Reference
Breast        FGFR2 (rs2981582)   1.23–1.63   2 × 10^−76       (3,4)
Breast        TNRC9 (rs3803662)   1.23–1.39   10^−36           (3,5)
Breast        MAP3K1 (rs889312)   1.13–1.27   7 × 10^−20       (3)
Breast        8q (rs13281615)     1.06–1.18   5 × 10^−12       (3)
Breast        LSP1 (rs3817198)    1.06–1.17   3 × 10^−9        (3)
Breast        2q35 (rs13387042)   1.20        1.2 × 10^−13     (5)
Prostate      8q24 (rs1447295)    1.43–2.23   1.53 × 10^−14    (6,7)
Prostate      8q24 (rs6983267)    1.26–1.58   9.42 × 10^−13    (6)
Colorectum    8q24 (rs6983267)    1.21        1.27 × 10^−14    (8,9)

Only associations reaching genome-wide statistical significance are reported. [Additional updated information is available in (33).]

(a) ORs of association with cancer risk, in combined analysis of scan and replication samples. When two ORs are reported, they correspond to the association of the heterozygotes and of the homozygotes for the risk allele, respectively. If only one OR is reported, it corresponds to the risk per allele.

(b) P-value of trend of association with cancer risk, in combined analysis of scan and replication samples. When an association has been reported multiple times, only the results with the strongest statistical significance are shown.

Many challenges emerge in this new era of gene discovery. The present paper summarizes the discussion of some of these challenges at an Environmental Cancer Risk, Nutrition and Individual Susceptibility (ECNIS) network workshop held in Venice on October 12, 2007. We focus on the following issues: which criteria to use to choose variants for replication; how to deal with false positives and false negatives; statistical issues in assessing gene–gene interactions (GGIs); challenges in summarizing and grading the accumulated genome-wide evidence in synopses that cover information across entire fields of investigation; and how to integrate evidence on gene–environment interactions (GEIs) in the new paradigm.

Criteria to choose variants for further replication

Replication is crucial in GWA studies, and different strategies have been suggested and used. The multistage design represented in Figure 1 is used in the US National Cancer Institute studies. One critical question that affects the number of false negatives and false positives is how the SNPs are chosen to pass from one phase to the next. The P-value criterion has been used almost exclusively, but it is not yet clear how much it costs in terms of false negatives. A true positive variant with low minor allele frequency (MAF) will have few carriers and therefore a wide confidence interval (CI) and weak statistical significance, even if it has a slightly higher OR. This is the main limitation of the P-value criterion. Bayesian prioritization approaches, which make it possible to give different a priori weights to different classes of SNPs, may be attractive, as discussed below.
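The loss of statistical significance for a low-MAF variant can be illustrated with a small calculation. The sketch below is not part of the workshop material: it assumes an allelic (2 × 2) comparison, Woolf's standard error for the log OR, expected rather than sampled allele counts, and hypothetical values for sample size, MAF and OR.

```python
import math

def allelic_log_or_se(n_cases, n_controls, maf_controls, odds_ratio):
    """Woolf standard error of the allelic log OR, from expected allele counts."""
    # Risk-allele frequency in cases implied by the OR and the control frequency.
    odds_cases = odds_ratio * maf_controls / (1 - maf_controls)
    maf_cases = odds_cases / (1 + odds_cases)
    # Expected cells of the 2 x 2 allele table (two alleles per subject).
    a = 2 * n_cases * maf_cases            # risk alleles in cases
    b = 2 * n_cases * (1 - maf_cases)      # other alleles in cases
    c = 2 * n_controls * maf_controls      # risk alleles in controls
    d = 2 * n_controls * (1 - maf_controls)
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def z_score(n_cases, n_controls, maf_controls, odds_ratio):
    """Expected z statistic (log OR divided by its standard error)."""
    se = allelic_log_or_se(n_cases, n_controls, maf_controls, odds_ratio)
    return math.log(odds_ratio) / se

# Hypothetical comparison: a rare variant with the higher OR versus a common one.
for label, maf, odds_ratio in (("rare", 0.02, 1.3), ("common", 0.30, 1.2)):
    print(label, round(z_score(2000, 2000, maf, odds_ratio), 2))
```

Under these assumptions the rare variant has the larger standard error and the smaller z statistic despite its higher OR, which is the situation in which a pure P-value ranking would discard it.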

Fig. 1

Multistage design for GWAS (from http://cgems.cancer.gov). It should be noted that the number of phases and the number of subjects in each phase are not fixed. With continuously falling prices for genotyping, the current trend is to have a whole-genome scan phase as large as possible (e.g. 2000 case–control pairs).


A provocative question is: who cares about risks of 1.15–1.3? Individually, a small risk is irrelevant, but the combination of several low-risk alleles can add up to substantial risks, even in the absence of multiplicative statistical interactions (Table II). A critical piece of missing information is that we have no idea how many common variants with an OR of about 1.2 there are in the genome. One potential approach to this problem would be to meta-analyse the studies conducted so far that report risks of that magnitude, taking into account the power of those studies. On the basis of such information, it would be possible to compute at least an expected number of associations of that magnitude. Once many variants conferring small risks are ‘convincingly’ demonstrated (see below, ‘Venice guidelines’), multifactorial risk models can be built, with the aim of group and individual risk prediction.

Table II

Additive combinations of multiple weak risk factors

Number of risk factors (k) (a)   % of subjects who carry k risk alleles (b)   % of subjects who carry at least k risk alleles   OR (c)   PAF
0     0.003     –               1.00    –
1     0.030     99.99           1.25    0.0001
2     0.162     99.97           1.50    0.0008
3     0.589     99.81           1.75    0.0044
4     1.587     99.22           2.00    0.0156
5     3.387     97.63           2.25    0.0406
6     5.958     94.24           2.50    0.0820
7     8.890     88.28           2.75    0.1346
8     11.482    79.39           3.00    0.1868
9     13.042    67.91           3.25    0.2269
10    13.187    54.87           3.50    0.2479
11    11.988    41.68           3.75    0.2479
12    9.879     29.70           4.00    0.2286
13    7.430     19.82           4.25    0.1945
14    5.130     12.39           4.50    0.1522
15    3.268     7.26            4.75    0.1092
16    1.929     3.99            5.00    0.0716
17    1.059     2.06            5.25    0.0431
18    0.543     1.00            5.50    0.0238
19    0.260     0.46            5.75    0.0122
20    0.117     0.20            6.00    0.0058
100   10^−98    5.38 × 10^−13   26.00   0.0000

PAF, population attributable fraction.

(a) A total of 100 true risk factors were simulated, each conferring a relative risk of 1.25 and with an MAF of 10%.

(b) Each line counts the subjects carrying combinations of any k risk alleles.

(c) ORs calculated with a simple additive model. For example, for subjects with 10 risk alleles the relative risk would be 3.5. These subjects would represent 13% of the population, and over 54% of the population would carry 10 risk alleles or more.
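The figures in Table II can be approximated with a few lines of code. This is a sketch under assumptions that the footnotes do not spell out: each subject carries each of the 100 risk factors independently with probability 0.10 (so the number carried is binomial), the OR is 1 + 0.25k, and the PAF of each row is Levin's formula applied to that row alone. The function names are illustrative.

```python
from math import comb

N_FACTORS = 100   # simulated risk factors (footnote a)
FREQ = 0.10       # assumed probability of carrying each factor
RR = 1.25         # relative risk conferred by each factor

def carrier_fraction(k):
    """Binomial probability of carrying exactly k of the N_FACTORS factors."""
    return comb(N_FACTORS, k) * FREQ**k * (1 - FREQ) ** (N_FACTORS - k)

def additive_or(k):
    """OR under a simple additive model: each factor adds RR - 1."""
    return 1 + k * (RR - 1)

def paf(k):
    """Levin's attributable fraction for the stratum carrying exactly k factors."""
    excess = carrier_fraction(k) * (additive_or(k) - 1)
    return excess / (1 + excess)

def table_row(k):
    """Return (k, % with exactly k, % with at least k, OR, PAF)."""
    at_least = sum(carrier_fraction(j) for j in range(k, N_FACTORS + 1))
    return k, 100 * carrier_fraction(k), 100 * at_least, additive_or(k), paf(k)

for k in (1, 5, 10, 15, 20):
    print("%3d %8.3f %7.2f %5.2f %7.4f" % table_row(k))
```

Under these assumptions the row for k = 10 comes out close to the tabulated values, which suggests, but does not prove, that the table was built in a similar way.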

False positives: prioritization approaches for association signals

Consideration of the false-positive report probability (10) can provide a genome-wide P-value threshold, a priori, for defining strong signals (11). A more stringent prior P-value threshold may need to be employed in a smaller GWA study (with lower power) than a larger one (10–12). However, defining genome-wide significance by means of a threshold is not ideal (11).
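How the threshold depends on power can be made concrete with the usual expression for the false-positive report probability, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the P-value threshold, 1 − β the power and π the prior probability of a true association. The sketch below assumes that this is the form intended in (10); the prior, power and target values are hypothetical.

```python
def fprp(alpha, power, prior):
    """False-positive report probability for a result significant at level alpha."""
    false_pos = alpha * (1 - prior)   # share of tests that are null and significant
    true_pos = power * prior          # share of tests that are real and significant
    return false_pos / (false_pos + true_pos)

def alpha_for_target(target_fprp, power, prior):
    """P-value threshold that keeps the FPRP at target_fprp (fprp solved for alpha)."""
    return target_fprp * power * prior / ((1 - target_fprp) * (1 - prior))

PRIOR = 1e-5  # hypothetical prior probability that a given SNP is truly associated

# The same target FPRP requires a smaller threshold when power is lower.
for power in (0.8, 0.2):
    print(power, alpha_for_target(0.2, power, PRIOR))
```

The threshold scales linearly with power in this expression, which is why a smaller study needs a more stringent cut-off to reach the same FPRP.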

The Bayes factor (BF) has been proposed as a test-based alternative to P-values for prioritization (11,13). The BF is obtained from the observed association data and takes the form

$$\mathrm{BF} = \frac{\Pr(\text{data} \mid M_1)}{\Pr(\text{data} \mid M_0)}$$

where M1 denotes a statistical model with some (loosely) specified effects of every copy of a given allele on the log odds of the disease and M0 denotes the model under the null effect. Hence, calculation of BFs requires some assumption about reasonable effect sizes. Wakefield (13) considered an approximate BF approach and proposed a new measure, the Bayesian false discovery probability, for assessing the noteworthiness of an observed association. Analogous to the false-positive report probability, the Bayesian false discovery probability incorporates the prior odds of an association, with additional consideration of Bayesian decision theory.
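As an illustration of how such quantities can be computed, the sketch below evaluates an approximate BF from a log-OR estimate and its standard error. It assumes a normal likelihood for the estimate and, under M1, a normal prior with mean zero for the true effect, and it expresses the BF as evidence for M1 relative to M0. The prior standard deviation and the prior probability of association are hypothetical inputs, and the function names are illustrative rather than taken from (13).

```python
import math

def approx_bayes_factor(log_or, se, prior_sd):
    """Approximate BF for M1 (effect ~ N(0, prior_sd^2)) versus M0 (no effect)."""
    v, w = se**2, prior_sd**2
    z2 = (log_or / se) ** 2
    # Ratio of the N(0, v + w) and N(0, v) densities evaluated at the estimate.
    return math.sqrt(v / (v + w)) * math.exp(z2 * w / (2 * (v + w)))

def posterior_prob_null(bf_m1_vs_m0, prior_assoc):
    """Posterior probability of M0, given the BF and the prior probability of M1."""
    prior_odds_m1 = prior_assoc / (1 - prior_assoc)
    return 1 / (1 + bf_m1_vs_m0 * prior_odds_m1)

# Hypothetical SNP: per-allele OR of 1.25 with standard error 0.04 on the log scale.
bf = approx_bayes_factor(math.log(1.25), 0.04, prior_sd=0.2)
print(bf, posterior_prob_null(bf, prior_assoc=1e-5))
```

The second function turns the BF and the prior odds into the posterior probability of no association, which is the kind of quantity on which a false discovery probability is based.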

The probability of a pronounced (also referred to as ‘marked’) effect size has been proposed as another association signal measure in GWA studies (14). In contrast to P-values and BFs, which come from ‘test-based’ procedures (defined with respect to the null hypothesis or the null effect model M0, respectively), the probabilities of marked effect size are obtained from a semi-Bayesian ‘estimation-based’ procedure. First, the effect of each candidate SNP, together with its 95% CI, is estimated using a conventional statistical method. Second, a semi-Bayes method is used to adjust the conventional effect size estimates. Such adjustments pull outlying effect size estimates towards the null effect and lead to narrower 95% CIs than the conventional estimation method. Third, the probability of a pronounced effect size is calculated for each candidate SNP. A pronounced effect size can, for example, be defined as a per-allele OR at least 10% above or below the null effect. The estimation-based approach requires a prior assumption regarding the variance of the true values of the effect sizes across the candidate SNPs (or across different classes of SNPs).
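The three steps can be sketched as follows. This is a minimal illustration, not the procedure of (14): it assumes a normal prior centred on the null for the true log OR, a normal likelihood for the conventional estimate, and it reads ‘10% below’ as an OR of 1/1.10 or less. The prior standard deviation and the example estimates are hypothetical.

```python
import math
from statistics import NormalDist

def semi_bayes(log_or, se, prior_sd):
    """Shrink a conventional log-OR estimate towards the null (prior mean 0)."""
    v, w = se**2, prior_sd**2
    weight = w / (v + w)               # share of the estimate that is retained
    post_mean = weight * log_or
    post_sd = math.sqrt(weight * v)    # always smaller than the conventional se
    return post_mean, post_sd

def prob_marked_effect(log_or, se, prior_sd, threshold_or=1.10):
    """Posterior probability that the true OR is >= threshold_or or <= 1/threshold_or."""
    mean, sd = semi_bayes(log_or, se, prior_sd)
    posterior = NormalDist(mean, sd)
    cut = math.log(threshold_or)
    return (1 - posterior.cdf(cut)) + posterior.cdf(-cut)

# Two hypothetical SNPs with the same z statistic (4.0) but different precision.
common = prob_marked_effect(log_or=0.20, se=0.05, prior_sd=0.10)
rare = prob_marked_effect(log_or=0.60, se=0.15, prior_sd=0.10)
print(common, rare)
```

Because the imprecise estimate is shrunk more strongly, the second SNP ends up with the lower probability of a marked effect in this example, even though both have the same test statistic.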

In a recent evaluation of the three types of association signals, calculated as part of the Type 2 diabetes component of the Wellcome Trust Case Control Consortium GWAS (1924 cases and 2938 controls; 393 453 candidate autosomal SNPs), close agreement between the test-based and estimation-based approaches was found for strong association signals only (P < 5 × 10^−7) (Strömberg et al., submitted). Among variants showing up to moderate evidence of test-based association (5 × 10^−7 < P < 1 × 10^−3), rare SNPs (<10% MAF among the controls) showed markedly weaker estimation-based signals, but similar test-based BF signals, compared with common SNPs with equivalent P-values. Strömberg et al. (submitted) proposed tailoring the signal selection strategy to (i) the genetic architecture of the disease examined and (ii) the power afforded by follow-up samples for establishing replication.

False negatives

GWA studies may miss truly positive/causal associations if the variants of interest are infrequent, if they are associated with low/moderate risks or if they are not tagged by the tested SNPs. In fact, false negatives tend to arise because such variants are outside the remit of GWA studies. For example, the CHEK2 T/C (I157T) polymorphism is rare in most populations that have been studied (1% or less), and it increases the risk of some cancers (breast, prostate, kidney), possibly because of impaired CHEK2 function (15,16). It appears to decrease the risk of cancers related to tobacco, but the reason is not clear (17). Given the rarity of the minor allele, it is unlikely that this gene would show up in GWA studies. A second example is the ADH genes, which govern the rate of conversion of alcohol to acetaldehyde. Some individuals may eliminate ethanol up to 100 times faster than others, depending on ADH1B genotype. ADH genes also appear to be strong risk factors for head and neck cancer (18). In one study, the ADH1B fast-metabolizing genotype had a strong protective effect (P < 0.0003), and a greater protective effect was found among heavy drinkers (19).

Statistical problems in GGIs

One of the main challenges with GWA studies is how to analyse and model GGIs, also called epistasis. There are many analytical options, but no single method is best for all scenarios. Methods can be classified into two groups, extensions of regression analysis and data reduction. A list is shown in the Appendix and only the MDR method will be described as an example.

Single-locus versus epistasis analysis

Let us start with the simpler case, single-locus analysis. Consider 500 000 SNPs across the human genome and imagine that we perform 500 000 chi-square analyses. If the Type I error is set at 5%, we can expect 25 000 false-positive results; a lower threshold for Type I error will reduce false positives, but will also limit power. Now let us imagine that we perform a complete epistasis analysis. This means 2 × 10^26 combinations, i.e. at 10^6 combinations per second, 5.8 × 10^18 days to complete (1.5 × 10^16 years)! Because the number of combinations is effectively infinite, we cannot look at all of them, hence the need to ‘filter’ them before conducting an MDR analysis.
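The scale of the problem can be checked with a short calculation. The sketch below counts the expected false positives for the single-locus scan and the number of SNP sets of each size; the Bonferroni threshold is shown only as one familiar way of lowering the Type I error and is not a choice made in the text.

```python
from math import comb

N_SNPS = 500_000
ALPHA = 0.05

# Expected number of false positives if every null hypothesis is true.
expected_false_positives = N_SNPS * ALPHA

# Per-test level that keeps the family-wise error rate at ALPHA (Bonferroni).
bonferroni_threshold = ALPHA / N_SNPS

def n_combinations(order):
    """Number of distinct sets of `order` SNPs among N_SNPS."""
    return comb(N_SNPS, order)

print("expected false positives:", expected_false_positives)
print("Bonferroni threshold:", bonferroni_threshold)
for order in range(1, 6):
    print(order, f"{n_combinations(order):.2e}")
```

Pairwise sets alone number more than 10^11, and each additional locus multiplies the count by roughly five orders of magnitude, which is why exhaustive searches of higher-order models are not feasible without prior filtering.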

Multifactor dimensionality reduction

Multifactor dimensionality reduction (MDR) is one of the proposed methods. It is a model-free, data-based exploratory method, intended to be more flexible and powerful than conventional statistical methods. MDR was developed by M. D. Ritchie, L. W. Hahn and J. H. Moore to detect and characterize high-order GGIs and GEIs in studies with relatively small sample sizes; J. H. Moore et al. also developed open-source MDR software. The goal of MDR is to find the main factors and the combinations of 2, …, N factors that are more frequently associated with case than with control status (adjusted for the ratio between them). Briefly, to search for the best n-locus model (with n = 1, …, N), the data set is randomly divided into 10 equal parts for cross-validation. A training set of 9/10 of the data is used to search for the best model, i.e. to classify each genotype combination as a high-risk or a low-risk pattern, depending on the number of cases and controls that present that combination. The remaining 1/10 of the data is the testing set, used to check the predictive performance of the model. This procedure is repeated 10 times, so that each part serves as the testing set once. For each n-locus model, the MDR method gives two scores, a ‘mean prediction error percentage’ and the ‘cross-validation consistency’ (CVC) frequency. The former is the proportion of subjects for whom an incorrect class prediction was made, while the latter is the number of times a particular combination of loci (model) is identified across the possible testing sets. The best model is that with the lowest prediction error and maximum CVC. Finally, to evaluate the magnitude of the prediction error and the CVC, one permutes the case–control status in the data set and repeats the analysis 1000 times, obtaining for each n-locus model the distributions of the prediction error and the CVC under the null hypothesis of no association. Comparing the observed results with these distributions, one obtains the P-values associated with each prediction error and CVC.
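The cross-validation loop described above can be sketched in a few dozen lines. This is a simplified illustration and not the published MDR software: it searches one model order at a time, labels cells unseen in training as low risk, averages the testing error of whichever model is selected in each fold, and omits the permutation step. The data are synthetic and hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

def fit_cells(rows, loci, ratio):
    """Return the genotype combinations whose case:control ratio exceeds `ratio`."""
    cases, controls = Counter(), Counter()
    for genotypes, status in rows:
        key = tuple(genotypes[i] for i in loci)
        (cases if status else controls)[key] += 1
    return {k for k in set(cases) | set(controls) if cases[k] > ratio * controls[k]}

def error_rate(rows, loci, high_risk):
    """Proportion of subjects whose status differs from the high/low-risk label."""
    wrong = 0
    for genotypes, status in rows:
        predicted = tuple(genotypes[i] for i in loci) in high_risk
        wrong += predicted != bool(status)
    return wrong / len(rows)

def mdr(rows, n_loci, order, folds=10, seed=0):
    """Return (most frequently selected model, its CVC, mean testing error)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    parts = [rows[i::folds] for i in range(folds)]
    chosen, test_errors = Counter(), []
    for i in range(folds):
        test = parts[i]
        train = [r for j, part in enumerate(parts) if j != i for r in part]
        n_case = sum(status for _, status in train)
        ratio = n_case / (len(train) - n_case)   # overall case:control ratio
        best = min(combinations(range(n_loci), order),
                   key=lambda loci: error_rate(train, loci, fit_cells(train, loci, ratio)))
        chosen[best] += 1
        test_errors.append(error_rate(test, best, fit_cells(train, best, ratio)))
    model, cvc = chosen.most_common(1)[0]
    return model, cvc, sum(test_errors) / folds

# Synthetic data (hypothetical): 5 SNPs, with risk driven jointly by SNPs 0 and 1.
rng = random.Random(1)
data = []
for _ in range(400):
    g = [rng.randrange(3) for _ in range(5)]
    p_case = 0.8 if (g[0] == 1) != (g[1] == 1) else 0.2
    data.append((g, int(rng.random() < p_case)))

model, cvc, test_error = mdr(data, n_loci=5, order=2)
print(model, cvc, round(test_error, 3))
```

A permutation test would wrap the call to `mdr` in a loop that shuffles the status labels and records the testing error and CVC obtained each time.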

Limitations of MDR

Table III

Summary of some important differences between modelling for prediction and modelling for understanding

                   Prediction                     Understanding
Question           Who is at high risk?           Which genes interact?
Consequences       Treatment of subjects          Bioinformatics, replication studies
Model choice       Less important                 Vital
Key model terms    Joint effects                  Interactions
Model complexity   May include high-order terms   Low-order terms

Although MDR is an example of the algorithmic approach to data analysis favoured by data miners, it has many similarities with the model-based methods currently popular with statisticians. Statistical models are themselves used in two ways, to develop understanding and for prediction (Table III). In a logistic regression analysis of the interaction between two SNPs, we could fit three models to the proportion, p, with the disease:

  • I: logit(p) = constant

  • II: logit(p) = constant + SNP1 + SNP2

  • III: logit(p) = constant + SNP1 + SNP2 + interaction

Interaction is usually assessed by contrasting the fit under models II and III and as such tells us if there is any effect over and above the additive main effects of the two SNPs. It would also be possible to contrast model III with model I in a test that looks for any effect of either SNP. Under model I, the fitted values are all equal to the overall proportion with the disease and, as model III is saturated, its fitted values are equal to the observed proportions in each cell. In MDR, we classify the cells of the table of SNP1 by SNP2 into high and low risk using the contrast between the observed proportions and the overall proportion. The only difference from the statistical comparison is that in place of the likelihood ratio, MDR compares the fit of models I and III using the prediction error. Two immediate consequences of this are that MDR is unable to distinguish main effects from interactions, and its measure of difference is only really suitable if we are modelling for prediction. If we are interested in questions of understanding, for example, if we wish to ask “do SNPs X and Y influence disease risk” then likelihood ratio tests are likely to be more sensitive.
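The parallel between the two comparisons of models I and III can be shown on a single 3 × 3 table. In the sketch below the cell counts are hypothetical; the likelihood ratio statistic compares the saturated model III with model I, and the MDR-style error is the misclassification rate obtained by labelling each cell from its observed proportion. The chi-square tail uses the closed form that holds for an even number of degrees of freedom, and empty cells are not handled.

```python
import math

# Hypothetical (cases, controls) counts for each SNP1 x SNP2 genotype cell.
cells = {
    (0, 0): (30, 50), (0, 1): (25, 30), (0, 2): (5, 10),
    (1, 0): (28, 30), (1, 1): (40, 25), (1, 2): (12, 8),
    (2, 0): (6, 10),  (2, 1): (14, 7),  (2, 2): (10, 5),
}
n_cases = sum(a for a, _ in cells.values())
n_controls = sum(b for _, b in cells.values())
p_overall = n_cases / (n_cases + n_controls)   # fitted value under model I

def binom_loglik(a, b, p):
    """Log-likelihood of a cases and b controls when the disease proportion is p."""
    loglik = 0.0
    if a:
        loglik += a * math.log(p)
    if b:
        loglik += b * math.log(1 - p)
    return loglik

def chi2_sf_even(x, df):
    """Upper tail probability of a chi-square variable with an even df."""
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

# Likelihood ratio test of model III (saturated) against model I.
loglik_1 = sum(binom_loglik(a, b, p_overall) for a, b in cells.values())
loglik_3 = sum(binom_loglik(a, b, a / (a + b)) for a, b in cells.values())
lr_stat = 2 * (loglik_3 - loglik_1)
df = len(cells) - 1
p_value = chi2_sf_even(lr_stat, df)

# MDR-style contrast: a cell is high risk if its proportion exceeds the overall one.
high_risk = {g for g, (a, b) in cells.items() if a / (a + b) > p_overall}
misclassified = sum(b if g in high_risk else a for g, (a, b) in cells.items())
mdr_error = misclassified / (n_cases + n_controls)

print(round(lr_stat, 2), df, p_value, round(mdr_error, 3))
```

Both summaries are computed from the same observed and overall proportions; neither separates the main effects of the two SNPs from their interaction, which would require fitting model II as well.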

MDR is sometimes advocated because it can handle large complex interactions between sets of many SNPs. However, as with any statistical analysis, MDR can be adversely affected by sparseness in the data. Consider the following hypothetical example in which we compute the prediction error with the MDR method. Randomly allocate 200 cases and 200 controls to cells, so that the average prediction error should be 50%. Start with one SNP, so that there are three cells, and then increase the number of SNPs to six, so that there are 3^6 (= 729) cells, most with no or very few subjects. As expected, MDR's simulated prediction error is 50% with three cells, but it falls to 11% with 3^6 cells. The interpretation of this requires considerable caution. First, with a sample of 200 cases and 200 controls, building a six-SNP model is not likely to yield relevant results. Second, as the number of dimensions increases, the samples become sparse. The prediction error can then no longer be estimated properly, because a cell must contain samples in both the training and the testing sets for its prediction error to be evaluated. Thus, as we increase the dimension of the problem, sparseness increases and MDR might appear to find solutions with low prediction error even when no such solution exists. It is imperative that users of this approach are cautious when exploring high dimensions and pay attention to the number of samples actually evaluated in the prediction error estimates [for further reading on these issues see (20–22)].
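The effect of sparseness can be reproduced with a small simulation. The sketch below makes assumptions that the text does not state: subjects are allocated to cells uniformly at random, and the error is the apparent (resubstitution) error obtained when each cell is labelled with its majority class and evaluated on the same subjects. The number of replicates and the seed are arbitrary.

```python
import random
from collections import Counter

def apparent_error(n_cells, n_cases=200, n_controls=200, rng=random):
    """Apparent error of majority-class cell labels when status is pure noise."""
    cases = Counter(rng.randrange(n_cells) for _ in range(n_cases))
    controls = Counter(rng.randrange(n_cells) for _ in range(n_controls))
    # In each cell the minority class is misclassified by the majority label.
    wrong = sum(min(cases[c], controls[c]) for c in set(cases) | set(controls))
    return wrong / (n_cases + n_controls)

def mean_apparent_error(n_snps, replicates=200, seed=1):
    """Average apparent error over replicates for 3 ** n_snps genotype cells."""
    rng = random.Random(seed)
    n_cells = 3 ** n_snps
    return sum(apparent_error(n_cells, rng=rng) for _ in range(replicates)) / replicates

for n_snps in (1, 2, 4, 6):
    print(n_snps, 3 ** n_snps, round(mean_apparent_error(n_snps), 3))
```

Under these assumptions the error stays close to 50% with three cells and drops sharply with 3^6 cells, although no cell carries any real information about disease status.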

How to summarize and grade the genome-wide epidemiological evidence

Field synopses

The rapid increase in the amount of data on genetic associations creates an ongoing challenge to synthesize findings and to appraise the credibility of cumulative evidence on the relationship between human genome variation and common complex diseases. This information is crucial not only to drive research in the field but also to translate its results into useful applications for health care and disease prevention (23–26).

The data to be summarized are expected to come from older candidate gene studies, from GWA studies and their early replication studies (often published in the same article as the GWA results) and from subsequent studies that try to replicate independently or refine (e.g. by testing more markers in the vicinity) the signals that emerge from GWA testing. A major challenge is to keep comprehensive and updated synopses of all associations that have been tested in a wider field. Fields are usually defined by disease phenotype, but some fields may focus on specific families of genes and cover all types of phenotypes with which these genes have been associated.

Examples of disease-specific fields where such synopses have already been performed and continue being updated on a routine basis include Alzheimer's disease (27) where data from GWA studies are already incorporated along with data from traditional candidate gene studies (http://www.alzforum.org/res/com/gen/alzgene/default.asp), schizophrenia (http://www.schizophreniaforum.org/res/sczgene/) and Parkinson's disease (http://www.pdgene.org).

As an example of a gene-specific field, we have recently undertaken the task of collecting and regularly updating the cumulative data on associations between DNA repair genes and diverse cancers. Variation in four different pathways in which DNA repair genes are involved may be associated with cancer at different sites. An online database is maintained at http://www.episat.org/episat/index.php with detailed information on each eligible study. The database started with the inclusion of all candidate gene studies (based on PubMed and HuGE PubLit searches) and is being expanded to include data from GWA studies that have covered cancer phenotypes and have included any of the gene variants of interest. We identified all published articles in which the frequencies of DNA repair alleles were determined for patients with any type of human cancer and for cancer-free controls, so that a case–control comparison could be performed. Meta-analyses are performed whenever there are at least three data sets that have studied a specific gene variant in association with a specific cancer. Meta-analyses use random effects models, and the I^2 metric is used as a measure of the extent of between-study heterogeneity. The searches and the respective meta-analyses are expected to be updated on a 6-month schedule to keep up with the rapid pace of data accumulation.
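A random effects summary and the I^2 metric can be computed as in the sketch below. The text does not say which random effects estimator is used; the DerSimonian–Laird moment estimator is assumed here because it is the most common choice, and the study-level log ORs and standard errors are hypothetical.

```python
import math

def random_effects(log_ors, ses):
    """DerSimonian-Laird pooled OR, 95% CI and I^2 (in %) from study estimates."""
    if len(log_ors) < 3:
        raise ValueError("at least three data sets are required")
    w = [1 / s**2 for s in ses]                         # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))   # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (s**2 + tau2) for s in ses]           # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
    return math.exp(pooled), ci, i2

# Hypothetical study-level estimates (log OR, standard error).
studies = [(0.10, 0.08), (0.25, 0.10), (0.05, 0.12), (0.30, 0.09)]
pooled_or, ci, i2 = random_effects([y for y, _ in studies], [s for _, s in studies])
print(round(pooled_or, 3), tuple(round(x, 3) for x in ci), round(i2, 1))
```

The returned I^2 value is the percentage of variability across studies that is attributed to heterogeneity rather than chance.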

Assessment of cumulative evidence: the Venice guidelines

In the ongoing field synopses, we have started applying to each nominally significant association a grading system that has been recently developed to assess the epidemiological strength of the cumulative evidence (the Venice guidelines) (28). Briefly, each meta-analysed association is graded on the amount of evidence, replication and protection from bias. For ‘amount of evidence’, the grade is A when the total number of minor alleles of cases and controls combined in the meta-analyses exceeds 1000, B when it is between 100 and 1000 and C when it is <100. For ‘replication and consistency’, point estimates of I^2 exceeding 50% get a C grade, values of 25–50% get a B grade and values <25% get an A grade. For ‘protection from bias’, grade A means that bias, if present, may change the magnitude but not the presence of an association; grade B means that there is no evidence of bias that would invalidate an association, but important information is missing; and grade C means that there is demonstrable potential or clear bias that may invalidate the mere presence of an association. The potential sources of bias that are considered include errors in phenotypes, errors in genotypes, confounding (population stratification) and errors/biases at the meta-analysis level (publication and other selection biases). Associations that get three A grades are considered to have ‘strong’ epidemiological credibility, associations that get any B but not any C grade are assigned ‘moderate’ credibility and associations that get any C grade are considered to have ‘weak’ credibility.
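The grading rules lend themselves to a compact encoding. The sketch below follows the thresholds given above; how exact boundary values (100, 1000, 25%, 50%) are assigned is an assumption, and the ‘protection from bias’ grade is passed in directly because it rests on judgement rather than on a formula.

```python
def amount_grade(n_minor_alleles):
    """Grade for amount of evidence from the total number of minor alleles."""
    if n_minor_alleles > 1000:
        return "A"
    if n_minor_alleles >= 100:   # assumption: boundary values fall in grade B
        return "B"
    return "C"

def consistency_grade(i2_percent):
    """Grade for replication and consistency from the I^2 point estimate (%)."""
    if i2_percent > 50:
        return "C"
    if i2_percent >= 25:         # assumption: 25% and 50% fall in grade B
        return "B"
    return "A"

def credibility(amount, consistency, bias):
    """Overall epidemiological credibility from the three letter grades."""
    grades = (amount, consistency, bias)
    if "C" in grades:
        return "weak"
    if "B" in grades:
        return "moderate"
    return "strong"

# Hypothetical association: 4200 minor alleles, I^2 of 32%, bias judged as grade A.
print(credibility(amount_grade(4200), consistency_grade(32), "A"))
```

In this hypothetical case the B grade for consistency alone caps the overall credibility at ‘moderate’.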

Based on preliminary experience from the first field synopses where the Venice criteria have been applied (schizophrenia, DNA repair), a substantial number of associations that have been examined in several studies have nominally statistically significant results. Most of these have sufficient sample size and get an A for amount of evidence. However, many have modest or large between-study heterogeneity and thus do not get an A for replication and consistency. Finally, the majority show some hints of possible or obvious bias that do not allow them to get an A for protection from bias. The main reason for low protection from bias is the presence of small effect sizes (OR < 1.15) in retrospective meta-analyses, which can easily be dissipated even by the relatively small biases that may arise in this setting (publication bias and other selective reporting). For meta-analyses that are based on prospective consortia, such as the meta-analyses of the replicating teams that accompany the initial GWA signals in the same paper, small effect sizes can be trusted, since the consortium represents a mechanism that protects from publication bias, provided that all eligible data obtained within the consortium are synthesized. The advent of dbGaP and the Genetic Association Information Network initiative, with transparent sharing of all data in the public domain, will hopefully be useful in further diminishing the problems of selective reporting biases (29).

Integrating evidence on GEI data

Genetic associations that emerge from both candidate gene and GWA approaches need to be further examined in terms of the interactions that they may have with environmental exposures. Data on GEIs are still relatively scant compared with the large amount of information generated on gene-only associations. However, they will be essential for understanding the full picture of susceptibility to various diseases and how we could target specific modifiable exposures to decrease disease risk.

The quality of studies on GEI has been extensively examined within a large database, Genetic Susceptibility to Environmental Carcinogens (GSEC, http://www.upci.upmc.edu/research/ccps/ccontrol/g_intro.html), through the example of a single gene and a single organ site, GSTM1 and lung cancer. We identified 78 published studies, including 17 400 cases and 22 146 controls, with an average number of subjects per study of 507. There were three previous meta-analyses on this subject: Houlston (30), reporting an OR of 1.13 (95% CI: 1.04–1.25), Benhamou (31), reporting an OR of 1.17 (95% CI: 1.07–1.27) and Ye (32), reporting an OR of 1.18 (95% CI: 1.14–1.23). Our meta-analysis yielded an OR of 1.15 (95% CI: 1.08–1.23), i.e. very similar to the previous ones. However, we noticed large heterogeneity (P = 0.029, Egger's test). Heterogeneity persisted after stratification for some of the known possible sources of variation, such as ethnicity and control origin.

To clarify the reasons for the large heterogeneity, we have undertaken a pooled analysis of the GSEC database. At the current data entry status (last update: April 17, 2007), there were 167 investigators and 289 studies in GSEC with 116 055 subjects (cases: 49 719, controls: 66 336). Individual data on GSTM1 were used to study whether methodological discrepancies, population-specific effects or bias could explain the variations observed in the gene–disease association and in its interaction with environmental exposure. The quality of environmental data, including smoking, was rather heterogeneous and sometimes poor. Our results suggest that sample size is generally the best proxy for the quality of the data, i.e. most quality indicators were correlated with sample size. This suggests that the conduct of GWA studies within large consortia may also provide an impetus to improve the quality of the data, not only on the genetic measurement side but also regarding environmental exposures. However, we also observed that the reasons for data heterogeneity and bias were still not obvious in our analyses, and more parameters on individual studies needed to be captured. Concerning publication bias, a possible approach (Taioli and Vineis, in preparation) is to classify studies according to a comparison between published (meta-)analyses and pooled analyses. This leads to four possible combinations: the meta-OR in GSEC is higher or lower than in the published studies, and the number of subjects in GSEC is larger or smaller than in the published studies. Publication bias is typically exemplified by combination 4, where GSEC has more studies (or subjects) included in the analysis, but the meta-OR is greater for the published studies.

Conclusions

Assessment of the overall evidence coming from GWA studies remains a complex endeavor. With increasing sample sizes and more carefully conducted studies with strict quality control measures, we are likely to continue unearthing an increasing number of associations that stand very stringent tests of statistical support and have strong epidemiological credibility. Given the rapid evolution of the evidence, keeping track of the status of wide fields of research becomes essential. Well-documented main effects give an excellent starting point to investigate GGI and GEI from a more solid ground than previously possible. Nevertheless, much of the literature still remains fragmented and meanderingly exploratory.

The open challenges we have identified can be grouped into three large categories. The first is how to limit the number of false positives and false negatives. Although several alternatives have been suggested to tackle false positives, generally using a Bayesian approach, there is not yet an agreed standard. Second, once main genetic effects have been concretely documented, the next question is how to proceed with the investigation of GGIs and GEIs. It is possible that what really counts is not the main effect of genes but complex interactions. Finding and interpreting such interactions are not straightforward and, again, no standard is yet available. Finally, continuous updated integration of all evidence, from both old studies and GWA investigations, and careful interpretation of the strength of the evidence are crucial to lead to informative selection of the most credible findings and to set priorities for future research. Improving and homogenizing the minimal acceptable standards of designing, conducting and reporting of genome epidemiology research (31,32) may be useful and large-scale consortia of investigators may be instrumental in setting and improving these standards. At the same time, flexibility of introducing and applying new methods should be allowed since we still have only an incomplete knowledge of the genetic and non-genetic architecture of risk of common diseases.

Funding

European Commission to the Environmental Cancer Risk, Nutrition and Individual Susceptibility Network of Excellence (grant FOOD-CT-2005-513943) (WP4-8).

This is a report of the ECNIS Workshop held in Venice, 12 October 2007.

Conflict of interest statement: None declared.

Appendix

Methods to model GGIs

Extensions to regression analysis.

Automated detection of informative combined effects (DICE) (Tahri-Daizadeh et al. 2003. Genome Res., 13, 1952–1960).

Classification and regression trees (CART)/patterning and recursive partitioning (PRP) (Bastone L et al. 2004. Hum. Hered., 58, 82–92).

Logic regression (Kooperberg et al. 2001. Genet. Epidemiol., (S1), 626–631).

Penalized logistic regression (Zhu et al. 2004. Biostatistics, 5, 427–443).

Multivariate adaptive regression spline (Cook et al. 2004. Stat. Med., 23, 1439–1453).

Data reduction approaches.

Combinatorial partitioning method (CPM) (Nelson et al. 2001. Genome Res., 11, 458–470).

Restricted partitioning method (RPM) (Culverhouse R, Klein T, Shannon W. 2004. Genet. Epidemiol., 27, 141–152).

Multifactor dimensionality reduction (MDR) (Ritchie et al. 2001. Am. J. Hum. Genet., 69, 138–147).

Set association (Hoh J, Wille A, and Ott J. 2001. Genome Res., 11, 2115–2119).
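
To give a flavour of the data reduction approaches listed above, the core idea behind MDR can be sketched as follows: for each pair of SNPs, multilocus genotype cells are labelled high or low risk according to their case:control ratio, and the pair that best separates cases from controls is retained. This is a simplified sketch and not the published algorithm. It omits the cross-validation and permutation testing on which MDR relies, it scores pairs by balanced accuracy (our choice for the sketch), and the function name `mdr_best_pair` and the input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mdr_best_pair(genotypes, status):
    """Return (balanced_accuracy, (i, j), high_risk_cells) for the best SNP pair.

    genotypes : list of tuples, one per individual, of genotype codes (e.g. 0/1/2)
    status    : list of 1 (case) or 0 (control), same length as genotypes
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    # A cell is 'high risk' when its case:control ratio exceeds the overall ratio
    threshold = n_cases / n_controls
    n_snps = len(genotypes[0])
    best = None

    for i, j in combinations(range(n_snps), 2):
        cases, controls = Counter(), Counter()
        for g, s in zip(genotypes, status):
            cell = (g[i], g[j])
            if s == 1:
                cases[cell] += 1
            else:
                controls[cell] += 1

        cells = set(cases) | set(controls)
        high_risk = {c for c in cells if cases[c] > threshold * controls[c]}

        # Reduce the two-locus genotype to one binary attribute and score it
        sensitivity = sum(cases[c] for c in high_risk) / n_cases
        specificity = sum(controls[c] for c in cells - high_risk) / n_controls
        balanced_accuracy = 0.5 * (sensitivity + specificity)

        if best is None or balanced_accuracy > best[0]:
            best = (balanced_accuracy, (i, j), high_risk)

    return best
```

Because accuracy is evaluated on the same data used to label the cells, the score from this sketch is optimistic. In practice the labelling would be done on training folds and assessed on held-out folds.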

References

1. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet., 2001, vol. 17 (pg. 502-510).

2. International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature, 2007, vol. 449 (pg. 851-861).

3. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature, 2007, vol. 447 (pg. 1087-1093).

4. Hunter DJ, Kraft P, Jacobs KB, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat. Genet., 2007, vol. 39 (pg. 870-874).

5. Stacey SN, Manolescu A, Sulem P, et al. Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat. Genet., 2007, vol. 39 (pg. 865-869).

6. Yeager M, Orr N, Hayes RB, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat. Genet., 2007, vol. 39 (pg. 645-649).

7. Gudmundsson J, Sulem P, Manolescu A, et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat. Genet., 2007, vol. 39 (pg. 631-637).

8. Tenesa A, Farrington SM, Prendergast JG, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet., 2008, vol. 40 (pg. 631-637).

9. Tomlinson IP, Webb E, Carvajal-Carmona L, et al. A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat. Genet., 2008, vol. 40 (pg. 623-630).

10. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007, vol. 447 (pg. 661-678).

11. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl Cancer Inst., 2004, vol. 96 (pg. 434-441).

12. Thomas DC, Clayton DG. Betting odds and genetic associations. J. Natl Cancer Inst., 2004, vol. 96 (pg. 421-423).

13. Wakefield J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am. J. Hum. Genet., 2007, vol. 81 (pg. 208-227).

14. Strömberg U, Björk J, Broberg K, Mertens F, Vineis P. Selection of influential genetic markers among a large number of candidates based on effect estimation rather than hypothesis testing: an approach for genome-wide association studies. Epidemiology, 2008, vol. 19(2) (pg. 302-308).

15. CHEK2 Breast Cancer Case-Control Consortium. CHEK2*1100delC and susceptibility to breast cancer: a collaborative analysis involving 10,860 breast cancer cases and 9,065 controls from 10 studies. Am. J. Hum. Genet., 2004, vol. 74 (pg. 1175-1182).

16. Cybulski C, Gorski B, Huzarski T, et al. CHEK2 is a multiorgan cancer susceptibility gene. Am. J. Hum. Genet., 2004, vol. 75 (pg. 1131-1135).

17. Brennan P, McKay J, Moore L, et al. Uncommon CHEK2 mis-sense variant and reduced risk of tobacco-related cancers: case control study. Hum. Mol. Genet., 2007, vol. 16(15) (pg. 1794-1801).

18. Brennan P, Lewis S, Hashibe M, et al. Pooled analysis of alcohol dehydrogenase genotypes and head and neck cancer: a HuGE review. Am. J. Epidemiol., 2004, vol. 159 (pg. 1-16).

19. Hashibe M, Boffetta P, Zaridze D, et al. Evidence for an important role of alcohol- and aldehyde-metabolizing genes in cancers of the upper aerodigestive tract. Cancer Epidemiol. Biomarkers Prev., 2006, vol. 15 (pg. 696-703).

20. Breiman L. Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci., 2001, vol. 16 (pg. 199-231).

21. Chatfield C. Model uncertainty, data-mining and statistical inference. J. R. Stat. Soc. Ser A, 1995, vol. 158 (pg. 419-466).

22. McCullagh P, Nelder J. Generalized Linear Models, 1989, 2nd edn, London, Chapman and Hall.

23. Lin B, Clyne M, Walsh M, Gomez O, Yu W, Gwinn M, Khoury MJ. Tracking the epidemiology of human genes in the literature: the HuGE published literature database. Am. J. Epidemiol., 2006, vol. 164 (pg. 1-4).

24. Khoury MJ, Dorman JS. The Human Genome Epidemiology Network. Am. J. Epidemiol., 1998, vol. 148 (pg. 1-3).

25. Ioannidis JPA, Gwinn M, Little J, et al. A road map for efficient and reliable human genome epidemiology. Nat. Genet., 2006, vol. 38 (pg. 3-5).

26. Ioannidis JPA, Bernstein J, Boffetta P, et al. A network of investigator networks in human genome epidemiology. Am. J. Epidemiol., 2005, vol. 162 (pg. 302-304).

27. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE. Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat. Genet., 2007, vol. 39 (pg. 17-23).

28. Ioannidis JP, Boffetta P, Little J, et al. Assessment of cumulative evidence on genetic associations: interim guidelines. Int. J. Epidemiol., 2008, vol. 37(1) (pg. 120-132).

29. Mailman MD, Feolo M, Jin Y, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat. Genet., 2007, vol. 39 (pg. 1181-1186).

30. Houlston RS. Glutathione S-transferase M1 status and lung cancer risk: a meta-analysis. Cancer Epidemiol. Biomarkers Prev., 1999, vol. 8 (pg. 675-682).

31. Benhamou S, Lee WJ, Alexandrie AK, et al. Meta- and pooled analyses of the effects of glutathione S-transferase M1 polymorphisms and smoking on lung cancer risk. Carcinogenesis, 2002, vol. 23 (pg. 1343-1350). Erratum in Carcinogenesis 2002; 23, 1771.

32. Ye Z, Song H, Higgins JP, Pharoah P, Danesh J. Five glutathione s-transferase gene variants in 23,452 cases of lung cancer and 30,397 controls: meta-analysis of 130 studies. PLoS Med., 2006, vol. 3 (pg. e91).