Abstract

Gene–environment studies have been motivated by the likely existence of prevalent low-risk genes that interact with common environmental exposures. The present study assessed the statistical advantage of the simultaneous consideration of genes and environment to investigate the effect of environmental risk factors on disease. In particular, we contemplated the possibility that several genes modulate the environmental effect. Environmental exposures, genotypes and phenotypes were simulated according to a wide range of parameter settings. Different models of gene–gene–environment interaction were considered. For each parameter combination, we estimated the probability of detecting the main environmental effect, the power to identify the gene–environment interaction and the frequency of environmentally affected individuals at which environmental and gene–environment studies show the same statistical power. The proportion of cases in the population attributable to the modeled risk factors was also calculated. Our data indicate that environmental exposures with weak effects may account for a significant proportion of the population prevalence of the disease. A general result was that, if the environmental effect was restricted to rare genotypes, the power to detect the gene–environment interaction was higher than the power to identify the main environmental effect. In other words, when few individuals contribute to the overall environmental effect, individual contributions are large and result in easily identifiable gene–environment interactions. Moreover, when multiple genes interacted with the environment, the statistical benefit of gene–environment studies was limited to those studies that included major contributors to the gene–environment interaction. The advantage of gene–environment over plain environmental studies also depends on the inheritance mode of the involved genes, on the study design and, to some extent, on the disease prevalence.

Introduction

Molecular genetic research has been quite successful in identifying genes that cause family cancer syndromes. Rare, highly penetrant mutations in genes such as p53, Rb and BRCA1/2 may result in high individual risks, but their contribution to the population burden of cancer is small (1). This fact, together with the continuing difficulty of identifying low-penetrance genes in a robust, replicable manner, has raised concern about the value of genomic research in cancer prevention (2–6). Most cancers are thought to be multifactorial, i.e. they are probably related to the interactions of multiple genetic and environmental risk factors (7–12). A relatively new field of cancer research has focused on the identification of common genetic alterations that by themselves may not substantially impact risk, but that in concert with environmental exposures may lead to tumor development. Primary candidates have been genes that seem to be mechanistically linked to environmental exposure and cancer (13–15). Because many of the variant forms of these genes are relatively common, the dependence of prevalent environmental effects on the genotype of the individuals may have major impact on the population attributable risk of cancer (15,16).

The assumed biological basis for gene–environment interactions is the joint participation of genes and environment in the same causal mechanism leading to disease. In statistical terms, gene–environment interaction is present when the effect of exposure on disease risk depends on the genotype, or vice versa (17,18). From a practical point of view and despite the complexity of gene–environment interactions, weak environmental effects may emerge more clearly when research is focused on subgroups with heightened susceptibility, thus holding potential in terms of cancer prevention (19). The present study explores the practical advantage of gene–environment over plain environmental studies in power to identify environmental factors that influence the risk of cancer, given the plausibility of gene–environment interactions with weak marginal environmental effects (relative risks of 1.1–2.0). In order to place the simulation in a realistic, multifactorial setting, we consider that cancer susceptibility may be modulated by many genes but only one is investigated.

Methods

Our first gene–environment interaction model (Model I) assumed that exposed individuals were at increased risk only if they carried an allele A (dominant gene, Table I). Exposed carriers are referred to as ‘individuals at increased risk’ in this paper. Non-exposed individuals and exposed individuals who did not carry A are termed ‘individuals at baseline risk’. In Table I, which also depicts the simulated risk model under recessive inheritance, the baseline disease prevalence is denoted by f and the relative risk for individuals at increased risk by ΦA×E. Let pA denote the frequency of A, pE the frequency of environmentally exposed individuals, κ the disease prevalence and ORA×E the odds ratio (OR) for individuals at increased risk versus individuals at baseline risk. The population-average OR for exposed versus unexposed individuals is denoted by ORE. In this study, we fixed pE and ORE and then examined the relationship between pA and ORA×E. The proportion of cases attributable to the interaction of the gene A and the environmental exposure (PAFA×E) was also investigated; the formulas used for the calculations are presented in the Appendix.

Table I.

Risks of disease under gene–environment interaction Model I

                  Interacting gene
                  aa       aA        AA
Dominant gene
    Exposed       f        fΦA×E     fΦA×E
    Unexposed     f        f         f
Recessive gene
    Exposed       f        f         fΦA×E
    Unexposed     f        f         f

f = baseline risk. ΦA×E = risk ratio for individuals at increased risk versus individuals at baseline risk.

In order to explore the statistical advantage of gene–environment studies, environmental exposures, genotypes and phenotypes were simulated according to reasonable parameter combinations (pE = 5–50%, κ = 0.1% or 10%). For illustration, the percentage of smoking men in Sweden in the 1970s was ∼40%, but much lower exposure frequencies are found in most occupational studies. Although the cumulative incidence of colorectal cancer in Sweden is 8%, the corresponding incidences for anal or bone cancers are around 0.1% (20,21). Since most association studies focus on genes for which the minor allele frequency is 5% or higher, we considered pA ≥ 0.05. Hardy–Weinberg equilibrium, independence of genotype and environmental exposure and a sample size of 1000 cases plus 1000 controls were assumed. Relatively weak environmental effects (ORE between 1.1 and 2.0) were considered, since environmental factors with strong effects would result in a high power and less interesting scenarios. For each parameter combination, 100 000 data sets were simulated and analyzed using two different approaches. In the first approach, the association of exposure with disease was assessed by a classical case–control study. The power to detect the environmental effect was estimated as the proportion of data sets that resulted in a significant association at the 5% significance level (two-sided test). In the second approach, we used the test proposed by Breslow and Day to compare the ORE among carriers of A with the ORE for non-carriers (22). The power to detect the dependence of the environmental effect on the individual genotype was calculated as the percentage of data sets that resulted in rejection of the null hypothesis ‘the two ORs are homogeneous’ at the 5% significance level (the alternative hypothesis was two sided). Note that this test is statistically equivalent to standard unconditional logistic regression when the model includes a main environmental effect, a main genetic effect and their interaction, and the hypothesis of interest is the absence of interaction. Alternatively, the gene–environment interaction was also assessed using a case-only design.
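The first analysis approach, estimating power as the proportion of simulated data sets with a significant exposure–disease association, can be sketched as follows. This is a minimal Python illustration rather than the SAS code used in the study. The function names are hypothetical, the test is an uncorrected Pearson chi-square with 1 degree of freedom (critical value 3.841 at the 5% level), and the exposure probabilities among cases and controls are illustrative inputs chosen to correspond to an OR of about 1.3.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def power_environmental(p_exp_cases, p_exp_controls, n_cases=1000,
                        n_controls=1000, n_sim=500, seed=1):
    """Share of simulated case-control data sets significant at the 5% level."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sim):
        # Number of exposed individuals among cases and among controls.
        a = sum(rng.random() < p_exp_cases for _ in range(n_cases))
        c = sum(rng.random() < p_exp_controls for _ in range(n_controls))
        if chi2_2x2(a, n_cases - a, c, n_controls - c) > 3.841:
            significant += 1
    return significant / n_sim


# Illustrative inputs: exposure frequency 12.31% in cases, 9.74% in controls.
power = power_environmental(0.1231, 0.0974, n_sim=300)
```

With only a few hundred replicates the estimate is noisy; the study used 100 000 data sets per parameter combination.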

To accommodate the multifactorial etiology of cancer, Model I was augmented with a second, dominant gene B. The susceptibility gene B represents a gene that is not examined in the case–control study, but the effect of B can also be interpreted as the sum of all genetic effects that remain unexplored. Table II shows the different gene–gene–environment interaction models that were considered; the new parameters in Table II are the relative risk for exposed carriers of B (ΦB×E) and the relative risk for carriers of B in the absence of interaction between exposure and B (ΦB). In Model II, unexposed individuals did not show increased risks, independently of their genotype. Unexposed carriers of B showed increased risks in Model III. The interaction of A and B was multiplicative (Model A) or additive (Model B). We assumed that the genes A and B were unlinked and in Hardy–Weinberg equilibrium. The relative contribution of each gene to the overall gene–environment interaction was assessed by investigating the relationship between PAFA×E and the proportion of cases attributable to the interaction of environmental exposure and gene B (PAFB×E). Calculations are described in the Appendix. All simulations were implemented using random numbers generated by the SAS function RANUNI and statistical analyses were carried out in SAS version 9.1.

Table II.

Gene–gene–environment interaction models

                 Unknown genes   Studied gene
                                 aa        aA                       AA
Model IIA
    Exposed      bb              f         fΦA×E                    fΦA×E
                 bB              fΦB×E     fΦA×EΦB×E                fΦA×EΦB×E
                 BB              fΦB×E     fΦA×EΦB×E                fΦA×EΦB×E
    Unexposed    bb              f         f                        f
                 bB              f         f                        f
                 BB              f         f                        f
Model IIB
    Exposed      bb              f         fΦA×E                    fΦA×E
                 bB              fΦB×E     f(ΦA×E + ΦB×E − 1)       f(ΦA×E + ΦB×E − 1)
                 BB              fΦB×E     f(ΦA×E + ΦB×E − 1)       f(ΦA×E + ΦB×E − 1)
    Unexposed    bb              f         f                        f
                 bB              f         f                        f
                 BB              f         f                        f
Model IIIA
    Exposed      bb              f         fΦA×E                    fΦA×E
                 bB              fΦB       fΦA×EΦB                  fΦA×EΦB
                 BB              fΦB       fΦA×EΦB                  fΦA×EΦB
    Unexposed    bb              f         f                        f
                 bB              fΦB       fΦB                      fΦB
                 BB              fΦB       fΦB                      fΦB
Model IIIB
    Exposed      bb              f         fΦA×E                    fΦA×E
                 bB              fΦB       f(ΦA×E + ΦB − 1)         f(ΦA×E + ΦB − 1)
                 BB              fΦB       f(ΦA×E + ΦB − 1)         f(ΦA×E + ΦB − 1)
    Unexposed    bb              f         f                        f
                 bB              fΦB       fΦB                      fΦB
                 BB              fΦB       fΦB                      fΦB

f = baseline risk. ΦA×E = risk ratio for exposed carriers of A versus unexposed ‘aabb’ individuals. ΦB×E = risk ratio for exposed carriers of B versus unexposed ‘aabb’ individuals. ΦB = risk ratio for carriers of B versus unexposed ‘aabb’ individuals.
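The four gene–gene–environment models differ only in whether gene B requires exposure to act (Model II versus Model III) and in whether the effects of A and B combine multiplicatively (Model A) or additively (Model B). A compact way to encode one cell of Table II is sketched below in Python. The function `disease_risk` and its argument names are illustrative and not part of the original analysis; both genes are treated as dominant, so `carries_a` and `carries_b` are true for heterozygous and homozygous carriers alike.

```python
def disease_risk(model, exposed, carries_a, carries_b, f, phi_axe, phi_b):
    """Disease risk for one cell of Table II.

    model: "IIA", "IIB", "IIIA" or "IIIB".
    phi_b stands for Phi_BxE under Model II and for Phi_B under Model III.
    """
    if model not in ("IIA", "IIB", "IIIA", "IIIB"):
        raise ValueError(f"unknown model: {model}")
    family, joint = model[:-1], model[-1]

    # Gene A raises the risk only in exposed carriers.
    rr_a = phi_axe if (exposed and carries_a) else 1.0
    # Gene B needs exposure under Model II but acts regardless under Model III.
    b_active = carries_b and (exposed or family == "III")
    rr_b = phi_b if b_active else 1.0

    if joint == "A":                      # multiplicative interaction of A and B
        return f * rr_a * rr_b
    return f * (rr_a + rr_b - 1.0)        # additive interaction of A and B


# Exposed carrier of both A and B under the additive Model IIB.
risk = disease_risk("IIB", True, True, True, f=0.1, phi_axe=2.0, phi_b=1.5)
```

With f = 0.1, ΦA×E = 2.0 and ΦB×E = 1.5, the example cell evaluates to f(ΦA×E + ΦB×E − 1) = 0.25.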

Results

The different types of studies considered in the article are illustrated in Table III. The example assumes that exposed carriers of the allele A are at an increased risk of disease. A plain environmental study based on 1000 cases and 1000 controls (Table IIIA) would fail to detect the environmental effect (ORE = 1.17, 95% CI 0.88–1.55, P = 0.28). In contrast, a case–control study considering both environmental exposure and genotype (Table IIIB) would be able to identify the environmental effect among carriers of the allele A, and also the heterogeneous effect of the environmental exposure for carriers compared with non-carriers (P value for homogeneity test < 0.001). Similarly, a case-only study taking into account simultaneously environmental exposure and genotype (Table IIIC) would identify a significantly increased risk of exposure for cases who carry the allele A versus affected non-carriers (OR = 3.52, 95% CI 2.35–5.20, P < 0.001).

Table III.

Illustration of the types of study considered in the article

Exemplary distribution of cases and controls according to environmental exposure and genotype
Exposure status    Exposed           Unexposed         Total
Genotype           AA/Aa    aa       AA/Aa    aa
Cases              52       63       168      717      1000
Controls           19       81       171      729      1000

(A) Case–control study assessing only the environmental effect (footnote a)
Exposure status    Exposed    Unexposed    Total
Cases              115        885          1000
Controls           100        900          1000

(B) Case–control study assessing both environmental exposure and genotype (footnote b)
Individuals ‘aa’    Exposed    Unexposed        Individuals ‘AA/Aa’    Exposed    Unexposed
Cases               63         717              Cases                  52         168
Controls            81         729              Controls               19         171

(C) Case-only study considering both environmental exposure and genotype (footnote c)
Exposure status    Genotype
                   AA/Aa    aa
    Exposed        52       63
    Unexposed      168      717
a. The estimated OR of disease for exposed versus unexposed individuals is ORE = 1.17 (95% CI 0.88–1.55), P = 0.28.

b. ORE_aa = 0.79 (95% CI 0.56–1.12). ORE_AA/Aa = 2.79 (95% CI 1.58–4.91). ORE_aa and ORE_AA/Aa are heterogeneous (homogeneity test P < 0.001).

c. The estimated OR of exposure for affected ‘AA/Aa’ versus affected ‘aa’ individuals is OR = 3.52 (95% CI 2.35–5.20), P < 0.001.
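The point estimates in Table III can be checked directly from the cell counts. The Python sketch below computes cross-product ORs with Wald (Woolf) 95% confidence intervals and two-sided normal P values. For the heterogeneity of the two stratum-specific ORs it uses a simple Wald comparison of the log ORs, which is a stand-in for, not an implementation of, the Breslow–Day test used in the study. The helper names are illustrative.

```python
import math


def odds_ratio(a, b, c, d):
    """Cross-product OR (a*d)/(b*c) with Wald 95% CI and two-sided P value."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))
    return {
        "or": math.exp(log_or),
        "low": math.exp(log_or - 1.96 * se),
        "high": math.exp(log_or + 1.96 * se),
        "p": p,
        "log_or": log_or,
        "se": se,
    }


# (A) exposed/unexposed cases versus exposed/unexposed controls.
environment = odds_ratio(115, 885, 100, 900)
# (B) the same comparison within each genotype stratum.
non_carriers = odds_ratio(63, 717, 81, 729)
carriers = odds_ratio(52, 168, 19, 171)
# (C) case-only: exposure by genotype among cases.
case_only = odds_ratio(52, 63, 168, 717)

# Wald test for equality of the two stratum-specific log ORs.
z_het = (carriers["log_or"] - non_carriers["log_or"]) / math.sqrt(
    carriers["se"] ** 2 + non_carriers["se"] ** 2)
p_het = math.erfc(abs(z_het) / math.sqrt(2))
```

The four ORs evaluate to approximately 1.17, 0.79, 2.79 and 3.52, and the Wald heterogeneity test gives P < 0.001, in line with the footnotes of Table III.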

Let us first consider a plain environmental study as depicted in Table IIIA. Figure 1 shows the proportion of cases attributable to the environmental exposure as a function of ORE, pE and κ. The figure illustrates the limitations in statistical power of a case–control environmental study based on 1000 cases plus 1000 controls (type I error 5%). When ORE = 1.3, pE = 10% and κ = 10%, the power to identify the environmental effect was 47% and the PAFE was 2.63%. If ORE = 1.3, pE = 20% and κ = 10%, the power to identify an environmental effect that explained as much as 5.12% of the cases was only 69%. The situation was slightly worse for rare diseases: if ORE = 1.3, pE = 20% and κ = 0.1%, the power was still 69% although the PAFE was 5.66%. Note that the PAFE is uniquely determined by pE, κ and the population-average ORE. In other words, once pE and ORE have been fixed, the PAFE does not depend on the proportion of individuals at increased risk among all exposed individuals.

Fig. 1.

Proportion of cases attributable to the exposure (PAFE) as a function of the exposure OR (ORE), the frequency of exposed individuals (pE) and the disease prevalence (thin curves κ = 0.1%, thick curves κ = 10%). The figure illustrates the power limitations of a case–control environmental study based on 1000 cases plus 1000 controls (type I error 5%). The power is <0.8 to the left of the power curve (dashed lines).


Model I assumes that the effect of environmental exposure is only present in carriers of at least one copy (dominant gene) or two copies (recessive gene) of the allele A. Figure 2, based on Model I, shows the relationship between the frequency of the allele A and the OR for individuals at increased risk versus individuals at baseline risk. The ORA×E increases with decreasing pA. For example, if the gene was dominant, ORE = 1.3, pE = 10%, κ = 10% and pA = 0.1, the proportion of individuals at increased risk was pE[1 − (1 − pA)²] = 0.1 × (1 − 0.9²) = 0.019. In this situation, the simulation gave an ORA×E of 2.81 for an overall population environmental effect of ORE = 1.3.

Fig. 2.

OR for individuals at increased risk versus individuals at baseline risk (ORA×E) as a function of the frequency of the interacting allele (pA) and the inheritance mode. The assumed parameters were: environmental OR, ORE = 1.3, frequency of exposed individuals pE = 10% and disease prevalence κ = 10%.


With decreasing pA, the increase in the magnitude of the ORA×E is accompanied by an increase in its variance, so it is not obvious what will happen to the power of the interaction test. Figure 3 shows the relationship between the frequency of allele A and the power to detect the dependence of ORE on genotype, according to inheritance mode and study design (compare with Table IIIB and C). The power increased with decreasing pA, i.e. when few carriers of risk genotypes contributed to the overall environmental effect, individual contributions were large and resulted in easily identifiable gene–environment interactions. For example, when the gene was dominant, the design was a classical case–control study, the population-average ORE was 1.3, pE = 10%, κ = 10% and pA = 0.1, the power to detect a significant ORA×E was 85.6%. Recall that the power of a plain environmental study with ORE = 1.3 and pE = 10% was only 47%. Our simulation results in Figure 3 are practically identical to those provided by the QUANTO program for power determination in gene–environment interaction studies (23). In order to present results from gene–environment studies in parallel to environmental main effect analyses, we determined the allele frequency at which environmental and gene–environment studies show the same statistical power (see arrows). For example, when pA was under 0.24 in Figure 3A, the power of a gene–environment case–control study was higher than the power of a plain environmental study. These ‘threshold allele frequencies’ should help to characterize models and effect sizes where gene–environment interactions improve the detection of environmental risk factors.
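The trade-off between a larger ORA×E and a larger variance can be illustrated without simulation. The Python sketch below replaces the simulated Breslow–Day test by a large-sample Wald approximation for the interaction log OR, so its output is only a rough guide and is not expected to match the simulated power values exactly. It assumes Model I with a dominant gene, κ as the overall population prevalence and 1000 cases plus 1000 controls; all function names are illustrative.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def model_i_cells(kappa, p_e, or_e, p_a):
    """Cell probabilities among cases and controls (dominant gene).

    Cell order: exposed carrier, exposed non-carrier,
    unexposed carrier, unexposed non-carrier.
    """
    carriers = 1 - (1 - p_a) ** 2

    def exposed_risk(f):
        return or_e * f / (1 - f + or_e * f)

    lo, hi = 0.0, kappa                   # bisection for the baseline risk f
    for _ in range(200):
        mid = (lo + hi) / 2
        if (1 - p_e) * mid + p_e * exposed_risk(mid) < kappa:
            lo = mid
        else:
            hi = mid
    f = (lo + hi) / 2
    phi = (exposed_risk(f) / f - (1 - carriers)) / carriers

    cases, controls = [], []
    for exposed in (True, False):
        for carrier in (True, False):
            freq = (p_e if exposed else 1 - p_e) * (
                carriers if carrier else 1 - carriers)
            risk = f * phi if (exposed and carrier) else f
            cases.append(freq * risk / kappa)
            controls.append(freq * (1 - risk) / (1 - kappa))
    return cases, controls


def wald_power(log_or, groups, n_per_group, z_crit=1.96):
    """Approximate two-sided power of a Wald test of log_or = 0."""
    var = sum(1 / (n_per_group * p) for probs in groups for p in probs)
    z = abs(log_or) / math.sqrt(var)
    return norm_cdf(z - z_crit) + norm_cdf(-z - z_crit)


def interaction_power(kappa, p_e, or_e, p_a, n=1000):
    """Approximate power of the case-control and case-only interaction tests."""
    cases, controls = model_i_cells(kappa, p_e, or_e, p_a)

    def cross(p):
        return math.log(p[0] * p[3] / (p[1] * p[2]))

    case_control = wald_power(cross(cases) - cross(controls),
                              [cases, controls], n)
    case_only = wald_power(cross(cases), [cases], n)
    return case_control, case_only


rare = interaction_power(0.10, 0.10, 1.3, p_a=0.1)
common = interaction_power(0.10, 0.10, 1.3, p_a=0.3)
```

Under this approximation the case–control power is higher for pA = 0.1 than for pA = 0.3, and the case-only design is more powerful than the case–control design at the same allele frequency, which agrees qualitatively with Figure 3.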

Fig. 3.

Dependence of the power to detect the gene–environment interaction on the frequency of the interacting allele (pA) under dominant (A) or recessive (B) inheritance. The assumed parameters were: ORE = 1.3, pE = 10%, κ = 10%, sample size 1000 cases (plus 1000 controls) and type I error 0.05. Full lines indicate that the power to detect the gene–environment interaction was higher than the power to detect the environmental effect; the opposite is represented by dotted lines. The allele frequencies at which the power of the environmental study equals that of the gene–environment study are denoted by ‘threshold pA’.


Figure 4 shows the relationship between the allele frequency at which the power of the environmental study equals that of the gene–environment study (threshold pA) and the ORE. The assumed parameters in the reference scenario were pE = 5%, κ = 10%, case–control study with 1000 cases and 1000 controls, dominant inheritance and type I error 0.05. The presented results are restricted to a power of the environmental study under 90%. If the threshold pA was <0.2 under the reference scenario, gene–environment studies showed a statistical advantage over environmental studies. The threshold pA decreased with increasing size of the overall environmental effect. The threshold pA was practically independent of exposure frequency (Figure 4A) and sample size (Figure 4C). The threshold pA slightly increased with increasing disease prevalence (Figure 4B). Gene–environment studies are particularly interesting when recessive genes are involved (Figure 4D). The threshold pA for the case-only study was higher than for the case–control study (Figure 4E).

Fig. 4.

Relationship between the allele frequency at which the power of the environmental study equals that of the gene–environment study (threshold pA) and the ORE. The assumed parameters in the reference scenario were exposure frequency pE = 5%, disease prevalence κ = 10%, case–control study with 1000 cases and 1000 controls, dominant inheritance and type I error 0.05. The series shows the dependence of the threshold pA on exposure frequency (A), disease prevalence (B), sample size (C), inheritance mode (D) and study design (E). Results are only presented for a power of the environmental study under 90%.


The possible interaction of environmental risk factors with not only one but multiple genes motivated the analysis of the relative contribution of each gene to the overall PAFE. Figure 5 represents the relationship between PAFA×E and PAFB×E for ORE = 1.3, pE = 10% and κ = 10% (overall PAFE = 2.63%). When the unknown gene B did not interact with the environment (Model III), PAFA×E did not depend on PAFB×E. If B interacted with the environment (Model II), the influence of PAFB×E on PAFA×E was more important when the interaction of A and B was additive, than when it was multiplicative. For example, under the simulated scenario, a PAFB×E = 1% reduced the PAFA×E to 1.56% (additive interaction) compared with PAFA×E = 1.71% (multiplicative interaction). These results imply that, for a fixed ORE, the power to detect the interaction of the studied gene A and the environmental factor also depends on the likely existence of additional genes that interact with the environment.

Fig. 5.

Proportion of cases attributable to the interaction of the studied gene A and the environmental exposure (PAFA×E) versus the corresponding proportion for the unknown gene(s) B (PAFB×E), according to different models of gene–gene–environment interaction. The magnitude of the environmental effect was fixed to pE = 10% and ORE = 1.3; the corresponding overall PAFE was 2.63%.


Figure 6 represents the power to identify the dependence of ORE on genotype when, in addition to the studied gene, other genes interact with the environment. Results are based on a multiplicative interaction between the genes A and B (Model IIA). The dotted lines indicate that the power of the gene–environment study is lower than the power of the plain environmental study. The results showed that the threshold pA decreases when genes other than the investigated gene A increase the susceptibility to environmental exposure. For example, if the PAFB×E was 1% under the assumed scenario, a power higher than 80% to detect the interaction of the gene A and the environment was only reached for allele frequencies under 0.1.

Fig. 6.

Dependence of the power to detect the gene–environment interaction on the frequency of the studied allele (pA), according to PAFB×E under Model IIA. The assumed parameters were: ORE = 1.3, pE = 10%, κ = 10%, sample size 1000 cases (plus 1000 controls) and type I error 0.05. Full curves indicate that the power to detect the gene–environment interaction was higher than the power to detect the environmental effect; the opposite is represented by dotted lines. Thin curves represent case-only studies, and thick curves represent case–control studies.


Discussion

The present study investigated the practical advantage of gene–environment over plain environmental studies in the identification of environmental risk factors. In particular, we simulated a scenario where only individuals with certain genotypes were at increased risk due to environmental exposure, and then compared the statistical power with and without consideration of genotypes (see Table III for illustration). We found that environmental exposures with weak effects, which are therefore hardly identifiable, may account for a significant proportion of the population prevalence of the disease. The data indicated that gene–environment studies have a higher power than plain environmental studies when the involved variants are rare, when they show recessive inheritance and when the genes included in the study are the most important variants responsible for the gene–environment interaction. The benefit of gene–environment over environmental studies also depended on the study design and, to some extent, on the disease prevalence. In contrast, the role of exposure frequency and sample size seemed to be small.

Compromises were made in order to keep the study as simple as possible. Reasonable values were adopted for some parameters (exposure OR, disease prevalence, sample size …). A multiplicative interaction between genes and environment, unlinked genes and independence of genotype and environmental exposure were assumed. Alternative designs (family based, counter-matching, incomplete-data case–control) have been proposed to increase the power to detect gene–environment interactions (24–31). The most appropriate approach will depend on characteristics of the disease, parameters of the associated risk factors and data availability (31). Although these designs were not explored in the present study, we note that the present conclusions hold equally under improved approaches. Inaccurate measurement of exposures may lead to underestimation of environmental effects (32,33), but measurement errors were not taken into account. Instead of deriving closed-form formulas for power calculations, we used simulation techniques, since this approach is easily generalized to more complex scenarios (e.g. a trio design with one environmental factor interacting with two genes). The feasibility of gene–environment studies should not be explored without first considering the problem of multiple comparisons for genetic and environmental factors separately; this problem was not addressed here, and we refer to a recent paper with a short introduction to the issue (34). The use of standard statistical techniques and the consideration of the multifactorial etiology of cancer are important advantages of the present study.

In addition to gene–environment studies, Mendelian randomization has also been proposed as a genetic tool to boost the identification of modifiable causes of cancer (35–37). The rationale of Mendelian randomization is that, if a genetic variant reflects the biological function of an environmental risk factor, the relationship between environmental exposure and disease could be assessed through the analysis of the association of the genotype with disease risk. Both approaches, gene–environment and Mendelian randomization, try to control for confounding of the environmental effect: gene–environment studies try to identify the individual genotypes particularly affected by the environmental exposure and Mendelian randomization attempts to homogenize the exposure level based on the genotype (38). Another parallel between the two designs is that they are limited by the present knowledge of biologic pathways and gene function. In gene–environment studies, this limitation could be circumvented by a genome-wide approach or, alternatively, by stratifying environmental risks on family history (25). Data concerning the interaction of environmental exposures and family history of cancer are emerging, e.g. family history seems to modify the association between obesity in early adolescence and subsequent risk of breast cancer (39).

Gene–environment studies may search for new causes of disease (when the effects of the tested genes, environmental exposures or both are unknown), or they may explore the mechanisms of cellular action of established environmental factors, such as smoking. Dissection of gene–environment interactions has been seen as a great chance for advancing the etiological understanding of cancer, but the true progress has been limited (34). The justification of gene–environment studies was the predicted large population impact of the common variants (15,16). However, the results from the present study, taking into account the multifactorial etiology of common cancers, indicate that the advantage of gene–environment over plain environmental studies is limited to rare variants, thus substantially negating the added value of gene–environment studies. These findings contrast with the power calculations for genotyping studies, which show that relatively common variants afford the highest power (8,11) and which, accordingly, guide the selection of the tested variants, in practice towards those with allele frequencies over 10%. Further complications of gene–environment studies are the low precision of categorization based on environmental exposure, compared with classification on genotypes, and the possibility of biased results due to dependence of genotype and environmental exposures (40).

In conclusion, the statistical advantage of gene–environment over environmental studies seems to be limited to situations where few individuals show increased susceptibility to the environmental exposure. This limitation is particularly important when multiple genes interact with the environment, which is probably always the case. It has been hoped that genomic research would play an important role in the understanding of disease development, with the subsequent implications in disease prevention and public health. It would therefore be important for the field to provide a proof-of-principle which could be demonstrated in the study of interactions of known environmental risk factors and the related candidate genes.

Abbreviations

  • OR: odds ratio

The study was supported by Deutsche Krebshilfe, the Swedish Cancer Society and the EU, LSHC-CT-2004-503465.

Conflict of Interest Statement: None declared.

Appendix

This appendix describes the calculation of the population-average OR_E and the PAF_A×E. Only one environmental risk factor and two multiplicatively interacting dominant susceptibility genes are considered (see Table II, Model IIA), but the formulas can easily be modified to accommodate other scenarios. Unlinked genes and independence of genotype and environmental exposure are assumed. Let p_A denote the frequency of allele A, p_B the frequency of allele B, p_E the frequency of environmentally exposed individuals, f the baseline disease prevalence, Φ_A×E the relative risk for exposed carriers of A and Φ_B×E the relative risk for exposed carriers of B. The probability that an individual is exposed (E = 1), has genotype ‘aabb’ (G = aabb) and is affected (D = 1) is: 

[equation not reproduced]

Similarly, 

[equation not reproduced]

The prevalence of the disease in the population is then: 

[equation not reproduced]
and the distribution of cases according to exposure status and genotype is given by: 
[equation not reproduced]
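
As a hedged sketch, and not the published formulas, the quantities described above could take the following form. The sketch assumes Hardy–Weinberg proportions, so that the carrier frequency of a dominant allele is q_A = 1 − (1 − p_A)^2, a risk of f for all unexposed individuals and for exposed non-carriers, and multiplicative relative risks among exposed carriers. These assumptions are not confirmed by the text (Table II is not shown here), and the symbols q_A, q_B and K are introduced only for illustration; "A·" denotes carriers of allele A.

```latex
\begin{align*}
q_A &= 1-(1-p_A)^2, \qquad q_B = 1-(1-p_B)^2 \\
P(E=1,\ G=aabb,\ D=1) &= p_E\,(1-q_A)(1-q_B)\,f \\
P(E=1,\ G=A{\cdot}bb,\ D=1) &= p_E\,q_A(1-q_B)\,f\,\Phi_{A\times E} \\
P(E=1,\ G=aaB{\cdot},\ D=1) &= p_E\,(1-q_A)\,q_B\,f\,\Phi_{B\times E} \\
P(E=1,\ G=A{\cdot}B{\cdot},\ D=1) &= p_E\,q_A\,q_B\,f\,\Phi_{A\times E}\,\Phi_{B\times E} \\
P(E=0,\ G=g,\ D=1) &= (1-p_E)\,P(G=g)\,f \\
K = P(D=1) &= f\left[(1-p_E) + p_E\,(1-q_A+q_A\Phi_{A\times E})(1-q_B+q_B\Phi_{B\times E})\right] \\
P(E=e,\ G=g \mid D=1) &= \frac{P(E=e,\ G=g,\ D=1)}{K}
\end{align*}
```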

Analogous calculations yield the distribution of controls (D = 0). The expected population-average OR_E is then: 

[equation not reproduced]
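
Under the same illustrative assumptions (Hardy–Weinberg carrier frequencies q_A and q_B, risk f for unexposed individuals and exposed non-carriers, multiplicative relative risks among exposed carriers), the population-average odds ratio would reduce to a closed form that does not depend on p_E. This is a sketch and not the published formula; the symbol m, the mean relative risk among exposed individuals, is introduced here for convenience.

```latex
\begin{align*}
m &= (1-q_A+q_A\Phi_{A\times E})(1-q_B+q_B\Phi_{B\times E}) \\
OR_E &= \frac{P(E=1 \mid D=1)\,P(E=0 \mid D=0)}{P(E=0 \mid D=1)\,P(E=1 \mid D=0)}
      = \frac{m\,(1-f)}{1-f\,m}
\end{align*}
```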

If the allele A were not present in the population, the disease prevalence would be: 

[equation not reproduced]
and the PAF_A×E is: 
[equation not reproduced]
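
The calculation outlined in this appendix can be sketched numerically. The Python code below is an illustration and not the authors' implementation. It assumes Hardy–Weinberg proportions for the dominant carrier frequencies, a risk of f for unexposed individuals and for exposed non-carriers, and multiplicative relative risks among exposed carriers. All function names and the parameter values in the example are hypothetical.

```python
from itertools import product


def carrier_freq(p):
    """Carrier frequency of a dominant allele, assuming Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - p) ** 2


def joint_probs(p_a, p_b, p_e, f, phi_ae, phi_be):
    """Return {(e, a, b, d): probability} over exposure, carrier status and disease.

    Assumption (not confirmed by the text): the risk is f for unexposed
    individuals and exposed non-carriers; among exposed individuals, carriers
    of A and/or B multiply f by phi_ae and/or phi_be.
    """
    q_a, q_b = carrier_freq(p_a), carrier_freq(p_b)
    probs = {}
    for e, a, b in product((0, 1), repeat=3):
        # Independence of exposure and of the two unlinked genes.
        cell = ((p_e if e else 1.0 - p_e)
                * (q_a if a else 1.0 - q_a)
                * (q_b if b else 1.0 - q_b))
        risk = f
        if e:
            risk *= (phi_ae if a else 1.0) * (phi_be if b else 1.0)
        probs[(e, a, b, 1)] = cell * risk          # affected (cases)
        probs[(e, a, b, 0)] = cell * (1.0 - risk)  # unaffected (controls)
    return probs


def prevalence(probs):
    """Population prevalence P(D = 1)."""
    return sum(p for (e, a, b, d), p in probs.items() if d == 1)


def population_or_e(probs):
    """Population-average odds ratio for exposure, collapsing over genotype."""
    def marginal(e, d):
        return sum(p for (ee, a, b, dd), p in probs.items() if ee == e and dd == d)
    return (marginal(1, 1) * marginal(0, 0)) / (marginal(0, 1) * marginal(1, 0))


def paf_axe(p_a, p_b, p_e, f, phi_ae, phi_be):
    """Fraction of cases that would not occur if allele A were absent."""
    k = prevalence(joint_probs(p_a, p_b, p_e, f, phi_ae, phi_be))
    k_without_a = prevalence(joint_probs(0.0, p_b, p_e, f, phi_ae, phi_be))
    return (k - k_without_a) / k


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    params = dict(p_a=0.1, p_b=0.2, p_e=0.3, f=0.01, phi_ae=2.0, phi_be=3.0)
    table = joint_probs(**params)
    print("prevalence:", prevalence(table))
    print("OR_E:", population_or_e(table))
    print("PAF_AxE:", paf_axe(**params))
```

Enumerating all exposure-by-genotype cells, rather than coding a closed form, keeps the sketch easy to adapt to other scenarios, for example recessive genes or a different interaction model, by changing only the carrier frequency and risk assignments.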

References

1. Ponder BA. Cancer genetics. Nature, 2001, vol. 411 (pg. 336-341)
2. Khoury MJ, et al. Do we need genomic research for the prevention of common diseases with environmental causes? Am. J. Epidemiol., 2005, vol. 161 (pg. 799-805)
3. Weiss KM, et al. How many diseases does it take to map a gene with SNPs? Nat. Genet., 2000, vol. 26 (pg. 151-157)
4. Terwilliger JD, et al. Confounding, ascertainment bias, and the blind quest for a genetic ‘fountain of youth’. Ann. Med., 2003, vol. 35 (pg. 532-544)
5. Merikangas KR, et al. Genomic priorities and public health. Science, 2003, vol. 302 (pg. 599-601)
6. Blangero J. Localization and identification of human quantitative trait loci: king harvest has surely come. Curr. Opin. Genet. Dev., 2004, vol. 14 (pg. 233-240)
7. Burton PR, et al. Key concepts in genetic epidemiology. Lancet, 2005, vol. 366 (pg. 941-951)
8. Pharoah PD, et al. Association studies for finding cancer-susceptibility genetic variants. Nat. Rev. Cancer, 2004, vol. 4 (pg. 850-860)
9. Glazier AM, et al. Finding genes that underlie complex traits. Science, 2002, vol. 298 (pg. 2345-2349)
10. Guttmacher AE, et al. Genomic medicine—a primer. N. Engl. J. Med., 2002, vol. 347 (pg. 1512-1520)
11. Hirschhorn JN, et al. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet., 2005, vol. 6 (pg. 95-108)
12. Wang WY, et al. Genome-wide association studies: theoretical and practical concerns. Nat. Rev. Genet., 2005, vol. 6 (pg. 109-118)
13. Doll R. Epidemiological evidence of the effects of behaviour and the environment on the risk of human cancer. Recent Results Cancer Res., 1998, vol. 154 (pg. 3-21)
14. Hussain SP, et al. Molecular epidemiology and carcinogenesis: endogenous and exogenous carcinogens. Mutat. Res., 2000, vol. 462 (pg. 311-322)
15. Brennan P. Gene-environment interaction and aetiology of cancer: what does it mean and how can we measure it? Carcinogenesis, 2002, vol. 23 (pg. 381-387)
16. Perera FP, et al. Molecular epidemiology: recent advances and future directions. Carcinogenesis, 2000, vol. 21 (pg. 517-524)
17. Khoury M, et al. Genetic Epidemiology, 1993. Oxford: Oxford University Press
18. Rothman K, et al. Modern Epidemiology, 1998. Philadelphia: Lippincott Williams & Wilkins
19. Millikan R, et al. Studying environmental influences and breast cancer risk: suggestions for an integrated population-based approach. Breast Cancer Res. Treat., 1995, vol. 35 (pg. 79-89)
20. Swedish Cancer Registry. (2007) Cancer incidence in Sweden 2005. Center for Epidemiology. National Board of Health and Welfare, Stockholm, Sweden
21. Available from: http://www.sos.se/epc/epceng.htm (8 May 2007, date last accessed)
22. Breslow NE, et al. Statistical methods in cancer research. Volume I—the analysis of case-control studies. IARC Sci. Publ., 1980 (pg. 5-338)
23. Gauderman W, et al. QUANTO 1.1: a computer program for power and sample size calculations for genetic-epidemiology studies, 2006. Available from: http://hydra.usc.edu/gxe (8 May 2007, date last accessed)
24. Laird NM, et al. Family-based designs in the age of large-scale gene-association studies. Nat. Rev. Genet., 2006, vol. 7 (pg. 385-394)
25. Hopper JL, et al. Population-based family studies in genetic epidemiology. Lancet, 2005, vol. 366 (pg. 1397-1406)
26. Khoury MJ, et al. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am. J. Epidemiol., 1996, vol. 144 (pg. 207-213)
27. Weinberg CR, et al. Choosing a retrospective design to assess joint genetic and environmental contributions to risk. Am. J. Epidemiol., 2000, vol. 152 (pg. 197-203)
28. Rosenbaum PR. The case-only odds ratio as a causal parameter. Biometrics, 2004, vol. 60 (pg. 233-240)
29. Gauderman WJ. Sample size calculations for matched case-control studies of gene-environment interaction. Statistics Med., 2002, vol. 21 (pg. 35-50)
30. Andrieu N, et al. Counter-matching in studies of gene-environment interaction: efficiency and feasibility. Am. J. Epidemiol., 2001, vol. 153 (pg. 265-274)
31. Goldstein AM, et al. Detection of interaction involving identified genes: available study designs. J. Natl. Cancer Inst. Monogr., 1999 (pg. 49-54)
32. Vineis P. A self-fulfilling prophecy: are we underestimating the role of the environment in gene-environment interaction research? Int. J. Epidemiol., 2004, vol. 33 (pg. 945-946)
33. Rothman N, et al. The impact of misclassification in case-control studies of gene-environment interactions. IARC Sci. Publ., 1999, vol. 148 (pg. 89-96)
34. Hemminki K, et al. Gene-environment interactions in cancer: do they exist? Ann. N. Y. Acad. Sci., 2006, vol. 1076 (pg. 137-148)
35. Davey Smith G, et al. What can Mendelian randomisation tell us about modifiable behavioural and environmental exposures? Br. Med. J., 2005, vol. 330 (pg. 1076-1079)
36. Gray R, et al. How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplant, 1991, vol. 7 suppl. 3 (pg. 9-12)
37. Little J, et al. Mendelian randomisation: a new spin or real progress? Lancet, 2003, vol. 362 (pg. 930-931)
38. Davey Smith G, et al. Genetic epidemiology and public health: hope, hype, and future prospects. Lancet, 2005, vol. 366 (pg. 1484-1498)
39. Cerhan JR, et al. Interaction of adolescent anthropometric characteristics and family history on breast cancer risk in a Historical Cohort Study of 426 families (USA). Cancer Causes Control, 2004, vol. 15 (pg. 1-9)
40. Vineis P, et al. Issues of design and analysis in studies of gene-environment interactions. IARC Sci. Publ., 2004 (pg. 417-435)