Evidence for large-scale gene-by-smoking interaction effects on pulmonary function

Abstract Background: Smoking is the strongest environmental risk factor for reduced pulmonary function. The genetic component of various pulmonary traits has also been demonstrated, and at least 26 loci have been reproducibly associated with either FEV1 (forced expiratory volume in 1 second) or FEV1/FVC (FEV1/forced vital capacity). Although the main effects of smoking and genetic loci are well established, the question of potential gene-by-smoking interaction effect remains unanswered. The aim of the present study was to assess, using a genetic risk score approach, whether the effect of these 26 loci on pulmonary function is influenced by smoking. Methods: We evaluated the interaction between smoking exposure, considered as either ever vs never or pack-years, and a 26-single nucleotide polymorphisms (SNPs) genetic risk score in relation to FEV1 or FEV1/FVC in 50 047 participants of European ancestry from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) and SpiroMeta consortia. Results: We identified an interaction (βint = –0.036, 95% confidence interval, –0.040 to –0.032, P = 0.00057) between an unweighted 26 SNP genetic risk score and smoking status (ever/never) on the FEV1/FVC ratio. In interpreting this interaction, we showed that the genetic risk of falling below the FEV1/FVC threshold used to diagnose chronic obstructive pulmonary disease is higher among ever smokers than among never smokers. A replication analysis in two independent datasets, although not statistically significant, showed a similar trend in the interaction effect. Conclusions: This study highlights the benefit of using genetic risk scores for identifying interactions missed when studying individual SNPs and shows, for the first time, that persons with the highest genetic risk for low FEV1/FVC may be more susceptible to the deleterious effects of smoking.


Key Messages
• Spirometric measures of pulmonary function are influenced by both smoking and genetics. This paper reports a genetic risk score-by-ever smoking interaction on FEV1/FVC (forced expiratory volume in 1 second/forced vital capacity).
• In individuals of European ancestry, the reduction in FEV1/FVC as a result of smoking was greater among individuals who are genetically predisposed to a lower FEV1/FVC ratio.
• Genetic risk score-by-ever smoking interaction can allow the identification of subgroups in the population whose genetic background makes them more susceptible to the deleterious effects of smoking.

Introduction
Spirometric measures of pulmonary function, such as the forced expiratory volume in 1 second (FEV1) or its ratio with the forced vital capacity (FEV1/FVC), form the basis of the diagnosis of chronic obstructive pulmonary disease (COPD). [1][2][3] Pulmonary function measures are also used clinically to monitor severity and control of asthma and other respiratory diseases and are independent risk factors for mortality. [1][2][3] Pulmonary function is strongly influenced by cigarette smoking and by multiple low-penetrance genetic variants. Indeed, genome-wide association studies (GWAS) of marginal genetic effects (i.e. not including interaction effects between genetic variants and smoking) have identified at least 26 loci associated with FEV1 or FEV1/FVC in the general population. 4 However, the interplay between genetic factors and environmental exposures has not been well established for pulmonary function or its associated traits. More broadly, although considerable efforts have been made to identify interaction effects between genetic variants and environmental exposures across the wide range of human traits and diseases, 5,6 such investigations have been mostly unsuccessful in detecting robust gene-environment interactions. 5,7 The well-established effect of cigarette smoking on numerous human health outcomes 8 makes it a serious candidate for identification of novel gene-environment interactions, especially for pulmonary traits.
Hypothesizing the presence of single nucleotide polymorphism (SNP)-by-smoking interaction, Hancock et al. 9 performed a genome-wide interaction study of pulmonary function, modelling single SNP main effects and their interactions with smoking in 50 047 participants of European ancestry across 19 studies within the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) 10 and SpiroMeta 11 consortia, the largest genome-wide interaction study of pulmonary function as modified by smoking to date. However, rather than focusing on the interaction effects per se, they performed a meta-analysis of the joint test of SNP main effects and SNP-by-smoking interaction effects to improve power for identifying genetic variants associated with pulmonary function. 12,13 Although they reported new candidate variants based on this joint test, the study did not identify any SNPs with a genome-wide significant interaction with smoking.
Here, we explored gene-by-smoking interaction effects limited to genetic variants previously found to be associated with pulmonary function in standard marginal effects GWAS, 4 therefore not including the new variants reported by Hancock et al. 9 based on the joint test of main effects plus interaction. Specifically, we aimed to determine whether smoking modifies the effect of established genetic variants when considered singly or in combination using a genetic risk score summarizing the genetic predisposition to abnormal pulmonary function. The primary motivation for using a genetic risk score is statistical power. 14,15 Indeed, several genetic risk score-by-exposure interactions have already been identified in cases where single SNPs did not show evidence for statistically significant interactions. [16][17][18][19][20][21] Genetic risk score-by-exposure interaction testing expands on the principle of the omnibus test while leveraging the assumption that, for a given choice of coded alleles, most interaction effects will have the same direction. This is similar to burden tests, which have been widely used for rare variant analysis, 22 where a single parameter can accumulate evidence for association without increasing the number of degrees of freedom. When interaction effects are null on average (i.e. if interaction effects are both negative and positive so that the sum of interaction coefficients tends to zero), the single SNP approach will generally outperform the risk score-based approach. Conversely, if interaction effects tend to be in the same direction, the risk score-based approach can have dramatically higher power. 14

Study sample
The present analysis relies on the Hancock et al. 9 genome-wide meta-analysis for main genetic effects plus interaction effects with smoking in relation to pulmonary function among 50 047 participants (56% women) of European ancestry from 19 studies. The mean age was 53 years at the time of pulmonary function testing. Approximately 15% were current smokers and 56% were ever smokers. Among ever smokers, the average pack-years of smoking was 21.
Supplementary Table 3 (available as Supplementary data at IJE online) provides the main characteristics of the studies included; complete details of study-specific pulmonary function testing protocols have been published. 4 For studies with spirometry at a single visit, we analysed FEV1/FVC and FEV1 measured at that visit. For studies with spirometry at more than one visit, measurements from the baseline visit or the most recent examination with spirometry data were used. Smoking history (current, former and never smoking) was ascertained by questionnaire at the time of pulmonary function testing. Pack-years of smoking were calculated for current and past smokers by multiplying smoking amount (packs per day) and duration (years smoked). Approximately 2.5 million autosomal SNPs were tested for interaction with smoking status (ever smoking vs never smoking) and pack-years, for two outcomes: FEV1 and FEV1/FVC (see next section). We also used two independent datasets of individuals of European ancestry to test for replication. The first replication dataset included 8859 unrelated individuals, and the second dataset included 9457 family-based individuals. The look-up used the GWAS of marginal genetic effects conducted separately in ever smokers and never smokers as part of a recent meta-analysis of FEV1 and FEV1/FVC. 23

Single SNP-by-smoking interaction
The analysis performed in this study used summary statistics from the aforementioned meta-analysis of 19 studies performed by Hancock et al. 9 In brief, each of the 19 studies derived the residuals of FEV1 and FEV1/FVC after regressing out age, age², sex, standing height, principal component eigenvectors of genotypes and recruitment site if applicable. The residuals were normalized using a rank-based inverse normal transformation. Single SNP interaction effects were assessed using the following model (see Supplementary Note, available as Supplementary data at IJE online):

$$Y \sim \beta_0 + \beta_G\, G + \sum_l \beta_{E_l} E_l + \beta_{GE_k}\, G \times E_k \qquad (1)$$

where $Y$ is the outcome (the normalized residuals of FEV1 or FEV1/FVC), $\beta_G$ and $\beta_{E_l}$ are the main effects of the SNP $G$ and exposure $E_l$, $\beta_{GE_k}$ is the interaction effect between $G$ and exposure $E_k$, and $\beta_0$ is the intercept.

A detailed description of the studies used in the replication analysis can be found in Soler Artigas et al. 23 In brief, FEV1 and FEV1/FVC were regressed on age, age², sex, height and principal components for population structure, separately for ever smokers and never smokers. The residuals were normalized using a rank-based inverse normal transformation, again separately in ever smokers and never smokers. These transformed residuals were then used as the phenotype for association testing under an additive genetic model in each exposure stratum. Inference of the interaction effects from the exposure-stratified analyses is described in the Supplementary Note (available as Supplementary data at IJE online).
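The rank-based inverse normal transformation applied to the residuals can be illustrated with a minimal sketch. The text does not state which rank offset the studies used; the Blom offset (c = 3/8) below is an assumption, as are the function name `rank_inverse_normal` and the toy residual values.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles of their ranks.

    The offset c = 3/8 (Blom) is an assumed convention, not taken from the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Locate the run of tied values and assign each the average rank.
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# Hypothetical regression residuals for five participants.
residuals = [0.12, -1.40, 0.55, 2.30, -0.07]
transformed = rank_inverse_normal(residuals)
```

The transformation keeps only the ordering of the residuals, so extreme values such as 2.30 no longer carry disproportionate weight in the downstream regression.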

Multivariate interaction analysis overview
First, we considered an unweighted genetic risk score-by-smoking interaction where the risk score simply sums the number of risk alleles (i.e. alleles associated with a lower pulmonary function). This unweighted genetic risk score is most powerful when the interaction effects have the same direction as marginal SNP effects (i.e. the harmful effects of smoking are magnified in individuals with a genetic predisposition to reduced pulmonary function). Second, we used a weighted genetic risk score where SNPs were weighted by the absolute value of their marginal effect estimates obtained from stage 1 screening of FEV1 and FEV1/FVC from Soler Artigas et al. 4 (Supplementary Table 1, available as Supplementary data at IJE online). This weighting scheme is most powerful when the magnitude of interaction effects is proportional to the SNP marginal effects. Finally, for our third multivariate analysis, we derived a standard omnibus test of all interaction effects. This test will retain power in the presence of effects in both directions or of different magnitudes. Although there is strong correlation among the 12 tests performed (these three models, considering interaction with two smoking metrics, ever/never smoking or pack-years, for the two pulmonary function metrics FEV1 and FEV1/FVC), we used a stringent Bonferroni P-value correction threshold of 4 × 10⁻³ to account for multiple testing.
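As a small illustration of the two scores described above, the sketch below computes an unweighted and a weighted risk score for one individual and the Bonferroni threshold for 12 tests. The function name `genetic_risk_score`, the four-SNP genotype and the marginal effect values are hypothetical and are not taken from the study.

```python
def genetic_risk_score(risk_allele_counts, marginal_effects=None):
    """Sum risk-allele counts, optionally weighted by |marginal effect|."""
    m = len(risk_allele_counts)
    if marginal_effects is None:
        weights = [1.0] * m  # unweighted score: every SNP counts equally
    else:
        if len(marginal_effects) != m:
            raise ValueError("one marginal effect is needed per SNP")
        weights = [abs(b) for b in marginal_effects]
    return sum(w * g for w, g in zip(weights, risk_allele_counts))

# Hypothetical individual: risk-allele counts (0, 1 or 2) at four SNPs.
counts = [0, 1, 2, 1]
# Hypothetical marginal effect estimates for the same four SNPs.
effects = [-0.05, -0.03, -0.08, -0.02]

ugrs = genetic_risk_score(counts)           # unweighted score
wgrs = genetic_risk_score(counts, effects)  # weighted score

# Bonferroni threshold for 12 tests at a family-wise level of 0.05.
bonferroni_threshold = 0.05 / 12
```

The threshold evaluates to roughly 0.0042, which is consistent with the 4 × 10⁻³ level quoted in the text.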
When raw data are available, the weighted genetic risk score (GRS) is usually expressed as

$$\mathrm{GRS} = \sum_{i=1}^{m} w_i\, G_i$$

where $m$ is the number of SNPs included in the genetic risk score and $w = (w_1, \ldots, w_m)$ are the weights attributed to each single SNP. Following previous notation, the test of interaction between the GRS and the exposure $E_k$ can be applied using the following model:

$$Y \sim \gamma_0 + \gamma_{GRS}\, \mathrm{GRS} + \sum_l \gamma_{E_l} E_l + \gamma_{INT}\, \mathrm{GRS} \times E_k$$

where $\gamma_0$, $\gamma_{GRS}$, $\gamma_{E_l}$ and $\gamma_{INT}$ are the intercept, the main effect of the GRS, the main effect of the exposure $E_l$ and the interaction effect between $E_k$ and the GRS, respectively. However, because individual-level data were not directly available, we performed the test of $\gamma_{INT}$ from summary statistics of interaction effects using an inverse-variance weighted sum as proposed by Aschard. 14 The chi-square for the interaction term $\gamma_{INT}$ was derived as follows:

$$\chi^2_{int} = \frac{\left( \sum_{i=1}^{m} w_i\, \hat{\beta}_{G_iE_k} / \hat{\sigma}^2_{G_iE_k} \right)^2}{\sum_{i=1}^{m} w_i^2 / \hat{\sigma}^2_{G_iE_k}}$$

where $\hat{\beta}_{G_iE_k}$ and $\hat{\sigma}^2_{G_iE_k}$ are the estimated effects and variance of the interaction between the exposure $E_k$ and the SNP $G_i$ obtained from Equation (1) and $w_i$ is the weight applied to SNP $G_i$. Under the null hypothesis of no interaction effect, $\chi^2_{int}$ follows a chi-squared distribution with one degree of freedom.
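A one-degree-of-freedom GRS-by-exposure test built from per-SNP summary statistics can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes independent SNPs and the usual inverse-variance weighted estimator for a score effect, and the function name `grs_interaction_test` and the input values are hypothetical.

```python
import math

def grs_interaction_test(betas, variances, weights=None):
    """Approximate the GRS-by-exposure interaction from single-SNP estimates.

    betas, variances: per-SNP interaction estimates and their variances.
    weights: per-SNP GRS weights (all 1.0 for an unweighted score).
    Assumes independent SNPs and inverse-variance weighting.
    """
    m = len(betas)
    if weights is None:
        weights = [1.0] * m
    numerator = sum(w * b / v for w, b, v in zip(weights, betas, variances))
    denominator = sum(w * w / v for w, v in zip(weights, variances))
    gamma_int = numerator / denominator       # pooled interaction estimate
    std_error = math.sqrt(1.0 / denominator)  # its standard error
    chi2 = numerator * numerator / denominator
    # Upper tail of a chi-square with 1 df, written with the error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return gamma_int, std_error, chi2, p_value

# Hypothetical summary statistics: four SNPs with small same-direction effects.
betas = [-0.02, -0.02, -0.02, -0.02]
variances = [0.0004, 0.0004, 0.0004, 0.0004]
gamma_int, std_error, chi2, p_value = grs_interaction_test(betas, variances)
```

In this toy input each SNP alone has a z-score of -1 and would not be significant, whereas the pooled statistic reaches a z-score of -2, which is the power gain from effects sharing a direction that the text describes.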
The standard omnibus test of all interaction effects consisted of evaluating jointly $\alpha_{G \times E_k} = (\alpha_{G_1 \times E_k}, \ldots, \alpha_{G_m \times E_k})$ from the model:

$$Y \sim \alpha_0 + \sum_{i=1}^{m} \alpha_{G_i} G_i + \sum_l \alpha_{E_l} E_l + \sum_{i=1}^{m} \alpha_{G_i \times E_k}\, G_i \times E_k$$

where $\alpha_0$, $\alpha_{G_i}$, $\alpha_{E_l}$ and $\alpha_{G_i \times E_k}$ are the intercept, the main effects of SNP $G_i$ and the exposure $E_l$, and the interaction effect between $G_i$ and $E_k$. Leveraging the independence between the SNPs considered (a single SNP was selected for each independent locus), we also derived the omnibus test using summary statistics. Under this independence assumption, the $G_i \times E_k$ interaction terms would also be independent, 14 so that the test can be performed by summing the chi-squares from each univariate interaction test to form a chi-square with $m$ degrees of freedom as follows:

$$\chi^2_{omnibus} = \sum_{i=1}^{m} \frac{\hat{\beta}^2_{G_iE_k}}{\hat{\sigma}^2_{G_iE_k}}$$

where $\hat{\beta}_{G_iE_k}$ and $\hat{\sigma}^2_{G_iE_k}$ are the estimated effects and variance of the interaction between the exposure $E_k$ and the SNP $G_i$ obtained from Equation (1).
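The summary-statistic omnibus test can be sketched in a few lines, assuming independent SNPs as stated above. The helper `chi2_sf` (an upper-tail chi-square probability for integer degrees of freedom, written with the standard library only), the function name `omnibus_interaction_test` and the input values are illustrative assumptions.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1  # closed form for 1 df
    else:
        q, k = math.exp(-half), 2             # closed form for 2 df
    while k < df:
        # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def omnibus_interaction_test(betas, variances):
    """Sum the per-SNP interaction chi-squares; df equals the number of SNPs."""
    stat = sum(b * b / v for b, v in zip(betas, variances))
    return stat, chi2_sf(stat, len(betas))

# Hypothetical estimates with opposite signs: a sum-based score test would
# cancel out, while the omnibus statistic accumulates the squared z-scores.
betas = [0.02, -0.02, 0.02, -0.02]
variances = [0.0001, 0.0001, 0.0001, 0.0001]
stat, p_value = omnibus_interaction_test(betas, variances)
```

The cost of this robustness to mixed directions is the larger number of degrees of freedom, which is why the omnibus test loses power relative to a risk score when most effects point the same way.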

Relative risk in ever smokers vs never smokers
GRS interaction effects can further be translated in terms of risk prediction. For pulmonary function, low FEV1 or FEV1/FVC increases the risk of death 24 and together they form the basis for the diagnosis of COPD. 1-3 COPD stage 2 or higher is defined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) as FEV1/FVC < 0.70 and FEV1 < 80% of the predicted value. According to recent studies, 2,25 between 5% and 20% of European ancestry adults are expected to have FEV1/FVC < 0.70, depending on smoking characteristics and age distribution. Several studies argue for a more stringent threshold to define COPD 25,26 based on the lower limit of normal of the predicted value, rather than a fixed absolute value, to prevent disease misclassification.
To explore the impact of the interaction effect on the risk of disease, we derived the relative risk (RR) of having FEV1/FVC below a given threshold (1%, 5% and 20%) in ever smokers vs never smokers conditional on the unweighted GRS. This quantity is defined as the joint probability of having both FEV1/FVC in the interval [−∞, […] where y, e and g are FEV1/FVC, smoking status and the GRS, respectively, and f1 and f2 are the probability density functions of y and g. The detailed derivation of the above integral is available as Supplementary data at IJE online.

Results
We selected 26 loci previously found to be associated with FEV1 or FEV1/FVC at genome-wide significance (P < 5 × 10⁻⁸) in marginal association tests. 4 Next, using these data, we conducted three multivariate (as opposed to single SNP) interaction analyses, testing jointly for the interaction effects between those SNPs and either smoking status or pack-years on the two phenotypes (FEV1 and FEV1/FVC) for a total of 12 tests. As shown in Table 1, none of the multivariate interaction tests with pack-years was significant. However, four of the six multivariate interaction tests with smoking status (ever vs never) showed nominal significance, and two tests for FEV1/FVC had a P-value below the Bonferroni significance level (12 tests, P < 4 × 10⁻³). The strongest signal was observed for the unweighted genetic risk score-by-smoking status interaction effect on FEV1/FVC (βint = -0.036, 95% CI -0.040 to -0.032, P = 0.00057). The Cochran's Q test for heterogeneity of the interaction effect across studies was not significant (P = 0.97) and the forest plot of study-specific results did not display any obvious outlier (Supplementary Figure 1, available as Supplementary data at IJE online).
The contrast between this significant risk score interaction and the absence of strong single SNP interaction effects can be explained by looking at the distribution of the single SNP interaction effect estimates. Figure 1 shows this distribution for the alleles associated with decreased FEV1/FVC. It highlights that, although the 95% CI of most single SNP interaction effects encompass the null (and therefore the absence of significant single SNP interaction effect), there is an enrichment for negative interaction effects. Indeed, even a binomial test can be used to confirm the unbalanced direction of interaction effects (18 of 26 interactions are negative leading to a P-value of 0.014 for a binomial test with an expected equiprobable distribution of 0.5). The genetic risk score-based interaction test exploits such enrichment by testing for the average interaction effect across all SNPs. 14 As with any multivariate approach based on a composite null hypothesis, this result indicates that at least a subset of these 26 SNPs interact with smoking status, but does not allow us to determine which or how many SNPs are driving the genetic risk score-by-smoking interaction. The three other sets of single SNP interaction tests showed a similar (but not significant after correction for multiple testing) trend with enrichment for negative interactions (Supplementary Figures 2-4, available as Supplementary data at IJE online). We summarized the contribution of the unweighted genetic risk score-by-smoking interaction on FEV1/FVC in Table 2 and Figure 2A.

Table 1 notes. uGRS is the genetic risk score using equal weights for all SNPs; wGRS is the genetic risk score weighted by effect estimates from the marginal screening; CHISQ is the omnibus test of all interaction effects; βint is the estimated interaction effect between the GRS and smoking on the outcome; and CI is the confidence interval of that estimate. Nominally significant tests are indicated in bold.
This indicates that the deleterious effect of smoking is enhanced among carriers of the risk alleles or equivalently that the deleterious effect of smoking is reduced among subjects carrying the protective alleles.
We used two independent datasets, one of 8859 unrelated individuals and another of 9457 related individuals, to test for independent replication of our results (Supplementary Note, available as Supplementary data at IJE online). Although the interaction effects were not significant, both replication samples showed a consistent negative GRS-by-ever smoking interaction effect on FEV1/FVC (βint = -0.0025, 95% CI -0.0165 to 0.0115, P = 0.72 and βint = -0.0030, 95% CI -0.0214 to 0.0154, P = 0.74; overall interaction effect in the combined replication datasets βint = -0.0027, 95% CI -0.0136 to 0.0082, P = 0.63) and a Cochran's Q test for heterogeneity showed no significant difference in the three effect estimates (P = 0.51).
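The combined replication estimate can be approximately reproduced from the two reported estimates and their confidence intervals. The sketch below assumes a fixed-effect inverse-variance weighted meta-analysis and symmetric 95% intervals; the text does not state how the datasets were combined, so this is an illustration rather than the authors' procedure, and all function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile behind a two-sided 95% CI

def se_from_ci(lower, upper):
    """Back out a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * Z_975)

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate, its SE and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q

def two_sided_p(estimate, std_error):
    """Two-sided P-value of a Wald z-test."""
    return math.erfc(abs(estimate / std_error) / math.sqrt(2.0))

# Replication estimates and 95% CIs as reported in the text.
estimates = [-0.0025, -0.0030]
std_errors = [se_from_ci(-0.0165, 0.0115), se_from_ci(-0.0214, 0.0154)]

pooled, pooled_se, q = fixed_effect_meta(estimates, std_errors)
p_value = two_sided_p(pooled, pooled_se)
```

Under these assumptions the pooled estimate and P-value agree with the reported combined values (-0.0027 and 0.63) up to rounding. The Q statistic here covers only the two replication estimates, so it is not directly comparable to the reported three-estimate heterogeneity test.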
To quantify the impact of this result from a public health perspective, we estimated the impact of the genetic risk score-by-smoking interaction on having FEV1/FVC below 1%, 5% and 20% in the lower tails of the distribution in the population. Specifically, we derived the RR of having FEV1/FVC below these cut-off points (1%, 5% and 20%) in ever smokers compared with never smokers. Figure 2B quantifies the excess RR (i.e. the RR minus one) of individuals across five GRS quintiles. It highlights the higher risk associated with smoking among individuals carrying risk alleles (i.e. alleles associated with poorer pulmonary function) as compared with individuals carrying protective alleles (i.e. alleles associated with better pulmonary function). For example, among individuals with a GRS above the 80th percentile, ever smokers have on average a 26% excess RR of having FEV1/FVC in the lowest 1% of the population distribution, whereas ever smokers with a GRS below the 20th percentile have on average an 18% excess RR of falling in that same FEV1/FVC category compared with never smokers. Applying the same approach for FEV1, we observed a similar pattern (Supplementary Figure 5, available as Supplementary data at IJE online). However, as expected, the lower magnitude of the genetic risk score-by-ever smoking interaction on FEV1 implied a lower difference in RR between ever smokers and never smokers.

Discussion
Using the largest dataset to date of European ancestry participants from the general population with pulmonary function (FEV1/FVC and FEV1), smoking and genetic data, we identified a gene-by-smoking interaction effect on FEV1/FVC by using a GRS composed of 26 SNPs identified and replicated in a prior GWAS meta-analysis of marginal genetic effects. To our knowledge, our study is the first to report a synergistic action of genes and smoking on pulmonary function (i.e. the reduction in FEV1/FVC as a result of smoking is greater among individuals who are genetically predisposed to a lower FEV1/FVC ratio). Our study also highlights the importance of developing and applying alternative strategies to evaluate interaction effects for lung phenotypes along with other complex traits and diseases. The genetic risk score-based approach enabled us to identify an interaction when the standard univariate test (i.e. evaluating each single genetic variant for interaction independently) failed to identify any interactions.

Table 2 notes. GRS is the unweighted genetic risk score; beta is the effect estimate of each predictor; and SD is the standard deviation of each beta.
Replication studies showed interaction effect estimates in the same direction as the discovery study but were not significant, and the magnitude of the interaction effects was substantially smaller. We acknowledge that, despite careful evaluation of the interaction effects in the discovery sample, the observed signal might be overestimated or confounded by unmeasured complex factors. However, we can a priori rule out a systematic bias of the single SNP interaction effects in the discovery study, because the genomic inflation factor λ, defined as the ratio of the median of the empirically observed distribution of the test statistic to the expected median, 30 was not substantially different from 1 (λ = 1.044 for FEV1/FVC and smoking status). Instead, differences in significance and effect estimates might be partly explained by the limited sample size in the replication study and differences in the analytical design. Indeed, the discovery analysis was performed using a saturated model including three smoking exposures and explicitly modelled the interaction effect. In comparison, the replication analysis was not adjusted for current smoking status and pack-years, and the interaction effect was approximated from analyses stratified by smoking status, which has some limitations (see Supplementary Note and Supplementary Figure 6, available as Supplementary data at IJE online). Previous work has shown that combined analyses are more powerful when effects exist in both strata, 31 as observed in the discovery study. Further, even with N = 18 316 individuals in the combined replication population, we are underpowered. This sample size provides less than 50% power, at nominal significance of 5%, to detect interaction effects with the GRS.
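The genomic inflation factor defined above (observed median of the test statistic divided by its expected median under the null) can be computed as in the following sketch. It assumes one-degree-of-freedom chi-square statistics; the function name `genomic_inflation` and the simulated input are illustrative and are not the study data.

```python
from statistics import NormalDist, median

def genomic_inflation(chi2_stats):
    """Ratio of the observed median chi-square (1 df) to its null expectation."""
    # The median of a 1-df chi-square is the squared 75th percentile of the
    # standard normal distribution (about 0.455).
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    return median(chi2_stats) / expected_median

# Illustrative input: squared normal quantiles on an even grid, which mimic
# perfectly calibrated null test statistics.
null_stats = [NormalDist().inv_cdf(i / 1000.0) ** 2 for i in range(1, 1000)]
lambda_null = genomic_inflation(null_stats)

# Uniformly inflating every statistic by 4.4% shifts the factor to 1.044.
lambda_inflated = genomic_inflation([1.044 * s for s in null_stats])
```

A value close to 1 indicates that the bulk of the test statistics behaves as expected under the null, which is the argument used in the text against a systematic bias.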
Genetic risk score-by-exposure interaction can have higher clinical value than the identification of single SNP-by-exposure interaction by capturing a wealth of information in a single measure to identify subgroups in the population whose genetic background makes them more susceptible to the deleterious effects of smoking. 19,32,33 Indeed, if single SNP-by-smoking interactions are distributed unconditionally on the marginal genetic effect (i.e. interaction effects are equally likely to be positive or negative given that the coded alleles are the risk alleles), the genetic effect is expected to be similar between ever and never smokers. The enrichment for negative interactions we identified through our GRS approach reveals a stronger genetic component among the ever smoker subgroup in the population and can allow more efficient implementation of prevention strategies. For example, in the public health setting, programmes targeting smoking cessation campaigns to individuals who are genetically predisposed to low pulmonary function may have a stronger impact in preventing COPD.
Our results may also elucidate biological mechanisms underlying the interplay between genes and smoking in pulmonary function. In particular, the higher statistical power for the genetic risk score-based interaction test points towards the potential presence of an unmeasured intermediate biomarker mediating the effect of the 26 loci on FEV1/FVC. As shown in Figure 3, the most parsimonious model (i.e. the least complex following Occam's razor) that would explain multiple interactions going in the same direction (Figure 1) implies that the genetic variants together influence an intermediate biomarker, which itself interacts with smoking. Future studies with extended genomic data, including transcriptomic, proteomic or metabolomic data, might be able to further assess such a hypothesis by (i) evaluating the effect of the GRS on those biomarkers and (ii) testing for interactions between smoking and the candidate biomarkers identified at step (i).

Figure 3 caption (partial). In scenario (A), the single SNP-by-smoking interaction test is the optimal approach, whereas, in scenario (B), the single SNP-by-smoking interaction test can become inefficient, and interaction would be easier to detect using a genetic risk score-by-smoking interaction test, because it summarizes all interaction effects in a single test.
This study has some limitations. The 26 selected variants together explain a relatively small proportion of the additive genetic variance in FEV1/FVC and in FEV1. 4 However, GWAS with increasing sample sizes will likely continue to provide additional associated genetic variants to further assess the role of SNP-by-smoking interaction effects on pulmonary phenotypes and may increase the gap between smokers and never smokers to allow for a significant impact in the clinic or at the population level. Moreover, we focused on genetic variants previously found to be associated at genome-wide significance level, but future studies might consider less stringent criteria to select genetic variants, including those with only suggestive evidence, or alternatively candidate variants with functional annotation relevant to the outcomes and exposures in question. The signal-to-noise ratio might decrease when relaxing the constraint on the SNP selection. However, as we recently showed, additional gain in statistical power might be achieved even if a substantial proportion of the variants do not interact with the exposure. 14 Finally, investigation of interaction effects with other environmental exposures such as second-hand smoke, air pollution, asbestos or occupational risks may lead to a more comprehensive understanding of the biological and epidemiological significance of these variants.
In summary, the identification of interaction effects between genetic variants and environmental exposures in human traits is recognized as extremely challenging, and this quest has been mostly unsuccessful so far. In this study, we discovered novel gene-by-smoking interactions using risk scores that were not observed at the level of individual genetic variants. This risk score analysis suggests that persons with a greater genetic predisposition to low pulmonary function are more susceptible to the deleterious effects of smoking. By extension, the use of a GRS may help predict which smokers will fall below thresholds that establish the diagnosis of COPD.

Supplementary Data
Supplementary data are available at IJE online.