Appraising the causal relevance of DNA methylation for risk of lung cancer

Abstract Background DNA methylation changes in peripheral blood have recently been identified in relation to lung cancer risk. Some of these changes have been suggested to mediate part of the effect of smoking on lung cancer. However, limitations with conventional mediation analyses mean that the causal nature of these methylation changes has yet to be fully elucidated. Methods We first performed a meta-analysis of four epigenome-wide association studies (EWAS) of lung cancer (918 cases, 918 controls). Next, we conducted a two-sample Mendelian randomization analysis, using genetic instruments for methylation at CpG sites identified in the EWAS meta-analysis, and 29 863 cases and 55 586 controls from the TRICL-ILCCO lung cancer consortium, to appraise the possible causal role of methylation at these sites on lung cancer. Results Sixteen CpG sites were identified from the EWAS meta-analysis [false discovery rate (FDR) < 0.05], for 14 of which we could identify genetic instruments. Mendelian randomization provided little evidence that DNA methylation in peripheral blood at the 14 CpG sites plays a causal role in lung cancer development (FDR > 0.05), including for cg05575921-AHRR where methylation is strongly associated with both smoke exposure and lung cancer risk. Conclusions The results contrast with previous observational and mediation analysis, which have made strong claims regarding the causal role of DNA methylation. Thus, previous suggestions of a mediating role of methylation at sites identified in peripheral blood, such as cg05575921-AHRR, could be unfounded. However, this study does not preclude the possibility that differential DNA methylation at other sites is causally involved in lung cancer development, especially within lung tissue.


Key Messages
• DNA methylation is a modifiable biomarker, giving it the potential to be targeted for intervention in many diseases, including lung cancer, which is the most common cause of cancer-related death.
• This Mendelian randomization study attempted to evaluate whether there was a causal relationship, and thus potential for intervention, between DNA methylation measured in peripheral blood and lung cancer, by assessing whether genetically altered DNA methylation levels impart differential lung cancer risks.
• Differential methylation at 14 CpG sites identified in epigenome-wide association analysis of lung cancer was assessed. Despite >99% power to detect the observational effect sizes, our Mendelian randomization analysis gave little evidence that any of the sites were causally linked to lung cancer.
• This is in stark contrast to previous analyses that suggested two CpG sites within the AHRR and F2RL3 loci, which were also observed in this analysis, mediate >30% of the effect of smoking on lung cancer.
• Overall findings suggest there is little or no role of differential methylation at the CpG sites identified within the blood in the development of lung cancer. Thus, targeting these sites for prevention of lung cancer is unlikely to yield effective treatments.

Background
Lung cancer is the most common cause of cancer-related death worldwide. 1 Several DNA methylation changes have been recently identified in relation to lung cancer risk. [2][3][4] Given the plasticity of epigenetic markers, any DNA methylation changes that are causally linked to lung cancer are potentially appealing targets for intervention. 5,6 However, these epigenetic markers are sensitive to reverse causation,
being affected by cancer processes, 6 and are also prone to confounding, for example by socioeconomic and lifestyle factors. 7,8 One CpG site, cg05575921 within the aryl hydrocarbon receptor repressor (AHRR) gene, has been consistently replicated in relation to both smoking 9 and lung cancer 2,3,10 and functional evidence suggests that this region could be causally involved in lung cancer. 11 However, the observed association between methylation and lung cancer might simply reflect separate effects of smoking on lung cancer and DNA methylation, i.e. the association may be a result of confounding, 12 including residual confounding after adjustment for self-reported smoking behaviour. 13,14 Furthermore, recent epigenome-wide association studies (EWAS) for lung cancer have revealed additional CpG sites which may be causally implicated in development of the disease. 2,3 Mendelian randomization (MR) uses genetic variants associated with modifiable factors as instruments to infer causality between the modifiable factor and outcome, overcoming most unmeasured or residual confounding and reverse causation. 15,16 In order to infer causality, three core assumptions of MR should be met: (i) the instrument is associated with the exposure; (ii) the instrument is not associated with any confounders; and (iii) the instrument is associated with the outcome only through the exposure. MR may be adapted to the setting of DNA methylation [17][18][19] with the use of single nucleotide polymorphisms (SNPs) that correlate with methylation of CpG sites, known as methylation quantitative trait loci (mQTLs). 20 In this study, we performed a meta-analysis of four lung cancer EWAS (918 case-control pairs) from prospective cohort studies to identify CpG sites associated with lung cancer risk, and we applied MR to investigate whether the observed DNA methylation changes at these sites are causally linked to lung cancer.

EWAS meta-analysis
We conducted a meta-analysis of four lung cancer case-control EWAS that assessed DNA methylation using the Illumina Infinium® HumanMethylation450 BeadChip. All EWAS are nested within prospective cohorts that measured DNA methylation in peripheral blood samples before diagnosis: EPIC-Italy (185 case-control pairs), Melbourne Collaborative Cohort Study (MCCS) (367 case-control pairs), Norwegian Women and Cancer (NOWAC) (132 case-control pairs) and the Northern Sweden Health and Disease Study (NSHDS) (234 case-control pairs). Study populations, laboratory methods, data preprocessing and quality control methods have been described in detail elsewhere 3 and are outlined in the Supplementary Methods, available as Supplementary data at IJE online.
To quantify the association between the methylation level at each CpG and the risk of lung cancer, we fitted conditional logistic regression models for beta values of methylation [which range from 0 (no cytosines methylated) to 1 (all cytosines methylated)] on lung cancer status for the four studies. The cases and controls in each study were matched; details of this are in the Supplementary Methods, available as Supplementary data at IJE online. Surrogate variables were computed in the four studies using the SVA R package, 21 and the proportion of CD8+ and CD4+ T cells, B cells, monocytes, natural killer cells and granulocytes within whole blood were derived from DNA methylation. 22 The following EWAS models were included in the meta-analysis: Model 1: unadjusted; Model 2: adjusted for 10 surrogate variables (SVs); Model 3: adjusted for 10 SVs and derived cell proportions. Stratification of EWAS by smoking status was also conducted [never (N = 304), former (N = 648) and current smoking (N = 857)]. For Models 1, 2 and 3, the case-control studies not matched on smoking status (EPIC-Italy and NOWAC) were adjusted for smoking.
We performed an inverse-variance weighted fixed effects meta-analysis of the EWAS (918 case-control pairs) using the METAL software [http://csg.sph.umich.edu/abecasis/metal/]. Direction of effect, effect estimates and the I² statistic were used to assess heterogeneity across the studies in addition to effect estimates across smoking strata (never, former and current). All sites identified at a false discovery rate (FDR) < 0.05 in Models 2 and 3 were also present in the sites identified in Model 1. The effect size differences between models for all sites identified in Model 1 were assessed by a Kruskal-Wallis test and a post hoc Dunn's test. There was little evidence for a difference (P > 0.1), so to maximize inclusion into the MR analyses, we took forward the sites identified in the unadjusted model (Model 1).
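The inverse-variance weighted fixed effects pooling and the I² heterogeneity statistic described above can be sketched as follows. This is a minimal illustration in Python rather than the METAL implementation used in the study; the function name `fixed_effects_meta` and the per-study estimates are hypothetical and are not values from the paper.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-study log odds ratios with inverse-variance weights.

    Returns the pooled estimate, its standard error, Cochran's Q and
    I^2 (as a percentage).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical log ORs and SEs for one CpG site in four studies
study_betas = [-0.40, -0.55, -0.30, -0.50]
study_ses = [0.15, 0.10, 0.20, 0.12]
pooled, pooled_se, q, i2 = fixed_effects_meta(study_betas, study_ses)
print(f"pooled log OR = {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.2f}, I2 = {i2:.1f}%")
```

Studies with smaller standard errors dominate the pooled estimate, which is why the larger cohorts carry more weight in a fixed effects analysis.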

Mendelian randomization
Two-sample MR was used to establish potential causal effects of differential methylation on lung cancer risk. 23,24 In the first sample, we identified mQTL-methylation effect estimates (β_GP) for each CpG site of interest in an mQTL database from the Accessible Resource for Integrated Epigenomic Studies (ARIES) [http://www.mqtldb.org]. Details on the methylation preprocessing, genotyping and quality control (QC) pipelines are outlined in the Supplementary Methods, available as Supplementary data at IJE online. In the second sample, we used summary data from a GWAS meta-analysis of lung cancer risk conducted by the Transdisciplinary Research in Cancer of the Lung and The International Lung Cancer Consortium (TRICL-ILCCO) (29 863 cases, 55 586 controls) to obtain mQTL-lung cancer estimates (β_GD). 25 For each independent mQTL (r² < 0.01), we calculated the log odds ratio (OR) per standard deviation (SD) unit increase in methylation by the formula β_GD/β_GP (Wald ratio). Standard errors were approximated by the delta method. 26 Where multiple independent mQTLs were available for one CpG site, these were combined in a fixed effects meta-analysis after weighting each ratio estimate by the inverse variance of their associations with the outcome. Heterogeneity in Wald ratios across mQTLs was estimated using Cochran's Q test, which can be used to indicate horizontal pleiotropy. 27 Differences between the observational and MR estimates were assessed using a Z test for difference.
If there was evidence for an mQTL-CpG site association in ARIES in at least one time point, we assessed whether the mQTL replicated across time points in ARIES (FDR < 0.05, same direction of effect). Further, we re-analysed this association using linear regression of methylation on each genotyped SNP available in an independent cohort (NSHDS), using rvtests 28 (Supplementary Methods, available as Supplementary data at IJE online). Replicated mQTLs were included where possible to reduce the effect of winner's curse using effect estimates from ARIES. We assessed the instrument strength of the mQTLs by investigating the variance explained in methylation by each mQTL (r²) as well as the F statistic in ARIES (Supplementary Table 1, available as Supplementary data at IJE online). The power to detect the observational effect estimates in the two-sample MR analysis was assessed a priori, based on an alpha of 0.05, sample size of 29 863 cases and 55 586 controls (from TRICL-ILCCO) and calculated variance explained (r²).
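The estimators named in the preceding paragraph can be illustrated with a short sketch. It is written in Python for illustration and is not the TwoSampleMR code used in the study. The function names and the input values are hypothetical, and the delta-method variance assumes zero covariance between the two associations, which is a common simplification when they come from independent samples.

```python
import math


def wald_ratio(beta_gd, se_gd, beta_gp, se_gp):
    """Wald ratio beta_GD / beta_GP with a delta-method standard error.

    Assumes the mQTL-outcome and mQTL-methylation estimates are
    uncorrelated (independent samples).
    """
    ratio = beta_gd / beta_gp
    variance = se_gd ** 2 / beta_gp ** 2 + beta_gd ** 2 * se_gp ** 2 / beta_gp ** 4
    return ratio, math.sqrt(variance)


def combine_ratios(ratios, beta_gps, se_gds):
    """Fixed effects pooling of Wald ratios for one CpG site.

    Each ratio is weighted by beta_GP^2 / se_GD^2, i.e. the inverse
    variance of the ratio implied by the outcome association alone.
    Returns the pooled estimate, its SE and Cochran's Q (df = k - 1).
    """
    weights = [(bgp / sgd) ** 2 for bgp, sgd in zip(beta_gps, se_gds)]
    total_weight = sum(weights)
    pooled = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    cochran_q = sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))
    return pooled, pooled_se, cochran_q


def z_test_difference(est_a, se_a, est_b, se_b):
    """Two-sided Z test for the difference between two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical summary statistics for two independent mQTLs at one CpG site:
# (beta_GD, se_GD, beta_GP, se_GP)
mqtls = [(0.010, 0.012, 0.45, 0.05), (-0.004, 0.015, 0.30, 0.04)]
ratios = [wald_ratio(*m)[0] for m in mqtls]
pooled, pooled_se, q = combine_ratios(
    ratios, [m[2] for m in mqtls], [m[1] for m in mqtls]
)
# Compare with a hypothetical observational log OR of -0.50 (SE 0.08)
z, p = z_test_difference(-0.50, 0.08, pooled, pooled_se)
print(f"MR log OR per SD = {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.2f}, Z = {z:.2f}, P = {p:.2g}")
```

A large Q relative to its degrees of freedom would suggest that the mQTLs give inconsistent ratio estimates, which is one way horizontal pleiotropy can show up.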
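As a rough guide to how instrument strength relates to variance explained, the F statistic can be approximated from r² and the sample size. The sketch below uses the standard regression identity F = (r²/k)(n − k − 1)/(1 − r²); it is an assumption that this matches the calculation behind the supplementary table, and the function name and inputs are hypothetical.

```python
def f_statistic(r2, n, k=1):
    """Approximate F statistic for k instruments explaining r2 of the
    variance in the exposure in a sample of size n."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if n <= k + 1:
        raise ValueError("n must exceed k + 1")
    return (r2 / k) * (n - k - 1) / (1.0 - r2)


# Hypothetical example: a single mQTL explaining 2% of methylation
# variance in a sample of 800 individuals
print(round(f_statistic(0.02, 800), 1))
```

An F statistic above 10 is a widely used rule of thumb for an instrument that is unlikely to suffer badly from weak-instrument bias.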
MR analyses were also performed to investigate the impact of methylation on lung cancer subtypes in TRICL-ILCCO: adenocarcinoma (11 245 cases, 54 619 controls), small cell carcinoma (2791 cases, 20 580 controls) and squamous cell carcinoma (7704 cases, 54 763 controls). We also assessed the association in never smokers (2303 cases, 6995 controls) and ever smokers (23 848 cases, 16 605 controls). 25 Differences between the smoking subgroups were assessed using a Z test for difference.
We next investigated the extent to which the mQTLs at cancer-related CpGs were associated with four smoking behaviour traits which could confound the methylation-lung cancer association: number of cigarettes per day, smoking cessation rate, smoking initiation and age of smoking initiation, using GWAS data from the Tobacco and Genetics (TAG) consortium (N = 74 053). 29

Supplementary analyses
Assessing the potential causal effect of AHRR methylation: one-sample MR
Given previous findings implicating methylation at AHRR in relation to lung cancer, 2,3 we performed a one-sample MR analysis 30 of AHRR methylation on lung cancer incidence, using individual-level data from the Copenhagen City Heart Study (CCHS) (357 incident cases, 8401 remaining free of lung cancer). Details of the phenotypic, methylation and genetic data, as well as the linked lung cancer data, are outlined in the Supplementary Methods, available as Supplementary data at IJE online.
An allele score of mQTLs located within 1 Mb of cg05575921-AHRR was created and its association with AHRR methylation tested (Supplementary Methods, available as Supplementary data at IJE online). We investigated associations between the allele score and several potential confounding factors (sex, alcohol consumption, smoking status, occupational exposure to dust and/or welding fumes, passive smoking). We next performed MR analyses using two-stage Cox regression, with adjustment for age and sex, and further stratified by smoking status.

Tumour and adjacent normal methylation patterns
DNA methylation data from lung cancer tissue and matched normal adjacent tissue (N = 40 squamous cell carcinoma and N = 29 adenocarcinoma), profiled as part of The Cancer Genome Atlas (TCGA), were used to assess tissue-specific DNA methylation changes across sites identified in the meta-analysis of EWAS, as outlined previously. 31

mQTL association with gene expression
For the genes annotated to CpG sites identified in the lung cancer EWAS, we examined gene expression in whole blood and lung tissue, using data from the Genotype-Tissue Expression (GTEx) consortium. 32

Analyses were conducted in Stata (version 14) and R (version 3.2.2). For the two-sample MR analysis we used the MR-Base R package TwoSampleMR. 33 An adjusted P-value that limited the FDR was calculated using the Benjamini-Hochberg method. 34 All statistical tests were two-sided.
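The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines. This is an illustrative Python version, not the code used in the study; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values for four tests
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])
```

A site is then declared significant at FDR < 0.05 when its adjusted P-value falls below 0.05.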

Results
A flowchart representing our study design along with a summary of our results at each step is displayed in Figure 1.

EWAS meta-analysis
The basic meta-analysis adjusted for study-specific covariates identified 16 CpG sites that were hypomethylated in relation to lung cancer (FDR < 0.05, Model 1, Figure 2). Adjusting for 10 surrogate variables (Model 2) and derived cell counts (Model 3) gave similar results (Table 1). The direction of effect at the 16 sites did not vary between studies (median I² = 38.6) (Supplementary Table 2, available as Supplementary data at IJE online), but there was evidence for heterogeneity of effect estimates at some sites when stratifying individuals by smoking status (Table 1).

Mendelian randomization
We identified 15 independent mQTLs (r² < 0.01) associated with methylation at 14 of 16 CpGs. Ten mQTLs replicated at FDR < 0.05 in NSHDS (Supplementary Table 3, available as Supplementary data at IJE online). MR power analyses indicated >99% power to detect ORs for lung cancer of the same magnitude as those in the meta-analysis of EWAS.
There was little evidence for an effect of methylation at these 14 sites on lung cancer (FDR > 0.05, Supplementary Table 4, available as Supplementary data at IJE online). For nine of 14 CpG sites, the point estimates from the MR analysis were in the same direction as in the EWAS, but of a much smaller magnitude (Z test for difference, P < 0.001) (Figure 3).
For nine out of the 16 mQTL-CpG associations, there was strong replication across time points (Supplementary Table 5, available as Supplementary data at IJE online) and 10 out of 16 mQTL-CpG associations replicated at FDR < 0.05 in an independent adult cohort (NSHDS). Using mQTL effect estimates from NSHDS for the 10 CpG sites that replicated (FDR < 0.05), findings were consistent with limited evidence for a causal effect of peripheral blood-derived DNA methylation on lung cancer (Supplementary Figure 1, available as Supplementary data at IJE online).
Single mQTLs for cg05575921-AHRR, cg27241845-ALPPL2 and cg26963277-KCNQ1 showed some evidence of association with smoking cessation (former vs current smokers), although these associations were not below the FDR < 0.05 threshold (Supplementary Figure 4, available as Supplementary data at IJE online).
Potential causal effect of AHRR methylation on lung cancer risk: one-sample MR
In the CCHS, each additional (average methylation-increasing) allele in a four-mQTL allele score was associated with a 0.73% (95% CI = 0.56, 0.90) increase in methylation (P < 1 × 10⁻¹⁰) and explained 0.8% of the variance in cg05575921-AHRR methylation (F statistic = 74.2). Confounding factors were not strongly associated with the genotypes in this cohort (P ≥ 0.11) (Supplementary [...]
Given contrasting findings with the main MR analysis, where cg05575921-AHRR methylation was not causally implicated in lung cancer, and the lower power in the one-sample analysis to detect an effect of equivalent size to the observational results (power = 19% at alpha = 0.05), we performed further two-sample MR based on the four mQTLs using data from both CCHS (sample one) and the TRICL-ILCCO consortium (sample two). Results showed no strong evidence for a causal effect of DNA methylation on total lung cancer risk [OR [...] Table 9, available as Supplementary data at IJE online).

Tumour and adjacent normal lung tissue methylation patterns
For cg05575921-AHRR, there was no strong evidence for differential methylation between adenocarcinoma tissue and adjacent healthy tissue (P = 0.963), and weak evidence for hypermethylation in squamous cell carcinoma tissue (P = 0.035) (Figure 4; Supplementary Table 10, available as Supplementary data at IJE online). For the other CpG sites there was evidence for a difference in DNA methylation between tumour and healthy adjacent tissue at several sites in both adenocarcinoma and squamous cell carcinoma, with consistent differences for CpG sites in ALPPL2 (cg21566642, cg05951221 and cg01940273), as well as cg23771366-PRSS23, cg26963277-KCNQ1, cg09935388-GFI1, cg01901332-ARRB1, cg08709672-AVPR1B and cg25305703-CASC21. However, hypermethylation in tumour tissue was found for the majority of these sites, which is opposite to what was observed in the EWAS analysis.

Gene expression associated with mQTLs in blood and lung tissue
Of the 10 genes annotated to the 14 CpG sites, eight genes were expressed sufficiently to be detected in lung (AVPR1B and CASC21 were not) and seven in blood (AVPR1B, CASC21 and ALPPL2 were not). Of these, gene expression of ARRB1 could not be investigated as the mQTLs in that region were not present in the GTEx data. rs3748971 and rs878481, mQTLs for cg21566642 and cg05951221, respectively, were associated with increased expression of ALPPL2 (P = 0.002 and P = 0.0001). No other mQTLs were associated with expression of the annotated gene at a Bonferroni corrected P-value threshold (P < 0.05/19 = 0.0026) (Supplementary Table 11, available as Supplementary data at IJE online).

Discussion
In this study, we identified 16 CpG sites associated with lung cancer, of which 14 have been previously identified in relation to smoke exposure 9 and six were highlighted in a previous study as being associated with lung cancer. 3 This previous study used the same data from the four cohorts investigated here, but in a discovery and replication, rather than meta-analysis framework. Overall, using MR we found limited evidence supporting a potential causal effect of methylation at the CpG sites identified in peripheral blood on lung cancer. These findings are in contrast to previous analyses suggesting that methylation at two CpG sites investigated (in AHRR and F2RL3) mediated >30% of the effect of smoking on lung cancer risk. 2 This previous study used methods which are sensitive to residual confounding and measurement error that may have biased results. 12,35 These limitations are largely overcome using MR. 12 Although there was some evidence for an effect of methylation at some of the other CpG sites on risk of subtypes of lung cancer, these effects were not robust to multiple testing correction and were not validated in the analysis of tumour and adjacent normal lung tissue methylation nor in gene expression analysis.
A major strength of the study was the use of two-sample MR to integrate an extensive epigenetic resource and summary data from a large lung cancer GWAS, to appraise causality of observational associations with >99% power. Evidence against the observational findings was also acquired through tissue-specific DNA methylation and gene expression analyses.
Figure 3. Mendelian randomization (MR) vs observational analysis. Two-sample MR was carried out with methylation at 14/16 CpG sites identified in the EWAS meta-analysis as the exposure and lung cancer as the outcome. cg01901332 and cg05575921 had two instruments, so the estimate was calculated using the inverse variance weighted method; for the rest, the MR estimate was calculated using a Wald ratio. Only 14 of 16 sites could be instrumented using mQTLs from [mqtldb.org]. OR, odds ratio per SD increase in DNA methylation. *Instrumental variable not replicated in independent dataset (NSHDS). The sites for which instrumental variables have not been replicated are cg01901332, cg21566642, cg05575921 and cg08709672.
Limitations include potential 'winner's curse' which may bias causal estimates in a two-sample MR analysis towards the null if the discovery sample for identifying genetic instruments is used as the first sample, as was done for our main MR analysis using data from ARIES. 36 However, findings were similar when using replicated mQTLs in NSHDS, indicating that the potential impact of this bias was minimal (Supplementary Figure 1, available as Supplementary data at IJE online). Another limitation relates to the potential issue of consistency and validity of the instruments across the two samples. For a minority of the mQTL-CpG associations (four out of 16), there was limited replication across time points and in particular, six mQTLs were not strongly associated with DNA methylation in adults. Further, our primary data used for the first sample in the two-sample MR were ARIES, which contains no male adults. If the mQTLs identified vary by sex and time, then this could bias our results. However, our replication cohort NSHDS contains adult males. Therefore, the 10 mQTLs that replicated in NSHDS are unlikely to be biased by the sex discordance. Also, we replicated the findings for cg05575921 AHRR in CCHS, which contains both adult males and females, in a two-sample MR analysis, suggesting that these results also are not influenced by sex discordance. Caution is therefore warranted when interpreting the null results for the two-sample MR estimates for the CpG sites for which mQTLs were not replicated, which could be the result of weak-instrument bias.
The lack of independent mQTLs for each CpG site did not allow us to properly appraise horizontal pleiotropy in our MR analyses. Where possible we only included cis-acting mQTLs to minimize pleiotropy, and investigated heterogeneity where there were multiple independent mQTLs. Three mQTLs were nominally associated with smoking phenotypes, but not to the extent that this would bias our MR results substantially. Some of the mQTLs used influence multiple CpGs in the same region, suggesting genomic control of methylation at a regional rather than single CpG level. This was untested, but methods to detect differentially methylated regions (DMRs) and identify genetic variants which proxy for them may be fruitful in probing the effect of methylation across gene regions.
A further limitation relates to the inconsistency in effect estimates between the one- and two-sample MR analyses to appraise the causal role of AHRR methylation. Findings in CCHS were supportive of a causal effect of AHRR methylation on lung cancer [HR = 0.30 (95% CI = 0.10, 1.00) per SD], but in two-sample MR this site was not causally implicated [OR = 1.00 (95% CI = 0.83, 1.10) per SD increase]. We verified that this was not due to differences in the genetic instruments used, nor due to issues of weak instrument bias. Given that the CCHS one-sample MR had little power (19% at alpha = 0.05) to detect a causal effect with a size equivalent to that of the observational analysis, we have more confidence in the results from the two-sample approach.
Peripheral blood may not be the ideal tissue to assess the association between DNA methylation and lung cancer. A high degree of concordance in mQTLs has been observed across lung tissue, skin and peripheral blood DNA, 37 but we were unable to directly evaluate this here. A possible explanation for the lack of causal effect at AHRR is tissue specificity, as we found that the mQTLs used to instrument cg05575921 were not strongly related to expression of AHRR in lung tissue. However, findings from MR analysis were corroborated by the lack of evidence for differential methylation at AHRR between lung adenocarcinoma tissue and adjacent healthy tissue, and weak evidence for hypermethylation (opposite to the expected direction) in squamous cell lung cancer tissue. This result may be interesting in itself, as smoking is hypothesized to influence squamous cell carcinoma more than adenocarcinoma. However, the result conflicts with that found in the MR analysis. Furthermore, another study investigating tumorous lung tissue (N = 511) found only weak evidence for an association between smoking and cg05575921-AHRR methylation, which did not survive multiple testing correction (P = 0.02). 38 However, our results do not fully exclude AHRR from involvement in the disease process. AHRR and AHR form a regulatory feedback loop, which means that the actual effect of differential methylation or differential expression of AHR/AHRR on pathway activity is complex. 39 In addition, some of the CpG sites identified in the EWAS were found to be differentially methylated in the tumour and adjacent normal lung tissue comparison. Whereas this could represent a false-negative result of the MR analysis, it is of interest that differential methylation in the tissue comparison analysis was typically in the opposite direction to that observed in the EWAS.
Furthermore, although this method can be used to minimize confounding, it does not fully eliminate the possibility of bias due to reverse causation (whereby cancer induces changes in DNA methylation) or intra-individual confounding e.g. by gene expression. Therefore, it does not give conclusive evidence that DNA methylation changes at these sites are not relevant to the development of lung cancer.
Whereas DNA methylation in peripheral blood may be predictive of lung cancer risk, according to the present analysis it is unlikely to play a causal role in lung carcinogenesis at the CpG sites investigated. Findings from this study issue caution over the use of traditional mediation analyses to implicate intermediate biomarkers (such as DNA methylation) in pathways linking an exposure with disease, given the potential for residual confounding in this context. 12 However, the findings of this study do not preclude the possibility that other DNA methylation changes are causally related to lung cancer (or other smoking-associated disease). 40

Supplementary Data
Supplementary data are available at IJE online.